SpreadsheetLook Tutorial
Introduction
Spreadsheet Look is an application for data processing in a spreadsheet-like environment. It provides functionalities like importing, viewing, editing, and exporting data, calculating new columns, sorting, creating filters, and taking samples from data. Some of the application's main characteristics are summarized next.
The image above shows the application's basic appearance.
* Data viewer - View datasets consisting of any combination of data types. Data elements displayed in cell can be of any type: text, numbers, graphical objects, and symbols, among other.
* Data editor - Data elements can be edited. Finding specific elements can be done by navigating or by using Find option.
* Freezing columns is possible and also freezing data windows. Hiding data columns is also provided as a way of simplifying data navigation, viewing, and editing.
Spreadsheet Look is designed to manage very large datasets and the dataset size limit is defined by the computer resources and not by the software. For example a file of 20 columns by one million rows will be managed smoothly as long as there is enough computer memory.
* Data import and export - Import and export from most widely used data formats (xls, csv, txt, etc.). Even data formatted as cross tabs can be imported from spreadsheets.
* Data archival - Datasets can be saved in Spreadsheet Look Sheet (.sls) format for easy access (File+Open). Sls files are text files which keep data, formats, and formulas the same way spreadsheet files do.
* Print - Print formatted data in table form.
One of the most significant characteristics of the application is that it creates formatted datasets for easy access on different working sessions. That is to say, as in spreadsheets, you can save your data and working scenario such that it can be retrieved as a file in which you can continue working the next time the file is open. In fact, stand-alone instances of SpreadsheetLook can be created which render exactly as left when saved.
Another interesting feature of the application is that it can be used to modify datasets and share the new dataset as a global variable which can be used as input to user-defined applications. For example, the original dataset can be reduced to a subset of rows and columns defined by a filter or a sample and the resulting dataset is readily available for the user. An example will be shown later.
Following is a brief description of some data modification features available in the Data menu.
* Transform data columns - Data transformations like changing data type or date format can be easily done without the need of creating a new column.
* Calculate new columns - Formulas based on data headers can be easily setup by means of the Function Editor. Any number of new columns can be added. New columns can modified (sorted, filtered,...), and processed as any other column.
* Sort data - Data can be sorted by any number of columns.
* Hide columns - Data columns can be hidden for simplifying data viewing.
* Sample data - Define a sample size and take a sample from data from top, bottom or random.
* Filter data - Define basic filters by selecting values from a menu or create highly customizable filters based on any number of data columns. Filters can be created using functions like Equal, Greater, StringQ, NumberQ, ListQ, Positive, Negative, IntegerQ, EvenQ, OddQ, StringMatchQ, and MemberQ, among other.
A graphical user interface enables data modifications and also allows the user to save as a template the current data modification session. This template can be used to modify other datasets which have the same data headers.
The image below shows an example of the dynamic module associated to each data column.
Data Sharing and Exploring
* Share - Data is constantly being shared as a
Mathematica variable for simplifying user access to data. This feature can be used to convert Spreadsheet Look into the data importing and processing interface for any user-defined application. In the Applications session of the
SpreadsheetLook documentation there is an example of a user-defined application based on Spreadsheet Look.
* Explore - Data columns can be explored to assess their contents by identifying data types and creating a tally of the data elements. For numeric columns, basic statistics are calculated, and a line plot and a histogram is shown.
Modifying Datasets
The dimensions and contents of the data imported inside SpreadsheetLook can be modified by means of the options available in the Data menu.
In general data modification option tabs have three key buttons which behave in a similar fashion across tabs: Enable/Disable, Reset, and Apply. First of all, the
Update button must be pressed for data modifications to take place
. Pressing Update will always execute all enabled data modifications in the order displayed (i.e. Crop, Transform, etc.). To
enable the action of a modification tab the checkbox beside the tab name must be checked. In the example below it can be seen that data should be sorted by Price, however sorting is disabled, so data will not be sorted unless sorting is enabled. In general, disabled tabs are grayed out while active tabs are blue. An active tab is one where there are enabled definitions. Additionally, any tab which name is underlined is a tab which has a setup which will modify the contents of the spreadsheet when enabled and updated. The Reset button (red warning sigh at the left of the tab name) will
clear or
reset all definitions in the current tab.
As described earlier, the Update button has a global effect as it updates all enabled and active data modification tabs. Right below Update there are other buttons that have global effect. Two of these are the global analogs of the clear and enable buttons described earlier. The return button has the effect of making all data modifications permanent. That is to say, to update all active data modifications and replace the unmodified or source dataset by the modified dataset. By using this button more than one sequence of data modifications (Crop, Transform,...) can be applied to data. To simplify the application of sequences of data modifications to any dataset the Templates button is used to store in memory all modifications currently defined in a spreadsheet.
Crop
Cropping can be used to delete unwanted rows or columns from the originally loaded dataset. It is strongly recommended that once data is cropped it is saved as a new dataset (Modified Data) such that the other operations (Sort, Hide,...) are based on a clean dataset.
To define the limits of the new dataset just click on the two corner points of interest which will be formatted to red color. Elements excluded will be painted gray.
Transform
Transforming the contents of a column is performed by selecting the target column and the functions which will be used to transform the column. In the example below the column Price will be transformed by first converting it to an expression (assuming it is a column of string values), and then rounding with the function Round. Note that column names must be wrapped with FieldName to ensure the right interpretation of the Function Composition.
The More opener provides a way to copy to other variables the transformation applied to one variable.
When using the DateString function an editor is available to select or define the default DateString format. This format will be used the next time the DateString function is selected from the Function Assistant palette. The image below shows the options available in the 'Select Format' tab of the editor.
Calculate
Calculating new columns is possible in the Calculate tab. Note that by default the new column is named MyColumnName and is painted red to help the user recall that a new name must be given to the new column. Similarly, the function for the new column is presented as a sum of a constant and a function of two other columns (wrapped with FieldNames to ensure the right interpretation of the equation).
Note that there is a check mark (paste button) below each column name. These buttons will paste in the Input Field the column name wrapped with FieldName (e.g. FieldName["Quantity"]).
The example below shows how KSales was calculated using the column names Quantity and Price. Note that new column names are added at the end of the dataset. This may be confusing as the user may not see the new column displayed in the spreadsheet unless the slider is used to scroll to the last column in the displayed dataset.
Also note that the Function Assistant is now presented as paste buttons (not checkboxes as in the Transform tab) as now Functions are pasted in the input field. The function can be edited manually or using any other function palette.
Sort
Sorting occurs in the order variables are selected as shown below.
Hide
There are two main things to say about hidden columns: First, the column RowID is appended to data and by default is hidden. RowID can be used to recover the original ordering of data. Second, hidden columns are considered deleted in 'Modified Data'. This should be present when performing any other data modification after the Hide tab or when selecting target dataset from the File menu.
Filters
By default the Basic mode is used to define simple filters by just selecting the column to filter and then picking values from a list. In the example below the filter will cause the selection of rows in which the Employee is either Jean or John.
The Review tab in filters shows a script version of the filter which will be applied to data after pressing the Update button.
Note that Advanced filters can be defined on any number of columns by selecting the desired column names and defining the required filter. The example below shows that also more than one condition can be defined for a single column. The + button is used to add new elements to the filter.
Also note that the Review tab shows the complete filter for all columns including the option of taking the complement of the resulting filter (by checking Not), and the wrapping function for the filters defined for each filtered column (usually And).
Filters can be defined using many useful constructs of
Mathematica. Define filters using the following guidelines:
1. Start selecting a Column from the menu or from the data displayed.
2. Then select a Test from the menu as shown in the image below.
3. The parameter value for the test can be input manually or from the menu. The distinct values in the menu are taken from the modified dataset. To pick values from the unmodified dataset the filters must be disabled temporarily.
4. Repeat the former steps for other columns.
5. The Not checkbox can be used to negate the currently displayed filter.
6. The Function (And, Or,...) besides the Not will be used as the outermost external operator for all defined conditions for each column. For example, And[condition1, condition2, ...].
Sampling
The Sample tab is used to take sample rows from data. The sample can be taken from the top or bottom of data. If the sample is Random then it can be completely random or it could be taken respecting the original ordering of data. For example, if there is an underlying time order in the original ordering of data, then the random sample is taken and shown respecting that order. Note that the sample fraction is calculated taking the unmodified dataset as a base.
Share
Sharing is one of the most useful features of SpreadsheetLook. The idea is that at any time data inside SpreadsheetLook can be used externally for any user-defined application. In the example below all columns are being shared automatically. By setting Sharing Method to Automatic any modification made to data is continuously being written to the global variable defined in Variable Name, in this example "mydata". If needed only the desired columns can be shared as a new dataset which will be formed using the selection order of columns.
In summary, SpreadsheetLook can be used as a data input tool for any user-defined application. See an example in the Applications session of the documentation for
SpreadsheetLook.
Explore
The Explore tab is provided as a way of taking a closer look at the contents of a data column. The image below shows the output generated for the numeric column Sales. A Histogram and a Line plot is also available in the Statistics Tab View. Additionally, the High-Frequency and Low-Frequency tabs can be very useful to identify outliers or just to see how distinct values are distributed in the dataset.
Saving Data and Working Sessions
When a SpreadsheelLook object is created with the "StandAlone" option set to true and the notebook containing the object is saved, when reopening the notebook the SpreadsheetLook object will simply be there the same way it was saved. However, setting "StandAlone"->True is not a good idea for large datasets as spreadsheet objects may become "heavy and slow".
For optimizing speed and memory usage "StandAlone" should be set to False. In this case when a notebook containing a SpreadhseetLook object is saved and reopened in a new
Mathematica session it will look as shown below. By clicking the Continue button the data saved will be restored and all settings will remain with same values saved.
The data inside a SpreadsheetLook that has been created with the "StandAlone" option set to False will not be saved by saving the notebook containing the object. Data must be saved independently using the saving icon in the SpreadsheetLook object.
Datasets in SpreadsheetLook format (sls) are created by saving (using Put) the data along with the active SpreadsheetLook options such that when an sls file is open in a new session it will conserve many of its main characteristics: all SpreadsheetLook option values and some other internal variable states. This is done with the intention of replicating the experience users have when opening files in spreadsheets.
General Functionalities
Usage
To use SpreadsheetLook just create a dataset and use it as the main parameter of the SpreadsheetLook function as shown below.
Out[7]//TableForm= |
| |  |
For more information see the documentation for
SpreadsheetLook.
Appearance
The next screen shot shows the general appearance of SpreadsheetLook. There are sliders available to navigate the contents of the dataset and the window size can be dynamically modified with the buttons near to the sliders. The two buttons at the lower-left and upper-right corners are used to turn pages up/down and left/right. Finally, there are printing and data saving icons. The icon at the right of the printer is used to show/hide data while the icon at the left of the saving icon is a toggler for enabling editing
Scope
- Any dataset of any combination of data types can be imported and manipulated. The data must be rectangular or in form of table.
- Data does not need to have headers as these are automatically appended if needed. Column headers are used to define data filters and other data modifications. For this reason it is required to define headers as strings.
- Managing very large datasets is possible as long as there is enough computer memory. For example, datasets of one million records by ten or twenty columns can be managed very efficiently in most personal computers.
- Data modifications are sequentially applied to data in the order shown in the image below.
The menu options File, View, Format, and Data provide many useful features. As mentioned before, at any point while using SpreadsheetLook data can be saved and eventually use it again using the Open option. As an example of the data importing capabilities, when importing data from spreadsheets the user can select the sheet name and even import data which was saved as cross tabs.
Navigation
- Instantaneous scrolling of complete dataset.
- Find specific data elements including sub strings in strings or any other user-defined pattern.
- Go to specific row or column number.
Editing
- Edit source data elements.
- The working page can be divided into up to four screens to display and edit simultaneously at different parts of the dataset.
The image below is a sample Spreadsheet Look screen shot showing its capability of simultaneously showing and editing up to four different regions of a dataset.
Data Format
- Define font color, size, and family.
- Set alignment: Center, Left, Right, decimal place.
- Set number format at user-defined decimal places (AccountingForm, NumberForm)
The next image shows formatted cell contents.
Note that data formats will not be visible while editing.
Page Format
- Define number of columns and rows to display
- Define cell height and width
- Split page in up to four independent subsections
The image below shows four sections of the data with two frozen columns. Note that all data displayed can be edited at any time. The red buttons indicate that the window has been 'captured' so it won't be affected by scrolling.
Output
- Save .sls datasets for easy access within the interface. SpreadsheetLook sheets (sls) format is created for easily saving and retrieving working datasets, formats and scenarios. Sls files are simply text files which contain all data and all user-defined spreadsheet attributes.
- Save, and export to many data formats like xls, csv, and tsv among other.
- Print any user-defined area or subset of data
Once datasets are defined the Open option in the File menu allows bringing the data into the application.
Note that the saving icon has a red background. This means that the working data has not been saved. After saving it will go back to normal, however, if data is edited, then it will come back to red again.
The Send To tab is used to extract subsets of the working data. The Share option in the Data tab also provides ways of sharing data outside the spreadsheet.
Dataset Modification
The data menu provides ways to change the contents of the dataset loaded and also some basic exploratory data analysis. Data modification is made in the order specified by the numbers on the tab names shown below.
-
Crop dataset to clean from empty cells or other undesired elements
-
Transform columns. For example, when importing data from spreadsheets sometimes numbers are imported as strings. In this case ToExpression can be used to transform data columns.
-
Calculate new columns. Defining new column functions is very simple as functions are defined using other column names as input.
-
Sort by one of more columns
-
Filter: Using a GUI or user defined query, filter data using
Mathematica commands like Greater, Less, Equal, SameQ, StringQ, NumberQ, OddQ, IntegerQ, among other.
-
Sample: Take samples from data: from top, from bottom or random
The image below shows a query defined in data.
File options like the one shown below will always ask for the target dataset of interest: Unmodified Data, Modified Data, or Current View.
-
Unmodified Data is data prior to the modifications made in the 'Data' tab. Editing cell values manually takes effect directly on the Unmodified Data.
- The dataset resulting form the defined modifications is the
Modified Data.
-
Current View refers to the small dataset which is visible in the screen.