BEST Viewpoints Data
Data
Data importing, viewing and dataset assessment
The first option to consider is selecting the target data source for analysis. For this purpose there are several ways in which data can be retrieved into the software. Once the target dataset is defined click on Load Data to complete the data importing or reading process. Then, proceed to View the loaded data.
By default it is assumed that the loaded dataset is rectangular with string headers on the first row. This is a fundamental assumption since the way BEST Viewpoints manages data is by making reference to each field name, so when a dataset without string-headers is imported the option Add Headers should be selected to automatically add labels to data.
Another important assumption is that there are no repeated data labels. To ensure that this assumption is met, the option Fix Headers is always selected to rename repeated headers. However, it remains as an option because in some cases (e.g. Relational Viewpoints) the user might be interested in analyzing non-labeled datasets.
When importing datasets with a known extension BEST Viewpoints will automatically import those files. If the file extension is txt then the dataset will be read as is. Advanced users might want to use the optional Script checkbox to understand better how data will be imported.
Details on how to use the option Database to read data will be discussed in the section Joining Datasets. BEST Viewpoints Examples is a list of preloaded or randomly generated datasets which can be used to test the application in different scenarios. User Preset Variable is an advanced option for
Mathematica users which allows to bring into BEST Viewpoints any dataset associated to a user-defined variable. Similarly
Mathematica Command allows advanced users to write their own data importing script. The option Data Entry allows the user to create a new dataset by editing each record and field. This will be discussed in the section Data Entry.
Once the data is loaded the user can view the complete content of the dataset by using the interface shown above. This interface provides many options that for editing, and formatting the output as well as to save the displayed content in another file.
A basic per-field assessment of the dataset composition is obtained in the Assess tab as shown below. This information is obtained as part of the data loading process.
In the Advanced tab a more detailed data assessment can be obtained as shown below. Note that selecting one or more Group-by fields is optional. To get results click on the BEST Viewpoints Manual Analyzer button (↵) as described below.
The results obtained are shown above. When a field is numeric some basic statistics and plots (Histogram and Line) are created. For other data types only the information about data types and frequencies is provided.
Note that Date and MonthNo are now shown in the image above. That is because by default only five fields are shown but other fields can be displayed by using the GUI objects in the tabs.
Reading from databases
Reading from databases is really simple. Initially click on Connection... to define or select a DB connection. If you don't have a connection defined and you are not related to databases you might want to ask for systems experts to help you defining the connection the first time.
After the Connection button is pressed you should see a window similar to the one below. From here you select the desired connection and the interface will be populated with one Table from the selected connection.
Once selected queries can be created by selecting the desired table, fields and conditions as shown below. Note that the yellow highlight is used to emphasize that the data shown is just a preview of the data in the table selected (Authors in this case), and that the data viewer (at the bottom) is turned off to avoid displaying both: the DB preview and the data already loaded.
The screen-shot below shows a query of two conditions. Note that the logical operator for these three conditions is Or. The data shown is just for viewing purposes and only a small fraction of the results of the defined query will be shown in this viewer. Once the query is ready the data in loaded into BEST Viewpoints for analysis by means of the Load Data button.
If desired the query being created can be viewed or modified by selecting 'Script' at the left. Note that the query created is a
Mathematica command, so it can be copied and saved in the 'Mathematica Command' data importing tab for future use.
By default queries are created using
Mathematica Query Language or M-SQL. If the user prefers to type SQL commands a click on the M-SQL toggler will switch to SQL. Note that the SQL query can be saved for future use.
Once the query is defined the Load Data button is pressed to upload the dataset for analysis.
Note that the data Viewer needs to be turned on to see the data.
Reading from spreadsheets
In general it is assumed that any dataset imported is already a string-labeled rectangular array of data. However, provided that data stored in spreadsheets may be formatted in other ways some solutions are provided for special types of spreadsheet data. A sample spreadsheet dataset (SampleXLSData.xls) is provided for testing and should be available for selection in the file menu. As shown in the image below when a workbook like this is selected for reading the user has several options to specify how to import spreadsheet data.
The file SampleXLSData has five spreadsheets: Rectangular, RawSpreadsheet, 2D Cross Tabs, MD Cross Tabs, and Labeled MD Cross Tabs. Each of these spreadsheets contain a specially formatted dataset. The user may want to import each of these spreadsheets under the corresponding 'Data Format'. The options for Data Format are: Rectangular, Unformatted, and Crosstabs. For example, in the image above the Rectangular spreadsheet was imported using the Rectangular Data Format. In the example below the spreadsheet RawSpreadsheet is imported under the Rectangular format.
Note that the result is a dataset for which the labels are not in the first row and column of the spreadsheet. In fact BEST Viewpoints is assigning the labels _1, _2, etc but these labels are not good for analysis purposes if Label 1, Label 2 are the real data labels. To solve this problem the RawSpreadsheet data format should be used. The results are shown below. Note that even the first empty column of data was removed. So the Unformatted data format solve some poor formatting scenarios in spreadsheets.
Using a similar approach cross-tab types of formats can also be imported. The dataset shown below is an example of a common way to format data in spreadsheets. In this case the intention is to label values using more than one field (or variable name). In this case 'Label 1 - A1' represent the first of many groups in the data. This is what we are naming cross tabs because the real data labels are crossed in the spreadsheet tab.
In other words the values 1. and 6. are under the group 'Label 1 - A1', and the values 15., 20. are under the group 'Label 5, A2'. To make this dataset properly formatted it should be imported using the CrossTab format and setting All Dimensions to False as shown below. Note that the contents of the first cell (LabelA, NumLabel2, and Variable) were used to name the new fields. The resulting dataset is one canonically formatted for BEST Viewpoints use.
Other combinations of data formats and parameters can be used to provide the results desired. The user is encouraged to try parameters for the multi-dimensional cross tabs (MD Cross Tabs).
Joining Datasets
In many cases analysts need to combine datasets for analysis. The option Join Datasets provides the capability of combining datasets from different sources. In other words, any two (or more) datasets that can be imported into BEST Viewpoints can be joined together into a single file.
The second dataset will be taken from Sales Details table with a query to limit SONUM to values below 10 as shown in the image below.
Once two datasets have been uploaded (using Load Data) the user can define the linking fields and the type of merge to tab used. Inner stands for InnerJoin, Outer for OuterJoin, Join Records provides the capability of merging two datasets which have equal field names while Join Fields joins files with distinct field names (one beside the other). In this case the file with fewer records will be used to trim the other files to be merged. Thus only in this case, to ensure that the result of the Join is a rectangular array of data some data might be lost.
Once the setup is made, click on Join to create the new dataset.
Once the data is joined the user may want to use the new dataset for analysis. This is accomplished by exiting the Join Dataset menu and going back to Main Dataset. From there the file just created can be uploaded from the Joined Dataset as shown below. After pressing the Load Data the file is loaded and set ready for analysis. Of course, this dataset can be exported in other formats using the File menu.