BEST Viewpoints Tutorial
Introduction
BEST Viewpoints is a powerful, flexible, and user-friendly data analysis application that brings to Mathematica many of the data manipulation and analysis capabilities found in spreadsheets and database languages. Spreadsheet enthusiasts will find in BEST Viewpoints a good complement that simplifies their everyday data mining activities. Among other features, the product simplifies the creation and expands the scope of pivot table analysis, and performs analyses such as statistical control charts, box plots, basic text mining, and market basket analysis. Additionally, users have the full library of Mathematica functions available to symbolically represent and perform spreadsheet-like calculations on large datasets of numeric or string data types. Finally, to enable portability, hundreds of data formats are available for importing data and exporting results. In particular, a graphical user interface is provided to read data from databases without the need to type SQL scripts.
The software is designed for data importing, manipulation, and analysis by means of:
1. A data importing interface for uploading data or joining datasets from many data sources: databases, spreadsheets, text files, user-defined variables, a Mathematica script, and the outputs of the Basic and Advanced Spreadsheets (described later). When importing from databases, the user can create and continuously update a selectable and editable list of SQL commands. When importing from spreadsheets, users can preview any of the spreadsheets in the workbook and define the region of interest to be imported. Importing data in cross-tab format is also possible.
2. A spreadsheet-like interface for data transformation, visualization, and manipulation. In this module the user can calculate new fields, transform existing fields, summarize data, query data, and sort by one or more columns. Defined procedures, including sorts, queries, and symbolically defined equations, can be saved as Analysis Templates for future use with different datasets.
3. Three applications for analytics: Hierarchical Viewpoints, Exploratory Viewpoints, and Relational Viewpoints. These modules are designed to enable the visualization of the information contained in one or more data columns (or fields), grouped by one or more categorical fields.
In the Hierarchical Viewpoints module, data summaries can be represented as Pivot Tables or Cross Tabs in any of the following formats: Bar Charts, Pie Charts, Pareto Plots, Box Plots, Confidence Intervals, and Line Plots.
The Exploratory Viewpoints module facilitates the creation of Histograms, Scatter Plots, Bubble Charts, Statistical Control Charts, Capability Analysis, Hypothesis Tests (including goodness of fit tests), Box Plots, Distribution Charts, 3D Plots, Line Plots, Quantile Plots, and Probability Plots, among others.
Finally, the Relational Viewpoints module is for determining and plotting associations found in one or more analysis variables. This way of analyzing and portraying the information in data is sometimes referred to as Market Basket Analysis or Association Analysis.
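To make the idea of association analysis concrete, the following minimal sketch computes the two most common measures, support and confidence, using only built-in Mathematica functions. The basket data and the helper names support and confidence are hypothetical; they are not part of BEST Viewpoints or MXL Plus.

```mathematica
(* Hypothetical transactions; each inner list is one basket *)
baskets = {
   {"bread", "milk"},
   {"bread", "butter"},
   {"bread", "milk", "butter"},
   {"milk"}
   };

(* Support: fraction of baskets that contain every item in items *)
support[items_List] :=
  Count[baskets, b_ /; SubsetQ[b, items]]/Length[baskets];

(* Confidence of the rule lhs -> rhs:
   support of both sides together divided by support of lhs *)
confidence[lhs_List, rhs_List] :=
  support[Union[lhs, rhs]]/support[lhs];

support[{"bread", "milk"}]        (* 1/2 *)
confidence[{"bread"}, {"milk"}]   (* 2/3 *)
```

Here two of the four baskets contain both bread and milk, and two of the three baskets with bread also contain milk.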
Additionally, a well-documented library of functions for data processing and analysis is provided for Mathematica users. This Mathematica package, named MXL Plus (data Miners eXtensible Library), is intended for programmers who want not only to analyze data with this program but also to create their own programs using the functions that support the BEST Viewpoints user interface.
Startup
After installing BEST Viewpoints, the commands Needs["BESTViewpoints"] and BESTViewpoints[] are used to start the application and create the directory BESTViewpoints in your $HomeDirectory. This directory is used to store datasets and other information generated while using BEST Viewpoints. Other directories inside BESTViewpoints will also be created. Thus, administrator privileges on the computer running BEST Viewpoints may be needed, at least the first time the program is run.
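As a sketch, the startup sequence and a check for the working directory might look as follows. The two commands are written exactly as named in this tutorial; Mathematica packages are normally loaded with a context string ending in a backtick, so the first line may need that form in your installation (an assumption, not confirmed here). The variable name bestDir is only illustrative.

```mathematica
(* Startup commands as named in the text; they require BEST Viewpoints
   to be installed. The context string may need a trailing backtick. *)
Needs["BESTViewpoints"]
BESTViewpoints[]

(* Expected location of the working directory described above *)
bestDir = FileNameJoin[{$HomeDirectory, "BESTViewpoints"}];

(* True once the directory has been created by the first run *)
DirectoryQ[bestDir]
```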
The following sections introduce the use of BEST Viewpoints for data importing, manipulation, and analysis. Note that BEST Viewpoints is a user interface built on top of MXL Plus, a library of functions created to simplify the process of analyzing data within Mathematica.
Main Interface
While using BEST Viewpoints, the user can print, save, extract, or send by email the results displayed on the screen. These options are available in the File menu, which can also be used to exit the program. For further details see the BEST Viewpoints Options tutorial.
When the program is started, the main interface looks as shown below. The application loads a default dataset which can be used for testing purposes. Note that by default the Analysis option (on top) is selected. The analysis session is usually started by selecting at least one data "Field" or column name for analysis, followed by at least one Group By or Categorical field.
The image below shows the results of selecting "Sales" as the analysis field and the categorical fields "Country" followed by "Product". The total sales were calculated for each Country, then the information for each Country was grouped by Product and the results were displayed graphically. Note that the Image Size width has also been reset to 9 (inches). This is just an introductory example to show how data can be easily summarized using Hierarchical Viewpoints.
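The two-level summary described above can be reproduced outside the interface with built-in Mathematica functions. The sketch below is not the code BEST Viewpoints runs internally; the rows are hypothetical and only the field names follow the example.

```mathematica
(* Hypothetical rows; field names follow the example in the text *)
sales = {
   <|"Country" -> "USA", "Product" -> "Ink", "Sales" -> 120|>,
   <|"Country" -> "USA", "Product" -> "Paper", "Sales" -> 80|>,
   <|"Country" -> "USA", "Product" -> "Ink", "Sales" -> 30|>,
   <|"Country" -> "Canada", "Product" -> "Ink", "Sales" -> 50|>,
   <|"Country" -> "Canada", "Product" -> "Paper", "Sales" -> 70|>
   };

(* Level 1: total Sales per Country *)
byCountry = GroupBy[sales, (#Country &) -> (#Sales &), Total];

(* Level 2: within each Country, total Sales per Product *)
byCountryProduct = Map[
   Function[rows, GroupBy[rows, (#Product &) -> (#Sales &), Total]],
   GroupBy[sales, #Country &]];

(* One bar per Country, labeled with the Country name *)
BarChart[Values[byCountry], ChartLabels -> Keys[byCountry]]
```

With these rows, byCountry holds 230 for USA and 120 for Canada, and byCountryProduct["USA"] splits the 230 into 150 for Ink and 80 for Paper.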
Before discussing in detail how data analysis is performed, the data importing interface is described briefly.
Data
BEST Viewpoints imports data from spreadsheets (xls), databases (SQLServer, Oracle, MySQL, etc.), delimited files (csv, txt, tsv, etc.), and other formats supported by Mathematica. A detailed explanation of how to import data is provided in the BEST Viewpoints Data tutorial.
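For comparison, this is roughly what importing a delimited file looks like with the built-in Import function that underlies Mathematica's format support. It is a minimal sketch rather than the importing interface itself; the file name and the variable names are hypothetical.

```mathematica
(* Write a tiny CSV to a temporary file so the example is self-contained *)
file = FileNameJoin[{$TemporaryDirectory, "viewpoints_demo.csv"}];
Export[file, {{"Country", "Sales"}, {"USA", 120}, {"Canada", 50}}, "CSV"];

(* Read it back; the first row holds the field names *)
raw = Import[file, "CSV"];
header = First[raw];
rows = Rest[raw];

(* One association per record, keyed by field name *)
records = AssociationThread[header, #] & /@ rows;
```

Each element of records can then be addressed by field name, for example records[[1]]["Sales"].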
The image below shows the main data importing interface which is always initiated with the dataset in the Data Viewer. The data imported can be browsed with this viewer which also has some file processing capabilities.
The contents of the data viewer are a mirror image of the data loaded in memory; thus, although this data can be edited or sorted, the changes remain in the viewer only, unless the user selects the option File: Send to Viewpoints, which uploads the edited data in the viewer as a new dataset.
This viewer can Compress data such that large text lines or images are displayed as a representative object, like Ink Cartri... for Ink Cartridge. This can be very useful when the contents of the data cells are lists, graphics, or any other large data container.
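The effect of compressing a text cell can be imitated with a one-line helper. This is only an illustration of the display rule suggested by the Ink Cartridge example; the function name compressCell and the default length of 10 characters are assumptions, not the viewer's actual settings.

```mathematica
(* Hypothetical helper: shorten strings longer than n characters
   and mark the cut with three dots; shorter strings are unchanged *)
compressCell[s_String, n_Integer : 10] :=
  If[StringLength[s] > n, StringTake[s, n] <> "...", s];

compressCell["Ink Cartridge"]   (* "Ink Cartri..." *)
compressCell["Paper"]           (* "Paper" *)
```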
The initial setup is for uploading a data file (Source: File). Note that there is a blue rectangle wrapping the options for importing a data File. When changing to another data source, such as Database, this rectangle will contain the importing options for the selected data source.
For any data source, the data is not imported until the Load Data button is pressed. This may be confusing because the Database importing options always display a sample of the data that would be imported by the defined query; however, the importing process is formally finished only when the Load Data button is pressed.
Spreadsheets
The BEST Viewpoints spreadsheet-like interfaces give users the capability of performing many operations, such as data transformation, calculation of new fields and summaries, selecting or deselecting fields, sorting data by one or more fields or columns, and querying datasets. A detailed explanation of the use of spreadsheets is available in the BEST Viewpoints Spreadsheets tutorial. Following is a brief description of these useful tools.
After defining any combination of these operations (Transform, Calculate, Select, Sort, Query), the user can save a Template in which all transformations, calculations, queries, etc. are stored for future use with similar datasets.
All operations are always executed in the order shown (i.e. Transform, Calculate, Select, Sort, and Query), and should also be defined in this order.
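The fixed order of operations can be pictured as a small pipeline. The sketch below mimics the five stages with built-in Mathematica functions on hypothetical data; it does not use the spreadsheet interface, and all variable names are illustrative.

```mathematica
(* Hypothetical records *)
data = {
   <|"Rep" -> "Ann", "Units" -> 3, "Price" -> 19.99|>,
   <|"Rep" -> "Bob", "Units" -> 2, "Price" -> 9.49|>,
   <|"Rep" -> "Cy", "Units" -> 5, "Price" -> 14.8|>
   };

(* 1. Transform: round an existing field *)
transformed = Map[<|#, "Price" -> Round[#Price]|> &, data];

(* 2. Calculate: add a new field from existing ones *)
calculated = Map[<|#, "Revenue" -> #Units*#Price|> &, transformed];

(* 3. Select: keep only the fields of interest *)
selected = Map[KeyTake[#, {"Rep", "Revenue"}] &, calculated];

(* 4. Sort: descending by Revenue *)
sorted = SortBy[selected, -#Revenue &];

(* 5. Query: keep the rows that meet a condition *)
queried = Select[sorted, #Revenue > 50 &];
```

Because Calculate runs after Transform, Revenue is computed from the rounded prices (60, 18, and 75), and the final query keeps Cy and Ann in that order.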
In general, the two spreadsheet-like interfaces are similar, but the Advanced Spreadsheet can operate on the output of the Basic Spreadsheet and can also group calculations by levels of one or more categorical variables. After data is modified, it can be selected for analysis from any of the analysis modules.
The basic appearance of the spreadsheets is shown above. In this interface the Calc button is used to enable or disable automatic analysis. It is recommended to leave automatic analysis on unless the dataset is so large that the analysis takes too long, in which case the user may prefer to define the spreadsheet commands and setup first and then evaluate them manually.
The Summaries button is useful for displaying the calculated summaries only. This is particularly convenient when displaying grouped data summaries in the Advanced Spreadsheet. The options to the right of Templates are used to fix some characteristics of the Data Viewer. For example, the Compress button can be used to view decompressed data. The other buttons are used to define the size of the viewer (10X10) and the maximum number of records displayed (10,000). Note that the data viewer does not need to contain all the data loaded in memory; the viewer is there to help the user understand or evaluate how the dataset is being modified by the actions taken in the user interface.
The Assess button is used to create an assessment of the data types and some basic analysis of the data. This is not recommended for large datasets (e.g. more than two hundred thousand records) because it may take longer than the user is usually willing to wait.
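A rough idea of what a data type assessment involves is given by the following sketch, which inspects each column of a small hypothetical dataset with built-in functions. The summary fields chosen here (Types, Count, Distinct) are assumptions for illustration and may differ from what the Assess button reports.

```mathematica
(* Hypothetical records *)
data = {
   <|"Country" -> "USA", "Sales" -> 120, "Margin" -> 0.25|>,
   <|"Country" -> "Canada", "Sales" -> 50, "Margin" -> 0.4|>,
   <|"Country" -> "USA", "Sales" -> 75, "Margin" -> 0.1|>
   };

(* Collect each field's values into one list per column *)
columns = Merge[data, Identity];

(* Per column: heads (types) seen, record count, number of distinct values *)
assessment = Map[
   Function[col,
    <|"Types" -> Union[Head /@ col],
      "Count" -> Length[col],
      "Distinct" -> Length[Union[col]]|>],
   columns];
```

Every value is visited once per column, which is consistent with the warning that an assessment becomes slow on very large datasets.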
Data Analysis via Viewpoints Modules
Viewpoints are a series of analysis modules which provide interfaces to apply analytics techniques to data. Some of the analyses provided are Pareto Plots, Cross Tabs, Box and Whisker Plots, Bubble Plots, Tooltip Plots, Scatter Plots, Confidence Intervals for the mean or standard deviation, Hypothesis Testing, Line Plots, and Histograms. Relational analysis for categorical variables is also provided.
One common and key element of these modules (Hierarchical Viewpoints, Exploratory Viewpoints, and Relational Viewpoints) is that the input data for analysis can be the originally loaded dataset, the output of the Basic Spreadsheet, the output of the Advanced Spreadsheet, or the Summaries from the Advanced Spreadsheet. This creates an extremely flexible data manipulation and analysis environment where analysts can generate many different types of analyses in parallel.
Main Analysis Interface
The basic layout of the Analysis interface is shown below. Usually analysis is started by selecting one or more Analysis fields (e.g. Sales); then the user may Group By one or more categorical fields (e.g. CountryID, Product) by just selecting the field names from the menu. The order in which analysis fields and categories are selected is used to create the selected analysis type (Chart in this case). This order can be changed in several places: in the plot, when the mouse is over the field label, or in the menu interface, where the selected field names are displayed (e.g. Country or Sales).
In the upper left corner, the Calc button is used to enable or disable the automatic analysis option. Automatic calculations are usually acceptable in speed. With extremely large datasets the user may prefer to deselect the automatic analyzer and evaluate manually instead.
The Menu button can be used to hide most of the menu items such that only the output is displayed. This option is usually a good complement to the Play option (inside the Data tab), which allows the user to dynamically modify the source data analyzed or the category values for any of the selected categories.
The Templates button is one of the most attractive features of BEST Viewpoints, as it enables recording the complete setup the user has defined to create the plot displayed. This setup can be loaded later to recreate the same analysis. This can be done with any dataset containing the same column headers and corresponding data types as those in the dataset where the template was created.
The Image Size and Aspect Ratio of the image displayed can be dynamically controlled by means of the interface tools with these names. Note that the Image Size label is a menu which also contains Font Size, Pie Size, and Arc Thickness.
The Output menu is used to control the type of output. In some cases the user may want to see the results of the analysis as a plot (e.g. Leaves) or as a table of values (e.g. Leaves Data). As an example, if the user selects Tree (instead of Leaves) in the Output menu, then the Pie Size and Arc Thickness can be used to modify the output plot.
The user may also control the Source of input Data for analysis. For example, if the Basic Spreadsheet has been used to calculate a new field, then the user may want to set the output of the Basic Spreadsheet as the target data for analysis. Additionally, a Sample from the data can be analyzed instead of the full dataset. This may be useful for large datasets or for statisticians who want to explore different analyses by sampling from data.
Once the Analysis and Group By fields are selected, the Build menu provides many dynamically available options for modifying the analysis in progress. Most of the options in this menu are designed to be easy to understand; thus, only some of them are discussed further in the documentation, where considered necessary. The Mathematica documentation for the selected type of analysis is fundamental to understanding the Build menu. For example, "Element Function" is an option for many types of charts. Thus, the user is expected to consult the Mathematica documentation to complement this documentation.
The Categories menu is used for selecting which values of the selected categorical fields should be included in the analysis. The Categories menu in the Hierarchical Viewpoints module provides particular options that will be discussed later. Just as an example, the image below shows the top three salespeople in each Month after excluding Allison and Frank from the analysis.
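The same kind of result (top three salespeople per Month after excluding two names) can be sketched with built-in Mathematica functions. The figures and the names other than Allison and Frank are hypothetical, and this is not how the Categories menu is implemented.

```mathematica
(* Hypothetical data: {Month, Salesperson, Sales} *)
raw = {
   {"Jan", "Allison", 900}, {"Jan", "Frank", 800}, {"Jan", "Gina", 500},
   {"Jan", "Hugo", 300}, {"Jan", "Ivy", 400}, {"Jan", "Jack", 100},
   {"Feb", "Allison", 100}, {"Feb", "Gina", 200}, {"Feb", "Hugo", 600},
   {"Feb", "Ivy", 50}, {"Feb", "Jack", 700}
   };
records = AssociationThread[{"Month", "Salesperson", "Sales"}, #] & /@ raw;

(* Exclude selected category values before summarizing *)
excluded = {"Allison", "Frank"};
kept = Select[records, ! MemberQ[excluded, #Salesperson] &];

(* Total Sales per Salesperson within each Month *)
totals = Map[
   Function[rows, GroupBy[rows, (#Salesperson &) -> (#Sales &), Total]],
   GroupBy[kept, #Month &]];

(* Keep the three largest totals in each Month, largest first *)
topThree = Map[Take[Reverse[Sort[#]], UpTo[3]] &, totals];
```

With this data, topThree["Jan"] lists Gina, Ivy, and Hugo, and topThree["Feb"] lists Jack, Hugo, and Gina.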
The Hierarchy Level is used to fold the analysis created in the Root level by means of Cross Tabs or Multi Dimensional Cross Tabs.
Once the analysis is finished, the File menu can be used to Save or Print results (see image below). The Send To Notebook option can also be used to take the current results to a new window for further manipulation using Mathematica.
Redefining Categorical Fields
The By Category button (see image below) is used in the Hierarchical Viewpoints and Exploratory Viewpoints modules to redefine the fields (or columns) that will be used as categorical fields. The user simply selects those fields that are going to be used for grouping and forming the hierarchies of analysis. Note that the G+ and G- buttons are also used to move fields from the upper set of Analysis fields to the Group By set of fields.
Input Data Source
In general, the Basic Spreadsheet and the Advanced Spreadsheet are used to modify data prior to analyzing it. For example, one column may be transformed by applying the function Round. To simplify access to the data currently modified in the spreadsheets, the user may at any point switch the source dataset for analysis by means of the Data + Pick + Source Tab (see image below).
Additionally, the Data + Sample Tab can be used to take random samples from data. This may be useful to explore large datasets or to make statistical inferences based on random samples taken from data. In the image above only 50% of the input data is being analyzed.
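Taking a 50% random sample can be sketched with the built-in RandomSample function, which samples without replacement. The data, the seed, and the variable names below are illustrative, and the interface may use a different sampling scheme.

```mathematica
(* Stand-in for ten records *)
data = Range[10];

(* Fixed seed so the same sample is drawn on every run *)
SeedRandom[1234];

(* Draw half of the records without replacement *)
fraction = 0.5;
sample = RandomSample[data, Ceiling[fraction*Length[data]]];
```

Removing the SeedRandom call gives a different sample on each evaluation, which is what is usually wanted when exploring data.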
Mathematica Scripts
Advanced Mathematica users can get scripts for recreating some of the analyses done by the interface. At any point in any analysis session, the Main Menu option View + Script displays an updated script for the current working session.
Analysis Templates
One of the most attractive features of this application is the possibility of saving Analysis Templates at any point. These templates work as bookmarks that the user may use to repeat, or come back to, exactly the state of analysis defined in any analysis session. These templates are available for Spreadsheets as well as for the Viewpoints tabs (Hierarchical, Exploratory, and Relational).
The image below shows the Templates Manager window, which is activated using the Templates button. By pressing the Load button, the template shown in the image can be loaded to perform the same analysis. Note that a template will replicate the desired analysis only if the loaded dataset has the data columns or fields used to create the plot. Additionally, the data types must match. That is, not only must the data column exist, but it must also contain the same type of information (e.g. numbers in the Sales example below).
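The compatibility condition described above (each required column exists and holds the expected type of data) can be expressed as a simple check. This is a hypothetical sketch: the function templateFitsQ and the way requirements are written as patterns are assumptions for illustration, not the Templates Manager's own mechanism.

```mathematica
(* Hypothetical template requirements:
   field name -> pattern that every value in that field must match *)
required = <|"Country" -> _String, "Sales" -> _?NumericQ|>;

(* True when every row has each required field
   and its value matches the expected pattern *)
templateFitsQ[req_Association, rows_List] :=
  AllTrue[rows, Function[row,
    AllTrue[Keys[req],
     KeyExistsQ[row, #] && MatchQ[row[#], req[#]] &]]];

good = {<|"Country" -> "USA", "Sales" -> 120|>};
bad = {<|"Country" -> "USA", "Sales" -> "n/a"|>};

templateFitsQ[required, good]   (* True *)
templateFitsQ[required, bad]    (* False *)
```

The second dataset fails because its Sales column exists but contains text instead of numbers.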
Note that templates can be organized by projects to simplify their use. It is recommended to define projects associated with the data being analyzed, both to simplify search and use and to make sure that templates will match the currently loaded dataset.
More About
For information about the developers and updated information on this product visit Pronto Analytics Inc. web site: