BEST Viewpoints: Exploratory Viewpoints
Exploratory Viewpoints
This module provides several exploratory data analysis methods for visualizing basic patterns and information in data. Line plots (control charts, date list plots, etc.), histograms, box plots, scatter plots, bubble charts, confidence intervals, and hypothesis tests are easily set up for data exploration.
This module is similar in appearance to Hierarchical Viewpoints, and many options are exactly the same in both modules. Those shared options are explained only once, in the documentation for Hierarchical Viewpoints, so it is recommended to study that documentation to understand this module.
Introduction
Once data is loaded, analysis fields are ready for selection as shown below. The options in the Explorer menu select the type of analysis or chart to apply to the selected data fields. In the next example the field Sales was selected as the target analysis field, and a Combo of charts was generated because of the selection in the Explorer menu.
The selection made in the Explorer menu, together with other options selected in the Build menu, may change the menu to its right. For example, this menu is initially set to Plot Type, which is the type of plot used for the line plots in the Combo (explained in the section named Combo). When the Build menu is set to Histogram, however, the second menu becomes the Preset menu (see image below), which simplifies the selection of options for the Histogram.
If the Lines menu is selected in the Build menu, Plot Type becomes available again and the type of Line Plot to use can be selected as shown below.
This dynamic behavior of menus and options is intended to make the application as user friendly as possible by showing only the options that are useful for the selections made by the user. Finally, the options in the menus are designed to be self-explanatory when the Mathematica options for the selected type of chart are known and understood. This document therefore does not describe the effect of each option on the selected type of chart; the user should read the pertinent Mathematica documentation to understand them.
Total Groups Limit
BEST Viewpoints can process virtually any combination of analysis fields (categorical and non-categorical). This is a very powerful and convenient feature of the program; however, it may allow the user to request more information than needed. To avoid this situation and to use processing time efficiently, the program asks the user to confirm when the current setup results in too many groups of data for analysis. The image below shows the warning message generated when the current setup exceeds the Total Groups Limit. The user may continue and complete the calculations, or stop the program and make the necessary changes to get more meaningful results.
When the user decides to stop the evaluation of the current setup, the program provides hints on how to reduce its complexity (see image below). In general, a simpler setup can be defined by reducing the number of categorical fields or the number of category values selected for each category. The user may also increase the Total Groups Limit in the Build menu to avoid the warning message.
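The effect of categorical fields on the number of groups can be illustrated with a short Python sketch. It is not the program's own logic: it assumes that one group is formed per analysis field and per combination of selected category values, and the function names, the product labels, and the limit of 20 are hypothetical.

```python
from math import prod


def total_groups(n_fields, category_levels):
    """Count groups, assuming one per analysis field per combination of
    selected category values (an assumption, not the documented rule)."""
    return n_fields * prod(len(levels) for levels in category_levels.values())


def exceeds_limit(n_fields, category_levels, limit):
    """True when the setup would trigger a confirmation under this assumption."""
    return total_groups(n_fields, category_levels) > limit


# Hypothetical setup: 2 analysis fields grouped by two categorical fields.
selection = {
    "Country": ["China", "Germany", "France"],
    "Product": ["A", "B", "C", "D"],
}
print(total_groups(2, selection))        # 2 fields x 3 countries x 4 products
print(exceeds_limit(2, selection, 20))   # compared with a hypothetical limit of 20
```

Under this assumption, removing one categorical field or deselecting category values shrinks the count multiplicatively, which is why the hints point at those two changes.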
The following sections are dedicated to the options available on the Explorer menu.
Combo
Initially the application is ready to create a combo of charts. The user may select which charts to combine and can modify each one independently. When Sales is analyzed, the two default charts (Histogram and Line) are created as shown below. These charts can be modified with the options in the Build menu.
Note that the Build menu has several other tabs. The names and contents of these tabs change as options are selected, but the Setup tab is always present. This menu contains general options for all the charts. For example, note that in the image below the Lines and Histogram charts are selected (by default). The Combine sub-menu has options to modify the way the plots are arranged and displayed. These options become more important when many charts are created simultaneously.
Note that the menu on top of the charts also changes dynamically to follow the user's navigation. For example, the Plot Type menu is originally available there to change the type of plot used to create the list plot (ListLinePlot, ListPlot, DateListPlot, ProbabilityPlot, etc.).
When the Histogram menu is selected, the options change to a Preset list of definitions for creating a histogram (see image below).
Each of these presets automatically sets some options in the Build menu for the histogram (and for the line plot in some cases). For example, the result of the Smooth PDF preset is summarized in the image shown next. In this case the Smooth Histogram and Distribution options are checked. Additionally, the Function used for plotting the histogram is the PDF. Note that the distribution used is not part of the preset; the LogNormal distribution shown in the example was selected manually after the preset was applied.
The next image shows what the PDF preset creates. Note that the Distribution checkbox is now selected, and thus the menu for distributions is open. The Normal Distribution is used for testing by default, and its parameters are estimated. Other continuous distributions included in this menu are the LogNormal, Gamma, Weibull, Beta, and Exponential distributions.
Note that discrete distributions can also be used for the same purposes: Bernoulli, Binomial, Negative Binomial, Geometric, Hypergeometric, Beta Binomial, Beta Negative Binomial, and Poisson. The image below shows a goodness of fit test for the Binomial distribution with estimated parameter p and user-provided parameter n.
Adding the Goodness of Fit Test option to perform a Kolmogorov-Smirnov test and selecting the Normal Distribution results in the image below. Note that the test was performed and that its P-Value, statistic, and sample size are displayed as the plot label.
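For readers who want to see what such a plot label summarizes, the following Python sketch computes a Kolmogorov-Smirnov statistic against a Normal distribution whose parameters are estimated from the data. It is an illustration only, not the program's implementation: the sample values are hypothetical, and the P-value uses the large-sample Kolmogorov series, which is only approximate for small samples and when parameters are estimated from the same data.

```python
from math import exp, sqrt
from statistics import NormalDist, mean, stdev


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF and the hypothesized CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Large-sample Kolmogorov tail probability (an approximation)."""
    lam = sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the tail probability equals 1 to many digits here
    total = sum((-1) ** (k - 1) * exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical sample standing in for the Sales field.
sales = [1200.0, 950.0, 1430.0, 1010.0, 1675.0, 880.0, 1320.0, 1190.0]
fitted = NormalDist(mean(sales), stdev(sales))  # parameters estimated from data
d = ks_statistic(sales, fitted.cdf)
p = ks_pvalue_asymptotic(d, len(sales))
print(f"Statistic = {d:.4f}, P-Value = {p:.4f}, n = {len(sales)}")
```

A small P-value would lead to rejecting the tested distribution; the values reported by the application may differ from this approximation.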
Grouping by Country (only China and Germany selected) can be used to compare the tests for two different categorical values. Note that the options for Box Whisker Charts and Distribution Charts are displayed on the Charts menu because these two plots are now selected as part of the Combo.
Note that the parameters of the tested distribution are displayed in the plot label only when there is no mix of distributions in the plot. As an example, consider the probability plot below. Although two variables are present in the plot, both are compared against the same parameters, which result from combining both variables in the same dataset. This may not seem useful in this example; however, if the two variables are, for example, diameter1 and diameter2, and they need to be compared independently against a given set of parameters, then the example below becomes more meaningful. In this case it is also possible to test against a user-defined set of parameters, which can be entered in the Histogram menu.
Note that data can be combined in plots in different ways using the Combine menu in the Setup tab. In the image above the data is combined by Categories, but in the next example the data is combined by Fields. When combining by Fields (example below), every field has an independent set of plots. When combining by Categories (example above), every combination of category values (e.g. China) has an independent chart in which the fields (Sales and Price) are combined.
Histogram
When Histograms is selected in the Explorer menu, the Histogram tab in the Options menu provides ways to get more information from the data being analyzed. Several options for histograms were already presented in the previous section.
As a complementary example, note that in the image below the number of bins is defined manually by deselecting the Binning Method checkbox. Additionally, the histogram is compared to the Weibull Distribution with a user-defined parameter of 1.1 and an auto-estimated parameter Beta of ~2404.26. Finally, all applicable goodness of fit tests are displayed with the corresponding statistic and P-value.
Additionally, the Explore option opens a window for graphically and dynamically calculating upper or lower tail probabilities for the currently displayed distribution. The image below shows an example stating that P[Sales>4740.057] ≈ 0.1212, assuming that Sales is distributed according to a Weibull distribution with the provided or estimated parameters.
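The quoted tail probability can be checked by hand. The Python sketch below assumes the Weibull parameterization used by Mathematica's WeibullDistribution (shape Alpha, scale Beta), for which P[X > x] = exp(-(x/Beta)^Alpha). Reading the user-defined value 1.1 as Alpha and the estimated value ~2404.26 as Beta is an assumption based on the histogram example above; the function name is hypothetical.

```python
from math import exp


def weibull_upper_tail(x, alpha, beta):
    """P[X > x] for a Weibull distribution with shape alpha and scale beta."""
    if x <= 0:
        return 1.0  # the distribution has no mass at or below zero
    return exp(-((x / beta) ** alpha))


# Assumed parameters: Alpha = 1.1 (user-defined), Beta ~ 2404.26 (estimated).
p = weibull_upper_tail(4740.057, 1.1, 2404.26)
print(round(p, 4))
```

With these inputs the result agrees with the probability shown in the Explore window to four decimal places, which supports the reading of the two parameters as shape and scale.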
Note that the values shown in the Parameter input field remain fixed at the user-defined value for all populations evaluated. On the other hand, the estimated parameters shown in the input field are the last set of parameters estimated. Thus, for example, if more than one variable is being analyzed, the parameters in the field are those estimated for the last variable selected (i.e. the last variable analyzed).
Line Plots
As presented in the Combo section, several types of Line Plots can be created in the Exploratory Viewpoints module. Some complementary examples are presented in this section.
Default options for Line Plots are presented in the menu below. Max Points is set to 500 by default to avoid plotting a large number of points, which could slow down the rendering of the plot.
The default Plot Type is ListLinePlot. When DateListPlot is selected, the user must provide the Date Field, which is the data field that contains the dates (see example below). Note that once the date field is provided, the date format can be selected from a menu.
Data is expected to be sorted by the selected Date Field for this plot to produce meaningful results. Sorting can be done in the basic or advanced spreadsheet. This task is left to the analyst, rather than included in the automatic data preparation in Exploratory Viewpoints, to ensure that line plots always use the original ordering of the imported data.
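If the data is prepared outside the application before import, the same ordering can be obtained with a few lines of Python. This is only a sketch of that preparation step: the field names, the sample rows, and the date format are hypothetical.

```python
from datetime import datetime

# Hypothetical rows, not yet in chronological order.
rows = [
    {"Date": "2021-03-15", "Sales": 1200.0},
    {"Date": "2021-01-10", "Sales": 950.0},
    {"Date": "2021-02-01", "Sales": 1100.0},
]


def sort_by_date(rows, date_field, date_format="%Y-%m-%d"):
    """Return a new list ordered by the parsed date; the input is unchanged."""
    return sorted(rows, key=lambda r: datetime.strptime(r[date_field], date_format))


ordered = sort_by_date(rows, "Date")
for row in ordered:
    print(row["Date"], row["Sales"])
```

Parsing the dates, rather than sorting the raw strings, keeps the ordering correct for formats such as day/month/year where text order and time order differ.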
By selecting ProbabilityPlot in Plot Type, together with the desired Goodness of Fit Test and Confidence level, a quick distribution assessment can be made. The example below tests Sales and Price against the estimated Weibull Distribution when the parameter Alpha is set to 1.1 for both Sales and Price. Setting Goodness of Fit Output to Test Conclusion provides a written conclusion for both tests: Sales is not rejected but Price is rejected. The Estimate Parameters option causes parameters to be estimated for both fields. If the analyst provides distribution parameters, the same parameters are used for all variables tested.
The Combine sub-menu in the Setup tab is used to decide how data is combined in plots. In the example below, line plots for Sales and Price for China and Germany (Group-By field: Country) are arranged by the option Combine - Categories. That is, different category values are combined in the same chart, and a new chart is created for each analysis field (Sales, Price).
By selecting Combine - Fields, the analysis fields (Sales and Price) are combined in the same chart, and a new chart is created for each of the category values of the Group-By field (Country).
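The two arrangements described in this section for line plots can be summarized with a small Python sketch that maps each chart to the series drawn in it. The function name and the dictionary layout are hypothetical; only the grouping rule is taken from the two paragraphs above.

```python
from itertools import product


def chart_layout(fields, categories, combine):
    """Map each chart title to the series it contains.

    combine="Categories": one chart per field, category values share a chart.
    combine="Fields":     one chart per category value, fields share a chart.
    """
    charts = {}
    for field, category in product(fields, categories):
        if combine == "Categories":
            title, series = field, category
        elif combine == "Fields":
            title, series = category, field
        else:
            raise ValueError(f"unknown combine option: {combine}")
        charts.setdefault(title, []).append(series)
    return charts


fields = ["Sales", "Price"]
countries = ["China", "Germany"]
print(chart_layout(fields, countries, "Categories"))
print(chart_layout(fields, countries, "Fields"))
```

In both cases the same four series are plotted; the option only decides which of them share a set of axes.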
Statistics
Selecting Statistics in the Explorer menu allows for the creation of distribution charts and box plots in a different format than that shown in the Combo menu. Additionally, mean confidence intervals (Mean CI), standard deviation confidence intervals (Std Dev CI), and hypothesis tests (equal means and variance ratio tests) can be created.
The mean confidence intervals (Mean CI) in the image below are displayed graphically and as tooltips for each plot. Note that the confidence level can be changed in the Statistics menu.
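As a reminder of how a mean confidence interval responds to the confidence level, here is a minimal Python sketch. It uses a Normal quantile for simplicity, whereas small samples are usually handled with a Student's t quantile, so it should not be read as the formula used by the application. The sample values and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(data, confidence=0.95):
    """Two-sided interval for the mean using a Normal approximation."""
    n = len(data)
    center = mean(data)
    std_error = stdev(data) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return center - z * std_error, center + z * std_error


# Hypothetical sample standing in for Sales in one country.
sample = [10.0, 12.0, 14.0, 16.0, 18.0]
low95, high95 = mean_ci(sample, 0.95)
low99, high99 = mean_ci(sample, 0.99)
print(f"95%: ({low95:.3f}, {high95:.3f})")
print(f"99%: ({low99:.3f}, {high99:.3f})")
```

Raising the confidence level widens the interval around the same sample mean, which is the change to expect in the chart when the level is edited.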
The Hypotheses option in the Output menu shows the results of hypothesis tests on the difference of means (lower left side of the matrix) and of variance ratio tests (upper right side of the matrix). The corresponding two-sided P-values are displayed for each test. Red is used to highlight significance when the P-value is smaller than 0.05.
For example, the variances of Sales in France and China are statistically different: the red P-Value of 0.028 is smaller than the 0.05 significance level used for the test.
Additional documentation for these tests can be found in the documentation for the HypothesisTesting package.
Bubble Charts
Bubble Charts can display up to five fields in a single plot. Consider the example below where Sales, Price and Quantity are analyzed by the levels of the categorical fields Product and Country. Note that the order in which fields and categories are selected determines their position in the plots.
In the menu for Bubbles some parameters can be controlled, such as the minimum and maximum bubble size and whether a tooltip is displayed for each bubble.
Markers and Tips
Markers&Tips is a line plot with a legend that identifies the categorical source of each data point. Some descriptive information for each point is displayed in the legend, and the rest is displayed as a tip when the mouse is near the marker in the plot.
Scatter Plots
Scatter Plots can also be created in several formats. The Combine menu can be used to create separate scatter plots by combining by Categories or by Fields.
Control Charts
This perspective creates statistical Control Charts as another exploratory data analysis tool. The image below shows an example of X-Bar and EWMA for X-Bar control charts with a Sample Size of 5.
In the example below the XBar and R charts are displayed so that the two charts can be compared. Other types of control charts available include EWMA for XBar, P, Np, C, nC, U, R, and S. Control chart parameters are automatically and continuously estimated. Once the parameters are estimated, the Parameters checkbox can be deselected to avoid re-estimating them.
The Control Condition option establishes whether control is assessed against two limits or one, and in what direction. For example, x<UCL states that a point is considered to be in statistical control when its value is less than UCL.
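To make the role of the limits concrete, the following Python sketch computes textbook Shewhart limits for X-Bar and R charts with subgroups of size 5 and then applies a control condition. The constants A2, D3, and D4 are the standard tabulated values for n = 5; the data, the function names, and the condition labels other than x<UCL are hypothetical, and the application's own estimation algorithm may differ.

```python
from statistics import mean

# Standard Shewhart chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114


def xbar_r_limits(groups):
    """Return (LCL, center line, UCL) for the X-Bar chart and the R chart."""
    xbars = [mean(g) for g in groups]
    ranges = [max(g) - min(g) for g in groups]
    grand_mean, mean_range = mean(xbars), mean(ranges)
    return {
        "xbar": (grand_mean - A2 * mean_range, grand_mean, grand_mean + A2 * mean_range),
        "r": (D3 * mean_range, mean_range, D4 * mean_range),
    }


def in_control(x, lcl, ucl, condition="LCL<x<UCL"):
    """Check one plotted point against one or both limits."""
    if condition == "x<UCL":
        return x < ucl
    if condition == "x>LCL":
        return x > lcl
    return lcl < x < ucl


# Hypothetical measurements already arranged in subgroups of 5.
groups = [[10, 12, 11, 13, 14], [9, 11, 10, 12, 13]]
limits = xbar_r_limits(groups)
lcl, center, ucl = limits["xbar"]
print(limits)
print(in_control(8.0, lcl, ucl, "x<UCL"), in_control(8.0, lcl, ucl))
```

The last line shows why the condition matters: a point below the lower limit still counts as in control when only x<UCL is checked.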
Data Format refers to the way that input data is interpreted. 'Subgroups' of sample size n are formed automatically for control charts other than P, Np, and U. For these three charts the data must be entered as two columns: the statistic column and the sample size column. In this case the Data Format is called 'Statistic & n'.
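The difference between the two data formats can be sketched in Python. The first function splits a single column into consecutive subgroups; the second computes textbook three-sigma limits for a P chart from two columns. The sketch assumes that the statistic column holds counts of nonconforming items and that incomplete trailing subgroups are dropped; neither detail is stated in this guide, and the function names are hypothetical.

```python
from math import sqrt


def form_subgroups(values, n):
    """Split one column into consecutive subgroups of size n.
    Assumption: an incomplete trailing subgroup is dropped."""
    return [values[i:i + n] for i in range(0, len(values) - n + 1, n)]


def p_chart_limits(counts, sizes):
    """Per-sample (LCL, center, UCL) for a P chart in 'Statistic & n' form.
    Assumption: counts are numbers of nonconforming items per sample."""
    p_bar = sum(counts) / sum(sizes)
    limits = []
    for n in sizes:
        half_width = 3.0 * sqrt(p_bar * (1.0 - p_bar) / n)
        limits.append((max(0.0, p_bar - half_width), p_bar, min(1.0, p_bar + half_width)))
    return limits


print(form_subgroups(list(range(12)), 5))
print(p_chart_limits([5, 10], [100, 100]))
```

Because the limits depend on each sample size, the sample size column is needed for these charts and cannot be inferred from a single fixed subgroup size.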
Max Proportion sets the maximum proportion of samples that the parameter estimation algorithm can discard when estimating parameters. Additional information about the control charts can be found in the SPC section of the MXLPlus Guide.
Capability Analysis
Capability Analysis is a method used to understand how capable a process is of meeting a given set of design specifications. In the image below values have been assigned to the Lower Specification Limit (LSL), Target, and Upper Specification Limit (USL). Additionally, the Mean and Standard Deviation parameters are assigned values for testing purposes. By default the analysis assumes that the real parameters are those estimated from data (dashed distribution). This can be seen in the Capability menu, where the Analyze option is set to Statistical. In this case the estimated process capability is Cp=0.43. The conclusion would be that the process is not capable of meeting design specifications if the parameters are those estimated from data.
However, as shown below, the Parametric distribution can be used to estimate the capability of the process, which then becomes Cp=1.41. The conclusion is that the process is capable of meeting design specifications if the user-provided parameters are used for the analysis (i.e. Analyze option set to Parametric). Note that the P. Factor (P stands for probability) is a factor used to change the scale of the calculated probability. In the example below probabilities are presented as parts per million (P. Factor = 1M) so that small probabilities are easier to understand.
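The quantities discussed above can be related with a short Python sketch. It uses the textbook definition Cp = (USL - LSL) / (6 sigma) and assumes a Normal process distribution for the out-of-specification probability; the specification limits, mean, and standard deviation shown are hypothetical and are not the values from the images.

```python
from statistics import NormalDist


def cp(lsl, usl, sigma):
    """Textbook process capability index."""
    return (usl - lsl) / (6.0 * sigma)


def out_of_spec(lsl, usl, mu, sigma, p_factor=1.0):
    """Probability of falling outside [LSL, USL], scaled by p_factor.
    Assumption: the process follows a Normal distribution."""
    dist = NormalDist(mu, sigma)
    probability = dist.cdf(lsl) + (1.0 - dist.cdf(usl))
    return probability * p_factor


# Hypothetical specification limits and process parameters.
lsl, usl, mu, sigma = 7.0, 13.0, 10.0, 1.0
print(f"Cp = {cp(lsl, usl, sigma):.2f}")
print(f"ppm = {out_of_spec(lsl, usl, mu, sigma, 1_000_000):.1f}")
print(f"percent = {out_of_spec(lsl, usl, mu, sigma, 100):.4f}")
```

The factor only rescales the same probability: a factor of 1M reports parts per million, and a factor of 100 reports percentages.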
In the examples above both distributions are shown in the output (Display Both). In the next example only the parametric distribution is presented (Display Parametric).
Finally, the Histogram of the data being analyzed can be included in the output as shown below. Note that the P. Factor used below is 100 so that probabilities are displayed as percentages.