BEST Viewpoints: Hierarchical Viewpoints
Hierarchical Viewpoints
Introduction: the Analysis Menu
Hierarchical Viewpoints is designed to hierarchically summarize the contents of one or more data fields as grouped by one or more categorical fields.
When a dataset is uploaded, BEST Viewpoints automatically divides its fields into two groups: Analysis Fields and Categorical (or Group-By) Fields. The upper set of fields in the menu contains the potential analysis fields, while the lower set contains the categorical fields used to combine or group results.
In the example below, Date, Price and Sales have been automatically assigned as Analysis Fields, while Country, Product, Employee, Quantity, and MonthNo have been set as Categorical Fields. The assignment is based on the number of distinct values in each field: fields with fewer distinct values are selected as Categorical Fields and the rest as Analysis Fields. These automatic definitions can be modified by pressing either of the two buttons (Fields or Group-By), so if the fields are not arranged conveniently the user can manually choose which fields are treated as categorical. Additionally, G+ moves the currently selected Fields to the Group By window, and the G- button does the opposite, removing the currently selected Group By fields from that window.
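The automatic assignment can be pictured with a short Python sketch. The function name `split_fields` and the cutoff `max_distinct` are hypothetical: the text only says that fields with fewer distinct values become Categorical Fields, not which exact rule BEST Viewpoints applies.

```python
def split_fields(records, max_distinct=3):
    """Split field names into (analysis, categorical) by distinct-value count.

    max_distinct is a made-up cutoff used only for illustration.
    """
    analysis, categorical = [], []
    for name in records[0]:
        distinct = {row[name] for row in records}
        (categorical if len(distinct) <= max_distinct else analysis).append(name)
    return analysis, categorical


# Toy rows with invented values, not the dataset shown in the screenshots.
rows = [
    {"Price": 120.0, "Sales": 1200.0, "Country": "France", "Product": "Inkjet"},
    {"Price": 125.5, "Sales": 2510.0, "Country": "China", "Product": "Laptop"},
    {"Price": 119.9, "Sales": 599.5, "Country": "France", "Product": "Inkjet"},
    {"Price": 130.0, "Sales": 1300.0, "Country": "China", "Product": "Inkjet"},
]
analysis_fields, categorical_fields = split_fields(rows, max_distinct=2)
```

With these toy rows, Price and Sales have four distinct values each and land in the analysis group, while Country and Product have two each and land in the categorical group.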
In the Hierarchical Viewpoints module, analysis is performed hierarchically for the selected Field (or Analysis Field), broken down by the values of the selected category or categories, as shown below. With Sales and Product selected, the Stacked Bar Chart analysis shows that, of the five products in the data, Inkjet and Laptop have the largest Total Sales.
By default, data summaries are represented graphically; however, the underlying data can also be obtained by selecting Leaves Data in the Output menu. Both the data and the graphical output can be exported from the File menu. Note that information about the contents of each data field or column is displayed as a tooltip when the mouse hovers over the field name.
The hierarchical nature of this analysis module becomes more apparent when more than one categorical field is selected. Adding Country and Employee to the previously selected categorical field Product results in the next image. The chart now displays the Total[Sales] for each Employee within each combination of Product and Country. For example, the product LaserJet was sold in two countries, as shown in the top two labels of the chart: Laser Jet - France, and Laser Jet - China.
Another way to represent this hierarchy is to select the Tree option in the Output menu. The results are shown below. Note that the Total[Sales] for LaserJet is now shown as 22,082, which is divided into 9,320 for China and 12,762 for France. In France, Noah sold 7,234; the Sales of the other employees are not displayed because small values are filtered out by the (optional) Threshold value of 3,000 (see image above).
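The effect of the Threshold on the tree can be sketched in Python. Noah's 7,234 and the totals 9,320, 12,762 and 22,082 are the figures quoted in the text; the two other employee names and the way the remaining France sales are split between them are invented. Whether the program compares with "greater than" or "at least" is also an assumption.

```python
# Employee-level Total[Sales] for LaserJet in France. Only Noah's value and
# the country total (12,762) are from the text; the rest is illustrative.
france_laserjet = {"Noah": 7234, "Employee B": 2900, "Employee C": 2628}


def apply_threshold(totals, threshold):
    """Keep only the entries whose total is at least the threshold."""
    return {name: value for name, value in totals.items() if value >= threshold}


country_total = sum(france_laserjet.values())   # hidden employees still count
shown = apply_threshold(france_laserjet, 3000)  # only the displayed entries
product_total = 9320 + country_total            # China + France
```

The point of the sketch is that the threshold only hides small entries from the display: the France total still includes the employees that are not shown.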
BEST Viewpoints can process virtually any combination of analysis fields (categorical and non-categorical). This is a powerful and convenient feature of the program; however, it allows the user to request more information than needed. To avoid this situation and to use processing time efficiently, the program asks the user to confirm when the current setup results in too many groups of data for analysis. The image below shows the warning message generated when the current setup exceeds the Max Bins limit. The user may continue and complete the calculations, or stop and make the necessary changes to get more meaningful results. The Max Bins limit can be modified in the Options menu (Evaluation+Options).
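A check of this kind could look like the following Python sketch. It assumes that the number of groups is the number of distinct category-value combinations present in the data; the function names and the `max_bins` values are hypothetical, and the actual default limit is not stated in the text.

```python
def count_groups(records, categories):
    """Number of distinct category-value combinations that hold data."""
    return len({tuple(row[c] for c in categories) for row in records})


def needs_confirmation(records, categories, max_bins):
    """True when the setup would produce more groups than max_bins."""
    return count_groups(records, categories) > max_bins


# Toy rows (invented).
rows = [
    {"Product": "Inkjet", "Country": "France"},
    {"Product": "Inkjet", "Country": "China"},
    {"Product": "Laptop", "Country": "France"},
    {"Product": "Inkjet", "Country": "France"},
]
```

Here `needs_confirmation(rows, ["Product", "Country"], max_bins=2)` is true because three combinations hold data, while grouping by Product alone stays within the same limit.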
Build Options
The Build tab (beside Analysis) provides options to modify the results obtained by selecting the Analysis and Group-By Fields. The Setup tab shows basic options such as plot Orientation, Plot Label, and pane Size Action. Most of these options have an intuitive meaning, so not all of them are described in detail. Reading the Mathematica documentation for the selected type of chart makes the Build menu more meaningful, as most of the options shown here are chart options documented in Mathematica. Additionally, mouse-activated tooltips pop up a description of each option when the user places the mouse over a button, pop-up menu, or similar interface element. For example, the tip associated with Size Action is shown in the image below.
The Calculate menu provides ways of modifying the default analysis. For illustration purposes, the example below shows the analysis for Max[Sqrt[Sales]] instead of the default Total[Sales]. In addition, the category values for Country have been sorted in ascending order.
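The combination of a transforming function and a summarizing function can be sketched in Python as follows. The helper `summarize` and the sales figures are hypothetical; `math.sqrt` and `max` stand in for the Sqrt and Max functions named in the text.

```python
import math


def summarize(values, transform=lambda x: x, summary=sum):
    """Apply the transforming function to each value, then summarize."""
    return summary(transform(v) for v in values)


# Toy sales per country (invented values).
sales_by_country = {"France": [400.0, 900.0], "China": [2500.0, 100.0]}

# Default analysis: Total[Sales].
default = {c: summarize(v) for c, v in sales_by_country.items()}

# Modified analysis: Max[Sqrt[Sales]].
max_sqrt = {c: summarize(v, math.sqrt, max) for c, v in sales_by_country.items()}

# Category values sorted in ascending order.
ordered = sorted(max_sqrt)
```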
A more interesting example of a data transformation arises when the target analysis field is a string and a function is used to convert that string into numbers, for example a function that counts how many times a sequence of characters or a word is found in the string. The example below counts, for each Employee, the number of records (transactions) in which the last three letters of the product name are Jet.
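A string-to-number transformation of this kind can be sketched in Python. The rows and the employee name Emma are invented, and the function names are hypothetical; the comparison is assumed to be case-sensitive, so a name ending in lowercase "jet" would not be counted.

```python
from collections import Counter


def ends_with_jet(product_name):
    """Transforming function: 1 if the last three letters are 'Jet', else 0."""
    return 1 if product_name[-3:] == "Jet" else 0


def jet_records_per_employee(records):
    """Total of the transformed Product field, grouped by Employee."""
    counts = Counter()
    for row in records:
        counts[row["Employee"]] += ends_with_jet(row["Product"])
    return dict(counts)


# Toy transactions (invented).
rows = [
    {"Employee": "Noah", "Product": "LaserJet"},
    {"Employee": "Noah", "Product": "Laptop"},
    {"Employee": "Noah", "Product": "LaserJet"},
    {"Employee": "Emma", "Product": "Desktop"},
]
result = jet_records_per_employee(rows)
```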
Finally, the Annotate tab provides options to modify the bar labels. The Threshold of 5000 in the example below suppresses the labels of small bars so that all remaining labels are readable. Additionally, two lines have been drawn using the Line option.
Categories
The Categories tab shows several options to manage categorical fields. For example, suppose that Laptop needs to be excluded from the analysis. This is done by deselecting Laptop from the menu; Total[Sales] is then calculated with Laptop excluded from the data. Note that the analysis setup is summarized in the plot label.
Additionally, in the image below the user asked BEST Viewpoints to display only the Top 2 selling products (Part sub-menu). Note that the analysis setup is summarized in the plot label and that the output no longer shows All values but only the top 2. As shown in the screenshot below, there is a tooltip (below 'Part') that summarizes the current setup for the selected category. The information in this tooltip is updated automatically every time the setup for the category is changed.
Selecting a second category activates the hierarchical analysis, in the sense that the analysis is now performed hierarchically within each category. That is to say, in this example the former by-product analysis is now further broken down by the values of the newly selected category, Country. In this case, the output provides the top three countries in which each of the top two selling products was sold. Thus, the hierarchy used for analysis is determined by the order in which categories are selected, which is the order in which they are displayed in the Categories tab.
Pareto Analysis using the Categories Menu
There are several buttons in the Categories menu that can be used to modify the analysis made by Hierarchical Viewpoints. In the Part sub-menu we have previously shown how to select the Top (or bottom, Bot) category values in terms of the Summarizing Function used (e.g. Total[Sales]).
The example below shows how to select the 80% most influential products in terms of Total[Sales]. This is done by selecting the % check box, which is deselected by default.
Furthermore, by selecting the weighting checkbox (W) we get those products that are responsible for at least 80% of the Total[Sales]. Again, the tooltip describes how the calculations are made to identify these products.
Calculating the percentages of Total[Sales] (see Build, Calculate menu below) helps to confirm that the cumulative percentage of Total[Sales] associated with the three products identified is at least 80%, as expected (42 + 32 + 11 = 85%).
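The weighted selection can be sketched in Python as a cumulative sum over the categories ordered from largest to smallest. The shares 42, 32 and 11 are the percentages quoted in the text; the product names "A" to "E" and the remaining shares 9 and 6 are placeholders, and `pareto_part` is a hypothetical name, not a function of the program.

```python
def pareto_part(totals, target_pct=80.0):
    """Return the largest categories whose cumulative share first reaches target_pct."""
    grand_total = sum(totals.values())
    selected, cumulative = [], 0.0
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    for name, value in ranked:
        if cumulative >= target_pct:
            break  # the target share is already covered
        selected.append(name)
        cumulative += 100.0 * value / grand_total
    return selected, cumulative


# 42, 32 and 11 come from the text; names and the last two values are invented.
shares = {"A": 42, "B": 32, "C": 11, "D": 9, "E": 6}
picked, covered = pareto_part(shares, 80.0)
```

The first two categories cover only 74%, so a third is needed, which brings the cumulative share to 85%.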
Note that, given the hierarchical nature of the analysis made in this module, we could identify the products that are responsible for 80% of the Total[Sales] for each country by inserting that category in the analysis. The results are shown below.
Finally, to illustrate that there is no limit on the number of categorical fields used for analysis, the next example shows the countries with the Top 3 Total[Sales]; then, for each country, the employees that are responsible for at least 90% of that country's Total[Sales]; and, for each employee, the products that make up at least 80% of the employee's Total[Sales].
The next screenshot shows the same analysis as the previous one, but with the Output displayed as %Tree. This format may be preferred because it describes the hierarchical analysis more clearly.
There is no limit on the number of categorical fields that can be selected from the menu. Just as an example, the next screen shot shows analysis with four categorical fields: Sales by Product, Country, Employee, and MonthNo.
Hierarchical Viewpoints Algorithm
This is a brief description of the algorithm behind Hierarchical Viewpoints.
1. Select from the data the fields corresponding to the selected 'Field' and the selected 'Categories'.
2. If needed, use the order in which the categories were selected to partition the data by the intersection of all values associated with the 'Categories'.
3. If needed, transform the data by applying the Transforming Function to each subgroup.
4. If needed, calculate the summary function for all subgroups in the data.
5. Filter the data by picking from each 'Category' the selected 'Category Values' (call the new set of data 'summarized data').
6. In the order in which each 'Category' was selected, take from 'summarized data' the corresponding 'Part' (see the dynamic tooltip below 'Part'). This means: partition 'summarized data' by the first 'Category' and from each subgroup select the corresponding part.
7. For each of the new subgroups, repeat the partitioning and the selection of the corresponding part until all Categories are evaluated. This process creates a tree or hierarchy of grouped data, which is then plotted or analyzed according to specifications such as 'Perspective'.
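The steps above can be sketched in Python. This is an illustration under stated assumptions, not the program's implementation: the function and parameter names are hypothetical, the 'Part' is limited to a Top n selection, the transforming function is applied value by value, and the value of a node is assumed to be the sum of the summaries of its leaves (which fits a Total but not every summary function).

```python
from collections import defaultdict


def hierarchical_summary(records, field, categories, summary=sum,
                         transform=lambda x: x, keep=None, top=None):
    """Build a nested {category value: (total, subtree)} hierarchy.

    keep: {category: set of allowed values} (step 5), optional.
    top:  {category: n}, keep the n largest nodes per level (step 6), optional.
    """
    keep, top = keep or {}, top or {}

    # Steps 1-2: partition the field values by every combination of categories.
    groups = defaultdict(list)
    for row in records:
        groups[tuple(row[c] for c in categories)].append(row[field])

    # Steps 3-4: transform each value, then summarize each subgroup.
    leaves = {key: summary([transform(v) for v in values])
              for key, values in groups.items()}

    # Step 5: drop subgroups whose category values were deselected.
    leaves = {key: total for key, total in leaves.items()
              if all(value in keep.get(cat, {value})
                     for cat, value in zip(categories, key))}

    # Steps 6-7: partition by one category at a time and take the 'Part'.
    def build(items, level):
        if level == len(categories):
            return {}
        by_value = defaultdict(dict)
        for key, total in items.items():
            by_value[key[level]][key] = total
        # Assumption: a node's value is the sum of its leaf summaries.
        nodes = sorted(((sum(sub.values()), value, sub)
                        for value, sub in by_value.items()),
                       key=lambda node: node[0], reverse=True)
        nodes = nodes[:top.get(categories[level])]  # None keeps every node
        return {value: (total, build(sub, level + 1))
                for total, value, sub in nodes}

    return build(leaves, 0)


# Toy rows: the Desktop figures match totals quoted elsewhere in this page;
# the country assigned to Inkjet and the whole Laptop row are invented.
rows = [
    {"Product": "Desktop", "Country": "Germany", "Sales": 8312},
    {"Product": "Desktop", "Country": "France", "Sales": 14891},
    {"Product": "Inkjet", "Country": "France", "Sales": 85129},
    {"Product": "Laptop", "Country": "China", "Sales": 500},
]
tree = hierarchical_summary(rows, "Sales", ["Product", "Country"],
                            keep={"Product": {"Desktop", "Inkjet"}},
                            top={"Product": 2, "Country": 2})
```

Because the hierarchy follows the order of `categories`, swapping the list to `["Country", "Product"]` would produce a tree rooted in countries instead of products.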
Top Menu
In this menu there are several options available to modify the output. The first three (Chart, Output, Hierarchy Level) may affect the analysis made, while the last two only change the appearance (Aspect Ratio, Image Size, Pie Size, Font Size, and Arc Thickness).
Chart and Graph Type
There are several types of charts available to represent the output. Pareto creates an ordered version of Bars, while Stacked Bars and Percentile Bars are more efficient in organizing the output. The other types of charts available are Lines, Box Plots, and confidence intervals for the sample mean (95% Mean CI) and the sample standard deviation (95% Std CI).
When the Tree or %Tree Output options are selected (discussed in the next section), the Chart menu is replaced by the Graph Type menu. This menu only allows the user to select between Tree, 2D Graph Plot and 3D Graph Plot, which are ways to represent the output in the form of a web or tree.
Output
There are six options that can be used to display the analysis output. By default the Leaves Output tab is selected, which shows the analysis just discussed. Selecting the Tree output provides more details about the same analysis and displays it in the form of a GraphPlot or TreePlot. In the example shown, the top two stems of Total[Sales] are 23,203 for Desktop and 85,129 for Inkjet. Similarly, the top two stems for Desktop are 8,312 for Germany and 14,891 for France.
Note that the Setup tab in the Build tab is dynamically updated with options to modify the Graph and the Pie Charts in the output.
The summary information calculated and used to make the previous plot can be seen when the Leaves Data output is selected.
The same information in the plot above can be displayed as percentages by selecting %Tree, as shown below. The graphical arrangement or position of the information may vary, but its interpretation remains unchanged. For example, note that the Total Sales of 85,129 associated with Inkjet represents 61.66% of the total sales in the dataset being analyzed, not counting the product Laptop, which has been deliberately excluded from the analysis.
The information generated from the analysis can also be shown in tabular form, as displayed below (% Tree Data). Note that in the data for Tree and %Tree the arcs are represented as Rules.
Data
On the Data tab the user may choose to analyze the complete dataset or just a Sample from it. This may be used for statistical purposes, or simply to get an idea of the results when the dataset is so large that completing the analysis takes a long time. Note that there is a bar indicating what fraction of the original dataset is being analyzed (Random Fraction is 75% in the example shown below).
Random Sequence means that the sample is taken respecting the order in which the data appears in the target dataset. This may be useful when the selected Perspective is a line plot and the underlying x axis is associated with the order in which the data is provided. An example would be the Exploratory Viewpoints module, when the data is originally ordered by date and the chart selected is a DateListPlot.
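The difference between an unordered sample and an order-preserving one can be sketched in Python. The function names, the `seed` parameter and the rounding of the sample size are assumptions made for illustration; the text does not describe how the program draws its samples.

```python
import random


def unordered_sample(records, fraction, seed=None):
    """Random sample of the given fraction; the result order is arbitrary."""
    rng = random.Random(seed)
    size = round(len(records) * fraction)
    return rng.sample(records, size)


def random_sequence(records, fraction, seed=None):
    """Random sample of the given fraction that keeps the original row order."""
    rng = random.Random(seed)
    size = round(len(records) * fraction)
    kept = sorted(rng.sample(range(len(records)), size))
    return [records[i] for i in kept]


rows = list(range(1, 21))  # stand-in for 20 rows ordered by date
sample = random_sequence(rows, 0.75, seed=1)
```

Sampling row positions and sorting them before picking the rows is what keeps a date-ordered dataset in date order.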
Another option in the Data tab is the Source data. By default this is set to the Main Dataset, but the user may want to analyze the output of the Basic Spreadsheet or any of the other options displayed there. Note the button that shows the contents of the selected data in spreadsheet format.
Note the Pick and Play options in the Data tab. The Pick option was just discussed. The Play option allows the source data to be changed dynamically as a window.
Cross Tabs
So far, the results of our analysis have been those displayed on the Root tab of the Hierarchy Level. However, the user can also deploy as Cross Tabs the same analysis that has been set up in the Root. Cross Tabs can be defined in one or two dimensions. Note that it may be good to use small image sizes at the Root level, as the Cross Tab analysis creates many realizations of the Root setup.
The example below presents cross tabs for MonthNo for the analysis presented in the previous image.
In Cross Tabs, the analysis is not performed automatically; it is completed when the user clicks the Analysis button. Note that the entire setup defined on the Standpoint is preserved in the Cross Tabs.
The Multi-D Cross Tabs tab is another Hierarchy Level similar to the Cross Tabs tab, but it displays results independently for each combination of the selected values of the crossed categories for which there is data. Thus, an empty combination (which is common in cross tabs) is simply not displayed in the output. Additionally, Multi-D Cross Tabs can be defined on more than two categorical fields.
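The difference between the two layouts can be sketched in Python: a two-dimensional cross tab has a cell for every row/column pairing, including empty ones, while the multi-dimensional variant lists only the combinations that hold data. The function names and the toy rows are hypothetical.

```python
from itertools import product


def cross_tab_cells(records, row_cat, col_cat):
    """Every row/column pairing, flagged True when it holds data."""
    row_values = sorted({r[row_cat] for r in records})
    col_values = sorted({r[col_cat] for r in records})
    present = {(r[row_cat], r[col_cat]) for r in records}
    return {cell: cell in present for cell in product(row_values, col_values)}


def multi_d_cells(records, categories):
    """Only the combinations (over any number of categories) that hold data."""
    return sorted({tuple(r[c] for c in categories) for r in records})


# Toy rows (invented).
rows = [
    {"Country": "France", "MonthNo": 1, "Product": "Inkjet"},
    {"Country": "France", "MonthNo": 2, "Product": "Laptop"},
    {"Country": "China", "MonthNo": 1, "Product": "Inkjet"},
]
grid = cross_tab_cells(rows, "Country", "MonthNo")
combos = multi_d_cells(rows, ["Country", "MonthNo", "Product"])
```

In this toy data the grid has four cells, one of them empty (China, month 2), whereas the multi-dimensional listing contains exactly the three combinations that occur.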