How to use a top N filter / control for visualizations

Applying the top N filter to visualizations is notoriously tricky for many Sigma users. This post is a deep-dive intended to set you up for success when filtering the X-axis or color-by category.

Please note: although all of the following examples use bar charts, the same principles can be applied to any other visualization type.

Example 1

Say you have a bar chart with store region in the X-axis and count distinct of store keys in the Y-axis.

If you want to show the three regions with the highest count of distinct store keys, it might be tempting to use the top N filter on the X-axis column, but this would actually return the wrong results (when applying the top N filter to a text column, the top N will be determined by alphanumeric sorting). Instead, use the top N filter on the Y-axis column to return the three regions with the highest count of distinct store keys.

The key concept to remember here is that the top N filter should be applied to whatever column that needs to be ranked. If you want to do an alphanumeric ranking, apply the ranking to an alphanumeric (text) column. If you want to rank by numbers, rank by a numeric column.

Example 2

Now, say you had a bar chart colored by category, such as this:

and you want to show the sum of profit over all months for the 10 states with the highest sum of profit over time. The implementation of the top N filter gets a bit trickier when you want to apply it to the aggregate ranking of a color-by category, but you essentially have two ways of going about doing this:

  1. First, sort the color-by category column according to the Y-axis column. Then, manually right-click to exclude each legend category that doesn’t fall in the top 10.
  2. Take a step back (aka, start over) to restructure your parent table so that you can use the top N filter (this is the only appropriate option for analyses that are meant to persist through time).

The first option is easy and potentially helpful for one-off ad-hoc analyses, although it could be time consuming depending on how many categories you have in the legend. The benefit of this option is that you can work entirely within the visualization element, and you don’t need to worry about re-creating the visualization. However, the serious drawback is that the results are not necessarily dynamically accurate (that is, if you come back to the unchanged visualization a year later, the same categories will be excluded even if some of them somehow moved into the top 10). That said, the second option is “best-practice” - so please follow the process detailed below whenever possible:

  1. Make the following groupings/calculations in the parent table (which should already be present, whether in a hidden page of the workbook or not):

    • Group by the color-by category first (in this example, it would be store state).
    • Group by the X-axis column second (in this example, it would be month of date).
    • In the second grouping level, create a calculation column to represent your Y-axis values (in this example, the column would be called [Profit] and the formula would be Sum([Price]*[Quantity]-[Cost])).
    • In the first grouping level, create a calculation column to sum over the Y-axis values in the second grouping level (i.e., write Sum([Profit]) to get each state’s sum of profit across all months).
  2. Apply the top N filter to the calculation column ([Sum of Profit]) under the first grouping level to ensure that the top N states are chosen dynamically as time goes on.

  1. With the [Month of Date] grouping expanded, create a new child visualization element.
  2. Re-create the visualization (in this example, use [Month of Date] in the X-axis, [Profit] in the Y-axis, and [Store State] in the color-by category) to see the top N states with the highest profit.
  3. If desired, sort the color-by category column according to the Y-axis column.

Example 3

If you wanted to use a bar chart to show the percent of total costs by year of date and product type (i.e., you want [Year of Date] in the X-axis, [Sum of % of Total for Sum of Cost] in the Y-axis, and [Product Type] as the color-by category), and you only wanted to show the top N X-axis values, you’d need to adopt a slightly different approach than the one described previously:

  1. Group by the X-axis column first (in this example, use [Year of Date]).
  2. Group by the color-by category second (in this example, use [Product Type]).
  3. Under the second grouping level, calculate the [% of Total for Sum of Cost] using the formula PercentOfTotal(Sum([Cost]), "grand_total").
  4. Under the first grouping level, calculate the [Sum of % of Total for Sum of Cost] using the formula Sum([% of Total for Sum of Cost]).
  5. Apply the top N filter to the calculation column under the first grouping level (remember the first concept taught in this article - apply the top N filter to the column which is actually being ranked).

  1. Create a child visualization using [Year of Date] in the X-axis, [Sum of % of Total for Sum of Cost] in the Y-axis, and [Product Type] in the color-by category.

Please note how the top N filter is always applied to the first grouping level in the parent table, and the first grouping level reflects whatever column the top N filter should apply to (so, if you want to show the top N color-by categories, group by the color-by category first; if you want the top N to apply to the X-axis values, use the X-axis column in the first grouping level).

Also note that the second grouping level should use whatever column needs to be aggregated over (in this example, since we wanted the top N X-axis values across all [Product Type] categories, we used [Product Type] as the second grouping level; in the previous example, we wanted the top N color-by categories across all X-axis values, so we used the X-axis column [Month of Date] as the second grouping level).

Example 4

Finally, let’s say you want to see the top N product family categories along with an “Other” category, so you can show the total profit each year (as opposed to showing the total profit of the top N product families only). The process of setting this up is a bit more involved, but it can be done by working within the parent table and using a top N parameter (rather than a top N filter, as used above).

The steps are as follows:

  1. Create a new control element (choose the “Textbox” option) and set the control type to “parameter”. Then, set the value type to “Number” and change the control ID to top-N-param (if desired, change the display text to something easier to read, like “Top N parameter”). Make sure you’ve checked the box that says “Required” if you want users to choose the top N before the workbook renders results.
  2. In your parent table:
    • Group by the color-by category first (in this example, we’ll use [Product Family]).
    • Group by the X-axis column second (in this example, we’ll use [Year of Date]).
    • Under the second grouping level, calculate your Y-axis values.
      • In this example, we calculate [Profit] using the formula:
        Sum([Price]*[Quantity]-[Cost])
    • Under the first grouping level, calculate the aggregation over Y-axis values.
      • In this example, we’ll calculate [Sum of Profit] using the formula Sum([Profit])
    • Under the first grouping level, add three new columns using the following three formulas:
      • [RankDense of Sum of Profit] = RankDense([Sum of Profit], "desc") (assign the lowest rank to the product family with the highest sum of profit)
      • [Adjusted Rank] = If([RankDense of Sum of Profit] > [top-N-param], [top-N-param] + 1, [RankDense of Sum of Profit]) (keep the original rank for the top N product families, and assign the next-highest number to be the rank of all product families that will be grouped into the “Other” category)
      • [Adjusted Product Family] = If([Adjusted Rank] = [top-N-param] + 1, "Other", [Product Family]) (keep the original product family name for the top N product families, and group the others into the “Other” category)
  3. If desired, sort the [RankDense of Sum of Profit] column ascending.

  1. Create a child visualization element using [Year of Date] in the X-axis, [Profit] in the Y-axis, and [Adjusted Product Family] as the color-by category.
  2. If desired, sort the color-by category column according to the [Adjusted Rank] ascending (use the custom sort option and choose the “Min” or “Max” aggregation for the [Adjusted Rank] column).

That concludes this lesson on top N filtering… please comment if you have any questions or if you think I missed any possible scenarios! Happy to go into further detail upon request.

6 Likes

Added FiltersControls