Improve utility of scatter plot functionality

  • Describe Feature

Scatter plots are valuable visualizations for revealing the details of the relationship between two columns over a large data set. Sigma’s current scatter plot functionality is limited to 25K points and plots the first 25K points in the data table. This could be addressed in a number of ways (in order of preference):

  • Eliminate the 25K limit. Sigma’s value prop is the ability to visualize large volumes of data, and scatter plots are typically not used for summary data.

  • Create functionality to select a random sample of records from a parent table so that the scatter plot reflects the overall distribution, rather than whatever happens to be at the top of the table.
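
To illustrate the second option, here is a minimal sketch of one way a uniform random sample could be drawn from a large table in a single pass (reservoir sampling). This is not how Sigma works internally; the function name `reservoir_sample` and its parameters are hypothetical and only meant to show the idea.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k rows chosen uniformly at random from rows.

    Hypothetical helper: reads the input once and keeps at most k rows
    in memory, so the full table never has to be loaded.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Pick a slot in [0, i]; the row is kept only if the slot
            # falls inside the reservoir, i.e. with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


# Example: sample 25,000 rows out of 1,000,000 (row ids stand in for records).
sample = reservoir_sample(range(1_000_000), 25_000, seed=42)
```

With something like this, the 25K points on the plot would be spread across the whole table instead of being the first 25K rows.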

  • What is the use case?

Plotting the relationship between two columns over large data sets. For example, predicted vs actual sales prices for properties sold in the US in a given year.

Exploring the relationship between values in the training set and model predictions over national-scale data.

  • How often would this feature be used?

Weekly

  • What is the impact of this feature on your organization?

Eliminates the need to create visualizations in other tools/languages. Enables rapid exploration of relationships within data sets and between training data and model output.