- Describe Feature
Scatter plots are valuable visualizations for revealing the details of the relationship between two columns over a large data set. Sigma’s current scatter plot functionality is limited to 25K points and plots the first 25K points in the data table. This could be addressed in a number of ways (in order of preference):
  - Eliminate the 25K limit. Sigma’s value prop is the ability to visualize large volumes of data, and scatter plots are typically not used for summary data.
  - Create functionality to select a random sample of records from a parent table so that the scatter plot reflects the overall distribution, rather than whatever happens to be at the top of the table.
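To illustrate the random-sample option, here is a minimal Python sketch of uniform sampling with reservoir sampling (Algorithm R). It is only an illustration of the idea, not a description of how Sigma works or would implement it; the function name `reservoir_sample`, the `seed` parameter, and the synthetic rows are assumptions made for the example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k rows chosen uniformly at random from `rows`.

    Hypothetical helper: makes a single pass and keeps at most k rows
    in memory, so the parent table can be far larger than the plot limit.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Row i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Synthetic stand-in for a parent table of (predicted, actual) pairs.
parent_table = ((price, price * 1.05) for price in range(100_000))
points = reservoir_sample(parent_table, k=25_000, seed=0)
print(len(points))
```

Because every row has the same chance of being kept, a scatter plot of `points` should resemble the full distribution rather than just the top of the table.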
- What is the use case?
Plotting the relationship between two columns over large data sets. For example, predicted vs. actual sales prices for properties sold in the US in a given year.
Exploring the relationship between training-set values and model predictions over national-scale data.
- How often would this feature be used?
Weekly
- What is the impact of this feature on your organization?
Eliminates the need to create visualizations in other tools or languages. Enables rapid exploration of relationships within data sets and between training data and model output.