Video How-To: Boost Query Performance with Materialization

How To: Boost Query Performance with Materialization

Introduction

In the world of data exploration and management, performance and consistency are two critical factors that can impact the efficiency and effectiveness of data retrieval processes. Materialization is a technique that can help address these challenges. In this post, we’ll delve into materialization—what it is, when and why to use it, the requirements, factors that determine the frequency of scheduling, and its limitations and caveats.

What is Materialization?

Materialization is a powerful technique used in databases and data warehouses where query results are precomputed and stored for later use. It is particularly useful for complex queries or calculations that can be time-consuming. By leveraging precomputed results, materialization can significantly reduce the time needed for data retrieval.


(Pictured Above) Here, we see the query history showing two separate instances of the same query for a 175 million row table in Sigma. The top line query has been materialized, and returns results in 12.6 seconds. The bottom row shows an unmaterialized query which runs for over 2 minutes before being canceled due to reaching the connection’s default timeout limit.

Prerequisites for Materialization

Before implementing materialization, consider the following requirements:

  1. Data Source: You need a data source from which the data will be fetched, such as a relational database, data lake, or other data storage system.

  2. Query Definition: Define the query you want to materialize, including all necessary information to retrieve the data and perform desired calculations or aggregations.

  3. Enable Write-Back: The connection in which your data source resides must have write back enabled, as your materialized table must utilize storage space in the same connection as it’s sources

  4. Schedule: A scheduler is required to periodically refresh the materialized results as the underlying data changes.

When and Why Should You Use Materialization?

Consider materialization in the following scenarios:

  1. Slow Queries: Materialization can significantly speed up data retrieval for queries that take a long time to run due to complexity or data size.

  2. Repetitive Use: Precomputing results saves time and resources when the same query is run frequently by different users or applications.

  3. Consistency: Materialization ensures consistent results across different users or applications since the same precomputed results are used.

Factors Determining the Frequency of Scheduling

When setting the schedule for a materialization, consider the following factors:

  1. Data Volatility: Frequently changing data may require more frequent refreshes.

  2. Query Complexity: Less frequent refreshes may be beneficial for complex queries to reduce the computational burden.

  3. Usage Patterns: Frequently accessed materialized data may need more frequent updates.

  4. SLAs and Business Needs: Specific business requirements or service level agreements (SLAs) may dictate the refresh frequency.

How to Configure Materialization

  • For Workbook Elements
    1. Navigate to the 3-Dot menu in the upper right hand corner of the element
    2. Select ‘Schedule Materialization (BETA)’. It will automatically pre-populate the element from which this menu was accessed as the ‘Selected Element’

  1. Ensure that the correct grouping level is selected. If you do not wish to materialize all columns, you can select a higher grouping level of particular relevance.

  2. Set your schedule. You’ll need to determine if the materialization should run daily, weekly, monthly or on a custom schedule. Note: Custom schedules are defined by Cron Strings

  3. Set a timezone for the corresponding materialization schedule. Typically this would be your local timezone, or your Sigma org’s default timezone.

  4. Click ‘Save Schedule’. If this is a new schedule, your materialization will begin running upon a successful save. If the element has not yet been published, the materialization will begin upon publication.

  5. Upon success, you will see a green check mark appear in the lower right hand corner of the (i) icon that appears to indicate a successful completion

  6. Refreshing a materialization

    8.1. To refresh your materialization while in the schedule modal, click on the 3-dot menu for the schedule line item corresponding to the element, and select ‘Materialize Now’


    8.2. To refresh your materialization while in a regular workbook tab, hover over the menu in the upper right corner of the element, and click the ‘View Materialization Info’ badge. In the pop-up that appears, select the ‘Materialize Now’ option

  • For a Dataset
    1. Ensure that the following conditions have been met in advance of materializing:

      1.1. The dataset has been published, and you have been granted ‘Edit’ access to it.


      1.2. The connection has write back enabled

      1.3. You have a Creator or Admin license type with Materialization privileges

    2. Navigate to the dataset’s Materialization tab. If you see the ‘Create Schedule’ button or a pre-existing schedule, you should have all of the prerequisites covered.

    3. Click ‘Create Schedule’ and follow the same process outlined in the steps above. Once you click ‘Save’, your materialization will begin

Limitations and Caveats of Materialization

Materialization has its limitations and caveats:

  1. Storage Overhead: Materialized data requires additional storage space, which can be significant for large volumes of data.

  2. Latency: Precomputed results may not reflect the most recent data changes due to periodic refreshes.

  3. Maintenance: Managing and maintaining materialized results and their refresh schedules adds complexity to the data management process.

  4. Cost: Consider the added storage and computational costs associated with materialization.

  5. System Functions: Use of system functions is not supported in Materialized Datasets or workbook elements.

Conclusion

Materialization is a valuable technique that can enhance query performance and ensure consistent results. However, it is essential to consider the requirements, use cases, scheduling factors, and limitations before implementing materialization in your data management process. Stay tuned for future posts where we’ll explore topics like referencing a materialized dataset in a workbook and parent-child relationships.

2 Likes