Materializing Datasets built on other Materialized Datasets

When materializing a dataset based on another materialized dataset, does Sigma take into account when the first dataset has finished materializing?
What’s the best way to stagger schedules to ensure the freshest data?

Looking into this more, it looks like when materializing a Dataset built on a materialized Dataset, we don’t hit the materialized table. We instead re-run the entire query and materialize the results.
To get the most efficient query, I ended up rebuilding from base sources instead of off of other datasets, since I was hitting some Snowflake memory errors. Letting Sigma optimize the full query instead of breaking it into parts helped fix the issue.

This is a current design limitation of nested materialized datasets.

For example, suppose you have a materialized dataset C that is sourced from materialized dataset B, which is in turn sourced from materialized dataset A.

When Sigma refreshes C, instead of using the pre-materialized results of B, it builds a single nested query from all of the underlying datasets (as Erica described above). Under the hood, the simplified SQL of dataset C looks something like this:

select * from (
  select * from (
    select * from …
  ) A
) B

This type of design works fine with relatively small amounts of data and limited nesting, but it has a scaling limitation: there are only so many levels you can nest before performance drops and the SQL becomes too complex for the CDW to handle.
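
For comparison, if Sigma could read B’s pre-materialized output directly when refreshing C, the refresh would reduce to a single scan of that table rather than a re-run of the entire chain. As a rough sketch only (the table and schema names below are hypothetical; the actual materialization table names are generated by Sigma in your warehouse):

-- hypothetical materialization table name, for illustration only
select * from sigma_writeback_schema.materialization_of_b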

The current workaround is:

If you have a complex nested design of materialized datasets, and you are working with Snowflake or BigQuery, change the source relationships between the child and parent datasets as follows:

Prerequisites: this will only work with Snowflake or BigQuery, and requires warehouse write permissions to be enabled.
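
As a minimal sketch of what re-pointing a child dataset at a parent’s materialized output can look like, one common pattern is a warehouse view over the parent’s materialization table (this is an assumption about the general shape of the change, not necessarily the exact steps described below; object names are hypothetical, and the real materialization table name is generated by Sigma):

-- hypothetical Snowflake object names, for illustration only
create or replace view analytics.reporting.dataset_b_source as
select * from analytics.sigma_writeback.materialization_of_a;

The child dataset can then be sourced from that warehouse view, so its refresh reads A’s pre-computed results instead of re-running A’s full query.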