Problem:
The structure of the data sometimes is not in a manner useful for modeling and analysis. Looking at the data and the end result we would like it to transform to, the natural inclination is to try to use Sigma’s Pivot tables. But soon you will realize that It’s not giving us the expected result.
For example,
I have data that looks like this:
(i)
How do I get it into:
(ii)
How To:
In the screenshot example above, If you want to analyze the data by Interest(s) and calculate the total True / False for each of the interests, It can be difficult to work with this table because True/False are spread across multiple Interest columns in (i).
This is where an “Unpivot” SQL Operation will be helpful.
Unpivot allows you to transform columns into rows.
Let’s walk through how we can achieve this through a Custom SQL that utilizes the Unpivot
function for Snowflake
Data Warehouse for demonstration purposes. But equivalent functionalities exist for other Data Warehouses and Databases, for example: PostgreSQL
has cross lateral joins
to achieve the same.
Step 1:
Let’s grab the source dataset/table’s Data Warehouse view name for which we would like to unpivot the columns into rows.
“Copy Path” for the Dataset name as per screenshot above.
Step 2:
Let’s create a new table with source as WRITE SQL as shown in the screenshot below:
Step 3:
Let’s write the custom sql with the Unpivot function and use the Data Warehouse view name from Step 1
to do the select on, as shown in the screenshot below:
Step 4:
Resulting table from the custom sql above:
Step 5:
Now, You can use standard operations, such as GROUP BY and SUM, to perform analysis on the data.
In the below screenshot, I have it Grouped by Interest column and related Calculations
for two new columns to have 1 s for True and False, to aid in the sum of total True and False for each Interest.
Note: The emails in the Email column are all distinct, serving as a Primary key within the source table in the Problem statement.
Step 6:
Final result : Unpivot SQL
table, after creating new calculated columns for Sum of True/ False grouped by Interest (and hiding the intermediary calculated columns from Step 5)
Resources:
Sample csv for the Source table that I worked on for this example.
Multiple SPORT INTERESTS.csv (419.0 KB)