I have a fairly challenging calculation: Based on start date/end date I need to know if an employee was employed at all in a given month and then be able to count those for the last 12 months. I was able to do this in SQL in my source (snowflake). However I need the appropriate month/year names in the column headers and the SQL won’t let me set dynamic column names that Sigma can read in.
My base dataset without the SQL-fu is (simplified):
Employee | Start Date | End Date
Bob | 9/30/2022 | 11/15/2022
Goal would be a dataset something like this:
Sept 2022 | Oct 2022 | Nov 2022…
15 | 28 | 29
Is there a way to either:
Dynamically set a column name from within a workbook via formula or…
Do this calculation within a workbook and have column names with Month/Year?
So there’s a few problems with that which I see. Firstly it doesn’t take into account years and I’m looking at a rolling 12-month period. So if I was employed from November 2021 - February 2022 it wouldn’t compute right. Secondly I don’t think that approach conforms to the dataset I have, since I only have startdate and enddate.
One way I have solved this in the past is manually adding a column per month I am looking at, and then using a calculation like [Start Date] < MakeDate(2022,12,01) AND [End Date] > MakeDate(2022,12,01) And just updating the MakeDate calculation for each month I want represented.
This works but involves a lot of manual creation of columns and formulas. The benefit is that it is conceptually simple, and just gives you a TRUE or FALSE value for each month for each employee.
I feel like you could also do something with joining in a date dimension but that’s not something I’ve tried or fully reasoned through.
So every month I’d need to go in, edit the workbook and add in the new column appropriately. Yeah, that’s always been something I figured I could do, I just wasn’t super jazzed about the idea.
I hadn’t thought about using a separate date dimension but I’m not sure it would be any different from just using month(date). Unless it’s been too long a week and my brain is going out.
I created a date dimension table that was just a list of dates for all the months I wanted to do analysis for. I then did a cross join so that every employee had a row for all of the months in the dimension table.
Next, I created a new column with the formula DateTrunc("month", [Start date]) <= [Date] and If(IsNotNull([End Date]), DateTrunc("month", [End Date]) >= [Date], True)
This checks if the start month is equal to or before the date I am testing for. It then check if there is an end date, and if so if the end date is after or equal to the date I am testing. If there is no end date, then the expression evaluates as true.
Last, I grouped by my date dimension. This then gives me a list of all employees and a true false value for whether they were employed in the given month. I then added a CountIf function to count the number of employees for each month.
Benefits of this approach:
Easy to upload a date dimension table that covers a good chunk of time. While there is still some maintenance required, swapping in a new date dimension table is straight forward and quick.
Drawbacks:
You are adding a lot of rows to the table and that may get unwieldy. You will have number of months in the dimension table + 1 rows for every employee.
So looking at this it’s possible so long as we also included a comparison on Year([Start Date]) and Year([End Date]). I’ll play around with it. But yeah, ~365 days * X employees is a lot of data. I suppose I could materialize it. I also realized I’d need to update the date dim table but I think I can script that out automagically in snowflake pretty easily. Or really just have that be a view.
Okay, here’s the solution I got and I’ll give the details in case this helps others down the line. For starters I created a dimension view as follows. It’s limited seeing as I knew I only wanted a year back and wanted it to change dynamically when as the months changed (I know I didn’t take into account leap years but it’ll be right 99.999% of the time and I’m not going to stress about it).
create or replace view v_dim_dates_last_year as
with cte_date as (
select
dateadd(day, seq4(), dateadd(
dd, (-1 * day(current_date()-1)), --first of the month
dateadd(yy, -1, current_date()) --1 year ago
)
) as my_date
from
table(generator(rowcount => 400)) --365 days + max of 30 give or take to cover back the previous year-ish
)
select
to_date(my_date) as date,
to_timestamp(my_date) as datetime,
year(my_date) as year,
month(my_date) as month,
monthname(my_date) as monthname,
day(my_date) as day,
dayofweek(my_date) as dayofweek,
weekofyear(my_date) as weekofyear,
dayofyear(my_date) as dayofyear
from
cte_date;
I created another view against my core dataset and that view cross joined as mentioned earlier. That I loaded in as a new dataset to Sigma.
To make it simple(-er) I extracted the month and year as separate columns from both the term date and hire dates. Then I made this monstrosity to count within the year/month grouping.
CountDistinctIf([Employee Number], ([Hire Year] < [Year] or ([Hire Year] = [Year] and [Hire Month] <= [Month])) and ((([Term Year] = [Year] and [Term Month] >= [Month]) or [Term Year] > [Year]) or IsNull([Termination Date])))
I also filtered out any rows where [Date] > today() since I don’t need to put in future dates.
Once I had an all active staff by month grouped table, I could duplicate it and filter it to certain kinds of staff, re-join the two together and do things like this:
I also want to make sure to mention that you can do a cross join directly in the Sigma join interface. If you choose “custom formula” for both join keys and just put in “1” for both, then it will be a cross join.