Introduction
A/B Testing & Hypothesis Testing:
In the realm of data-driven decision making, A/B testing and hypothesis testing stand as powerful methodologies that fuel innovation and guide strategic choices. A/B testing has long been celebrated for its ability to compare different versions of a webpage, app interface, or marketing campaign, enabling organizations to optimize their digital presence. However, when combined with the rigor and structure of hypothesis testing, the potential for extracting valuable insights and making informed decisions is further amplified. In this article, we embark on a journey to explore the dynamic synergy between A/B testing and hypothesis testing, uncovering the unique strengths they bring to the table and how they complement one another.
Z-score
There are different types of statistical tests for assessing differences, but today we will be going with a two-tailed z-test. A two-tailed z-test plays a crucial role in both hypothesis testing and A/B testing, serving as a statistical framework to test hypotheses and draw meaningful conclusions. In hypothesis testing, a two-tailed z-test allows researchers to evaluate whether there is a significant difference between two population means, without making any specific directional assumptions. It enables them to determine if the observed data provides sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. Similarly, in the context of A/B testing, a two-tailed z-test is employed to compare the performance of two variants and determine if there is a statistically significant difference between them. By calculating the z-score and comparing it to the critical value, A/B testers can make data-driven decisions on which variant outperforms the other or if there is no significant difference. Thus, the two-tailed z-test serves as a fundamental statistical tool that underpins the hypothesis testing framework and empowers A/B testers to extract actionable insights from their experiments.
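For reference, the z-score for comparing two variants can be written as below. The subscripts 1 and 2 are labels introduced here for the two variants; M, SD and N denote each variant's mean, standard deviation and sample size.

```latex
z = \frac{M_1 - M_2}{\sqrt{\dfrac{SD_1^2}{N_1} + \dfrac{SD_2^2}{N_2}}}
```

At a 5% significance level, a two-tailed test rejects the null hypothesis when |z| exceeds the critical value of roughly 1.96.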
Let’s Get Started!
Data Background
For today’s example we will be using sample data for a mobile game called “Cookie Cats”, obtained through Kaggle and provided by user Mürşide Yarkın.
From Mürşide’s post on Kaggle:
Cookie Cats is a hugely popular mobile puzzle game developed by Tactile Entertainment. It’s a classic “connect three”-style puzzle game where the player must connect tiles of the same color to clear the board and win the level.
As players progress through the levels of the game, they will occasionally encounter gates that force them to wait a non-trivial amount of time or make an in-app purchase to progress. In addition to driving in-app purchases, these gates serve the important purpose of giving players an enforced break from playing the game, hopefully resulting in the player’s enjoyment of the game being increased and prolonged.
But where should the gates be placed? Initially the first gate was placed at level 30, but in this notebook we’re going to analyze an AB-test where we moved the first gate in Cookie Cats from level 30 to level 40. In particular, we will look at the impact on player retention. But before we get to that, a key step before undertaking any analysis is understanding the data.
Step 1: Load Necessary Data + Additional Prep
For this example, we will be utilizing Sigma’s upload CSV feature to import two data elements:
Note: for this you will need to enable write access.
Our independent variable will be the column “version” – whether users were placed in the condition of the first gate at level 30 (version = gate_30) or at level 40 (version = gate_40).
Our dependent variables will be:
 retention_1: the percentage of players who are still playing after 1 day
 retention_7: the percentage of players who are still playing after 1 week
 sum_gamerounds: the total number of rounds played within the first 2 weeks
To easily switch among these dependent variables, we will create a control element.
 Create New List Control named “Dependent Variable”.
 Check: Required
 Uncheck: Allow Multiple selection & Show Null Option
 Add Values:
– “Game Rounds”
– “1 Day Retention”
– “1 Week Retention”
 Add a new column titled “DV”:
If([DependentVariable] = "Game Rounds", [sum_gamerounds], [DependentVariable] = "1 Day Retention", CountIf([retention_1]), [DependentVariable] = "1 Week Retention", CountIf([retention_7]))
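For readers who want to follow along outside Sigma, the selection logic above can be sketched in Python. This is only an illustration under assumptions: the function name `dependent_value` and the dictionary-based row are hypothetical, and retention flags are assumed to be booleans that get mapped to 1 or 0 per player.

```python
def dependent_value(row, choice):
    """Return the per-player value for the selected dependent variable.

    `row` is a hypothetical dict with the dataset's column names;
    `choice` mirrors the values of the "Dependent Variable" control.
    """
    if choice == "Game Rounds":
        return row["sum_gamerounds"]
    if choice == "1 Day Retention":
        # Retained players count as 1, others as 0, so the mean is a retention rate.
        return 1 if row["retention_1"] else 0
    if choice == "1 Week Retention":
        return 1 if row["retention_7"] else 0
    raise ValueError(f"Unknown dependent variable: {choice}")


# Made-up example row, not taken from the dataset.
sample_row = {
    "version": "gate_30",
    "sum_gamerounds": 12,
    "retention_1": True,
    "retention_7": False,
}
print(dependent_value(sample_row, "1 Day Retention"))
```

Coding retention as 1 or 0 means the group mean of “DV” can be read as a retention rate for the two retention options.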
Step 2: Calculate Skewness
Statistical skewness is a measure that describes the asymmetry or departure from symmetry in a probability distribution. It indicates the extent to which the values of a dataset or a probability distribution are concentrated towards one tail compared to the other.
 Positive skewness: If a distribution has a positive skewness, it means that the tail on the right side of the distribution is longer or fatter than the left tail. The majority of the data points are concentrated on the left side of the distribution, while the right side has a few extreme values that drag the mean towards the right.
 Negative skewness: If a distribution has a negative skewness, it means that the tail on the left side of the distribution is longer or fatter than the right tail. The majority of the data points are concentrated on the right side of the distribution, while the left side has a few extreme values that pull the mean towards the left.
 Zero skewness: A perfectly symmetrical distribution has zero skewness. The left and right tails are of equal length and, for a symmetric distribution with a single peak, the mean, median, and mode are all at the same point.
 Group by Version
 Inside Grouping calculate N for both groups - New Column in Grouping “N” →
Count()
 Inside Grouping calculate the Mean for both groups - New Column in Grouping “M” →
Avg([DV])
 Inside Grouping calculate the SD for both groups - New Column in Grouping “SD” →
Stddev([DV])
 Outside of grouping create a new column “Deviation”:
Power(([DV] - [M]), 3)
 Inside Grouping calculate a new column “Skew”:
Sum([Deviation]) / (([N] - 1) * Power([SD], 3))
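The skewness steps above can be sketched in Python for a single group. This is a minimal illustration, assuming that the SD column is a sample standard deviation (n - 1 denominator); the function name `skew` is hypothetical and the input list is made up.

```python
from statistics import mean, stdev


def skew(values):
    """Skewness of one group, following the N, M, SD, Deviation, Skew steps."""
    n = len(values)                              # "N"
    m = mean(values)                             # "M"
    sd = stdev(values)                           # "SD" (sample standard deviation, assumed)
    deviation = [(x - m) ** 3 for x in values]   # "Deviation", one value per row
    return sum(deviation) / ((n - 1) * sd ** 3)  # "Skew"


# Made-up values with one large observation on the right.
print(skew([1, 1, 1, 1, 10]))
```

A list with a long right tail, like the one in the usage line, gives a positive result, while a symmetric list gives zero.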
Step 3: Calculate Kurtosis
Kurtosis is a statistical measure that describes the shape of a probability distribution by quantifying the tailedness and peakedness of the distribution compared to a normal distribution. It provides information about the presence of outliers or extreme values in a dataset.

Excess kurtosis: Excess kurtosis refers to the kurtosis of a distribution minus 3. This measure is commonly used and allows for comparisons with a normal distribution. It can take positive or negative values.
 Positive excess kurtosis: If a distribution has positive excess kurtosis, it means that the distribution has heavier tails and a sharper, more peaked central region compared to a normal distribution. This indicates the presence of outliers or extreme values in the dataset, resulting in a higher concentration of data points in the tails.
 Negative excess kurtosis: If a distribution has negative excess kurtosis, it means that the distribution has lighter tails and a flatter central region compared to a normal distribution. This suggests that the dataset has fewer outliers and is less peaked.

Kurtosis independent of normal distribution: Kurtosis can also be considered without reference to a normal distribution. In this case, it measures the overall shape of the distribution without indicating whether it deviates from a normal distribution or not.
 High kurtosis: A high kurtosis value indicates a distribution with heavy tails and a sharp peak. This indicates a higher likelihood of outliers and extreme values.
 Low kurtosis: A low kurtosis value indicates a distribution with lighter tails and a flatter peak. This suggests a lower probability of outliers and extreme values.
 Outside of grouping create a new column “2nd Moment Prestep”:
Power([DV] - [M], 2)
 Outside of grouping create a new column “4th Moment Prestep”:
Power([DV] - [M], 4)
 Inside of grouping create a new column “2nd Moment”:
Sum([2nd Moment Prestep]) / [N]
 Inside of grouping create a new column “4th Moment”:
Sum([4th Moment Prestep]) / [N]
 Inside of grouping create a new column “Kurtosis”:
[4th Moment] / Power([2nd Moment], 2)
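As a rough cross-check, the kurtosis steps above can be sketched in Python for a single group. The function names `kurtosis` and `excess_kurtosis` are hypothetical; the code follows the moment-based formula in the steps, and excess kurtosis subtracts 3 as described earlier in this section.

```python
from statistics import mean


def kurtosis(values):
    """Kurtosis of one group as the 4th moment over the squared 2nd moment."""
    n = len(values)
    m = mean(values)
    second_moment = sum((x - m) ** 2 for x in values) / n  # "2nd Moment"
    fourth_moment = sum((x - m) ** 4 for x in values) / n  # "4th Moment"
    return fourth_moment / second_moment ** 2              # "Kurtosis"


def excess_kurtosis(values):
    """Kurtosis minus 3, for comparison with a normal distribution."""
    return kurtosis(values) - 3


# Made-up values for illustration only.
print(kurtosis([1, 2, 3, 4, 5]), excess_kurtosis([1, 2, 3, 4, 5]))
```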
Step 4: Calculate Z-Score
Finally, on to conducting a z-test, which will inform us if there are any significant differences.
 In a table summary, create a new value called “Two Tailed Z Score” using the following formula:
(First([M]) - Last([M])) / Sqrt((Power(First([SD]), 2) / First([N])) + (Power(Last([SD]), 2) / Last([N])))
 In another table summary, create a new value called “PValue (Z Score Table)” using the following formula:
Lookup([Z Score Table/PValue], Round([Two Tailed Z Score], 2), [Z Score Table/ZScore])
 In another table summary, create a new value called “Significance” using the following formula:
If([PValue (Z Score Table)] < 0.05, "Significant", "Not Significant")
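The same test can be sketched in Python from the per-group summary values. Note one deliberate difference: instead of looking the p-value up in a z-score table, this sketch computes it from the standard normal CDF. The function name `two_tailed_z_test` and its parameters are hypothetical, and the numbers in the usage line are made up rather than taken from the dataset.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(m1, sd1, n1, m2, sd2, n2, alpha=0.05):
    """Return (z, p_value, label) for a two-tailed z-test on two group means."""
    # Same structure as the "Two Tailed Z Score" summary formula.
    z = (m1 - m2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    # Two-tailed p-value from the standard normal CDF (replaces the table lookup).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    label = "Significant" if p_value < alpha else "Not Significant"
    return z, p_value, label


# Made-up summary statistics for two groups.
print(two_tailed_z_test(m1=0.52, sd1=0.50, n1=1000, m2=0.48, sd2=0.50, n2=1000))
```

Taking the absolute value of z before computing the p-value keeps the test two-tailed regardless of which group is listed first.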
Now we will explore our three dependent variables. As we cycle through them, we see that there is no statistically significant difference for “Game Rounds” or “1 Day Retention”; however, there is for “1 Week Retention”. Namely, we can conclude that there is a significantly higher retention rate when the gate is at level 30 (19.0%) compared to when the gate is at level 40 (18.2%).