AB testing the introduction of new technologies into the game.
- 1. Reading in the data
- 2. Checking the distribution
- 3. AB Test
- 4. Recommendation to Client
Recently I found an interesting dataset of game metrics recorded before and after an innovation was introduced. The task was to assess how the new technology affected the product. To do this, I decided to use an AB test. Based on the analysis, a business decision has to be made: keep the baseline (option A) or promote option B.
To get started, let's import our libraries.
import pandas as pd
import numpy as np
import seaborn as sns
from scipy.stats import mannwhitneyu
from scipy.stats import ttest_ind
from scipy.stats import norm
from scipy.stats import shapiro
from scipy.stats import levene
from scipy.stats import normaltest
import matplotlib.pyplot as plt
Next, create a helper function for the bootstrap test.
def get_bootstrap(
    data_column_1,  # numeric values of the first sample (pandas Series)
    data_column_2,  # numeric values of the second sample (pandas Series)
    boot_it = 10000,  # number of bootstrap resamples
    statistic = np.mean,  # statistic of interest
    bootstrap_conf_level = 0.95  # confidence level
):
    # both resamples are drawn at the size of the larger sample
    boot_len = max([len(data_column_1), len(data_column_2)])
    boot_data = []
    for i in range(boot_it):  # draw the resamples
        samples_1 = data_column_1.sample(
            boot_len,
            replace = True  # sample with replacement
        ).values
        samples_2 = data_column_2.sample(
            boot_len,  # to preserve the variance, we take the same sample size
            replace = True
        ).values
        boot_data.append(statistic(samples_1 - samples_2))
    pd_boot_data = pd.DataFrame(boot_data)
    # percentile bounds of the bootstrap distribution (computed but not returned)
    left_quant = (1 - bootstrap_conf_level) / 2
    right_quant = 1 - (1 - bootstrap_conf_level) / 2
    quants = pd_boot_data.quantile([left_quant, right_quant])
    # two-sided p-value from a normal approximation of the bootstrap distribution
    p_1 = norm.cdf(
        x = 0,
        loc = np.mean(boot_data),
        scale = np.std(boot_data)
    )
    p_2 = norm.cdf(
        x = 0,
        loc = -np.mean(boot_data),
        scale = np.std(boot_data)
    )
    p_value = min(p_1, p_2) * 2
    return {"p_value": p_value}
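The function above draws both resamples at the size of the larger group and subtracts them elementwise. As a point of comparison, here is a minimal sketch of a variant that resamples each group at its own size and reports a percentile confidence interval for the difference in means. The name `bootstrap_mean_diff`, its parameters, and the synthetic data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def bootstrap_mean_diff(a, b, n_boot=5000, conf_level=0.95, seed=0):
    """Percentile bootstrap for mean(a) - mean(b) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement at its own size
        res_a = rng.choice(a, size=a.size, replace=True)
        res_b = rng.choice(b, size=b.size, replace=True)
        diffs[i] = res_a.mean() - res_b.mean()
    alpha = 1 - conf_level
    ci_low, ci_high = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return {"diff": a.mean() - b.mean(), "ci": (ci_low, ci_high)}

# Synthetic groups of different sizes (not the game data)
demo_rng = np.random.default_rng(42)
group_a = demo_rng.normal(10.0, 1.0, 300)
group_b = demo_rng.normal(11.0, 1.0, 200)
boot_result = bootstrap_mean_diff(group_a, group_b)
print(boot_result)
```

If the interval excludes zero, the difference is significant at the chosen level, which gives the same kind of decision as the p-value while also showing the size of the effect.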
game = pd.read_csv('RhinoGames.csv', skiprows=1)
game.head()
b. Use the call below to find information about the dataset.
game.info()
c. Split the metrics used for the AB test into separate datasets.
# Rewarded Videos Ads Watched
rvaw = game[['A', 'B']][:642]
# Interstitial Ads Watched
iaw = game[['A.1', 'B.1']][:839]
iaw.columns = ['A', 'B']
# User Progress Level
upl = game[['A.2', 'B.2']][:2115]
upl.columns = ['A', 'B']
# Daily Session Number
dsn = game[['A.3', 'B.3']][:20]
dsn.columns = ['A', 'B']
# Session Duration (in seconds)
sd = game[['A.4', 'B.4']][:2540]
sd.columns = ['A', 'B']
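The split above hard-codes the number of rows for each metric. A possible alternative, sketched here on a small stand-in table, is to loop over the column suffixes and drop the rows where both groups are empty. This assumes the rows beyond each hard-coded cut-off are empty in both columns, which should be checked against the real file. `game_demo` and the `suffixes` mapping are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded table: pandas renames repeated
# headers to 'A', 'A.1', ... when a CSV has duplicate column names.
game_demo = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, np.nan],
    "B": [2.0, 3.0, np.nan, np.nan],
    "A.1": [5.0, 6.0, 7.0, 8.0],
    "B.1": [4.0, 5.0, 6.0, 7.0],
})

# Short metric name -> column suffix (assumed layout)
suffixes = {"rvaw": "", "iaw": ".1"}

metrics = {}
for name, suffix in suffixes.items():
    # Keep a row if at least one of the two groups has a value
    pair = game_demo[["A" + suffix, "B" + suffix]].dropna(how="all")
    pair.columns = ["A", "B"]
    metrics[name] = pair

print({name: frame.shape for name, frame in metrics.items()})
```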
shapiro(rvaw['A'].dropna())
Based on the Shapiro-Wilk test, the distribution of Rewarded Videos Ads Watched (A) differs significantly from normal, because p-value = 3.58e-24 < 0.05.
shapiro(rvaw['B'].dropna())
Based on the Shapiro-Wilk test, the distribution of Rewarded Videos Ads Watched (B) differs significantly from normal, because p-value = 6.44e-23 < 0.05.
shapiro(iaw['A'].dropna())
Based on the Shapiro-Wilk test, the distribution of Interstitial Ads Watched (A) differs significantly from normal, because p-value = 2.63e-33 < 0.05.
shapiro(iaw['B'].dropna())
Based on the Shapiro-Wilk test, the distribution of Interstitial Ads Watched (B) differs significantly from normal, because p-value = 2.35e-29 < 0.05.
shapiro(dsn['A'].dropna())
Based on the Shapiro-Wilk test, the distribution of Daily Session Number (A) differs significantly from normal, because p-value = 3.44e-07 < 0.05.
shapiro(dsn['B'].dropna())
Based on the Shapiro-Wilk test, the distribution of Daily Session Number (B) differs significantly from normal, because p-value = 2.68e-07 < 0.05.
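The normality checks above repeat the same call for every metric and group. A small helper can collect the results into one table. This is a sketch on synthetic data, and `normality_table` is a hypothetical name rather than a function from the notebook.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro

def normality_table(df, alpha=0.05):
    """Run the Shapiro-Wilk test on every column of df (hypothetical helper)."""
    rows = []
    for group in df.columns:
        stat, p = shapiro(df[group].dropna())
        rows.append({
            "group": group,
            "W": stat,
            "p_value": p,
            "reject_normality": bool(p < alpha),
        })
    return pd.DataFrame(rows).set_index("group")

# Synthetic example: one right-skewed column and one roughly normal column
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "A": rng.exponential(scale=2.0, size=500),
    "B": rng.normal(loc=5.0, scale=1.0, size=500),
})
table = normality_table(demo)
print(table)
```

With the real data, the same helper could be called on `rvaw`, `iaw`, `upl`, `dsn`, and `sd` in turn.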
- H0 : Variances are homogeneous.
- H1 : Variances are not homogeneous.
levene(rvaw['A'].dropna(), rvaw['B'].dropna())
Comment: Since p-value = 0.92 > 0.05, the H0 hypothesis could not be rejected: there is no statistical evidence that the variances differ.
levene(iaw['A'].dropna(), iaw['B'].dropna())
Comment: Since p-value = 2.98e-35 < 0.05, the H0 hypothesis was rejected: the variances are not homogeneous.
levene(upl['A'].dropna(), upl['B'].dropna())
Comment: Since p-value = 1.10e-13 < 0.05, the H0 hypothesis was rejected: the variances are not homogeneous.
levene(dsn['A'].dropna(), dsn['B'].dropna())
Comment: Since p-value = 0.95 > 0.05, the H0 hypothesis could not be rejected: there is no statistical evidence that the variances differ.
levene(sd['A'].dropna(), sd['B'].dropna())
Comment: Since p-value = 0.95 > 0.05, the H0 hypothesis could not be rejected: there is no statistical evidence that the variances differ.
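For reference, `scipy.stats.levene` centers the data on the median by default, which is the variant usually recommended for skewed samples like the ones here. The sketch below compares it with mean centering on synthetic groups whose spread differs. The data are made up for illustration.

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(1)
narrow = rng.normal(loc=0.0, scale=1.0, size=500)  # small spread
wide = rng.normal(loc=0.0, scale=3.0, size=500)    # large spread

# center="median" is the default and is more robust for skewed data
res_median = levene(narrow, wide, center="median")
# center="mean" is the classic form, suited to symmetric distributions
res_mean = levene(narrow, wide, center="mean")

print(res_median.pvalue, res_mean.pvalue)
```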
rvaw.describe()
sns.boxplot(data=rvaw)
sns.displot(rvaw, stat="probability")
iaw.describe()
sns.boxplot(data=iaw)
sns.displot(iaw, stat="probability")
upl.describe()
sns.boxplot(data=upl)
sns.displot(upl, stat="probability")
dsn.describe()
sns.boxplot(data=dsn)
sns.displot(dsn, stat="probability")
sd.describe()
sns.boxplot(data=sd)
sns.displot(sd, stat="probability")
- H0 : There is no statistically significant difference between the two groups.
- H1 : There is a statistically significant difference between the two groups.
ttest_ind(rvaw['A'].dropna(), rvaw['B'].dropna())
Comment: The H0 hypothesis could not be rejected because the t-test gave p-value = 0.84 > 0.05.
So, we found no statistically significant difference between Rewarded Videos Ads Watched (A) and Rewarded Videos Ads Watched (B).
ttest_ind(iaw['A'].dropna(), iaw['B'].dropna())
Comment: The H0 hypothesis was rejected because the t-test gave p-value = 8.54e-34 < 0.05.
So, we found a statistically significant difference between Interstitial Ads Watched (A) and Interstitial Ads Watched (B).
ttest_ind(upl['A'].dropna(), upl['B'].dropna())
Comment: The H0 hypothesis was rejected because the t-test gave p-value = 1.10e-13 < 0.05.
So, we found a statistically significant difference between User Progress Level (A) and User Progress Level (B).
ttest_ind(dsn['A'].dropna(), dsn['B'].dropna())
Comment: The H0 hypothesis could not be rejected because the t-test gave p-value = 0.96 > 0.05.
So, we found no statistically significant difference between Daily Session Number (A) and Daily Session Number (B).
ttest_ind(sd['A'].dropna(), sd['B'].dropna())
Comment: The H0 hypothesis could not be rejected because the t-test gave p-value = 0.78 > 0.05.
So, we found no statistically significant difference between Session Duration (A) and Session Duration (B).
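Levene's test rejected equal variances for Interstitial Ads Watched and User Progress Level, while `ttest_ind` assumes equal variances by default. For those metrics, Welch's t-test (`equal_var=False`) may be the safer choice. The sketch below shows the difference on synthetic groups with unequal sizes and spreads. It does not reproduce the p-values reported above.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
group_a = rng.normal(loc=10.0, scale=1.0, size=200)  # large group, small spread
group_b = rng.normal(loc=10.0, scale=4.0, size=50)   # small group, large spread

# Student's t-test: pools the variances of both groups
student = ttest_ind(group_a, group_b)
# Welch's t-test: estimates each group's variance separately
welch = ttest_ind(group_a, group_b, equal_var=False)

print("Student:", student.statistic, student.pvalue)
print("Welch:  ", welch.statistic, welch.pvalue)
```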
mannwhitneyu(rvaw['A'].dropna(), rvaw['B'].dropna())
Comment: The H0 hypothesis could not be rejected because the Mann-Whitney rank test gave p-value = 0.79 > 0.05.
So, we found no statistically significant difference between Rewarded Videos Ads Watched (A) and Rewarded Videos Ads Watched (B).
mannwhitneyu(iaw['A'].dropna(), iaw['B'].dropna())
Comment: The H0 hypothesis was rejected because the Mann-Whitney rank test gave p-value = 1.50e-21 < 0.05.
So, we found a statistically significant difference between Interstitial Ads Watched (A) and Interstitial Ads Watched (B).
mannwhitneyu(upl['A'].dropna(), upl['B'].dropna())
Comment: The H0 hypothesis was rejected because the Mann-Whitney rank test gave p-value = 7.97e-16 < 0.05.
So, we found a statistically significant difference between User Progress Level (A) and User Progress Level (B).
mannwhitneyu(dsn['A'].dropna(), dsn['B'].dropna())
Comment: The H0 hypothesis could not be rejected because the Mann-Whitney rank test gave p-value = 0.96 > 0.05.
So, we found no statistically significant difference between Daily Session Number (A) and Daily Session Number (B).
mannwhitneyu(sd['A'].dropna(), sd['B'].dropna())
Comment: The H0 hypothesis could not be rejected because the Mann-Whitney rank test gave p-value = 0.07 > 0.05.
So, we found no statistically significant difference between Session Duration (A) and Session Duration (B).
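The Mann-Whitney p-value says whether the groups differ, not by how much. One common companion measure is the probability that a random value from A exceeds a random value from B, with ties counted as one half. The sketch below computes it directly on synthetic data. `prob_superiority` is a hypothetical helper, and the pairwise comparison it builds needs memory proportional to the product of the two sample sizes.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def prob_superiority(a, b):
    """Estimate P(a > b) + 0.5 * P(a == b) over all pairs (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    greater = (a[:, None] > b[None, :]).mean()
    ties = (a[:, None] == b[None, :]).mean()
    return greater + 0.5 * ties

# Synthetic groups where B is shifted upward relative to A
rng = np.random.default_rng(3)
a_demo = rng.normal(loc=0.0, scale=1.0, size=300)
b_demo = rng.normal(loc=0.5, scale=1.0, size=300)

mw_result = mannwhitneyu(a_demo, b_demo, alternative="two-sided")
effect = prob_superiority(a_demo, b_demo)
print(mw_result.pvalue, effect)
```

A value near 0.5 means the groups overlap almost completely. Values well below or above 0.5 indicate that one group tends to be larger.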
get_bootstrap(rvaw['A'].dropna(), rvaw['B'].dropna())
Comment: The H0 hypothesis could not be rejected because the bootstrap gave p-value = 0.83 > 0.05.
So, we found no statistically significant difference between Rewarded Videos Ads Watched (A) and Rewarded Videos Ads Watched (B).
get_bootstrap(iaw['A'].dropna(), iaw['B'].dropna())
Comment: The H0 hypothesis was rejected because the bootstrap gave p-value = 4.81e-39 < 0.05.
So, we found a statistically significant difference between Interstitial Ads Watched (A) and Interstitial Ads Watched (B).
get_bootstrap(upl['A'].dropna(), upl['B'].dropna())
Comment: The H0 hypothesis was rejected because the bootstrap gave p-value = 2.70e-14 < 0.05.
So, we found a statistically significant difference between User Progress Level (A) and User Progress Level (B).
get_bootstrap(dsn['A'].dropna(), dsn['B'].dropna())
Comment: The H0 hypothesis could not be rejected because the bootstrap gave p-value = 0.96 > 0.05.
So, we found no statistically significant difference between Daily Session Number (A) and Daily Session Number (B).
get_bootstrap(sd['A'].dropna(), sd['B'].dropna())
Comment: The H0 hypothesis could not be rejected because the bootstrap gave p-value = 0.75 > 0.05.
So, we found no statistically significant difference between Session Duration (A) and Session Duration (B).
The Shapiro-Wilk test rejected normality for every metric it was applied to, so the conclusions rely on the nonparametric methods (the Mann-Whitney test and the bootstrap), which agree with the t-test on every metric. A statistically significant difference was found in Interstitial Ads Watched and User Progress Level. At the same time, the increase in Interstitial Ads Watched did not affect Daily Session Number or Session Duration (in seconds), which indicates the absence of a negative effect. The statistically significant increase in User Progress Level is probably explained by the fact that the sample of users did not change. In both cases the spread of the data also increased.
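Because five metrics were tested, it can be worth checking that the two significant results survive a multiple-comparison adjustment. The sketch below applies the Holm step-down correction to the Mann-Whitney p-values reported above (rounded). `holm_adjust` is a hypothetical helper written for this illustration, not a function from the notebook.

```python
import numpy as np

def holm_adjust(p_values):
    """Holm step-down adjusted p-values (hypothetical helper)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value (k = 0..m-1) is multiplied by (m - k)
    scaled = p[order] * (m - np.arange(m))
    # Adjusted values must not decrease along the sorted order, capped at 1
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

# Mann-Whitney p-values from the text: rvaw, iaw, upl, dsn, sd
metric_names = ["rvaw", "iaw", "upl", "dsn", "sd"]
raw_p = [0.79, 1.50e-21, 7.97e-16, 0.96, 0.07]
adjusted_p = holm_adjust(raw_p)
for metric, p_adj in zip(metric_names, adjusted_p):
    print(metric, p_adj, p_adj < 0.05)
```

Under this adjustment, Interstitial Ads Watched and User Progress Level remain significant at the 0.05 level, and the other three metrics do not.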