calculate_effects.py
'''
1. Obtain sample data:
Start by collecting the observed sample data from your study, including the number of cases and controls, means, standard deviations, and any other relevant statistics.
2. Choose significance level (alpha):
Decide on the desired level of significance for your study. The standard choice is often 0.05, but you can adjust it based on your specific research question and context.
3. Calculate effect size:
Use the observed means and standard deviations to calculate the effect size. The effect size depends on the specific statistical test being used. For example, for a t-test, Cohen's d is a commonly used measure of effect size.
4. Estimate power:
Once you have the effect size, sample size, and significance level, you can use a power analysis tool or formula to estimate the statistical power of your study. This will tell you the probability of detecting a true effect if it exists.
5. Iterate if necessary:
If the estimated power is not satisfactory, you may need to adjust the parameters (e.g., sample size, effect size, significance level) and recalculate until you achieve the desired level of power.
A brief, illustrative sketch of such a reverse (post-hoc) power calculation for a t-test, using the statsmodels library, follows this docstring. The main code after it estimates the effect size as an odds ratio from the observed case-control counts using logistic regression, and a power check based on those counts appears at the end of the file.
'''
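# ------------------------------------------------------------------
# Steps 2-5 above, sketched for a t-test. This is a minimal,
# illustrative example only: the group means, standard deviations,
# and sample sizes are hypothetical placeholders, not values from
# this study. TTestIndPower is statsmodels' power-analysis helper.
# ------------------------------------------------------------------
from statsmodels.stats.power import TTestIndPower

mean_1, sd_1, n_1 = 5.2, 1.1, 40   # hypothetical group 1 statistics
mean_2, sd_2, n_2 = 4.6, 1.3, 45   # hypothetical group 2 statistics
alpha = 0.05                        # chosen significance level (step 2)

# Step 3: Cohen's d from the group means and the pooled standard deviation
pooled_sd = (((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)) ** 0.5
cohens_d = (mean_1 - mean_2) / pooled_sd

# Step 4: estimated power for this effect size, sample split, and alpha
t_power = TTestIndPower().power(effect_size=cohens_d, nobs1=n_1,
                                ratio=n_2 / n_1, alpha=alpha)
print("Estimated power (t-test sketch):", t_power)

# Step 5: if the power is too low, TTestIndPower().solve_power(...) can
# instead return the per-group sample size needed to reach a target power.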
import numpy as np
from sklearn.linear_model import LogisticRegression
# Observed sample data
a_cases = 33     # Exposed, outcome present
b_cases = 66     # Exposed, outcome absent
c_controls = 75  # Unexposed, outcome present
d_controls = 42  # Unexposed, outcome absent
# Create arrays for exposure and outcomes
# array of 1 for exposed and 0 for unexposed:
exposure = np.concatenate([np.ones(a_cases + b_cases), np.zeros(c_controls + d_controls)]).reshape(-1, 1)
# array of 1 for outcome and 0 for no outcome, aligned element-wise with the exposure array:
outcomes = np.concatenate([np.ones(a_cases), np.zeros(b_cases),
                           np.ones(c_controls), np.zeros(d_controls)])
# Ensure lengths match
if len(exposure) != len(outcomes):
    print("Lengths of exposure and outcomes arrays do not match.")
else:
    # Fit the logistic regression. A very large C effectively disables
    # sklearn's default L2 regularization, which would otherwise shrink
    # the coefficient (and hence the odds ratio) slightly.
    model = LogisticRegression(C=1e9)
    model.fit(exposure, outcomes)

    # Effect size: exponentiating the exposure coefficient gives the odds ratio
    effect_size = np.exp(model.coef_[0][0])
    print("Estimated effect size:", effect_size)

    # Crude odds ratio computed directly from the 2x2 counts: (a/b) / (c/d)
    odds_ratio = (a_cases / b_cases) / (c_controls / d_controls)
    print("Calculated odds ratio:", odds_ratio)
'''
This odds ratio is the ratio of the odds of the outcome among exposed individuals
to the odds of the outcome among unexposed individuals, calculated directly from
the counts in each cell of the 2x2 table. With the counts above it works out to
(33/66) / (75/42) ≈ 0.28, which suggests that the odds of the outcome among exposed
individuals are roughly 0.28 times the odds among unexposed individuals, i.e. exposure
is associated with lower odds of the outcome.
'''
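# ------------------------------------------------------------------
# Power check for the counts above (steps 4-5 of the docstring): a
# minimal sketch, assuming the effect is tested as a difference in
# outcome proportions between the exposed and unexposed groups.
# proportion_effectsize (Cohen's h) and NormalIndPower come from
# statsmodels; alpha = 0.05 here is an illustrative choice.
# ------------------------------------------------------------------
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

p_exposed = a_cases / (a_cases + b_cases)               # outcome proportion, exposed
p_unexposed = c_controls / (c_controls + d_controls)    # outcome proportion, unexposed

# Cohen's h for the difference between the two proportions
h = proportion_effectsize(p_exposed, p_unexposed)

power = NormalIndPower().power(effect_size=abs(h),
                               nobs1=a_cases + b_cases,
                               ratio=(c_controls + d_controls) / (a_cases + b_cases),
                               alpha=0.05)
print("Estimated power (two-proportion sketch):", power)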