
Estimating difference in ARPU (using knowledge from chapter 2) #31

Open
angrydozer opened this issue Jul 14, 2024 · 1 comment

Comments

@angrydozer

Hi everyone!

I've already read the first two chapters of the book (BAP3), and they are super useful.

After that, I decided to experiment with models. My goal is to estimate the difference in ARPU between countries.
I started with the code below, but then got stuck because sampling seems to run forever.
If someone could give me some advice 🤕 on this model and explain the problem, I would really appreciate it.

All the data is stored in a pandas DataFrame called `data`, with one row per user. It contains the following columns:
- `player_id`: unique user ID
- `country_code`: `US` or `JP`
- `revenue_7`: cumulative revenue up to the 7th day of the user's life
- `is_payer`: 0 or 1, derived from `revenue_7` (1 if the revenue is greater than zero, 0 otherwise)
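A tiny hypothetical frame with these four columns (made-up values, not the real data) makes the definitions concrete. A per-country summary then gives the raw, non-Bayesian counterparts of the model's quantities: the payer share (`p`), the mean revenue among payers (`mu`), and ARPU, i.e. revenue averaged over all users.

```python
import pandas as pd

# Hypothetical toy data with the same columns as `data` (NOT the real data)
data = pd.DataFrame({
    "player_id": [1, 2, 3, 4, 5, 6],
    "country_code": ["US", "US", "US", "JP", "JP", "JP"],
    "revenue_7": [0.0, 4.0, 0.0, 9.0, 0.0, 3.0],
})
# is_payer: 1 if the user spent anything in the first 7 days, else 0
data["is_payer"] = (data["revenue_7"] > 0).astype(int)

summary = data.groupby("country_code").agg(
    payer_share=("is_payer", "mean"),  # raw estimate of p
    arpu=("revenue_7", "mean"),        # revenue averaged over ALL users
)
# Mean revenue among payers only (raw estimate of mu)
summary["payer_revenue"] = (
    data[data["is_payer"] == 1].groupby("country_code")["revenue_7"].mean()
)
print(summary)
```

With real data these numbers are a handy sanity check for the posterior means. Note that in every row `arpu` equals `payer_share * payer_revenue` (for US: 1/3 × 4 = 4/3).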

Currently, my model looks like this:

```python
import numpy as np
import pandas as pd
import pymc as pm

country = np.array(['US', 'JP'])
country_idx = pd.Categorical(data['country_code'], categories=country).codes
coords = {'country': country, 'country_flat': country[country_idx]}

# ChatGPT suggested this to ignore the zero values; it let me use a Gamma distribution
revenue_observed = np.where(data['is_payer'] == 1, data['revenue_7'], np.nan)

with pm.Model(coords=coords) as model:

    # Probability that a user pays, per country
    p = pm.Beta('p', alpha=1, beta=1, dims='country')
    y = pm.Bernoulli('y', p=p[country_idx], observed=data['is_payer'])

    # Revenue model, per country
    mu = pm.HalfNormal('mu', sigma=10, dims='country')
    sigma = pm.HalfNormal('sigma', sigma=15, dims='country')
    revenue = pm.Gamma('revenue', mu=mu[country_idx], sigma=sigma[country_idx], observed=revenue_observed)

    arpu = pm.Deterministic('arpu', p * mu, dims='country')

    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))
```
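The `arpu` line relies on the law of total expectation. With $R$ a user's 7-day revenue, $p_c$ the probability that a user in country $c$ pays, and $\mu_c$ the mean revenue of payers in $c$ (non-payers contribute zero):

$$
\begin{aligned}
\mathrm{ARPU}_c = \mathbb{E}[R \mid c]
  &= P(R > 0 \mid c)\,\mathbb{E}[R \mid R > 0, c] + P(R = 0 \mid c) \cdot 0 \\
  &= p_c\,\mu_c
\end{aligned}
$$

So each posterior draw of `p * mu` is a draw of ARPU, and the US minus JP difference can be computed draw by draw.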
@aloctavodia
Owner

Can you share the data? Or something that looks like your data?

Maybe instead of a Gamma you want to use a HurdleGamma? I don't mind you or others asking general modelling questions here, but if your questions are not directly book-related, you can post them on PyMC's Discord. You will get more people looking at your questions and potentially more answers there.
