I've already read the first 2 chapters of the book (BAP3), and it's super useful.
After that I decided to experiment with models. My goal is to estimate the difference in ARPU between countries.
I started with the code below, but then got stuck because sampling seems to run forever. If someone can help me with advice 🤕 on this model and explain the problem, I would really appreciate it.
All the data is stored in a pandas DataFrame called data, where rows = users. It contains the following columns:
player_id – unique user id
country_code – US or JP
revenue_7 – cumulative revenue up to the 7th day of the user's life
is_payer – 0 or 1, computed from revenue_7 (0 if revenue is zero, 1 if it is greater than zero)
Currently my model looks like this:
import numpy as np
import pandas as pd
import pymc as pm

country = np.array(['US', 'JP'])
country_idx = pd.Categorical(data['country_code'], categories=country).codes
coords = {'country': country, 'country_flat': country[country_idx]}
# ChatGPT suggested this to ignore zero values, which lets me use a Gamma distribution
revenue_observed = np.where(data['is_payer'] == 1, data['revenue_7'], np.nan)
with pm.Model(coords=coords) as model:
    # probability of being a payer, per country
    p = pm.Beta('p', alpha=1, beta=1, dims='country')
    y = pm.Bernoulli('y', p=p[country_idx], observed=data['is_payer'])
    # mean and standard deviation of payer revenue, per country
    mu = pm.HalfNormal('mu', sigma=10, dims='country')
    sigma = pm.HalfNormal('sigma', sigma=15, dims='country')
    revenue = pm.Gamma('revenue', mu=mu[country_idx], sigma=sigma[country_idx], observed=revenue_observed)
    arpu = pm.Deterministic('arpu', p * mu, dims='country')
    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))
Can you share the data? Or something that looks like your data?
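If the real data cannot be shared, a synthetic stand-in with the same four columns might be enough to reproduce the problem. Here is a minimal sketch: the function name `make_fake_data`, the payer rates, and the revenue means are all invented for illustration and are not taken from the actual dataset.

```python
import numpy as np
import pandas as pd


def make_fake_data(n_users: int = 2000, seed: int = 0) -> pd.DataFrame:
    """Generate made-up user rows with the columns described in the question."""
    rng = np.random.default_rng(seed)
    country_code = rng.choice(["US", "JP"], size=n_users)
    is_us = country_code == "US"

    # Assumed (invented) payer rate and mean payer revenue per country.
    pay_rate = np.where(is_us, 0.05, 0.08)
    mean_revenue = np.where(is_us, 12.0, 20.0)

    # 1 if the user paid anything in the first 7 days, else 0.
    is_payer = rng.binomial(1, pay_rate)

    # A Gamma with shape 2 and scale mean/2 has the requested mean.
    revenue_if_payer = rng.gamma(shape=2.0, scale=mean_revenue / 2.0)
    revenue_7 = np.where(is_payer == 1, revenue_if_payer, 0.0)

    return pd.DataFrame(
        {
            "player_id": np.arange(n_users),
            "country_code": country_code,
            "revenue_7": revenue_7,
            "is_payer": is_payer,
        }
    )
```

Calling `data = make_fake_data()` gives a frame that can be dropped into the model code from the question in place of the real `data`.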
Maybe instead of a gamma, you want to use a HurdleGamma? I don't mind you or others asking general modelling questions here, but if your questions are not directly book-related you can post it on PyMC's discord. You will get more people looking at your questions and potentially more answers there.