comparison of output (impact$series$cum.effect) in Python and R packages #60

Open
rj678 opened this issue Sep 11, 2022 · 2 comments

rj678 commented Sep 11, 2022

Thanks for the great effort in keeping this library updated.

I'm working on converting an R library to Python, and the R library has the following line of code:

preperiod <- subset(impact$series, cum.effect == 0)

where impact is the output object of the CausalImpact library.

From what I can tell:

impact$series$cum.effect in R corresponds to impact.inferences.post_cum_effects_means in Python.

I used the comparison example that you have provided in the README (with comparison_data.csv), but I'm getting different output. In the R library, the values of impact$series$cum.effect are zero for the earlier dates, whereas they are NaN in the Python package. The values for the later dates differ as well.

I'd greatly appreciate some feedback on comparing the output so I can convert the following line of code to Python appropriately:

preperiod <- subset(impact$series, cum.effect == 0)

I tried both methods, hmc and vi, and the output of the other columns in impact$series differs from impact.inferences in Python as well.

Thank you, and looking forward to hearing back from you.

WillianFuks (Owner) commented

Hi @rj678 ,

The preperiod as given by your assignment would be computed in Python by something like:

preperiod = ci.inferences['post_cum_effects_means'][ci.inferences['post_cum_effects_means'].isna()]

This essentially selects the entries that belong to the training (pre-intervention) data. In the R package the empty values there were assigned as zeroes, whereas in Python, as they don't exist, they remain NaN.

Note also that if you want to work with the pre-period data, it is available on the ci object as ci.pre_data or ci.normed_pre_data (the latter is the same data with normalization applied).

As for the varying results, did what you observed differ much from the official README report? I just ran it here using the hmc method and got very close results. They will never be identical, because the underlying algorithm is not deterministic, but they should always lead to the same conclusions and be very close for the most part.

Results are also expected to differ from the original R package, but again they should lead to the same conclusions and be similar overall. The cumulative field will differ more because it sums all estimated points in the post period.
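As a rough illustration of the NaN-versus-zero difference, here is a minimal pandas sketch. It does not call the library: `inferences` is a toy stand-in for ci.inferences with invented values, and `toy_preds` is a hypothetical column added only so the frame has more than one column. It shows one way to select whole pre-period rows (closer to what R's subset returns) and how filling NaN with 0 reproduces the R-style comparison.

```python
import numpy as np
import pandas as pd

# Toy stand-in for ci.inferences (values invented for illustration only).
index = pd.date_range("2020-01-01", periods=6, freq="D")
inferences = pd.DataFrame(
    {
        "toy_preds": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],  # hypothetical column
        "post_cum_effects_means": [np.nan, np.nan, np.nan, 1.5, 2.0, 4.0],
    },
    index=index,
)

# Rows where the cumulative effect is missing, i.e. the pre-intervention rows.
pre_mask = inferences["post_cum_effects_means"].isna()
preperiod = inferences[pre_mask]

# R-like view: fill the missing cumulative effect with 0, then compare to 0.
r_like = inferences.fillna({"post_cum_effects_means": 0.0})
preperiod_r_style = r_like[r_like["post_cum_effects_means"] == 0]

print(preperiod.index.equals(preperiod_r_style.index))
```

One caveat with the R-style comparison: a post-period row whose cumulative effect happens to be exactly 0 would also be selected, which the NaN mask avoids.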
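To make the point about the cumulative field concrete, here is a small sketch with made-up numbers (not output from either package). It assumes two runs whose point effects differ by a constant small offset, which is a simplification: with purely random differences some of the gap would cancel out.

```python
from itertools import accumulate

# Hypothetical post-period point effects from two runs (invented values).
run_a = [2.0, 2.1, 1.9, 2.0, 2.2]
offset = 0.1  # assumed constant gap between the two runs
run_b = [x + offset for x in run_a]

# Gap between the runs at each individual point.
point_gap = [abs(b - a) for a, b in zip(run_a, run_b)]

# Gap between the running (cumulative) sums of the two runs.
cum_a = list(accumulate(run_a))
cum_b = list(accumulate(run_b))
cum_gap = [abs(b - a) for a, b in zip(cum_a, cum_b)]

print(max(point_gap), cum_gap[-1])
```

Under this assumption the per-point gap stays at about 0.1 while the cumulative gap grows with every post-period point, so cumulative columns are the least suitable ones for a strict value-by-value comparison.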

Let me know if this helps you,

Best,

Will

rj678 commented Sep 11, 2022

Thanks so much for confirming that the empty values are zero in R and NaN in Python. From what I remember, the difference between the non-zero values in impact$series$cum.effect and ci.inferences['post_cum_effects_means'] was not insignificant. I'll check again and get back. Thanks so much for the detailed response.
