Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change examples to use coords where appropriate #14

Closed
twiecki opened this issue Dec 27, 2020 · 12 comments
Closed

Change examples to use coords where appropriate #14

twiecki opened this issue Dec 27, 2020 · 12 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@twiecki
Copy link
Member

twiecki commented Dec 27, 2020

coords is a very useful feature that not enough people know about. Our docs should establish best practices so using it in our NBs is an important step.

@twiecki twiecki added good first issue Good for newcomers help wanted Extra attention is needed labels Dec 27, 2020
@AlexAndorra
Copy link
Collaborator

+1 💯
This is also a good way to introduce more people to the wonderful InferenceData backend, which will be returned by default by pm.sample in the next version of PyMC3 👌

@twiecki
Copy link
Member Author

twiecki commented Jan 4, 2021

@CloudChaoszero This is getting a lot but I wonder if at least with the changing of the examples in #16 we should at least have them use InferenceData (using coords would be the cherry).

@CloudChaoszero
Copy link
Contributor

@CloudChaoszero This is getting a lot but I wonder if at least with the changing of the examples in #16 we should at least have them use InferenceData (using coords would be the cherry).

Sounds great @twiecki! I saw this earlier, but was unsure what coords was (i skimmed Alex's link lol).

Is that another term (or method of said class) InferenceData?

@twiecki
Copy link
Member Author

twiecki commented Jan 4, 2021

@CloudChaoszero No, it's a way to specify dimensions in a model. For example, for Radon, instead of just specifying shape=len(counties), you could specify dims='counties' and then also have the names of the counties show up in the InferenceData object, rather than just the numbers, which also makes plotting easier.

I do have to warn you though, this would require you to level up on xarray and arviz and would be quite an important but also time-intensive refactor. You can just take it one NB at a time and stop when you're bored, every little bit helps, but just wanted to mention that. Alternatively, just changing the plotting and stats functions to arviz alone would already be a great contribution, and probably more important in terms of time-line.

@AlexAndorra
Copy link
Collaborator

Together with dims, this is what allows more intuitive multi-dimensional data indexing with InferenceData (instead of plain numpy-array indexing, which is painful for humans for multi-dim data, as I'm sure you already noticed 😅 ).

You can see that in the link I posted above: the posterior data has 3 dims, chain, draw and school, which respectively contain 4, 500 and 8 coords. For chain, those coordinates are 0, 1, 2 and 3 (because no need to name the chains), but for school they have informative labels: 'Choate', 'Deerfield', 'Phillips Andover', 'Phillips Exeter', ...
This allows you to refer to them by name now, instead of only shape numbers, which is very helpful when you have a lot of dimensions!

There are lots of examples in this NB for instance. Hope this helps 🖖

@CloudChaoszero
Copy link
Contributor

Ah, sounds great @twiecki & @AlexAndorra! The combination of learning arviz & xarray is aligned to what I am trying to learn within the PyMC3 space. ( thanks y'all for the resources).

I can have this refactor be implemented after the plots & diagnostics PR.
Looking forward to work on this, and any additional comments or questions

@CloudChaoszero
Copy link
Contributor

Small update:

  • I will eventually get to this either this or next weekend haha

(Recently completed a couple of PRs from repos pymc3 & pymc-example)

@CloudChaoszero
Copy link
Contributor

I'm closing in on completing the PRs regarding pm to arviz plot replacements. 🙌

I'll start working on this issue for the next few weeks.

@OriolAbril
Copy link
Member

This is great @CloudChaoszero!

I have a favour to ask about the approach to submit PRs, I hope you don't mind. We are offering an outreachy internship project about updating all the notebooks, not only to replace outdated and deprecated behaviour, use ArviZ+xarray, make sure they all show best practices but to move them to pymc3 v4 which we are planning to release soon. The internship (if this is eventually the chosen project) will take place during summer, but the application process will start in 1-2 weeks. We therefore expect applicants to start contributing and getting familiar with the repo as part of the application process, it's hard to try to guess how many people will get involved, but we should be prepared to to accomodate several people working and submitting PRs at the same time. Thus, I would suggest working on few notebooks and make many changes to them instead of applying a small subset of changes to all the notebooks. jupyter notebooks don't really play well with git history and I think doing this will minimize merge conflicts due to multiple people working on the same notebook.

@CloudChaoszero
Copy link
Contributor

This is great @CloudChaoszero!

I have a favour to ask about the approach to submit PRs, I hope you don't mind. We are offering an outreachy internship project about updating all the notebooks, not only to replace outdated and deprecated behaviour, use ArviZ+xarray, make sure they all show best practices but to move them to pymc3 v4 which we are planning to release soon. The internship (if this is eventually the chosen project) will take place during summer, but the application process will start in 1-2 weeks. We therefore expect applicants to start contributing and getting familiar with the repo as part of the application process, it's hard to try to guess how many people will get involved, but we should be prepared to to accomodate several people working and submitting PRs at the same time. Thus, I would suggest working on few notebooks and make many changes to them instead of applying a small subset of changes to all the notebooks. jupyter notebooks don't really play well with git history and I think doing this will minimize merge conflicts due to multiple people working on the same notebook.

@OriolAbril Oh no worries! Sounds good, and glad to see Pymc is providing internships to individuals from diverse backgrounds.

@OriolAbril
Copy link
Member

Thanks! 😄 and thanks for all the updates so far, they will really help in focusing the internship on pymc3 v3->v4 updates and showcase all the improvements that will come with this next version

@CloudChaoszero
Copy link
Contributor

Small update: I should come back to (incrementally) making the changes in the notebooks @OriolAbril. I had some bandwidth constraints recently haha

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants