Change examples to use coords where appropriate #14

twiecki · 2020-12-27T10:45:03Z

coords is a very useful feature that not enough people know about. Our docs should establish best practices so using it in our NBs is an important step.

The text was updated successfully, but these errors were encountered:

AlexAndorra · 2020-12-27T12:10:06Z

+1 💯
This is also a good way to introduce more people to the wonderful InferenceData backend, which will be returned by default by pm.sample in the next version of PyMC3 👌

twiecki · 2021-01-04T09:24:14Z

@CloudChaoszero This is getting a lot but I wonder if at least with the changing of the examples in #16 we should at least have them use InferenceData (using coords would be the cherry).

CloudChaoszero · 2021-01-04T09:57:26Z

@CloudChaoszero This is getting a lot but I wonder if at least with the changing of the examples in #16 we should at least have them use InferenceData (using coords would be the cherry).

Sounds great @twiecki! I saw this earlier, but was unsure what coords was (i skimmed Alex's link lol).

Is that another term (or method of said class) InferenceData?

twiecki · 2021-01-04T10:25:14Z

@CloudChaoszero No, it's a way to specify dimensions in a model. For example, for Radon, instead of just specifying shape=len(counties), you could specify dims='counties' and then also have the names of the counties show up in the InferenceData object, rather than just the numbers, which also makes plotting easier.

I do have to warn you though, this would require you to level up on xarray and arviz and would be quite an important but also time-intensive refactor. You can just take it one NB at a time and stop when you're bored, every little bit helps, but just wanted to mention that. Alternatively, just changing the plotting and stats functions to arviz alone would already be a great contribution, and probably more important in terms of time-line.

AlexAndorra · 2021-01-04T10:26:50Z

Together with dims, this is what allows more intuitive multi-dimensional data indexing with InferenceData (instead of plain numpy-array indexing, which is painful for humans for multi-dim data, as I'm sure you already noticed 😅 ).

You can see that in the link I posted above: the posterior data has 3 dims, chain, draw and school, which respectively contain 4, 500 and 8 coords. For chain, those coordinates are 0, 1, 2 and 3 (because no need to name the chains), but for school they have informative labels: 'Choate', 'Deerfield', 'Phillips Andover', 'Phillips Exeter', ...
This allows you to refer to them by name now, instead of only shape numbers, which is very helpful when you have a lot of dimensions!

There are lots of examples in this NB for instance. Hope this helps 🖖

CloudChaoszero · 2021-01-04T10:44:12Z

Ah, sounds great @twiecki & @AlexAndorra! The combination of learning arviz & xarray is aligned to what I am trying to learn within the PyMC3 space. ( thanks y'all for the resources).

I can have this refactor be implemented after the plots & diagnostics PR.
Looking forward to work on this, and any additional comments or questions

CloudChaoszero · 2021-01-21T09:53:44Z

Small update:

I will eventually get to this either this or next weekend haha

(Recently completed a couple of PRs from repos pymc3 & pymc-example)

CloudChaoszero · 2021-03-21T05:42:59Z

I'm closing in on completing the PRs regarding pm to arviz plot replacements. 🙌

I'll start working on this issue for the next few weeks.

OriolAbril · 2021-03-21T06:21:58Z

This is great @CloudChaoszero!

I have a favour to ask about the approach to submit PRs, I hope you don't mind. We are offering an outreachy internship project about updating all the notebooks, not only to replace outdated and deprecated behaviour, use ArviZ+xarray, make sure they all show best practices but to move them to pymc3 v4 which we are planning to release soon. The internship (if this is eventually the chosen project) will take place during summer, but the application process will start in 1-2 weeks. We therefore expect applicants to start contributing and getting familiar with the repo as part of the application process, it's hard to try to guess how many people will get involved, but we should be prepared to to accomodate several people working and submitting PRs at the same time. Thus, I would suggest working on few notebooks and make many changes to them instead of applying a small subset of changes to all the notebooks. jupyter notebooks don't really play well with git history and I think doing this will minimize merge conflicts due to multiple people working on the same notebook.

CloudChaoszero · 2021-03-21T06:30:08Z

This is great @CloudChaoszero!

I have a favour to ask about the approach to submit PRs, I hope you don't mind. We are offering an outreachy internship project about updating all the notebooks, not only to replace outdated and deprecated behaviour, use ArviZ+xarray, make sure they all show best practices but to move them to pymc3 v4 which we are planning to release soon. The internship (if this is eventually the chosen project) will take place during summer, but the application process will start in 1-2 weeks. We therefore expect applicants to start contributing and getting familiar with the repo as part of the application process, it's hard to try to guess how many people will get involved, but we should be prepared to to accomodate several people working and submitting PRs at the same time. Thus, I would suggest working on few notebooks and make many changes to them instead of applying a small subset of changes to all the notebooks. jupyter notebooks don't really play well with git history and I think doing this will minimize merge conflicts due to multiple people working on the same notebook.

@OriolAbril Oh no worries! Sounds good, and glad to see Pymc is providing internships to individuals from diverse backgrounds.

OriolAbril · 2021-03-21T06:37:52Z

Thanks! 😄 and thanks for all the updates so far, they will really help in focusing the internship on pymc3 v3->v4 updates and showcase all the improvements that will come with this next version

CloudChaoszero · 2021-04-22T08:10:08Z

Small update: I should come back to (incrementally) making the changes in the notebooks @OriolAbril. I had some bandwidth constraints recently haha

Move bart from PyMC

twiecki added good first issue Good for newcomers help wanted Extra attention is needed labels Dec 27, 2020

CloudChaoszero mentioned this issue Mar 21, 2021

Update GLM hierarchical binomial model to best practices #45

Merged

MarcoGorelli mentioned this issue Apr 24, 2021

Use coords / xarray magic in dependent density regression #146

Merged

CloudChaoszero mentioned this issue Apr 25, 2021

Update GLM hierarchical & Robust Outlier Detection to best practices #147

Merged

OriolAbril closed this as completed Aug 30, 2021

twiecki pushed a commit that referenced this issue Jan 17, 2023

Merge pull request #14 from pymc-devs/bart

697f8dc

Move bart from PyMC

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change examples to use coords where appropriate #14

Change examples to use coords where appropriate #14

twiecki commented Dec 27, 2020

AlexAndorra commented Dec 27, 2020

twiecki commented Jan 4, 2021

CloudChaoszero commented Jan 4, 2021

twiecki commented Jan 4, 2021

AlexAndorra commented Jan 4, 2021

CloudChaoszero commented Jan 4, 2021

CloudChaoszero commented Jan 21, 2021

CloudChaoszero commented Mar 21, 2021

OriolAbril commented Mar 21, 2021

CloudChaoszero commented Mar 21, 2021

OriolAbril commented Mar 21, 2021

CloudChaoszero commented Apr 22, 2021

Change examples to use coords where appropriate #14

Change examples to use coords where appropriate #14

Comments

twiecki commented Dec 27, 2020

AlexAndorra commented Dec 27, 2020

twiecki commented Jan 4, 2021

CloudChaoszero commented Jan 4, 2021

twiecki commented Jan 4, 2021

AlexAndorra commented Jan 4, 2021

CloudChaoszero commented Jan 4, 2021

CloudChaoszero commented Jan 21, 2021

CloudChaoszero commented Mar 21, 2021

OriolAbril commented Mar 21, 2021

CloudChaoszero commented Mar 21, 2021

OriolAbril commented Mar 21, 2021

CloudChaoszero commented Apr 22, 2021