timeseries / regression plot #313

ahartikainen · 2018-10-01T22:16:27Z

I think we need a timeseries / regression plot.

Should it go under ppc plot?

We accept x and y
x:

ndarray (1D) = user provided numpy array
str = user provided parameter (data, posterior or posterior predictive)

y:

ndarray (1D --> (chain, draw, N) 3D) = user provided numpy array
str = user provided parameter (data, posterior or posterior predictive)
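One possible reading of the shape rule for y is that lower-dimensional arrays get promoted to a (chain, draw, N) layout. The sketch below assumes that a 1D array is (N,) and a 2D array is (draw, N) from a single chain; neither convention is stated in this thread, and the helper name `as_chain_draw_n` is hypothetical, not an ArviZ function.

```python
import numpy as np


def as_chain_draw_n(y):
    """Promote a 1D, 2D or 3D array to shape (chain, draw, N).

    Assumed conventions (not taken from ArviZ):
    1D is (N,), 2D is (draw, N), 3D is already (chain, draw, N).
    """
    y = np.asarray(y)
    if y.ndim == 1:
        return y[np.newaxis, np.newaxis, :]
    if y.ndim == 2:
        return y[np.newaxis, :, :]
    if y.ndim == 3:
        return y
    raise ValueError(f"expected 1 to 3 dimensions, got {y.ndim}")


print(as_chain_draw_n(np.zeros(10)).shape)            # (1, 1, 10)
print(as_chain_draw_n(np.zeros((500, 10))).shape)     # (1, 500, 10)
print(as_chain_draw_n(np.zeros((4, 500, 10))).shape)  # (4, 500, 10)
```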

There are multiple ways to visualize uncertainty:

scatter (error-bars, violin-points)
line (area fill, percentile lines)
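The quantities behind these options can all be derived from the same array of samples. The sketch below uses made-up data, and the 5/50/95 percentiles are an arbitrary choice for illustration; it computes percentile lines, the edges of an area fill, and error-bar lengths for a scatter plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake posterior predictive samples with shape (chain, draw, N).
x = np.linspace(0, 10, 50)
y_samples = 1.0 + 0.5 * x + rng.normal(scale=0.8, size=(4, 250, x.size))

# Pool chains and draws so every row is one sampled curve.
pooled = y_samples.reshape(-1, x.size)

# Percentile lines: the median plus the edges of a central 90% band.
lower, median, upper = np.percentile(pooled, [5, 50, 95], axis=0)

# Scatter with error bars: asymmetric distances from the median.
yerr = np.vstack([median - lower, upper - median])

print(lower.shape, median.shape, upper.shape, yerr.shape)
```

With matplotlib, `ax.fill_between(x, lower, upper)` would draw the area fill and `ax.errorbar(x, median, yerr=yerr)` the error bars.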

ahartikainen · 2018-10-05T04:43:55Z

Also, random draws from the posterior are a good way to visualize the uncertainty, at least for static images.

There was some discussion about quantiles in #2.

canyon289 · 2018-10-05T05:29:24Z

This visualization from Stitch Fix is a nice example in my opinion

https://multithreaded.stitchfix.com/blog/2016/04/21/forget-arima/

utkarsh-maheshwari · 2021-03-14T15:27:03Z

I believe line plots are a good representation, but the best representation obviously depends on the type of data. I propose that in the short term we focus on integrating line plots for time series analysis, and add more plots to the library later. I would love to work on this feature.

OriolAbril · 2021-03-14T16:33:07Z

I think the best way to begin is probably to create a small database of regression and timeseries models, maybe taking examples from ROS https://avehtari.github.io/ROS-Examples/examples.html (it would not be too much work to port them to cmdstanpy or pystan) or using https://github.com/bambinos/Bambi_resources/tree/master/ROS, and then see how they could be reproduced from ArviZ and InferenceData objects. There are many things to take into account for the plots, and I think it will be useful to get a better picture of what could be supported in order to decide what will be supported and how.

utkarsh-maheshwari · 2021-03-15T17:33:31Z

Sure.
I am not very familiar with R yet, so I'll try to re-implement some of the bambi examples.

OriolAbril · 2021-03-15T18:12:01Z

You probably won't need to reimplement them, since bambi already uses ArviZ. It is more than anything to get an idea of the different possibilities regarding regression and timeseries plots and to get familiar with ArviZ+xarray usage, which can be quite different from ArviZ development, where xarray does not play such an important role.

utkarsh-maheshwari · 2021-03-16T15:00:06Z

Okay. I looked at some of the ROS examples too, and I think they are not that hard to understand. I am going through the examples, trying to get familiar with the plots and with ArviZ+xarray usage. I will keep in mind that we need to create a small database to get started.

utkarsh-maheshwari · 2021-03-16T17:53:29Z

I have gone through the examples in https://github.com/bambinos/Bambi_resources/tree/master/ROS. I can see that many examples generate fake data. I think the databases generated/used in these 2 examples are good for time series/regression analysis:
https://github.com/bambinos/Bambi_resources/blob/master/ROS/Unemployment/unemployment.ipynb
https://github.com/bambinos/Bambi_resources/tree/master/ROS/ElectionsEconomy
We can get an idea from these databases.

utkarsh-maheshwari · 2021-03-17T15:52:14Z

What do we need to keep in mind while creating the database?
For univariate linear regression, 2 fields (for example, date/year and unemployment) are enough to demonstrate the example. But for multivariate regression, we need more fields. Do we need to consider it? Are there other such points to be considered?

OriolAbril · 2021-03-17T17:56:55Z

Off the top of my head (I'll try to get back here and keep adding things that may come to me later), these are some of the things to consider for the design:

Predictors/predictions: we can have one or many of both predictors and predictions, not only multiple predictors.
Hierarchies: there could be group-level predictions in addition to observation-level ones.
Interpolation/extrapolation or forecasting: this is also similar to posterior predictive checks vs. visualization of predictions.
Info to show: spaghetti plots, HDI bands, quantile bands. Related to the point above, we could generate spaghetti plots or bands from posterior predictive samples (assuming the data used for fitting is on a fine enough grid), but we may also want to generate prediction lines/curves from the posterior samples instead of using posterior predictive samples.
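The last point can be illustrated with a simple linear model. This sketch uses made-up stand-in draws for an intercept `a`, slope `b` and noise scale `sigma` (all assumed names and values, not taken from any fitted model); it builds prediction lines from posterior samples on a grid of its own choosing, then adds observation noise to get posterior-predictive-style samples. The bands are equal-tailed quantile bands, not HDIs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in posterior draws for y = a + b * x + noise (pooled over chains).
n_draws = 1000
a = rng.normal(0.2, 0.05, size=n_draws)
b = rng.normal(0.3, 0.02, size=n_draws)
sigma = np.abs(rng.normal(0.5, 0.05, size=n_draws))

# A fine grid that may extend past the observed x range (extrapolation).
x_grid = np.linspace(0, 30, 200)

# One line per posterior draw: shape (n_draws, len(x_grid)).
mu = a[:, None] + b[:, None] * x_grid

# Spaghetti plot: a small random subset of the lines.
subset = rng.choice(n_draws, size=50, replace=False)
spaghetti = mu[subset]

# Quantile band for the mean, and a wider one for new observations.
mu_band = np.quantile(mu, [0.03, 0.97], axis=0)
y_pred = mu + sigma[:, None] * rng.standard_normal(mu.shape)
pred_band = np.quantile(y_pred, [0.03, 0.97], axis=0)

print(spaghetti.shape, mu_band.shape, pred_band.shape)
```

Because the lines are computed from parameter draws, the grid does not have to match the x values used for fitting, which is what makes this route convenient for interpolation and forecasting.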

utkarsh-maheshwari · 2021-03-17T20:14:12Z

Speaking of time series analysis, one compulsory field is the date/year (let's say 100 years). We can have a single or multiple monitored variables (monitored over 100 years). These could be generated or taken from real databases. I think generating them would be a better idea, as we could then cross-validate the model. Do we need more fields?

OriolAbril · 2021-03-18T15:22:46Z

I don't think the origin of the data matters. The goal is to visualize the results of the models; we don't need to check that the model is correct, as the visualization should work either way. After all, one of its goals is to check the models and see if they are working.

What were you thinking of when you mentioned cross validation? I may be missing something. We also have another project about refitting models that would allow implementing k-fold cross-validation, approximate leave-future-out, etc., which will probably need some plots of their own, but I think this is outside the scope of the timeseries/regression plots. I am not even sure all the points above can be covered in a single project either; you may need to select a subset of cases to support.

utkarsh-maheshwari · 2021-03-18T16:42:44Z

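By cross validation I meant that, for example, we generate y like this:
import numpy as np
from scipy import stats

x = np.arange(1, 21)
n = x.size  # number of observations
a = .2  # true intercept
b = .3  # true slope
sigma = .5  # noise standard deviation
y = a + b*x + sigma*stats.norm().rvs(n)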

Then, in the example, we'll probably find the distributions of a_hat and b_hat. We can then cross-validate against the original values (.2 and .3 here).
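A minimal sketch of that comparison, sometimes described as a parameter recovery check. It assumes ordinary least squares via `np.polyfit` as a stand-in for the Bayesian fit, so it yields point estimates rather than distributions of a_hat and b_hat; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate data from known parameters, as in the snippet above.
a_true, b_true, sigma = 0.2, 0.3, 0.5
x = np.arange(1, 21)
y = a_true + b_true * x + sigma * rng.standard_normal(x.size)

# Least squares stands in for the Bayesian fit here; polyfit returns
# coefficients from the highest degree down, so the slope comes first.
b_hat, a_hat = np.polyfit(x, y, deg=1)

print(f"a_hat={a_hat:.3f} (true {a_true}), b_hat={b_hat:.3f} (true {b_true})")
```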
Never mind, I realize I went off track. Sorry for that. That cross-validation probably doesn't matter.

I think a better way is to just start by creating a database with 2 fields and then add fields to it when required.

OriolAbril · 2021-03-18T16:53:00Z

Don't worry about going off track. I am just trying to keep my eye on the prize: especially this year, with the reduced coding period, it is crucial to define what is part of the project and what is not (even when useful and interesting too).

I am not sure we have the same idea in mind when thinking about a database. My proposal was to have a "database" of inferencedata objects (local files are fine, or on figshare if we want it to be public) so that when you are implementing plot_regression (or plot_timeseries, or whatever name is chosen eventually) you can easily run plot_regression(idata1...), plot_regression(idata2...)... and ensure that the API can generate all the different plots we are interested in. I also thought that gathering these idata objects would be a good way to get familiar with the different possible visualizations involved in the project and thus help with the proposal and design phase.
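The workflow described here amounts to a smoke test over a collection of fitted models. In the sketch below, plain dicts of NumPy arrays stand in for the inferencedata objects, and both `make_case` and the body of `plot_regression` are hypothetical placeholders rather than ArviZ code.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_case(n_obs, n_draws=100):
    """Build a stand-in for one fitted model (a plain dict, not InferenceData)."""
    x = np.linspace(0, 1, n_obs)
    return {
        "x": x,
        "y_obs": 2 * x + rng.normal(scale=0.1, size=n_obs),
        "y_pred": 2 * x + rng.normal(scale=0.1, size=(n_draws, n_obs)),
    }


def plot_regression(case):
    """Hypothetical placeholder: returns what a real plot would draw."""
    lower, upper = np.quantile(case["y_pred"], [0.05, 0.95], axis=0)
    return {"x": case["x"], "lower": lower, "upper": upper}


# The "database": named cases that the API should be able to handle.
database = {"small": make_case(10), "medium": make_case(50), "large": make_case(200)}

for name, case in database.items():
    result = plot_regression(case)
    assert result["lower"].shape == case["x"].shape, name
    print(f"{name}: ok")
```

In practice each entry would be loaded from a local file, and the cases would differ in model structure (multiple predictors, hierarchies, forecasts) rather than only in size.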

I proposed looking into ROS because it has many examples covering a wide range of cases and already provides an implementation for all of the examples, so getting from there to inferencedata objects should be less work than trying to come up with the models/data from scratch. The bambi port is still a work in progress, so I don't know how many can be taken as inferencedata "for free" from there; maybe @canyon289 can help with that. But looking at other examples/books/packages is also perfectly fine.

utkarsh-maheshwari · 2021-03-18T19:00:55Z

> My proposal was to have a "database" of inferencedata objects

Can we take some dicts/dataframes defined in ROS examples, convert them to inferencedata using az.convert_to_inference_data?

OriolAbril · 2021-03-18T21:27:45Z

> Can we take some dicts/dataframes defined in ROS examples, convert them to inferencedata using az.convert_to_inference_data?

I guess so. It depends on what data is inside the dicts: is the whole posterior stored as a dict? Posterior plus observations?

ahartikainen · 2021-03-18T23:15:34Z

Maybe we could take data from posteriordb?

utkarsh-maheshwari · 2021-03-25T20:24:11Z

I looked at posteriordb. There are lots of models. I filtered for some that have "time series" in their keywords, for example https://github.com/stan-dev/posteriordb/blob/master/posterior_database/posteriors/rstan_downloads-prophet.json

I also took a quick look at the Prophet library developed by Facebook. I think we can take ideas for time series plots from there too. Can we?

utkarsh-maheshwari · 2021-04-01T15:15:06Z

Do we need a separate function like plot_lm in #512 to tackle regression that does not involve time series analysis?

OriolAbril · 2021-11-09T00:12:27Z

I think we can close this now with plot_lm and plot_ts? @ahartikainen @canyon289

GWeindel mentioned this issue Jan 8, 2019

Adding simple, multiple and hierarchical regression plots #512

Open

ahartikainen mentioned this issue Mar 16, 2019

Working with ArviZ bambinos/bambi#132

Closed

rpgoldman mentioned this issue Sep 18, 2019

Add tests for adding posterior_predictive without trace #823

Merged

OriolAbril mentioned this issue Jan 9, 2021

Remove Plots & Diagnostics from PyMC Code Base pymc-devs/pymc#4371

Closed

canyon289 closed this as completed Nov 9, 2021
