Add user guide for working with large time-series datasets #1302

Merged
merged 11 commits into from
Apr 12, 2024

droumis
Member

@droumis droumis commented Apr 8, 2024

supersedes #1205

This adds a notebook that explains the different ways of working with large time-series datasets in HoloViz.

@droumis droumis mentioned this pull request Apr 8, 2024
@droumis droumis requested a review from maximlt April 9, 2024 16:40
@droumis
Member Author

droumis commented Apr 9, 2024

Not sure why this page isn't showing up in the toc on the dev site

image

Member

@maximlt maximlt left a comment

I've pushed a few changes, please let me know if you disagree with some of them, they can always be reverted.

My main comment, which applies to pretty much every user guide that attempts to demonstrate Datashader, is that the file size ends up being quite large, close to 30MB in this case. If you don't have a super fast connection, you can see the page being loaded slowly (this one takes so long for me https://holoviews.org/user_guide/Large_Data.html!), and there's also some time spent rendering the page and its plots that have many data points. At the same time, most of these plots would deserve to be inspected with a live Python kernel to see the full benefit of the applied approach (LTTB, datashader).
Instead of displaying the real Bokeh plots, wouldn't it be better if we displayed pretty images or GIFs? (I don't want to block this PR if we think that's what we should do, that could be done in a 2nd iteration).

.gitignore
@@ -5,6 +5,8 @@
"id": "artificial-english",
"metadata": {},
"source": [
"# Large Timeseries Data\n",
Member

With the new setup, the notebooks need to have a title.

doc/user_guide/Large_Timeseries.ipynb
@@ -166,22 +173,6 @@
"This makes LTTB an ideal default method for exploring timeseries datasets, particularly when the dataset size is unknown or too large for standard WebGL rendering."
]
},
{
Member

I'm not really in favor of documenting unreleased features. But something we could do would be to document this as if HoloViews 1.19.0 was already released. In that case, we should also update the code so it's already able to accept and pass the new values. What do you think?

Member Author

Ok, I included a note that these options are available starting in HoloViews 1.19.0. @hoxbro please be sure to include the tsdownsample PR in the next minor release. I don't think anything further needs to be updated in hvPlot code.

### Enhanced Downsampling Options

Starting in HoloViews version 1.19.0, integration with the [tsdownsample](https://github.com/predict-idlab/tsdownsample) library introduces enhanced downsampling functionality with the following methods, which will be accepted as inputs to `downsample` in hvPlot:

- **lttb**: Implements the Largest Triangle Three Buckets ([LTTB](https://github.com/predict-idlab/tsdownsample?tab=readme-ov-file#:~:text=performs%20the-,Largest%20Triangle%20Three%20Buckets,-algorithm)) algorithm, optimizing the selection of points to retain the visual shape of the data.
- **minmax**: For each segment of the data, this method retains the minimum and maximum values, ensuring that peaks and troughs are preserved.
- **minmax-lttb**: A hybrid approach that first preselects candidate points with the minmax strategy and then applies LTTB to that reduced set.
- **m4**: A [multi-step process](https://www.vldb.org/pvldb/vol7/p797-jugel.pdf) that leverages the min, max, first, and last values for each time segment.
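
To make the bucket-based methods above more concrete, here is a minimal pure-Python sketch of the point selection behind **minmax** and **m4**. It is an illustration only, not the tsdownsample or HoloViews implementation: the helper names (`_bucket_bounds`, `minmax_indices`, `m4_indices`) are hypothetical, and it assumes equal-count buckets over the sample index rather than buckets derived from the time axis or the plot width.

```python
def _bucket_bounds(n_points, n_buckets):
    """Yield (start, stop) index pairs splitting range(n_points) into roughly equal buckets.

    Assumption: buckets are equal-count over the sample index, not time-based.
    """
    for b in range(n_buckets):
        start = b * n_points // n_buckets
        stop = (b + 1) * n_points // n_buckets
        if stop > start:  # skip empty buckets when n_buckets > n_points
            yield start, stop


def minmax_indices(y, n_buckets):
    """Return sorted indices of the minimum and maximum value in each bucket."""
    keep = set()
    for start, stop in _bucket_bounds(len(y), n_buckets):
        bucket = range(start, stop)
        keep.add(min(bucket, key=y.__getitem__))
        keep.add(max(bucket, key=y.__getitem__))
    return sorted(keep)


def m4_indices(y, n_buckets):
    """Return sorted indices of the first, last, min, and max value in each bucket."""
    keep = set()
    for start, stop in _bucket_bounds(len(y), n_buckets):
        bucket = range(start, stop)
        keep.update((
            start,                              # first sample in the bucket
            stop - 1,                           # last sample in the bucket
            min(bucket, key=y.__getitem__),     # lowest value
            max(bucket, key=y.__getitem__),     # highest value
        ))
    return sorted(keep)


# Toy series: 12 samples split into 3 buckets of 4 samples each.
y = [0, 5, 1, 2, 9, 3, 4, -7, 6, 8, 2, 1]
minmax_idx = minmax_indices(y, n_buckets=3)
m4_idx = m4_indices(y, n_buckets=3)
print("minmax keeps:", [(i, y[i]) for i in minmax_idx])
print("m4 keeps:    ", [(i, y[i]) for i in m4_idx])
```

Both functions return indices rather than values so the matching timestamps can be looked up from the same positions. In this sketch m4 keeps up to four points per bucket and minmax up to two, which is why m4 retains more of the series for the same bucket count.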

Member

> I don't think anything further needs to be updated in hvPlot code.

Not quite right, I think: `downsample` can only be a boolean that enables `downsample1d`, and I don't think there is a way to customize how it's called from hvPlot at the moment (or if there is, it's not obvious and needs to be documented). I'll open an issue, but since multiple algorithms are documented here, it should be implemented before the release.

doc/user_guide/Large_Timeseries.ipynb (outdated)
doc/user_guide/Large_Timeseries.ipynb
@maximlt
Member

maximlt commented Apr 10, 2024

> Not sure why this page isn't showing up in the toc on the dev site

It's now showing up.

@maximlt maximlt added this to the 0.9.3 milestone Apr 12, 2024
@droumis
Member Author

droumis commented Apr 12, 2024

> My main comment, which applies to pretty much every user guide that attempts to demonstrate Datashader, is that the file size ends up being quite large, close to 30MB in this case. If you don't have a super fast connection, you can see the page being loaded slowly (this one takes so long for me https://holoviews.org/user_guide/Large_Data.html!), and there's also some time spent rendering the page and its plots that have many data points. At the same time, most of these plots would deserve to be inspected with a live Python kernel to see the full benefit of the applied approach (LTTB, datashader).
> Instead of displaying the real Bokeh plots, wouldn't it be better if we displayed pretty images or GIFs? (I don't want to block this PR if we think that's what we should do, that could be done in a 2nd iteration).

We decided in a meeting not to block this PR. It would be nice in the future to have some mechanism to optionally use images in the docs rather than big plots.

@droumis
Member Author

droumis commented Apr 12, 2024

ready for merge?

@maximlt
Member

maximlt commented Apr 12, 2024

> We decided in a meeting not to block this PR. It would be nice in the future to have some mechanism to optionally use images in the docs rather than big plots.

Yep, this sort of thing falls into our ever-growing bucket of nice-to-have-but-will-likely-never-happen! By the way, the Large Data guide in HoloViews does have a couple of GIFs, so adding GIFs to show what Datashader is capable of isn't unprecedented. I still think that would be the best thing to do to both make the page faster to load and better demonstrate hvPlot with Datashader, but as I don't really want to do it now, I'll just put that in the bucket too!

> ready for merge?

Yes!

@maximlt maximlt merged commit d8c9500 into main Apr 12, 2024
10 of 11 checks passed
@maximlt maximlt deleted the large-ts branch April 12, 2024 20:30