Add user guide for working with large time-series datasets #1302

droumis · 2024-04-08T18:28:48Z

supersedes #1205

This adds a notebook that explains the different ways of working with large time-series datasets with HoloViz.

droumis · 2024-04-09T22:01:51Z

Not sure why this page isn't showing up in the toc on the dev site

maximlt

I've pushed a few changes, please let me know if you disagree with some of them, they can always be reverted.

My main comment, which applies to pretty much every user guide that attempts to demonstrate Datashader, is that the file size ends up being quite large, close to 30MB in this case. If you don't have a super fast connection, you can see the page being loaded slowly (this one takes so long for me https://holoviews.org/user_guide/Large_Data.html!), and there's also some time spent rendering the page and its plots that have many data points. At the same time, most of these plots would deserve to be inspected with a live Python kernel to see the full benefit of the applied approach (LTTB, datashader).
Instead of displaying the real Bokeh plots, wouldn't it be better if we displayed pretty images or GIFs? (I don't want to block this PR if we think that's what we should do, that could be done in a 2nd iteration).

.gitignore

maximlt · 2024-04-09T21:58:45Z

doc/user_guide/Large_Timeseries.ipynb

@@ -5,6 +5,8 @@
   "id": "artificial-english",
   "metadata": {},
   "source": [
+    "# Large Timeseries Data\n",


With the new setup, notebooks need to have a title.

doc/user_guide/Large_Timeseries.ipynb

maximlt · 2024-04-09T22:04:56Z

doc/user_guide/Large_Timeseries.ipynb

@@ -166,22 +173,6 @@
    "This makes LTTB an ideal default method for exploring timeseries datasets, particularly when the dataset size is unknown or too large for standard WebGL rendering."
   ]
  },
-  {


I'm not really in favor of documenting unreleased features. One thing we could do, though, is document this as if HoloViews 1.19.0 were already released. In that case, we should also update the code so that it can already accept and pass the new values. What do you think?

Ok, I included a note that these options are available starting in HoloViews 1.19.0. @hoxbro please be sure to include the tsdownsample PR in the next minor release. I don't think anything further needs to be updated in hvPlot code.

### Enhanced Downsampling Options

Starting in HoloViews version 1.19.0, integration with the [tsdownsample](https://github.com/predict-idlab/tsdownsample) library introduces enhanced downsampling functionality with the following methods, which will be accepted as inputs to `downsample` in hvPlot:

- **lttb**: Implements the Largest Triangle Three Buckets ([LTTB](https://github.com/predict-idlab/tsdownsample?tab=readme-ov-file#:~:text=performs%20the-,Largest%20Triangle%20Three%20Buckets,-algorithm)) algorithm, optimizing the selection of points to retain the visual shape of the data.
- **minmax**: For each segment of the data, this method retains the minimum and maximum values, ensuring that peaks and troughs are preserved.
- **minmax-lttb**: A hybrid approach that combines the minmax strategy with LTTB.
- **m4**: A [multi-step process](https://www.vldb.org/pvldb/vol7/p797-jugel.pdf) that leverages the min, max, first, and last values for each time segment.
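To make the **minmax** idea above concrete, here is a minimal pure-Python sketch. It is not the tsdownsample or HoloViews implementation; the function name `minmax_downsample`, the `n_buckets` parameter, and the equal-width bucketing are assumptions made for illustration only.

```python
def minmax_downsample(values, n_buckets):
    """Return indices of the min and max value in each equal-width bucket.

    Hypothetical helper for illustration; not part of tsdownsample,
    HoloViews, or hvPlot.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    n = len(values)
    # Nothing to gain if the series already has at most 2 points per bucket.
    if n <= 2 * n_buckets:
        return list(range(n))

    keep = []
    for b in range(n_buckets):
        # Integer arithmetic splits the index range into contiguous buckets.
        start = b * n // n_buckets
        stop = (b + 1) * n // n_buckets
        bucket = range(start, stop)
        lo = min(bucket, key=values.__getitem__)  # index of the trough
        hi = max(bucket, key=values.__getitem__)  # index of the peak
        # Keep the two indices in time order; a flat bucket yields one index.
        keep.extend(sorted({lo, hi}))
    return keep


signal = [0, 5, 1, 9, 2, -3, 4, 4, 7, 0, 8, 1]
kept = minmax_downsample(signal, n_buckets=3)
print(kept)                       # indices of retained samples
print([signal[i] for i in kept])  # the retained peaks and troughs
```

Returning indices rather than values keeps the x-coordinates of the original samples available, so the retained points can be plotted at their true timestamps.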

I don't think anything further needs to be updated in hvPlot code.

Not quite right, I think: `downsample` can only be a boolean that enables `downsample1d`, and I don't think there is currently a way to customize how it's called from hvPlot (or if there is, it's not obvious and needs to be documented). I'll open an issue, but since multiple algorithms are documented here, this should be implemented before the release.

doc/user_guide/Large_Timeseries.ipynb

maximlt · 2024-04-10T10:59:52Z

Not sure why this page isn't showing up in the toc on the dev site

It's now showing up.

droumis · 2024-04-12T16:23:03Z

My main comment, which applies to pretty much every user guide that attempts to demonstrate Datashader, is that the file size ends up being quite large, close to 30MB in this case. If you don't have a super fast connection, you can see the page being loaded slowly (this one takes so long for me https://holoviews.org/user_guide/Large_Data.html!), and there's also some time spent rendering the page and its plots that have many data points. At the same time, most of these plots would deserve to be inspected with a live Python kernel to see the full benefit of the applied approach (LTTB, datashader).
Instead of displaying the real Bokeh plots, wouldn't it be better if we displayed pretty images or GIFs? (I don't want to block this PR if we think that's what we should do, that could be done in a 2nd iteration).

We decided in a meeting not to block this PR. It would be nice in the future to have some mechanism to optionally use images in the docs rather than big plots.

droumis · 2024-04-12T17:55:26Z

ready for merge?

maximlt · 2024-04-12T20:30:38Z

We decided in a meeting not to block this PR. It would be nice in the future to have some mechanism to optionally use images in the docs rather than big plots.

Yep, this sort of thing falls into our ever-growing bucket of nice-to-have-but-will-likely-never-happen! By the way, the Large Data guide in HoloViews does have a couple of GIFs, so adding GIFs to show what Datashader is capable of isn't unprecedented. I still think that would be the best way to both make the page faster to load and better demonstrate hvPlot with Datashader, but as I don't really want to do it now, I'll just put that in the bucket too!

ready for merge?

Yes!

droumis added 4 commits April 8, 2024 10:10

allow versioning for nbs in docs since examples is depr

b78bc9f

update user guide index

fb7e9c0

update normal timeseries nb to link to large nb

e3e32ed

Large Timeseries nb with section describing tsdownsampler options

3045b17

droumis mentioned this pull request Apr 8, 2024

Large timeseries #1205

Closed

droumis added 4 commits April 8, 2024 11:32

remove extra cell

303d939

minor

7347c9a

cleanup

0b78820

fix index

fc8e60d

droumis requested a review from maximlt April 9, 2024 16:40

minor changes

0fa69e6

maximlt reviewed Apr 9, 2024

View reviewed changes

oops, re-enable notebook execution

f0951f1

maximlt added this to the 0.9.3 milestone Apr 12, 2024

Add back lttb options and other minor things

ea3cb4f

maximlt merged commit d8c9500 into main Apr 12, 2024
10 of 11 checks passed

maximlt deleted the large-ts branch April 12, 2024 20:30
