[GOAL] Support viewing of medium, multi-channel timeseries data #6058
Comments
After talking with @philippjfr, the updated task is to first try to utilize tsdownsample directly (if it is available). Then we'll have LTTB and MinMaxLTTB available to us, and we can check whether this is sufficiently performant for our use cases. If not, we can then explore other options with Datashader.
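For concreteness, here is a minimal sketch of calling tsdownsample directly, assuming it is installed; the synthetic trace and the n_out value are purely illustrative:

```python
import numpy as np
from tsdownsample import MinMaxLTTBDownsampler  # LTTBDownsampler is also available

# Synthetic trace: 10 million samples at an (assumed) fixed sampling rate
y = np.random.randn(10_000_000).cumsum()
x = np.arange(len(y))

# Downsample to ~1000 points; the downsampler returns the selected indices
idx = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)
x_ds, y_ds = x[idx], y[idx]
```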
Note that the precise downsampling implementation doesn't seem to matter much at all because most of the time is dominated by the slicing step, i.e. selecting the data within the viewport.
Having played with it some more, I think the only way to support this workflow better is to add an optimization for wide DataFrames. Specifically, if you create an NdOverlay of Curve elements from a DataFrame with columns A, B, C, we need to make sure that all three Curve elements share the same underlying DataFrame, and the downsample operation should detect that, slice the DataFrame based on the current viewport, and then apply the downsampling to that pre-sliced data. This will massively speed up downsampling for large numbers of traces.
This is probably a prerequisite to get the above-mentioned workflows working well: #6061
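As a rough sketch of the wide-DataFrame NdOverlay workflow described above (the data here is synthetic, and applying downsample1d directly to an NdOverlay assumes a recent HoloViews version):

```python
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews.operation.downsample import downsample1d

hv.extension('bokeh')

# Synthetic wide DataFrame: one x column plus traces A, B, C
n = 1_000_000
df = pd.DataFrame(
    np.random.randn(n, 3).cumsum(axis=0), columns=['A', 'B', 'C']
).assign(time=np.arange(n))

# One Curve per column; ideally all three share the same underlying DataFrame
overlay = hv.NdOverlay(
    {col: hv.Curve(df, 'time', col) for col in ['A', 'B', 'C']},
    kdims='channel',
)

# The proposed optimization would slice the shared DataFrame once per viewport
# instead of once per Curve before downsampling each trace
downsample1d(overlay)
```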
Okay, just to capture what I think needs to happen to support this workflow well. Currently the cost of the operation can be broken down into three parts:

1. Shared Data Slicing

This optimization itself is relatively easy to achieve and I'll add it to the existing PR adding optional tsdownsample support:

```python
import pandas as pd
import hvplot.pandas

df = pd._testing.makeDataFrame().reset_index(drop=True)
df.hvplot.line(downsample=True)
```

The problem here is that internally hvPlot is generating copies of the DataFrame, where it renames each column in turn from its original name (here that's A, B, C, and D), which defeats the shared-data optimization described above.

2. Pandas Index Slicing

Slicing on a Pandas index is (significantly) faster than slicing on a column, therefore we should allow HoloViews to operate directly on a DataFrame with an index (instead of dropping the index as we do now). This work was started in #6061. This is likely the highest effort but also has the largest benefits beyond this particular workflow.

3. Optimizing the downsampling

Once we have done 1 (and 2), the cost of the operation will be dominated by the downsampling step itself.
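To illustrate point 2 above, a small sketch contrasting column-based and index-based viewport slicing on synthetic data (the variable names and sizes are only illustrative):

```python
import numpy as np
import pandas as pd

n = 10_000_000
time = np.linspace(0, 10_000, n)          # monotonically increasing x values
values = np.random.randn(n).cumsum()

df_col = pd.DataFrame({'time': time, 'value': values})  # x stored as a regular column
df_idx = df_col.set_index('time')                        # x stored as the index

lo, hi = 2_000.0, 3_000.0  # current viewport range

# Column-based slicing: builds a full boolean mask over all n rows
subset_col = df_col[df_col['time'].between(lo, hi)]

# Index-based slicing: a sorted index can be sliced via binary search
subset_idx = df_idx.loc[lo:hi]
```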
The remaining task has to do with hvPlot, so I'll close this, as the HoloViews aspects are largely completed.
Is your feature request related to a problem? Please describe.
Target use cases of stacked timeseries commonly involve a significant number of lines and samples, which requires aggregation or downsampling before the data can be sent to the browser. Currently, due to performance limitations, the standard HoloViews+Bokeh approach to this visualization with subcoordinate_y is only usable with a small part of the typical data size range.

Describe the solution you'd like
Let's shoot to make it not only possible, but smooth/performant to visualize and interact with:
Medium Size Dataset: too big for the browser but fits in memory. For instance, a data size target of 100 stacked traces, each with 1,000 (16-bit) samples per second for 10,000 seconds. That's one billion samples and about 2 GB.
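For reference, a quick back-of-the-envelope check of that target (this just restates the numbers above):

```python
n_traces = 100
sample_rate = 1_000          # samples per second per trace
duration = 10_000            # seconds
bytes_per_sample = 2         # 16-bit samples

n_samples = n_traces * sample_rate * duration      # 1_000_000_000 samples
total_gb = n_samples * bytes_per_sample / 1e9       # ~2.0 GB
print(n_samples, total_gb)
```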
Task List (Updated):
Medium Size
Notes for Medium Size:
"LTTB doesn't scale super-well when moving to really large datasets, so when dealing with more than 1 million samples, you might consider using MinMaxLTTB", or we could try to get Datashader to play nicely with subcoordinate_y while retaining all the niceties of standard Bokeh interactivity. If we go with Datashader, this would likely entail passing the scale and offset for each trace into the pre-Datashader rendering pipeline step.