Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: MVP plotly-express docs #554

Merged
merged 39 commits into from
Jul 29, 2024
Merged

docs: MVP plotly-express docs #554

merged 39 commits into from
Jul 29, 2024

Conversation

alexpeters1208
Copy link
Contributor

@alexpeters1208 alexpeters1208 commented Jun 14, 2024

Minimum required docs for the plotly-express plugin. Here are the outstanding items:

  1. Fill out "other".
  2. Document ecdf once it is implemented.

Directions for testing:

As of 7/17, everything needed for testing is baked into a release. Here's a simple testing environment using pip-installed DH.

# make new dir for testing
mkdir test-dx && cd test-dx

# create env for installs
python -m venv test-dx-venv
source test-dx-venv/bin/activate

# install some necessary things for the build
pip install --upgrade pip setuptools

# install the server, need 35.1 or 34.3
pip install deephaven-server==0.35.1

# install the plugin
pip install deephaven-plugin-plotly-express

# I need to do this to get `which deephaven` to give the correct venv version, you may not
deactivate
source test-dx-venv/bin/activate

# start the server
deephaven server

@alexpeters1208 alexpeters1208 self-assigned this Jun 14, 2024
@alexpeters1208 alexpeters1208 marked this pull request as draft June 14, 2024 14:52
@alexpeters1208 alexpeters1208 marked this pull request as ready for review June 21, 2024 18:15
Copy link
Contributor

@dsmmcken dsmmcken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing huge jumps out on first read. I haven't tried running every block of code. @jnumainville should look through with sharper eyes on the code blocks.

plugins/plotly-express/docs/timeline.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/timeline.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/area.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/box.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/histogram.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/line-3d.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/multiple-axes.md Show resolved Hide resolved
plugins/plotly-express/docs/strip.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/area.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/candlestick.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/candlestick.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/histogram.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/line.md Show resolved Hide resolved
plugins/plotly-express/docs/line.md Show resolved Hide resolved
plugins/plotly-express/docs/line.md Show resolved Hide resolved
plugins/plotly-express/docs/line.md Show resolved Hide resolved
plugins/plotly-express/docs/line.md Show resolved Hide resolved
plugins/plotly-express/docs/scatter.md Show resolved Hide resolved
plugins/plotly-express/docs/bar.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/bar.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/box.md Show resolved Hide resolved
plugins/plotly-express/docs/candlestick.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/sub-plots.md Show resolved Hide resolved
plugins/plotly-express/docs/sunburst.md Outdated Show resolved Hide resolved
jobs = dx.data.jobs() # import the ticking jobs dataset

# the `by` argument is used to color the bars by another categorical variable
jobs_resource_tracking = dx.timeline(jobs, x_start="StartTime", x_end="EndTime", y="Job")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is identical to the prior example and is not doing what it says it does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice this initially. Getting the example "right" gives some pretty bad results. Putting the code in but leaving this comment open.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugins/plotly-express/docs/treemap.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/line.md Show resolved Hide resolved
plugins/plotly-express/docs/scatter.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/scatter.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/scatter.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/scatter.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/scatter.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/scatter.md Show resolved Hide resolved
Copy link
Contributor

@dsmmcken dsmmcken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I hadn't mentioned this previously but one of the reasons I pushed for all our example data sets to be deterministic was for testing. This one example is not.

plugins/plotly-express/docs/density_heatmap.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/box.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/box.md Show resolved Hide resolved
plugins/plotly-express/docs/histogram.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/histogram.md Show resolved Hide resolved
plugins/plotly-express/docs/area.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/violin.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/violin.md Show resolved Hide resolved
cat_dog = stocks.where("sym in `CAT`, `DOG`")

# use `by` to specify the grouping column and order axes left to right with yaxis_sequence
line_plot_by = dx.line(cat_dog, x="timestamp", y="price", by="sym", yaxis_sequence=[1, 2])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a ticket to assess the library. It is not a problem with your example.

plugins/plotly-express/docs/scatter.md Outdated Show resolved Hide resolved

### Multiple columns

When two or more response variables appear in separate columns, passing multiple column names to `x` or `y` is the recommended way to create multiple axes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take this example:

import deephaven.plot.express as dx
gapminder = dx.data.gapminder() # import a ticking version of the Gapminder dataset

# get a specific country
brazil = gapminder.where("country == `Brazil`")

# specify multiple y-axis columns and order axes left to right with yaxis_sequence
line_plot_multi = dx.line(brazil, x="year", y=["pop", "gdpPercap"], yaxis_sequence=[1, 2])
line_plot_multi_2 = dx.line(brazil, x="year", y=["pop", "gdpPercap"])
image image

Passing multiple value to y is NOT how the multiple axes are created. That creates two lines. To create separate axes for the lines, you need to specify yaxis_sequence. This interaction is not clear in the prose or example. The example would be better with two plots like I have. Show that providing y=[a,b] gives two lines, and then adding yaxis_sequence puts them on different axes. As it is, the prose is incorrect and doesn't show them how to do these two common cases.

plugins/plotly-express/docs/plot-by.md Outdated Show resolved Hide resolved
plugins/plotly-express/docs/multiple-axes.md Outdated Show resolved Hide resolved
@@ -1,5 +1,146 @@
# Plot By

To plot multiple series from a table into a single chart, use the `by` parameter. This parameter accepts a column name or a list of column names. The chart will be partitioned by the values in the specified column(s), with one series for each unique value. Other parameters, such as `color` (for which `by` is an alias), `symbol`, `size`, `width`, and `line_dash` can also be used to partition the chart.
To plot multiple series from a table into a single chart, use the `by` parameter. This parameter accepts a column name or a list of column names denoting other variables of interest in the dataset. The chart will be partitioned by the values in the specified column(s), with one series for each unique value. Other parameters, such as `color` (for which `by` is an alias), `symbol`, `size`, `width`, and `line_dash` can also be used to partition the chart.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be clarified that by is not simply an alias for color
it can be tweaked by using by_vars and passing in these other columns such as symbol and size
it also behaves slightly differently though, take this example

import deephaven.plot.express as dx
tips = dx.data.tips() # import the example iris data set

by_list = dx.scatter(tips, x="TotalBill", y="Tip", by=["Time", "Smoker"], by_vars=["color", "symbol"])
by_prod = dx.scatter(tips, x="TotalBill", y="Tip", by="Time", symbol="Smoker")

by_list just loops through color/symbol combos (jointly)
whereas by_prod assigns colors to specific column values, so, of the four joint values, Lunch and Dinner have the same color and Yes and No have the same symbol.

The first method is more useful to emphasize differences, whereas the second is more useful to emphasize similarities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the "for which by" is an alias" part - hopefully that cleans up the confusion. As far as using by_vars, that seems like something that should belong in the expanded version of this doc once the full thing is written, and not necessarily in the introduction. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, shouldn't be in the intro, that's fine

plugins/plotly-express/docs/plot-by.md Outdated Show resolved Hide resolved
chipkent
chipkent previously approved these changes Jul 27, 2024

Under the hood, the Deephaven query engine performs a `parition_by` table operation on the given color column to create each series. This efficient implementation means that plots with multiple groups can easily scale to tens of millions or billions of rows with ease.
Under the hood, the Deephaven query engine performs a `partition_by` table operation on the given grouping column to create each series. This efficient implementation means that plots with multiple groups can easily scale to tens of millions or billions of rows with ease.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, partition_by should be linked, but I don't know that @dsmmcken is far enough down the new impl to worry about this yet.

@alexpeters1208 alexpeters1208 merged commit 4c556d3 into main Jul 29, 2024
14 checks passed
@alexpeters1208 alexpeters1208 deleted the dx-min-docs branch July 29, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants