-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Bump prophet, re-enable tests, and remedy column eligibility logic #24129
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# SHA1:623feb0dd2b6bd376238ecf75069bc82136c2d70 | ||
# SHA1:78fe89f88adf34ac75513d363d7d9d0b5cc8cd1c | ||
# | ||
# This file is autogenerated by pip-compile-multi | ||
# To update, run: | ||
|
@@ -12,16 +12,26 @@ | |
# -r requirements/base.in | ||
# -r requirements/development.in | ||
# -r requirements/testing.in | ||
cmdstanpy==1.1.0 | ||
# via prophet | ||
contourpy==1.0.7 | ||
# via matplotlib | ||
coverage[toml]==7.2.5 | ||
# via pytest-cov | ||
cycler==0.11.0 | ||
# via matplotlib | ||
db-dtypes==1.1.1 | ||
# via pandas-gbq | ||
docker==6.1.1 | ||
# via -r requirements/testing.in | ||
ephem==4.1.4 | ||
# via lunarcalendar | ||
exceptiongroup==1.1.1 | ||
# via pytest | ||
flask-testing==0.8.1 | ||
# via -r requirements/testing.in | ||
fonttools==4.39.4 | ||
# via matplotlib | ||
freezegun==1.2.2 | ||
# via -r requirements/testing.in | ||
google-api-core[grpc]==2.11.0 | ||
|
@@ -73,6 +83,12 @@ iniconfig==2.0.0 | |
# via pytest | ||
jsonschema-spec==0.1.4 | ||
# via openapi-spec-validator | ||
kiwisolver==1.4.4 | ||
# via matplotlib | ||
lunarcalendar==0.0.9 | ||
# via prophet | ||
matplotlib==3.7.1 | ||
# via prophet | ||
oauthlib==3.2.2 | ||
# via requests-oauthlib | ||
openapi-schema-validator==0.4.4 | ||
|
@@ -85,6 +101,8 @@ parameterized==0.9.0 | |
# via -r requirements/testing.in | ||
pathable==0.4.3 | ||
# via jsonschema-spec | ||
prophet==1.1.3 | ||
# via apache-superset | ||
proto-plus==1.22.2 | ||
# via | ||
# google-cloud-bigquery | ||
|
@@ -107,8 +125,6 @@ pydata-google-auth==1.7.0 | |
# via pandas-gbq | ||
pyfakefs==5.2.2 | ||
# via -r requirements/testing.in | ||
pyhive[presto]==0.6.5 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is likely a legacy dependency resulting from not having run |
||
# via apache-superset | ||
pytest==7.3.1 | ||
# via | ||
# -r requirements/testing.in | ||
|
@@ -130,6 +146,10 @@ sqlalchemy-bigquery==1.6.1 | |
# via apache-superset | ||
statsd==4.0.1 | ||
# via -r requirements/testing.in | ||
tqdm==4.65.0 | ||
# via | ||
# cmdstanpy | ||
# prophet | ||
trino==0.324.0 | ||
# via apache-superset | ||
tzdata==2023.3 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -175,7 +175,7 @@ def get_git_sha() -> str: | |
"postgres": ["psycopg2-binary==2.9.6"], | ||
"presto": ["pyhive[presto]>=0.6.5"], | ||
"trino": ["trino>=0.324.0"], | ||
"prophet": ["prophet>=1.0.1, <1.1", "pystan<3.0"], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
"prophet": ["prophet>=1.1.0, <2.0.0"], | ||
"redshift": ["sqlalchemy-redshift>=0.8.1, < 0.9"], | ||
"rockset": ["rockset>=0.8.10, <0.9"], | ||
"shillelagh": [ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ | |
import logging | ||
from typing import Optional, Union | ||
|
||
import pandas as pd | ||
from flask_babel import gettext as _ | ||
from pandas import DataFrame | ||
|
||
|
@@ -134,7 +135,13 @@ def prophet( # pylint: disable=too-many-arguments | |
raise InvalidPostProcessingError(_("DataFrame include at least one series")) | ||
|
||
target_df = DataFrame() | ||
for column in [column for column in df.columns if column != index]: | ||
|
||
for column in [ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @villebro I would love your input on this as I'm somewhat perplexed as to how this was working. After reenabling the tests it seemed to be trying to fit non-numerical columns within the The original logic I had was for determining numeric columns was, df.select_dtypes(include=np.number).columns however the MySQL tests were failing because it seems like some of the numeric columns where of type |
||
column | ||
for column in df.columns | ||
if column != index | ||
and pd.to_numeric(df[column], errors="coerce").notnull().all() | ||
]: | ||
fit_df = _prophet_fit_and_predict( | ||
df=df[[index, column]].rename(columns={index: "ds", column: "y"}), | ||
confidence_interval=confidence_interval, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -444,11 +444,11 @@ def test_chart_data_dttm_filter(self): | |
else: | ||
raise Exception("ds column not found") | ||
|
||
@pytest.mark.usefixtures("load_birth_names_dashboard_with_slices") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm really not sure how this test worked before it was skipped without the inclusion of the fixture which ensures (hopefully) that the tests are idempotent. |
||
def test_chart_data_prophet(self): | ||
""" | ||
Chart data API: Ensure prophet post transformation works | ||
""" | ||
pytest.importorskip("prophet") | ||
time_grain = "P1Y" | ||
self.query_context_payload["queries"][0]["is_timeseries"] = True | ||
self.query_context_payload["queries"][0]["groupby"] = [] | ||
|
@@ -476,7 +476,7 @@ def test_chart_data_prophet(self): | |
self.assertIn("sum__num__yhat", row) | ||
self.assertIn("sum__num__yhat_upper", row) | ||
self.assertIn("sum__num__yhat_lower", row) | ||
self.assertEqual(result["rowcount"], 47) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure why (on line #473) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should update the query context so that it's not using the legacy properties ( |
||
self.assertEqual(result["rowcount"], 103) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if 47 or 103 is correct. I assume these tests have been disabled for a long time and thus the underlying data might have changed. |
||
|
||
@pytest.mark.usefixtures("load_birth_names_dashboard_with_slices") | ||
def test_chart_data_invalid_post_processing(self): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new version of prophet is already available :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I think about it:
"prophet": ["prophet>=1.1.0, <2.0.0"],
would always install latest prophet 1.x, but CI would only test the pinned release. Maybe we should pin prophet insetup.py
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sebastianliebscher CI uses the pinned version in the frozen
requirements/testing.txt
file.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, with future releases of prophet if a user enables the optional dependency with
pip install -e '.[prophet]'
, they would potentially install a version that is not tested against CI.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue you describe is true for any package (or sub-package) which is not explicitly pinning.
Note we (Airbnb) typically install Superset as a service by:
requirements/production.in
file which includes the linespip-compile-multi
to compile the dependencies—taking into account theprophet
,ddtrace
, etc. production dependencies.which ensures we always install Superset in a deterministic manner which is super critical.
Why doesn’t Superset explicitly pin every package (and sub-package) in
setup.py
? Though packages should provide flexibility, one could argue this is valid for a service—given it typically isn’t used as a dependency—however we run into the issue of creating an infeasible dependency space when installation dependent requirements are added. By relaxing constraints (by way of removing explicit pins) we allowpip-compile-multi
to solve this somewhat difficult problem on our behalf.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit late, but I thank you very much for your detailed explanation! This helped a lot to understand how Superset manages dependencies.