-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Bump prophet, re-enable tests, and remedy column eligibility logic #24129
fix: Bump prophet, re-enable tests, and remedy column eligibility logic #24129
Conversation
@@ -107,8 +125,6 @@ pydata-google-auth==1.7.0 | |||
# via pandas-gbq | |||
pyfakefs==5.2.2 | |||
# via -r requirements/testing.in | |||
pyhive[presto]==0.6.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely a legacy dependency resulting from not having run pip-compile-multi
. The presto
extra requirement is specified in requirements/development.in
which is an input to requirements/testing.in
and thus shouldn't be re-included here.
@@ -177,7 +177,7 @@ def get_git_sha() -> str: | |||
"postgres": ["psycopg2-binary==2.9.6"], | |||
"presto": ["pyhive[presto]>=0.6.5"], | |||
"trino": ["trino>=0.324.0"], | |||
"prophet": ["prophet>=1.0.1, <1.1", "pystan<3.0"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pystan
is now no longer required for prophet
v1.1.
27e9420
to
1765c97
Compare
@@ -444,11 +444,11 @@ def test_chart_data_dttm_filter(self): | |||
else: | |||
raise Exception("ds column not found") | |||
|
|||
@pytest.mark.usefixtures("load_birth_names_dashboard_with_slices") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not sure how this test worked before it was skipped without the inclusion of the fixture which ensures (hopefully) that the tests are idempotent.
@@ -134,7 +135,11 @@ def prophet( # pylint: disable=too-many-arguments | |||
raise InvalidPostProcessingError(_("DataFrame include at least one series")) | |||
|
|||
target_df = DataFrame() | |||
for column in [column for column in df.columns if column != index]: | |||
for column in [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@villebro I would love your input on this as I'm somewhat perplexed as to how this was working. After reenabling the tests it seemed to be trying to fit non-numerical columns within the pd.DataFrame
. I couldn't find any options where we explicitly define which columns to produce a forecast for.
The original logic I had was for determining numeric columns was,
df.select_dtypes(include=np.number).columns
however the MySQL tests were failing because it seems like some of the numeric columns where of type object
which is typically used to encode strings.
@@ -476,7 +476,7 @@ def test_chart_data_prophet(self): | |||
self.assertIn("sum__num__yhat", row) | |||
self.assertIn("sum__num__yhat_upper", row) | |||
self.assertIn("sum__num__yhat_lower", row) | |||
self.assertEqual(result["rowcount"], 47) | |||
self.assertEqual(result["rowcount"], 103) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if 47 or 103 is correct. I assume these tests have been disabled for a long time and thus the underlying data might have changed.
Codecov Report
@@ Coverage Diff @@
## master #24129 +/- ##
==========================================
+ Coverage 69.06% 69.08% +0.02%
==========================================
Files 1901 1901
Lines 74019 74020 +1
Branches 8116 8116
==========================================
+ Hits 51121 51137 +16
+ Misses 20787 20772 -15
Partials 2111 2111
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
4c44e85
to
85245d1
Compare
@villebro I might need some additional 👀 on this as I'm somewhat out of my league in terms of expectations and inconsistencies regarding |
@@ -476,7 +476,7 @@ def test_chart_data_prophet(self): | |||
self.assertIn("sum__num__yhat", row) | |||
self.assertIn("sum__num__yhat_upper", row) | |||
self.assertIn("sum__num__yhat_lower", row) | |||
self.assertEqual(result["rowcount"], 47) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why (on line #473) result["data"]
is []
for only MySQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should update the query context so that it's not using the legacy properties (granularity
, is_timeseries
etc). I wonder if that may be causing issues.
85245d1
to
8b517ec
Compare
This PR will also fix #24406 |
8b517ec
to
4710248
Compare
@sebastianliebscher and @villebro I fixed the CI issue—there was an inconsistency with how the Pandas Dataframe columns where being encoded—and thus this should be ready to review. |
4710248
to
13597d1
Compare
@@ -85,6 +101,8 @@ parameterized==0.9.0 | |||
# via -r requirements/testing.in | |||
pathable==0.4.3 | |||
# via jsonschema-spec | |||
prophet==1.1.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new version of prophet is already available :)
prophet==1.1.3 | |
prophet==1.1.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I think about it: "prophet": ["prophet>=1.1.0, <2.0.0"],
would always install latest prophet 1.x, but CI would only test the pinned release. Maybe we should pin prophet in setup.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sebastianliebscher CI uses the pinned version in the frozen requirements/testing.txt
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, with future releases of prophet if a user enables the optional dependency with pip install -e '.[prophet]'
, they would potentially install a version that is not tested against CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue you describe is true for any package (or sub-package) which is not explicitly pinning.
Note we (Airbnb) typically install Superset as a service by:
- Defining a
requirements/production.in
file which includes the lines
-e .[prophet,…]
ddtrace
…
- Running
pip-compile-multi
to compile the dependencies—taking into account theprophet
,ddtrace
, etc. production dependencies. - Installing Superset via,
python3 -m pip install -r requirements/production.txt
python3 -m pip install -e .
which ensures we always install Superset in a deterministic manner which is super critical.
Why doesn’t Superset explicitly pin every package (and sub-package) in setup.py
? Though packages should provide flexibility, one could argue this is valid for a service—given it typically isn’t used as a dependency—however we run into the issue of creating an infeasible dependency space when installation dependent requirements are added. By relaxing constraints (by way of removing explicit pins) we allow pip-compile-multi
to solve this somewhat difficult problem on our behalf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit late, but I thank you very much for your detailed explanation! This helped a lot to understand how Superset manages dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for cleaning this up. After this is merged I can follow up by refactoring those tests so that they're using the generic x-axis syntax in the query context. I'll also make sure the rowcount is correct. At any rate, I think things are better after this is merged than before, so LGTM 🚢
@@ -476,7 +476,7 @@ def test_chart_data_prophet(self): | |||
self.assertIn("sum__num__yhat", row) | |||
self.assertIn("sum__num__yhat_upper", row) | |||
self.assertIn("sum__num__yhat_lower", row) | |||
self.assertEqual(result["rowcount"], 47) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should update the query context so that it's not using the legacy properties (granularity
, is_timeseries
etc). I wonder if that may be causing issues.
SUMMARY
The
prophet
package (formallyfbprophet
) historically has been "difficult" to work with from an installation perspective—atypical requirement definitions (fbstan
needed to be installed first), no pre-built wheels, etc.—though thankfully in v1.1 there are pre-built binaries for MacOS, Windows, and Linux.This PR bumps the version of the package and defines it as a requirement for the
testing
environment removing the need for thepytest.importorskip("prophet")
—likely present due to problematic build. This ensures that the associated tests will now always run.The frozen requirements were updated by running:
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
CI.
ADDITIONAL INFORMATION