CI/TST: slow tests #34131

Closed
jreback opened this issue May 12, 2020 · 8 comments
Labels
Closing Candidate May be closeable, needs more eyeballs Performance Memory or execution speed performance Testing pandas testing functions or related to the test suite

Comments

@jreback
Contributor

jreback commented May 12, 2020

xref #30641

from the new arm64 job, but indicative of general slowness: https://travis-ci.org/github/pandas-dev/pandas/jobs/685955999

This issue can be closed by multiple PRs

========================== slowest 10 test durations ===========================
88.88s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[tzlocal()-30S-Y]
65.15s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[tzlocal()-30S-10M]
22.33s call     pandas/tests/io/parser/test_multi_thread.py::test_multi_thread_path_multipart_read_csv[python]
21.54s call     pandas/tests/io/parser/test_multi_thread.py::test_multi_thread_path_multipart_read_csv[c_high]
21.03s call     pandas/tests/io/parser/test_multi_thread.py::test_multi_thread_path_multipart_read_csv[c_low]
19.46s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit['dateutil/US/Pacific'-30S-Y]
15.75s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit['dateutil/Asia/Singapore'-30S-10M]
15.71s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit['dateutil/Asia/Singapore'-30S-Y]
15.61s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit['Asia/Tokyo'-30S-Y]
15.59s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit['US/Eastern'-30S-Y]
14.27s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit['Asia/Tokyo'-30S-10M]
13.18s call     pandas/tests/groupby/transform/test_transform.py::test_cython_transform_frame[shift-args3-<lambda>]
13.11s call     pandas/tests/test_sorting.py::TestSorting::test_int64_overflow_moar
12.98s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit['US/Eastern'-30S-10M]
12.92s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit['dateutil/US/Pacific'-30S-10M]
12.78s call     pandas/tests/io/sas/test_sas7bdat.py::TestSAS7BDAT::test_iterator_loop
12.76s call     pandas/tests/io/sas/test_xport.py::TestXport::test1_basic
12.18s call     pandas/tests/groupby/transform/test_transform.py::test_cython_transform_frame[shift-args2-<lambda>]
11.97s call     pandas/tests/indexing/multiindex/test_chaining_and_caching.py::test_indexer_caching
11.95s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[pytz.FixedOffset(300)-30S-Y]
11.28s call     pandas/tests/io/parser/test_multi_thread.py::test_multi_thread_string_io_read_csv[python]
11.19s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[datetime.timezone(datetime.timedelta(0, 3600))-30S-Y]
10.60s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[tzutc()-30S-Y]
10.45s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[<UTC>-30S-Y]
10.35s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[datetime.timezone.utc-30S-Y]
10.19s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[datetime.timezone(datetime.timedelta(-1, 82800), 'foo')-30S-Y]
10.12s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[tzutc()-30S-10M]
10.05s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit['UTC'-30S-Y]
9.99s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[pytz.FixedOffset(-300)-30S-Y]
9.58s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[<UTC>-30S-10M]
@jreback jreback added Testing pandas testing functions or related to the test suite Performance Memory or execution speed performance good first issue labels May 12, 2020
@jreback jreback added this to the 1.1 milestone May 12, 2020
@Nishikoh Nishikoh mentioned this issue May 12, 2020
@jreback
Contributor Author

jreback commented May 12, 2020

Updated the above list to the top 30; the build (one of the standard ones) is https://travis-ci.org/github/pandas-dev/pandas/jobs/686095405

@dsaxton
Member

dsaxton commented May 12, 2020

For test_nearest_upsample_with_limit in pandas/tests/resample/test_datetime_index.py, is it necessary to parametrize over so many timezones? Removing that fixture causes a speedup for me from around 5 minutes to less than a second. It looks like there was a timezone-specific issue, but I suppose a smaller set could be tested? #33895

@jreback
Contributor Author

jreback commented May 12, 2020

The issue is here: https://github.com/pandas-dev/pandas/pull/33939/files#diff-dd427a6bbdfef7e333ac320adb15614dR539

It's not the tz's at all; rather, upsampling creates an enormous array.

e.g.

In [10]: ts = pd.Series(1, index=pd.date_range("1/1/2000", periods=3, freq='Y'))                                                                              

In [11]: ts.resample('1min').nearest(limit=2)                                                                                                                 
Out[11]: 
2000-12-31 00:00:00    1.0
2000-12-31 00:01:00    1.0
2000-12-31 00:02:00    1.0
2000-12-31 00:03:00    NaN
2000-12-31 00:04:00    NaN
2000-12-31 00:05:00    NaN
                      ... 
2002-12-30 23:55:00    NaN
2002-12-30 23:56:00    NaN
2002-12-30 23:57:00    NaN
2002-12-30 23:58:00    1.0
2002-12-30 23:59:00    1.0
2002-12-31 00:00:00    1.0
Freq: T, Length: 1051201, dtype: float64

In [12]: ts.resample('10s').nearest(limit=2)                                                                                                                  
Out[12]: 
2000-12-31 00:00:00    1.0
2000-12-31 00:00:10    1.0
2000-12-31 00:00:20    1.0
2000-12-31 00:00:30    NaN
2000-12-31 00:00:40    NaN
2000-12-31 00:00:50    NaN
                      ... 
2002-12-30 23:59:10    NaN
2002-12-30 23:59:20    NaN
2002-12-30 23:59:30    NaN
2002-12-30 23:59:40    1.0
2002-12-30 23:59:50    1.0
2002-12-31 00:00:00    1.0
Freq: 10S, Length: 6307201, dtype: float64

So we should split this test up into 2; e.g. there is no real need to upsample Y to 10s.
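The lengths in the two outputs above follow from simple arithmetic: the resampled index has one point per step between the first and last timestamp, inclusive. A minimal sketch of that calculation, using only the standard library; the helper name `upsampled_length` is hypothetical, and the 30-second case assumes the same date range as the example, which the test itself may not use:

```python
from datetime import datetime, timedelta


def upsampled_length(start: datetime, end: datetime, step: timedelta) -> int:
    """Count evenly spaced points from start to end, both inclusive."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    if end < start:
        raise ValueError("end must not precede start")
    # timedelta // timedelta yields an int: the number of whole steps.
    return (end - start) // step + 1


# First and last timestamps taken from the outputs above.
start = datetime(2000, 12, 31)
end = datetime(2002, 12, 31)

for label, step in [
    ("1min", timedelta(minutes=1)),
    ("10s", timedelta(seconds=10)),
    ("30s", timedelta(seconds=30)),  # assumed range, see the note above
]:
    print(label, upsampled_length(start, end, step))
```

A check like this could be used to estimate how large a result a given frequency/rule combination produces before deciding which combinations are worth keeping in a parametrized test.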

@avirlrma

take

@OlivierLuG
Contributor

I've made a PR to remove the @pytest.mark.slow marker from tests that turned out to be fast.

On the other hand, I have gathered this data for tests that take a long time to finish. These may be the functions to look at?

| pandas/tests/filename(::class)::function (first lines only) | Nb. of calls | Avg. time | Total time |
|---|---|---|---|
| groupby/test_value_counts.py::test_series_groupby_value_counts | 1568 | 0.29 | 460.19 |
| window/moments/test_moments_consistency_rolling.py::test_rolling_apply_consistency | 1807 | 0.16 | 297.66 |
| groupby/test_nunique.py::test_series_groupby_nunique | 49 | 5.23 | 256.3 |
| window/moments/test_moments_consistency_rolling.py::test_rolling_consistency | 1807 | 0.06 | 110.99 |
| indexing/multiindex/test_indexing_slow.py::test_multiindex_get_loc | 1 | 99.42 | 99.42 |
| window/moments/test_moments_consistency_rolling.py::test_rolling_consistency_series | 757 | 0.13 | 97.43 |
| io/test_stata.py::TestStata::test_stata_119 | 1 | 91.81 | 91.81 |
| window/moments/test_moments_consistency_rolling.py::test_rolling_consistency_cov | 1814 | 0.047 | 86.29 |
| frame/test_arithmetic.py::TestFrameFlexArithmetic::test_floordiv_axis0_numexpr_path | 2 | 28.185 | 56.37 |
| window/moments/test_moments_consistency_ewm.py::test_ewm_consistency | 862 | 0.057 | 49.15 |
| window/moments/test_moments_consistency_rolling.py::test_rolling_consistency_var | 1767 | 0.024 | 43.96 |
| test_sorting.py::TestMerge::test_int64_overflow_issues | 1 | 42.4 | 42.4 |
| window/moments/test_moments_consistency_rolling.py::test_rolling_consistency_std | 1807 | 0.0197 | 35.73 |
| plotting/test_boxplot_method.py::TestDataFrameGroupByPlots::test_grouped_box_return_type | 3 | 11.4 | 34.43 |
| plotting/test_frame.py::TestDataFramePlots::test_errorbar_plot | 2 | 15.27 | 30.54 |
| computation/test_eval.py::TestAlignment::test_medium_complex_frame_alignment | 4 | 7.5 | 30 |
| frame/methods/test_duplicated.py::test_duplicated_do_not_fail_on_wide_dataframes | 1 | 28.54 | 28.54 |
| plotting/test_frame.py::TestDataFramePlots::test_errorbar_timeseries | 2 | 12.55 | 25.1 |
| plotting/test_misc.py::TestDataFramePlots::test_andrews_curves | 3 | 7.08666666666667 | 21.26 |
| computation/test_eval.py::TestAlignment::test_complex_series_frame_alignment | 5 | 4.064 | 20.32 |
| plotting/test_hist_method.py::TestDataFramePlots::test_hist_df_legacy | 2 | 9.46 | 18.92 |
| plotting/test_frame.py::TestDataFramePlots::test_plot | 2 | 9.25 | 18.5 |
| window/moments/test_moments_consistency_ewm.py::test_ewm_consistency_series_data | 361 | 0.0477285318559555 | 17.2299999999999 |
| computation/test_eval.py::TestEvalNumexprPandas::test_complex_cmp_ops | 24 | 0.69625 | 16.71 |
| plotting/test_boxplot_method.py::TestDataFramePlots::test_boxplot_legacy1 | 2 | 8.255 | 16.51 |
| plotting/test_boxplot_method.py::TestDataFrameGroupByPlots::test_grouped_box_layout | 2 | 7.96 | 15.92 |
| window/moments/test_moments_consistency_expanding.py::test_expanding_consistency | 216 | 0.0641203703703706 | 13.8500000000001 |
| plotting/test_datetimelike.py::TestTSPlot::test_tsplot | 2 | 6.72 | 13.44 |
| plotting/test_datetimelike.py::TestTSPlot::test_ts_plot_with_tz | 26 | 0.488076923076923 | 12.69 |
| plotting/test_hist_method.py::TestSeriesPlots::test_hist_layout_with_by | 2 | 6.34 | 12.68 |
| plotting/test_hist_method.py::TestDataFrameGroupByPlots::test_grouped_hist_layout | 3 | 4.10666666666667 | 12.32 |
| plotting/test_series.py::TestSeriesPlots::test_hist_layout_with_by | 2 | 6.16 | 12.32 |
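For reference, a per-function summary like the one above (number of calls, average time, total time) can be derived from pytest's duration report by grouping the parametrized cases of each test. The following is only a sketch of one way to do that, not necessarily how these numbers were gathered; the names `DURATION_LINE` and `summarize_durations` are hypothetical, and it assumes lines in the same `<seconds>s call <node id>` layout as the report at the top of this issue:

```python
import re
from collections import defaultdict

# Matches lines such as:
# "88.88s call     path/to/test_file.py::test_name[param-id]"
DURATION_LINE = re.compile(
    r"^\s*(?P<secs>\d+(?:\.\d+)?)s\s+(?P<phase>setup|call|teardown)\s+(?P<nodeid>\S.*)$"
)


def summarize_durations(lines):
    """Group 'call' durations by test function, ignoring parametrization.

    Returns (name, calls, average seconds, total seconds) tuples,
    sorted by total time, slowest first.
    """
    stats = defaultdict(lambda: [0, 0.0])  # name -> [calls, total seconds]
    for line in lines:
        match = DURATION_LINE.match(line.rstrip())
        if match is None or match["phase"] != "call":
            continue  # skip headers, blank lines, setup and teardown
        # Drop the "[...]" parameter id so all cases of a test are grouped.
        name = match["nodeid"].split("[", 1)[0]
        entry = stats[name]
        entry[0] += 1
        entry[1] += float(match["secs"])
    rows = [
        (name, calls, total / calls, total)
        for name, (calls, total) in stats.items()
    ]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Three lines copied from the report at the top of this issue.
sample = [
    "88.88s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[tzlocal()-30S-Y]",
    "65.15s call     pandas/tests/resample/test_datetime_index.py::test_nearest_upsample_with_limit[tzlocal()-30S-10M]",
    "13.11s call     pandas/tests/test_sorting.py::TestSorting::test_int64_overflow_moar",
]

for name, calls, avg, total in summarize_durations(sample):
    print(f"{name}  calls={calls}  avg={avg:.2f}s  total={total:.2f}s")
```

Note that a report limited to the slowest N entries only covers the cases it lists, so call counts and totals are meaningful only when the full duration list has been captured.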

@jreback
Contributor Author

jreback commented Jun 8, 2020

Most of the above list is now fixed by #34413; the csv multithread test could still use some work to reduce its run time.

OlivierLuG added a commit to OlivierLuG/pandas that referenced this issue Jun 13, 2020
OlivierLuG added a commit to OlivierLuG/pandas that referenced this issue Jun 14, 2020
@TomAugspurger TomAugspurger removed this from the 1.1 milestone Jun 17, 2020
@TomAugspurger
Contributor

Cleared the milestone. When will we be able to close this?

@mroeschke mroeschke added the Closing Candidate May be closeable, needs more eyeballs label Dec 26, 2021
@mroeschke
Member

Since it appears these timings were run a while ago, it's probably better that we open targeted issues for slow tests so it's more actionable. Closing.
