
PERF: significantly improve performance of MultiIndex.shape #27384

Merged (2 commits) on Jul 18, 2019


qwhelan (Contributor) commented Jul 14, 2019

MultiIndex.shape is currently extremely slow because it triggers the creation of ._values, which can be quite expensive for datetime levels. The one mitigating factor is that this result is cached, which makes ._values.shape near-instant on subsequent calls; however, that same caching makes the cost hard to catch in asv benchmarks. This commit adds a suite dedicated to measuring such cached properties on Index objects.
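To illustrate the idea, here is a minimal plain-Python sketch. `ToyMultiIndex` is a hypothetical stand-in, not pandas code: like `MultiIndex`, it stores per-level labels plus integer codes, and it builds (and caches) the full list of tuples, `_values`, only on demand. Its `shape` is derived from the length of the codes, so it never materializes `_values`, while `shape_slow` roughly mimics the pre-fix behaviour described above. The sketch assumes at least one level.

```python
from functools import cached_property


class ToyMultiIndex:
    """Hypothetical, simplified stand-in for pandas.MultiIndex."""

    def __init__(self, levels, codes):
        self.levels = levels          # one sequence of unique labels per level
        self.codes = codes            # one equal-length list of ints per level
        self.materializations = 0     # counts how often _values is built

    @cached_property
    def _values(self):
        # Expensive path: build one tuple per row by looking up every level.
        self.materializations += 1
        return [
            tuple(level[c] for level, c in zip(self.levels, row))
            for row in zip(*self.codes)
        ]

    def __len__(self):
        # Every codes list has one entry per row, so its length is the
        # index length; no tuples need to be created.
        return len(self.codes[0])

    @property
    def shape_slow(self):
        # Pre-fix style: derive the shape from the materialized values.
        return (len(self._values),)

    @property
    def shape(self):
        # Cheap path: derive the shape from the length alone.
        return (len(self),)


mi = ToyMultiIndex(levels=[["2019-07-14", "2019-07-18"], ["a", "b"]],
                   codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
print(mi.shape, mi.materializations)       # (4,) 0
print(mi.shape_slow, mi.materializations)  # (4,) 1
```

Only `shape_slow` pays for building the tuples; afterwards the cached `_values` makes repeat calls cheap, which is exactly what hides the cost in repeated timings.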

asv results show a ~400,000x speedup for a relatively straightforward case:

```
       before           after         ratio
     [269d3681]       [d205acf6]
     <master>       <shape>
-      3.52±0.02s       8.33±0.2μs     0.00  index_cached_properties.MultiIndexCached.time_shape
```
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
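The benchmarking difficulty mentioned above comes from caching: if a timed call runs many times on the same object, only the first call pays for building `_values`. The sketch below shows one way to structure an asv-style benchmark so every timed call sees an uncached object, by setting `number = 1` so that `setup()` (which builds a fresh index) runs before each timed call. `SlowShapeIndex`, `IndexCachedShape` and `simulate` are hypothetical names; `simulate` is a simplified imitation of asv's sample loop that ignores details such as warm-up runs, and none of this is the suite added in this PR.

```python
import time
from functools import cached_property


class SlowShapeIndex:
    """Hypothetical index whose shape depends on a cached, costly _values."""

    def __init__(self, n):
        self.n = n
        self.builds = 0

    @cached_property
    def _values(self):
        self.builds += 1
        time.sleep(0.01)  # stand-in for an expensive materialization
        return list(range(self.n))

    @property
    def shape(self):
        return (len(self._values),)


class IndexCachedShape:
    # asv-style attributes: `number` timed calls per sample, `repeat` samples.
    # setup() runs once per sample, so number = 1 gives every timed call a
    # freshly built, uncached index.
    number = 1
    repeat = 5

    def setup(self):
        self.idx = SlowShapeIndex(1000)

    def time_shape(self):
        self.idx.shape


def simulate(bench_cls, number, repeat):
    """Simplified imitation of an asv sample loop (not asv itself)."""
    bench = bench_cls()
    calls = builds = 0
    timings = []
    for _ in range(repeat):
        bench.setup()
        start = time.perf_counter()
        for _ in range(number):
            bench.time_shape()
            calls += 1
        timings.append((time.perf_counter() - start) / number)
        builds += bench.idx.builds
    return calls, builds, timings


print(simulate(IndexCachedShape, number=1, repeat=5)[:2])    # (5, 5)
print(simulate(IndexCachedShape, number=100, repeat=5)[:2])  # (500, 5)
```

With `number=1` every timed call triggers a build; with `number=100` the same five builds are spread over 500 calls, so the per-call average mostly measures the cached fast path.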

WillAyd added the MultiIndex and Performance (Memory or execution speed performance) labels on Jul 14, 2019
Review comment on pandas/core/indexes/multi.py (outdated, resolved)
qwhelan (Contributor, Author) commented Jul 18, 2019

Updated asv results show that moving `shape` into the base `Index` class significantly benefits several other index classes as well:

```
       before           after         ratio
     [a4c19e7a]       [3c946017]
     <unsorted_cats~1>       <shape>
-     2.59±0.07μs      2.16±0.06μs     0.83  index_cached_properties.IndexCache.time_shape('PeriodIndex')
-     2.74±0.09μs       2.26±0.1μs     0.83  index_cached_properties.IndexCache.time_shape('DatetimeIndex')
-      5.06±0.2μs       3.57±0.2μs     0.70  index_cached_properties.IndexCache.time_shape('UInt64Index')
-      5.80±0.4μs       3.70±0.3μs     0.64  index_cached_properties.IndexCache.time_shape('Float64Index')
-      6.40±0.4μs       4.08±0.3μs     0.64  index_cached_properties.IndexCache.time_shape('TimedeltaIndex')
-      6.80±0.3μs       3.88±0.2μs     0.57  index_cached_properties.IndexCache.time_shape('IntervalIndex')
-        65.2±1μs         903±20ns     0.01  index_cached_properties.IndexCache.time_shape('Int64Index')
-      65.1±0.9μs         892±10ns     0.01  index_cached_properties.IndexCache.time_shape('RangeIndex')
-         214±2ms       4.45±0.2μs     0.00  index_cached_properties.IndexCache.time_shape('MultiIndex')
```
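The `ratio` column in these asv tables is the "after" time divided by the "before" time, so values below 1 are speedups. As a small sketch for checking the figures above (`parse_asv_time` and `ratio` are hypothetical helpers; the uncertainty after `±` is simply discarded):

```python
import math
import re

# Seconds per unit; both the Greek mu (U+03BC) and the micro sign (U+00B5)
# are accepted, since either may appear in copied asv output.
UNITS = {"s": 1.0, "ms": 1e-3, "\u03bcs": 1e-6, "\u00b5s": 1e-6, "ns": 1e-9}
_UNIT_RE = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))
_TIME_RE = re.compile(r"([\d.]+)±[\d.]+(" + _UNIT_RE + r")")


def parse_asv_time(text):
    """Convert an asv timing such as '65.2±1μs' to seconds."""
    m = _TIME_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognised asv timing: {text!r}")
    value, unit = m.groups()
    return float(value) * UNITS[unit]


def ratio(before, after):
    """after / before, matching the meaning of asv's ratio column."""
    return parse_asv_time(after) / parse_asv_time(before)


print(f"{ratio('65.2±1μs', '903±20ns'):.2f}")     # 0.01 (Int64Index row)
print(f"{ratio('214±2ms', '4.45±0.2μs'):.2f}")    # 0.00 (MultiIndex row)
print(round(1 / ratio('3.52±0.02s', '8.33±0.2μs')))  # ~4.2e5, i.e. the ~400,000x claim
```

The MultiIndex row here corresponds to roughly a 48,000x speedup. Recomputing from the rounded, displayed timings can differ from asv's ratio in the last digit (the DatetimeIndex row gives 0.82 rather than 0.83), presumably because asv computes the ratio from unrounded measurements.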

TomAugspurger added this to the 0.25.0 milestone on Jul 18, 2019