Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: regression from 0.10.1 #3089

Closed
jreback opened this issue Mar 19, 2013 · 32 comments
Closed

PERF: regression from 0.10.1 #3089

jreback opened this issue Mar 19, 2013 · 32 comments
Labels
Bug Performance Memory or execution speed performance
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Mar 19, 2013

f6ac8c0 to 05a737d

Bad news:

Results:
                                            t_head  t_baseline      ratio
name                                                                     
reindex_frame_level_reindex                 1.1343      0.7571     1.4982
reindex_frame_level_align                   1.1690      0.7548     1.5488

Good news:

Results:
                                            t_head  t_baseline      ratio
name                                                                     
frame_get_dtype_counts                      0.0968    208.6380     0.0005
frame_wide_repr                             0.5321    207.0730     0.0026
groupby_first_float32                       3.0386    336.2439     0.0090
groupby_last_float32                        3.2206    335.2571     0.0096
frame_to_csv2                             181.1550   2188.9260     0.0828
indexing_dataframe_boolean                 12.9179    125.7350     0.1027
write_csv_standard                         36.3702    230.9210     0.1575
frame_reindex_axis0                         0.3122      1.0984     0.2842
frame_to_csv_mixed                        340.1070   1091.9340     0.3115
frame_to_csv                              106.5671    219.1830     0.4862
frame_add                                  22.0693     34.1867     0.6456
frame_mult                                 22.0064     33.9859     0.6475
frame_reindex_upcast                       11.4179     17.3291     0.6589
frame_fancy_lookup_all                     14.3931     20.5621     0.7000
series_string_vector_slice                148.3371    195.5922     0.7584

???

frame_reindex_axis1                         2.3612      2.3386     1.0097

Fixed:

indexing_dataframe_boolean_rows             0.2180      0.1953     1.1164
@jreback
Copy link
Contributor Author

jreback commented Mar 19, 2013

I think that the multi-level reindexing ones are possibly related to the take changes (#2819)
(which help a great deal in the most common cases), I didn't investigate much, that's just the top
on profiling (in both old and new)

@ghost
Copy link

ghost commented Mar 19, 2013

bisected?

@ghost
Copy link

ghost commented Mar 19, 2013

vbench was meant exactly for that, but it hasn't been updated in a while.
Too bad.

http://pandas.pydata.org/pandas-docs/vbench

@jreback
Copy link
Contributor Author

jreback commented Mar 19, 2013

can we run it on each commit and get the graph?

@ghost
Copy link

ghost commented Mar 19, 2013

That's it's reason for being. I just added test_perf a few months ago for the devs.

@jreback
Copy link
Contributor Author

jreback commented Mar 19, 2013

agreed....how does it update though?

@ghost
Copy link

ghost commented Mar 19, 2013

I think wes goes into the basement and presses a button.

@ghost
Copy link

ghost commented Mar 19, 2013

we're getting too chatty on the issue tracker, need an irc channel. or campfire.

@jseabold
Copy link
Contributor

FWIW, I lurk and learn, but I'm sure others hate the noise just as much as I find it helpful. A dev list might be preferable to IRC unless you can commit resources to a bot to archive everything in a publicly searchable way. I don't know how public campfire is.

@ghost
Copy link

ghost commented Mar 19, 2013

We're definitely pushing the boundaries of S/N, that's not cool.
Email is inconvenient for realtime, so not really a solution for many things.
I believe freenode has web based logs for all channels, though I'm not sure I consider that a plus.

@jreback
Copy link
Contributor Author

jreback commented Mar 19, 2013

fyi...i think i found this regression (my issue!)...issue a fix in a bit

@wesm
Copy link
Member

wesm commented Mar 19, 2013

I could set up a hipchat for pandas devs if you like, but that wouldn't be public.

@jreback
Copy link
Contributor Author

jreback commented Mar 19, 2013

#3093 fixes indexing_boolean_data_frame_rows

@ghost
Copy link

ghost commented Mar 19, 2013

@wesm , punting on the chat quesion.

would it be possible to get a daily vbench going? we're discovering perf regression by mistake...

@stephenwlin
Copy link
Contributor

hmm, well, i have kept all my old vb_suite logs (including several for testing #2819, #2867 against their immediate base commits) and I can't find any entries from where reindex_frame_level_* were affected negatively (in fact, a lot of results had the test improved by 20+%)...it's always possible there is some complex interaction between multiple lines on work that were developed independently, though...also I'm surprised frame_reindex_axis1 isn't on the positive side since I have consistent >25% improvements in that one.

@jreback
Copy link
Contributor Author

jreback commented Mar 19, 2013

pretty easy to break perf; thxs for checking

@stephenwlin
Copy link
Contributor

one thing to note is that i'm on 32-bit, that might make a difference...
also, here's my last vb_suite logs for the two commits #2819: http://pastebin.com/xEHn0Mhz, #2867: http://pastebin.com/sFccHx1K

@jreback
Copy link
Contributor Author

jreback commented Mar 20, 2013

@y-p is there a way to 'save' a vb_suite db for a particular commit, so that subsequent runs of test_perf can just refer to that rather than rebuilding it? (and if not, maybe make an option in test_perf?)

or maybe just save the build (rather than the db), but that would still save the get/build time

@ghost
Copy link

ghost commented Mar 20, 2013

vb_suite has a persistent db, test_perf purges the db between runs. I'll add an option
when I get a chance.

@jreback
Copy link
Contributor Author

jreback commented Mar 20, 2013

@stephenwlin
I did a git bisect....here's what broke reindex_frame_level_align and reindex
prob just an odd special case

10f65e02536b4dda0ab7761cb1d10e274b208cfb is the first bad commit
commit 10f65e02536b4dda0ab7761cb1d10e274b208cfb
Author: Stephen Lin <stephenwlin@gmail.com>
Date:   Wed Feb 13 03:33:35 2013 -0500

    ENH: optimize Cython take functions with memmove where possible

:040000 040000 c9e8c83951445333494415e5169abf7a7bad951c 45efa72631f1530adf6858155f3aa2bca0204907 M      pandas
[cow-jreback-~/pandas] git bisect log
git bisect start
# good: [05a737dbb9f857e25a211739d90c42f3d9428b8e] TST: skip problematic xlrd test
git bisect good 05a737dbb9f857e25a211739d90c42f3d9428b8e
# bad: [605f87edda24b07f0f0a1b2a2462a4e121629494] DOC:  update RELEASE.rst
git bisect bad 605f87edda24b07f0f0a1b2a2462a4e121629494
# good: [9faec8293aa7fda64b85f3aa05bff5600c7b28d1] BUG: don't show [] for empty series. close #2775
git bisect good 9faec8293aa7fda64b85f3aa05bff5600c7b28d1
# good: [9faa8defe954f89a69dfc9b48d7ec1cfcb9936d8] Merge branch 'dtypes_bug' of https://github.com/jreback/pandas into jreback-dtypes_bug
git bisect good 9faa8defe954f89a69dfc9b48d7ec1cfcb9936d8
# bad: [49e123f95d95b70d10f5e5852d2682741799aed1] RLS: fixup RELEASE.rst links
git bisect bad 49e123f95d95b70d10f5e5852d2682741799aed1
# bad: [cb5e7dd2f0c96108e84e930b86bc522f8c613cfb] Merge pull request #2871 from jreback/dtypes1
git bisect bad cb5e7dd2f0c96108e84e930b86bc522f8c613cfb
# bad: [f9dfa811aa1a925e43289491558d95e1e2a2e000] RLS: add release notes for 'opt-take-2' and 'fix-maybe-convert-objects'
git bisect bad f9dfa811aa1a925e43289491558d95e1e2a2e000
# good: [759b0db9436999877791e25bf524c9c40202bc6d] ENH: Consolidate and further improve performance of take functions
git bisect good 759b0db9436999877791e25bf524c9c40202bc6d
# bad: [10f65e02536b4dda0ab7761cb1d10e274b208cfb] ENH: optimize Cython take functions with memmove where possible
git bisect bad 10f65e02536b4dda0ab7761cb1d10e274b208cfb

@jreback
Copy link
Contributor Author

jreback commented Mar 20, 2013

@y-p thxs

@ghost
Copy link

ghost commented Mar 20, 2013

I remember that thread. "is memmove slower then memcpy".

@stephenwlin
Copy link
Contributor

hmm..well, that's weird, i also have a vb_suite diff that's memmove vs memcpy and everything there is +-10% (with most with +-5%, basically normally distributed around 1.0) ...maybe there's a 32-bit vs 64-bit issue...i will take a look though...i'll run a vb_suite with just that commit on my machine

@stephenwlin
Copy link
Contributor

@jreback can you give me info about the machine you're using to test this (cython version, gcc version, etc.?) this commit literally does nothing but change an explicit loop into a single memmove call, so it's all up to what cython and the compiler do with that: any reasonable compiler should inline memmove to be just as fast or faster than a loop but I suppose there's a chance it might not...I suspect this test is creating some degenerate condition where memmove is being called on a short array over and over again and the compiler is not inlining correctly (i.e. it happens to exactly be the transpose of the ideal shape for the the optimization, so that the overhead of the call actually exceeds that of the original loop); if that's the case I can add an explicit check to avoid that for short arrays

@jreback
Copy link
Contributor Author

jreback commented Mar 20, 2013

As you posted elsewhere, the x64 case is probably using SIMD / compiler optimizations

Linux goat 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64 GNU/Linux
Cython is 0.17.2
[goat-jreback-~/pandas] gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.5-8' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4 --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.5 (Debian 4.4.5-8) 

@jreback
Copy link
Contributor Author

jreback commented Mar 21, 2013

@stephenwlin did a great writeup for those who are interested

http://mail.python.org/pipermail/pandas-dev/2013-March/000008.html

going to close this issue for new, created #3114 to mark that we need a method for creating various
type of c-contig/f-contig test cases in an easy way

@jreback jreback closed this as completed Mar 21, 2013
@ghost
Copy link

ghost commented Mar 21, 2013

@stephenwlin , that was a formidable piece.

@stephenwlin
Copy link
Contributor

Found out it had to do with the fact that the width of the array to be copied is not compile-time constant: I posted a question on StackOverflow about this http://stackoverflow.com/questions/15554565/forcing-gcc-to-perform-loop-unswitching-of-memcpy-runtime-size-checks

@jreback
Copy link
Contributor Author

jreback commented Mar 21, 2013

@stephenwlin you are deep into the c-code! (that's a good thing!)

@stephenwlin
Copy link
Contributor

@jreback, can you vbench #3130 ? I went the magic number route (for now), 128 bytes ended up slowing another test (didn't look into the particulars of it, though) so I'm going with 64 bytes for now. EDIT apparently I got some things mixed up here, #3130 as tested and submitted has a 256 byte limit and it doesn't have any downside, so I'm going with that

@jreback
Copy link
Contributor Author

jreback commented Mar 21, 2013

testing now....i'll post on that thread

@stephenwlin
Copy link
Contributor

@jreback actually, i got my commits mixed up, turns out this was 256 bytes (it's the largest limit that doesn't have negative impacts on other tests, which is what we want)...I amend the comment and the issue but the commit was good (I verified the hash from my vb_suite log, just in case)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Performance Memory or execution speed performance
Projects
None yet
Development

No branches or pull requests

4 participants