PERF: Use fastpath for accessing option value in internals #49771
@@ -15,7 +15,7 @@
 import numpy as np

-from pandas._config import get_option
+from pandas._config.config import _global_config
Is this much faster than using `pandas._config.config.options`?
It's faster in any case, since `_global_config` is just a plain Python dict (which is quite fast), while the `options` object is a custom class mimicking attribute access:
In [32]: %timeit _global_config["mode"]["copy_on_write"]
52.6 ns ± 1.85 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [36]: %timeit pd._config.config.options.mode.copy_on_write
2.78 µs ± 22 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Strangely, this even seems to be slower than `pd.get_option("mode.copy_on_write")`:
In [37]: %timeit pd.get_option("mode.copy_on_write")
1.38 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
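A minimal, self-contained sketch of why the two paths differ. `config` and `ConfigWrapper` are hypothetical stand-ins, not pandas' `_global_config` or its `DictWrapper`: the wrapper runs Python-level `__getattr__` code and builds a new wrapper object for each nested section, whereas the plain path is just two dict subscripts. Absolute timings will vary by machine.

```python
import timeit

# Hypothetical stand-in for a nested option registry such as _global_config.
config = {"mode": {"copy_on_write": False}}


class ConfigWrapper:
    """Hypothetical, simplified wrapper mimicking attribute access over a nested dict."""

    def __init__(self, d):
        self.d = d  # keep a reference to the wrapped dict (not a copy)

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, i.e. for option names.
        try:
            value = self.d[key]
        except KeyError:
            raise AttributeError(key) from None
        # Nested sections are wrapped again so that chained access keeps working.
        if isinstance(value, dict):
            return ConfigWrapper(value)
        return value


options = ConfigWrapper(config)

# Time both access paths; each goes through a lambda, so call overhead is shared.
direct = timeit.timeit(lambda: config["mode"]["copy_on_write"], number=100_000)
wrapped = timeit.timeit(lambda: options.mode.copy_on_write, number=100_000)
print(f"dict lookup: {direct:.4f}s  wrapper: {wrapped:.4f}s")
```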
If we really want to avoid a private import, this is also an option:
In [39]: %timeit pd.options.d["mode"]["copy_on_write"]
85.9 ns ± 2.39 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
The `options` `DictWrapper` class seems to have a "public" `d` attribute exposing the underlying dict (`_global_config`).
Not worth tying ourselves in knots over; thanks for checking.
Hmm, at one point we had a code check for private imports. I guess that got lost somewhere along the way.
Curious if you've looked at making …
@WillAyd you mean the …
Best-practices-wise, the best case would be to try to optimize `get_option`. Next best would be to make `_global_config` public for internal usage and standardize on this as the way we do option checks within pandas.
That would require a change of (or addition to) how …
It's easy to rename …
Yeah, even if we did optimize `get_option`, a dict lookup will always be way faster.
lgtm
…option value in internals) (#50128) Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
Related to #49729
Accessing an option value is generally quite fast, but in code that is called repeatedly in a for loop, the overhead still seems to add up.
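A rough back-of-the-envelope estimate of how that overhead adds up, using the `%timeit` means quoted in the review thread for `pd.get_option("mode.copy_on_write")` (1.38 µs) and the direct `_global_config` lookup (52.6 ns). The call count is a hypothetical value chosen for illustration, not a measurement from #49729.

```python
# Per-access means quoted in the review thread (IPython %timeit).
get_option_ns = 1380.0  # pd.get_option("mode.copy_on_write"): 1.38 µs
dict_lookup_ns = 52.6   # _global_config["mode"]["copy_on_write"]

# Hypothetical number of option checks performed inside a hot loop.
n_calls = 100_000

# Time saved by switching every check to the plain dict lookup, in seconds.
saved_s = (get_option_ns - dict_lookup_ns) * n_calls / 1e9
print(f"~{saved_s:.3f} s saved per {n_calls:,} checks")  # ~0.133 s saved per 100,000 checks
```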
Using the example from #49729:
I think this should be a safe way to access the option (but will add a test that ensures you can switch back and forth with CoW within a session).
Are we OK with this pattern?
… `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature. -> added in PERF: first try inplace setitem for .at indexer #49772