Slightly speed up execution time of common cases #142
Thanks for the PR! I believe this should mostly be helpful in CLI invocations where TOML isn't actually parsed (but If the assumption is correct, then maybe it is more reasonable that applications where these 1-2 milliseconds are critical make this optimization themselves by importing Tomli lazily, only when parsing actually happens?
If we still come to the conclusion that this optimization makes sense, what do you think about implementing this with something like

    @lru_cache(maxsize=None)
    def regex(name: str) -> "re.Pattern":
        if name == "datetime":
            ...
        elif name == "number":
            ...
        ...

in tomli._re, and then at the call site

    from tomli._re import regex
    # Just an example
    regex("datetime").match(src, pos)

Then we'd not need the singleton, and there would also be slightly less dunder magic.
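A runnable sketch of the `lru_cache` idea above, under stated assumptions: the pattern strings are simplified stand-ins rather than Tomli's actual regexes, and the `_PATTERNS` table is a hypothetical helper introduced only for illustration.

```python
import re
from functools import lru_cache

# Simplified stand-in patterns (assumptions for illustration, not Tomli's real ones).
_PATTERNS = {
    "datetime": r"[0-9]{4}-[0-9]{2}-[0-9]{2}",
    "number": r"[+-]?[0-9]+",
}


@lru_cache(maxsize=None)
def regex(name: str) -> "re.Pattern":
    """Compile the named pattern on first request and reuse it afterwards."""
    return re.compile(_PATTERNS[name])
```

A call such as `regex("number").match(src, pos)` compiles only the number pattern; a pattern that is never requested is never compiled, which is where the import-time saving would come from.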
for more information, see https://pre-commit.ci
Btw triggering GitHub actions via pre-commit.ci is absolutely genius 😄 I had not realized one could do that. Will start using that trick myself if I ever need it.
Hmm, good point. I hadn't read the code thoroughly: so the patterns are always used for every invocation of
In the current state, if the TOML file contains a value assignment of a number (float, integer, hex, etc.), inline array, or inline table, then all regexes will be compiled. If none of these are assigned but a datetime or localtime is, then some, but not all, regexes will be compiled. I assume there aren't many TOML files where, for example, no integers or floats are assigned.
EDIT: my assumption may be wrong. E.g. there are no floats or integers in https://github.com/hukkin/tomli/blob/master/pyproject.toml There are inline arrays, but we could move priority of inline arrays in
Ah okay, thanks! Yes my
Co-Authored-By: Taneli Hukkinen <3275109+hukkin@users.noreply.github.com>
Lazy compilation of the regular expressions + your idea of re-ordering makes the first run as fast as
edit: eh, it seems the number regex needs to go after dates, unfortunately
Great! Would you mind using the "lru_cache" implementation I described earlier? I'd much prefer it over "__setattr__/__getattr__/singleton" if there are no significant performance or other drawbacks. No worries about bad commit history or force-pushes btw, I will do a squash when we merge this anyway.
Also, it would be great if you could confirm that this actually fixes the responsiveness issue of your CLI tool (a real-world problem), and that the issue is not fixable by lazily importing Tomli. This is already in the territory of micro-optimizations where we have to make trade-offs: we sacrifice some performance when parsing large TOML files and TOML files with numbers, to achieve better performance with small files without numbers. I don't think this is an obvious trade-off to make, so I would be much happier merging if we at least fix your real-world problem.
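For reference, the lazy-import alternative mentioned above could look roughly like this in an application. This is a minimal sketch, not code from the CLI tool in question: `parse_toml` is a hypothetical helper, and the fallback assumes Tomli is installed on Python versions older than 3.11, where the standard-library `tomllib` is unavailable.

```python
def parse_toml(text: str) -> dict:
    """Parse TOML text, importing the parser only when parsing actually happens."""
    # Deferred import: invocations that never parse TOML never pay the import cost.
    try:
        import tomllib as toml_parser  # standard library on Python 3.11+
    except ModuleNotFoundError:
        import tomli as toml_parser  # assumption: Tomli is installed on older Pythons
    return toml_parser.loads(text)
```

Because Python caches imported modules in `sys.modules`, repeated calls pay the import cost only once; code paths that never call `parse_toml` skip it entirely.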
If there are no numbers then this fixes it, otherwise no. If only the number pattern weren't greedy and could come before dates, since dates aren't that common 😄 However, I think it would still be beneficial overall to incorporate your idea and move those 2 simple equality checks before those 3 regex searches. Would you like me to keep that one change?
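The ordering constraint can be illustrated with a small sketch. The patterns are simplified stand-ins, `classify` is a hypothetical helper rather than Tomli's parser, and it is an assumption here that the "2 simple equality checks" are the single-character tests for inline arrays and inline tables.

```python
import re

# Simplified stand-ins, not Tomli's actual patterns.
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
NUMBER_RE = re.compile(r"[+-]?[0-9]+")


def classify(src: str, pos: int = 0) -> str:
    """Return a label for the value starting at src[pos] (hypothetical helper)."""
    char = src[pos]
    # Cheap single-character comparisons first: no regex is touched for these.
    if char == "[":
        return "array"
    if char == "{":
        return "inline table"
    # Dates must be tried before numbers: the number pattern would otherwise
    # match the leading "1979" of "1979-05-27".
    if DATE_RE.match(src, pos):
        return "date"
    if NUMBER_RE.match(src, pos):
        return "number"
    return "other"
```

With this ordering, a file containing only arrays, tables, and strings never reaches the regex branches, while a file with numbers still has to attempt the date match first.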
Thanks!
I'm wrapping up re-writing a CLI tool where responsiveness is critical, and during the switch from toml to tomli I noticed load time increased by ~1-2 milliseconds. This PR brings it down by the equivalent amount.
If you don't like the singleton you can eventually use https://www.python.org/dev/peps/pep-0562/ but I haven't benchmarked that way.
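The PEP 562 route would rely on a module-level `__getattr__` that compiles a pattern the first time it is accessed. The following is an unbenchmarked sketch under stated assumptions: the names `lazy_re`, `_SOURCES`, and `RE_NUMBER` are hypothetical, the pattern is a simplified stand-in, and a synthetic module object is used only so that the example fits in one file.

```python
import re
import types

# Simplified stand-in pattern (an assumption, not Tomli's real one).
_SOURCES = {"RE_NUMBER": r"[+-]?[0-9]+"}


def _module_getattr(name: str) -> "re.Pattern":
    """PEP 562 hook: called only when normal attribute lookup fails."""
    try:
        source = _SOURCES[name]
    except KeyError:
        raise AttributeError(name) from None
    pattern = re.compile(source)
    # Store the result so later lookups bypass this hook entirely.
    setattr(lazy_re, name, pattern)
    return pattern


# In a real module this would be a top-level `def __getattr__(name)`;
# a synthetic module keeps the sketch in a single file.
lazy_re = types.ModuleType("lazy_re")
lazy_re.__getattr__ = _module_getattr
```

The first access to `lazy_re.RE_NUMBER` compiles and stores the pattern; subsequent accesses are plain attribute lookups with no function call.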