Slightly speed up execution time of common cases #142

ofek · 2021-11-14T22:11:57Z

I'm wrapping up re-writing a CLI tool where responsiveness is critical and during the switch from toml to tomli I noticed load time increased by ~1-2 milliseconds. This PR brings it down the equivalent amount.

If you don't like the singleton you can eventually use https://www.python.org/dev/peps/pep-0562/ but I haven't benchmarked that way.

hukkin · 2021-11-14T23:02:12Z

Thanks for the PR!

I believe this should mostly be helpful in CLI invocations where TOML isn't actually parsed (but import tomli still runs). The parsing of almost any real world TOML file will probably still run the code path where regexes are compiled. Do you agree?

If the assumption is correct, then maybe it is more reasonable that applications where this 1-2 milliseconds is critical make this optimization themselves by importing Tomli lazily, only when parsing actually happens?
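
A minimal sketch of the application-side alternative described above, assuming a hypothetical CLI with a `--version` fast path. The names `run` and `load_config` and the version string are invented for illustration. The point is only that `import tomli` sits inside the function that parses, so invocations that never parse TOML never pay for the import.

```python
from typing import Sequence


def load_config(text: str) -> dict:
    # Deferred import: the parser module is imported only when a TOML
    # document actually has to be parsed.
    import tomli

    return tomli.loads(text)


def run(argv: Sequence[str]) -> str:
    # Hypothetical fast path that never touches TOML, so tomli is not imported.
    if "--version" in argv:
        return "mytool 0.1"
    # Slow path: parsing happens here, and so does the import.
    return str(load_config('name = "demo"')["name"])
```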

If we still come to the conclusion that this optimization makes sense, what do you think about implementing this with something like

@lru_cache(maxsize=None)
def regex(name: str) -> "re.Pattern":
    if name == "datetime":
        ...
    elif name == "number":
        ...
    ...

in _re.py. And then

from tomli._re import regex
# Just an example
regex("datetime").match(src, pos)

in _parser.py

Then we'd not need the singleton and also slightly less dunder magic.
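
A runnable sketch of the `lru_cache` idea, with the elided branches filled in by simplified, hypothetical patterns (they are not tomli's real regexes). Each pattern is compiled on the first request for its name and served from the cache afterwards.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def regex(name: str) -> "re.Pattern":
    """Compile the named pattern on first use; later calls hit the cache."""
    # Simplified stand-in patterns, for illustration only.
    if name == "localtime":
        return re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}")
    if name == "number":
        return re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?")
    raise ValueError(f"unknown pattern name: {name!r}")


# Usage mirrors the call shape from the comment above.
match = regex("number").match("x = 3.14", 4)
```

One cost of this shape is a function call plus a cache lookup on every use, where a module-level constant would be a single global lookup.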

hukkin · 2021-11-14T23:04:41Z

Btw triggering GitHub actions via pre-commit.ci is absolutely genius 😄 Had not realized one could do that. Will start using that trick myself if I ever need it.

ofek · 2021-11-14T23:10:08Z

Hmm, good point. I hadn't read the code thoroughly: so the patterns are always used for every invocation of loads()?

hukkin · 2021-11-14T23:16:46Z

In the current state, if the TOML file contains a value assignment of a number (float, integer, hex, etc.), inline array, or inline table, then all regexes will be compiled. If these are not assigned but a datetime or localtime is, then some, but not all regexes will be compiled.

I assume there's not many TOML files where for example no integers or floats are assigned.

EDIT: my assumption may be wrong. E.g. there are no floats or integers in https://github.com/hukkin/tomli/blob/master/pyproject.toml . There are inline arrays, but we could change the priority of inline arrays in the parse_value function to fix the issue with them.

ofek · 2021-11-14T23:32:53Z

Ah okay thanks! Yes my pyproject.toml files also have no floats nor integers. This is what I'm seeing:

$ python -m timeit -n 1 -r 1 -s "from pathlib import Path;t=Path('pyproject.toml').read_text()" "import toml;toml.loads(t)"
1 loop, best of 1: 7.36 msec per loop
$ python -m timeit -n 1 -r 1 -s "from pathlib import Path;t=Path('pyproject.toml').read_text()" "import tomli;tomli.loads(t)"
1 loop, best of 1: 8.68 msec per loop
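
To see how much of a cold run like this goes into regex compilation, one rough approach is to time a compile with the `re` module's cache emptied and compare it with a cached one. This is a sketch under assumptions: the pattern is a hypothetical stand-in rather than one of tomli's, and single-shot timings like these are noisy.

```python
import re
import time

# Hypothetical stand-in pattern, not taken from tomli.
NUMBER_SOURCE = r"[+-]?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"


def cold_compile_seconds(source: str) -> float:
    """Time one compilation with re's internal cache emptied first."""
    re.purge()
    start = time.perf_counter()
    re.compile(source)
    return time.perf_counter() - start


def warm_compile_seconds(source: str) -> float:
    """Time a compile call whose result is already in re's cache."""
    re.compile(source)  # make sure the pattern is cached
    start = time.perf_counter()
    re.compile(source)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"cold: {cold_compile_seconds(NUMBER_SOURCE) * 1e3:.3f} msec")
    print(f"warm: {warm_compile_seconds(NUMBER_SOURCE) * 1e3:.3f} msec")
```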

ofek · 2021-11-15T00:01:10Z

Lazy compilation of the regular expressions + your idea of re-ordering makes the first run as fast as toml even with numbers! I just pushed a commit.

edit: eh, it seems number regex needs to go after dates unfortunately
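
A small sketch of why the order matters, using simplified, hypothetical patterns (not tomli's): a number pattern tried first happily matches the leading digits of a date, so the date pattern has to get the first attempt.

```python
import re
from typing import Optional, Sequence, Tuple

# Simplified stand-in patterns, for illustration only.
PATTERNS = {
    "date": re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    "number": re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?"),
}


def classify(src: str, pos: int, order: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (kind, matched text) for the first pattern in `order` that matches."""
    for kind in order:
        match = PATTERNS[kind].match(src, pos)
        if match:
            return kind, match.group()
    return None


# Number first: the date is misread as the integer 2021.
wrong = classify("2021-11-14", 0, ("number", "date"))
# Date first: the whole date is consumed, and plain numbers still work.
right = classify("2021-11-14", 0, ("date", "number"))
```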

hukkin · 2021-11-15T00:38:00Z

Great! Would you mind using the "lru_cache" implementation I described earlier? I'd much prefer it over "__setattr__/__getattr__/singleton" if there are no significant performance or other drawbacks.

No worries about bad commit history or force-pushes btw, I will do a squash when we merge this anyways.

hukkin · 2021-11-15T00:52:00Z

Also, would be great if you could confirm that this actually fixes the responsiveness issue of your CLI tool (a real world problem), and that the issue is not fixable by lazily importing Tomli.

This is already in the territory of micro-optimizations where we have to make trade-offs: we sacrifice some performance when parsing large TOML files and TOML files with numbers to achieve better performance with small files without numbers. I don't think this is an obvious trade-off to make, so I would be much happier merging if we at least fix your real-world problem.

ofek · 2021-11-15T01:43:44Z

If there are no numbers then this fixes it, otherwise no. If only the number pattern wasn't greedy and could come before dates since dates aren't that common 😄

However, I think it would still be beneficial overall to incorporate your idea and move those 2 simple equality checks to before those 3 regex searches. Would you like me to keep that one change?
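
A sketch of that reordering, assuming a hypothetical `value_kind` dispatcher and simplified stand-in patterns (this is not tomli's actual `parse_value`): the two single-character comparisons for arrays and inline tables run before any regex is tried, and the number pattern stays last.

```python
import re

# Simplified stand-in patterns, for illustration only.
RE_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
RE_LOCALTIME = re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}")
RE_NUMBER = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?")


def value_kind(src: str, pos: int) -> str:
    """Classify the value starting at `pos`, cheapest checks first."""
    char = src[pos]
    # Two plain equality checks: no regex work for arrays or inline tables.
    if char == "[":
        return "array"
    if char == "{":
        return "inline table"
    # Regex checks follow; the number pattern goes last because it would
    # also match the leading digits of a date or a time.
    if RE_DATE.match(src, pos):
        return "date"
    if RE_LOCALTIME.match(src, pos):
        return "localtime"
    if RE_NUMBER.match(src, pos):
        return "number"
    raise ValueError(f"invalid value at position {pos}")
```

With this ordering, a document whose values are only strings, arrays, and inline tables never reaches the regex branches.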

hukkin · 2021-11-15T08:31:08Z

Thanks!

Lazily compile regular expressions to speed up load time

1e905a7

ofek force-pushed the load-time branch from fa5197e to 1e905a7 Compare November 14, 2021 23:02

[pre-commit.ci] auto fixes from pre-commit.com hooks

ac5cd49

for more information, see https://pre-commit.ci

re-order conditions based on speed of checking and likelihood

1e391dd

Co-Authored-By: Taneli Hukkinen <3275109+hukkin@users.noreply.github.com>

ofek changed the title ~~Lazily compile regular expressions to speed up load time~~ Speed up load time, and execution time of common cases Nov 15, 2021

fix

cec94dd

ofek and others added 2 commits November 14, 2021 22:21

final update

cb40c82

[pre-commit.ci] auto fixes from pre-commit.com hooks

9d2f08e

for more information, see https://pre-commit.ci

ofek changed the title ~~Speed up load time, and execution time of common cases~~ Slightly speed up execution time of common cases Nov 15, 2021

hukkin merged commit 809a8ae into hukkin:master Nov 15, 2021

ofek deleted the load-time branch November 15, 2021 14:54
