Improve docs performance (#2480) #2481

Merged: 8 commits merged into dev/octavius-catto from fix/docs-perf on May 22, 2020

Conversation

@beckjake (Contributor) commented May 21, 2020

resolves #2480

Description

This PR improves docs/schema.yml parsing performance significantly.

On a sample project with ~1k sources and ~50k docs blocks, the result of `time dbt ls > /dev/null` improved from a bit over 15 minutes on my machine to a bit under 1.5 minutes.

There are five major parts to this PR:

  • In get_rendered, do a quick regex check to see if there are any `{%` or `{{` in the document. If not, rendering is a no-op, so we just return the input unchanged (see the first sketch after this list).
    • except in native mode, where we could also do this (we'd have to call ast.literal_eval like the native renderer does, though!). The ROI seemed low.
  • When parsing docs, instead of building a full template, rendering it, and doing a bunch of extra rendering, just examine the contents of the docs blocks. This way, docs parsing can use the get_rendered improvement; otherwise the `{% docs my_docs %}` tag itself triggers the regex match (second sketch below).
  • When loading yaml, use the CSafeLoader if it's available (third sketch below). To get it, users will have to have cython (I'm not 100% sure on this) and the libyaml headers on their system at pip install time. We could look into adding libyaml to Homebrew too.
  • To resolve refs/sources/docs, dbt now builds up a cache and digs through that. Constructing the cache takes O(n) the first time something is resolved, but subsequent lookups are O(1).
  • New!: We used to call build_flat_graph twice, which is unnecessary. We do it once after parsing and that is sufficient; there's no reason to call it again at runtime.
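
A minimal sketch of the fast-path check from the first bullet, using a plain Jinja environment; the `get_rendered` here is illustrative, not dbt's actual implementation:

```python
import re

from jinja2 import Environment

# "{{" or "{%" anywhere in the string means Jinja may have work to do.
_HAS_JINJA = re.compile(r"{[{%]")

def get_rendered(string: str, ctx: dict) -> str:
    # Fast path: with no Jinja delimiters, rendering is the identity,
    # so skip building a template entirely. (Native mode would still
    # need ast.literal_eval, so it wouldn't take this shortcut.)
    if _HAS_JINJA.search(string) is None:
        return string
    return Environment().from_string(string).render(ctx)
```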
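
And a sketch of the docs-parsing idea from the second bullet, reusing the `get_rendered` sketched above. dbt's real implementation extracts blocks with its own lexer (lex_for_blocks, discussed below); a regex stands in here purely to show the shape of the change, rendering only each block's body so the surrounding tags don't defeat the fast path:

```python
import re

# Illustrative only: find {% docs name %}...{% enddocs %} pairs and pull
# out the body; dbt actually uses its block lexer for this, not a regex.
DOCS_BLOCK = re.compile(
    r"{%\s*docs\s+(?P<name>\w+)\s*%}(?P<body>.*?){%\s*enddocs\s*%}",
    re.DOTALL,
)

def parse_docs(source: str, ctx: dict):
    for match in DOCS_BLOCK.finditer(source):
        # Rendering just the body means a docs block containing no other
        # Jinja hits the no-op fast path in get_rendered above.
        yield match.group("name"), get_rendered(match.group("body"), ctx)
```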
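
The libyaml change from the third bullet follows the standard PyYAML pattern: CSafeLoader only exists when PyYAML was built against libyaml, so fall back to the pure-Python loader when it's missing (function name here is illustrative):

```python
import yaml

try:
    # Present only when PyYAML was compiled with libyaml bindings.
    from yaml import CSafeLoader as SafeLoader
except ImportError:
    from yaml import SafeLoader

def load_yaml_text(contents: str):
    return yaml.load(contents, Loader=SafeLoader)
```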

I had to add special validation to ref handling to catch non-hashable inputs earlier as an error; previously we did the search, but we'd fail to find the matching ref.

This PR has some unit test changes, but the goal is to avoid having to change the integration tests.

There is a bit of extra indirection at the resolve-caching level: the cache internally stores unique IDs and looks up the actual doc/source/node on the manifest at resolution time. This avoids having to rebuild the cache just because a node was compiled, and makes it a bit easier to handle the run_sql case where we insert our new node into the manifest (see the sketch below).
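
A hypothetical sketch of that indirection; `Node`, `Manifest`, and `RefCache` are simplified stand-ins for dbt's real types:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Node:
    unique_id: str
    name: str

@dataclass
class Manifest:
    nodes: Dict[str, Node]  # unique_id -> node

class RefCache:
    def __init__(self, manifest: Manifest) -> None:
        self.manifest = manifest
        self._by_name: Optional[Dict[str, List[str]]] = None

    def resolve(self, name: str) -> List[Node]:
        if self._by_name is None:
            # O(n) build, paid once on the first resolution.
            self._by_name = {}
            for uid, node in self.manifest.nodes.items():
                self._by_name.setdefault(node.name, []).append(uid)
        # O(1) lookup. Only unique IDs live in the cache; the actual node
        # comes from the manifest at resolution time, so recompiling a
        # node never invalidates cached entries.
        return [self.manifest.nodes[uid] for uid in self._by_name.get(name, [])]
```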

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot added the cla:yes label on May 21, 2020
@beckjake (Contributor, Author) commented May 21, 2020

Here's some analysis of the results of `dbt -r x.profile ls > /dev/null` for the project I described above. Note that I ran with the libyaml C bindings; without them, I'd expect to spend an extra 20-30s loading yaml.
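
For anyone reproducing this without snakeviz: assuming the file `-r` writes is a standard cProfile dump (the format snakeviz reads), the stdlib can summarize it too.

```python
import pstats

# Load the profile written by `dbt -r x.profile ls` and print the 20
# entries with the largest cumulative time.
stats = pstats.Stats("x.profile")
stats.sort_stats("cumulative").print_stats(20)
```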

Detailed breakdown

  • 145s spent total
  • 5s spent compiling the manifest
  • 10s spent writing the manifest to disk
    • unsurprisingly, this is almost all hologram's to_dict
  • 5s spent building the flat graph. It's a big graph!
    • this is also dominated by hologram's to_dict
  • 80s in ManifestLoader.load().
    • ~30s parsing docs.md files. I'm actually not sure that's so wild; there are a lot of them!
      • this is dominated by dbt's lex_for_blocks function (~25s)
        • which is itself dominated by re.search (~20s appears to be spent there)
        • so we should either call it less or find a way to make the regex searches faster.
    • ~25s spent on rendering all the fields in the schema.yml files. We recurse, which confuses the heck out of snakeviz, but almost all of this appears to be spent rendering fields in jinja
    • 4s on yaml, I think this is about as good as it gets
    • ~20s spent decoding the schema.yml entries in hologram and validating it
      • snakeviz totally falls apart here when it comes to saying what took what kind of time, but it looks like roughly an even split between validation and decoding.
      • dbt spent what appears to me to be a totally wild amount of time in best_match, which does error handling in jsonschema. This might be worth investigating, I guess.
        • we spent 12s total in hologram's validate, of which 8.5s was spent in jsonschema.best_match
        • I think this is because of how union parsing works in hologram. Maybe we can avoid the best_match call if we aren't going to care about it, or optimize union parsing some more.
  • 45s in create_manifest. Snakeviz does a bad job here again, but I'll give my best estimates.
    • For this project, this is completely dominated by the docs processing, because this project has like 45,000 doc calls. Like, what do you want even!
    • ~3s in render_template()
    • ~1-2s in get_environment()
    • ~40s in jinja's Environment.compile
      • ~40% is spent in jinja codegen
      • ~20% is spent in jinja parsing
      • ~20% is spent in python's built-in compile (on the result of parsing/codegen)

Summary

So, out of our 145s run, we spent

  • 70s rendering jinja
  • 25s pretending to be jinja

We also spent:

  • 19s in hologram's _encode_field
    • 80,164 non-recursive calls (including recursion, 1,473,895 calls)
  • 15s in hologram's _decode_field
    • 3,212 non-recursive calls (including recursion, 138,679 calls)
  • 24s iterating over dataclass fields in hologram
    • we called this 256,064 times, which seems like quite a lot

Some of the jinja time is actually hologram time: the 70s figure is what we spent inside jinja's get_rendered, and that can trigger hologram serialization calls, so it isn't valid to add those two groups together. That said, I'm pretty confident those two libraries account for almost all of the time spent in dbt ls, so any further optimizations will have to take that into account.

Jacob Beck added 7 commits on May 21, 2020:

  • …inja internally
  • Make docs parsing avoid a bunch of template nonsense and just use get_rendered
  • Add a check to get_rendered that skips the block if we can prove it should render to its input
    • but not in native mode
  • removed the internal `find_*_by_name` calls for things that use the cache, just do them through `resolve_*`
  • fix unit tests to point at `resolve_*` now
  • This is a significant performance improvement with a lot of yaml files!
  • And point tests to the new image
@beckjake beckjake requested review from kwigley and drewbanin May 21, 2020 22:55
@beckjake beckjake marked this pull request as ready for review May 21, 2020 22:55
@drewbanin (Contributor) left a comment:

One comment worth discussing, but otherwise, I'm thrilled with these improvements! Thanks so much for the thorough writeup too - that was a great and really informative read :)

On this snippet in the diff:

```python
# If this is desirable in the native env as well, we could handle the
# native=True case by passing the input string to ast.literal_eval, like
# the native renderer does.
if (
```
@drewbanin (Contributor):

the one implication of this is that invalid jinja will come back as plaintext instead of raising some sort of compiler error, e.g.:

{ { config(....) }}

I think that's ok, but we should keep this in mind if anyone writes in with a support question about something like this :)

@beckjake (Contributor, Author):

Well, one thing we could do there is also search for `[%}]}`. That's a pretty minor fix that might actually capture all the cases jinja would error about - I think jinja would just silently pass { { config(...) } } through as text, right?

@drewbanin (Contributor):

yep - I'd be in favor of also searching for `[%}]}`. It's a pretty minor failure mode, but I do think it's a good and positive change to make :)

I'm not at all concerned about

{ { config() } }

that's plaintext as far as I'm concerned
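
A sketch of the combined check being discussed here, folding `[%}]}` into the fast-path regex so that strings with stray closing delimiters still go through the real renderer (and can error there). Illustrative, not the exact follow-up change:

```python
import re

# Opening delimiters ("{{", "{%") or closing ones ("%}", "}}"). A string
# with only stray closing delimiters is likely malformed Jinja, so send
# it through the real renderer rather than returning it untouched.
_MAYBE_JINJA = re.compile(r"{[{%]|[%}]}")

def needs_rendering(string: str) -> bool:
    return _MAYBE_JINJA.search(string) is not None
```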

Jacob Beck added a commit: … render so users get compile errors instead of wrong data in that case
@beckjake (Contributor, Author):

GitHub is struggling a bit right now. I'm going to try to merge this since the tests all passed (except building wheels on Windows - that failed due to GitHub, don't care!) but the webhooks aren't firing. We'll see if the merge button works (or the comment button!).

@beckjake merged commit 6dac4c7 into dev/octavius-catto on May 22, 2020
@beckjake deleted the fix/docs-perf branch on May 22, 2020 at 20:16