Improve docs performance (#2480) #2481

Merged: 8 commits merged into dev/octavius-catto from fix/docs-perf on May 22, 2020

Conversation

@beckjake (Contributor) commented May 21, 2020

resolves #2480

Description

This PR improves docs/schema.yml parsing performance significantly.

On a sample project with ~1k sources and ~50k docs blocks, the result of `time dbt ls > /dev/null` improved from a bit over 15 minutes on my machine to a bit under 1.5 minutes.

There are five major parts to this PR:

  • In get_rendered, do a quick regex check to see if there are any `{%` or `{{` in the document. If not, rendering is a no-op, so we just return the input unchanged (see the first sketch after this list).
    • except in native mode, where we could also do this (we'd have to call ast.literal_eval like the native renderer does, though!). The ROI seemed low.
  • When parsing docs, instead of building a full template, rendering it, and doing a bunch of extra rendering, just examine the contents of the docs blocks. This way, docs parsing can use the get_rendered improvement; otherwise the `{% docs my_docs %}` tag itself triggers the regex match (second sketch below).
  • When loading yaml, use the CSafeLoader if it's available (third sketch below). To get it, users will have to have cython (I'm not 100% sure on this) and the libyaml headers on their system at pip install time. We could look into adding libyaml to Homebrew too.
  • To resolve refs/sources/docs, dbt now builds up a cache and digs through that. Constructing the cache takes O(n) the first time something is resolved, but subsequent lookups are O(1).
  • New!: We used to call build_flat_graph twice, which is unnecessary. We do it once after parsing and that is sufficient; there's no reason to call it again at runtime.
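
A minimal sketch of the fast-path check from the first bullet, using a plain Jinja environment; the `get_rendered` here is illustrative, not dbt's actual implementation:

```python
import re

from jinja2 import Environment

# "{{" or "{%" anywhere in the string means Jinja may have work to do.
_HAS_JINJA = re.compile(r"{[{%]")

def get_rendered(string: str, ctx: dict) -> str:
    # Fast path: with no Jinja delimiters, rendering is the identity,
    # so skip building a template entirely. (Native mode would still
    # need ast.literal_eval, so it wouldn't take this shortcut.)
    if _HAS_JINJA.search(string) is None:
        return string
    return Environment().from_string(string).render(ctx)
```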
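
And a sketch of the docs-parsing idea from the second bullet, reusing the `get_rendered` sketched above. dbt's real implementation extracts blocks with its own lexer (lex_for_blocks, discussed below); a regex stands in here purely to show the shape of the change, rendering only each block's body so the surrounding tags don't defeat the fast path:

```python
import re

# Illustrative only: find {% docs name %}...{% enddocs %} pairs and pull
# out the body; dbt actually uses its block lexer for this, not a regex.
DOCS_BLOCK = re.compile(
    r"{%\s*docs\s+(?P<name>\w+)\s*%}(?P<body>.*?){%\s*enddocs\s*%}",
    re.DOTALL,
)

def parse_docs(source: str, ctx: dict):
    for match in DOCS_BLOCK.finditer(source):
        # Rendering just the body means a docs block containing no other
        # Jinja hits the no-op fast path in get_rendered above.
        yield match.group("name"), get_rendered(match.group("body"), ctx)
```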
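
The libyaml change from the third bullet follows the standard PyYAML pattern: CSafeLoader only exists when PyYAML was built against libyaml, so fall back to the pure-Python loader when it's missing (function name here is illustrative):

```python
import yaml

try:
    # Present only when PyYAML was compiled with libyaml bindings.
    from yaml import CSafeLoader as SafeLoader
except ImportError:
    from yaml import SafeLoader

def load_yaml_text(contents: str):
    return yaml.load(contents, Loader=SafeLoader)
```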

I had to add special validation to ref handling to catch non-hashable inputs earlier as an error; previously we did the search, but we'd fail to find the matching ref.

This PR has some unit test changes, but the goal is to avoid having to change the integration tests.

There is a bit of extra indirection at the resolve-caching level: the cache internally stores unique IDs and looks up the actual doc/source/node on the manifest at resolution time. This avoids having to rebuild the cache just because a node was compiled, and makes it a bit easier to handle the run_sql case where we insert our new node into the manifest (see the sketch below).
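
A hypothetical sketch of that indirection; `Node`, `Manifest`, and `RefCache` are simplified stand-ins for dbt's real types:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Node:
    unique_id: str
    name: str

@dataclass
class Manifest:
    nodes: Dict[str, Node]  # unique_id -> node

class RefCache:
    def __init__(self, manifest: Manifest) -> None:
        self.manifest = manifest
        self._by_name: Optional[Dict[str, List[str]]] = None

    def resolve(self, name: str) -> List[Node]:
        if self._by_name is None:
            # O(n) build, paid once on the first resolution.
            self._by_name = {}
            for uid, node in self.manifest.nodes.items():
                self._by_name.setdefault(node.name, []).append(uid)
        # O(1) lookup. Only unique IDs live in the cache; the actual node
        # comes from the manifest at resolution time, so recompiling a
        # node never invalidates cached entries.
        return [self.manifest.nodes[uid] for uid in self._by_name.get(name, [])]
```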

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot added the cla:yes label on May 21, 2020
@beckjake (Contributor, Author) commented May 21, 2020

Here's some analysis of the results of `dbt -r x.profile ls > /dev/null` for the project I described above. Note that I ran with the libyaml C bindings; without them, I'd expect to spend an extra 20-30s loading yaml.
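
For anyone reproducing this without snakeviz: assuming the file `-r` writes is a standard cProfile dump (the format snakeviz reads), the stdlib can summarize it too.

```python
import pstats

# Load the profile written by `dbt -r x.profile ls` and print the 20
# entries with the largest cumulative time.
stats = pstats.Stats("x.profile")
stats.sort_stats("cumulative").print_stats(20)
```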

Detailed breakdown

  • 145s spent total
  • 5s spent compiling the manifest
  • 10s spent writing the manifest to disk
    • unsurprisingly, this is almost all hologram's to_dict
  • 5s spent building the flat graph. It's a big graph!
    • this is also dominated by hologram's to_dict
  • 80s in ManifestLoader.load().
    • ~30s parsing docs.md files. I'm actually not sure that's so wild; there are a lot of them!
      • this is dominated by dbt's lex_for_blocks function (~25s)
        • which is itself dominated by re.search (~20s appears to be spent there)
        • so we should either call it less or find a way to make the regex searches faster.
    • ~25s spent on rendering all the fields in the schema.yml files. We recurse, which confuses the heck out of snakeviz, but almost all of this appears to be spent rendering fields in jinja
    • 4s on yaml, I think this is about as good as it gets
    • ~20s spent decoding the schema.yml entries in hologram and validating it
      • snakeviz totally falls apart here when it comes to saying what took what kind of time, but it looks like roughly an even split between validation and decoding.
      • dbt spent what appears to me to be a totally wild amount of time in best_match, which does error handling in jsonschema. This might be worth investigating, I guess.
        • we spent 12s total in hologram's validate, of which 8.5s was spent in jsonschema.best_match
        • I think this is because of how union parsing works in hologram. Maybe we can avoid the best_match call if we aren't going to care about it, or optimize union parsing some more.
  • 45s in create_manifest. Snakeviz does a bad job here again, but I'll give my best estimates.
    • For this project, this is completely dominated by the docs processing, because this project has like 45,000 doc calls. Like, what do you want even!
    • ~3s in render_template()
    • ~1-2s in get_environment()
    • ~40s in jinja's Environment.compile
      • ~40% is spent in jinja codegen
      • ~20% is spent in jinja parsing
      • ~20% is spent in python's built-in compile (on the result of parsing/codegen)

Summary

So, out of our 145s run, we spent

  • 70s rendering jinja
  • 25s pretending to be jinja

We also spent:

  • 19s in hologram's _encode_field
    • 80,164 non-recursive calls (including recursion, 1,473,895 calls)
  • 15s in hologram's _decode_field
    • 3,212 non-recursive calls (including recursion, 138,679 calls)
  • 24s iterating over dataclass fields in hologram
    • we called this 256,064 times, which seems like quite a lot

Some of the jinja time is actually hologram time: the 70s figure is what we spent inside jinja's get_rendered, and that can trigger hologram serialization calls, so it isn't valid to add those two groups together. That said, I'm pretty confident those two libraries account for almost all of the time spent in dbt ls, so any further optimizations will have to take that into account.

Jacob Beck added 7 commits on May 21, 2020:

  • …inja internally
  • Make docs parsing avoid a bunch of template nonsense and just use get_rendered
  • Add a check to get_rendered that skips the block if we can prove it should render to its input
    • but not in native mode
  • removed the internal `find_*_by_name` calls for things that use the cache, just do them through `resolve_*`
  • fix unit tests to point at `resolve_*` now
  • This is a significant performance improvement with a lot of yaml files!
  • And point tests to the new image
@beckjake beckjake requested review from kwigley and drewbanin May 21, 2020 22:55
@beckjake beckjake marked this pull request as ready for review May 21, 2020 22:55
@drewbanin (Contributor) left a comment:

One comment worth discussing, but otherwise, I'm thrilled with these improvements! Thanks so much for the thorough writeup too - that was a great and really informative read :)

On this snippet in the diff:

```python
# If this is desirable in the native env as well, we could handle the
# native=True case by passing the input string to ast.literal_eval, like
# the native renderer does.
if (
```
@drewbanin (Contributor):

the one implication of this is that invalid jinja will come back as plaintext instead of raising some sort of compiler error, e.g.:

{ { config(....) }}

I think that's ok, but we should keep this in mind if anyone writes in with a support question about something like this :)

@beckjake (Contributor, Author):

Well, one thing we could do there is also search for `[%}]}`. That's a pretty minor fix that might actually capture all the cases jinja would error about - I think jinja would just silently pass { { config(...) } } through as text, right?

@drewbanin (Contributor):

yep - I'd be in favor of also searching for `[%}]}`. It's a pretty minor failure mode, but I do think it's a good and positive change to make :)

I'm not at all concerned about

{ { config() } }

that's plaintext as far as I'm concerned
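
A sketch of the combined check being discussed here, folding `[%}]}` into the fast-path regex so that strings with stray closing delimiters still go through the real renderer (and can error there). Illustrative, not the exact follow-up change:

```python
import re

# Opening delimiters ("{{", "{%") or closing ones ("%}", "}}"). A string
# with only stray closing delimiters is likely malformed Jinja, so send
# it through the real renderer rather than returning it untouched.
_MAYBE_JINJA = re.compile(r"{[{%]|[%}]}")

def needs_rendering(string: str) -> bool:
    return _MAYBE_JINJA.search(string) is not None
```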

Jacob Beck added a commit: … render so users get compile errors instead of wrong data in that case
@beckjake (Contributor, Author):

GitHub is struggling a bit right now. I'm going to try to merge this since the tests all passed (except building wheels on Windows - that failed due to GitHub, don't care!) but the webhooks aren't firing. We'll see if the merge button works (or the comment button!).

@beckjake merged commit 6dac4c7 into dev/octavius-catto on May 22, 2020
@beckjake deleted the fix/docs-perf branch on May 22, 2020 at 20:16