Improve docs performance (#2480) #2481
Here's some analysis of the results of …
Detailed breakdown
Summary
So, out of our 145s run, we spent …
We also spent: …
Some of the jinja time is actually hologram time, since the 70s is what we spent in jinja's …
…inja internally
Make docs parsing avoid a bunch of template nonsense and just use get_rendered
Add a check to get_rendered that skips the block if we can prove it should render to its input - but not in native mode
Removed the internal `find_*_by_name` calls for things that use the cache; just do them through `resolve_*`
Fix unit tests to point at `resolve_*` now
This is a significant performance improvement with a lot of yaml files!
And point tests to the new image
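The yaml speedup in these commits presumably comes from libyaml, which the PR description also mentions. A common way to use libyaml from Python, assuming PyYAML was built with its C bindings, is to prefer `yaml.CSafeLoader` and fall back to the pure-Python `yaml.SafeLoader`. The helper name `parse_yaml` is hypothetical and is not dbt's actual function:

```python
import yaml

# Prefer libyaml's C-accelerated loader when PyYAML was built against it;
# otherwise fall back to the pure-Python safe loader.
try:
    from yaml import CSafeLoader as SafeLoader
except ImportError:
    from yaml import SafeLoader


def parse_yaml(contents: str):
    """Hypothetical helper: parse YAML text with the fastest safe loader available."""
    return yaml.load(contents, Loader=SafeLoader)
```

`yaml.__with_libyaml__` reports whether the C bindings are available, which is useful when checking that a given install (pip wheel, Docker image, Homebrew) actually gets the fast path.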
One comment worth discussing, but otherwise, I'm thrilled with these improvements! Thanks so much for the thorough writeup too - that was a great and really informative read :)
# If this is desirable in the native env as well, we could handle the
# native=True case by passing the input string to ast.literal_eval, like
# the native renderer does.
if (
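The code comment suggests a native-mode variant that was not implemented here. A minimal sketch of that idea, assuming the fast path only needs to reproduce a native renderer's result for strings that contain no Jinja markers: skip the template engine and try `ast.literal_eval`, keeping the original string if it is not a Python literal. `render_native_sketch`, `full_render` and the regex are hypothetical names for illustration, not dbt's code:

```python
import ast
import re
from typing import Any, Callable

# Hypothetical pattern: "{{" opens a Jinja expression, "{%" opens a statement.
_JINJA_OPEN = re.compile(r"\{[{%]")


def render_native_sketch(value: str, full_render: Callable[[str], Any]) -> Any:
    """Hypothetical native-mode fast path (not part of this PR)."""
    if _JINJA_OPEN.search(value):
        # Real template syntax: hand it to the full renderer.
        return full_render(value)
    try:
        # Mirror the literal_eval fallback used by Jinja's native types:
        # "42" -> 42, "[1, 2]" -> [1, 2].
        return ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError, MemoryError):
        # Not a valid Python literal (TypeError covers e.g. "{[1]: 2}"),
        # so the value stays a plain string.
        return value
```

For example, `render_native_sketch("[1, 2, 3]", render)` returns the list `[1, 2, 3]` without ever calling `render`, while `"hello world"` comes back unchanged as a string.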
The one implication of this is that invalid Jinja will return plain text instead of some sort of compiler error, e.g.:
{ { config(....) }}
I think that's OK, but we should keep this in mind if anyone writes in with a support question about something like this :)
Well, one thing we could do there is also search for `[%}]}`. That's a pretty minor fix that might actually capture all the cases Jinja would error about. I think Jinja would just silently pass `{ { config(...) } }` as text, right?
Yep, I'd be in favor of also searching for `[%}]}`. It's a pretty minor failure mode, but I do think it's a good change to make :)
I'm not at all concerned about `{ { config() } }`; that is plain text as far as I'm concerned.
… render so users get compile errors instead of wrong data in that case
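The check discussed in this thread can be sketched as follows, assuming the fast path looks for both opening (`{{`, `{%`) and closing (`}}`, `%}`) markers and skips rendering only when neither appears and the environment is not native. `get_rendered_sketch` and its parameters are hypothetical and do not reproduce dbt's actual `get_rendered` signature:

```python
import re
from typing import Any, Callable

# Hypothetical patterns, not dbt's actual code.
# "{{" or "{%" would open a Jinja expression or statement.
_OPEN = re.compile(r"\{[{%]")
# "}}" or "%}" would close one. Checking these too means input such as
# "{ { config(...) }}" goes to the real renderer instead of being
# returned unchanged by the fast path.
_CLOSE = re.compile(r"[%}]\}")


def get_rendered_sketch(
    text: str, render: Callable[[str], Any], native: bool = False
) -> Any:
    """Skip Jinja when the output is provably identical to the input."""
    # Native mode can change the result's type (e.g. "1" -> 1), so the
    # shortcut only applies to the regular string-rendering environment.
    if not native and not (_OPEN.search(text) or _CLOSE.search(text)):
        return text
    return render(text)
```

Under these assumptions `"{ { config() } }"` is returned unchanged, matching the reviewers' view that it is plain text, while `"{ { config(....) }}"` reaches `render` because of its trailing `}}`.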
GitHub is struggling a bit right now. I'm going to try to merge this since the tests all passed (except building wheels on Windows - failed due to GitHub, don't care!) but the webhooks aren't firing. We'll see if the merge button works (or the comment button!).
resolves #2480
Description
Improve the docs/schema.yml parsing performance pretty significantly.
On a sample project with ~1k sources and ~50k docs blocks, the result of
time dbt ls > /dev/null
improved from a bit over 15 minutes on my machine to a bit under 1.5 minutes.

There are 45 major parts to this PR:
- … `get_rendered`, do a quick regex check to see if there are any `{%` or `{{` in the document. If not, we just return that value. … `ast.literal_eval` like the native renderer, though!). The ROI seemed low.
- … `get_rendered` improvements (otherwise the `{% docs my_docs %}...` triggers the match)
- … `pip install` time. We could see about adding libyaml to homebrew too I guess.

I had to add special validation to `ref` handling to catch non-hashable inputs earlier as an error; previously we did the search but we'd fail to find the matching ref.

This PR has some unit test changes, but the goal is to not have to change the integration tests here.
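The `ref` validation note can be illustrated with a small sketch, assuming the cache is a dict keyed on the ref arguments: a dict lookup with an unhashable key (such as a list) raises a bare `TypeError`, so the arguments are checked up front and a clearer error is raised instead. `RefError`, `validate_ref_args`, `resolve_ref` and the `(package, name)` key layout are hypothetical, not dbt's implementation:

```python
from typing import Any, Dict, Optional, Tuple


class RefError(Exception):
    """Hypothetical stand-in for a user-facing compilation error."""


def validate_ref_args(args: Tuple[Any, ...]) -> None:
    # ref() accepts (name) or (package, name).
    if len(args) not in (1, 2):
        raise RefError(f"ref() takes 1 or 2 arguments ({len(args)} given)")
    for arg in args:
        try:
            hash(arg)
        except TypeError:
            raise RefError(f"ref() arguments must be hashable, got {arg!r}") from None


def resolve_ref(cache: Dict[Tuple[Optional[str], str], str], *args: Any) -> Optional[str]:
    validate_ref_args(args)
    package = args[0] if len(args) == 2 else None
    name = args[-1]
    # Without the validation above, an unhashable name would make this
    # dict lookup fail with an unhelpful TypeError.
    return cache.get((package, name))
```

With a linear search, an unhashable argument just never matched anything; with a dict-backed cache it would crash, which is why the check has to happen before the lookup.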
There is a bit of extra indirection at the resolve-caching level: the cache logic internally stores unique IDs and looks them up in the manifest at resolution time to pick out the actual doc/source/node. This avoids having to rebuild the cache just because a node was compiled, and it makes the `run_sql` case, where we insert our new node into the manifest, a bit easier to handle.

Checklist
… CHANGELOG.md and added information about my change to the "dbt next" section.
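The resolve-caching indirection described in the PR body (store unique IDs, look up the manifest at resolution time) can be sketched like this. `Node`, `Manifest` and `NameCache` are hypothetical toy types for illustration, not dbt's real `Manifest` or cache classes:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    unique_id: str
    name: str
    compiled: bool = False


@dataclass
class Manifest:
    nodes: Dict[str, Node] = field(default_factory=dict)


class NameCache:
    """Maps a name to a unique_id; the node itself always comes from the manifest."""

    def __init__(self, manifest: Manifest) -> None:
        self._manifest = manifest
        self._ids: Dict[str, str] = {
            node.name: uid for uid, node in manifest.nodes.items()
        }

    def add(self, node: Node) -> None:
        # E.g. a new node inserted into the manifest, as in the run_sql case.
        self._manifest.nodes[node.unique_id] = node
        self._ids[node.name] = node.unique_id

    def resolve(self, name: str) -> Optional[Node]:
        uid = self._ids.get(name)
        if uid is None:
            return None
        # Look the node up at resolution time, so a replacement stored under
        # the same unique_id (e.g. a compiled copy) is returned without
        # rebuilding the cache.
        return self._manifest.nodes.get(uid)
```

Because the cache holds only IDs, swapping a node in the manifest for its compiled version needs no cache update; only adding a brand-new name requires touching the cache.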