Feature/improve rpc compile performance (#1824) #1830

Merged


beckjake
Contributor

Improve RPC startup/sighup performance by only parsing, not compiling
Fixes #1824

On a local demo case, server ready time improved from 24s to 14s, but it's a weird test project so don't get too excited. I'm interested to see real-world data...

  • Renamed dbt.loader.GraphLoader to dbt.parser.manifest.ManifestParser, because that's what it is.
  • Reorganized RPC tasks into task.rpc
  • Moved RPC server into task.rpc.server
  • Reorganized RPC handling
    • refactored how calls and RPC tasks are related
    • separated tasks from methods
  • Manifests now have an expect method that does a node lookup and raises an internal error if it's missing
    • use this in situations where unique_id not being in nodes is an unrecoverable internal logic error
  • Manifest metadata is calculated in a much more reasonable fashion now
  • Move some circular imports into the methods/functions that use them.
  • Finally made mypy happy with the adapter factory and dbt deps
  • got mypy to a point where it passes on core - not everything is annotated but everything in core is at least checked to some degree.
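
To illustrate the `expect` bullet above, here is a minimal sketch of a lookup that treats a missing `unique_id` as an internal logic error. The `Manifest`, `Node`, and `InternalError` classes below are simplified, hypothetical stand-ins written for this example, not the actual dbt classes, and the error message is made up.

```python
from dataclasses import dataclass, field
from typing import Dict


class InternalError(Exception):
    """Hypothetical stand-in for an internal (non-user-facing) error type."""


@dataclass
class Node:
    unique_id: str


@dataclass
class Manifest:
    # Simplified: the real manifest holds far more than a node mapping.
    nodes: Dict[str, Node] = field(default_factory=dict)

    def expect(self, unique_id: str) -> Node:
        """Return the node for unique_id; a missing node is a logic error."""
        if unique_id not in self.nodes:
            raise InternalError(
                f"Expected node {unique_id} not found in manifest"
            )
        return self.nodes[unique_id]


manifest = Manifest(nodes={"model.demo.a": Node("model.demo.a")})
found = manifest.expect("model.demo.a")
```

Callers that already know the node must exist can use `expect` instead of `nodes.get(...)` plus a manual `None` check, so a broken invariant surfaces at the lookup rather than later.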

I've rebased extensively to try to keep the various commits sensible and digestible chunks of the PR, though tests may be out of whack in some places. That may make it easier to review...

Jacob Beck added 10 commits October 14, 2019 11:37
Create new helper function dbt.perf_utils.get_full_manifest
Update task.runnable accordingly
Update RPC server accordingly
initial refactoring of adapter factory stuff
Move HasCredentials protocol into connection contract and use that in the base connection
RemoteCallableResult -> RPCResult
RemoteCallable -> RemoteMethod
 - move some things from RPCTask -> RemoteMethod
   - recursive_subclasses classmethod
things in core/dbt/rpc now are all based on RemoteMethods, not RPCTasks
The _sql tasks now compile any ref'ed CTE chains at RPC call time
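
For readers unfamiliar with the pattern, a `recursive_subclasses` classmethod can be sketched roughly as below. This is an assumption about the general shape, not the code from this commit; the subclasses `SqlMethod`, `RunSqlMethod`, and `StatusMethod` are hypothetical names used only for illustration.

```python
from typing import List, Type


class RemoteMethod:
    @classmethod
    def recursive_subclasses(cls) -> List[Type["RemoteMethod"]]:
        """Collect direct subclasses and, recursively, their subclasses."""
        found: List[Type["RemoteMethod"]] = []
        for sub in cls.__subclasses__():
            found.append(sub)
            found.extend(sub.recursive_subclasses())
        return found


# Hypothetical method classes, defined only to exercise the sketch.
class SqlMethod(RemoteMethod):
    pass


class RunSqlMethod(SqlMethod):
    pass


class StatusMethod(RemoteMethod):
    pass


method_names = {m.__name__ for m in RemoteMethod.recursive_subclasses()}
```

A registry built this way picks up every method class that has been imported, including indirect subclasses, without a hand-maintained list.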

Give RPC tasks their own folder
 - task/rpc_server -> task/rpc/server
 - task/remote -> task/rpc/{project_commands,sql_commands,base}

Linker enhancements:
  - Expose subset graph building so multiple methods can use it
  - Expose a way for the linker to provide an iterable of the ephemeral ancestors of a node
     - it's guaranteed to be ordered (so nested CTEs behave)
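
The ordering guarantee can be sketched as a post-order walk over ephemeral parents, so each ancestor is yielded only after its own ephemeral ancestors. This is a simplified illustration on a plain dict, not the linker's actual implementation; the function name, the `parents` mapping, and the choice to stop at non-ephemeral nodes are all assumptions made for the example.

```python
from typing import Dict, Iterator, List, Set


def ephemeral_ancestors(
    node: str,
    parents: Dict[str, List[str]],
    ephemeral: Set[str],
) -> Iterator[str]:
    """Yield ephemeral ancestors of node, dependencies first."""
    seen: Set[str] = set()

    def visit(current: str) -> Iterator[str]:
        for parent in parents.get(current, []):
            # Assumption: a non-ephemeral parent ends the chain, since it
            # is referenced by name rather than inlined as a CTE.
            if parent in ephemeral and parent not in seen:
                seen.add(parent)
                yield from visit(parent)  # emit the parent's ancestors first
                yield parent

    yield from visit(node)


# Hypothetical graph: report -> eph_b -> (eph_a, table_x)
parents = {
    "report": ["eph_b"],
    "eph_b": ["eph_a", "table_x"],
    "eph_a": [],
}
ordered = list(ephemeral_ancestors("report", parents, {"eph_a", "eph_b"}))
```

Emitting `eph_a` before `eph_b` means the CTE for `eph_a` is defined before the CTE that selects from it.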
Some circular import cleanups
remove is_type function, just compare to resource_type
Add type checking for dbt deps
@beckjake beckjake requested a review from drewbanin October 14, 2019 18:06
@cla-bot cla-bot bot added the cla:yes label Oct 14, 2019
Contributor

@drewbanin drewbanin left a comment

This is really great! Empirically, it takes the time to hup the rpc server from ~26 seconds all the way down to ~1 second for a typical-looking dbt project :D

While playing around with this PR, I did notice that dbt is logging ephemeral models as "running" in the poll method logs, which probably isn't exactly right. I don't think that change was introduced in this PR though, so I'll create a new issue to address that.

Were the changes around deps.py all related to mypy? Those look non-trivial to me...

@@ -415,6 +387,7 @@ def patch_nodes(self, patches):
'not found or is disabled').format(patch.name)
)

# TODO: why is this here?
Contributor

👁 👁

Contributor Author

I really don't know. I'm going to remove it and push and if the tests pass, we'll call it good to go?

@beckjake
Contributor Author

> Were the changes around deps.py all related to mypy? Those look non-trivial to me...

It's actually mostly type annotations! I did rename some things for clarity.

@beckjake beckjake merged commit 6287d6d into dev/louisa-may-alcott Oct 14, 2019
@beckjake beckjake deleted the feature/improve-rpc-compile-performance branch October 14, 2019 23:00
Successfully merging this pull request may close these issues.

Improve rpc server compilation performance