Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add performance profiling to dbt #1001

Closed
drewbanin opened this issue Sep 14, 2018 · 6 comments
Closed

Add performance profiling to dbt #1001

drewbanin opened this issue Sep 14, 2018 · 6 comments
Assignees
Milestone

Comments

@drewbanin
Copy link
Contributor

Feature

Feature description

Application performance is important, but it can be difficult to pin down the performance impact of code changes. Profiling will help us benchmark dbt runtimes, and then evaluate the performance impact of prospective code changes.

I'm sure there are lots of low-hanging fruit, as we haven't dedicated a ton of brain cycles to performance recently. Adding profiling is the first step in any prospective performance optimization.

Who will this benefit?

Folks who use dbt; folks who have important things to do with their time; whiterose

@cmcarthur
Copy link
Member

test this on windows

@cmcarthur cmcarthur added this to the Stephen Girard milestone Oct 17, 2018
@drewbanin drewbanin modified the milestones: Stephen Girard, Grace Kelly Nov 2, 2018
@drewbanin
Copy link
Contributor Author

@beckjake can you share a link to the profiler you mentioned earlier today? I'd be cool with either merging #1020 as-is, or potentially using a different profiler if it makes our lives easier.

@beckjake
Copy link
Contributor

beckjake commented Nov 14, 2018 via email

@beckjake
Copy link
Contributor

I've looked at it some more, I think we should just merge #1020 with some minor modifications - in particular I would like to add a flag (--single-thread?, probably suppressed in --help output) to change from using a thread pool's map to just using the default map(), keeping all execution in the main thread. Obviously that'll be super slow but at least we'll get actual timing information. We can then point people at austin if they want more detailed/useful threaded perf information, since it requires no actual python imports, etc.

My reasoning here is that at some point (maybe even as soon as #1128 merges!) we'll have hit all the low-hanging fruit in parsing/setup and we'll want performance information from dbt execution, which #1020 can't currently give us.

@drewbanin
Copy link
Contributor Author

Good call @beckjake. Do you feel well-equipped to pick up #1020?

There's some overlap in the --single-thread functionality you're describing with #813. I don't know that we have to do both of those simultaneously, but it could be useful to keep it in mind.

@beckjake
Copy link
Contributor

Yeah, I'm doing it now (I think it will be very simple, I'll open a PR against the profiler branch soon).

I don't think #813 will interact with this, as what I'm planning should just preserve any existing order and not care about the node selector behavior at all (which is what I think matters here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants