-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature CT-426 manage schemas #5392
Feature CT-426 manage schemas #5392
Conversation
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin. CLA has not been signed by users: @bneijt |
06fa34d
to
5b4be03
Compare
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin. CLA has not been signed by users: @bneijt |
2 similar comments
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin. CLA has not been signed by users: @bneijt |
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin. CLA has not been signed by users: @bneijt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really appreciative of you both for opening up the PR, in which we can ground more concrete discussion. I took a first look through and left some initial comments.
I sense that the current implementation assumes certain patterns for how users are leveraging custom schemas across environments. I think it's good to be opinionated — it's an optional feature, people don't need to use it — so long as we're clear on what those opinions are. The most successful version of this feature asks us to also provide even clearer recommendations for how users ought to be structuring their DWH (logical database/schema organization), so that dbt knows how to clean up after them.
53fbfbc
to
3a17595
Compare
@jtcohen6 could you review this PR? I think that the only missing part is the docs. But for that I would like to know whether the feature is finalized as is. |
@agoblet Sorry for the delay! I've been away on vacation, and heads down getting ready for Coalesce (this week!). This is definitely on my list to review later in the month. Thanks for the great conversation so far. If we can get this merge-ready, I would love to include it in an upcoming release. Next step here is on me to review and provide feedback. |
@jtcohen6 any news? Regarding the docs update, I think we only need to extend the dbt_project.yml spec |
@jtcohen6 what is blocking us from moving forward with this? How can we solve it? |
@agoblet Sorry for the radio silence - this is still on me! The remaining blocker at this point is me taking the time to sit down with the latest set of changes, and provide a functional review, including product/UX refinement as appropriate. Once I've signed off, I'll remove the Apologies that the limiting reagent here has been my time. We're still working through the backlog of new issues that came in during Coalesce. To provide a sense of timing: Since this is a new feature, it would be included in the next minor version release after it's merged. The next minor version we have scheduled is v1.4, for release in January — so, we still have some leeway. I'm really appreciative of your patience in the meantime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay! I've had a chance to sit down with this for a few hours. There's a lot of promise here! I've also left a lot of feedback. Most of it is quite tactical, and I think we've got a solid path to the finish line.
# dbt_project.yml
managed-schemas:
# if using postgres, 'database' should always be default / {target.database}
- schema: # matches models without a custom schema configuration --> *probably* {target.schema}, but could vary if using a custom 'generate_schema_name' macro
prune-models: warn
- schema: another_schema # matches models configured with 'schema: another_schema' --> where they actually land will vary based on 'generate_schema_name' macro!
prune-models: warn
# What about when:
# - using generate_schema_name_for_env, and every model has the same 'resolved' schema, but with different 'prune' actions?
# - we just deleted the last model with the 'another_schema' config? should we raise a warning? still prune?
# - ... lots of other edge cases! ...
That feedback is in two forms: I dropped comments inline in this PR; and I also pushed up a commit with a whole bunch of code comments and illustrative changes (though not always the prettiest ones): 3a1c689
As far as next steps:
- Once this has passed functional review, one of the Core engineers will give it code review as well. The good news is that the changes here are relatively self-contained. Still, tasks are a tricky business; while any significant refactoring should be out of scope, there might be some cleanup / reorg in order.
- @dbeatty10 has offered to provide additional UX feedback. (Doug, I'd be happy to get your thoughts on any of the comments I left on this PR, or thoughts expressed in the commit linked above!)
- We'll need to identify follow-on work for:
- Adding
manage
to the new CLI (currently on feature branch) - Supporting this task in
dbt-rpc
(hopefully viacli_args
method, no work needed) - Supporting this task in
dbt-server
(WIP - @ChenyuLInx, could be a nice concrete exercise for our thinking about the handoff points, as described in [CT-1583][Feature] Utilize new way of providing manifest for tasks dbt-server#127)
- Adding
@bneijt @agoblet The question to answer first: Is this something you're still interested in working on, and carrying over the finish line? Or would you rather one of us to pick it up from here? (I completely understand if your interest & capacity have shifted in the months since you last worked on this—the delay is on me.)
warn_or_error( | ||
f"Found unused model {target_database}.{target_schema}.{target_identifier}" | ||
) | ||
elif target_action == PruneModelsAction.DROP: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For visibility: What do you think about some info-level logging, right before dropping each schema? (The debug-level logs will probably include the SQL that's submitted by adapter.drop_relation
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agoblet will look into this
models_in_database[(target_database, target_schema, target_identifier)] | ||
) | ||
|
||
def _assert_schema_uniqueness(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than doing this validation check here, let's do it at parse time, while dbt is reading + validating dbt_project.yml
. We can do that by defining a custom validate
method on SchemaManagementConfiguration
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agoblet will look into this
@jtcohen6 thanks for the valuable feedback! We are still interested in finishing this. I will start fixing some of these things hopefully next week. What is your idea about the timeline for this PR? Do we still want to add it to the january release, or do you think that is too tight? |
Very excited about your work on this @agoblet ! I defer to @jtcohen6, but we might be too tight for the January release -- after today, we'll all be out of the office until January 3rd, and we're planning on cutting the release candidate (RC) for 1.4 just a handful of days later. The final release is planned for about two weeks after the RC. |
@agoblet Agree that the timing here might be tight for v1.4, since we're planning to put out the release candidate (= lock any code changes) next week. Apologies again for the delay on my side. That's okay! I'd still be very excited to merge this PR, when it's ready, without adding undue rush. Our current plan (hope) is to put out an early beta of v1.5 in February (final release scheduled for April), including a bunch of internal API changes that are currently on a feature branch ( One big thing that remains is merging / rebasing changes in |
Hey @agoblet, Thanks for contributing this! We are focusing on getting the |
Great, thanks @ChenyuLInx , we did not rebase yet, will do that once the new cli is on main. I will also try to fix the remaining comments next week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bneijt We recently merged the feature branch! Let me know if there's any help needed moving this PR to the click framework!
core/dbt/main.py
Outdated
@@ -448,6 +449,21 @@ def _build_debug_subparser(subparsers, base_subparser): | |||
return sub | |||
|
|||
|
|||
def _build_manage_subparser(subparsers, base_subparser): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add noop and warn tests improve tests rename tests add view dropping test add unmanaged schema test make tests more dry Delete tmp.csv Manage schemas is optional Add --target-path as a CLI option. (dbt-labs#5402) Include py.typed in MANIFEST.in (dbt-labs#5703) This enables packages that install dbt-core from pypi to use mypy. wip: move manage logic to separate command Add manage command
37da39d
to
05fcaa2
Compare
I left the organisation I where I started this work and have not used dbt for a few years now. I nolonger wish to contribute to this and I’m closing this pr. |
resolves #4957 CT-426
Description
Add support to manage (drop/monitor) schema for unmanaged models.
TODO because of draft
Checklist
changie new
to create a changelog entry