-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GitHub action to check dbt yaml key sort order #578
Add GitHub action to check dbt yaml key sort order #578
Conversation
4266d13
to
071a60c
Compare
file_path = os.path.join(root, file) | ||
# Temporary to solve the models/model/schema.yaml problem, there are | ||
# different partitions of columns under the columns: key | ||
# Not sure if it is worth it to build logic to handle that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The column list in this file is partitioned into different sets of column within the same columns:
key. This makes the alphanumeric sorting quite complicate. For now my most simple solution is just to remove these files from consideration. I'm curious to hear thoughts on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Suggestion, non-blocking] I suspect we'll run into more cases like this one where we want to disable this check, and it'll make maintenance difficult if we have to update the script every time to handle it. My suggestion for a quick and simple approach is that we update models/model/schema.yml
to start with the comment # disable-check-sort-order
, and then update this script to check for that at the start of each file as a way to disable the check.
At some point I suspect we'll also want to disable individual lines, which we could also do by implementing similar logic in check_yaml_file
. But for now, I think it's good enough to start with only the feature we need.
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
I believe I added all of the comments. This PR is starting to get a bit long so let me know if I missed anything. I started on a python script refactor but realized it would take me a bit of time and testing so I didn't go through with it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found one small piece of feedback that I think got missed on the last pass, but otherwise this is good to go!
dbt/models/location/schema.yml
Outdated
- pin10 | ||
- year | ||
# GEOIDs are the correct length | ||
# GEOIDs are the correct length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wagnerlmichael Seeems like this stray indent is still here, right?
This PR takes care of #209. We add one action which checks for each set of files in the issue checklist. I set up what I think to be a pretty readable output that tells the user what is failing and where when there is data that is out of order.
Here is an example. At the bottom we have each file with unsorted data and the number of times it occurs per list that needs to be sorted -
file.yaml (# lists needed to be corrected)
. Above that we see the full list that is incorrect and arrows pointing to out of order observations.The is a huge diff here but that is mostly due to me sorting and re-configuring yaml anchors to get the sorting fixed. The key files are
dbt/scripts/check_sort_dbt_yaml_files.py
and.github/workflows/check_sort_dbt_yaml_files.yaml
dbt/scripts/check_sort_dbt_yaml_files.py
has a bunch of functions that handle different files to sort, and they are all called incheck_all_files()