Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GitHub action to check dbt yaml key sort order #578

Merged

Conversation

wagnerlmichael
Copy link
Member

@wagnerlmichael wagnerlmichael commented Aug 27, 2024

This PR takes care of #209. We add one action which checks for each set of files in the issue checklist. I set up what I think to be a pretty readable output that tells the user what is failing and where when there is data that is out of order.

Here is an example. At the bottom we have each file with unsorted data and the number of times it occurs per list that needs to be sorted - file.yaml (# lists needed to be corrected). Above that we see the full list that is incorrect and arrows pointing to out of order observations.

. . .
. . .
----------------------------------------
In file: dbt/models/sale/docs.md
Headings not sorted:
- # flag
- # foreclosure
- # parameter
---> # group_mean
- # metadata
- # mydec
- # vw_ias_salesval_upload
----------------------------------------
The following files have unsorted columns:
dbt/models/ccbor/schema.yml (1)
dbt/models/iasworld/schema/iasworld.dweldat.yml (1)

The following files have unsorted data tests:
dbt/models/iasworld/schema/iasworld.dweldat.yml (3)

The following markdown files have unsorted headings:
dbt/models/other/docs.md (1)
dbt/models/sale/docs.md (1)

The following columns.md files have unsorted '##' headings:
dbt/models/proximity/columns.md (1)

The following shared_columns.md files have unsorted headings:
dbt/models/shared_columns.md (1)

The is a huge diff here but that is mostly due to me sorting and re-configuring yaml anchors to get the sorting fixed. The key files are dbt/scripts/check_sort_dbt_yaml_files.py and .github/workflows/check_sort_dbt_yaml_files.yaml

dbt/scripts/check_sort_dbt_yaml_files.py has a bunch of functions that handle different files to sort, and they are all called in check_all_files()

@wagnerlmichael wagnerlmichael linked an issue Aug 27, 2024 that may be closed by this pull request
5 tasks
@wagnerlmichael wagnerlmichael changed the title 209 add GitHub action to check dbt yaml key sort order Add GitHub action to check dbt yaml key sort order Aug 27, 2024
@wagnerlmichael wagnerlmichael force-pushed the 209-add-github-action-to-check-dbt-yaml-key-sort-order branch from 4266d13 to 071a60c Compare August 27, 2024 16:43
Comment on lines 82 to 85
file_path = os.path.join(root, file)
# Temporary to solve the models/model/schema.yaml problem, there are
# different partitions of columns under the columns: key
# Not sure if it is worth it to build logic to handle that
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The column list in this file is partitioned into different sets of column within the same columns: key. This makes the alphanumeric sorting quite complicate. For now my most simple solution is just to remove these files from consideration. I'm curious to hear thoughts on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion, non-blocking] I suspect we'll run into more cases like this one where we want to disable this check, and it'll make maintenance difficult if we have to update the script every time to handle it. My suggestion for a quick and simple approach is that we update models/model/schema.yml to start with the comment # disable-check-sort-order, and then update this script to check for that at the start of each file as a way to disable the check.

At some point I suspect we'll also want to disable individual lines, which we could also do by implementing similar logic in check_yaml_file. But for now, I think it's good enough to start with only the feature we need.

@wagnerlmichael
Copy link
Member Author

I believe I added all of the comments. This PR is starting to get a bit long so let me know if I missed anything. I started on a python script refactor but realized it would take me a bit of time and testing so I didn't go through with it

Copy link
Contributor

@jeancochrane jeancochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one small piece of feedback that I think got missed on the last pass, but otherwise this is good to go!

- pin10
- year
# GEOIDs are the correct length
# GEOIDs are the correct length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wagnerlmichael Seeems like this stray indent is still here, right?

@wagnerlmichael wagnerlmichael merged commit 27641cd into master Sep 11, 2024
9 checks passed
@wagnerlmichael wagnerlmichael deleted the 209-add-github-action-to-check-dbt-yaml-key-sort-order branch September 11, 2024 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add GitHub Action to check dbt YAML key sort order
2 participants