Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor ratio stats Glue job to dbt python model #422
Refactor ratio stats Glue job to dbt python model #422
Changes from 27 commits
06bb719
63bd8cb
f3012e1
1e981ec
af7f504
02fab5d
eacd40a
26f9013
3e7f558
199c405
f66ed93
8fdc7a4
8dbec57
487b8a7
15b9af7
8f8df47
f0dbd9e
1576932
fe1d40d
9434c4e
5bc9442
0499aed
bd03719
5f12a41
f747d7f
0efa325
2e3bbfd
7c07280
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wrridgeway It looks like these functions are still cutting out the top and bottom 5% of ratios. I can't quite remember where we landed on this, but I thought we were shifting to kicking out sales using the new flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Annoyingly, our linters and pre-commit checks absolutely hate this setup. There are a two problems:
sc
is undefined in the script (it's added when you submit the job).addPyFile()
, which loads the packages from S3.I'm not sure if there's a way to disable these linting toggles in a prettier way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Question, non-blocking] Unfortunately this is pretty common when 1) writing code for environments with implicit context and B) writing setup code that needs to run prior to import. I don't think there's an obviously better way to do it, although we might consider switching to a different linter that supports block-level ignores (unlike flake8).
That being said, can you provide some context on why pylint is turned off? Is that just for your editor integration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's because it's also mad about
sc
not being defined. Pylint in this case is kind of extraneous since flake8 is running already in precommit.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped the
detect_chasing()
stuff because:a) It's not actually super effective at detecting sales chasing.
b) It was causing extremely difficult to debug errors in the spark job.
We can work to add it back later if it's truly needed. FYI @ccao-jardine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Suggestion, non-blocking] Since this is essentially just a comma-and-space-separated string of key-value pairs, it might make future development more convenient to define it as a dict and then serialize it to this format with a oneliner like so:
But I leave it up to you to decide which data structure you find more convenient!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't expect that we'll persist this schema definition. It will probably only live until Nicole gets back and can confirm that Tableau is still working fine.
This file was deleted.