Adding the poverty dataset #75
This is great!! But it's not exactly in the format of the other variables (e.g. GeoName is missing) -- can you please update it?
@Niklewa, heads up, some of the tests fail -- can you tag me in a comment when it's ready for me to review and merge? Thanks!
@emackev All good now, it was a minor linting issue.
@emackev everything looks good on my side. Unless there are further issues, can you submit a new approving review so that this can be merged?
I have addressed all the issues. The clean_variable.py required modifications as exclusions are now stored in a .csv file instead of a .pkl file. However, for some reason, it is not functioning correctly now. I will address that later in the day.
Now, everything seems to be fine
This looks great! I'll merge.
@Niklewa, I noticed that when I run cleaning_pipeline, git says that the variables are new. Did you perhaps change something in the data cleaning pipeline after you last ran cleaning_pipeline?
@emackev, I have investigated that, and I do not think I changed anything. The only difference between those files is the last digit of some standardized values (an incredibly small difference). Perhaps it has something to do with randomness.
Hmm, likely a numerical precision thing. Thanks for checking! It's a bit strange that it only shows up for the recent variables, not the older ones, after running clean_variables.py. Shall we push the updated variables just in case?
It is strange. I think it's worth pushing them.
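For anyone hitting the same thing later: one way to confirm that a regenerated file differs from the committed one only in the last digits is to compare numeric columns with a tolerance instead of relying on git's byte-level diff. This is only a sketch, not part of the pipeline. The function name, the tolerances, and the example column names and values are all assumptions made up for illustration.

```python
import numpy as np
import pandas as pd


def differs_only_by_rounding(old, new, rtol=1e-9, atol=1e-12):
    """Return True if the frames match exactly on non-numeric columns
    and within the given tolerance on numeric columns."""
    if list(old.columns) != list(new.columns) or len(old) != len(new):
        return False
    for col in old.columns:
        both_numeric = (
            pd.api.types.is_numeric_dtype(old[col])
            and pd.api.types.is_numeric_dtype(new[col])
        )
        if both_numeric:
            a = old[col].to_numpy(dtype=float)
            b = new[col].to_numpy(dtype=float)
            if not np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=True):
                return False
        elif not old[col].equals(new[col]):
            return False
    return True


# Made-up example data; the real files would be read with pd.read_csv.
old = pd.DataFrame(
    {"GeoFIPS": [1001, 1003], "GeoName": ["A", "B"], "value": [0.123456789012345, -1.0]}
)

# Same data with a perturbation in the last digits only.
new = old.copy()
new["value"] = new["value"] * (1 + 1e-15)

# Same data with one genuinely different value.
changed = old.copy()
changed.loc[0, "value"] = 0.2

print(differs_only_by_rounding(old, new))
print(differs_only_by_rounding(old, changed))
```

If the first kind of comparison passes for every regenerated variable, the diff is most likely floating-point noise and pushing the updated files is harmless.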
The dataset includes the following variables: GeoFIPS, total poverty, poverty under 18, median household income, year
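Since the review asked for the same format as the other variables (GeoName was missing), a quick check along these lines could catch that before review. It is a rough sketch under assumptions: the helper name is hypothetical, the example frame is made up, and only GeoFIPS and GeoName are taken from this thread as identifier columns.

```python
import pandas as pd

# Assumption: these are the identifier columns the other variables share.
EXPECTED_ID_COLUMNS = ("GeoFIPS", "GeoName")


def missing_id_columns(df, expected=EXPECTED_ID_COLUMNS):
    """Return the expected identifier columns that are absent from df."""
    return [col for col in expected if col not in df.columns]


# Made-up frame that mimics the first version of the dataset (no GeoName).
poverty = pd.DataFrame({"GeoFIPS": [1001], "year": [2020], "total_poverty": [12.3]})

print(missing_id_columns(poverty))
```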