Adding a publicly available example for the WHO dataset #149
52db0a6 to 4f0710a
I think most test errors will go away once this branch is rebased. I don't think we can upload the dataset to our repo. There is no license information on Kaggle that would allow redistribution (https://academia.stackexchange.com/a/63157). I will contact the person first, but if that does not work, I think we have to download the dataset from Kaggle within the notebook to be on the safe side. Since there is no direct download link available, I think we have to use the Kaggle API, make it a dependency, and use
with some account information, which we also need to include here. Not so ideal; I would prefer that the owner just allow us to upload it. I think the notebooks are nice as they are. Only things
4f0710a to 6fa1e9d
Okay, contacting people on Kaggle is only possible if you have a higher-tier account, and for that you need to fill out your profile, contribute some content, and get upvotes. I think downloading it from Kaggle with some credentials is the easier solution. I made an account with my EPFL address and added the token to the yml for the test. I think the chance of abuse is very low. What I did (should be a
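A minimal sketch of the credential-based download discussed above, not the code actually added in this PR. It assumes the `kaggle` Python package is installed and that the CI job exposes the token through the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, which that package reads. The helper name `download_kaggle_dataset` and the `target_dir` argument are hypothetical.

```python
import os


def download_kaggle_dataset(dataset, target_dir="."):
    """Download and unzip a Kaggle dataset, given its "owner/name" slug.

    Hypothetical helper: credentials are expected in the environment
    (e.g. injected from the CI configuration), not stored in the notebook.
    """
    missing = [
        name for name in ("KAGGLE_USERNAME", "KAGGLE_KEY")
        if not os.environ.get(name)
    ]
    if missing:
        raise RuntimeError("Missing Kaggle credentials: " + ", ".join(missing))

    # Deferred import (assumption: the package attempts to authenticate
    # as soon as it is imported, so it is only imported once the
    # credentials are known to be present).
    import kaggle

    kaggle.api.dataset_download_files(dataset, path=target_dir, unzip=True)
    return target_dir
```

In a notebook this would be called once before loading the data, for example `download_kaggle_dataset("kumarajarshi/life-expectancy-who", "data")`. Failing early on missing credentials keeps the error message readable in CI logs.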
Let me know if you agree with the changes @rosecers, I am okay squash merging it.
Heya. Isn't this data taken from somewhere? Must be. If that's the case, we can also fetch it from the original source so we make it available in a more open format.
Yes, the specific dataset is taken from Kaggle: https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who. It has no license attached, so redistributing this dataset seems troublesome to me. But the dataset itself is, I think, a merge of different WHO datasets like this one:
https://www.who.int/data/gho/data/themes/mortality-and-global-health-estimates/ghe-life-expectancy-and-healthy-life-expectancy
Uhm. I think that if they used those datasets, then based on the terms of CC BY-NC-SA 3.0 IGO, the Kaggle dataset should also be distributed under CC BY-NC-SA 3.0 IGO (otherwise they're in breach of the -SA provisions).
Ah right, this should be under the ShareAlike constraint:
https://creativecommons.org/licenses/by-nc-sa/3.0/igo/ Some licenses can be made proprietary with inheritance, so I was not sure until now. I will check whether all the data is available from WHO, and whether it is much effort to merge it (I guess not). It would be nicer to have it here as a dataset than to download it; on that I agree.
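The merge mentioned above could be sketched roughly as follows, assuming each WHO indicator is available as a separate table with `Country`, `Year`, and `Value` columns. Those column names, the indicator names, and the helper `merge_indicators` are assumptions for illustration; the real WHO exports may be laid out differently. Plain Python is used to keep the sketch free of dependencies.

```python
def merge_indicators(tables, keys=("Country", "Year")):
    """Outer-join several indicator tables on the key columns.

    `tables` maps an indicator name to a list of dict rows, each holding
    the key columns plus a "Value" column (assumed layout). Indicators
    missing for a given key are filled with None.
    """
    merged = {}
    for indicator, rows in tables.items():
        for row in rows:
            key = tuple(row[k] for k in keys)
            # Create the record on first sight of this (country, year) pair.
            record = merged.setdefault(key, dict(zip(keys, key)))
            record[indicator] = row["Value"]

    result = []
    for key in sorted(merged):
        record = merged[key]
        for indicator in tables:
            record.setdefault(indicator, None)
        result.append(record)
    return result
```

An outer join is chosen here so that gaps in individual indicators stay visible as `None` instead of silently dropping the country-year pair.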
I compared the data from the WHO website with the data from Kaggle, and I can't find any interpretation that makes the two agree with each other (checked for Afghanistan). I looked into the dataset and it has serious issues. Look at the population of Russia
We also need to redo the analysis after we have made our own dataset.
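A rough sketch of the two checks described in this comment: comparing a field between two sources, and flagging implausible year-to-year jumps such as those seen in the population column. The column names (`Country`, `Year`), the tolerances, and both helper names are assumptions, and this is not the comparison that was actually run.

```python
import math


def find_disagreements(rows_a, rows_b, field, keys=("Country", "Year"), rel_tol=0.01):
    """List (key, value_a, value_b) where `field` differs between sources.

    Only records present in both tables are compared.
    """
    index_b = {tuple(r[k] for k in keys): r for r in rows_b}
    mismatches = []
    for row in rows_a:
        key = tuple(row[k] for k in keys)
        other = index_b.get(key)
        if other is None:
            continue
        a, b = float(row[field]), float(other[field])
        if not math.isclose(a, b, rel_tol=rel_tol):
            mismatches.append((key, a, b))
    return mismatches


def flag_jumps(rows, field, group="Country", order="Year", max_ratio=2.0):
    """Flag consecutive records whose `field` changes by more than max_ratio."""
    by_group = {}
    for r in rows:
        by_group.setdefault(r[group], []).append(r)

    flagged = []
    for name, recs in by_group.items():
        recs = sorted(recs, key=lambda r: r[order])
        for prev, cur in zip(recs, recs[1:]):
            p, c = float(prev[field]), float(cur[field])
            if p > 0 and c > 0 and max(p, c) / min(p, c) > max_ratio:
                flagged.append((name, prev[order], cur[order], p, c))
    return flagged
```

A factor-of-two threshold is an arbitrary starting point; a national population rarely changes that much in one year, so anything flagged is worth inspecting by hand.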
7868ee1 to 35fd969
@agoscinski I have updated the examples. During my rebase, I noticed a lot of documentation changes from your pushes -- do you want these in this PR?
8dfcc83 to 8620e1a
The documentation changes should come from the last merged PR. We still need
4fa5cf4 to 79d85ab
052ca93 to c967a65
c967a65 to 9bc9a01
A publicly available version of the examples used in the paper text