
171 world bank projects database #172

Open: wants to merge 15 commits into main


lpicci96 (Collaborator) commented Jul 4, 2023

New import module: world_bank_projects. It queries the World Bank API and formats project data from the JSON response.
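For illustration, the "format project data from the JSON response" step could look roughly like the sketch below. It assumes the response nests project records under a "projects" key, keyed by project ID; that shape, the helper name flatten_projects and the sample payload are assumptions for this example, not the module's actual API.

```python
import json
from typing import Any, Dict, List, Optional


def flatten_projects(
    response: Dict[str, Any], fields: Optional[List[str]] = None
) -> List[Dict[str, Any]]:
    """Hypothetical helper: flatten an API response into one record per project.

    ASSUMPTION: the payload looks like {"projects": {"P123456": {...}, ...}}.
    `fields` optionally restricts which keys are kept; None keeps every key.
    """
    records = []
    for project_id, project in response.get("projects", {}).items():
        record: Dict[str, Any] = {"project_id": project_id}
        for key, value in project.items():
            if fields is not None and key not in fields:
                continue
            # serialise nested lists/dicts so every value in the record is a scalar
            if isinstance(value, (list, dict)):
                value = json.dumps(value)
            record[key] = value
        records.append(record)
    return records


# made-up sample payload, for illustration only
sample = {"projects": {"P000001": {"project_name": "Example", "tags": ["a", "b"]}}}
print(flatten_projects(sample))
# [{'project_id': 'P000001', 'project_name': 'Example', 'tags': '["a", "b"]'}]
```

Keeping every record flat means the list can be passed straight to pandas.DataFrame if a table is needed.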

Closes #171

lpicci96 added the labels new feature and import_tools (for the importers module) on Jul 4, 2023
lpicci96 requested a review from jm-rivera on July 4, 2023 15:33
lpicci96 linked an issue on Jul 4, 2023 that may be closed by this pull request
codecov bot commented Jul 4, 2023

Codecov Report

Merging #172 (436ae1f) into main (55fa530) will decrease coverage by 1.56%.
The diff coverage is 61.11%.

@@            Coverage Diff             @@
##             main     #172      +/-   ##
==========================================
- Coverage   75.82%   74.26%   -1.56%     
==========================================
  Files          25       26       +1     
  Lines        1369     1531     +162     
==========================================
+ Hits         1038     1137      +99     
- Misses        331      394      +63     
Impacted Files                                   Coverage Δ
bblocks/import_tools/world_bank_projects.py      61.11% <61.11%> (ø)

lpicci96 (Collaborator, Author) commented Jul 7, 2023

@jm-rivera see the updated script:

  1. Adds functionality to format sector data:
    • _get_sector_data parses the JSON to get all sector names and percentages, with additional logic for unusual responses
    • self.format_sector_data creates a DataFrame of sector data in the _data attribute
  2. Additional fields are included in the general data.
  3. QueryAPI is amended to allow passing a list of fields; by default it selects all fields.
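A minimal sketch of point 3 (an optional list of fields that defaults to all fields). The base URL, the parameter names format, fl, rows and os, and the use of "*" for "all fields" are assumptions about the World Bank search API rather than details confirmed in this PR; build_projects_url is a hypothetical name, not the real QueryAPI signature.

```python
from typing import List, Optional
from urllib.parse import urlencode

# ASSUMPTION: endpoint and parameter names below are taken to match the
# World Bank search API; check them against the API documentation.
BASE_URL = "https://search.worldbank.org/api/v2/projects"


def build_projects_url(
    fields: Optional[List[str]] = None, rows: int = 500, offset: int = 0
) -> str:
    """Hypothetical helper: build a projects query URL.

    With fields=None every field is requested, mirroring the PR's default;
    otherwise only the listed fields are requested.
    """
    params = {
        "format": "json",
        # ASSUMPTION: "*" means "all fields" to the API
        "fl": "*" if fields is None else ",".join(fields),
        "rows": rows,  # page size
        "os": offset,  # ASSUMPTION: "os" is the result offset
    }
    # keep "*" and "," readable instead of percent-encoding them
    return f"{BASE_URL}?{urlencode(params, safe='*,')}"


print(build_projects_url())
print(build_projects_url(["id", "project_name"]))
```

Defaulting to None rather than to a hard-coded field list keeps "all fields" as the behaviour when the caller passes nothing.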

jm-rivera (Collaborator) left a comment


Thanks Luca!

A couple of minor changes, but otherwise this looks really good.

Three things to also do in relation to this PR:

  • Open an issue about documentation. Based on the example I showed you for IDS, this should have very detailed documentation and examples. That can happen later, though.
  • Open an issue about the 'additional_fields' discussion we had.
  • Add the changes to the changelog to get things ready for a minor release.

bblocks/import_tools/world_bank_projects.py (outdated, resolved)
bblocks/import_tools/world_bank_projects.py (outdated, resolved)
lpicci96 (Collaborator, Author)

@jm-rivera, ready for your review.

Comment on lines +243 to +253

# check if there are missing sectors from the dict
if (len(sectors_dict) == len(sectors) - 1) and (sum(sectors_dict.values()) < 100):
    # loop through all the available sectors
    for s in sector_names:
        # if a sector has not been picked up, it must be the missing sector
        if s not in sectors_dict:
            sectors_dict[s] = 100 - sum(sectors_dict.values())

if sum(sectors_dict.values()) != 100:
    raise ValueError("Sector percentages don't add up to 100%")
lpicci96 (Collaborator, Author)


@jm-rivera we need to re-evaluate the strategy here. In some instances, not every sector in the list sectors exists in the dictionary sectors_dict. However, the missing sector does not appear on the website or in the Excel download from the main projects page. See, for example, project P178202: "Waste Management" is not included on the project page, and in the API response there is no percentage for it. This is the response value: 'sector5': 'Waste Management!$!11!$!WB'.

My original assumption was that, if sectors are specified, their percentage allocations should add up to 100%. However, I have noticed this is not always the case: project P073479 has one sector, ICT technologies, allocated at 24%. If sector data does not have to add up to 100%, there is no way to determine the percentage that should be allocated to the missing sectors.

I'd suggest, then, that we remove this calculation and just parse the sectors that have a percentage allocation.
Open to suggestions.
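Under the proposal above (parse only the sectors that carry a percentage and drop the sum-to-100 check), the sector parsing might reduce to something like this sketch. It assumes sector fields are named sector1, sector2, ... and that an entry with a percentage arrives as a dict with "Name" and "Percent" keys; those key names, the example data and parse_sector_percentages are hypothetical.

```python
import re
from typing import Any, Dict

# ASSUMPTION: sector entries sit under keys "sector1", "sector2", ...; an entry
# with a percentage is a {"Name": ..., "Percent": ...} dict, while a bare
# "Name!$!...!$!..." string (like P178202's 'sector5') carries no percentage.
SECTOR_KEY = re.compile(r"^sector\d+$")


def parse_sector_percentages(project: Dict[str, Any]) -> Dict[str, float]:
    """Hypothetical helper: map sector name -> percent for sectors that report one.

    Missing percentages are skipped, not back-filled, and the result is not
    required to add up to 100.
    """
    sectors: Dict[str, float] = {}
    for key, value in project.items():
        if not SECTOR_KEY.match(key):
            continue
        if isinstance(value, dict):
            name, percent = value.get("Name"), value.get("Percent")
            if name is not None and percent is not None:
                sectors[name] = float(percent)
        # any other shape (e.g. a "!$!"-delimited string) is ignored
    return sectors


# made-up example data
example = {
    "sector1": {"Name": "Sector A", "Percent": 60},
    "sector2": "Sector B!$!11!$!WB",  # no percentage, so it is skipped
}
print(parse_sector_percentages(example))  # {'Sector A': 60.0}
```

With this approach a project such as P073479, with a single sector at 24%, is returned as-is instead of raising ValueError.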

jm-rivera (Collaborator)

@lpicci96 as we discussed, date filtering is not working. I think we should switch to start_year: int and end_year: int arguments, which make an API call for those values in the "fiscalyear" field (e.g. fiscalyear=2019^2020^2021).
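A sketch of the suggested start_year / end_year arguments, expanding the inclusive range into the caret-separated fiscalyear value from the example above. fiscal_year_filter and fiscal_year_query are hypothetical helpers; that the API accepts this form is taken from the comment, not verified here.

```python
from urllib.parse import urlencode


def fiscal_year_filter(start_year: int, end_year: int) -> str:
    """Hypothetical helper: join every fiscal year from start to end (inclusive) with '^'."""
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    return "^".join(str(year) for year in range(start_year, end_year + 1))


def fiscal_year_query(start_year: int, end_year: int) -> str:
    """Hypothetical helper: render the filter as a query-string fragment."""
    # safe="^" keeps the carets unescaped, matching the example in the comment
    return urlencode({"fiscalyear": fiscal_year_filter(start_year, end_year)}, safe="^")


print(fiscal_year_query(2019, 2021))  # fiscalyear=2019^2020^2021
```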

Labels: import_tools (for the importers module), new feature
Projects: None yet
Development: Successfully merging this pull request may close these issues:
World Bank Projects database
2 participants