This repository has been archived by the owner on Aug 18, 2021. It is now read-only.

Improving data coverage #27

Open
Nosferican opened this issue Oct 19, 2019 · 3 comments

@Nosferican

Nosferican commented Oct 19, 2019

| Acronym | Status | Source | Other links |
|---|---|---|---|
| DHS | fallback | https://raw.githubusercontent.com/GSA/code-gov-harvester/master/data/fallback/DHS.json | https://www.dhs.gov/code/json, https://github.com/usdhs |
| DOJ | fallback | https://raw.githubusercontent.com/GSA/code-gov-harvester/master/data/fallback/DOJ.json | https://www.justice.gov/digitalstrategy, https://github.com/usdoj |
| EPA | fallback | https://raw.githubusercontent.com/GSA/code-gov-harvester/master/data/fallback/EPA.json | https://edg.epa.gov/data.json, https://github.com/USEPA |
| NSF | fallback | https://raw.githubusercontent.com/GSA/code-gov-harvester/master/data/fallback/NSF.json | https://www.nsf.gov/digitalstrategy/, https://github.com/nsf-open |
| NARA | fallback | https://raw.githubusercontent.com/GSA/code-gov-harvester/master/data/fallback/NARA.json | https://www.archives.gov/digitalstrategy, https://github.com/usnationalarchives, https://www.archives.gov/developer |
| DOI | NULL | NULL | https://github.com/usinterior |
| DOC | NULL | NULL | https://github.com/CommerceGov, https://github.com/usnistgov, https://github.com/NOAA-GFDL |
| DOS | NULL | NULL | https://www.state.gov/digital-government-strategy/ |
| USAID | NULL | NULL | https://www.usaid.gov/usaid-digital-strategy, https://github.com/USAID |
| NRC | NULL | NULL | https://www.nrc.gov/public-involve/open/digital-government.html, https://www.nrc.gov/developer.html |
| OPM | NULL | NULL | https://www.opm.gov/blogs/OpenOPM/digital-government-strategy/ |
| USGS | NULL | NULL | https://github.com/usgs/code-json-generator |
| NSA | NULL | NULL | https://code.nsa.gov/, https://github.com/nationalsecurityagency |
| EOP | NULL | NULL | https://raw.githubusercontent.com/EOP-OMB/code_json/master/code.json |
| HUD | OK | https://www.hud.gov/sites/documents/CODE_INVENTORY.JSON | |
| USDA | OK | https://www.usda.gov/sites/default/files/documents/code.json | |
| DOL | OK | https://www.dol.gov/code.json | |
| DOT | OK | https://www.transportation.gov/sites/dot.gov/files/docs/code.json | |
| TREASURY | OK | https://s3.amazonaws.com/static.treasury.gov/jsonfiles/code.json | |
| VA | OK | https://www.va.gov/code.json | |
| NASA | OK | https://raw.githubusercontent.com/nasa/Open-Source-Catalog/master/code.json | |
| GSA | OK | https://open.gsa.gov/code.json | |
| SBA | OK | https://www.sba.gov/code.json | |
| SSA | OK | https://www.ssa.gov/code.json | |
| CFPB | OK | https://www.consumerfinance.gov/code.json | |
| DOD | OK | https://code.mil/code.json | |
| ED | OK | https://www2.ed.gov/code.json | |
| DOE | OK | https://www.energy.gov/sites/prod/files/2019/07/f64/code-07-17-2019_0.json | |
| HHS | OK | https://www.hhs.gov/code.json | |
| FEC | OK | https://www.fec.gov/code.json | |

@Nosferican
Author

@IanLee1521 have y'all looked into collecting the users/organizations for US federal agencies?
@CalvinIsch, could you look into finding users/organizations for US federal entities on GitHub?
@saracope What's the process for touching base with agencies about updating their code.json files or reviving the ones that have gone stale?

A few special cases

  • EPA has both OSS projects and datasets in the same JSON file (it is currently not registered)
  • NSA has a list of OSS projects and a link to code.gov, but no code.json
  • USGS has a repository to generate the code.json
  • EOP has a code.json that is not registered
  • DOJ has a link to its Drupal contributions

@IanLee1521

Hi @Nosferican -- I haven't specifically, but mostly because there is already https://government.github.com/community/ which has that data.

I did run github.com/llnl/scraper against the U.S. entries on that site (at the time) and posted those results on the pull request: LLNL/scraper#3

Is that what you were thinking, or something else?

@Nosferican
Author

The GitHub Government Community collection is a crowd-sourced initiative, but it isn't curated per se... For example, it includes some organizations that are definitely not U.S. federal depts/agencies (e.g., @radiofreeasia is listed under U.S. Federal, but is an NGO). We were thinking of querying the names of U.S. departments and agencies, plus some combinations (e.g., US $name, acronym), against the GHTorrent organization names or the GraphQL API.
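To make the "name plus combinations" idea concrete, here is a minimal sketch of how the query strings could be generated. The function name `candidate_queries` and the exact set of prefixes are assumptions for illustration, not something we have settled on; the resulting strings would then be matched against GHTorrent organization names or passed to a search API.

```python
from typing import List

# Hypothetical prefixes to combine with each name/acronym ("" means "as is").
PREFIXES = ["", "US ", "U.S. "]


def candidate_queries(name: str, acronym: str) -> List[str]:
    """Return de-duplicated search strings for one agency, in a stable order."""
    seen = set()
    queries = []
    for base in (name, acronym):
        base = base.strip()
        if not base:
            # Some entities have no acronym; skip the empty variant.
            continue
        for prefix in PREFIXES:
            query = prefix + base
            key = query.casefold()  # case-insensitive de-duplication
            if key not in seen:
                seen.add(key)
                queries.append(query)
    return queries


if __name__ == "__main__":
    for q in candidate_queries("National Science Foundation", "NSF"):
        print(q)
```

Short acronyms will probably produce many false positives on their own, so the matches would still need filtering afterwards.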

To obtain a list of U.S. federal depts/agencies and other entities, we were thinking of using the A-Z Index of U.S. Government Departments and Agencies, which has a directory based on the U.S. Government Manual, supplemented with entities that "directly serve the public" (e.g., the USDA National Agricultural Statistics Service, NASS). Ideally, M-16-21 should have the dept/agency heads report on and monitor this, but I suspect it isn't occurring as diligently as it could be.

That proposal seems like the best approach considering the lack of access to an exhaustive list of (at least public) U.S. federal dept/agencies and the different organizational levels.

For example,

  • National Science Foundation (NSF)
    • Social, Behavioral & Economic Sciences (SBE)
      • National Center for Science and Engineering Statistics (NCSES)
        • Science & Engineering Indicators Program

It is unclear at which levels a GitHub organization would be set up. It is also unclear which levels would show up in budget / organizational databases.
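Since we don't know which level owns a GitHub organization, one option is to search every level. Below is a rough sketch that stores the NSF example above as nested tuples and walks it depth-first; the tuple layout and the `walk` helper are hypothetical, chosen only to illustrate the idea.

```python
from typing import Iterator, Tuple

# Each unit is (name, acronym or "", list of child units). Hypothetical layout.
NSF = (
    "National Science Foundation", "NSF", [
        ("Social, Behavioral & Economic Sciences", "SBE", [
            ("National Center for Science and Engineering Statistics", "NCSES", [
                ("Science & Engineering Indicators Program", "", []),
            ]),
        ]),
    ],
)


def walk(unit, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, name, acronym) for a unit and all of its descendants."""
    name, acronym, children = unit
    yield depth, name, acronym
    for child in children:
        yield from walk(child, depth + 1)


if __name__ == "__main__":
    for depth, name, acronym in walk(NSF):
        label = f"{name} ({acronym})" if acronym else name
        print("  " * depth + label)
```

Every yielded name/acronym pair could then be searched for separately, keeping the depth so we can later see at which levels organizations actually turn up.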

One heuristic we might use is to require the GitHub organization to list a website with a .gov/.mil domain. Not all federal government websites use those domains, though (e.g., goarmy.com). We would also not require organizations to be verified, since verification usually requires adding some metadata to the website, which is usually reserved for the department level.
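The domain heuristic could look roughly like the sketch below. It assumes we already have the website string listed on the organization profile; the function name `has_government_domain` is made up for this example. As noted, it would miss sites such as goarmy.com, so a failed check is a weak signal rather than an exclusion.

```python
from typing import Optional
from urllib.parse import urlparse

GOVERNMENT_SUFFIXES = (".gov", ".mil")


def has_government_domain(website: Optional[str]) -> bool:
    """Return True if the website's host name ends in .gov or .mil."""
    if not website or not website.strip():
        return False
    candidate = website.strip()
    # Profiles often omit the scheme; urlparse needs "//" to find the host.
    if "://" not in candidate:
        candidate = "//" + candidate
    try:
        host = urlparse(candidate).hostname or ""
    except ValueError:
        # Malformed URLs (e.g., unbalanced IPv6 brackets) count as no match.
        return False
    return host.rstrip(".").endswith(GOVERNMENT_SUFFIXES)


if __name__ == "__main__":
    for site in ["https://www.nsf.gov/", "code.mil", "https://www.goarmy.com"]:
        print(site, has_government_domain(site))
```

Matching on the parsed host rather than the raw string avoids accepting something like `example.gov.evil.com` or a path that merely contains ".gov".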
