add metadata to existing datasets #68

Closed
sjsrey opened this issue Mar 19, 2019 · 9 comments
Labels
data related to data layer
Milestone

Comments

@sjsrey
Collaborator

sjsrey commented Mar 19, 2019

It would be helpful to document the original source for the CBSA definitions, since these can change over time and we may need to cross-reference different time slices.

For example, we don't have parquet files for the following:

  • 41900 San Germán-Cabo Rojo, PR Metropolitan Statistical Area
  • 41980 San Juan-Caguas-Guaynabo, PR Metropolitan Statistical Area
  • 39140 Prescott, AZ Metropolitan Statistical Area
  • 38660 Ponce, PR Metropolitan Statistical Area
  • 10380 Aguadilla-Isabela-San Sebastián, PR Metropolitan Statistical Area
  • 11640
  • 32420 Mayagüez, PR Metropolitan Statistical Area
  • 25020 Guayama, PR Metropolitan Statistical Area
  • 19380 Dayton, OH Metropolitan Statistical Area

It makes sense that LTDB does not cover the PR areas, but it is less clear why the AZ, OH, and 11640 areas are missing from the data.
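A list like the one above can be produced with a set difference between the CBSA codes in the definitions table and the codes that actually have data files. This is only a sketch: `missing_cbsa_codes` is a hypothetical helper, not part of osnap, and the inputs are assumed to be plain string codes.

```python
def missing_cbsa_codes(defined_codes, available_codes):
    """Return CBSA codes that are defined but have no matching data file."""
    # Codes are compared as strings so that leading zeros are never lost.
    defined = {str(code) for code in defined_codes}
    available = {str(code) for code in available_codes}
    return sorted(defined - available)


if __name__ == "__main__":
    # Illustrative inputs: three codes from the list above, and a pretend
    # data directory that only contains a file for 39140.
    print(missing_cbsa_codes(["19380", "39140", "11640"], ["39140"]))
    # → ['11640', '19380']
```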

@knaaptime
Member

That's a good point. The shape data come directly from Census, but the table that osnap refers to as cbsa is the county crosswalk, probably from here

@knaaptime
Member

rather than storing the CBSA crosswalk, it might be better to just read it directly when necessary (like we do with the inflation table)

the benefits would be

  1. the documentation is implicit since the URL defines the original source (though we'd want to document explicitly as well)
  2. we could provide methods for users to call MSAs as defined in other years

2019 cbsa-county definitions
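A rough sketch of what reading the crosswalk on demand, keyed by vintage, might look like. Everything named here is an assumption for illustration: the `CROSSWALK_URLS` entries are placeholders rather than the real source links, the `cbsa_code` and `county_fips` column names are made up, and the file is assumed to be CSV (the published files may well be spreadsheets that need a different parser).

```python
import csv
import io
from urllib.request import urlopen

# Placeholder URLs only; substitute the real definition files for each year.
CROSSWALK_URLS = {
    2003: "https://example.com/cbsa_county_2003.csv",
    2019: "https://example.com/cbsa_county_2019.csv",
}


def fetch_text(url):
    """Download a URL and decode the body as UTF-8 text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def counties_in_cbsa(cbsa_code, year, fetch=fetch_text):
    """Return the county FIPS codes that make up a CBSA in a given vintage.

    `fetch` is injectable so the lookup can be exercised without a network.
    """
    text = fetch(CROSSWALK_URLS[year])
    rows = csv.DictReader(io.StringIO(text))
    return sorted(r["county_fips"] for r in rows if r["cbsa_code"] == str(cbsa_code))
```

Because the year selects the URL, the source of each definition is visible in one place, and asking for an MSA as defined in another year is just a different `year` argument.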

@sjsrey
Collaborator Author

sjsrey commented Mar 19, 2019

And an older source for 2003 cbsa definitions

@knaaptime
Member

workflow:

  1. store the "appropriate" version in the osnap repo
  2. when user calls a dataset, osnap checks the canonical dataset by downloading from the URL
  3. osnap checks to see whether stored version matches canonical version
  4. if versions don't match, raise a warning, etc.
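Steps 3 and 4 of the workflow above could be sketched as a checksum comparison. This is a minimal illustration, not osnap code: `sha256_of` and `check_against_canonical` are hypothetical names, and the stored and canonical copies are assumed to already be in memory as bytes.

```python
import hashlib
import warnings


def sha256_of(data: bytes) -> str:
    """Hex digest used as a cheap fingerprint of a dataset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def check_against_canonical(stored: bytes, canonical: bytes, name="cbsa crosswalk") -> bool:
    """Return True if the stored copy matches the canonical one, else warn."""
    if sha256_of(stored) != sha256_of(canonical):
        warnings.warn(
            f"Stored copy of {name} differs from the canonical source; "
            "the definitions may have changed.",
            UserWarning,
        )
        return False
    return True
```

A byte-level comparison will also flag cosmetic changes (whitespace, column order), so a real check might need to compare parsed tables instead.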

@knaaptime
Member

if we're checking the dataset integrity with each use, is there any real use in caching it?

@knaaptime knaaptime added this to the release 0.1.0 milestone Aug 22, 2019
@knaaptime knaaptime added the data related to data layer label Aug 22, 2019
@knaaptime knaaptime changed the title documentation for cbsa source add metadata to existing datasets Aug 22, 2019
@knaaptime
Member

we should use this source. It might be best to always grab from the web like we do for the inflation adjustment table, though this table would be used more often and would be more prone to failure if the Census API goes down. That would also mean you always need an internet connection to make MSA queries
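One way to soften the always-online requirement is to try the web first and fall back to a local copy when the download fails. The sketch below assumes that trade-off is acceptable; `download`, `load_crosswalk`, and the cache path are hypothetical names, not existing osnap functions.

```python
import warnings
from pathlib import Path
from urllib.request import urlopen


def download(url, timeout=10):
    """Fetch the raw bytes at a URL."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def load_crosswalk(url, cache_path, fetch=download):
    """Prefer the live source; fall back to the cached copy if it is unreachable."""
    cache = Path(cache_path)
    try:
        data = fetch(url)
    except OSError as err:  # URLError and timeouts are OSError subclasses
        if not cache.exists():
            raise
        warnings.warn(f"Could not reach {url} ({err}); using cached copy.", UserWarning)
        return cache.read_bytes()
    # Refresh the cache so the next offline call has the latest definitions.
    cache.write_bytes(data)
    return data
```

With this arrangement the cache only matters when the network is down, which is one possible answer to whether caching is worth it if the source is checked on every call.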

@knaaptime
Member

should be resolved by #215 but give it a look
