Generate new sources.csv from PostgreSQL database #779
For the bit.ly URL, from what I can gather we can only change the destination URL if we have a paying account. I suggest instead putting the new sources.csv (the one generated from the DB) in place of the old one in the mdb-csv bucket. This means we will need to change our code to refer to sources-fromCatalog.csv instead of sources.csv, but the advantage is that external users following the bit.ly link will transparently start getting the sources.csv generated from the DB.
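To make the swap concrete, here is a minimal sketch of what the rename means for readers of the bucket. The bucket name (mdb-csv) and the two file names come from the discussion above; the public URL scheme is an assumption (a GCS-style public object URL) and may not match the actual hosting setup.

```python
# Hypothetical sketch: the stable public name keeps serving the DB-generated
# file (so the bit.ly link keeps working), while internal code switches to
# the renamed copy of the old catalog-generated export.
BUCKET = "mdb-csv"

def object_url(filename: str) -> str:
    """Build a public GCS-style URL for an object in the bucket (assumed scheme)."""
    return f"https://storage.googleapis.com/{BUCKET}/{filename}"

# External users (via bit.ly) keep fetching the stable name,
# which now contains the DB-generated CSV:
PUBLIC_CSV_URL = object_url("sources.csv")

# Internal code that still needs the old catalog export refers to the new name:
CATALOG_CSV_URL = object_url("sources-fromCatalog.csv")
```

The design point is that the externally advertised name never changes; only the content behind it does, plus an internal rename for the legacy file.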
@emmambd Can you give me an idea of the "dynamic info that comes from our data pipeline" that we want to include in the new csv file?
@jcpitre This approach makes sense to me! Dynamic data mainly includes everything associated with the "latest dataset" that's in the feed response.
@cka-y But it is in the API? That's clear - works for me
In the API, there's a ... Maybe we can use ...
The flow is as follows: when a dataset is uploaded to the datasets bucket, it automatically triggers the generation of the validation report and the extraction of the bounding box (and location). These are two separate processes. @jcpitre is correct that the location extraction is the only process without an associated timestamp. To address this, we could:
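The trigger flow described above can be sketched as follows. Everything here is hypothetical (function names, the event shape, and the idea of recording a `completed_at` timestamp for location extraction, which is one way to address the missing-timestamp point); it is not the actual pipeline code.

```python
from datetime import datetime, timezone

def run_validation_report(dataset_id: str) -> dict:
    """Stand-in for the validation report generation step."""
    return {"dataset": dataset_id, "step": "validation_report",
            "completed_at": datetime.now(timezone.utc).isoformat()}

def run_location_extraction(dataset_id: str) -> dict:
    """Stand-in for the bounding box / location extraction step.

    Recording a timestamp here is one way to give location extraction
    the associated timestamp it currently lacks.
    """
    return {"dataset": dataset_id, "step": "location_extraction",
            "completed_at": datetime.now(timezone.utc).isoformat()}

def on_dataset_uploaded(dataset_id: str, events: list) -> None:
    """Triggered when a dataset lands in the datasets bucket.

    The two downstream processes are independent of each other.
    """
    events.append(run_validation_report(dataset_id))
    events.append(run_location_extraction(dataset_id))
```

In the real system these would be separate asynchronous triggers rather than sequential calls; the sketch only shows where a timestamp could be captured.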
One concern with using ...
@jcpitre Please don't update the old URL for the spreadsheet - please create a new one (like a v2) instead. And then we'll add it to the docs |
In addition to the dynamic data mentioned by @emmambd above, there could also be:
Passing these 3 extra fields along in the CSV would potentially be very valuable for end-consumers.
@mil Could you share some more context for how you want to use the validation report URL?
Currently I have an app which pulls GTFS data specified by Mobility Database's CSV (from either the CI bucket mirror or direct URLs). Users have told me that particular feeds sometimes don't work well with the app. Having filesize and daterange metadata would be very helpful for addressing these end-user issues, since pulling a huge feed or an outdated feed may de facto break things regardless of my application logic. Beyond that, if the validator URL were passed along, it would give end-users a hard stop: a way to check whether a feed should work in the app in the first place.
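As an illustration of that consumer use case, here is a sketch of filtering feeds by the proposed extra columns. The column names (`latest_dataset_size_bytes`, `service_end_date`, and the sample rows) are invented for this example; the actual export may use different names and formats.

```python
import csv
import io
from datetime import date

# Hypothetical CSV excerpt with the proposed extra fields (names assumed).
SAMPLE = """mdb_source_id,latest_dataset_size_bytes,service_start_date,service_end_date,validation_report_url
1,5242880,2024-01-01,2024-12-31,https://example.com/report/1
2,734003200,2023-01-01,2023-06-30,https://example.com/report/2
"""

def usable_feeds(csv_text: str, today: date, max_bytes: int = 100 * 1024 * 1024):
    """Return ids of feeds that are neither too large nor past their service window."""
    keep = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        too_big = int(row["latest_dataset_size_bytes"]) > max_bytes
        expired = date.fromisoformat(row["service_end_date"]) < today
        if not (too_big or expired):
            keep.append(row["mdb_source_id"])
    return keep

print(usable_feeds(SAMPLE, date(2024, 6, 1)))  # ['1']
```

Feed 2 is skipped for size, and feed 1 would also be skipped once its service window ends, which is exactly the pre-download check the extra metadata enables.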
Describe the problem
As discussed, we want to shift away from the catalogs repo being our source of truth so we can include more of the dynamic info that comes from our data pipeline.
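The core export this issue asks for can be sketched in two parts: a pure CSV-serialization step and a DB fetch. The column names and the `feeds` table below are placeholders; the real Mobility Database schema differs and would need its own query and mapping.

```python
import csv
import io

# Hypothetical column set for the DB-generated sources.csv.
EXPORT_COLUMNS = ["mdb_source_id", "provider", "latest_dataset_url"]

def rows_to_csv(rows) -> str:
    """Serialize query rows (sequences matching EXPORT_COLUMNS) to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(EXPORT_COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()

# The fetch side would look roughly like this (requires psycopg2; not run here,
# and "feeds" is a stand-in table name):
#
#   with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
#       cur.execute(
#           "SELECT mdb_source_id, provider, latest_dataset_url FROM feeds"
#       )
#       csv_text = rows_to_csv(cur.fetchall())
```

Keeping serialization separate from the query makes it easy to add the dynamic fields later by extending `EXPORT_COLUMNS` and the SELECT list together.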
Proposed solution
locations
Alternatives you've considered
No response
Additional context
No response