-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Corrections from Cal-ITP (a selection) #125
Conversation
Updates to use AC Transit's own URL
> Blue Lake Rancheria is included in Humboldt's feed > Name should be Humboldt, not arcata/mad river (which is a subsystem of humboldt)
- LA Metro should have consistent naming between Rail and Bus (this is same agency, with a split feed only b/c of size)
- Should refer to DASH and Commuter Express bus services which is the brand people will search from - Update URL to be agency's
Soltrans download should be agency DNS
- Should be nevada not california
- not named PASS transit
- Should be San Mateo (or at least a city in San Mateo county) not San Francisco as municipality
- Download should be agency DNS
Add acronym
- add acronym
- name isn't SMAT anymore - its SMRT / santa maria regional transit NOTE: this is an outdated download URL!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @e-lo,
Thank you for our first community source PR to the catalogs! It’s exciting to see some changes here that aren’t available on other aggregators yet.
-
I’ve left some comments in the review for specific instances where we can’t make an alteration, mainly due to current limitations of the catalogs. I’ve created issues for the relevant changes where they seem clear.
-
In order to trigger the pipeline, source additions/changes need to have [SOURCES] at the end of the PR name.
@@ -15,7 +15,7 @@ | |||
} | |||
}, | |||
"urls": { | |||
"direct_download": "https://transitfeeds.com/p/ac-transit/1269/latest/download", | |||
"direct_download": "https://api.actransit.org/transit/gtfs/download?token={{ API_KEY }}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As highlighted in the release plan, right now API-based sources can’t be added since we’re running checks to see if a source is a valid ZIP file. I’ve created this issue to make the checks conditional so API sources can be added and we can define a convention for including API keys and authorization methods in the schema.
I’ve also added this info to CONTRIBUTING.MD so the constraint is clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in updated PR.
@@ -1,21 +0,0 @@ | |||
{ | |||
"mdb_source_id": 598, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to preserve the Mobility Database IDs, we can’t delete sources. This issue is here so we can add deprecation info to the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unfortunate as I don't think the definition of what a unique ID even represents has been defined (see comment in I believe #36 ) - so it seems impractical /impossible to "hold them constant" if we don't know what they represent.
I would suggest we don't "hold anything to be frozen/backwards compatible until this issue is resolved/closed as it will end up in a LOT of messiness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I backed out this change in the PR.
@@ -15,7 +15,7 @@ | |||
} | |||
}, | |||
"urls": { | |||
"direct_download": "http://data.trilliumtransit.com/gtfs/smat-ca-us/smat-ca-us.zip", | |||
"direct_download": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Currently, changing the name of a file would also trigger changing the direct download URL and break pipeline workflows. We could change the direct download URL structure so it’s non-descriptive (only the source ID for instance), or discuss other potential solutions.
-
A direct download URL needs to be available for the source to be valid. I’m assuming there’s no available replacement since you left it empty? We have talked about adding feed_end_date to the schema to make it easier to see if the most recently available URL is still in use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is unfortunate. So what happens when an agency changes their name? When a URL changes what they represent? If this is the case, I would suggest only having the ID and/or geography (something that isn't going to change) in the name.
A direct download URL needs to be available for the source to be valid.
- That is correct. What should be the course of action - since the URL isn't valid and therefore doesn't conform to the schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Agreed. Issue created here.
-
The schema states that the URL should automatically open the source information, and doesn't describe anything regarding data recency. In instances where the URL is not up-to-date but there is no known available alternative, we've still been including it in the catalogs, as this is a very common occurrence and deleting the source entirely seemed counterproductive for data access. Once historical URLs are implemented in the schema, we could include these URLs there instead for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Currently, changing the name of a file would also trigger changing the direct download URL and break pipeline workflows. We could change the direct download URL structure so it’s non-descriptive (only the source ID for instance), or discuss other potential solutions.
Here, it's the latest dataset URL that is affected, not the direct download one (so it's system generated not provided by the user), but I agree that in any cases it should be stable enough that changing the filename won't break the pipeline.
- A direct download URL needs to be available for the source to be valid. I’m assuming there’s no available replacement since you left it empty? We have talked about adding feed_end_date to the schema to make it easier to see if the most recently available URL is still in use.
I think adding feed_end_date and feed_start_date would be a good idea. It would allow us not to update the latest url if the feed expiration date is reached. Also, we could always keep the direct download url after adding the historical data.
General suggestion:
|
This should still be deleted - but is a limitation of the definition.
* beaumont transit fix * humboldt county fix * santa maria regional transit name fix * taft area transit fix * southern nevada rtc fix * la county shuttles agency name fix * ladot fix * sacramento regional tansit fix * sf bay area water emergency fix * sfmta fix * samtrans fix * santa barbara fix * scmtd fix * tulare county area fix * yolo county fix * add fares features * replace duplicate go bus file * switch replacements of 561 and 28 so URL of 28 is consistent
@e-lo These data updates were blocked until we found a safe method to change file names. Now that we know consumers are systematically refreshing the CSV, we can modify file names confidently and have updated the documentation to highlight file name changes. Because of this, these changes are now unblocked and have been merged in PR #239. (I made a separate PR for the sake of ease.) I am closing this one since the changes are already reflected. |
Categories of fixes:
Still a lot more to add/fix, but this is a start.