Create Initial GTFS "Catalog" #23

hunterowens · 2021-03-11T21:36:45Z

This PR converts the existing GTFS spreadsheet from the drive and creates rows for every agency with a GTFS URL that starts with http(s). Still missing are agencies that have "in google" or "not in google" filled out, and I haven't loaded RT feeds yet.

I think the best option for loading RT feeds is to join on the Transitland Atlas via NTP ID, but I'm curious whether we've been tracking those URL(s) elsewhere.

e-lo

LGTM in general.
Two thoughts:

1. Create an entry for all transit providers even if they don't currently have a feed (empty list).
2. Add documentation [ somewhere ] including any implied meaning from ordering of feeds in the list.

hunterowens · 2021-03-15T17:38:14Z

1. I thought about this but decided against it, since the master "list" remains the spreadsheet; to determine something like the % of agencies listing GTFS feeds, you should do a left join. Additionally, it invites the open question of whether we should have an "empty list" only for agencies that should have GTFS (i.e., operate fixed route, etc.).
2. Once the list has implied meaning, I'll add docs for it. For now, we haven't incorporated the MTC subfeeds, etc.
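
For illustration, the left join mentioned in the first point could look roughly like the sketch below. The agency names, `itp_id` values, and URLs are made up, and the helper names (`left_join`, `share_with_feed`) are hypothetical; the real spreadsheet and catalog layouts may differ.

```python
# Hypothetical rows from the master spreadsheet: one row per agency.
sheet_rows = [
    {"itp_id": 1, "agency_name": "Agency A"},
    {"itp_id": 2, "agency_name": "Agency B"},
    {"itp_id": 3, "agency_name": "Agency C"},
    {"itp_id": 4, "agency_name": "Agency D"},
]

# Hypothetical catalog: only agencies with at least one http(s) URL appear.
catalog = {
    1: ["https://example.com/a/gtfs.zip"],
    3: ["https://example.com/c/rail.zip", "https://example.com/c/bus.zip"],
}


def left_join(rows, feeds_by_id):
    """Keep every spreadsheet row; attach catalog URLs, or an empty list."""
    return [
        {**row, "gtfs_schedule_url": feeds_by_id.get(row["itp_id"], [])}
        for row in rows
    ]


def share_with_feed(joined):
    """Fraction of joined rows that list at least one GTFS feed."""
    if not joined:
        return 0.0
    return sum(1 for row in joined if row["gtfs_schedule_url"]) / len(joined)


joined = left_join(sheet_rows, catalog)
print(f"{share_with_feed(joined):.0%} of agencies list a GTFS feed")
```

With this approach, agencies without a feed never need an entry in the catalog itself; they only show up as empty lists after the join.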

e-lo · 2021-03-15T17:50:48Z

> I thought about this but decided against it, since the master "list" remains the spreadsheet; to determine something like the % of agencies listing GTFS feeds, you should do a left join. Additionally, it invites the open question of whether we should have an "empty list" only for agencies that should have GTFS (i.e., operate fixed route, etc.).

Which "master list"? The Google Sheets one? AFAIK that would require access to our Google Drive. We should probably have some sort of public version. [ we can discuss on Weds :-) ]

machow · 2021-03-16T19:11:45Z

Hey--I'm noticing the gtfs_schedule_url in the yaml config is a list, but a single column in the sheet. Is the use of a list in the yaml by design? What should happen when there are two schedule url entries?

(guessing there is a good reason for it, so this question is more for my understanding :)

hunterowens · 2021-03-16T20:11:37Z

Yep! You might want to take a look at #21, which is a very long thread about the nature of feeds.

At least one agency in CA, LA Metro, produces two feeds (one rail / one bus), so rather than store them with a "," separator (as in the sheet), I split them into a list.

Additionally, agencies in the MTC region (Bay Area) often produce their own feed and also participate in the Bay Area Regional GTFS feed. The two should be the same, but we should list both in the catalog.
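
As a rough sketch of the sheet-to-list conversion described above: the function name `split_schedule_urls`, the example agency, its `itp_id`, and the URLs are all hypothetical. Only the `gtfs_schedule_url` key and the "starts with http(s)" rule come from this thread.

```python
def split_schedule_urls(cell):
    """Split a comma-separated spreadsheet cell into a list of http(s) URLs.

    Cells such as "in google" or "not in google" yield an empty list.
    """
    parts = [part.strip() for part in cell.split(",")]
    return [p for p in parts if p.lower().startswith(("http://", "https://"))]


# A sheet cell holding two feeds separated by a comma (made-up URLs).
sheet_cell = "https://example.com/rail/gtfs.zip, https://example.com/bus/gtfs.zip"

# Hypothetical catalog entry; the real yaml layout may carry other fields.
entry = {
    "agency_name": "Example Metro",
    "itp_id": 999,
    "gtfs_schedule_url": split_schedule_urls(sheet_cell),
}
print(entry["gtfs_schedule_url"])
```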

machow · 2021-03-16T20:32:13Z

Ah, thanks--I've been crawling down #21, and think it is slowly coming together. Guessing that this means the "primary key" for each url entry is <agency id> x <url entry index>?

Should work okay in most circumstances, but I wonder if weird things could happen if there were initially two url entries, and the first ended up being removed.
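
To make the concern concrete, here is a small sketch of what happens to index-based keys when the first entry is removed. The agency id, the URLs, and the helper `keys_by_index` are hypothetical and not taken from the downloader.

```python
def keys_by_index(agency_id, urls):
    """Map (agency id, position in the list) to the URL at that position."""
    return {(agency_id, index): url for index, url in enumerate(urls)}


# Two entries at first, then the first one (rail) is removed.
before = keys_by_index(999, ["https://example.com/rail.zip", "https://example.com/bus.zip"])
after = keys_by_index(999, ["https://example.com/bus.zip"])

# Keys present in both versions that now point at a different URL.
reassigned = {key for key in before.keys() & after.keys() if before[key] != after[key]}
print(reassigned)
```

Here the key for index 0 silently changes from the rail feed to the bus feed, so anything that identifies a feed by its index would mix the two histories.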

hunterowens · 2021-03-16T21:07:08Z

Yep. Since each run is stored as {run_time}/{itp_id}/{url}/(zip contents), if we remove one from the list, it should just not be there for future runs.
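
A minimal sketch of that layout follows, under stated assumptions: the thread does not say how the URL becomes a path segment, so percent-encoding it with `urllib.parse.quote` is a guess, and the function names, run times, and URLs are hypothetical.

```python
from urllib.parse import quote


def feed_key(itp_id, url):
    """Identify a feed by agency id and URL rather than by list position."""
    # Assumption: the URL is percent-encoded so it fits in one path segment.
    return f"{itp_id}/{quote(url, safe='')}"


def storage_prefixes(run_time, itp_id, urls):
    """Build the {run_time}/{itp_id}/{url} prefix for each feed in a run."""
    return [f"{run_time}/{feed_key(itp_id, url)}" for url in urls]


rail = "https://example.com/rail.zip"
bus = "https://example.com/bus.zip"

# First run downloads both feeds; the rail feed is removed before the second.
first_run = storage_prefixes("2021-03-16", 999, [rail, bus])
second_run = storage_prefixes("2021-03-17", 999, [bus])

for prefix in first_run + second_run:
    print(prefix)
```

Because the URL itself is part of the path, removing the rail entry only means no rail prefix appears in later runs; the bus feed keeps the same key across runs.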

hunterowens · 2021-03-18T00:40:03Z

gonna merge this knowing that it gets moved and updated in #25

add initial list based on url

7c9c9f8

hunterowens mentioned this pull request Mar 12, 2021

Reorganize GTFS "catalog" as many/many relationship of transit providers GTFS datasets #21

Closed

e-lo reviewed Mar 15, 2021

hunterowens mentioned this pull request Mar 16, 2021

Update GTFS Downloader to use the new yml based file #24

Closed

hunterowens merged commit 6da6f68 into main Mar 18, 2021

machow mentioned this pull request May 7, 2021

Fix "GTFS_RT" keys #94

Merged

machow deleted the gtfs-list branch October 4, 2021 14:01
