This repository has been archived by the owner on May 21, 2024. It is now read-only.

Best Practice and spec alignment #45

Closed
scmcca opened this issue Dec 16, 2021 · 8 comments

@scmcca

scmcca commented Dec 16, 2021

Best Practice and spec alignment

Along with my review provided in #44, here is a list of conflicts/opportunities to resolve for consistency/quality between the Best Practices and the spec.

All Files

  • Proposal: Many areas of the BP recommend that an optional field be included at each instance (i.e., in agency.txt, feed_info.txt). Instead we can add a one-time declaration in the "All Files" section that:

For data completeness, all optional files and fields should be provided if the information is available.

agency.txt

  • Formalize definition of "agency" in GTFS. Currently in the spec under agency.agency_id:

Identifies a transit brand which is often synonymous with a transit agency. Note that in some cases, such as when a single agency operates multiple separate services, agencies and brands are distinct. This document uses the term "agency" in place of "brand".

  • Should we move this into the spec under "Term Definitions" along with the illustrative examples provided in the BP?
  • BP for agency_id: “Should be included, even if there is only one agency in the feed.” --> Move this into spec? (likely won't be able to require since it is optional in the spec for single-agency datasets)

stops.txt

  • stop_lat & stop_lon: The BP and the spec give different (though similar) recommendations for positional accuracy. We should consolidate them into one. IMHO the statement in the BP is more descriptive and should be merged into the spec.

Spec recommendation currently reads:

For stops/platforms (location_type=0) and boarding area (location_type=4), the coordinates must be the ones of the bus pole — if exists — and otherwise of where the travelers are boarding the vehicle (on the sidewalk or the platform, and not on the roadway or the track where the vehicle stops).

Best Practice recommendation currently reads:

Stop locations should be as accurate as possible. Stop locations should have an error of no more than four meters when compared to the actual stop position. Stop locations should be placed very near to the pedestrian right of way where a passenger will board (i.e. correct side of the street).

routes.txt

  • agency_id: "must be included" is false according to the spec's "Conditionally Required" statement. Needs to be rationalized with BP for including agency_id in single-agency datasets.
  • route_id: "All trips on a given named route should reference the same route_id." should be a recommendation in trips.txt?
  • route_color & route_text_color: move to the spec and resolve #59 at the same time.

trips.txt

  • trip_headsign: "do not" needs to be rephrased as a suggestion "should not"
  • Special cases for loop routes and lasso routes use the word "must"
    • Should these tutorials be formalized into a separate piece of usage documentation for handling cases (i.e., instructional requirements)?

stop_times.txt

  • shape_dist_traveled: Word "must" is not suitable for best practice recommendation documentation. Remove from BP and include in "special cases" tutorial section.

calendar.txt & calendar_dates.txt

  • calendar.service_name: Should we really be recommending the use of extensions in the BP? If this is useful, should this field be adopted in the spec?

fare_attributes.txt & fare_rules.txt

  • agency_id: "should be included" --> move to spec? cc: recommendation to provide agency_id for single-agency datasets

shapes.txt
  • Contradictory recommendations. Which one is better? Choose one. IMHO BP is better:

Spec currently reads:

Shapes do not need to intercept the location of stops exactly.

BP reads:

Alignments should not “jag” to a curb stop, platform, or boarding location.

  • shape_dist_traveled: “Must be provided in both shapes.txt and stop_times.txt if an alignment includes looping or inlining (the vehicle crosses or travels over the same portion of alignment in one trip).” --> move to separate tutorial section for instructional requirements.

frequencies.txt

  • Consolidate behaviour of frequencies.txt into the spec: "Actual stop times are ignored for trips referenced by frequencies.txt; only travel time intervals between stops are significant for frequency-based trips. For clarity/human readability, it is recommended that the first stop time of a trip referenced in frequencies.txt should begin at 00:00:00 (first arrival_time value of 00:00:00)."
  • block_id: “Can be provided for frequency-based trips.” --> move to trips.block_id as “May be provided for frequency-based trips.”
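
To illustrate the first point above (only the intervals between stops matter for frequency-based trips), here is a minimal Python sketch that shifts a trip's stop times so the first arrival_time is 00:00:00. The helper names (to_seconds, to_hms, normalize_trip) and the tuple layout are assumptions made for illustration; nothing here comes from the spec or the validator.

```python
def to_seconds(hms: str) -> int:
    """Convert a GTFS HH:MM:SS time (hours may exceed 24) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds: int) -> str:
    """Convert seconds back to a zero-padded HH:MM:SS string."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def normalize_trip(stop_times):
    """Shift (arrival_time, departure_time) pairs so the first arrival is 00:00:00.

    stop_times is assumed to be ordered by stop_sequence. The travel time
    intervals between stops are preserved; only the absolute offset changes.
    """
    offset = to_seconds(stop_times[0][0])
    return [
        (to_hms(to_seconds(arr) - offset), to_hms(to_seconds(dep) - offset))
        for arr, dep in stop_times
    ]
```

For example, a trip with arrivals at 08:00:00 and 08:05:00 would be rewritten to 00:00:00 and 00:05:00, keeping the five-minute interval intact.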

transfers.txt

  • transfer_type: “If in-seat (block) transfers are allowed between trips, then the last stop of the arriving trip must be the same as the first stop of the departing trip.” --> resolve with in-seat transfers (#295)

feed_info.txt

  • “should be included [if available]”: make this a general statement about data completeness in "All Files"

Special cases (branches, lasso routes, loop routes)

  • formalize into separate guide section with requirement language
    • "these cases must be modeled like this"
@e-lo

e-lo commented Dec 16, 2021

agency.txt:

Should we move this into the spec under "Term Definitions" along with the illustrative examples provided in the BP?

☑️ I think that would be very helpful.

BP for agency_id: “Should be included, even if there is only one agency in the feed.” --> Move this into spec? (likely won't be able to require since it is optional in the spec for single-agency datasets)

❓ So this would be a SHOULD in the spec? Would this be a warning then in the validator?

@scmcca
Author

scmcca commented Dec 16, 2021

Currently agency.agency_id is Conditionally Required if there is more than one agency in the dataset. Otherwise, agency.agency_id is optional if there is only one agency in the dataset.

We would be adding, into the spec, the recommendation to provide agency.agency_id even if there is only one agency (but that would remain entirely optional for backwards compatibility).

@isabelle-dr I'm not sure how recommendations are normally treated in the validator?

@e-lo

e-lo commented Dec 16, 2021

stops.txt lat/lon:

Combining the spec and best practice would be something like:

  • For stops/platforms (location_type=0) and boarding area (location_type=4), the coordinates must be in the pedestrian right-of-way on the correct side of the transit right-of-way.
  • Coordinates for stops/platforms and boarding areas must represent the bus pole, if one exists, and otherwise the point where travelers board the vehicle (on the sidewalk, the platform, or the pedestrian right-of-way on the correct side of the street).
  • Coordinates for all entries in stops.txt SHOULD have an error of no more than four meters when compared to the actual position of the entity that they represent.
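
As a rough illustration of how the four-meter tolerance could be checked against a surveyed position, here is a small Python sketch using the haversine formula. The function names, the (lat, lon) tuple layout, and the use of a mean Earth radius are assumptions for this example, not anything defined in the spec or the BP.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (an approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_tolerance(stop, surveyed, tolerance_m=4.0):
    """Return True if the (lat, lon) in stops.txt is within tolerance_m
    of the surveyed (lat, lon) of the entity it represents."""
    return haversine_m(stop[0], stop[1], surveyed[0], surveyed[1]) <= tolerance_m
```

A check like this only works where a trusted reference position exists, so it is probably more useful as a producer-side QA step than as a validator rule.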

@e-lo

e-lo commented Dec 16, 2021

RE shape_dist_traveled, lasso, loop comments. If we are going to make it a must should this go in the spec itself?

@e-lo

e-lo commented Dec 16, 2021

frequencies.txt --> I support both recommendations.

@isabelle-dr
Contributor

isabelle-dr commented Dec 17, 2021

@scmcca thank you for working on this 🙏

I'm not sure how recommendations are normally treated in the validator?

The recommendations (i.e. mention of should) trigger a notice with a severity WARNING.

The Conditionally Required fields are treated the same way we treat the Required fields and the mentions of must: the validator emits a notice with an ERROR severity if the requirement is not met.

What you're offering to do with agency.agency_id here would translate to the following validator behaviour:

  • If there is one agency AND agency.agency_id is missing: you get a WARNING
  • If there is more than one agency AND agency.agency_id is missing: you get an ERROR

That being said, I think that second ERROR isn't currently implemented in the validator because we weren't sure how to write it without risking false positives. @lionel-nj
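
The two cases above could be sketched roughly as follows in Python. This is only an illustration of the intended behaviour: the function name, the notice tuples, and the message text are hypothetical and do not correspond to the validator's actual rules or notice codes.

```python
import csv
import io


def check_agency_id(agency_txt: str):
    """Return a list of (severity, message) notices for agency.txt content.

    Hypothetical rule: a missing agency_id is a WARNING when the feed has a
    single agency and an ERROR when it has more than one.
    """
    rows = list(csv.DictReader(io.StringIO(agency_txt)))
    # A row counts as missing agency_id if the column is absent or blank.
    missing = [row for row in rows if not (row.get("agency_id") or "").strip()]
    if not missing:
        return []
    if len(rows) > 1:
        return [("ERROR", "agency_id is required when there is more than one agency")]
    return [("WARNING", "agency_id should be provided even with a single agency")]
```

For a single-agency feed without an agency_id column this returns one WARNING; adding a second agency while leaving any agency_id blank turns it into an ERROR.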

@isabelle-dr
Contributor

isabelle-dr commented Dec 17, 2021

Proposal: Many areas of the BP recommend that an optional field be included at each instance (i.e., in agency.txt, feed_info.txt). Instead, we can add a one-time declaration in the "All Files" section that:

For data completeness, all optional files and fields should be provided if the information is available.

I think there is a risk of losing information if we make a one-time declaration about fields.

  1. There are 69 optional fields in the specification today, and I think at most 20 of those are fields that the BP specifically recommends including. Currently, all BP recommendations + "should" in spec trigger a validator WARNING.
    If we add that all optional fields should be provided if the information is available, there wouldn't be any difference between:
    • fields that got adopted in an extension and are optional to preserve backwards compatibility: people should really include them if creating/updating a GTFS feed (e.g. The timepoint field should be provided).
    • actual optional fields (e.g. attributions.attributions_email)

Also, it would be unclear what to do with the validator: should we then emit a WARNING for all optional fields that aren't included? If so, the report would be noisy and the more important warnings risk being lost.

  2. For some optional fields, it isn't really a question of whether the information is available, so their recommendations can't really be replaced by a one-time statement (e.g. stop_code should be included in GTFS if there are passenger-facing stop numbers or short identifiers).

Making a one-time declaration about files: that's a good idea IMHO.

@Sergiodero
Contributor

As GTFS Best Practices (BP) are currently in the process of being merged into the specification, MobilityData is migrating outstanding issues and PRs from this repository to google/transit. Thus, this issue will be closed and further discussion regarding this BP should take place in google/transit. Please refer to Issue #421 for a more detailed explanation of the migration process and the proposed next steps.

With this, we’re hoping to bring more visibility to outstanding BP issues and to restart the discussion around them, so that any improvement that the community finds valuable could be carried forward.
