Pipeline handling of feeds that are deleted & replaced with a gap before next feed begins #1300

Closed
lauriemerrell opened this issue Mar 31, 2022 · 1 comment
Labels
project-gtfs-schedule For issues related to gtfs-schedule project

Comments

@lauriemerrell
Contributor

I am creating this issue to record a known situation where I believe the jury is still out on how to handle it; for now we are in a "monitoring" stance and tracking how prevalent it is.

Occasionally an agency handles a feed transition as shown below, where the dates are in chronological order: A, B, C, D, E, F, with E = D + 1 day.

```mermaid
flowchart LR
subgraph f2[Feed 2, uploaded date C]
cal2[calendar.txt covers date E = D+1 to date F]
fi2[feed_info.txt has feed_start_date E = D+1 and feed_end_date F]
end

subgraph f1[Feed 1, uploaded date A]
cal1[calendar.txt covers date B to date D]
fi1[feed_info.txt has feed_start_date B and feed_end_date D]
end
```

So feed 1 is deleted on date C, the date that feed 2 is uploaded, even though from the agency's perspective feed 1 is not supposed to expire until date D.

GTFS Best Practices say:

At any time, the published GTFS dataset should be valid for at least the next 7 days, and ideally for as long as the operator is confident that the schedule will continue to be operated.

@e-lo has submitted MobilityData/GTFS_Schedule_Best-Practices#48 to clarify the best practices and expectations around this case in general.

However, we are still left with a question of how to handle this scenario in our pipeline. At present, we mark feed 1 as deleted on date C (as soon as feed 2 is uploaded), and the agency will show as having no service between dates C and D, until feed 2 takes effect on date E.

I believe that this handling is defensible, but it can lead to our reports and tables displaying 0 service for an agency during a period when the agency believes feed 1 should still have applied (based, perhaps, on feed_end_date in feed_info.txt). We have been told that app consumers keep using the old feed until the new one takes effect.
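To make the size of the gap concrete, here is a minimal sketch of how the zero-service days fall out of the current handling. The function name is hypothetical and not part of the pipeline; the dates are the SacRT upload and start dates mentioned in this issue, and it assumes feed 1 was marked deleted on the same day feed 2 was uploaded.

```python
from datetime import date, timedelta


def strict_gap_days(feed2_uploaded: date, feed2_start: date) -> list[date]:
    """Days that show no service under strict handling.

    Feed 1 is treated as deleted from the upload date of feed 2 (date C),
    and feed 2 supplies no service until its start date (date E), so the
    gap runs from C through D = E - 1 inclusive.
    """
    n_days = (feed2_start - feed2_uploaded).days
    return [feed2_uploaded + timedelta(days=i) for i in range(max(n_days, 0))]


# Assumed dates: C = 2022-03-03 (upload), E = 2022-04-03 (takes effect).
gap = strict_gap_days(date(2022, 3, 3), date(2022, 4, 3))
print(len(gap), gap[0], gap[-1])
```

With those dates the gap covers every day from the upload date through the day before feed 2 takes effect.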

No validation notice is currently produced when these situations occur, at least in the case of 273.0 (SacRT) for the month of March (the feed uploaded 3/3/22 didn't take effect until 4/3/22). There is a validation for cases where the current feed expires in less than 7 or 30 days, but none for a feed that has not yet taken effect.
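A check for the missing case could look roughly like the sketch below. This is only an illustration: the function name and message text are made up here and do not correspond to any existing validator rule, and it assumes feed_start_date has already been parsed from feed_info.txt into a date.

```python
from datetime import date
from typing import Optional


def not_yet_in_effect_notice(
    extracted_on: date, feed_start_date: Optional[date]
) -> Optional[str]:
    """Return a notice if the feed's service starts after the extraction date.

    Returns None when feed_start_date is missing or the feed is already
    in effect on the extraction date.
    """
    if feed_start_date is None or feed_start_date <= extracted_on:
        return None
    days = (feed_start_date - extracted_on).days
    return (
        f"feed does not take effect for {days} day(s) "
        f"(feed_start_date={feed_start_date.isoformat()})"
    )


# Assumed dates from the SacRT example: uploaded 3/3/22, effective 4/3/22.
print(not_yet_in_effect_notice(date(2022, 3, 3), date(2022, 4, 3)))
```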

A few considerations:

  • Do we want to create two versions of stg_daily_service and fact_daily_service: a "strict" version that treats feed 1 as deleted and another that interpolates service?
  • We can probably add logic to stop treating a feed as deleted if, for example, the feed that replaces it takes effect exactly one day after the original feed ends.
  • We should assess prevalence for the different aspects of this. For example, how precise are feeds about keeping calendar / calendar_dates and feed_info in exact alignment? Do we have cases in the pipeline where a genuine pause in service would have looked the same as this?
  • I wonder whether this approach is taken in particular by feeds that use specific software or vendors. Perhaps there could be a flag for when we expect this behavior?
  • We do already have an is_interpolated flag, but it doesn't cover this case.
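As a rough sketch of the first two considerations, the logic below picks which feed supplies service on a given day, with a `strict` switch between the current behavior and an interpolating one that only applies when feed 2 starts exactly one day after feed 1 ends. The `Feed` class, its field names, and both functions are hypothetical and do not reflect the actual stg_daily_service or fact_daily_service models; all dates other than the 3/3/22 upload and 4/3/22 start are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Feed:
    uploaded: date
    feed_start_date: date
    feed_end_date: date
    deleted_on: Optional[date] = None  # date the pipeline saw the feed removed


def is_seamless_handoff(old: Feed, new: Feed) -> bool:
    """True when the new feed starts exactly one day after the old feed ends."""
    return new.feed_start_date == old.feed_end_date + timedelta(days=1)


def active_feed(day: date, old: Feed, new: Feed, strict: bool = True) -> Optional[Feed]:
    """Return the feed that supplies service on `day`, or None for no service."""
    if day >= new.uploaded and new.feed_start_date <= day <= new.feed_end_date:
        return new
    if not (day >= old.uploaded and old.feed_start_date <= day <= old.feed_end_date):
        return None
    old_deleted = old.deleted_on is not None and day >= old.deleted_on
    if not old_deleted:
        return old
    # Interpolating mode: keep the deleted feed alive only across a seamless handoff.
    if not strict and is_seamless_handoff(old, new):
        return old
    return None


# Hypothetical dates A..F; only C and E come from the SacRT example.
feed1 = Feed(date(2022, 1, 1), date(2022, 1, 10), date(2022, 4, 2), deleted_on=date(2022, 3, 3))
feed2 = Feed(date(2022, 3, 3), date(2022, 4, 3), date(2022, 6, 30))

day = date(2022, 3, 15)
print(active_feed(day, feed1, feed2, strict=True))   # no service under strict handling
print(active_feed(day, feed1, feed2, strict=False))  # feed 1 under interpolation
```

A genuine pause in service, where feed 2 starts more than one day after feed 1 ends, would still show as a gap in both modes under this rule.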

cc @edasmalchi @o-ram @Nkdiaz for awareness

@lauriemerrell lauriemerrell added the project-gtfs-schedule For issues related to gtfs-schedule project label Mar 31, 2022
@lauriemerrell
Contributor Author

I'm going to close this ticket. Per a conversation just now with @e-lo and @o-ram, this situation is explicitly covered in the Cal-ITP FAQ. Our recommendation is to publish "future" service in parallel (at a separate link) to the currently active feed.

Conversation about the best practice is occurring at MobilityData/GTFS_Schedule_Best-Practices#48
