Pipeline handling of feeds that are deleted & replaced with a gap before next feed begins #1300

lauriemerrell · 2022-03-31T19:51:57Z

I am creating this issue to record something that we are aware of but have not yet decided how to handle. For now we are in a "monitoring" stance and are tracking how prevalent the issue is.

We occasionally encounter a case where an agency handles a feed transition like this. The dates are in chronological order: A, B, C, D, E (= D + 1 day), F.

flowchart LR
subgraph f2[Feed 2, uploaded date C]
cal2[calendar.txt covers date E = D+1 to date F]
fi2[feed_info.txt has feed_start_date E = D+1 and feed_end_date F]
end

subgraph f1[Feed 1, uploaded date A]
cal1[calendar.txt covers date B to date D]
fi1[feed_info.txt has feed_start_date B and feed_end_date D]
end

So, feed 1 is deleted on date C, the date that feed 2 is uploaded, even though from the agency's perspective feed 1 is not supposed to expire until date D.

GTFS Best Practices say:

At any time, the published GTFS dataset should be valid for at least the next 7 days, and ideally for as long as the operator is confident that the schedule will continue to be operated.

@e-lo has submitted MobilityData/GTFS_Schedule_Best-Practices#48 to clarify the best practices and expectations around this case in general.

However, we are still left with a question of how to handle this scenario in our pipeline. At present, we mark feed 1 as deleted on date C (as soon as feed 2 is uploaded), and the agency will show as having no service between dates C and D, until feed 2 takes effect on date E.

I believe that this handling is defensible, but it can lead to our reports and tables displaying 0 service for an agency during a period when the agency believes that feed 1's coverage should have persisted (based, perhaps, on feed_end_date in feed_info.txt). We have been told that app consumers keep using the old feed until the new one takes effect.
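To make the gap concrete, here is a rough sketch of the "strict" rule described above. This is not our actual pipeline code: `FeedVersion`, `is_serving` and `zero_service_dates` are hypothetical names, and only dates C and E follow a real case (SacRT); A, B, D and F are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class FeedVersion:
    """Hypothetical record for one uploaded version of an agency's feed."""
    uploaded: date            # first date this version was seen
    deleted: Optional[date]   # date it was marked deleted (None = still live)
    service_start: date       # first date covered by calendar.txt
    service_end: date         # last date covered by calendar.txt


def is_serving(feed: FeedVersion, day: date) -> bool:
    """Strict rule: the feed must be present in the pipeline AND in effect."""
    present = feed.uploaded <= day and (feed.deleted is None or day < feed.deleted)
    in_effect = feed.service_start <= day <= feed.service_end
    return present and in_effect


def zero_service_dates(feeds: list[FeedVersion], first: date, last: date) -> list[date]:
    """Dates in [first, last] on which no feed version is serving."""
    n_days = (last - first).days + 1
    days = (first + timedelta(days=n) for n in range(n_days))
    return [d for d in days if not any(is_serving(f, d) for f in feeds)]


# Illustrative dates: C and E follow the SacRT case; A, B, D and F are made up.
A, B = date(2022, 1, 1), date(2022, 1, 5)
C, D = date(2022, 3, 3), date(2022, 4, 2)
E, F = D + timedelta(days=1), date(2022, 6, 30)

feed_1 = FeedVersion(uploaded=A, deleted=C, service_start=B, service_end=D)
feed_2 = FeedVersion(uploaded=C, deleted=None, service_start=E, service_end=F)

gap = zero_service_dates([feed_1, feed_2], first=B, last=F)
print(gap[0], gap[-1], len(gap))
```

With these dates the zero-service window runs from 2022-03-03 through 2022-04-02 (31 days), which is exactly the C-to-D span in the diagram.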

No validation notice is currently produced when these situations occur, at least in the case of 273.0 (SacRT) for the month of March (the feed uploaded 3/3/22 didn't take effect until 4/3/22). There is a validation for cases where the current feed expires in less than 7 or 30 days, but there is no validation for a feed that has not yet taken effect.
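A check for the missing case could be quite small. The sketch below is only an illustration of the idea, not an existing validator rule: `days_until_in_effect` is a hypothetical helper, and it assumes the feed's start date is available from feed_info.txt (or from the earliest service date in calendar.txt / calendar_dates.txt).

```python
from datetime import date
from typing import Optional


def days_until_in_effect(feed_start_date: date, check_date: date) -> Optional[int]:
    """Return days until the feed takes effect, or None if already in effect.

    Hypothetical helper: feed_start_date is assumed to come from
    feed_info.txt or the earliest date in the calendar files.
    """
    if feed_start_date > check_date:
        return (feed_start_date - check_date).days
    return None


# SacRT case: feed uploaded 3/3/22, took effect 4/3/22.
wait = days_until_in_effect(date(2022, 4, 3), date(2022, 3, 3))
if wait is not None:
    print(f"feed not yet in effect: starts in {wait} days")
```

For the SacRT dates this reports a 31-day wait on the upload date.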

A few considerations:

Do we want to perhaps create two versions of stg_daily_service and fact_daily_service, one version that is "strict" and treats feed 1 as deleted and the other that interpolates service?
We can probably create some logic to stop treating a feed as deleted if the one that replaces it takes effect exactly one day after the original feed, for example.
We should assess prevalence for the different aspects of this -- for example, how precise are feeds about using calendar / calendar_dates + feed_info in exact alignment; do we have cases in the pipeline where there was a genuine pause in service that would have looked the same as this?
I wonder if this approach is taken in particular by feeds that use some specific software or vendors... Perhaps there could be some flag for when we expect this behavior?
We do already have an is_interpolated flag but it doesn't cover this case.
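As a starting point for discussion, the "exactly one day after" rule from the list above might look something like this. Everything here is an assumption for illustration: `Handoff` and `resolve_handoff` are hypothetical names, the `extended` flag is separate from the existing is_interpolated flag, and the dates in the usage lines are examples only.

```python
from datetime import date, timedelta
from typing import NamedTuple


class Handoff(NamedTuple):
    old_feed_deleted_on: date  # first date the old feed no longer counts
    extended: bool             # True when the observed deletion was overridden


def resolve_handoff(old_end: date, observed_deleted: date, new_start: date) -> Handoff:
    """Keep the old feed alive through its end date when the replacement
    feed starts exactly one day after the old feed's coverage ends."""
    contiguous = new_start == old_end + timedelta(days=1)
    deleted_early = observed_deleted <= old_end
    if contiguous and deleted_early:
        # Treat the old feed as deleted only once the new feed takes effect.
        return Handoff(new_start, True)
    # Otherwise keep the strict behavior; this may be a genuine service pause.
    return Handoff(observed_deleted, False)


# Contiguous handoff: old feed ends 4/2, new feed starts 4/3, deleted 3/3.
print(resolve_handoff(date(2022, 4, 2), date(2022, 3, 3), date(2022, 4, 3)))
# Non-contiguous: new feed starts a week later, so nothing is overridden.
print(resolve_handoff(date(2022, 4, 2), date(2022, 3, 3), date(2022, 4, 10)))
```

Keeping the override behind an explicit flag would let a "strict" table ignore it and an "interpolated" table honor it, without hiding genuine pauses in service.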

cc @edasmalchi @o-ram @Nkdiaz for awareness


lauriemerrell · 2022-04-27T20:50:52Z

I'm going to close this ticket. Per conversation just now with @e-lo and @o-ram, this situation is explicitly covered in the Cal-ITP FAQ. Our recommendation is to publish "future" service in parallel (at a separate link) to the current active feed.

Conversation about the best practice is occurring at MobilityData/GTFS_Schedule_Best-Practices#48

lauriemerrell added the project-gtfs-schedule label (for issues related to the gtfs-schedule project) Mar 31, 2022

lauriemerrell closed this as completed Apr 27, 2022

edasmalchi mentioned this issue Aug 28, 2024

Bug: some missing operators in SHS Stops Export cal-itp/data-analyses#1216

Closed
