Align speeds scripts part 1 (stop_segments tested for Jul 2024) / Jul open data part 1 #1187

Merged (21 commits) Jul 24, 2024

tiffanychu90 (Member) commented Jul 24, 2024

gtfs_funnel

  • backfill all dates for getting vp dwell time
  • backfill all dates for vp_condenser.py
  • fix gtfs_funnel/crosswalk_gtfs_dataset_key_to_organization.py
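One way to picture "vp dwell time" is to collapse consecutive vehicle positions (vp) that report the same location into a single row, keeping the first and last timestamps so the difference is the dwell. This is a minimal stdlib sketch under that assumed definition; `VehiclePosition`, `CondensedVP`, and `condense_dwell` are hypothetical names and not the gtfs_funnel implementation.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class VehiclePosition:
    # Hypothetical minimal vp record (assumed fields, not the real schema).
    timestamp: int  # seconds since start of service day
    x: float        # projected coordinates
    y: float


@dataclass(frozen=True)
class CondensedVP:
    first_timestamp: int  # first ping seen at this location
    last_timestamp: int   # last ping before the vehicle moved on
    x: float
    y: float

    @property
    def dwell_sec(self) -> int:
        return self.last_timestamp - self.first_timestamp


def condense_dwell(vps: List[VehiclePosition]) -> List[CondensedVP]:
    """Collapse runs of consecutive pings at identical coordinates into one row."""
    ordered = sorted(vps, key=lambda vp: vp.timestamp)
    condensed = []
    # groupby only merges *consecutive* pings, so a vehicle that returns to
    # the same point later in the trip gets a separate row.
    for (x, y), run in groupby(ordered, key=lambda vp: (vp.x, vp.y)):
        pings = list(run)
        condensed.append(CondensedVP(pings[0].timestamp, pings[-1].timestamp, x, y))
    return condensed
```

Backfilling "all dates" would then amount to running this step once per analysis date over that date's vp.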

segment speeds

  • align segment_speeds pipeline scripts to use vp with dwell time
  • update gtfs_analytics_data.yml: reuse existing file paths so the new outputs overwrite them, and add new file paths where necessary
  • simply porting over the step in vp_around_stops (used for Big Blue Bus) that narrows the 10 nearest neighbors down to 2 is not performant:
    • tried setting up operator dfs with delayed: takes about an hour
    • map_partitions also takes about an hour
    • separating the steps (projecting vp against the shape, projecting stops against the shape, and transforming the df to long) gets it down to about 10 min
    • but this doesn't scale well for rt_stop_times and needs further investigation: speedmap segments depend on rt_stop_times (trip-stop grain) and make that grain even more granular
  • alignment steps:
    • new_nearest_10 --> nearest_vp_to_stop: this should replace stage2 and overwrite files completely -- tested on stop_segments / jul date
    • new_narrow_to_2 --> vp_around_stops: this is a new stage and creates new files and should probably be saved as stage2b -- tested on stop_segments / jul date
    • new_interpolate --> interpolate_stop_arrivals: this should replace stage3 and overwrite files completely -- tested on stop_segments / jul date
    • quick_bbb_speeds --> stop_arrivals_to_speeds, average_segment_speeds: these should overwrite files completely
      • tweaked gtfs_schedule_wrangling.merge_operator_identifiers to support kwargs, so we can return only the columns we want; the crosswalk now outputs a lot of columns we don't necessarily want to use everywhere
      • this is one of the downstream impacts of changing the crosswalk (@amandaha8): all references to helpers.import_schedule_gtfs_key_organization_crosswalk should be checked and surfaced soon after the change
      • adjusted the reference in hqta scripts accordingly
  • clean up / remove:
    • pipe.py --> use pipeline_[segment_type].py instead
    • remove open_data/link_operator_to_county_district.py -- unused
  • Research Request - Align segment_speeds pipeline scripts to use vp with dwell time #1183
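For the new_narrow_to_2 --> vp_around_stops stage, one reading of "narrowing 10 nearest neighbors to 2" is: once vp and the stop are both projected against the shape, keep the closest vp just before the stop and the closest vp just after it. The sketch below assumes exactly that and that projected distances are already in meters; `narrow_to_2` is a hypothetical helper, not the pipeline's function.

```python
from typing import List, Optional, Tuple


def narrow_to_2(
    stop_meters: float,
    vp_meters: List[float],
) -> Tuple[Optional[int], Optional[int]]:
    """Return indices into vp_meters of the closest vp before and after the stop.

    stop_meters: the stop's distance along the shape (assumed already projected).
    vp_meters: distances along the shape for the ~10 nearest vp candidates.
    Either index is None if no candidate exists on that side of the stop.
    """
    # A vp sitting exactly at the stop counts as "before"; strict > keeps the
    # same vp from being picked on both sides.
    before = [i for i, m in enumerate(vp_meters) if m <= stop_meters]
    after = [i for i, m in enumerate(vp_meters) if m > stop_meters]

    prior = max(before, key=lambda i: vp_meters[i]) if before else None
    subseq = min(after, key=lambda i: vp_meters[i]) if after else None
    return prior, subseq
```

Working only on projected distances (plain numbers) rather than geometries is one reason splitting out the projection steps can be cheaper than doing everything per operator.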
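For new_interpolate --> interpolate_stop_arrivals, a common approach is linear interpolation in distance along the shape: the stop's arrival time sits between the two bracketing vp timestamps in proportion to where the stop falls between them. This is a sketch assuming linear interpolation over projected meters; `interpolate_arrival` and the `(meters, seconds)` tuple layout are hypothetical.

```python
from typing import Tuple

# Hypothetical point layout: (meters along shape, timestamp in seconds).
Point = Tuple[float, float]


def interpolate_arrival(stop_meters: float, prior: Point, subseq: Point) -> float:
    """Linearly interpolate a stop arrival time between two bracketing vp."""
    m1, t1 = prior
    m2, t2 = subseq
    if m2 == m1:
        # Both vp project to the same spot; there is no distance to interpolate over.
        return t1
    fraction = (stop_meters - m1) / (m2 - m1)
    return t1 + fraction * (t2 - t1)
```

For example, a stop at 500 m between vp at (480 m, 100 s) and (520 m, 140 s) is halfway between them, so its interpolated arrival is 120 s.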
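The stop_arrivals_to_speeds and average_segment_speeds stages can be thought of as two steps: per trip, divide the distance between consecutive stops by the difference in interpolated arrival times; then average each segment's speed across trips. The sketch below assumes segments are keyed by consecutive stop pairs and that a plain mean is the aggregation; the function names mirror the stage names only for readability and are hypothetical, as is the tuple layout.

```python
from statistics import mean
from typing import Dict, List, Tuple

MPS_PER_MPH = 0.44704  # 1 mph is exactly 0.44704 m/s

# Hypothetical arrival record: (stop_id, stop meters along shape, arrival seconds).
Arrival = Tuple[str, float, float]
Segment = Tuple[str, str]


def stop_arrivals_to_speeds(arrivals: List[Arrival]) -> Dict[Segment, float]:
    """Speed in mph for each pair of consecutive stops on one trip."""
    ordered = sorted(arrivals, key=lambda a: a[1])
    speeds: Dict[Segment, float] = {}
    for (stop_a, m_a, t_a), (stop_b, m_b, t_b) in zip(ordered, ordered[1:]):
        seconds = t_b - t_a
        if seconds <= 0:
            # Skip segments where interpolated arrivals don't move forward in time.
            continue
        speeds[(stop_a, stop_b)] = (m_b - m_a) / seconds / MPS_PER_MPH
    return speeds


def average_segment_speeds(trips: List[Dict[Segment, float]]) -> Dict[Segment, float]:
    """Average each segment's speed across all trips that have it."""
    by_segment: Dict[Segment, List[float]] = {}
    for trip_speeds in trips:
        for segment, mph in trip_speeds.items():
            by_segment.setdefault(segment, []).append(mph)
    return {segment: mean(values) for segment, values in by_segment.items()}
```

A real pipeline would likely also filter implausible speeds and group by time of day; those choices are left out here.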

open data

  • run part 1 of open data
  • run gtfs_funnel, hqta, open_data scripts for Jul 2024
  • do not run any of the speeds stuff that is changing

tiffanychu90 changed the title from "Align speeds scripts" to "Align speeds scripts part 1 (stop_segments tested for Jul 2024)" Jul 24, 2024
tiffanychu90 merged commit 8ff4d70 into main Jul 24, 2024
2 checks passed
tiffanychu90 deleted the align-speeds-scripts branch July 24, 2024 18:49
tiffanychu90 changed the title from "Align speeds scripts part 1 (stop_segments tested for Jul 2024)" to "Align speeds scripts part 1 (stop_segments tested for Jul 2024) / Jul open data part 1" Jul 24, 2024