
Central Ohio Transit Authority Reroute PDF Feeds

> Add this link to your feedreader <

> View historical alerts <

What

Screenshot of the banner seen at the top of cota.com. This banner advertises the December 2 Holiday Hop reroutes.

An RSS feed that updates whenever COTA posts a new alert box in the header of COTA.com. The alert box is most often used for notice of reroutes.

This feed updates once per day. If you need more timely updates than that, I'm sorry.

This feed contains the following information, using the screenshotted alert as an example:

  • The alert's header — REROUTES AHEAD
  • The alert's description — Dec. 2 | Holiday Hop
  • The link title — LEARN MORE
  • The link url — https://www.cota.com/reroutes/cota-reroutes-holiday-hop-231202.pdf
  • The date of the scrape that generated this alert

This feed does not contain:

  • Any information only found at the link
  • Any information parsed from the reroute PDF, such as:
    • which lines are affected
    • where the reroute is located
    • where you can catch your bus
  • Any information not contained in the alert box

If you want additional information on items in this feed, you'll need to click the link and/or contact COTA.

Why

When COTA posts a reroute PDF to COTA.com, they don't always announce it. When they do announce it, they only announce it on enshittified privately-owned social media networks: Twitter, Facebook, Instagram. If you don't have logins for those accounts, either you use a proxy like Nitter or you're locked out.

COTA also sometimes posts a notice about the reroute via their GTFS feed, but historically those GTFS alerts only say that there will be a reroute affecting a route. They generally do not say where the reroute is located.

I don't want to manually check the COTA website every day; I want to receive notifications in the tools that I habitually use. So I wrote this scraper to make an RSS feed.

Common issues

GitHub serves the RSS feed with the incorrect Content-Type header of text/html. If this causes problems for your feedreader, consult your feedreader's documentation. FreshRSS, for example, supports appending #force_feed to the end of the feed URL (e.g. https://raw.githubusercontent.com/benlk/cota-reroute-pdf-rss/main/rss.xml#force_feed) to force the software to interpret the file as application/rss+xml.

How & Credits

This project scrapes and archives the contents of the "Alerts" box which is intermittently present in the header of COTA.com. COTA.com is a Gatsby app, and that header is baked in directly. While examining the site's source code one day, I discovered a reference to the WordPress site which powers the Gatsby app. From there, I examined the read-only side of its WP-JSON API, and discovered that the "Alerts" box appears to be powered by the Advanced Custom Fields plugin for WordPress. So rather than scraping the COTA.com Gatsby app directly, I check the ACF endpoint to see if there's a new Alert posted.

curl -s "https://author.cota.com/wp-json/acf/v2/options/" | jq '.acf.alerts' > build/acf-options.json

The alert gets saved as JSON to a temporary build/ directory. I then use a PHP script to parse the JSON file and add any new alerts to a CSV file listing all historical alerts.

// check if the link_url exists in the file already
if ( ! str_contains( $csv_contents, $new_entry['link_url'] ) ) {
    // if not, then write it to the file
    fputcsv(
        $csv_handle,
        $new_entry,
        CSV_SEPARATOR,
        CSV_ENCLOSURE,
        CSV_ESCAPE
    );
}
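The step that produces $new_entry can be sketched roughly as below. Note that the field names and JSON shape here are illustrative guesses based on the alert contents described above, not COTA's actual ACF keys:

```php
<?php
// Hypothetical sketch of parsing the scraped JSON into a CSV row.
// A literal string stands in for build/acf-options.json; the real
// keys in the ACF response may differ.
$json = '{"header":"REROUTES AHEAD","description":"Dec. 2 | Holiday Hop","link_title":"LEARN MORE","link_url":"https://www.cota.com/reroutes/cota-reroutes-holiday-hop-231202.pdf"}';
$alert = json_decode( $json, true );

// Assemble the row to append to the historical CSV.
$new_entry = [
    'header'      => $alert['header'],
    'description' => $alert['description'],
    'link_title'  => $alert['link_title'],
    'link_url'    => $alert['link_url'],
    'date'        => gmdate( 'Y-m-d' ), // the date of the scrape
];
```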

Then with a different PHP script I convert that CSV to the RSS feed.
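That conversion might look something like the sketch below. The CSV column order and channel metadata are assumptions for illustration, not the project's actual code:

```php
<?php
// Hypothetical sketch of the CSV-to-RSS step. Assumes columns in the
// order: header, description, link_title, link_url, date.
$csv_lines = [
    'REROUTES AHEAD,Dec. 2 | Holiday Hop,LEARN MORE,https://www.cota.com/reroutes/cota-reroutes-holiday-hop-231202.pdf,2023-12-01',
];

$items = '';
foreach ( $csv_lines as $line ) {
    [ $header, $description, $link_title, $link_url, $date ] = str_getcsv( $line );
    // Escape each field before embedding it in XML.
    $items .= '<item>'
        . '<title>' . htmlspecialchars( $header ) . '</title>'
        . '<link>' . htmlspecialchars( $link_url ) . '</link>'
        . '<description>' . htmlspecialchars( $description ) . '</description>'
        . '</item>';
}

$rss = '<?xml version="1.0"?><rss version="2.0"><channel>'
    . '<title>COTA Reroute PDF Feeds</title>'
    . $items
    . '</channel></rss>';
```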

If the current alert's PDF URL isn't already in the CSV, the new CSV line changes the RSS feed, which produces a committable diff in Git. The GitHub Action which runs this scraper then commits the change to the main branch, making the updated RSS file available at https://raw.githubusercontent.com/benlk/cota-reroute-pdf-rss/main/rss.xml

- name: Commit and push if the data has changed
  run: |-
    git config user.name "Automated"
    git config user.email "actions@users.noreply.github.com"
    git add -A
    timestamp=$(date -u)
    git commit -m "Latest data: ${timestamp}" || exit 0
    git push

This project uses GitHub Actions à la Simon Willison to perform the scrape. The process of writing this was greatly aided by Lasse Benninga's blog post on GitHub Actions scrapers, which simplifies and expands on Simon Willison's model.