sheldonabrown edited this page Jun 23, 2022 · 7 revisions

The archiver's main responsibility is archiving both real time and inference messages. Additionally:

  • The archiver picks up messages on both the Real-Time Queue and the Inference Queue.
  • The archiver queries the TDS for some additional records.
  • The archiver does a minimal amount of data manipulation and, using Hibernate, inserts location and inference messages into their respective databases.
  • The archiver hosts an Internal Operational Developer API that exposes the most recent vehicle locations and other operational data (operator ID, etc.).
  • The archiver hosts a historical API, similar in format to the Internal Operational API, that allows queryable access to the last 30 days of data.

Time Frames

Recorded data passes through three retention periods:

  1. Realtime: from now through 30 days (at a minimum), stored in a relational database (RDS)
  2. Reporting: from the end of the prior day through 365 days, stored in a relational database (RDS)
  3. Archive: anything older than the above, stored in an offline format on a file server (S3)
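
As a rough illustration of how the three periods overlap, the sketch below works out which stores would be expected to hold a record of a given age. The names StorageTier, RetentionTiers, and tiersFor are hypothetical and do not come from the archiver code; only the 30-day and 365-day limits are taken from the list above, and the day-granularity comparison is an assumption.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names for illustration; not part of onebusaway-nyc-report-archive.
enum StorageTier { REALTIME, REPORTING, ARCHIVE }

final class RetentionTiers {
    static final int REALTIME_DAYS = 30;    // "at a minimum" per the text
    static final int REPORTING_DAYS = 365;

    // Returns the tiers expected to hold a record dated recordDay, as seen on today.
    static Set<StorageTier> tiersFor(LocalDate recordDay, LocalDate today) {
        long ageDays = ChronoUnit.DAYS.between(recordDay, today);
        if (ageDays < 0) {
            throw new IllegalArgumentException("record is dated in the future");
        }
        Set<StorageTier> tiers = EnumSet.noneOf(StorageTier.class);
        if (ageDays <= REALTIME_DAYS) {
            tiers.add(StorageTier.REALTIME);
        }
        // Reporting starts at the end of the prior day, so today's data is not there yet.
        if (ageDays >= 1 && ageDays <= REPORTING_DAYS) {
            tiers.add(StorageTier.REPORTING);
        }
        if (ageDays > REPORTING_DAYS) {
            tiers.add(StorageTier.ARCHIVE);
        }
        return tiers;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2022, 6, 23);
        System.out.println(tiersFor(today, today));
        System.out.println(tiersFor(today.minusDays(10), today));
        System.out.println(tiersFor(today.minusDays(400), today));
    }
}
```

Under these assumptions, a record between 1 and 30 days old is found in both the Realtime and Reporting stores, which matches the overlap implied by the list.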

Diagram

Reporting and Archiving Timeframes

Data

Reporting has several areas of concern. In each case, the message data is trivially serialized (flattened) into the database table schemas shown below in the TDM Data Archiving section. The data considered includes:

Data not stored by the archiver:

  • GTFS data; this is stored on S3 by convention
  • Destination sign code data; the decision was made not to archive these spreadsheets

Design

Reporting Archive

The reporting and archive component exists in a single module, onebusaway-nyc-report-archive. Below are some of the key classes of the module:

  • CcLocationReportRecord - Model entity that stores vehicle real time data. Maps to obanyc_cclocationreport table.
  • ArchivedInferredLocationRecord - Model entity that stores vehicle inference data. Maps to obanyc_inferredlocation table.
  • CcAndInferredLocationRecord - Model entity that stores vehicle's last known location. It is a combination of real time and inferred data. Maps to obanyc_last_known_vehicle table.
  • ArchivingInputQueueListenerTask - Queue listener that picks up real time data from the queue and processes it.
  • ArchivingInferenceQueueListenerTask - Queue listener that picks up inference data from the queue and processes it.
  • HistoricalRecordsDao - Retrieves historical records from the database. This service is used by the historical operational API to fetch historical records.
  • NycQueuedInferredLocationDao - Saves inference records and retrieves vehicle last known location records. This service is used by the Operational API to report vehicle last known location.
  • EmergencyStatusNotificationService - Processes incoming bus records from the real time queue and sends a notification to Amazon SNS if a bus is reporting an emergency.
  • ArchiveBundleManagementServiceImpl - Archive specific version of BundleManagementService; needs to be timezone aware.
  • HistoricalRecordsResource - Web service resource for historical operational API. Provides methods that serve requests for fetching historical records.
  • LastKnownLocationResource - Web service resource for operational API. Provides methods that serve requests for fetching vehicle last known location records.
  • CancelledTripDao - Retrieves historical cancelled trip records from the database. This service parallels the HistoricalRecordsDao in providing historical records for the historical cancelled trips API.
  • HistoricalCancelledTripRecordsResource - Provides an API to retrieve cancelled trips by service date.

Real-Time and Inference Archiving

Real-time and inference data are archived into the obanyc_cclocationreport and obanyc_inferredlocation tables, respectively.

TDM Data Archiving

  • Crew assignment, depot assignment, and vehicle pullout data are archived daily at 4am (this is when the last updated values for the previous day come through)
  • Bundles are archived by convention on S3
  • DSC data is not archived (original requirement was waived)
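
The daily 4am run described above can be sketched as a small scheduling helper. This is only an illustration: DailyArchiveSchedule, nextRun, and serviceDateFor are hypothetical names, and the America/New_York zone used in the example is an assumption based on the note that bundle management needs to be timezone aware.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper; the real job scheduling may differ.
final class DailyArchiveSchedule {
    static final LocalTime RUN_AT = LocalTime.of(4, 0);

    // Next 4am strictly after 'now', in the same zone as 'now'.
    static ZonedDateTime nextRun(ZonedDateTime now) {
        ZonedDateTime candidate = now.toLocalDate().atTime(RUN_AT).atZone(now.getZone());
        if (candidate.isAfter(now)) {
            return candidate;
        }
        return now.toLocalDate().plusDays(1).atTime(RUN_AT).atZone(now.getZone());
    }

    // The 4am run archives the values for the previous day.
    static LocalDate serviceDateFor(ZonedDateTime runTime) {
        return runTime.toLocalDate().minusDays(1);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/New_York"); // assumed deployment zone
        ZonedDateTime run = nextRun(ZonedDateTime.now(zone));
        System.out.println("next run: " + run + ", archives " + serviceDateFor(run));
    }
}
```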

Crew Assignment

+--------------+--------------+------+-----+---------+-------+
| Field        | Type         | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| id           | int(11)      | NO   | PRI | NULL    |       |
| agency_id    | varchar(64)  | YES  |     | NULL    |       |
| depot_id     | varchar(16)  | YES  |     | NULL    |       |
| operator_id  | varchar(16)  | YES  |     | NULL    |       |
| run          | varchar(255) | YES  |     | NULL    |       |
| service_date | date         | NO   |     | NULL    |       |
| updated      | datetime     | YES  |     | NULL    |       |
+--------------+--------------+------+-----+---------+-------+
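
To make the flattening concrete, here is a minimal sketch of a value class mirroring the crew assignment columns above. CrewAssignmentRow is a hypothetical name, not the module's actual entity, and the sample values in main are made up. The constructor only enforces what the table itself states: service_date is NOT NULL and the varchar columns have maximum lengths.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Objects;

// Hypothetical value class; field names follow the columns in the table above.
final class CrewAssignmentRow {
    final int id;                 // int(11), NOT NULL, primary key
    final String agencyId;        // varchar(64), nullable
    final String depotId;         // varchar(16), nullable
    final String operatorId;      // varchar(16), nullable
    final String run;             // varchar(255), nullable
    final LocalDate serviceDate;  // date, NOT NULL
    final LocalDateTime updated;  // datetime, nullable

    CrewAssignmentRow(int id, String agencyId, String depotId, String operatorId,
                      String run, LocalDate serviceDate, LocalDateTime updated) {
        this.id = id;
        this.agencyId = checkLength("agency_id", agencyId, 64);
        this.depotId = checkLength("depot_id", depotId, 16);
        this.operatorId = checkLength("operator_id", operatorId, 16);
        this.run = checkLength("run", run, 255);
        this.serviceDate = Objects.requireNonNull(serviceDate, "service_date is NOT NULL");
        this.updated = updated;
    }

    // Nullable columns pass through; over-long values are rejected before insert.
    private static String checkLength(String column, String value, int max) {
        if (value != null && value.length() > max) {
            throw new IllegalArgumentException(column + " exceeds varchar(" + max + ")");
        }
        return value;
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration only.
        CrewAssignmentRow row = new CrewAssignmentRow(
                1, "MTA NYCT", "CAST", "1234", "X1-5", LocalDate.of(2022, 6, 22), null);
        System.out.println(row.depotId + " " + row.serviceDate);
    }
}
```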

Depot Assignment

+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| id         | int(11)     | NO   | PRI | NULL    |       |
| agency_id  | varchar(64) | YES  |     | NULL    |       |
| date       | date        | NO   |     | NULL    |       |
| depot_id   | varchar(16) | YES  |     | NULL    |       |
| vehicle_id | int(11)     | NO   |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+

Pullouts

+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| id             | int(11)      | NO   | PRI | NULL    |       |
| agency_id      | varchar(64)  | YES  |     | NULL    |       |
| agency_id_tcip | int(11)      | YES  |     | NULL    |       |
| depot_id       | varchar(16)  | YES  |     | NULL    |       |
| operator_id    | varchar(16)  | YES  |     | NULL    |       |
| pullin_time    | datetime     | YES  |     | NULL    |       |
| pullout_time   | datetime     | YES  |     | NULL    |       |
| run            | varchar(255) | YES  |     | NULL    |       |
| service_date   | date         | NO   |     | NULL    |       |
| vehicle_id     | int(11)      | NO   |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+

Cancelled Trips

+------------------------+--------------+------+-----+---------+----------------+
| Field                  | Type         | Null | Key | Default | Extra          |
+------------------------+--------------+------+-----+---------+----------------+
| id                     | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| block                  | varchar(255) | YES  | MUL | NULL    |                |
| firstStopDepartureTime | time         | YES  |     | NULL    |                |
| firstStopId            | varchar(255) | YES  |     | NULL    |                |
| lastStopArrivalTime    | time         | YES  |     | NULL    |                |
| record_timestamp       | bigint(20)   | NO   |     | NULL    |                |
| route                  | varchar(255) | YES  |     | NULL    |                |
| routeId                | varchar(255) | YES  |     | NULL    |                |
| scheduledPullOut       | varchar(255) | YES  |     | NULL    |                |
| serviceDate            | date         | YES  | MUL | NULL    |                |
| status                 | varchar(255) | NO   |     | NULL    |                |
| timestamp              | datetime     | NO   | PRI | NULL    |                |
| trip                   | varchar(255) | NO   | MUL | NULL    |                |
+------------------------+--------------+------+-----+---------+----------------+

Schema Change Strategies

  1. Create slave database from backup of master
  2. Perform upgrade modifications to database
  3. Stop monitoring, turn on maintenance mode
  4. Stop external data capture cron jobs
  5. Stop service alerts cron job
  6. Stop archiver process (tomcat6)
  7. Back up master database
  8. Switch CNAME from master to slave
  9. Start archiver
  10. Start service alerts cron job
  11. Start external data capture cron jobs
  12. Turn on monitoring (once system stabilizes)
  13. Perform manual backfill for time from last backup to CNAME swap
  14. Delete old master
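
Note that steps 9 through 12 restart the components in the reverse of the order in which steps 3 through 6 stopped them. The sketch below illustrates that last-in, first-out ordering with a stack. MaintenanceWindow and the component labels are hypothetical and only model the ordering; they do not stop or start any real service.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the stop/start ordering; performs no real operations.
final class MaintenanceWindow {
    private final Deque<String> stopped = new ArrayDeque<>();
    private final List<String> log = new ArrayList<>();

    // Record that a component was stopped; the most recent stop is restarted first.
    void stop(String component) {
        stopped.push(component);
        log.add("stop " + component);
    }

    // Restart everything in reverse order of stopping.
    void startAll() {
        while (!stopped.isEmpty()) {
            log.add("start " + stopped.pop());
        }
    }

    List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        MaintenanceWindow window = new MaintenanceWindow();
        window.stop("monitoring");
        window.stop("external data capture cron jobs");
        window.stop("service alerts cron job");
        window.stop("archiver");
        // The database backup and CNAME switch (steps 7 and 8) would happen here.
        window.startAll();
        window.log().forEach(System.out::println);
    }
}
```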

Emergency Message Handling

The real time messages from the buses may contain a flag indicating an emergency has occurred. The archiver listens for this flag and sends an Amazon SNS notification containing raw JSON of the real time record.

The Inference Engine merely passes this flag along; it does not track state changes such as flag was set, flag is still set, or flag was unset. The notification is sent only once per emergency state transition.
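
One way to send a notification only once per state transition is to remember the last flag seen for each vehicle and notify only when it changes. The sketch below is a minimal illustration under stated assumptions: EmergencyTransitionDetector and its Notifier interface are hypothetical stand-ins (no real Amazon SNS client is used), a vehicle seen for the first time is treated as previously not in an emergency, and both set and unset transitions are assumed to notify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual EmergencyStatusNotificationService.
final class EmergencyTransitionDetector {
    // Stand-in for whatever publishes to SNS; assumed for illustration.
    interface Notifier {
        void send(String rawJson);
    }

    private final Map<String, Boolean> lastFlag = new HashMap<>();
    private final Notifier notifier;

    EmergencyTransitionDetector(Notifier notifier) {
        this.notifier = notifier;
    }

    // Returns true when the flag changed and a notification was sent.
    boolean onRecord(String vehicleId, boolean emergencyFlag, String rawJson) {
        Boolean previous = lastFlag.put(vehicleId, emergencyFlag);
        boolean before = previous != null && previous; // unseen vehicles count as "not set"
        if (before == emergencyFlag) {
            return false; // flag unchanged: stay silent
        }
        notifier.send(rawJson);
        return true;
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        EmergencyTransitionDetector detector = new EmergencyTransitionDetector(sent::add);
        detector.onRecord("bus-1", true, "{\"vehicle\":\"bus-1\",\"emergency\":true}");
        detector.onRecord("bus-1", true, "{\"vehicle\":\"bus-1\",\"emergency\":true}");
        System.out.println("notifications sent: " + sent.size());
    }
}
```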
