Archive
The archiver's main responsibility is archiving both real-time and inference messages. Additionally:
- The archiver picks up messages on both the Real-Time Queue and the Inference Queue.
- The archiver queries the TDS for some additional records.
- The archiver does a minimal amount of data manipulation and, using Hibernate, inserts location and inference messages into their respective databases.
- The archiver hosts an Internal Operational Developer API that exposes the most recent vehicle locations and other operational data (operator ID, etc.).
- The archiver hosts a historical API, similar in format to the Internal Operational API, that allows queryable access to the last 30 days of data.
Recorded data is kept for three retention periods:
- Realtime data, which covers now through 30 days (at a minimum), in a relational database (RDS)
- Reporting data, which covers the end of the prior day through 365 days, in a relational database (RDS)
- Archive data, which contains anything older than the above, in an offline format on a fileserver (S3)
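As a rough illustration of the retention periods above, the sketch below works out which stores would hold a record of a given age. The class, enum, and method names are hypothetical and are not part of the archiver; only the 30-day and 365-day boundaries come from the list above. It assumes "end of the prior day" means records at least one day old, and it treats the 30-day realtime minimum as a hard limit.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of onebusaway-nyc-report-archive.
final class RetentionSketch {
    enum Store { REALTIME, REPORTING, ARCHIVE }

    static final int REALTIME_DAYS = 30;   // "now through 30 days"
    static final int REPORTING_DAYS = 365; // "through 365 days"

    // Returns every store expected to hold a record dated recordDate, as seen on today.
    static Set<Store> storesFor(LocalDate recordDate, LocalDate today) {
        long ageDays = ChronoUnit.DAYS.between(recordDate, today);
        if (ageDays < 0) {
            throw new IllegalArgumentException("record date is in the future");
        }
        Set<Store> stores = EnumSet.noneOf(Store.class);
        if (ageDays <= REALTIME_DAYS) {
            stores.add(Store.REALTIME);
        }
        // Assumption: reporting starts with the prior day, i.e. an age of at least one day.
        if (ageDays >= 1 && ageDays <= REPORTING_DAYS) {
            stores.add(Store.REPORTING);
        }
        if (ageDays > REPORTING_DAYS) {
            stores.add(Store.ARCHIVE);
        }
        return stores;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        System.out.println(storesFor(today.minusDays(10), today));
    }
}
```

Note that the realtime and reporting periods overlap, so a record between 1 and 30 days old is reported as living in both relational stores.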
Reporting has several areas of concern. In each case, the message data is trivially serialized (flattened) into a database table schema shown below in section TDM Data Archiving. The data considered includes:
- real-time data from the buses in TCIP-JSON format
- output of the inference engine in serialized JSON format
- crew and dispatch data in TCIP format
- depot assignment data in TCIP format
- cancelled trip data in serialized JSON format
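To make "trivially serialized (flattened)" concrete, here is a minimal sketch that turns a nested message into a single row of column/value pairs by joining nested keys with underscores. This is an illustration only: the archiver itself uses Hibernate entities, and the class name, the underscore convention, and the field names in the usage example are all assumptions rather than the real TCIP-JSON layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of flattening a nested message into one table row.
final class FlattenSketch {

    // Flattens nested maps into a single map keyed by underscore-joined paths.
    static Map<String, Object> flatten(Map<String, ?> message) {
        Map<String, Object> row = new LinkedHashMap<>();
        flattenInto("", message, row);
        return row;
    }

    private static void flattenInto(String prefix, Map<String, ?> node, Map<String, Object> row) {
        for (Map.Entry<String, ?> entry : node.entrySet()) {
            String column = prefix.isEmpty() ? entry.getKey() : prefix + "_" + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Nested object: recurse, extending the column prefix.
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                flattenInto(column, child, row);
            } else {
                // Leaf value: becomes one column of the row.
                row.put(column, value);
            }
        }
    }

    public static void main(String[] args) {
        // Made-up field names, for demonstration only.
        Map<String, Object> vehicle = new LinkedHashMap<>();
        vehicle.put("agency_id", "MTA");
        vehicle.put("vehicle_id", 1234);
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("vehicle", vehicle);
        message.put("speed", 12.5);
        System.out.println(flatten(message));
    }
}
```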
Data not stored by the archiver:
- GTFS data; this is stored on S3 by convention
- Destination sign code data; the decision was made not to archive these spreadsheets
The reporting and archive component lives in a single module, onebusaway-nyc-report-archive. It contains classes for:
- listening to the real-time and inference queues
- archiving real-time and inference data
- exposing the operational API
- exposing the historical API
- monitoring the emergency bus flag status and sending a notification when the status changes
Below are some of the key classes of the onebusaway-nyc-report-archive module:
- CcLocationReportRecord - Model entity that stores vehicle real time data. Maps to obanyc_cclocationreport table.
- ArchivedInferredLocationRecord - Model entity that stores vehicle inference data. Maps to obanyc_inferredlocation table.
- CcAndInferredLocationRecord - Model entity that stores a vehicle's last known location. It is a combination of real time and inferred data. Maps to obanyc_last_known_vehicle table.
- ArchivingInputQueueListenerTask - Queue listener that picks up real time data from the queue and processes it.
- ArchivingInferenceQueueListenerTask - Queue listener that picks up inference data from the queue and processes it.
- HistoricalRecordsDao - Retrieves historical records from the database. This service is used by the historical operational API to fetch historical records.
- NycQueuedInferredLocationDao - Saves inference records and retrieves vehicle last known location records. This service is used by the Operational API to report vehicle last known location.
- EmergencyStatusNotificationService - Processes incoming bus records from the real time queue and sends a notification to the Amazon SNS service if a bus is reporting an emergency.
- ArchiveBundleManagementServiceImpl - Archive specific version of BundleManagementService; needs to be timezone aware.
- HistoricalRecordsResource - Web service resource for historical operational API. Provides methods that serve requests for fetching historical records.
- LastKnownLocationResource - Web service resource for operational API. Provides methods that serve requests for fetching vehicle last known location records.
- CancelledTripDao - Retrieves historical cancelled trip records from the database. This service parallels the HistoricalRecordsDao in providing historical records for the historical cancelled trips API.
- HistoricalCancelledTripRecordsResource - Provides an API to retrieve cancelled trips by service date.
- Realtime data is archived into the obanyc_cclocationreport and obanyc_inferredlocation tables
- Crew assignment, depot assignment, and vehicle pullout data is archived daily at 4am (this is when the last updated values for the previous day come through)
- Bundles are archived by convention on S3
- DSC data is not archived (original requirement was waived)
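The daily 4am archive run mentioned above could be timed along these lines. This is a sketch under stated assumptions, not the archiver's actual scheduling code: the class and method names are hypothetical, and the America/New_York zone used in the demo is a guess about the deployment.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper for timing a once-a-day archive job.
final class DailyArchiveSchedule {
    static final LocalTime RUN_AT = LocalTime.of(4, 0); // 4am local time

    // Returns the first 4am strictly after the given instant, in the same zone.
    static ZonedDateTime nextRun(ZonedDateTime now) {
        ZonedDateTime candidate = now.with(RUN_AT);
        return candidate.isAfter(now) ? candidate : candidate.plusDays(1);
    }

    // How long to wait from now until the next run.
    static Duration delayUntilNextRun(ZonedDateTime now) {
        return Duration.between(now, nextRun(now));
    }

    public static void main(String[] args) {
        // Assumed zone; adjust to wherever the archiver actually runs.
        ZonedDateTime now = ZonedDateTime.now(ZoneId.of("America/New_York"));
        System.out.println("Next archive run in " + delayUntilNextRun(now));
    }
}
```

Recomputing the delay after each run, rather than repeating on a fixed 24-hour period, keeps the job at 4am local time across daylight saving changes.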
+--------------+--------------+------+-----+---------+-------+
| Field        | Type         | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| id           | int(11)      | NO   | PRI | NULL    |       |
| agency_id    | varchar(64)  | YES  |     | NULL    |       |
| depot_id     | varchar(16)  | YES  |     | NULL    |       |
| operator_id  | varchar(16)  | YES  |     | NULL    |       |
| run          | varchar(255) | YES  |     | NULL    |       |
| service_date | date         | NO   |     | NULL    |       |
| updated      | datetime     | YES  |     | NULL    |       |
+--------------+--------------+------+-----+---------+-------+
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| id         | int(11)     | NO   | PRI | NULL    |       |
| agency_id  | varchar(64) | YES  |     | NULL    |       |
| date       | date        | NO   |     | NULL    |       |
| depot_id   | varchar(16) | YES  |     | NULL    |       |
| vehicle_id | int(11)     | NO   |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| id             | int(11)      | NO   | PRI | NULL    |       |
| agency_id      | varchar(64)  | YES  |     | NULL    |       |
| agency_id_tcip | int(11)      | YES  |     | NULL    |       |
| depot_id       | varchar(16)  | YES  |     | NULL    |       |
| operator_id    | varchar(16)  | YES  |     | NULL    |       |
| pullin_time    | datetime     | YES  |     | NULL    |       |
| pullout_time   | datetime     | YES  |     | NULL    |       |
| run            | varchar(255) | YES  |     | NULL    |       |
| service_date   | date         | NO   |     | NULL    |       |
| vehicle_id     | int(11)      | NO   |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+
+------------------------+--------------+------+-----+---------+----------------+
| Field                  | Type         | Null | Key | Default | Extra          |
+------------------------+--------------+------+-----+---------+----------------+
| id                     | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| block                  | varchar(255) | YES  | MUL | NULL    |                |
| firstStopDepartureTime | time         | YES  |     | NULL    |                |
| firstStopId            | varchar(255) | YES  |     | NULL    |                |
| lastStopArrivalTime    | time         | YES  |     | NULL    |                |
| record_timestamp       | bigint(20)   | NO   |     | NULL    |                |
| route                  | varchar(255) | YES  |     | NULL    |                |
| routeId                | varchar(255) | YES  |     | NULL    |                |
| scheduledPullOut       | varchar(255) | YES  |     | NULL    |                |
| serviceDate            | date         | YES  | MUL | NULL    |                |
| status                 | varchar(255) | NO   |     | NULL    |                |
| timestamp              | datetime     | NO   | PRI | NULL    |                |
| trip                   | varchar(255) | NO   | MUL | NULL    |                |
+------------------------+--------------+------+-----+---------+----------------+
- Create slave database from backup of master
- Perform upgrade modifications to database
- Stop monitoring, turn on maintenance mode
- Stop external data capture cron jobs
- Stop service alerts cron job
- Stop archiver process (tomcat6)
- Backup master database
- Switch CNAME from master to slave
- Start archiver
- Start service alerts cron job
- Start external data capture cron jobs
- Turn on monitoring (once system stabilizes)
- Perform manual backfill for time from last backup to CNAME swap
- Delete old master
The real-time messages from the buses may contain a flag indicating that an emergency has occurred. The archiver listens for this flag and sends an Amazon SNS notification containing the raw JSON of the real-time record.
The Inference Engine merely passes this flag along; it does not track state changes such as flag was set, flag is still set, or flag was unset. The notification is sent only once per emergency state transition.
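One way to get "once per emergency state transition" behaviour is to remember the last flag value seen for each vehicle and notify only when it changes. The sketch below shows that idea; it is not the code of EmergencyStatusNotificationService. The class name and the notifier callback are hypothetical (a real implementation would publish to Amazon SNS there), and it assumes that both set and unset count as transitions and that a vehicle seen for the first time without the flag does not trigger a notification.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical sketch of per-vehicle emergency transition tracking.
final class EmergencyTransitionTracker {
    // Last emergency flag value seen for each vehicle id.
    private final Map<String, Boolean> lastStatus = new ConcurrentHashMap<>();
    // Callback receiving (vehicleId, rawJson); stands in for the SNS publish step.
    private final BiConsumer<String, String> notifier;

    EmergencyTransitionTracker(BiConsumer<String, String> notifier) {
        this.notifier = notifier;
    }

    // Records the flag for a vehicle and notifies only if it changed.
    // Returns true when a notification was sent.
    boolean onRecord(String vehicleId, boolean emergency, String rawJson) {
        Boolean previous = lastStatus.put(vehicleId, emergency);
        // Assumption: an unseen vehicle is treated as "not in emergency".
        boolean wasEmergency = previous != null && previous;
        if (wasEmergency == emergency) {
            return false; // flag still set or still unset: stay quiet
        }
        notifier.accept(vehicleId, rawJson);
        return true;
    }

    public static void main(String[] args) {
        EmergencyTransitionTracker tracker = new EmergencyTransitionTracker(
                (id, json) -> System.out.println("notify " + id + ": " + json));
        tracker.onRecord("7001", true, "{\"emergency\":true}");
        tracker.onRecord("7001", true, "{\"emergency\":true}");
        tracker.onRecord("7001", false, "{\"emergency\":false}");
    }
}
```

Keeping the state in memory means it is lost when the archiver restarts, so a restart during an ongoing emergency would produce one repeat notification under this design.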