Design Document
This document describes new software developed for the OneBusAway-NYC system. It provides design and architecture details of the system. It does not cover COTS software (whether or not tailored and configured) or previously written and documented code.
This section describes the architecture of the system.
The OneBusAway-NYC platform consists of the following components. All components are implemented in Java unless otherwise noted.
- Queue Infrastructure including HTTP proxy. Receives HTTP messages from buses and inserts them into a ZeroMQ queue. Also provides the platform for the ZeroMQ queue into which the inference engine inserts messages containing inferred bus information.
- Inference engine. Consumes raw bus messages and produces (for each reasonably valid raw message) an inferred message, which it sends to the inference output queue. An inferred message has been assigned to a block, run, phase, and trip, and has other attributes assigned as well, such as DSC, distance along trip/block, schedule deviation, etc. See the page on Inferred Bus Data.
- Front-end CIS: Transit data service consumes data from the inference data queue and provides an API that allows the current state of the buses to be interrogated.
- Front-end CIS: Desktop & mobile web applications provide the UI for the Bus CIS Server.
- Front-end CIS: SMS web application provides web services that integrate with Mobile Commons to provide a queryable SMS interface.
- Front-end CIS: API web application provides the developer API and other webservices exposing public data.
- Transit Data Manager (TDM) provides web service support of operational data such as bundle retrieval, configuration services, operator and depot data, etc.
- Report And Archive persists real-time, inference and other operational data to the short, mid, and long term databases.
- Admin provides a UI for configuration and management utilities.
- GTFS-RT web application provides web services that return GTFS-RT data.
The Transit Data Bundle aka "The Bundle" is an internal data representation of an agency's static transit data such as schedule data. It consists of Java serialized objects of various data structures intended for high performance read-only indexing of static transit data backing the NycTransitDataService interface. Each bundle consists of a "transit graph" (loosely based on GTFS), calendar information about service dates and applicable schedules, and special indices keyed so as to allow fast lookup of specific information as provided by the Transit Data Service.
The core data model and infrastructure is provided by OneBusAway application modules with specific additions for the NYC application. Specific enhancements include:
- Additional trip data matching trips to runs
- Not in service Destination Sign Codes
- Non-revenue movement and location data
- Locations of depots
- Locations of terminals
- Trips to Sign Code indices; and
- Sign Code to Trips indices
Typically a bundle is built quarterly by agency operators via the Admin UI, with updates more frequently as necessary. Only one bundle can be in use by a OneBusAway-NYC component at a time. NYC specific improvements allow for the hot-swapping of bundles via the BundleManagementService Client and BundleServiceResource.
OneBusAway-NYC components are architected as long-lived processes that may run for months at a time, during which the current schedule data will expire or change with updates (pick changes/updates). As such, components that load a bundle support dynamically querying the TDM for the active list of bundles and downloading any new bundles not already cached on disk. These rules apply to the process of changing bundles:
- Only one bundle can be active at a given time.
- We should always have bundles in the TDM that do not overlap in terms of service start/end dates. That is, one (and only one) should be active for any given day across picks. (Excepting bundle updates, see below)
- If we do have overlapping bundles, they should be updates of the same pick/bundle. The one with the latest "updated" date will be used.
- We should never have holes in bundle date ranges. e.g. we should not allow:
Bundle A: 1/1/2011 - 4/3/2011
Bundle B: 4/5/2011 - 7/1/2011
This is not enforced by OneBusAway-NYC, so it's important for operators to understand the potential for loss of service should holes exist.
- It's important to make sure the TDM start/end service dates match the data in the GTFS / calendar.txt. With that said, experienced operators may choose to reflect a bundle update by moving the start date forward so that it matches the deploy date, and not the date in the GTFS.
- Sometimes an agency retroactively changes the end date of a pick prior to a new pick, but after the release of that pick. So:
Agency_X releases pick A, with 1/1/2011 - 4/3/2011
...then releases pick A1 with 1/1/2011 - 4/1/2011
...then releases pick B with 4/2/2011 - 7/1/2011
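The selection rules above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `BundleInfo` and `BundleSelector` are hypothetical names, not classes from the OneBusAway-NYC code base, and service dates are treated as inclusive on both ends.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the bundle metadata held by the TDM.
final class BundleInfo {
    final String name;
    final LocalDate serviceStart; // inclusive (assumption)
    final LocalDate serviceEnd;   // inclusive (assumption)
    final LocalDate updated;

    BundleInfo(String name, LocalDate serviceStart, LocalDate serviceEnd, LocalDate updated) {
        this.name = name;
        this.serviceStart = serviceStart;
        this.serviceEnd = serviceEnd;
        this.updated = updated;
    }

    boolean covers(LocalDate day) {
        return !day.isBefore(serviceStart) && !day.isAfter(serviceEnd);
    }
}

final class BundleSelector {
    // Among the bundles whose service range covers the day,
    // the one with the latest "updated" date wins.
    static Optional<BundleInfo> activeFor(LocalDate day, List<BundleInfo> bundles) {
        return bundles.stream()
                .filter(b -> b.covers(day))
                .max(Comparator.comparing((BundleInfo b) -> b.updated));
    }

    // Lists every day in [from, to] that no bundle covers, i.e. the "holes".
    static List<LocalDate> uncoveredDays(LocalDate from, LocalDate to, List<BundleInfo> bundles) {
        List<LocalDate> holes = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            final LocalDate day = d;
            if (bundles.stream().noneMatch(b -> b.covers(day))) {
                holes.add(day);
            }
        }
        return holes;
    }
}
```

Since OneBusAway-NYC does not enforce the no-holes rule, an operator could run a check like `uncoveredDays` over the planned bundle list before deploying. For Bundle A and Bundle B above, it would report 4/4/2011 as uncovered.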
All values that are agency-specific will use the "AgencyAndId" class from OneBusAway in cases where data from different agencies can be mixed together. The "AgencyAndId" class allows the specification (or prefixing) of a value with an agency identifier.
Currently, these agency-specific values are thought to include:
- Stop IDs
- Trip IDs
- Block IDs
- Run IDs
- Route IDs
- Vehicle IDs
- Operator IDs
- DSCs
Transit data bundles can contain data from multiple agencies, and any additions to a bundle are prefixed with an agency ID via the convention [agency ID]_[value]. In the Stop ID case, this would be MTA_200884.
Consider agencies Y and Z that share stops 2 and 3. Instead of creating stops Y_2, Y_3, Z_2 and Z_3, an umbrella agency can be created called X, and these stops can be associated with it. Thus we end up with X_2 and X_3. This is accomplished with the GTFS by specifying the defaultAgencyId of X during the bundle build process, and creating an empty GTFS with an entry for the umbrella agency.
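As a rough illustration of the prefixing convention, the sketch below splits and re-joins an agency-scoped value. `AgencyScopedId` is a hypothetical stand-in written for this example; it is not the OneBusAway "AgencyAndId" class, and splitting on the first underscore is an assumption.

```java
import java.util.Objects;

// Hypothetical stand-in that mirrors the agency-prefix convention;
// it is not the OneBusAway AgencyAndId class itself.
final class AgencyScopedId {
    final String agencyId;
    final String id;

    AgencyScopedId(String agencyId, String id) {
        this.agencyId = Objects.requireNonNull(agencyId);
        this.id = Objects.requireNonNull(id);
    }

    // Assumption: the first underscore separates the agency ID from the value,
    // so the value itself may contain further underscores.
    static AgencyScopedId parse(String value) {
        int sep = value.indexOf('_');
        if (sep <= 0 || sep == value.length() - 1) {
            throw new IllegalArgumentException("Expected agencyId_value but got: " + value);
        }
        return new AgencyScopedId(value.substring(0, sep), value.substring(sep + 1));
    }

    @Override
    public String toString() {
        return agencyId + "_" + id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof AgencyScopedId)) {
            return false;
        }
        AgencyScopedId other = (AgencyScopedId) o;
        return agencyId.equals(other.agencyId) && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(agencyId, id);
    }
}
```

With this shape, `AgencyScopedId.parse("MTA_200884")` yields agency `MTA` and value `200884`, and the shared stops in the umbrella-agency example would be represented as `X_2` and `X_3`.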
The components outlined by the diagram above are described briefly below. Note that the section titles act as links to more detailed content.
The queue server runs two ZeroMQ queues which are used to pass messages between components, as well as an HTTP proxy application. The bhs_queue contains "raw" (or relatively raw) Vehicle Location Messages as sent by the vehicles. It is populated by the HTTP proxy application. The hardware on each bus sends its messages to the HTTP proxy application, which delivers them onto the bhs_queue. Messages on this queue are consumed both by the inference engines and the report/archive server.
The inference output queue is populated with inferred location messages by the inference engine applications directly, and is read by the front-end CIS as well as the report/archive server.
The Inference Engine consumes raw bus messages and produces (for each reasonably valid raw message) an inferred message, which it sends to the inference output queue. A raw bus message contains GPS coordinates, vehicle ID, operator ID, and a Destination Sign Code (DSC) corresponding to the head sign, but no other information linking this trip to a scheduled trip, such as a trip ID that would be provided by a typical Automatic Vehicle Locator (AVL) system.
An inferred message has been assigned to a block, run, phase, and trip, and has other attributes assigned as well, such as distance along trip/block, schedule deviation, etc. Clearly, this is no small feat; the inference engine uses a particle filter to match the raw message against operator, depot, vehicle, and schedule data to provide results.
When a bus message has been matched to a run and block, it is considered a formal match. When only a trip/route can be inferred, it is considered informal.
There are multiple inference engines in the OBANYC infrastructure; currently two, partitioned by depot map. That is, instead of having a single instance try to serve all bus data, the data is subdivided by buses belonging to certain depots. They all consume messages from a single bhs_queue and write to a single output queue.
For each primary (actively transmitting) inference engine, there is also a secondary backup for failover purposes. The distinction is this:
Primary: actively listens for real-time records, processes inference, and outputs results onto the queue
Secondary: actively listens for real-time records and processes inference, but does not output results
Primary vs. secondary status is determined dynamically every 10 seconds based on a DNS convention. If the configured primary DNS pointer resolves to an IP address present on a local interface, the machine is primary. If no primary DNS pointer is configured, the machine is assumed to be primary. Otherwise the machine is considered a secondary instance. The status API at /onebusaway-nyc-vehicle-tracing-webapp/status.do reports the primary vs. secondary status.
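The DNS convention can be sketched roughly as follows. This is a minimal illustration using only the JDK, not the inference engine's actual code: `PrimaryCheck` and its method names are hypothetical, and treating an unresolvable pointer as "secondary" is an assumption.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper illustrating the primary/secondary rule.
final class PrimaryCheck {
    // Pure decision rule: no pointer configured means primary; otherwise the
    // machine is primary only if a resolved address is one of its own.
    static boolean isPrimary(String primaryHost,
                             Collection<InetAddress> resolved,
                             Set<InetAddress> localAddresses) {
        if (primaryHost == null || primaryHost.isEmpty()) {
            return true;
        }
        for (InetAddress address : resolved) {
            if (localAddresses.contains(address)) {
                return true;
            }
        }
        return false;
    }

    // Collects the addresses bound to all local network interfaces.
    static Set<InetAddress> localAddresses() throws SocketException {
        Set<InetAddress> result = new HashSet<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result;
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            result.addAll(Collections.list(nic.getInetAddresses()));
        }
        return result;
    }

    // Performs the real DNS lookup, then applies the rule above.
    static boolean isPrimary(String primaryHost) throws SocketException {
        if (primaryHost == null || primaryHost.isEmpty()) {
            return true;
        }
        try {
            InetAddress[] resolved = InetAddress.getAllByName(primaryHost);
            return isPrimary(primaryHost, Arrays.asList(resolved), localAddresses());
        } catch (UnknownHostException e) {
            return false; // assumption: an unresolvable pointer means secondary
        }
    }
}
```

Keeping the decision rule separate from the lookups makes it easy to test without touching the network. A caller could re-run `isPrimary(host)` every 10 seconds, for example from a `ScheduledExecutorService`, so that repointing the DNS record promotes the secondary.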
The Front-end CIS component consumes inferred messages from the queue, maintains current information on all known buses, and uses that information to answer queries from end users and render results, through various presentations. The Front-end is a convenient operational term for the component that is responsible for answering:
- Desktop UI requests
- Mobile UI requests
- SMS requests
- SIRI Developer API requests
- OBA API requests
All API requests require API keys for access, as described in the API Key Authentication page.
The Transit Data Federation module provides the Java API known as the Transit Data Service, which consumes inferred messages, maintains current information on all known buses, and uses that information to answer queries from end users (via the presentation components listed below). The Transit Data Federation module has a broader concern of providing data in a federated nature, potentially answering from multiple data sources across geographic boundaries. The Transit Data Federation Webapp provides a thin wrapper that deploys to a web application to expose these web services. The Transit Data Federation need not be deployed as a web application, however; it can be embedded in another module, as it is with the Inference Engine.
This application provides desktop and mobile interfaces for NYC bus riders to track buses in real time. It serves as a primary interface through which users interact with the system. This is described in detail in the User Interfaces section below. This web application also provides support for SIRI Stop Monitoring and Vehicle Monitoring, aka the developer API.
The SMS web app handles and responds to incoming SMS requests (via Mobile Commons). SMS requests need to be stateful to support paging and multi-request querying. To distribute this state across load-balanced servers, Memcached is used.
The API web app provides the core OneBusAway API with NYC specific configuration. The OneBusAway API includes:
- AgenciesWithCoverage: returns a list of all transit agencies currently supported
- ArrivalsAndDeparturesForStop: current arrivals and departures for a given stop
- RoutesForAgency: routes for a given agency id
- RoutesForLocation: routes near a specific location
- StopsForLocation: stops near a specific location
- StopsForRoute: a set of stops for a given route
- TripsForVehicle: trip details for a given vehicle
- TripsForLocation: active trips near a specific location
- TripsForRoute: active trips for a given route
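As a rough sketch of how a client might address one of these methods, the helper below builds a request URL for ArrivalsAndDeparturesForStop. It assumes the path layout of the upstream OneBusAway REST API (`/api/where/<method>/<id>.json?key=<key>`); the NYC deployment's paths may differ. `ObaApiUrls`, the host name, and the key are hypothetical placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical URL builder; the path layout is assumed from the upstream
// OneBusAway REST API and is not confirmed by this document.
final class ObaApiUrls {
    static String arrivalsAndDeparturesForStop(String host, String stopId, String apiKey) {
        String path = "/api/where/arrivals-and-departures-for-stop/" + stopId + ".json";
        try {
            // The multi-argument URI constructor percent-encodes characters
            // that are illegal in a path or query (for example, spaces).
            return new URI("https", host, path, "key=" + apiKey, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL for stop " + stopId, e);
        }
    }
}
```

For example, `ObaApiUrls.arrivalsAndDeparturesForStop("api.example.org", "MTA_200884", "TEST")` produces `https://api.example.org/api/where/arrivals-and-departures-for-stop/MTA_200884.json?key=TEST`. The `key` parameter reflects the requirement that all API requests carry an API key.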
The Transit Data Manager contains configuration information for the entire system, and serves as a gateway between external data and the OBA system. This includes things like schedule data, service alerts, crew and depot information, etc. The Interface Design Document describes these APIs in detail.
The Reporting and Archiving server consumes messages from both the raw and inferred queues and archives those messages to the archive database. It provides APIs to query real-time and historical data. These APIs are described in the Interface Design Document.
The Admin component provides web interfaces for performing administrative tasks such as transit bundle building, vehicle status tracking, user management etc. The application can be accessed only by "admin" users having valid credentials in the system.
The GTFS-RT interface provides bus locations, predictions, and service alerts in GTFS-RT, a widely-used open-source realtime transit data format. It is intended as an alternative to the SIRI and OBA APIs.
At the core of the OneBusAway-NYC project is a commitment to open standards. In general, this means:
- Transit Communications Interface Profiles (TCIP) XML or JSON;
- Service Interface for Real Time Information (SIRI) XML; or
- well-documented JSON web services.
However, many of the OneBusAway-NYC services depend on other agencies/departments to provide data, which is often not in a standardized format. In those cases, OneBusAway-NYC introduces conversion adapters to convert pre-formatted data into an open format.
The Interface Design Document lists all the inbound and outbound interfaces to the OneBusAway-NYC system. Detailed descriptions of the interfaces are provided in the pages linked to this document.
JavaDoc for the OneBusAway-NYC project can be found at http://developer.onebusaway.org/javadoc/modules/. The architecture section above gives a high-level overview of the various components of the system.
The OneBusAway-NYC system uses a MySQL database hosted on Amazon's Relational Database Service (RDS) to persist data fetched from the queues. The database consists of a master database, which supports both read and write operations, and a read replica, which is kept in sync with the master. The read replica does not support write operations. There are two types of databases, which differ in terms of the information they store. These databases, along with their tables, are described in the Database Schema page.
The OneBusAway-NYC system has two major types of users:
- a "general" user is a bus rider who uses the system for tracking buses in real time; and
- a "super" user, such as an admin or operator, who has administrative privileges. This user works with the "admin" section of the system for user management, bundle building, vehicle status, etc.
The linked sections below describe each of these user interfaces, with screenshots wherever appropriate.
The general user interfaces used by bus riders are described in detail in the General User Interface page.
The admin user interfaces used by admin users such as "admin" and "operator" are described in detail in the Admin User Interface page.
The inference engine contains some debug screens for developers and administrators. More detail is provided in the Debug User Interface page.