Event Subscriptions
Previously, RIG's focus was on simple frontend use cases, where it made sense to build terminology and configuration around userId, groups, and so on. Being an event gateway, however, it makes a lot more sense to focus on events and to support the current use cases through that lens.
A focus on events allows us to make RIG more generic and thus more flexible in terms of the use cases it supports. To that end, frontends should be able to connect to RIG before any user authentication happens. This is needed to support use cases where anonymous users are notified about certain types of events, like a sports website that broadcasts game events. It also makes RIG easier to use when you need it only for authenticated users: with single-page apps it is simple to connect to RIG right after page load and keep the connection open from then on, instead of binding the connection's lifetime to the user session's lifetime.
Finally, we are adopting the upcoming Cloud Events Spec in order to streamline interfaces and increase compatibility with other applications down the road.
Let's begin with a deep dive on events and subscriptions.
Events and event types
We're going to use the Cloud Events Spec wherever possible. For example, incoming events are expected to feature an "eventType" field.
An "official" example of such an event type is com.github.pull.create. We can infer the following properties:
Event types use reverse-DNS notation, which means the type name encodes parent-to-child relations separated by the dot character.
Event types are likely going to be unrelated to specific entities or (user) sessions. For example, for a repository "my-org/my-repo", we do not expect to see events like com.github.pull.create.my-org/my-repo; instead, the repository ID is likely to be found in the CloudEvent's data field (as there is no "subject"-like field mentioned in the spec).
Following those observations/assumptions, we expect events that look similar to the following (based on GitHub's get-a-single-pull-request API):
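Such an event might look like this (a sketch assuming the CloudEvents v0.1 field names; the data payload is trimmed down from GitHub's pull request resource, with illustrative values):

```json
{
  "cloudEventsVersion": "0.1",
  "eventType": "com.github.pull.create",
  "source": "/github",
  "eventID": "A234-1234-1234",
  "eventTime": "2018-04-05T17:31:00Z",
  "contentType": "application/json",
  "data": {
    "assignee": {
      "login": "octocat"
    },
    "head": {
      "repo": {
        "full_name": "octocat/Hello-World"
      }
    },
    "base": {
      "repo": {
        "full_name": "octocat/Hello-World"
      }
    }
  }
}
```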
Because of this, RIG's internal subscriptions cannot rely on the event type only. RIG is built for routing events to users' devices or sessions, so it must also have a notion of those things built into the subscription mechanism.
The idea: introduce "extractors" that can extract information from an event, and allow subscriptions to match against that extracted information. Let's take a look at an example:
Assume there is an event type com.github.pull.create;
Assume the user is interested in events that refer to the "octocat/Hello-World" repository;
Assume the user is only interested in new pull requests assigned to the "octocat" user.
We start RIG with an extractor configuration that uses JSON Pointer to find data:
```yaml
extractors:
  com.github.pull.create:
    assignee:
      # "assignee" is the field name that can be referred to in the subscription request
      # (see subscription request example below).
      # Each field has a field index value that needs to remain the same unless all RIG
      # nodes are stopped and restarted. This can be compared to gRPC field numbers and
      # the same rule of thumb applies: always append fields and never reuse a field
      # index/number.
      stable_field_index: 0
      # JWT values take precedence over values given in a subscription request:
      jwt:
        # Describes where to find the value in the JWT:
        json_pointer: /username
      event:
        # Describes where to find the value in the event:
        json_pointer: /data/assignee/login
    head_repo:
      stable_field_index: 1
      # This is extracted from subscription requests, rather than in the JWT. In the
      # request body the field is referred to by name, so a `json_pointer` is required
      # for the event only:
      event:
        json_pointer: /data/head/repo/full_name
    base_repo:
      stable_field_index: 2
      event:
        json_pointer: /data/base/repo/full_name
```
The frontend sends a subscription that refers to those fields:
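As a sketch of what that request might look like (the wire format shown is an assumption; the field names come from the extractor configuration above, and the assignee constraint is supplied via the JWT rather than the request body):

```json
{
  "subscriptions": [
    {
      "eventType": "com.github.pull.create",
      "oneOf": [
        { "head_repo": "octocat/Hello-World" },
        { "base_repo": "octocat/Hello-World" }
      ]
    }
  ]
}
```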
The frontend receives the event outlined above because one of the constraints defined under oneOf is fulfilled. Note that within each constraint object, all fields must match, so the constraints are effectively given in disjunctive normal form: an OR of ANDs.
If a JSON Pointer expression returns more than one value, there is a match if, and only if, the target value is included in the JSON Pointer result list.
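To illustrate with a hypothetical list-valued field: suppose an extractor defined a label field with the JSON Pointer /data/labels. An event such as

```json
{
  "eventType": "com.github.pull.create",
  "data": {
    "labels": ["bug", "help wanted"]
  }
}
```

would match a subscription with the constraint label: "bug", because "bug" is included in the pointed-to list.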
Implementation
Matching relies on ETS match specs: subscriptions are kept in an ETS table per event type. The table's columns are the key/value pairs defined in the extractor for that event type; its records hold the values as given in the subscriptions. If a value is not set in a subscription, it is stored as nil. For example, the subscription above would be reflected in two records:
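A sketch of those two records, assuming each record holds the subscribing connection's pid followed by the extracted fields in stable-field-index order (the assignee value coming from the JWT, the unset constraint stored as nil):

```elixir
# {connection_pid, assignee, head_repo, base_repo}
{conn_pid, "octocat", "octocat/Hello-World", nil}
{conn_pid, "octocat", nil, "octocat/Hello-World"}
```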
This structure allows for very efficient matching. There is also a dedicated table per event type, so ownership is easy and there are no concurrent requests per table. At the time of writing, the default limit on the number of ETS tables is 1400 per node, but this can be changed using ERL_MAX_ETS_TABLES. If that ever becomes impractical, putting all subscriptions in a single table should work just as well.
The processes consuming events from Kafka and Kinesis are not the right place for running any filtering or routing logic, as we need them to be as fast as possible. Instead, for each event type there is one process on each node, enabling the consumer processes to quickly hand off events by looking at the event type field only. Those "filter" processes own their event-type-specific ETS table. For any given event, they can use their ETS table to obtain the list of processes to send the event to.

Processes, process groups and lifecycles:
The consumer processes have to start filter processes on demand, on their respective node.
Filter processes stop themselves after not receiving messages for some time.
Filter processes join process groups, such that for each event type there is one such group.
Connection processes are tied to the connection itself.
Subscription entries in the filters' ETS tables..
..are created and refreshed periodically by the connection process, which sends the request to all filter processes in the event-type group. The HTTP call that creates the subscription does not call a filter process directly; instead, it informs the connection process of the new subscription, which in turn registers with the respective filter processes.
..have a per-record time-to-live, used to keep the data current. If a connection process dies, its subscription records are no longer refreshed and are eventually removed.
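The lifecycle described above could be sketched roughly like this (a hypothetical, minimal GenServer; names and details are assumptions, not RIG's actual implementation):

```elixir
defmodule EventTypeFilter do
  @moduledoc "Hypothetical per-node, per-event-type filter process sketch."
  use GenServer

  # Stop after 5 minutes without any message:
  @idle_timeout :timer.minutes(5)

  def start(event_type), do: GenServer.start(__MODULE__, event_type)

  @impl true
  def init(event_type) do
    # One process group per event type:
    group = {:event_filter, event_type}
    :ok = :pg2.create(group)
    :ok = :pg2.join(group, self())
    # The filter owns its event-type specific subscription table:
    table = :ets.new(:subscriptions, [:bag, :protected])
    {:ok, %{event_type: event_type, table: table}, @idle_timeout}
  end

  @impl true
  def handle_info({:event, _event}, state) do
    # Match the event against the subscription records and forward it
    # to the matching connection processes (omitted here).
    {:noreply, state, @idle_timeout}
  end

  def handle_info(:timeout, state) do
    # No messages for a while: stop and free the table.
    {:stop, :normal, state}
  end
end
```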
Connection, Authentication & Authorization
In RIG 1.x, a valid JWT was required in order for a frontend to be able to establish a connection. Starting with RIG 2.0, this will no longer be the case:
Any device may establish a WebSocket or SSE connection.
If the connection request carries a valid JWT in the Authorization header, the JWT-based subscriptions are set up automatically.[*] Otherwise, no subscriptions are set up.
Using the subscriptions endpoint, a frontend can add subscriptions to an existing connection. If the request carries a valid JWT in the Authorization header, the JWT-based subscriptions are set up automatically.[*] Previous JWT-based subscriptions are replaced or removed.
[*] Iterating over the extractors configuration, RIG builds a list of field names by looking for jwt field mappings. In the example above that would be only the "assignee" field with the JSON Pointer /username.
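For example, given the extractor configuration above, a JWT payload like the following (claim values illustrative) would cause RIG to automatically set up a subscription with the constraint assignee = "octocat":

```json
{
  "username": "octocat",
  "exp": 1530000000
}
```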
In order to permit or prevent users from creating subscriptions, an administrator can choose one of three options:
Anyone can subscribe to any event.
Require a valid JWT to subscribe to events.
Invoke an external service for subscriptions that are not JWT-based. The service sees the subscription request as sent by the frontend and indicates whether to allow or deny the subscription.
A subscription request, which may contain multiple subscriptions:
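As a sketch (the exact wire format is an assumption), a request carrying one subscription with two alternative constraints might look like this:

```json
{
  "subscriptions": [
    {
      "eventType": "com.github.pull.create",
      "oneOf": [
        { "head_repo": "octocat/Hello-World" },
        { "base_repo": "octocat/Hello-World" }
      ]
    }
  ]
}
```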
In this example, the subscription's constraints are fulfilled when either the head_repo or the base_repo field matches. If the subscription should only apply to cases where both fields match, it should look like this instead:
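A sketch of the both-fields-must-match variant (again assuming a hypothetical wire format): both fields go into a single constraint object under oneOf, so they must all match.

```json
{
  "subscriptions": [
    {
      "eventType": "com.github.pull.create",
      "oneOf": [
        {
          "head_repo": "octocat/Hello-World",
          "base_repo": "octocat/Hello-World"
        }
      ]
    }
  ]
}
```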