getsentry/sentry-kafka-schemas

Kafka topic and schema registry for Sentry

Contains the Kafka topics and schema definitions used by the Sentry service.

Defining schemas

Currently only jsonschema is supported. Place the schema file directly in the schemas directory, then reference it from the relevant topic via the resource property.

We use jsonschema for both JSON- and msgpack-based topics, because most msgpack types have a JSON equivalent. Bytestrings are typed as {"description": "msgpack bytes"}, which is currently interpreted the same as {} (allow all types).

If you don't want to write the schema by hand, we like https://github.com/quicktype/quicktype for generating an initial JSON schema from a payload.

How strict should my schema be?

If in doubt, we recommend making a schema only as strict as the consumers and downstream code in Sentry minimally require. Ultimately, however, it is up to the owners of the schema to decide whether a stricter schema is appropriate in a particular scenario.

Adding example messages

Example messages can be placed in the examples directory and referenced from the relevant topic/version.

Example messages must be stripped of all customer-related data. This includes identifiers such as organization and project IDs, which should be replaced with placeholder values like project_id: 1 or org_id: 1.
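
As an illustration of the ID replacement described above, here is a minimal sketch that rewrites identifier fields in a decoded JSON payload. The key names in ID_KEYS and the function name scrub_ids are assumptions made for this example, not part of this repository. It only neutralizes IDs; any other customer data still has to be removed by hand.

```python
import json
from typing import Any

# Hypothetical set of identifier keys to neutralize; extend it for your topic.
ID_KEYS = {"org_id", "organization_id", "project_id"}


def scrub_ids(value: Any) -> Any:
    """Return a copy of ``value`` with every key listed in ID_KEYS set to 1."""
    if isinstance(value, dict):
        return {
            key: 1 if key in ID_KEYS else scrub_ids(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub_ids(item) for item in value]
    # Scalars (strings, numbers, booleans, None) are returned unchanged.
    return value


# Made-up payload, used only to show the transformation.
raw = json.loads('{"org_id": 4242, "project_id": 987, "tags": [{"project_id": 5}]}')
print(json.dumps(scrub_ids(raw), indent=2, sort_keys=True))
```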

Defining topics

Each topic is a yaml file in the topics directory. The topic name is a "logical" name, because many services in Sentry support overriding the default name with a different physical topic name if desired. Topic names must be unique in Sentry: the same name cannot be used for different types of data.

The yaml file of a topic has the following keys:

  1. schemas. Schemas is an array. The following should be provided for each schema:

    • version: Incrementing integer. Should start at 1.
    • compatibility_mode: none or backward.
    • type: Can be either json or msgpack. In both cases we use jsonschema to define the message schema.
    • resource: Should match the file name in the schemas directory
    • examples: Should match the file names in the examples directory
  2. topic_creation_config. Configuration used to create the topic.

  3. services. Which Sentry services produce to and consume from the topic.

  4. description.

  5. pipeline.
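
The rules for the schemas array can be sketched as a small check over a parsed topic file. This is only an illustration: the topic below is hypothetical (including its file names), it is written as the dict a YAML parser would return, and check_schemas is not a function provided by this repository. The sketch also assumes versions are listed in order and increase by one.

```python
from typing import Any, Dict, List

ALLOWED_COMPATIBILITY_MODES = {"none", "backward"}
ALLOWED_TYPES = {"json", "msgpack"}
SCHEMA_KEYS = {"version", "compatibility_mode", "type", "resource", "examples"}

# Hypothetical topic definition; resource and example names are made up.
example_topic: Dict[str, Any] = {
    "schemas": [
        {
            "version": 1,
            "compatibility_mode": "backward",
            "type": "json",
            "resource": "foo-bar.v1.schema.json",
            "examples": ["foo-bar/1/"],
        }
    ],
}


def check_schemas(topic: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the topic's ``schemas`` array."""
    problems: List[str] = []
    for index, schema in enumerate(topic.get("schemas", [])):
        missing = SCHEMA_KEYS - schema.keys()
        if missing:
            problems.append(f"schema {index}: missing keys {sorted(missing)}")
        # Assumption: versions start at 1 and increase by one per entry.
        if schema.get("version") != index + 1:
            problems.append(f"schema {index}: version should be {index + 1}")
        if schema.get("compatibility_mode") not in ALLOWED_COMPATIBILITY_MODES:
            problems.append(f"schema {index}: compatibility_mode must be none or backward")
        if schema.get("type") not in ALLOWED_TYPES:
            problems.append(f"schema {index}: type must be json or msgpack")
    return problems


print(check_schemas(example_topic))
```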

Using the schema (in Python)

from sentry_kafka_schemas import get_codec, ValidationError
from sentry_kafka_schemas.codecs import Codec
from sentry_kafka_schemas.schema_types.ingest_metrics_v1 import IngestMetric

SCHEMA: Codec[IngestMetric] = get_codec("ingest-metrics")


def handle_message(payload: bytes) -> None:
    # payload is the raw message value, e.g. b'{"type": "c", ...}'
    try:
        decoded = SCHEMA.decode(payload)
    except ValidationError:
        # Skip messages that do not match the schema.
        return

    # The ingest-metrics schema defines retention_days as a required field,
    # so this lookup is safe.
    retention_days = decoded["retention_days"]

Using Python types

Python types are automatically generated under sentry_kafka_schemas.schema_types. A schema for version 1 of the topic foo-bar is exported under sentry_kafka_schemas.schema_types.foo_bar_v1.
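
The naming rule can be written down as a small helper. This is a sketch inferred from the two module names that appear in this README (foo_bar_v1 and ingest_metrics_v1); schema_types_module is a hypothetical name, and the assumption that hyphens are the only characters rewritten has not been checked against the generator.

```python
def schema_types_module(topic: str, version: int) -> str:
    """Build the module path under which a topic's generated types are exported."""
    # Assumption: hyphens in the topic name become underscores, nothing else changes.
    module_name = f"{topic.replace('-', '_')}_v{version}"
    return f"sentry_kafka_schemas.schema_types.{module_name}"


print(schema_types_module("foo-bar", 1))
print(schema_types_module("ingest-metrics", 1))
```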

Use the title attribute on your JSON schema and on its definitions to assign each of them a stable type name.

For example:

// a schema referenced from topics/events.yaml, which contains topic: events
{
    "title": "main_schema",
    "description": "Some additional information about the schema.",
    "properties": {
        "subfield": {"$ref": "#/definitions/SubSchema"}
    },
    "definitions": {
        "SubSchema": {
            "type": "object",
            "title": "sub_schema"
        }
    }
}

Produces:

# file: sentry_kafka_schemas/schema_types/events_v1.py

from typing import TypedDict

class MainSchema(TypedDict, total=False):
    """Some additional information about the schema."""

    # Quoted because SubSchema is defined further down in the file.
    subfield: "SubSchema"

class SubSchema(TypedDict, total=False):
    ...

title can be added at any level, not just within definitions, to produce types. Use that power tastefully!
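
Because the generated classes are declared with total=False, a type checker treats every key as optional, and at runtime the values are plain dicts. The sketch below redefines the two example types locally so it runs without this package installed; describe is a hypothetical helper that shows the defensive lookup style this implies.

```python
from typing import TypedDict


class SubSchema(TypedDict, total=False):
    pass


class MainSchema(TypedDict, total=False):
    """Some additional information about the schema."""

    subfield: SubSchema


def describe(message: MainSchema) -> str:
    # With total=False no key is guaranteed to be present, so use .get()
    # rather than indexing directly.
    subfield = message.get("subfield")
    if subfield is None:
        return "no subfield"
    return f"subfield with {len(subfield)} keys"


print(describe({}))
print(describe({"subfield": {}}))
```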

Using Rust types

We use a completely different library for generating Rust types, and therefore the rules by which Rust type names are generated are different. Rust types are work-in-progress.

For now, schema files need to be explicitly added to rust/build.rs. The generated types can be viewed with make view-rust-types, with cargo doc --open, or online at https://docs.rs/sentry-kafka-schemas.

Release process and development install

To release a new stable version from the main branch, go to Actions and trigger a new job for the Release workflow.

We usually just increment the patch number for schema changes. For example, if the last version was 0.1.11, the next version should be 0.1.12. Check https://github.com/getsentry/sentry-kafka-schemas/releases for the latest release numbers.
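
The patch increment is plain integer arithmetic on the last version component, not string manipulation. A minimal sketch, assuming a three-part MAJOR.MINOR.PATCH version string; bump_patch is a hypothetical helper and is not part of the Release workflow.

```python
def bump_patch(version: str) -> str:
    """Return ``version`` with its patch component incremented by one."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


print(bump_patch("0.1.11"))
```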

After releasing a new version, you should immediately bump Sentry, Snuba and Relay to ensure that all services are synchronized onto the new schema as soon as possible.

Most likely you are working on a PR to Snuba or Sentry in which you already want to use the new types. To do that, run make build in this repo, then run pip install -e ~/projects/sentry-kafka-schemas/ in that service's Python environment, adjusting the path to wherever you checked out this repo.

You need to re-run make build to update types -- they do not automatically change with schema changes even if you install this package in development mode.

To stop using a development version of this repo in whichever service you're working on, you can reinstall Python dependencies in that repo. Most likely the command is make install-py-dev.

Schema ownership

All topic definitions, schemas and examples should have a defined owner, or multiple owners if shared. Update the CODEOWNERS file with this information whenever new schemas or topics are added.

Review is only required from one team/owner, not from all of them.