Deimos is the third of three libraries that add functionality on top of each other:
- RubyKafka is the low-level Kafka client, providing API's for producers, consumers and the client as a whole.
- Phobos is a lightweight wrapper on top of RubyKafka that provides threaded consumers, a simpler way to write producers, and lifecycle management.
- Deimos is a full-featured framework using Phobos as its base which provides schema integration (e.g. Avro), database integration, metrics, tracing, test helpers and other utilities.
As of May 12, 2020, the following are the important files to understand in how Deimos fits together:
lib/generators
: Generators to generate database migrations, e.g. for the DB Poller and DB Producer features.lib/tasks
: Rake tasks for starting consumers, DB Pollers, etc.lib/deimos
: Main Deimos code.lib/deimos/deimos.rb
: The bootstrap / startup code for Deimos. Also provides some global convenience methods and (for legacy purposes) the way to start the DB Producer.lib/deimos/backends
: The different plug-in producer backends - e.g. produce directly to Kafka, use the DB backend, etc.lib/deimos/schema_backends
: The different plug-in schema handlers, such as the various flavors of Avro (with/without schema registry etc.)lib/deimos/metrics
: The different plug-in metrics providers, e.g. Datadog.lib/deimos/tracing
: The different plug-in tracing providers, e.g. Datadog.lib/deimos/utils
: Utility classes for things not directly related to producing and consuming, such as the DB Poller, DB Producer, lag reporter, etc.lib/deimos/config
: Classes related to configuring Deimos.lib/deimos/monkey_patches
: Monkey patches to existing libraries. These should be removed in a future update.
Both producers and consumers include the SharedConfig
module, which
standardizes configuration like schema settings, topic, keys, etc.
Consumers come in two flavors: Consumer
and BatchConsumer
. Both include
BaseConsumer
for shared functionality.
While producing messages go to Kafka by default, literally anything else
can happen when your producer calls produce
, by swapping out the producer
backend. This is just a file that needs to inherit from Deimos::Backends::Base
and must implement a single method, execute
.
Producers have a complex workflow while processing the payload to publish. This
is aided by the Deimos::Message
class (not to be confused with the
KafkaMessage
class, which is an ActiveRecord used by the DB Producer feature,
below).
Schema backends are used to encode and decode payloads into different formats such as Avro. These are integrated with producers and consumers, as well as test helpers. These are a bit more involved than producer backends, and must define methods such as:
encode
a payload or key (when encoding a key, for Avro a key schema may be auto-generated)decode
a payload or keyvalidate
that a payload is correct for encodingcoerce
a payload into the given schema (e.g. turn ints into strings)- Get a list of
schema_fields
in the configured schema, used when interacting with ActiveRecord - Define a
mock
backend when the given backend is used. This is used during testing. Typically mock backends will validate values but not actually encode/decode them.
Deimos uses the https://www.github.com/flipp_oss/fig_tree gem for configuration.
The configuration definition for Deimos is in config/configuration.rb
. In
addition, there are methods in config/phobos_config.rb
which translate to/from
the Phobos configuration format and support the old phobos.yml
method
of configuration.
These are simpler than other plugins and must implement the expected methods
(increment
, gauge
, histogram
and time
for metrics, and start
, finish
and set_error
for tracing). These are used primarily in producers and consumers.
Deimos provides an ActiveRecordConsumer
and ActiveRecordProducer
. These are
relatively lightweight ways to save data into a database or read it off
the database as part of app logic. It uses things like the coerce
method
of the schema backends to manage the differences between the given payload
and the configured schema for the topic.
This feature (which provides better performance and transaction guarantees) is powered by two components:
- The
db
publish backend, which saves messages to the database rather than to Kafka; - The
DbProducer
utility, which runs as a separate process, pulls data from the database and sends it to Kafka.
There are a set of utility classes that power the producer, which are largely copied from Phobos:
Executor
takes a set of "runnable" things (which implement astart
andstop
method) puts them in a thread pool and runs them all concurrently. It manages starting and stopping all threads when necessary.SignalHandler
wraps the Executor and handles SIGINT and SIGTERM signals to stop the executor gracefully.
In the case of this feature, the DbProducer
is the runnable object - it
can run several threads at once.
On the database side, the ActiveRecord
models that power this feature are:
KafkaMessage
: The actual message, saved to the database. This message is already encoded by the producer, so only has to be sent.KafkaTopicInfo
: Used for locking topics so only one producer can work on it at once.
A Rake task (defined in deimos.rake
) can be used to start the producer.
This feature (which periodically polls the database to send Kafka messages)
primarily uses other aspects of Deimos and hence is relatively small in size.
The DbPoller
class acts as a "runnable" and is used by an Executor (above).
The PollInfo
class is saved to the database to keep track of where each
poller is up to.
A Rake task (defined in deimos.rake
) can be used to start the pollers.
The utils
folder also contains the LagReporter
(which sends metrics on
lag) and the InlineConsumer
, which can read data from a topic and directly
pass it into a handler or save it to memory.