Remove Kafka dependency #524

jdegoes · 2013-10-05T22:03:53Z

Kafka is a large, complex piece of software which requires installation and maintenance. There are many ways for Kafka to fail, and Kafka requires ongoing management in order to prevent disk overflow and make tradeoffs between recoverability and resource usage.

While Kafka is very appropriate for a large-scale distributed ingest system which has to keep up with fluctuating loads and be fully redundant, it is less appropriate for a single-node analytics engine like Precog. When Precog becomes distributed, the focus will be on reading data from HDFS, and not on the ingest of that data, so even long-term, the direct use of Kafka in the Precog project is an unnecessary distraction.

In order to reduce the number of moving pieces in Precog, Kafka needs to be eliminated as a dependency.

Ingest can be as simple as batching up a chunk of data and writing it out to the (abstract) file system -- e.g. appending to the relevant file.
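A minimal sketch of what "batch and append" could look like, assuming a local file stands in for the abstract file system. `BatchingAppender` and its parameters are hypothetical names for illustration, not existing Precog code:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper (not part of Precog): buffers records in memory and
// appends them to one file once `batchSize` records have accumulated.
final class BatchingAppender(target: Path, batchSize: Int) {
  require(batchSize > 0, "batchSize must be positive")

  private val buffer = ArrayBuffer.empty[String]

  /** Queue one record; flush automatically when the batch is full. */
  def ingest(record: String): Unit = synchronized {
    buffer += record
    if (buffer.size >= batchSize) flush()
  }

  /** Append all buffered records to the target file, one per line. */
  def flush(): Unit = synchronized {
    if (buffer.nonEmpty) {
      val bytes = buffer.mkString("", "\n", "\n").getBytes(StandardCharsets.UTF_8)
      // CREATE + APPEND: create the file if it is missing, otherwise add to its end.
      Files.write(target, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND)
      buffer.clear()
    }
  }
}
```

Callers would need to invoke `flush()` on shutdown so a partial batch is not lost. Durability guarantees (fsync, crash recovery) are deliberately left out of this sketch.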

This ticket will be considered complete when Kafka is neither a dependency of the project nor referenced or used anywhere in the source code, unit tests, or documentation.

See @nuttycom's comment below.

nuttycom · 2013-10-06T05:32:14Z

This is very easy to achieve; simply create a new subproject that derives
from the ingest and bifrost projects, and when you mix the cake together
replace the KafkaEventStore with an EventStore implementation that passes
messages directly to the routing actor, and exclude the
KafkaShardIngestActor from the cake entirely.
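
Roughly, the shape of that change could be sketched as follows. The traits below are simplified stand-ins invented for illustration (the real `EventStore` and routing actor in Precog have different signatures, and a plain method stands in for the actor here), so treat this as an outline of the cake wiring rather than working Precog code:

```scala
// Simplified stand-ins; Precog's real types differ.
final case class Event(path: String, payload: String)

// Cake layer that exposes an event store to the rest of the system.
trait EventStoreComponent {
  trait EventStore {
    def save(event: Event): Unit
  }
  def eventStore: EventStore
}

// Cake layer that exposes routing. A plain method stands in for
// sending a message to the routing actor.
trait RoutingComponent {
  def route(event: Event): Unit
}

// Kafka-free implementation: every saved event is handed straight to the router.
trait DirectEventStoreComponent extends EventStoreComponent { self: RoutingComponent =>
  final class DirectEventStore extends EventStore {
    def save(event: Event): Unit = route(event)
  }
  lazy val eventStore: EventStore = new DirectEventStore
}

// The assembled cake: no Kafka-backed store and no Kafka ingest actor mixed in.
object StandaloneIngest extends DirectEventStoreComponent with RoutingComponent {
  val routed = scala.collection.mutable.ListBuffer.empty[Event]
  def route(event: Event): Unit = { routed += event; () }
}
```

With this wiring, `StandaloneIngest.eventStore.save(Event("/data", "{}"))` ends up in `routed` without any intermediate queue.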

jdegoes mentioned this issue Oct 5, 2013

Remove Zookeeper dependency #525

Open

ghost assigned jdegoes Dec 3, 2013
