Introduce pulsar replicator #1582
Comments
@srkukarni @merlimat Pulsar replicator implementation mainly touches 3 things:
#1594 covers all of the above functionality. Once the pulsar-connector PR is merged, I will rebase the replicator PR on top of it; later, in a separate PR, we can merge the replicator and connector into one module. Can you please let us know your thoughts?
Not to start a tech fight here or anything, but do you think having such similar naming to Confluent Replicator (aka Kafka Connect) would be an issue for "Pulsar Replicator / Connect"? Second point: at least for Kafka, the Connect API is for the interaction points with external systems (such as Dynamo or Kinesis), while Confluent Replicator is a closed-source version of that API between Kafka clusters.
@Cricket007 The naming here was indeed a bit misleading, since this is more about integrating heterogeneous systems with Pulsar. Pulsar has always had "replicator" functionality, in a much more advanced form compared to MirrorMaker or other proprietary solutions (http://pulsar.apache.org/docs/latest/admin/GeoReplication/). Geo-replication targets replication between Pulsar clusters. Because both sides are Pulsar brokers that talk the native Pulsar protocol, we can achieve a lot of efficiencies. Regarding the changes for this PR, the consensus has been to focus the efforts on a single "connector" framework, named "Pulsar-IO", which is scheduled for the 2.1 release. The work on the Pulsar-IO framework addresses the problem of getting data in and out of Pulsar in the simplest possible way from a user standpoint:
If you're interested, you can check out the work in progress: https://github.com/apache/incubator-pulsar/tree/master/pulsar-io . There's also a PR with some in-progress documentation: #1749
@rdhabalia what is the status of this task?
@sijie I think we can close this one, as we will be trying out the pulsar-io framework here.
Motivation
Pulsar already supports geo-replication, which persists messages across multiple Pulsar clusters. A client can therefore set the replication clusters for a topic, and the Pulsar broker internally takes care of replicating to all of those clusters. However, an application may sometimes want to replicate the same published messages to external systems that are not part of the Pulsar ecosystem, such as AWS Kinesis or DynamoDB. Right now, the client application has to take on the extra burden of publishing the same messages to both Pulsar and the external systems.
It would therefore be useful to introduce server-side replication that can replicate Pulsar messages to external systems without client intervention. Server-side replication should also be extensible, providing a plugin mechanism for adding replicators that support message replication to different external systems.
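To make the plugin idea concrete, here is a minimal Java sketch of what such an extension point might look like. It is not taken from PIP-18 or from the Pulsar codebase: `ExternalReplicator`, `InMemoryReplicator`, and `ReplicatorRegistry` are hypothetical names, and the in-memory implementation only stands in for a real external system such as Kinesis.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plugin contract (an assumption, not a Pulsar API):
// one implementation per external system.
interface ExternalReplicator {
    void replicate(String topic, byte[] payload);
}

// Toy implementation that records messages in memory
// instead of calling a real external service.
class InMemoryReplicator implements ExternalReplicator {
    private final List<String> received = new ArrayList<>();

    @Override
    public void replicate(String topic, byte[] payload) {
        received.add(topic + ":" + new String(payload, StandardCharsets.UTF_8));
    }

    List<String> received() {
        return received;
    }
}

// Hypothetical server-side registry mapping a system name to its plugin.
class ReplicatorRegistry {
    private final Map<String, ExternalReplicator> plugins = new HashMap<>();

    void register(String systemName, ExternalReplicator replicator) {
        plugins.put(systemName, replicator);
    }

    // Forward a published message to every registered external system,
    // so the publishing client does not have to do it itself.
    void replicateToAll(String topic, byte[] payload) {
        for (ExternalReplicator replicator : plugins.values()) {
            replicator.replicate(topic, payload);
        }
    }
}

public class ReplicatorSketch {
    public static void main(String[] args) {
        ReplicatorRegistry registry = new ReplicatorRegistry();
        InMemoryReplicator standIn = new InMemoryReplicator();
        registry.register("kinesis-stand-in", standIn);
        registry.replicateToAll("my-topic", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(standIn.received());
    }
}
```

The point of the sketch is only the separation of concerns: the client publishes once, and the server side fans the message out to whichever replicator plugins are registered.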
Requirement
PIP:
https://github.com/apache/incubator-pulsar/wiki/PIP-18:-Pulsar-Replicator