This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

ISSUE-16551: PIP-188: Cluster migration or Blue-Green cluster deployment support in Pulsar #4528

Open
sijie opened this issue Jul 12, 2022 · 0 comments

sijie commented Jul 12, 2022

Original Issue: apache#16551


Motivation

Cluster migration, or blue-green cluster deployment, is a proven way to move live traffic from one cluster to another. For example, applications running on Kubernetes sometimes require a Kubernetes cluster upgrade, which can cause downtime for the entire application while the upgrade runs. Blue-green deployment is an application release model that gradually transfers user traffic from a previous version of an app or microservice to a nearly identical new release, both of which are running in production.

The old version is called the blue environment and the new version the green environment. Once production traffic is fully transferred from blue to green, blue can stand by in case of rollback, or be pulled from production and updated to become the template for the next update.

We need this capability in Apache Pulsar to migrate live traffic from the blue cluster to the green cluster so that, eventually, all traffic moves from blue to green without causing downtime for the topics.

Goal

This PIP adds support for migrating and redirecting the blue cluster's traffic to the green cluster. The broker will expose an admin API that lets an admin user mark a cluster as migrated and supply the redirection URLs to which traffic should be sent. The broker persists the migration state and the green cluster's URLs as part of the cluster metadata.

Once the cluster is marked as migrating, each broker asynchronously marks every topic it owns as migrated by calling the new managed-ledger API asyncMigrate(). Once a topic is marked as migrated, the broker notifies all producers, and all consumers that have drained the backlog of their subscriptions, with a new client-protocol command called "Migrated-Topic", which carries the redirection URLs of the green cluster. Producers and consumers of those topics cache the redirection URLs and reconnect using them, which moves them to the green cluster.

The broker can redirect only those consumers that have reached the end of a terminated topic or that try to create a new subscription in the blue cluster. The broker can therefore decide whether to redirect a consumer during the consumer-creation phase, and the Pulsar client has to handle redirection after sending a producer or consumer creation request.

The broker will unsubscribe a subscription once it reaches the end of the topic, and it will not allow any new producer or subscription to be created on migrated topics. Eventually, no topic in the blue cluster will have a subscription or producer attached, and those topics will be deleted by the garbage collector.

The broker marks the cluster state as migration-completed once all the topics are deleted, and that cluster will not allow any new topic creation.

Broker/Client changes

Topic termination

This PIP depends on the recently added broker feature for terminating a topic. A user who wants to terminate a topic can do so through the admin API. Once the broker receives a request to terminate the topic, it:

  1. Marks the ledger with state Terminated and doesn't allow any new writes on that managed-ledger.
  2. Immediately disconnects all the producers on that topic and fails new producer creation with the error TopicTerminatedError.
  3. Keeps dispatching messages for all types of subscriptions until they reach end-of-topic. The broker sends an end-of-topic message to all consumers of a subscription as soon as that subscription has read and acknowledged all messages on the topic.

Topic migration builds on the topic-termination feature. The migration process:

  1. First terminates the topic and sets the migration flag on the managed-ledger.
  2. Instead of sending TopicTerminatedError to the producer, the broker sends a migration response so that the producer can handle it and redirect to the new cluster. This step is discussed in detail in the next section.
  3. The broker continues dispatching messages to subscribers until they reach the end of the topic. Once a subscriber reaches the end of the topic, the broker sends a migration response to the consumer for further redirection.
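
The difference between plain termination and migration can be summarized as a small decision function. The following is only a sketch of the rules listed above, not broker code: the class, enum, and method names are hypothetical, and the subscription backlog is reduced to a single counter.

```java
// Illustrative sketch only: all names here are hypothetical, not Pulsar broker APIs.
final class MigrationResponseSketch {
    enum Response { OK, TOPIC_TERMINATED_ERROR, END_OF_TOPIC, MIGRATED }

    /** Reply to a producer that tries to attach to the topic. */
    static Response forProducer(boolean terminated, boolean migrated) {
        if (!terminated) {
            return Response.OK; // topic still accepts writes
        }
        // A terminated topic rejects producers; a migrated one redirects them instead.
        return migrated ? Response.MIGRATED : Response.TOPIC_TERMINATED_ERROR;
    }

    /** Reply to a consumer whose subscription still has `backlog` messages to read and ack. */
    static Response forConsumer(boolean terminated, boolean migrated, long backlog) {
        if (!terminated || backlog > 0) {
            return Response.OK; // keep dispatching until the subscription is drained
        }
        // Drained subscription on a terminated topic: end-of-topic, or redirect if migrated.
        return migrated ? Response.MIGRATED : Response.END_OF_TOPIC;
    }
}
```

Under these assumptions a producer on a migrated topic is redirected right away, while a consumer is redirected only once its backlog reaches zero.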

Managed-Ledger Changes

This PIP adds an API to managed-ledger that changes the ManagedLedger state to migrated. The API terminates the topic and persists the managed-ledger status as migrated. The broker triggers the managed-ledger migrate API once its cluster becomes the blue cluster and traffic should be redirected to the green cluster.

1. ManagedLedger.java

CompletableFuture<Position> asyncMigrate();


2. MLDataFormats.proto

message ManagedLedgerInfo {
   // Flag to check if topic is terminated and migrated to different cluster
   optional bool migrated = 4;

}
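
To make the intended semantics of asyncMigrate() concrete, here is a minimal in-memory sketch. It assumes that migrating means "terminate, set the migrated flag, and complete with the last written position". The class below is a hypothetical stand-in, not the real ManagedLedger, and it uses a plain entry id (Long) where the proposed API returns a Position.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a managed ledger; not Pulsar's ManagedLedgerImpl.
final class InMemoryLedgerSketch {
    private long lastEntryId = -1; // -1 means no entry has been written yet
    private boolean terminated;
    private boolean migrated;

    /** Appends an entry and returns its id; rejected once the ledger is terminated. */
    synchronized long addEntry(byte[] data) {
        if (terminated) {
            throw new IllegalStateException("ledger is terminated");
        }
        return ++lastEntryId;
    }

    /** Terminates the ledger, records the migrated flag, and completes with the last entry id. */
    synchronized CompletableFuture<Long> asyncMigrate() {
        terminated = true;
        migrated = true; // the real change would persist this flag in ManagedLedgerInfo
        return CompletableFuture.completedFuture(lastEntryId);
    }

    synchronized boolean isTerminated() { return terminated; }

    synchronized boolean isMigrated() { return migrated; }
}
```

A caller would typically chain on the returned future, for example `ledger.asyncMigrate().thenAccept(pos -> ...)`, before notifying producers and consumers.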

Broker changes

  1. The broker provides a pulsar-admin API to mark the cluster as migrated.

Pulsar Admin Change

pulsar-admin clusters set-cluster-migrated \
--brokerServiceUrl <> \
--brokerServiceUrlTls <>

  2. The broker stores the cluster migration state and redirection URLs in cluster-metadata.

Cluster-metadata change

ClusterData.java

boolean migrated;
ClusterUrl migratedClusterUrl;


class ClusterUrl {
        String serviceUrl;
        String serviceUrlTls;
        String brokerServiceUrl;
        String brokerServiceUrlTls;
}

  3. The broker runs a periodic task that checks the cluster migration state and moves topics into the migrated state by calling the managed-ledger migration API.

ServiceConfiguration.java

private long clusterMigrationCheckDurationSeconds = 0; //disable task with default value=0

  4. The broker sends a topic-migration message to the client so that producers and consumers on the client side can handle redirection accordingly.

PulsarApi.proto

message CommandMigratedTopic {
    required uint64 consumer_id = 1;
    optional string brokerServiceUrl      = 2;
    optional string brokerServiceUrlTls   = 3;    
}

  5. Avoid data loss at the green cluster with new incoming traffic
    Once a topic is marked as migrated, the broker starts redirecting producers to the new cluster to publish new messages. If consumers have not yet been redirected, messages at the green cluster could be lost, since no subscription exists there yet to retain them. Therefore, a default retention policy can be applied at the green cluster until the blue cluster is migrated, so that the green cluster retains the messages until the blue cluster is completely migrated to it.
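
The periodic migration check controlled by clusterMigrationCheckDurationSeconds could look roughly like the sketch below. It assumes that a value of 0 (or less) disables the task, as the configuration comment above states. The Topic interface, FlagTopic, and both method names are hypothetical; the only real API used is java.util.concurrent.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Illustrative sketch only: these types are hypothetical, not Pulsar broker classes.
final class MigrationCheckSketch {
    interface Topic {
        boolean isMigrated();
        void migrate(); // stands in for calling the managed-ledger migrate API
    }

    /** Trivial Topic used for demonstration. */
    static final class FlagTopic implements Topic {
        private boolean migrated;
        @Override public boolean isMigrated() { return migrated; }
        @Override public void migrate() { migrated = true; }
    }

    /** One pass of the check; returns how many topics were newly migrated. */
    static int checkOnce(BooleanSupplier clusterMigrated, List<? extends Topic> ownedTopics) {
        if (!clusterMigrated.getAsBoolean()) {
            return 0; // cluster is not marked as migrated, nothing to do
        }
        int count = 0;
        for (Topic topic : ownedTopics) {
            if (!topic.isMigrated()) {
                topic.migrate();
                count++;
            }
        }
        return count;
    }

    /** Schedules the check at a fixed rate; an interval of 0 or less disables it. */
    static Optional<ScheduledFuture<?>> schedule(
            ScheduledExecutorService executor, long intervalSeconds, Runnable check) {
        if (intervalSeconds <= 0) {
            return Optional.empty();
        }
        return Optional.of(executor.scheduleAtFixedRate(
                check, intervalSeconds, intervalSeconds, TimeUnit.SECONDS));
    }
}
```

Because checkOnce skips topics that are already migrated, repeated runs are harmless, which suits a task that fires on a fixed interval.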

Replicator and message ordering handling

A. Incoming replication messages from other regions' replicator producers to the blue cluster
Migration does not affect the ordering of messages arriving from other regions at the blue or green cluster. Once the blue cluster is marked as migrated, it rejects replication writes from remote regions and redirects the remote producers to the green cluster, where new messages are written. Consumers of the blue cluster are redirected to green only after they have received all messages from blue. Migration therefore preserves ordering for messages replicated from remote regions.

B. Outgoing replication messages from the blue cluster's replicator producers to other regions
The broker can give an ordering guarantee in this case, with the trade-off that the topic is unavailable until the blue cluster has replicated all messages that were published to it before the topic was terminated.

  1. The blue cluster marks the topic as terminated and migrated.
  2. The topic does not redirect producers or consumers until all replicators reach the end of the topic and have replicated all messages to the remote regions. Until then, the topic sends a TOPIC_UNAVAILABLE message to producers and consumers so that they keep retrying.
  3. The broker disconnects the replicators and deletes them once they reach the end of the topic.
  4. The broker starts sending the migrated command to producers and consumers to redirect them to the green cluster.
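
The gating in case B can be sketched as a single check over the replicators' remaining backlogs. This is an illustration under simplifying assumptions: each replicator is represented only by its backlog count, the consumer's own subscription backlog is ignored, and all names are hypothetical.

```java
import java.util.Collection;

// Illustrative sketch only: names are hypothetical, not Pulsar broker APIs.
final class ReplicatorDrainSketch {
    enum Reply { OK, TOPIC_UNAVAILABLE, MIGRATED }

    /**
     * Decides what a client is told when it connects to a topic.
     * replicatorBacklogs holds the number of unreplicated messages per remote region.
     */
    static Reply replyToClient(boolean topicMigrated, Collection<Long> replicatorBacklogs) {
        if (!topicMigrated) {
            return Reply.OK;
        }
        for (long backlog : replicatorBacklogs) {
            if (backlog > 0) {
                // Some region has not received everything yet: ask the client to retry.
                return Reply.TOPIC_UNAVAILABLE;
            }
        }
        // All replicators are drained (or none exist): safe to redirect to green.
        return Reply.MIGRATED;
    }
}
```

The unavailability window in this sketch lasts exactly as long as the slowest replicator needs to drain, which is the trade-off named above.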

Client Changes

Once producers and consumers receive the Migrated-Topic command with a list of redirect URLs, they will cache those URLs and try to reconnect with a broker by using those URLs. The client will add handling of the CommandMigratedTopic protocol.
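
On the client side, the caching step might be sketched as follows. This is not the Pulsar client implementation: the class and method names are hypothetical, the cache is keyed by topic name as an assumption, and a missing URL is modelled as null because both URL fields in CommandMigratedTopic are optional.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: hypothetical names, not the Pulsar client implementation.
final class MigratedTopicCacheSketch {
    /** Redirect URLs carried by one Migrated-Topic command; either may be null. */
    private static final class Redirect {
        final String brokerServiceUrl;
        final String brokerServiceUrlTls;

        Redirect(String brokerServiceUrl, String brokerServiceUrlTls) {
            this.brokerServiceUrl = brokerServiceUrl;
            this.brokerServiceUrlTls = brokerServiceUrlTls;
        }
    }

    private final Map<String, Redirect> redirectsByTopic = new ConcurrentHashMap<>();

    /** Called when a Migrated-Topic command arrives for the given topic. */
    void onMigratedTopic(String topic, String brokerServiceUrl, String brokerServiceUrlTls) {
        redirectsByTopic.put(topic, new Redirect(brokerServiceUrl, brokerServiceUrlTls));
    }

    /** URL to use on the next reconnect, or empty if the topic has no usable redirect. */
    Optional<String> reconnectUrl(String topic, boolean useTls) {
        Redirect redirect = redirectsByTopic.get(topic);
        if (redirect == null) {
            return Optional.empty(); // topic was never migrated: keep the original URL
        }
        return Optional.ofNullable(useTls ? redirect.brokerServiceUrlTls : redirect.brokerServiceUrl);
    }
}
```

For example, after `cache.onMigratedTopic("my-topic", "pulsar://green:6650", null)` (placeholder values), a non-TLS reconnect would use the cached URL while a TLS reconnect would find no redirect.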

PIP supported feature

  1. Publish ordering guarantee
  2. Consumer ordering guarantee
  3. Incoming replicator ordering guarantee
  4. Outgoing replicator ordering guarantee with the topic unavailability tradeoff
@sijie sijie added the PIP label Jul 12, 2022
@sijie sijie changed the title ISSUE-16551: PIP-184: Cluster migration or Blue-Green cluster deployment support in Pulsar ISSUE-16551: PIP-188: Cluster migration or Blue-Green cluster deployment support in Pulsar Sep 11, 2022
@sijie sijie added the Stale label Sep 11, 2022