Update doc for support set/get/remove message deduplication policy at topic level (#7918)

### Motivation

PR (#7821) adds support for setting, getting, and removing the deduplication policy at the topic level.


### Modifications

Update the doc based on code updates.

The following docs are updated:

- Messaging: message deduplication
- Manage message deduplication
- `pulsar-admin` CLI reference
Huanli-Meng authored Aug 28, 2020
1 parent b08f647 commit 9ae78a4
Showing 3 changed files with 46 additions and 9 deletions.
4 changes: 2 additions & 2 deletions site2/docs/concepts-messaging.md
@@ -458,7 +458,7 @@ With message expiry, shown at the bottom, some messages are <span style="color:

## Message deduplication

Message duplication occurs when a message is persisted](concepts-architecture-overview.md#persistent-storage) by Pulsar more than once. Message deduplication is an optional Pulsar feature that prevents unnecessary message duplication by processing each message only once, even if the message is received more than once.
Message duplication occurs when a message is [persisted](concepts-architecture-overview.md#persistent-storage) by Pulsar more than once. Message deduplication is an optional Pulsar feature that prevents unnecessary message duplication by processing each message only once, even if the message is received more than once.

The following diagram illustrates what happens when message deduplication is disabled vs. enabled:

@@ -469,7 +469,7 @@ Message deduplication is disabled in the scenario shown at the top. Here, a prod

In the second scenario at the bottom, the producer publishes message 1, which is received by the broker and persisted, as in the first scenario. When the producer attempts to publish the message again, however, the broker knows that it has already seen message 1 and thus does not persist the message.

> Message deduplication is handled at the namespace level. For more instructions, see the [message deduplication cookbook](cookbooks-deduplication.md).
> Message deduplication is handled at the namespace level or the topic level. For more instructions, see the [message deduplication cookbook](cookbooks-deduplication.md).

### Producer idempotency
19 changes: 12 additions & 7 deletions site2/docs/cookbooks-deduplication.md
@@ -10,31 +10,34 @@ To use message deduplication in Pulsar, you need to configure your Pulsar broker

## How it works

You can enable or disable message deduplication on a per-namespace basis. By default, it is disabled on all namespaces. You can enable it in the following ways:
You can enable or disable message deduplication at the namespace level or the topic level. By default, it is disabled on all namespaces or topics. You can enable it in the following ways:

* Enable for all namespaces at the broker-level
* Enable for specific namespaces with the `pulsar-admin namespaces` interface
* Enable deduplication for all namespaces/topics at the broker-level.
* Enable deduplication for a specific namespace with the `pulsar-admin namespaces` interface.
* Enable deduplication for a specific topic with the `pulsar-admin topics` interface.
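
The three levels listed above can be pictured as a simple fallback chain. The sketch below assumes that a topic-level policy takes precedence over a namespace-level policy, which in turn takes precedence over the broker default; the doc text in this commit only states that admin CLI settings override the broker default, so the topic-over-namespace ordering is an assumption here. `effective_dedup` is a hypothetical helper for illustration, not a Pulsar command.

```bash
# Hypothetical illustration of how the three levels might resolve.
# ASSUMPTION: topic policy > namespace policy > broker default.
effective_dedup() {
  topic_policy="$1"       # "true", "false", or "" when no topic policy is set
  namespace_policy="$2"   # "true", "false", or "" when no namespace policy is set
  broker_default="$3"     # value of brokerDeduplicationEnabled
  if [ -n "$topic_policy" ]; then
    echo "$topic_policy"
  elif [ -n "$namespace_policy" ]; then
    echo "$namespace_policy"
  else
    echo "$broker_default"
  fi
}
```

For example, `effective_dedup "" true false` prints `true`: no topic policy is set, so the namespace setting wins over the broker default.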

## Configure message deduplication

You can configure message deduplication in Pulsar using the [`broker.conf`](reference-configuration.md#broker) configuration file. The following deduplication-related parameters are available.

Parameter | Description | Default
:---------|:------------|:-------
`brokerDeduplicationEnabled` | Sets the default behavior for message deduplication in the Pulsar broker. If it is set to `true`, message deduplication is enabled by default on all namespaces; if it is set to `false`, you have to enable or disable deduplication on a per-namespace basis. | `false`
`brokerDeduplicationEnabled` | Sets the default behavior for message deduplication in the Pulsar broker. If it is set to `true`, message deduplication is enabled on all namespaces/topics. If it is set to `false`, you have to enable or disable deduplication at the namespace level or the topic level. | `false`
`brokerDeduplicationMaxNumberOfProducers` | The maximum number of producers for which information is stored for deduplication purposes. | `10000`
`brokerDeduplicationEntriesInterval` | The number of entries after which a deduplication informational snapshot is taken. A larger interval leads to fewer snapshots being taken, though this lengthens the topic recovery time (the time required for entries published after the snapshot to be replayed). | `1000`
`brokerDeduplicationProducerInactivityTimeoutMinutes` | The time of inactivity (in minutes) after which the broker discards deduplication information related to a disconnected producer. | `360` (6 hours)

### Set default value at the broker-level

By default, message deduplication is *disabled* on all Pulsar namespaces. To enable it by default on all namespaces, set the `brokerDeduplicationEnabled` parameter to `true` and re-start the broker.
By default, message deduplication is *disabled* on all Pulsar namespaces/topics. To enable it on all namespaces/topics, set the `brokerDeduplicationEnabled` parameter to `true` and re-start the broker.

Even if you set the value for `brokerDeduplicationEnabled`, enabling or disabling via Pulsar admin CLI overrides the default settings at the broker-level.
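
A minimal sketch of the broker-level change described above, assuming the configuration file already contains a `brokerDeduplicationEnabled=` line and lives at a path such as `conf/broker.conf` (the path is an assumption). `set_broker_dedup` is a hypothetical helper, not something shipped with Pulsar.

```bash
# Hypothetical helper (not part of Pulsar): rewrite the
# brokerDeduplicationEnabled line of a broker.conf file in place.
set_broker_dedup() {
  conf_file="$1"   # e.g. conf/broker.conf (assumed location)
  value="$2"       # "true" or "false"
  tmp_file="$(mktemp)" || return 1
  # Replace only the existing key=value line; all other lines pass through.
  sed "s/^brokerDeduplicationEnabled=.*/brokerDeduplicationEnabled=${value}/" \
    "$conf_file" > "$tmp_file" || return 1
  # Copy the contents back so the original file keeps its permissions.
  cat "$tmp_file" > "$conf_file" && rm -f "$tmp_file"
}
```

Calling `set_broker_dedup conf/broker.conf true` would flip the default; the broker still has to be restarted for the change to take effect. If the key is missing from the file, this sketch leaves the file unchanged.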

### Enable message deduplication

Though message deduplication is disabled by default at broker-level, you can enable message deduplication for specific namespaces using the [`pulsar-admin namespace set-deduplication`](reference-pulsar-admin.md#namespace-set-deduplication) command. You can use the `--enable`/`-e` flag and specify the namespace. The following is an example with `<tenant>/<namespace>`:
Though message deduplication is disabled by default at the broker level, you can enable message deduplication for a specific namespace or topic using the [`pulsar-admin namespaces set-deduplication`](reference-pulsar-admin.md#namespace-set-deduplication) or the [`pulsar-admin topics set-deduplication`](reference-pulsar-admin.md#topic-set-deduplication) command. You can use the `--enable`/`-e` flag and specify the namespace/topic.

The following example shows how to enable message deduplication at the namespace level.

```bash
$ bin/pulsar-admin namespaces set-deduplication \
@@ -44,7 +47,9 @@ $ bin/pulsar-admin namespaces set-deduplication \

### Disable message deduplication

Even if you enable message deduplication at broker-level, you can disable message deduplication for a specific namespace using the [`pulsar-admin namespace set-deduplication`](reference-pulsar-admin.md#namespace-set-deduplication) command. Use the `--disable`/`-d` flag and specify the namespace. The following is an example with `<tenant>/<namespace>`:
Even if you enable message deduplication at the broker level, you can disable message deduplication for a specific namespace or topic using the [`pulsar-admin namespaces set-deduplication`](reference-pulsar-admin.md#namespace-set-deduplication) or the [`pulsar-admin topics set-deduplication`](reference-pulsar-admin.md#topic-set-deduplication) command. Use the `--disable`/`-d` flag and specify the namespace/topic.

The following example shows how to disable message deduplication at the namespace level.

```bash
$ bin/pulsar-admin namespaces set-deduplication \
32 changes: 32 additions & 0 deletions site2/docs/reference-pulsar-admin.md
@@ -1787,6 +1787,9 @@ Subcommands
* `reset-cursor`
* `get-message-by-id`
* `last-message-id`
* `get-deduplication`
* `set-deduplication`
* `remove-deduplication`

### `compact`
Run compaction on the specified topic (persistent topics only)
@@ -2202,6 +2205,35 @@ Options
|`-l`, `--ledgerId`|The ledger id |0|
|`-e`, `--entryId`|The entry id |0|

### `get-deduplication`
Get a deduplication policy for a topic.

Usage
```bash
$ pulsar-admin topics get-deduplication tenant/namespace/topic
```

### `set-deduplication`
Enable or disable message deduplication on a topic.

Usage
```bash
$ pulsar-admin topics set-deduplication tenant/namespace/topic
```

Options
|Flag|Description|Default|
|---|---|---|
|`--enable`, `-e`|Enable message deduplication on the specified topic.|false|
|`--disable`, `-d`|Disable message deduplication on the specified topic.|false|

### `remove-deduplication`
Remove a deduplication policy from a topic.

Usage
```bash
$ pulsar-admin topics remove-deduplication tenant/namespace/topic
```
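
The three topic-level subcommands above share the same shape, so they can be wrapped in one small function. This is a sketch under stated assumptions: `topic_dedup` and the `PULSAR_ADMIN` variable are hypothetical names introduced here, and the topic name used in the note below is a placeholder. Setting `PULSAR_ADMIN=echo` turns the wrapper into a dry run that prints the arguments instead of contacting a cluster.

```bash
# Hypothetical wrapper around the topic-level deduplication subcommands.
# PULSAR_ADMIN is an assumed override for the CLI path; it defaults to
# pulsar-admin found on PATH.
topic_dedup() {
  if [ "$#" -lt 2 ]; then
    echo "usage: topic_dedup get|set|remove <topic> [--enable|--disable]" >&2
    return 2
  fi
  action="$1"   # get, set, or remove
  topic="$2"    # topic name to operate on
  case "$action" in
    get|set|remove) ;;
    *)
      echo "unknown action: $action" >&2
      return 2
      ;;
  esac
  shift 2
  # Any remaining arguments (for example --enable) are passed through as-is.
  "${PULSAR_ADMIN:-pulsar-admin}" topics "${action}-deduplication" "$topic" "$@"
}
```

A typical sequence would be `topic_dedup set persistent://public/default/my-topic --enable`, then `topic_dedup get persistent://public/default/my-topic` to confirm the policy, and finally `topic_dedup remove persistent://public/default/my-topic` to drop the topic-level policy again.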

## `tenants`
Operations for managing tenants