Update data-model docs (#45)
acelyc111 authored Dec 18, 2023
1 parent 17bb17e commit ec20272
Showing 2 changed files with 49 additions and 9 deletions.
44 changes: 43 additions & 1 deletion _overview/en/data-model.md
@@ -2,4 +2,46 @@
permalink: /overview/data-model/
---

TRANSLATING
## Introduction

Pegasus uses a simple Key-Value data model and does not support complex schemas. To make the model more expressive, the key is split into a **HashKey** and a **SortKey**, forming a composite key (`[HashKey, SortKey] -> Value`). This is similar to [DynamoDB](https://aws.amazon.com/dynamodb/)'s [composite primary key](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks.corecomponents.html#howitworks.corecomponents.primarykey).

### HashKey

A byte string. Similar to the partition key in DynamoDB, the HashKey determines which partition (a.k.a. shard) the data belongs to. Pegasus applies a specific hash function to the HashKey and takes the result modulo the number of partitions to obtain the **Partition ID** of the data. Therefore, data with the same HashKey is always stored in the same partition.
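
The hash-then-modulo step can be illustrated with a minimal Python sketch. The function name `partition_id` is hypothetical, and `zlib.crc32` is only a stand-in for the hash function Pegasus actually uses, so the resulting IDs will not match a real cluster.

```python
import zlib


def partition_id(hash_key: bytes, partition_count: int) -> int:
    """Map a HashKey to a partition index via hash-then-modulo."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    # zlib.crc32 is a stand-in (assumption); Pegasus uses its own hash function.
    return zlib.crc32(hash_key) % partition_count


# The same HashKey always maps to the same partition.
pid_a = partition_id(b"user_1001", 8)
pid_b = partition_id(b"user_1001", 8)
```

Because the result depends only on the HashKey and the partition count, every entry under `user_1001` lands in one partition regardless of its SortKey.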

> Note:
> On the C++ client side, the HashKey length is limited to 64KB.
> On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
> On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size` is set to a non-zero value, the size of the entire request packet is limited to that value, which defaults to 1MB.

### SortKey

A byte string. Similar to the sort key in DynamoDB, the SortKey orders data within a partition. Internally, when data is stored in RocksDB, the HashKey and SortKey are concatenated to form the RocksDB key.

> Note:
> On the C++ client side, there is no limit on the length of the SortKey.
> On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
> On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size` is set to a non-zero value, the size of the entire request packet is limited to that value, which defaults to 1MB.

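
To see why concatenating the two keys yields SortKey ordering within a HashKey, here is a small Python sketch. The layout in the hypothetical `encode_key` (a 2-byte big-endian HashKey length, then HashKey, then SortKey) is an assumption made for illustration; the text above only states that the keys are concatenated.

```python
import struct


def encode_key(hash_key: bytes, sort_key: bytes) -> bytes:
    # Assumed layout: 2-byte big-endian HashKey length + HashKey + SortKey.
    # The length prefix keeps (b"ab", b"c") and (b"a", b"bc") distinct.
    return struct.pack(">H", len(hash_key)) + hash_key + sort_key


entries = [
    (b"user_1", b"name"),
    (b"user_2", b"age"),
    (b"user_1", b"age"),
    (b"user_1", b"email"),
]

# Bytewise ordering of the encoded keys, as a byte-ordered store would keep them.
ordered = sorted(entries, key=lambda e: encode_key(*e))
```

In `ordered`, the three `user_1` entries are adjacent and appear in SortKey byte order (`age`, `email`, `name`), followed by the `user_2` entry.
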
### Value

A byte string.

> Note:
> On the C++ client side, there is no limit on the length of the Value.
> On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 400KB.
> On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size` is set to a non-zero value, the size of the entire request packet is limited to that value, which defaults to 1MB.

![pegasus-data-model](/assets/images/pegasus-data-model.png){:class="img-responsive docs-image"}
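
The size limits noted for HashKey, SortKey, and Value can be collected into a simple pre-flight check. The sketch below is a hypothetical helper, not part of any Pegasus client. It assumes the Java-client limits with WriteLimiter enabled and the 1MB server default, treats each limit as inclusive, and approximates the request size as the sum of the three fields.

```python
KB = 1024

# Limits quoted in the notes above; the client-side ones apply to the Java
# client only when WriteLimiter is enabled.
MAX_HASH_KEY = 1 * KB
MAX_SORT_KEY = 1 * KB
MAX_VALUE = 400 * KB
# Default of [replication]max_allowed_write_size on the server side.
MAX_REQUEST = 1024 * KB


def check_write(hash_key: bytes, sort_key: bytes, value: bytes) -> list:
    """Return the names of the limits a write would exceed (empty if none)."""
    exceeded = []
    if len(hash_key) > MAX_HASH_KEY:
        exceeded.append("hash_key")
    if len(sort_key) > MAX_SORT_KEY:
        exceeded.append("sort_key")
    if len(value) > MAX_VALUE:
        exceeded.append("value")
    # Rough approximation: a real request packet also carries protocol overhead.
    if len(hash_key) + len(sort_key) + len(value) > MAX_REQUEST:
        exceeded.append("request")
    return exceeded


ok = check_write(b"user_1", b"name", b"Ann")
too_big = check_write(b"k" * (2 * KB), b"name", b"Ann")
```

Here `ok` is an empty list, while `too_big` reports only the HashKey limit.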

## Pegasus vs. HBase

Although Pegasus's data model is not as semantically rich as HBase's tabular model, its composite HashKey + SortKey design still meets the needs of most applications.
For example, users can treat HashKey as a row key and SortKey as an attribute or column name, so that all entries sharing a HashKey can be viewed as one row, which expresses the concept of a row in HBase.
With this in mind, Pegasus provides not only the `get`/`set`/`del` interfaces for accessing a single entry, but also the `multi_get`/`multi_set`/`multi_del` interfaces for batch access to entries under the same HashKey. These batch interfaces provide single-row atomic semantics, which makes them convenient to use.
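
The row-style mapping can be sketched with an in-memory stand-in. `ToyTable` and its method signatures are hypothetical; they borrow the interface names from the text but are not the Pegasus client API, and a plain dict gives no real atomicity or persistence.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in: HashKey -> {SortKey: Value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def multi_set(self, hash_key, kvs):
        # All SortKeys under one HashKey are updated together, like one row.
        self._rows[hash_key].update(kvs)

    def multi_get(self, hash_key, sort_keys=None):
        row = self._rows.get(hash_key, {})
        if sort_keys is None:
            # Return the whole "row", ordered by SortKey.
            return dict(sorted(row.items()))
        return {k: row[k] for k in sorted(sort_keys) if k in row}


table = ToyTable()
table.multi_set(b"user_1", {b"name": b"Ann", b"age": b"30"})
row = table.multi_get(b"user_1")
age_only = table.multi_get(b"user_1", [b"age"])
```

Here `b"user_1"` plays the role of an HBase row key, and `b"name"` and `b"age"` play the role of column names.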

![pegasus-data-model](/assets/images/pegasus-data-model-sample.png){:class="img-responsive docs-image"}

## Pegasus vs. Redis

Unlike Redis, Pegasus does not support rich data structures such as `List`/`Set`/`Hash`, but users can still implement similar semantics on top of it.
For example, users can map HashKey to a Redis `key` and use SortKey as the `field` of a Hash (or the `member` of a Set) to emulate a Redis Hash (or Set).
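
A minimal sketch of that mapping, using a plain dict as a stand-in for a Pegasus table. The function names mirror Redis commands purely for illustration and are hypothetical; none of them belong to a Pegasus client.

```python
# (HashKey, SortKey) -> Value, a stand-in for a Pegasus table.
store = {}


def hset(key, field, value):
    # Redis Hash: key -> HashKey, field -> SortKey.
    store[(key, field)] = value


def hget(key, field):
    return store.get((key, field))


def sadd(key, member):
    # Redis Set: only the SortKey matters, so the Value is left empty.
    store[(key, member)] = b""


def smembers(key):
    return sorted(sk for (hk, sk) in store if hk == key)


hset(b"profile:1", b"name", b"Ann")
sadd(b"tags:1", b"red")
sadd(b"tags:1", b"blue")
name = hget(b"profile:1", b"name")
tags = smembers(b"tags:1")
```

Set membership is idempotent in this sketch because writing the same `(HashKey, SortKey)` pair twice simply overwrites the empty value.
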
14 changes: 6 additions & 8 deletions _overview/zh/data-model.md
@@ -2,23 +2,21 @@
permalink: /overview/data-model/
---

## Introduction to the Data Model
## Introduction

The data model of Pegasus is very simple: it is a Key-Value model and does not support complex schemas. However, to enhance its expressive power, the Key is split into **HashKey** and **SortKey**, forming a composite key (`[HashKey, SortKey] -> Value`), which is very similar to the [_composite primary key_](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey) (partition key and sort key) provided by [DynamoDB](https://aws.amazon.com/dynamodb/).

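The reasons for this design are:
* Pegasus uses fixed, hash-based partitioning, so there must be a way to compute the partition ID of the data. The simplest approach is to have the user provide a HashKey and compute the ID from it with a hash function.
* A plain `HashKey -> Value` scheme, on the other hand, is weak in expressive power and inconvenient for applications.
The data model of Pegasus is very simple: it is a Key-Value model and does not support complex schemas. However, to enhance its expressive power, the Key is split into **HashKey** and **SortKey**, forming a composite key (`[HashKey, SortKey] -> Value`), which is similar to the [composite primary key](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey) (partition key and sort key) in [DynamoDB](https://aws.amazon.com/dynamodb/).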

### HashKey

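A byte string. Similar to the partition key in DynamoDB, the HashKey is used to determine which partition the data belongs to. Pegasus applies a specific hash function to the HashKey and takes the result modulo the number of partitions to obtain the **Partition ID** of the data. Therefore, data with the same HashKey is always stored in the same partition.
> Note: On the C++ client side, the HashKey length is limited to 64KB. On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
> Note:
> On the C++ client side, the HashKey length is limited to 64KB.
> On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
> On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size` is set to a non-zero value, the size of the entire request packet is limited to that value, which defaults to 1MB.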
### SortKey

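A byte string. Similar to the sort key in DynamoDB, the SortKey is used to order data within a partition. Data with the same HashKey is stored together and sorted by the byte order of the SortKey. Internally, when data is stored in RocksDB, the HashKey and SortKey are concatenated to form the RocksDB key.
A byte string. Similar to the sort key in DynamoDB, the SortKey is used to order data within a partition. Internally, when data is stored in RocksDB, the HashKey and SortKey are concatenated to form the RocksDB key.
> Note: On the C++ client side, there is no limit on the length of the SortKey. On the Java client side, if [WriteLimiter](https://github.com/apache/incubator-pegasus/blob/v2.5.0/java-client/src/main/java/org/apache/pegasus/client/ClientOptions.java#L360C12-L360C12) is enabled, the limit is 1KB.
> On the server side, since Pegasus 2.0.0, if `[replication]max_allowed_write_size` is set to a non-zero value, the size of the entire request packet is limited to that value, which defaults to 1MB.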