Update background docs
acelyc111 committed Dec 15, 2023
1 parent 397b3b5 commit 3ffc393
Showing 2 changed files with 50 additions and 5 deletions.
47 changes: 46 additions & 1 deletion _overview/en/background.md
@@ -2,4 +2,49 @@
permalink: /overview/background/
---

TRANSLATING
# Design Goals

* High availability: The system must be highly available. Even if some servers go down, the Pegasus cluster can restore service within several seconds, minimizing the impact on Pegasus users. The target for service availability is 99.99% or higher.
* High performance: The system must provide high-performance read and write services, with P99 latency at the millisecond level.
* Strong consistency: The system provides users with strong consistency semantics, making it easier for the Pegasus users to develop applications.
* Scalable: The system must easily scale in and scale out to cope with changes in application throughput loads.
* Easy to use: The system provides Pegasus users with simple, easy-to-use client libraries and interfaces.
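
To put the 99.99% target in concrete terms, the arithmetic below converts an availability figure into a yearly downtime budget. It is only an illustration: the function names and the 10-second recovery time are hypothetical, chosen to match the "several seconds" recovery mentioned above, and are not Pegasus measurements.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability: float) -> float:
    """Return the maximum yearly downtime, in minutes, allowed by an availability target."""
    return MINUTES_PER_YEAR * (1.0 - availability)


def outages_within_budget(availability: float, recovery_seconds: float) -> int:
    """Return how many outages of the given length fit into the yearly budget."""
    budget_seconds = downtime_budget_minutes(availability) * 60
    return int(budget_seconds // recovery_seconds)
```

Under these assumptions, a 99.99% target leaves roughly 52.6 minutes of downtime per year, which is about 315 recoveries of 10 seconds each.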

# Implementation

When designing Pegasus, we made trade-offs among design goals, implementation difficulty, and development efficiency. The main choices are:
* Development language: For performance reasons, we chose C++.
* Data model: A simple Key-Value data model. This simplifies development while meeting most application needs. In addition, we split the Key into two levels, HashKey and SortKey, which makes the model more expressive.
* Data distribution: Fixed hash distribution. Compared to range distribution and consistent hash distribution, fixed hash distribution is simpler to implement, and data skew and scalability can be addressed through measures such as careful design of hash keys and presetting more data shards. We also support the [Partition Split](https://pegasus.apache.org/en/administration/partition-split) feature to expand the number of shards.
* Storage medium: SSD (Solid State Drive) is recommended. SSDs sit between memory and HDD (Hard Disk Drive) in both performance and cost, so they are the more appropriate choice when weighing application requirements against cost.
* Local storage engine: [RocksDB](https://github.com/facebook/rocksdb). RocksDB adds many optimizations on top of LevelDB and can make full use of the IOPS of SSDs and the computing power of multi-core servers.
* Consistency protocol: [PacificA](https://www.microsoft.com/en-us/research/publication/pacifica-replication-in-log-based-distributed-storage-systems/). Compared to [Raft](https://raft.github.io/), the PacificA protocol has its own characteristics and advantages.
* Fault detection: Unlike HBase, Pegasus does not use ZooKeeper for fault detection. Instead, it implements a lease-based fault detection mechanism between MetaServer and ReplicaServer.
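
The fixed hash distribution described above can be sketched as follows. This is a minimal illustration, not Pegasus's actual implementation: `zlib.crc32` stands in for whatever hash function Pegasus uses, the function names are hypothetical, and the sketch assumes that the partition is chosen from the HashKey alone and that a split doubles the partition count.

```python
import zlib


def partition_index(hash_key: bytes, partition_count: int) -> int:
    """Map a hash key to a partition with a fixed hash: hash(key) mod partition_count."""
    # zlib.crc32 is a stand-in hash chosen for determinism (an assumption of this sketch).
    return zlib.crc32(hash_key) % partition_count


def partition_index_after_split(hash_key: bytes, old_count: int) -> int:
    """Return the key's partition once the partition count has doubled (assumed split rule)."""
    return partition_index(hash_key, old_count * 2)
```

With modulo placement, doubling the count from n to 2n sends every key in partition p to either p or p + n, so under these assumptions a split divides each existing shard in two rather than reshuffling all data.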

# Comparison with Apache HBase

Pegasus was originally built to compensate for the shortcomings of HBase. Here we compare the two from the user's perspective:
* Data model: HBase uses a tabular model with range sharding, while Pegasus uses a Key-Value model with hash sharding.
* Interface: HBase's API is feature-rich but more complex to use. Pegasus's interfaces are simple to understand and use.
* Reliability: For architectural and implementation reasons (such as Pegasus's use of local storage, its fault detection mechanism, and its implementation in C++), the reliability of Pegasus is usually better than that of HBase.
* Performance: Because of its layered architecture, HBase's read and write performance is not very good: P99 latency is usually in the tens or even hundreds of milliseconds, and GC issues can cause latency spikes. Pegasus's P99 latency can be just a few milliseconds, which meets the needs of latency-sensitive online applications.
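
Lease-based fault detection, one of the reliability factors listed above, can be illustrated with a small sketch. It is a generic lease table under stated assumptions, not the Pegasus protocol: the class, its method names, and the 10-second lease used in the note below are hypothetical, and time is passed in explicitly to keep the example deterministic. The idea is that a worker (playing the role of a ReplicaServer) keeps renewing its lease with a coordinator (playing the role of the MetaServer); once the lease lapses, the coordinator treats the worker as failed.

```python
from typing import Dict, List


class LeaseTable:
    """Tracks one lease per worker; a worker counts as failed once its lease expires."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self._expiry: Dict[str, float] = {}  # worker name -> lease expiry time

    def renew(self, worker: str, now: float) -> None:
        """Record a heartbeat and extend the worker's lease, counted from `now`."""
        self._expiry[worker] = now + self.lease_seconds

    def is_alive(self, worker: str, now: float) -> bool:
        """A worker is alive only while its lease has not expired."""
        return now < self._expiry.get(worker, float("-inf"))

    def failed_workers(self, now: float) -> List[str]:
        """Return the workers whose lease has expired, in name order."""
        return sorted(w for w, t in self._expiry.items() if now >= t)
```

With a 10-second lease, a worker that last renewed at t = 0 is still considered alive at t = 9 and is reported as failed from t = 10 onward, without any external coordination service being involved.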

# Comparison with Redis

Compared only on read/write latency and single-machine throughput, Redis is clearly superior to Pegasus. But when read/write latency, availability, scalability, cost, and other factors are weighed together, Pegasus has advantages of its own.
The main differences compared to Redis are as follows:
* Data model: Both are Key-Value models, but Pegasus supports two-level keys (HashKey + SortKey).
* Interface: Redis has richer interfaces and supports container features such as List, Set, and Map. The interfaces of Pegasus are simple to understand and use, but its feature set is narrower.
* Read/write latency: Redis performs better than Pegasus.
* Scalability: Pegasus scales better: it is easier to scale in and out, and it supports automatic load balancing. Redis's distributed solution can be quite cumbersome when adding or removing instances.
* Reliability: Pegasus always persists application data to disk, and its system architecture ensures high data integrity. Redis takes a long time to recover after an instance crash, its availability is not good enough, and it may also lose the most recent data.
* Cost: Pegasus stores the full data set on SSD, while Redis must keep the full data set in memory, so Pegasus costs less.
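
The two-level key (HashKey + SortKey) can be pictured with a small in-memory sketch. It is not Pegasus's storage layout or client API: the class and method names are hypothetical, and it assumes that sort keys under one hash key are kept in sorted order, as the name suggests, so that a range of them can be read together.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TwoLevelStore:
    """Toy key-value store addressed by (hash_key, sort_key)."""

    def __init__(self) -> None:
        self._sort_keys: Dict[str, List[str]] = {}       # hash_key -> sorted sort keys
        self._values: Dict[Tuple[str, str], bytes] = {}  # (hash_key, sort_key) -> value

    def set(self, hash_key: str, sort_key: str, value: bytes) -> None:
        """Insert or overwrite one value, keeping the sort keys ordered."""
        if (hash_key, sort_key) not in self._values:
            bisect.insort(self._sort_keys.setdefault(hash_key, []), sort_key)
        self._values[(hash_key, sort_key)] = value

    def get(self, hash_key: str, sort_key: str) -> Optional[bytes]:
        """Point lookup by the full two-level key."""
        return self._values.get((hash_key, sort_key))

    def scan(self, hash_key: str, start: str, stop: str) -> List[Tuple[str, bytes]]:
        """Return (sort_key, value) pairs with start <= sort_key < stop, in order."""
        keys = self._sort_keys.get(hash_key, [])
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, stop)
        return [(k, self._values[(hash_key, k)]) for k in keys[lo:hi]]
```

For example, using a user ID as the hash key and a timestamp as the sort key lets one call fetch all of a user's records in a time window, which a flat key-value interface cannot express directly.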

# Comprehensive comparison

When selecting Key-Value storage systems, application developers often encounter these issues:
* Although HBase has high availability and is easy to scale, its performance is not good enough.
* Although Redis has good performance, it requires a large amount of memory, resulting in higher hardware costs. If the data volume is too large, [Redis Cluster](https://redis.io/topics/cluster-tutorial) has to be used, and the availability of that solution is insufficient in the event of machine failure.
* The HBase+Redis solution uses HBase for underlying storage and Redis for upper-level caching. Its drawbacks are that it involves two systems, so the user's read and write logic is relatively complex; writing to two systems at once can lead to consistency issues; each piece of data must be stored in both HBase and Redis, which is relatively costly; and when a Redis machine crashes, part of the cache is lost, and read performance drops significantly while those reads fall back to HBase.
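
The consistency issue in the HBase+Redis combination can be reproduced with a toy model. Plain Python dictionaries stand in for the storage system and the cache; nothing here uses real HBase or Redis client APIs, and all names are hypothetical. The sketch writes to the store first and the cache second, then reads cache-first.

```python
class FlakyCache(dict):
    """A dict-backed cache whose next write can be made to fail, mimicking a cache outage."""

    fail_next_write = False

    def put(self, key, value):
        if self.fail_next_write:
            self.fail_next_write = False
            raise ConnectionError("cache unavailable")
        self[key] = value


def dual_write(store: dict, cache: FlakyCache, key, value) -> bool:
    """Write to the store, then to the cache. Return False if the two may have diverged."""
    store[key] = value
    try:
        cache.put(key, value)
    except ConnectionError:
        return False
    return True


def read(store: dict, cache: FlakyCache, key):
    """Cache-first read: serve from the cache when present, otherwise fall back to the store."""
    if key in cache:
        return cache[key]
    return store.get(key)
```

If a first write of `v1` succeeds in both places and a second write of `v2` fails at the cache step, the store holds `v2` while a cache-first read still returns `v1`. Handling that case (retrying, invalidating, or expiring the entry) is the extra read and write logic that the application has to carry when two systems are involved.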

Pegasus combines the advantages of HBase and Redis, ensuring high reliability, good scalability, and excellent performance.
8 changes: 4 additions & 4 deletions _overview/zh/background.md
@@ -16,7 +16,7 @@ permalink: /overview/background/
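* Development language: For performance reasons, we chose C++.
* Data model: A simple Key-Value data model. This simplifies development while meeting most application needs. In addition, we split the Key into two levels, HashKey and SortKey, which makes it more expressive.
* Data distribution: Fixed hash distribution. Compared to range distribution and consistent hash distribution, fixed hash distribution is simpler to implement, and data skew and scalability can be addressed through measures such as careful design of hash keys and presetting more data shards. We also support the [Partition Split](https://pegasus.apache.org/zh/administration/partition-split) feature to expand the number of shards.
* Storage medium: SSD is recommended. SSD performance and cost both fall between memory and disk, so considering both application requirements and cost, SSD is the more appropriate choice.
* Storage medium: SSD (Solid State Drive) is recommended. SSD performance and cost both fall between memory and HDD (Hard Disk Drive), so considering both application requirements and cost, SSD is the more appropriate choice.
* Local storage engine: [RocksDB](https://github.com/facebook/rocksdb). RocksDB adds many optimizations on top of LevelDB and can make full use of the IOPS of SSDs and the computing power of multi-core servers.
* Consistency protocol: [PacificA](https://www.microsoft.com/en-us/research/publication/pacifica-replication-in-log-based-distributed-storage-systems/). Compared to [Raft](https://raft.github.io/), the PacificA protocol has its own characteristics and advantages.
* Fault detection: Unlike HBase, Pegasus does not use Zookeeper for fault detection. Instead, it implements a lease-based fault detection mechanism between MetaServer and ReplicaServer.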
@@ -27,15 +27,15 @@ The original purpose of the Pegasus system was to compensate for the shortcomings of HBase; here, from the user's
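* Data model: HBase uses a tabular model with range sharding; Pegasus uses a Key-Value model with hash sharding.
* Interface: HBase's API is feature-rich but more complex to use; Pegasus's interfaces are simple and more user-friendly.
* Reliability: For architectural and implementation reasons (such as Pegasus's use of local storage, its fault detection, and its implementation in C++), the reliability of Pegasus is usually better than that of HBase.
* Performance: Because of its layered architecture, HBase's read and write performance is not very good; P99 is usually at the level of tens or even hundreds of milliseconds, and GC issues can cause latency spikes. Pegasus's P99 can be a few milliseconds, meeting the needs of low-latency online applications.
* Performance: Because of its layered architecture, HBase's read and write performance is not very good; P99 latency is usually in the tens or even hundreds of milliseconds, and GC issues can cause latency spikes. Pegasus's P99 can be a few milliseconds, meeting the needs of latency-sensitive online applications.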

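# Comparison with Redis

Compared purely on performance, Redis is clearly superior to Pegasus. But when performance, availability, scalability, cost, and other factors are weighed together, Pegasus has advantages of its own.
Compared only on read/write latency and single-machine throughput, Redis is clearly superior to Pegasus. But when read/write latency, availability, scalability, cost, and other factors are weighed together, Pegasus has advantages of its own.
The main differences compared to Redis are as follows:
* Data model: Both are Key-Value models, but Pegasus supports two-level keys (HashKey + SortKey).
* Interface: Redis has richer interfaces and supports container features such as List, Set, and Map; Pegasus's interfaces are relatively simple, with a narrower feature set.
* Performance: Redis performs better than Pegasus.
* Read/write latency: Redis performs better than Pegasus.
* Scalability: Pegasus scales better: machine nodes can be added or removed easily, and automatic load balancing is supported. Redis's distributed solution is cumbersome when adding or removing machines.
* Reliability: Pegasus data is always persisted, and the system architecture ensures high data integrity. Redis takes a long time to recover after a machine goes down, its availability is not good enough, and it may also lose the most recent data.
* Cost: Pegasus stores the full data set on SSD, while Redis must keep the full data set in memory, so Pegasus costs less.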