[RIP-68] RocketMQ ACL 2.0 #7560

Closed
dingshuangxi888 opened this issue Nov 15, 2023 · 10 comments · Fixed by #7725

Comments

@dingshuangxi888
Contributor

dingshuangxi888 commented Nov 15, 2023

Status

Background & Motivation

What do we need to do

  • Will we add a new module? --No.
  • Will we add new APIs? --Yes.
  • Will we add a new feature? --Yes.

Why should we do that

Are there any problems with our current project?

The current RocketMQ ACL design has the following limitations:

  1. Non-standard IP whitelist control: An IP whitelist normally restricts users to access from specific IPs or IP ranges. In the current RocketMQ ACL, however, the IP whitelist is used to bypass authentication, which is inconsistent with common practice. This inconsistency can cause security issues because it lets untrusted IP addresses bypass access control.
  2. Lack of an extensible interface definition: ACL configuration is currently stored in a YAML file. In some cloud scenarios, users want to back the ACL configuration with a database or a microservice, which the current design does not make convenient. This limits the flexibility and extensibility of ACL configuration and makes it unsuitable for some application scenarios. In addition, access control for the management (admin) interfaces is insufficient, which creates a risk of cluster data leakage.
  3. Ineffective separation of users and permissions: Best practice is to separate the logic for creating users from the logic for setting permissions, with clearly defined responsibilities for each. In the current design, users and permissions are coupled in a single file, which can leak user passwords and lead to security issues.

What can we benefit from proposed changes?

  1. Standardized IP whitelist control: The new ACL design provides a more standard IP whitelist mechanism. It restricts user requests to specific source IPs and blocks access from untrusted IP addresses.
  2. Extensible ACL configuration and authorization logic: The new design makes ACL-related logic easy to extend and implement, so users can customize ACL configuration to meet their own requirements. It also adds access control for the management (admin) interfaces, which improves the overall security of the system.
  3. Effective separation of users and permissions: The new design separates user authentication from permission management, with explicit responsibilities and boundaries for each, which improves security. User passwords are also stored encrypted, which reduces the risk of password leaks.

Goals

  • What problem is this proposal designed to solve?

To address the issues above, the main goals of RocketMQ ACL 2.0 are as follows:

  1. Provide a reasonably standard and general model that supports user authorization for the different operations on all resources in RocketMQ. This keeps permission control consistent and complete across the whole RocketMQ system.
  2. Provide a set of standard interface abstractions for user management, permission management, and authorization logic, so that they are easy to extend and implement. Users can then customize ACL configuration for their own needs and integrate with other systems.
  3. Separate users from permissions and support encryption of user passwords, to avoid security issues caused by password leakage.
  4. Provide a degree of compatibility with ACL 1.0 so that users can upgrade from ACL 1.0 to 2.0 with relatively little effort, while preserving their existing ACL configuration and permission settings.
  • To what degree should we solve the problem?
  1. Introduce a new user authentication and permission verification system (2.0) to replace the existing ACL 1.0.
  2. The authorization scope covers message sending, message consumption, and all admin-related interfaces.
  3. Retain the ACL 1.0 authorization logic and configuration, with version switching controlled through configuration, so that users can migrate from 1.0 to 2.0.

Non-Goals

  • What problem is this proposal NOT designed to solve?
  1. Permission control inside the RocketMQ cluster. To keep the cluster itself secure, we plan to introduce permission control for scenarios such as Broker registration with the NameServer and primary-backup replication between Brokers, in order to prevent data leaks and similar security issues. This will be supported in a later iteration.
    1. During Broker registration with the NameServer, access to the relevant interfaces will be controlled so that only authorized Brokers can register. This prevents unauthorized Brokers from joining the cluster.
    2. During primary-backup replication between Brokers, the relevant interfaces will also be subject to permission control. Only authorized backup Brokers will be able to replicate data from the primary Broker, so that replication stays within the authorized scope and unauthorized replication cannot leak data.
  2. Permission control in the console. To improve the user management experience, the RocketMQ open-source console plans to add permission control for fine-grained resource management. This has two parts:
    1. Console login authentication: When logging in to the console, users provide a username and password for identity verification. Because the Broker now enforces permission control on management-related APIs, the console must also carry user information when it accesses the Broker, so that only authorized users can access resources. In this iteration, we provide an interim solution: a username and password with full access permissions can be added to the console's configuration and used to access the Broker. Management of console login permissions will be improved in later versions.
    2. Console user and permission management: To make configuring users and permissions friendlier and more convenient, we plan to provide a visual interface for managing them instead of relying on command-line operations. This will be implemented in a later iteration.

Changes

Concepts

RocketMQ ACL 2.0 primarily adopts the Attribute-Based Access Control (ABAC) model for its permission system. The main concepts are as follows:

Term | Description
--- | ---
Principal | The entity that attempts to access a resource. It can be a user, an application, a service, or another entity.
Resource | An object that is subject to access control, such as a cluster, topic, or group.
Action | The operation that a principal attempts to perform, such as read, write, delete, or execute. Different actions may require different permissions and policies.
Environment | The context in which an access request occurs, such as time, location, or network status.
Attributes | Characteristics of, or information about, principals, resources, and the environment, used to describe the various aspects of an entity.
Policy | An authorization policy defines the rules and conditions that determine whether a principal may perform specific actions on a resource.
Decision | The outcome of an access request. Based on the defined policies and attribute information, the system decides whether to allow access. The result is usually Grant or Deny.
Evaluator | The component that executes ABAC rules and policies. It makes access control decisions based on the attributes of the principal, the attributes of the resource, and the requested action. It typically covers both permission validation and decision making.

Architecture

Original

image
In the original RocketMQ architecture, the Broker stores users and permissions and implements the authentication and authorization logic. Every request, whether it sends or receives messages, manages the cluster, or queries data, must pass through the ACL authentication and authorization logic, which ensures that all requests are legitimate.

With Proxy

image
In the RocketMQ architecture with Proxy, message sending and consumption are proxied by the Proxy component, so authentication and authorization for those requests are performed in the Proxy. Cluster management interfaces are still handled by the Broker and are therefore authenticated and authorized by the Broker. User and permission information is still stored on the Broker, because the Broker is the only stateful component. The Broker exposes interfaces that the Proxy queries in order to run its own authentication and authorization logic.

Domain Model

image

Data Types

Principal

Field | Description
--- | ---
User | User

Resource

Field | Description
--- | ---
Cluster | Cluster
Namespace | Namespace
Topic | Topic
Group | Consumer group

Action

Field | Description
--- | ---
PUB | Produce messages
SUB | Consume messages
Create | Create a resource
Update | Update a resource
Delete | Delete a resource
Get | Get the details of a resource
List | List resources in batch
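
As a rough illustration of how the data types above might look in code, here is a minimal Java sketch. All type names (`ResourceType`, `Action`, `ResourceRef`) are hypothetical and are not taken from the RocketMQ code base. The `Type:name` string form is assumed from the `"Topic:topic-*"` entries in the file storage example later in this proposal.

```java
import java.util.Locale;

/** Hypothetical resource types, mirroring the Resource table above. */
enum ResourceType { CLUSTER, NAMESPACE, TOPIC, GROUP }

/** Hypothetical action set, mirroring the Action table above. */
enum Action { PUB, SUB, CREATE, UPDATE, DELETE, GET, LIST }

/** A typed resource reference parsed from a "Type:name" string (assumed format). */
final class ResourceRef {
    final ResourceType type;
    final String name;

    ResourceRef(ResourceType type, String name) {
        this.type = type;
        this.name = name;
    }

    /** Parses e.g. "Topic:topic-a" into (TOPIC, "topic-a"). */
    static ResourceRef parse(String text) {
        int idx = text.indexOf(':');
        if (idx <= 0 || idx == text.length() - 1) {
            throw new IllegalArgumentException("expected Type:name, got " + text);
        }
        // valueOf throws IllegalArgumentException for an unknown type name.
        ResourceType type = ResourceType.valueOf(text.substring(0, idx).toUpperCase(Locale.ROOT));
        return new ResourceRef(type, text.substring(idx + 1));
    }
}
```

For example, `ResourceRef.parse("Group:group-a")` yields a reference whose `type` is `GROUP` and whose `name` is `group-a`.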

Interface Design/Change

  1. First, all APIs related to RocketMQ ACL 1.0 will be removed.
  2. Then, the APIs for RocketMQ ACL 2.0 will be added.

The following are the API definitions for RocketMQ ACL 2.0:

User management

Interface | Name | Description
--- | --- | ---
CreateUser | Create user |
DeleteUser | Delete user | Deleting a user also deletes that user's permissions.
UpdateUser | Update user | Primarily used to update the user's password.
DescribeUser | Describe user details | This interface returns the user's password.
ListUser | List users in batch | This interface does not return user passwords.

Permission management

Interface | Name | Description
--- | --- | ---
CreateAcl | Create ACL |
DeleteAcl | Delete ACL | Can delete a single permission policy under an account, or the account's entire permission policy.
UpdateAcl | Update ACL | Can update a single permission policy under an account, or the account's entire permission policy.
DescribeAcl | Describe ACL details |
ListAcl | List ACLs in batch |

Implementation

Mapping of API, Resources, and Actions

To make it easy to obtain the action and resource information for each API request during authorization, the following changes are needed:

Remoting API

There are many Remoting interfaces, and each one names its resources differently. To make resource definitions easy to obtain, each API's RequestHeader can be annotated to mark which fields are resources and which field holds the resource ID. The relevant information can then be retrieved more easily during authorization. The details are as follows:
image
Because the message sending and receiving interfaces are called frequently and have strict performance requirements, they keep a hard-coded implementation rather than the annotation-based one.
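
The annotation approach described above can be sketched as follows. This is a minimal illustration only: the annotation `ResourceField`, the header class `DeleteTopicHeader`, and the helper `ResourceExtractor` are all hypothetical names invented here, not the annotations or classes used in RocketMQ. It also shows why the hot path stays hard-coded, since this version uses reflection on every call.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker: the annotated header field holds a resource name. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ResourceField {
    String type(); // e.g. "Topic" or "Group"
}

/** Hypothetical request header for a "delete topic" style admin call. */
class DeleteTopicHeader {
    @ResourceField(type = "Topic")
    String topic;

    String operator; // not a resource, so not annotated
}

final class ResourceExtractor {
    /** Collects "Type:name" strings from every annotated, non-null field. */
    static List<String> extract(Object header) {
        List<String> resources = new ArrayList<>();
        for (Field field : header.getClass().getDeclaredFields()) {
            ResourceField marker = field.getAnnotation(ResourceField.class);
            if (marker == null) {
                continue; // plain field, carries no resource information
            }
            try {
                field.setAccessible(true);
                Object value = field.get(header);
                if (value != null) {
                    resources.add(marker.type() + ":" + value);
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read " + field.getName(), e);
            }
        }
        return resources;
    }
}
```

With `topic` set to `orders`, `ResourceExtractor.extract(header)` returns a single entry, `Topic:orders`. A production version would likely cache the annotated fields per header class instead of scanning them on each request.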

gRPC API

For gRPC interfaces, the protocol is defined in and generated from proto files, so abstract classes and annotations are hard to apply. These interfaces are currently mostly of the PUB and SUB types, so they can be hard-coded for now. For reference, see:
org.apache.rocketmq.acl.plain.PlainAccessResource#parse(com.google.protobuf.GeneratedMessageV3, org.apache.rocketmq.acl.common.AuthenticationHeader)

Storage model

ACL data is stored locally on the Broker. There are two main options: file storage and RocksDB-based storage. This proposal uses RocksDB; file storage is described only as an example for reference.

File storage

User file storage
[{
  "username": "rocketmq",
  "password": "xxxxxx"
}]
Permissions file storage
[{
  "principal": "User:rocketmq",
  "policies":[{
    "policyId": 1,
    "resources":["Topic:topic-*","Group:group-*"],
    "actions":["PUB","SUB"],
    "environment":{
      "sourceIps": ["192.168.0.0/24"]
    },
    "decision": "Grant"
  }]
}]

RocksDB-based storage

In RocksDB, the data is stored in much the same way as in file storage, but with the principal as the primary key. Compared with file storage, RocksDB may be simpler to manage.
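
To make the "principal as primary key" layout concrete, here is a minimal Java sketch. It deliberately does not use the RocksDB API: `PolicyStore` and `InMemoryPolicyStore` are hypothetical names, and a sorted in-memory map stands in for the key-value store. The value is assumed to be the serialized policy list for that principal, as in the file storage example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical storage abstraction; a RocksDB-backed class could implement the same methods. */
interface PolicyStore {
    void put(String principal, String policiesJson);

    /** Returns null when the principal has no stored policies. */
    String get(String principal);

    void delete(String principal);
}

/** In-memory stand-in: one row per principal, sorted by key. */
final class InMemoryPolicyStore implements PolicyStore {
    private final Map<String, String> rows = new TreeMap<>();

    @Override
    public void put(String principal, String policiesJson) {
        rows.put(principal, policiesJson); // overwrites the whole policy list for the principal
    }

    @Override
    public String get(String principal) {
        return rows.get(principal);
    }

    @Override
    public void delete(String principal) {
        rows.remove(principal);
    }
}
```

Keying by principal means that looking up or removing all policies of one user, for example when that user is deleted, is a single-key operation.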

ACL evaluation

image
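
The evaluation diagram is not reproduced in this text, so the following Java sketch is only one plausible reading of how a policy entry like the one in the file storage example could be evaluated. The class names are hypothetical, and three rules are assumptions made for illustration: a trailing `*` in a resource is treated as a prefix wildcard, an empty `sourceIps` list means any source, and a matching Deny takes precedence over a matching Grant, with deny as the default when nothing matches. Only IPv4 is handled.

```java
import java.util.List;

/** One policy entry, shaped like an element of "policies" in the file storage example. */
final class PolicyEntry {
    final List<String> resources; // e.g. "Topic:topic-*"
    final List<String> actions;   // e.g. "PUB"
    final List<String> sourceIps; // IPv4 addresses or CIDR blocks; empty means any source (assumed)
    final boolean grant;          // true for "Grant", false for "Deny"

    PolicyEntry(List<String> resources, List<String> actions, List<String> sourceIps, boolean grant) {
        this.resources = resources;
        this.actions = actions;
        this.sourceIps = sourceIps;
        this.grant = grant;
    }
}

final class AclEvaluator {
    /** Assumed semantics: no matching entry means deny; any matching Deny wins over Grant. */
    static boolean isAllowed(List<PolicyEntry> policies, String resource, String action, String sourceIp) {
        boolean granted = false;
        for (PolicyEntry p : policies) {
            if (!p.actions.contains(action)) continue;
            if (!matchesAny(p.resources, resource)) continue;
            if (!p.sourceIps.isEmpty() && !inAnyCidr(p.sourceIps, sourceIp)) continue;
            if (!p.grant) return false;
            granted = true;
        }
        return granted;
    }

    /** Assumed pattern rule: a trailing '*' is a prefix wildcard, otherwise exact match. */
    static boolean matchesAny(List<String> patterns, String resource) {
        for (String pattern : patterns) {
            boolean hit = pattern.endsWith("*")
                    ? resource.startsWith(pattern.substring(0, pattern.length() - 1))
                    : resource.equals(pattern);
            if (hit) return true;
        }
        return false;
    }

    /** Returns true if the IPv4 address falls inside any of the given blocks. */
    static boolean inAnyCidr(List<String> cidrs, String ip) {
        for (String cidr : cidrs) {
            String[] parts = cidr.split("/");
            int bits = parts.length == 2 ? Integer.parseInt(parts[1]) : 32;
            int mask = bits == 0 ? 0 : -1 << (32 - bits);
            if ((toInt(parts[0]) & mask) == (toInt(ip) & mask)) return true;
        }
        return false;
    }

    /** Packs a dotted-quad IPv4 address into a 32-bit int. */
    static int toInt(String ipv4) {
        int value = 0;
        for (String octet : ipv4.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }
}
```

With the single Grant entry from the file storage example, a `PUB` on `Topic:topic-a` from `192.168.0.17` is allowed, while the same request from `10.0.0.1`, or a `PUB` on `Topic:other`, is denied.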

Compatibility, Deprecation, and Migration Plan

Are backward and forward compatibility taken into consideration?

Yes. To make migration easier for users, some compatibility handling is required:

  1. Add a version configuration item, such as rocketmq.acl.version. The default value is 2.0, but users can set it to 1.0 to read the existing files and run the original authorization logic.
  2. Keep the existing ACL 1.0 authorization logic separate from the new 2.0 logic, so that later code changes to one do not affect the other.
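
The version switch could be wired up roughly as in the Java sketch below. The property name `rocketmq.acl.version` and its default of 2.0 come from the proposal text above; the configuration keys in the merged implementation may differ. `AccessChecker` and `AccessCheckerFactory` are hypothetical names, and the two lambdas are placeholders for the real 1.0 and 2.0 code paths.

```java
import java.util.Properties;

/** Hypothetical common entry point shared by both ACL generations. */
interface AccessChecker {
    String name();
}

final class AccessCheckerFactory {
    /** Reads rocketmq.acl.version (default "2.0", per the proposal) and selects one code path. */
    static AccessChecker fromConfig(Properties brokerConfig) {
        String version = brokerConfig.getProperty("rocketmq.acl.version", "2.0").trim();
        switch (version) {
            case "1.0":
                return () -> "acl-1.0"; // placeholder for the unchanged legacy logic
            case "2.0":
                return () -> "acl-2.0"; // placeholder for the new logic
            default:
                throw new IllegalArgumentException("unsupported ACL version: " + version);
        }
    }
}
```

Selecting the implementation once at startup keeps the two code paths fully separate, which matches the second compatibility point above.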

Are there deprecated APIs?

Yes, the APIs related to ACL 1.0 will be removed. To change ACL 1.0 permissions, edit the ACL file directly.

How do we do migration?

  1. Add rocketmq.acl.version=1.0 to the Broker's configuration file, then upgrade to a RocketMQ version that supports ACL 2.0.
  2. Use the Admin tool to migrate the ACL 1.0 permission configuration to ACL 2.0, and verify that the data is complete. Note that the global IP whitelist is no longer supported. Confirm whether you still need it, and if so, implement it by other means.
  3. Change rocketmq.acl.version to 2.0 in the Broker configuration, restart the Broker, and confirm that authentication and authorization behave correctly.

Implementation Outline

The feature is currently under development. I will complete it as soon as possible and submit it to the community.

lizhimins changed the title from "RIP 67 RocketMQ ACL 2.0" to "[RIP-67] RocketMQ ACL 2.0" on Nov 15, 2023
@BeiQiaoT

Wow~ looking forward to it.

dingshuangxi888 changed the title from "[RIP-67] RocketMQ ACL 2.0" to "[RIP-68] RocketMQ ACL 2.0" on Nov 17, 2023
@echooymxq
Contributor

Regarding the ACL storage component, I have a small question, especially as the Controller and DLedger modes are increasingly becoming the top choices. Why not place the ACL information in the Namesrv? This could simplify Broker scaling, and the Proxy would get the metadata (including ACL) from the Namesrv.

@dingshuangxi888
Contributor Author

@echooymxq
First, the NameServer is a stateless application that can be scaled up or down dynamically, so it does not take on responsibility for persistent data storage.
Second, at the architectural level, the NameServer is not responsible for metadata management. Its role is service registration and route discovery.
Finally, the Broker is a stateful application, so it has both the responsibility and the capability to store metadata at the storage layer.

@echooymxq
Contributor

@dingshuangxi888 The namesrv is not actually stateless. Controller mode is one example, though maybe that is a special case. There is also kvConfig.json, which holds things like orderConf, and rocketmq-mqtt relies on it as well.

@yp969803
Contributor

Interesting, looking forward to contributing!

@yp969803
Contributor

@dingshuangxi888 Can you assign something to me so I can contribute to this feature?

@Cloud-Yao

When will this feature be released?

lizhimins pushed a commit that referenced this issue Mar 18, 2024
[ISSUE #7560] [RIP-68] Support RocketMQ ACL 2.0 (#7725)
@zhangjidi2016
Contributor

If you use the proxy in cluster mode and enable authentication, is each request authenticated twice (by the proxy and by the broker)? @dingshuangxi888

@dingshuangxi888
Contributor Author

> If you use the proxy in cluster mode and enable authentication, is each request authenticated twice (by the proxy and by the broker)? @dingshuangxi888

The configuration for the proxy and the broker is different; only the proxy will be enabled if it's in local mode.

@frinda
Contributor

frinda commented Jul 25, 2024

Are there any plans to backport this feature to RocketMQ 4.x?
