BatchRegister service might cause distro sync handle exception and data delay after timeout. #11701

Closed
KomachiSion opened this issue Jan 26, 2024 · 2 comments
Labels
area/Naming kind/bug Category issues or prs related to bug.

Comments

@KomachiSion
Collaborator

Describe the bug
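Dubbo 3's multi-port, multi-protocol support uses the Nacos batch registration API. Dubbo 3 first registers an instance for the plain Triple protocol in the usual way and then performs a batch registration.
Because both registrations go over the same connection, the batch registration overwrites the earlier one.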

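On the nacos-server node that holds the connection, overwriting the data works fine. Once this data is synced to the other nodes, however, the forced cast in the Distro handler's data processing causes the handler to fail while processing the synced data.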

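Because the data is now inconsistent, Distro's verify check keeps failing. Until the data expires, a sync is triggered again and again and fails every time. Once the data expires (3 minutes by default) and is removed, the next verify fetches the full data set again; this time processing succeeds, and the data eventually becomes consistent again.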

 // DistroClientDataProcessor

    private static void processBatchInstanceDistroData(Set<Service> syncedService, Client client,
            ClientSyncData clientSyncData) {
        BatchInstanceData batchInstanceData = clientSyncData.getBatchInstanceData();
        if (batchInstanceData == null || CollectionUtils.isEmpty(batchInstanceData.getNamespaces())) {
            Loggers.DISTRO.info("[processBatchInstanceDistroData] BatchInstanceData is null , clientId is :{}",
                    client.getClientId());
            return;
        }
        List<String> namespaces = batchInstanceData.getNamespaces();
        List<String> groupNames = batchInstanceData.getGroupNames();
        List<String> serviceNames = batchInstanceData.getServiceNames();
        List<BatchInstancePublishInfo> batchInstancePublishInfos = batchInstanceData.getBatchInstancePublishInfos();
        
        for (int i = 0; i < namespaces.size(); i++) {
            Service service = Service.newService(namespaces.get(i), groupNames.get(i), serviceNames.get(i));
            Service singleton = ServiceManager.getInstance().getSingleton(service);
            syncedService.add(singleton);
            BatchInstancePublishInfo batchInstancePublishInfo = batchInstancePublishInfos.get(i);
            // The existing entry may be a plain InstancePublishInfo, in which case this cast fails.
            BatchInstancePublishInfo targetInstanceInfo = (BatchInstancePublishInfo) client
                    .getInstancePublishInfo(singleton);
            boolean result = false;
            if (targetInstanceInfo != null) {
                result = batchInstancePublishInfo.equals(targetInstanceInfo);
            }
            if (!result) {
                client.addServiceInstance(singleton, batchInstancePublishInfo);
                NotifyCenter.publishEvent(
                        new ClientOperationEvent.ClientRegisterServiceEvent(singleton, client.getClientId()));
            }
        }
    }
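
The failing cast can be reproduced in isolation. The sketch below is illustrative only: `InstancePublishInfo` and `BatchInstancePublishInfo` here are empty, hypothetical stand-ins that mirror just the inheritance relationship of the Nacos classes, and `CastFailureDemo` and the map are made up for the example. It shows that the downcast is harmless for a missing or batch entry but throws when an ordinary registration already exists for the service.

```java
import java.util.Map;

// Hypothetical stand-ins for the Nacos types; only the subclass relationship matters here.
class InstancePublishInfo { }

class BatchInstancePublishInfo extends InstancePublishInfo { }

class CastFailureDemo {
    // Mimics the unchecked downcast performed in processBatchInstanceDistroData.
    static String lookup(Map<String, InstancePublishInfo> published, String service) {
        try {
            BatchInstancePublishInfo info = (BatchInstancePublishInfo) published.get(service);
            // Casting null is legal, so a service with no entry does not throw.
            return info == null ? "absent" : "batch";
        } catch (ClassCastException e) {
            // Reached when the stored entry is a plain InstancePublishInfo.
            return "ClassCastException";
        }
    }
}
```

In the real handler the exception is not caught at this point, so processing of the synced data for that client fails as a whole, which matches the repeated sync failures described above.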

Expected behavior
Barring network or other low-level problems, changes made through batchRegister should converge to a consistent state quickly.

Actual behavior
Consistency is restored through the verify mechanism only after the data has expired.

How to Reproduce
Steps to reproduce the behavior:

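  1. Register any service, then immediately use batchRegister to register a batch of instances for the same service.
  2. Check naming-distro.log on the cluster nodes for errors.
  3. If there are none, repeat several times.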

Additional context
The design probably did not anticipate this usage, so a plain cast was used here. Logically, though, this usage should be supported.
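
One way to support this usage is to check the runtime type before comparing, and to treat a missing or non-batch entry as "needs overwrite". The following is a minimal sketch under stated assumptions, not necessarily the fix adopted in Nacos: the two info classes are simplified, hypothetical stand-ins (the real ones carry more fields), and `BatchSyncGuard.needsOverwrite` is a made-up helper name.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for a single-instance registration.
class InstancePublishInfo {
    final String address;

    InstancePublishInfo(String address) {
        this.address = address;
    }
}

// Hypothetical, simplified stand-in for a batch registration.
class BatchInstancePublishInfo extends InstancePublishInfo {
    final List<String> addresses;

    BatchInstancePublishInfo(List<String> addresses) {
        super(null);
        this.addresses = addresses;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BatchInstancePublishInfo
                && Objects.equals(addresses, ((BatchInstancePublishInfo) o).addresses);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(addresses);
    }
}

class BatchSyncGuard {
    // Returns true when the synced batch data should replace the existing entry.
    static boolean needsOverwrite(InstancePublishInfo existing, BatchInstancePublishInfo incoming) {
        // instanceof is false for null and for a plain InstancePublishInfo,
        // so neither case reaches a cast and both are simply overwritten.
        if (!(existing instanceof BatchInstancePublishInfo)) {
            return true;
        }
        return !incoming.equals(existing);
    }
}
```

With a guard of this shape, the handler would call `client.addServiceInstance(...)` and publish the register event whenever `needsOverwrite` returns true, so a prior single-instance registration is replaced instead of aborting the sync.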

@KomachiSion KomachiSion added area/Naming kind/bug Category issues or prs related to bug. labels Jan 26, 2024
@KomachiSion KomachiSion added this to the 2.3.1 milestone Jan 26, 2024
@Daydreamer-ia
Contributor

I will resolve it.

@Jalyn-X

Jalyn-X commented Sep 26, 2024

Which two server-side parameters control the data sync merge time and the data expiration time?
