Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AbortPolicyWithReport 高并发情况下 dump 多次的问题 #13042

Closed
1 task done
giraffe-tree opened this issue Sep 11, 2023 · 0 comments · Fixed by #13043
Closed
1 task done

AbortPolicyWithReport 高并发情况下 dump 多次的问题 #13042

giraffe-tree opened this issue Sep 11, 2023 · 0 comments · Fixed by #13043
Labels
type/bug Bugs to being fixed

Comments

@giraffe-tree
Copy link
Contributor

giraffe-tree commented Sep 11, 2023

  • I have searched the issues of this repository and believe that this is not a duplicate.

Environment

  • Dubbo version: 2.7.x, 3.x
  • Operating System version: ALL
  • Java version: ALL

Steps to reproduce this issue

高并发请求时, 偶发复现

Expected Behavior

间隔指定时间, 比如 10 分钟, 再进行 threaddump

Actual Behavior

下面是服务器上的日志记录, 可以看到 dubbo 线程池耗尽时的 threaddump 并没有按照预期的间隔 10分钟进行

-rw-rw-r-- 1 root root 695929  Sep 1  19:30 Dubbo_JStack.log.2023-09-01_19:30:39
-rw-rw-r-- 1 root root 1068779 Sep 3  09:00 Dubbo_JStack.log.2023-09-03_09:00:35
-rw-rw-r-- 1 root root 2326331 Sep 3  23:53 Dubbo_JStack.log.2023-09-03_23:53:21
-rw-rw-r-- 1 root root 1081798 Sep 5  09:00 Dubbo_JStack.log.2023-09-05_09:00:35
-rw-rw-r-- 1 root root 1652182 Sep 5  09:00 Dubbo_JStack.log.2023-09-05_09:00:38
-rw-rw-r-- 1 root root 956693  Sep 9  19:30 Dubbo_JStack.log.2023-09-09_19:30:34
-rw-rw-r-- 1 root root 1515179 Sep 9  19:30 Dubbo_JStack.log.2023-09-09_19:30:37
-rw-rw-r-- 1 root root 981556  Sep 10 09:00 Dubbo_JStack.log.2023-09-10_09:00:38
-rw-rw-r-- 1 root root 1443054 Sep 10 09:00 Dubbo_JStack.log.2023-09-10_09:00:42
-rw-rw-r-- 1 root root 999105  Sep 11 09:00 Dubbo_JStack.log.2023-09-11_09:00:38
-rw-rw-r-- 1 root root 1500287 Sep 11 09:00 Dubbo_JStack.log.2023-09-11_09:00:43

原因分析

按照以下顺序获取锁就可能导致前面的现象, 线程切换 + 错误的顺序设置 lastPrintTime 共同导致了这个问题

thread1:  1.  检查10分钟间隔通过  3. 获取锁  4. threaddump(stw) 5. 释放锁    7. 设置 lastPrintTime
thread2:  2. 检查10分钟间隔通过                                    6. 获取锁     8. threaddump(stw)  9. 释放锁

源码如下 https://github.com/giraffe-tree/dubbo/blob/3.2/dubbo-common/src/main/java/org/apache/dubbo/common/threadpool/support/AbortPolicyWithReport.java

        //dump every 10 minutes
        if (now - lastPrintTime < TEN_MINUTES_MILLS) {
            return;
        }

        if (!guard.tryAcquire()) {
            return;
        }

        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(() -> {
            //  ....代码省略
            try (FileOutputStream jStackStream = new FileOutputStream(
                new File(dumpPath, "Dubbo_JStack.log" + "." + dateStr))) {
                jstack(jStackStream);
            } catch (Exception t) {
                logger.error(COMMON_UNEXPECTED_CREATE_DUMP, "", "", "dump jStack error", t);
            } finally {
                guard.release();
            }
            lastPrintTime = System.currentTimeMillis();
        });
        //must shutdown thread pool ,if not will lead to OOM
        pool.shutdown();

修复方案

  1. 在获取锁后再次判断是否满足时间要求
  2. 在释放锁前设置 lastPrintTime
        if (now - lastPrintTime < TEN_MINUTES_MILLS) {
            return;
        }

        if (!guard.tryAcquire()) {
            return;
        }
       // 1. 此处再次判断时间
       if (System.currentTimeMillis() - lastPrintTime < TEN_MINUTES_MILLS) {
            return;
        }

        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(() -> {
            //  ....代码省略
            try (FileOutputStream jStackStream = new FileOutputStream(
                new File(dumpPath, "Dubbo_JStack.log" + "." + dateStr))) {
                jstack(jStackStream);
            } catch (Exception t) {
                logger.error(COMMON_UNEXPECTED_CREATE_DUMP, "", "", "dump jStack error", t);
            } finally {
                // 2. 在释放锁前设置 lastPrintTime
                lastPrintTime = System.currentTimeMillis();
                guard.release();
            }
            
        });
        //must shutdown thread pool ,if not will lead to OOM
        pool.shutdown();
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Bugs to being fixed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant