The web application [ROOT] appears to have started a thread named [com.alibaba.nacos.naming.*] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: #4124

Closed
RocsZH opened this issue Nov 3, 2020 · 21 comments
Labels
status/invalid This doesn't seem right

Comments

@RocsZH

RocsZH commented Nov 3, 2020

2020-11-03 16:59:46.088 [main] WARN o.a.c.loader.WebappClassLoaderBase [173] - The web application [ROOT] appears to have started a thread named [com.alibaba.nacos.naming.beat.sender] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)
2020-11-03 16:59:46.089 [main] WARN o.a.c.loader.WebappClassLoaderBase [173] - The web application [ROOT] appears to have started a thread named [com.alibaba.nacos.naming.failover] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)
2020-11-03 16:59:46.090 [main] WARN o.a.c.loader.WebappClassLoaderBase [173] - The web application [ROOT] appears to have started a thread named [com.alibaba.nacos.naming.push.receiver] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 java.net.PlainDatagramSocketImpl.receive0(Native Method)
 java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:143)
 java.net.DatagramSocket.receive(DatagramSocket.java:812)
 com.alibaba.nacos.client.naming.core.PushReceiver.run(PushReceiver.java:73)
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 java.util.concurrent.FutureTask.run(FutureTask.java:266)
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.run(Thread.java:745)

After these warnings, the Java process exits!

P.S. I set "-XX:+HeapDumpOnOutOfMemoryError", but no heap dump was produced.

@horizonzy
Collaborator

Which nacos-client version are you using? This problem is fixed in nacos-client 1.3.1 and later.

@RocsZH
Author

RocsZH commented Nov 4, 2020

which nacos-client version did you use. This problem is fixed after nacos-client 1.3.1.

I use spring-cloud-alibaba-nacos-discovery 0.9.0.RELEASE.
The nacos-client version is 1.0.0.
The Nacos server and console version is 1.1.4.

@RocsZH
Author

RocsZH commented Nov 4, 2020

which nacos-client version did you use. This problem is fixed after nacos-client 1.3.1.

Can we upgrade the client from 1.0.0 to 1.3.1+ while keeping the server and console at 1.1.4?

@KomachiSion
Collaborator

KomachiSion commented Nov 4, 2020

which nacos-client version did you use. This problem is fixed after nacos-client 1.3.1.

can we upgrade from 1.0.0 to 1.3.1+? and the service&console keeps 1.1.4.

Yes, it should be compatible. But you need to check whether your spring-cloud version is compatible with the new version of nacos-client.

@KomachiSion KomachiSion added the status/invalid This doesn't seem right label Nov 4, 2020
@RocsZH
Author

RocsZH commented Nov 4, 2020

compatible

Where can I find the compatibility information?

@RocsZH
Author

RocsZH commented Nov 4, 2020

which nacos-client version did you use. This problem is fixed after nacos-client 1.3.1.

image-20201104102950448

No effect!

@horizonzy
Collaborator

How do you reproduce it, bro?

@KomachiSion
Collaborator

Since 1.3.1, nacos-client provides a destroy interface for NamingService. You need to call it to stop the threads.

If you use spring-cloud-alibaba, please upgrade the version of spring-cloud-alibaba.

Also, this WARNING is not an OutOfMemoryError, so your "-XX:+HeapDumpOnOutOfMemoryError" option is never triggered.
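
To illustrate why an explicit stop call matters, here is a minimal sketch of a client that owns a background scheduler and only releases its thread when it is closed. `HeartbeatClient`, its thread name, and its methods are hypothetical stand-ins written for this example; this is not the Nacos implementation or its API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client that runs a periodic task on its own thread.
// Illustrative only: this is NOT the Nacos NamingService.
final class HeartbeatClient implements AutoCloseable {

    private final AtomicInteger beats = new AtomicInteger();

    // One named worker thread, comparable in spirit to "...naming.beat.sender".
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(task -> {
                Thread t = new Thread(task, "demo.beat.sender");
                t.setDaemon(true);
                return t;
            });

    HeartbeatClient() {
        // The worker thread is started here and keeps running until close().
        scheduler.scheduleWithFixedDelay(beats::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
    }

    int beatCount() {
        return beats.get();
    }

    boolean isStopped() {
        return scheduler.isTerminated();
    }

    // Without this call the worker thread stays parked in the scheduler queue,
    // which is the state shown in the stack traces of the warning.
    @Override
    public void close() throws InterruptedException {
        scheduler.shutdownNow();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        HeartbeatClient client = new HeartbeatClient();
        Thread.sleep(100);
        client.close();
        System.out.println("beats=" + client.beatCount() + ", stopped=" + client.isStopped());
    }
}
```

In a container, the owner of such a client would call `close()` from its own shutdown path so that no thread is left behind when the web application stops.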

@RocsZH
Author

RocsZH commented Nov 6, 2020

How do you reproduce it, bro.
It happens every time!

@KomachiSion
Collaborator

Can you try a newer version of spring-cloud-alibaba-nacos-discovery?

@RocsZH
Author

RocsZH commented Nov 6, 2020

Can you have a try for new version of spring-cloud-alibaba-nacos-discovery ?

Which version is suitable?

@KomachiSion
Collaborator

newest

@wencyGit

wencyGit commented Nov 24, 2020

It is probably a conflict between the spring-boot-starter-web dependency you pulled in and Nacos.
I also get several errors like "The web application [ROOT] appears to have started a thread named [com.alibaba.nacos.client.naming.updater] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:".
When I removed spring-boot-starter-web, the application started normally.

I am using:

            <dependency>
                <groupId>com.alibaba.cloud</groupId>
                <artifactId>spring-cloud-alibaba-dependencies</artifactId>
                <version>2.2.3.RELEASE</version>
            </dependency>

@KomachiSion KomachiSion added kind/enhancement Category issues or prs related to enhancement. kind/user experience not necessarily an error but can be improved for user experience contribution welcome and removed status/invalid This doesn't seem right labels Dec 8, 2020
@Moo920

Moo920 commented Jan 7, 2021

I ran into this problem because the property referenced by "@Value("${some.config}")" was missing from my YAML config.

@haoyann
Collaborator

haoyann commented Jan 7, 2021

I meet this problem because "@value("${some.config}")" is missing in config yaml

Which version are you using?

@horizonzy
Collaborator

I have tracked down the cause of this problem. Can you confirm whether your applications include the spring-boot-starter-actuator dependency?

@horizonzy
Collaborator

horizonzy commented Jan 10, 2021

I reproduced the problem in a local project.
The Maven dependencies include spring-boot-starter-actuator.

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>

        <dependency>
            <groupId>com.alibaba.cloud</groupId>
            <artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
            <version>2.2.3.RELEASE</version>
        </dependency>

Application configuration:

spring:
  application:
    name: nacos-producer
  cloud:
    nacos:
      discovery:
        server-addr: 127.0.0.1:8848

Main class:

@SpringBootApplication
@EnableDiscoveryClient
public class App {

    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }
}

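The local nacos-server is not started, so nothing is listening on port 8848. On startup, the project tries to register its instance with nacos-server. Because nacos-server is not running, a "connection refused" error occurs, Spring initialization fails, and the Tomcat container is shut down. The warning logs are printed while the Tomcat container is shutting down.

The Tomcat logic that prints the warning is in org.apache.catalina.loader.WebappClassLoaderBase#clearReferencesThreads. The concrete implementation of org.apache.catalina.loader.WebappClassLoaderBase here is org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader.

Below is part of the clearReferencesThreads method.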

 private void clearReferencesThreads() {
        Thread[] threads = getThreads();
        List<Thread> executorThreadsToStop = new ArrayList<>();

        // Iterate over the set of threads
        for (Thread thread : threads) {
            if (thread != null) {
                ClassLoader ccl = thread.getContextClassLoader();
                if (ccl == this) {
                    // Don't warn about this thread
                    if (thread == Thread.currentThread()) {
                        continue;
                    }

                    final String threadName = thread.getName();

                    // JVM controlled threads
                    ThreadGroup tg = thread.getThreadGroup();
                    if (tg != null && JVM_THREAD_GROUP_NAMES.contains(tg.getName())) {
                        // HttpClient keep-alive threads
                        if (clearReferencesHttpClientKeepAliveThread &&
                                threadName.equals("Keep-Alive-Timer")) {
                            thread.setContextClassLoader(parent);
                            log.debug(sm.getString("webappClassLoader.checkThreadsHttpClient"));
                        }

                        // Don't warn about remaining JVM controlled threads
                        continue;
                    }

                    // Skip threads that have already died
                    if (!thread.isAlive()) {
                        continue;
                    }

                    // TimerThread can be stopped safely so treat separately
                    // "java.util.TimerThread" in Sun/Oracle JDK
                    // "java.util.Timer$TimerImpl" in Apache Harmony and in IBM JDK
                    if (thread.getClass().getName().startsWith("java.util.Timer") &&
                            clearReferencesStopTimerThreads) {
                        clearReferencesStopTimerThread(thread);
                        continue;
                    }

                    if (isRequestThread(thread)) {
                        // Log the "still processing a request" warning
                        log.warn(sm.getString("webappClassLoader.stackTraceRequestThread",
                                getContextName(), threadName, getStackTrace(thread)));
                    } else {
                        // Log the "started a thread but has failed to stop it" warning
                        log.warn(sm.getString("webappClassLoader.stackTrace",
                                getContextName(), threadName, getStackTrace(thread)));
                    }
                    // ... (remainder of the method omitted)
                }
            }
        }
 }

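After a Spring Boot application fails to start, it shuts down the Tomcat container. While shutting down, Tomcat scans the threads and logs a warning for every thread that satisfies the following conditions.

1. The thread's contextClassLoader is the webapp class loader itself, that is, org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader.
2. The thread is not the current thread.
Once both conditions hold, Tomcat checks whether the thread's stack frames contain org.apache.catalina.connector.CoyoteAdapter. If they do, the thread is currently processing a request, and this warning is logged: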

The web application [{0}] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation. Stack trace of request processing thread:[{2}]

If the stack does not contain that class, this warning is logged:

The web application [{0}] appears to have started a thread named [{1}] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:{2}

This is why the warning is logged.
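
The two conditions can be reproduced outside Tomcat with a small sketch. It is a simplification under stated assumptions: it uses Thread.getAllStackTraces() to enumerate threads instead of Tomcat's own thread enumeration, it skips the JVM thread group, timer, and request-thread checks, and the class name `LeakedThreadScan` and the thread name are made up for this example.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Simplified imitation of the scan; not Tomcat's actual code.
final class LeakedThreadScan {

    // Names of live threads, other than the caller, whose context class loader is `loader`.
    static List<String> threadsBoundTo(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t == Thread.currentThread() || !t.isAlive()) {
                continue; // condition 2: never report the scanning thread itself
            }
            if (t.getContextClassLoader() == loader) {
                names.add(t.getName()); // condition 1: bound to the "webapp" loader
            }
        }
        return names;
    }

    // Starts one thread bound to a stand-in loader and returns what the scan finds.
    static List<String> demo() throws InterruptedException {
        ClassLoader webappLoader = new URLClassLoader(new URL[0]); // stand-in for the webapp loader
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            started.countDown();
            try {
                release.await(); // stay alive while the scan runs
            } catch (InterruptedException ignored) {
                // exit quietly
            }
        }, "demo.naming.updater");
        worker.setContextClassLoader(webappLoader);
        worker.setDaemon(true);
        worker.start();
        started.await();
        List<String> found = threadsBoundTo(webappLoader);
        release.countDown();
        worker.join();
        return found;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads that would be reported: " + demo());
    }
}
```

Only the context class loader and thread identity decide whether a thread is reported here; what the thread is doing does not matter.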

When a thread is created, its contextClassLoader is the same as the contextClassLoader of the thread that creates it.
The relevant part of the Thread constructor is:

        Thread parent = currentThread();
        if (security == null || isCCLOverridden(parent.getClass()))
            this.contextClassLoader = parent.getContextClassLoader();
        else
            this.contextClassLoader = parent.contextClassLoader;

So, in the check described earlier, any thread whose contextClassLoader is org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader triggers the warning.
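
The inheritance rule is easy to check in isolation. The sketch below is a standalone demonstration, assuming no security manager is installed; `CclInheritanceDemo` and the empty URLClassLoader are placeholders and have nothing to do with Tomcat's loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class CclInheritanceDemo {

    // Returns true if a thread created while `custom` is the caller's
    // context class loader sees `custom` as its own context class loader.
    static boolean childInheritsLoader() throws InterruptedException {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        ClassLoader custom = new URLClassLoader(new URL[0], original);
        ClassLoader[] seen = new ClassLoader[1];

        current.setContextClassLoader(custom);
        try {
            // The loader is copied from the creating thread in the constructor.
            Thread child = new Thread(
                    () -> seen[0] = Thread.currentThread().getContextClassLoader());
            child.start();
            child.join(); // join() makes the write to seen[0] visible here
        } finally {
            current.setContextClassLoader(original); // always restore
        }
        return seen[0] == custom;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("child inherited the custom loader: " + childInheritsLoader());
    }
}
```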

If the contextClassLoader of Thread.currentThread() is TomcatEmbeddedWebappClassLoader at the moment these threads are created,
the warning described above will occur.
This involves part of the initialization of org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedContext; see the method org.apache.catalina.core.StandardContext#startInternal. The relevant logic is:

  try {
            if (ok) {
                // Start our subordinate components, if any
                Loader loader = getLoader();
                if (loader instanceof Lifecycle) {
                    ((Lifecycle) loader).start();
                }

                // since the loader just started, the webapp classloader is now
                // created.
                setClassLoaderProperty("clearReferencesRmiTargets",
                        getClearReferencesRmiTargets());
                setClassLoaderProperty("clearReferencesStopThreads",
                        getClearReferencesStopThreads());
                setClassLoaderProperty("clearReferencesStopTimerThreads",
                        getClearReferencesStopTimerThreads());
                setClassLoaderProperty("clearReferencesHttpClientKeepAliveThread",
                        getClearReferencesHttpClientKeepAliveThread());
                setClassLoaderProperty("clearReferencesObjectStreamClassCaches",
                        getClearReferencesObjectStreamClassCaches());
                setClassLoaderProperty("clearReferencesObjectStreamClassCaches",
                        getClearReferencesObjectStreamClassCaches());
                setClassLoaderProperty("clearReferencesThreadLocals",
                        getClearReferencesThreadLocals());

                // By calling unbindThread and bindThread in a row, we setup the
                // current Thread CCL to be the webapp classloader
                unbindThread(oldCCL);
                oldCCL = bindThread();

                // Initialize logger again. Other components might have used it
                // too early, so it should be reset.
                logger = null;
                getLogger();

                Realm realm = getRealmInternal();
                if(null != realm) {
                    if (realm instanceof Lifecycle) {
                        ((Lifecycle) realm).start();
                    }

                    // Place the CredentialHandler into the ServletContext so
                    // applications can have access to it. Wrap it in a "safe"
                    // handler so application's can't modify it.
                    CredentialHandler safeHandler = new CredentialHandler() {
                        @Override
                        public boolean matches(String inputCredentials, String storedCredentials) {
                            return getRealmInternal().getCredentialHandler().matches(inputCredentials, storedCredentials);
                        }

                        @Override
                        public String mutate(String inputCredentials) {
                            return getRealmInternal().getCredentialHandler().mutate(inputCredentials);
                        }
                    };
                    context.setAttribute(Globals.CREDENTIAL_HANDLER, safeHandler);
                }

                // Notify our interested LifecycleListeners
                fireLifecycleEvent(Lifecycle.CONFIGURE_START_EVENT, null);

                // Start our child containers, if not already started
                for (Container child : findChildren()) {
                    if (!child.getState().isAvailable()) {
                        child.start();
                    }
                }

                // Start the Valves in our pipeline (including the basic),
                // if any
                if (pipeline instanceof Lifecycle) {
                    ((Lifecycle) pipeline).start();
                }

                // Acquire clustered manager
                Manager contextManager = null;
                Manager manager = getManager();
                if (manager == null) {
                    if (log.isDebugEnabled()) {
                        log.debug(sm.getString("standardContext.cluster.noManager",
                                Boolean.valueOf((getCluster() != null)),
                                Boolean.valueOf(distributable)));
                    }
                    if ((getCluster() != null) && distributable) {
                        try {
                            contextManager = getCluster().createManager(getName());
                        } catch (Exception ex) {
                            log.error(sm.getString("standardContext.cluster.managerError"), ex);
                            ok = false;
                        }
                    } else {
                        contextManager = new StandardManager();
                    }
                }

                // Configure default manager if none was specified
                if (contextManager != null) {
                    if (log.isDebugEnabled()) {
                        log.debug(sm.getString("standardContext.manager",
                                contextManager.getClass().getName()));
                    }
                    setManager(contextManager);
                }

                if (manager!=null && (getCluster() != null) && distributable) {
                    //let the cluster know that there is a context that is distributable
                    //and that it has its own manager
                    getCluster().registerManager(manager);
                }
            }

            if (!getConfigured()) {
                log.error(sm.getString("standardContext.configurationFail"));
                ok = false;
            }

            // We put the resources into the servlet context
            if (ok)
                getServletContext().setAttribute
                    (Globals.RESOURCES_ATTR, getResources());

            if (ok ) {
                if (getInstanceManager() == null) {
                    setInstanceManager(createInstanceManager());
                }
                getServletContext().setAttribute(
                        InstanceManager.class.getName(), getInstanceManager());
                InstanceManagerBindings.bind(getLoader().getClassLoader(), getInstanceManager());
            }

            // Create context attributes that will be required
            if (ok) {
                getServletContext().setAttribute(
                        JarScanner.class.getName(), getJarScanner());
            }

            // Set up the context init params
            mergeParameters();

            // Call ServletContainerInitializers
            for (Map.Entry<ServletContainerInitializer, Set<Class<?>>> entry :
                initializers.entrySet()) {
                try {
                    entry.getKey().onStartup(entry.getValue(),
                            getServletContext());
                } catch (ServletException e) {
                    log.error(sm.getString("standardContext.sciFail"), e);
                    ok = false;
                    break;
                }
            }

            // Configure and call application event listeners
            if (ok) {
                if (!listenerStart()) {
                    log.error(sm.getString("standardContext.listenerFail"));
                    ok = false;
                }
            }

            // Check constraints for uncovered HTTP methods
            // Needs to be after SCIs and listeners as they may programmatically
            // change constraints
            if (ok) {
                checkConstraintsForUncoveredMethods(findConstraints());
            }

            try {
                // Start manager
                Manager manager = getManager();
                if (manager instanceof Lifecycle) {
                    ((Lifecycle) manager).start();
                }
            } catch(Exception e) {
                log.error(sm.getString("standardContext.managerFail"), e);
                ok = false;
            }

            // Configure and call application filters
            if (ok) {
                if (!filterStart()) {
                    log.error(sm.getString("standardContext.filterFail"));
                    ok = false;
                }
            }

            // Load and initialize all "load on startup" servlets
            if (ok) {
                if (!loadOnStartup(findChildren())){
                    log.error(sm.getString("standardContext.servletFail"));
                    ok = false;
                }
            }

            // Start ContainerBackgroundProcessor thread
            super.threadStart();
        } finally {
            // Unbinding thread
            unbindThread(oldCCL);
        }

Here, bindThread() replaces the current thread's contextClassLoader, which is the AppClassLoader, with TomcatEmbeddedWebappClassLoader. Later, in the finally block, unbindThread(oldCCL) restores the AppClassLoader.
Between bindThread and unbindThread, the listeners and related components are started.
The key point is the "// Call ServletContainerInitializers" step, which looks up the beans of type org.springframework.boot.web.servlet.ServletContextInitializer. The class org.springframework.boot.actuate.endpoint.web.ServletEndpointRegistrar, which comes from spring-boot-starter-actuator, implements ServletContextInitializer, so it is also loaded at "// Call ServletContainerInitializers". Loading it in turn loads the endpoint implementations that ServletEndpointRegistrar needs. spring-cloud-alibaba (sca) contains the class com.alibaba.cloud.nacos.endpoint.NacosDiscoveryEndpointAutoConfiguration, which provides the Nacos endpoint implementation, and this initializes NacosNamingService. At that point the contextClassLoader has not yet been unbound, so every thread created during this initialization gets TomcatEmbeddedWebappClassLoader as its contextClassLoader.

Without spring-boot-starter-actuator, NacosNamingService is initialized after unbindThread, when the current thread's contextClassLoader has already been restored to AppClassLoader.

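Therefore, if an uncaught exception is later thrown during Spring initialization and Tomcat is destroyed, Tomcat checks whether each thread's contextClassLoader is TomcatEmbeddedWebappClassLoader. Without spring-boot-starter-actuator, the threads' contextClassLoader is AppClassLoader and no warning is logged. With it, the contextClassLoader is TomcatEmbeddedWebappClassLoader and the warning is logged.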
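
The effect of the bind/unbind window can be sketched with plain JDK classes. This is only a model of the timing, with assumptions: setContextClassLoader in a try/finally stands in for bindThread/unbindThread, an empty URLClassLoader stands in for TomcatEmbeddedWebappClassLoader, and the two executors stand in for a client initialized inside or after the window. `BindWindowDemo` is a made-up name.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BindWindowDemo {

    // Returns { workerStartedInsideKeepsWebappLoader, workerStartedAfterUsesAppLoader }.
    static boolean[] run() throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader appLoader = current.getContextClassLoader();
        ClassLoader webappLoader = new URLClassLoader(new URL[0], appLoader);
        Callable<ClassLoader> probe = () -> Thread.currentThread().getContextClassLoader();

        ExecutorService inside;
        current.setContextClassLoader(webappLoader); // models bindThread()
        try {
            inside = Executors.newSingleThreadExecutor();
            // Executors create workers lazily, so run one task to force the
            // worker thread to be created while the webapp loader is bound.
            inside.submit(() -> { }).get();
        } finally {
            current.setContextClassLoader(appLoader); // models unbindThread(oldCCL)
        }

        ExecutorService after = Executors.newSingleThreadExecutor();
        try {
            boolean insideKeepsWebapp = inside.submit(probe).get() == webappLoader;
            boolean afterUsesApp = after.submit(probe).get() == appLoader;
            return new boolean[] { insideKeepsWebapp, afterUsesApp };
        } finally {
            inside.shutdownNow();
            after.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = run();
        System.out.println("inside keeps webapp loader: " + r[0]
                + ", after uses app loader: " + r[1]);
    }
}
```

Restoring the caller's loader does not change threads that already exist, which is why a worker started inside the window still carries the webapp loader when Tomcat later scans the threads.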

@horizonzy
Collaborator

horizonzy commented Jan 10, 2021

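The root cause is that threads created while the org.springframework.boot.web.servlet.ServletContextInitializer beans are being invoked have TomcatEmbeddedWebappClassLoader as their contextClassLoader. If Spring initialization fails before unbindThread (which restores the contextClassLoader to AppClassLoader) and the Tomcat container is shut down, the warning is logged.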
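
The thread does not propose a code change for this. Purely as an illustration of the mechanism, and not as something Nacos or spring-cloud-alibaba is known to do, a thread factory that sets the context class loader explicitly makes its threads independent of whatever loader the creating thread happens to have. `PinnedLoaderThreadFactory` and the "demo.pinned" prefix are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: every thread gets a fixed context class loader
// instead of inheriting the one of the thread that triggers its creation.
final class PinnedLoaderThreadFactory implements ThreadFactory {

    private final ClassLoader pinned;
    private final String prefix;
    private final AtomicInteger seq = new AtomicInteger();

    PinnedLoaderThreadFactory(ClassLoader pinned, String prefix) {
        this.pinned = pinned;
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread t = new Thread(task, prefix + "-" + seq.incrementAndGet());
        t.setDaemon(true);
        t.setContextClassLoader(pinned); // override the inherited loader
        return t;
    }

    // Creates a worker while a different loader is bound to the caller and
    // reports whether the worker still sees the pinned loader.
    static boolean workerIgnoresCallerLoader() throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader appLoader = current.getContextClassLoader();
        ClassLoader webappLoader = new URLClassLoader(new URL[0], appLoader);
        Callable<ClassLoader> probe = () -> Thread.currentThread().getContextClassLoader();

        current.setContextClassLoader(webappLoader);
        ExecutorService pool = Executors.newSingleThreadExecutor(
                new PinnedLoaderThreadFactory(appLoader, "demo.pinned"));
        try {
            return pool.submit(probe).get() == appLoader;
        } finally {
            pool.shutdownNow();
            current.setContextClassLoader(appLoader);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("worker ignores caller's loader: " + workerIgnoresCallerLoader());
    }
}
```

This only changes which loader the threads report; the threads still have to be stopped by whoever created them.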

@horizonzy horizonzy added kind/research and removed contribution welcome kind/enhancement Category issues or prs related to enhancement. kind/user experience not necessarily an error but can be improved for user experience labels Jan 10, 2021
@horizonzy horizonzy added status/invalid This doesn't seem right and removed kind/research labels Jan 20, 2021
@horizonzy
Collaborator

This is not a problem, so I am closing it.

@zhangmingjia

So how exactly should this be solved? I tried removing actuator and upgrading nacos-api to 1.3.1, and neither worked.

@lookupman

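So how exactly should this be solved? I tried removing actuator and upgrading nacos-api to 1.3.1, and neither worked.

This is not a problem, so close it.

The problem is that what you reproduced is not the issue the others reported.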
