Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in Jetty using Logback #8354

Closed
alekkol opened this issue Jul 28, 2022 · 22 comments
Closed

Deadlock in Jetty using Logback #8354

alekkol opened this issue Jul 28, 2022 · 22 comments
Labels
Bug For general bugs on Jetty side

Comments

@alekkol
Copy link

alekkol commented Jul 28, 2022

Jetty version(s)
Jetty 10.0.11
Logback 1.2.11

Java version/vendor
openjdk version "17.0.3" 2022-04-19
OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7)
OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode)

OS type/version
Initially reproduced in GitHub Actions on the ubuntu-latest runner
Also reproduced locally on OSX 12.2

Description
org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy uses internal AutoLock instance to synchronize access to mutable state, this lock is acquired on each toString() call. Logback library internals contains synchronized method ch.qos.logback.core.AppenderBase.doAppend() which is invoked for every logging event.
The following situation causes deadlock and full application outage due to blocked logging:
Thread 1:

// lock AppenderBase.doAppend() and then awaiting for releasing AdaptiveExecutionStrategy.toString()
AdaptiveExecutionStrategy.tryProduce() {
  if (LOG.isDebugEnabled()) {
    LOG.debug("{} tryProduce {}", this, wasPending);
  } 
}

Thread 2:

// lock AdaptiveExecutionStrategy and then awaiting for releasing AppenderBase.doAppend()
AdaptiveExecutionStrategy.selectSubStrategy() {
  try (AutoLock l = _lock.lock()) {
    LOG.info("hello"); 
  }
}

Thread dump with the issue:

"embedded-jetty-177" #177 prio=5 os_prio=31 cpu=4.44ms elapsed=391.19s tid=0x000000011a9fe800 nid=0x1490b waiting on condition  [0x000000028b73d000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.3/Native Method)
	- parking to wait for  <0x00000007e4905158> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.3/LockSupport.java:211)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.3/AbstractQueuedSynchronizer.java:715)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.3/AbstractQueuedSynchronizer.java:938)
	at java.util.concurrent.locks.ReentrantLock$Sync.lock(java.base@17.0.3/ReentrantLock.java:153)
	at java.util.concurrent.locks.ReentrantLock.lock(java.base@17.0.3/ReentrantLock.java:322)
	at org.eclipse.jetty.util.thread.AutoLock.lock(AutoLock.java:44)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.toString(AdaptiveExecutionStrategy.java:524)
	at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:277)
	at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:249)
	at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:211)
	at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:161)
	at ch.qos.logback.classic.spi.LoggingEvent.getFormattedMessage(LoggingEvent.java:293)
	at com.workato.agent.log.CloudAppender.postProcessEvent(CloudAppender.java:97)
	at com.workato.agent.log.CloudAppender.postProcessEvent(CloudAppender.java:39)
	at com.workato.agent.log.CircularBufferAppender.append(CircularBufferAppender.java:125)
	at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:82)
	- locked <0x00000007e4864560> (a com.workato.agent.log.CloudAppender)
	at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
	at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
	at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
	at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
	at ch.qos.logback.classic.Logger.filterAndLog_2(Logger.java:414)
	at ch.qos.logback.classic.Logger.debug(Logger.java:490)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:201)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy$$Lambda$616/0x00000008005191f0.run(Unknown Source)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038)
	at java.lang.Thread.run(java.base@17.0.3/Thread.java:833)
	
"embedded-jetty-178" #178 prio=5 os_prio=31 cpu=0.86ms elapsed=391.19s tid=0x000000012a640000 nid=0xb607 waiting for monitor entry  [0x000000028bb55000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:63)
	- waiting to lock <0x00000007e4864560> (a com.workato.agent.log.CloudAppender)
	at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
	at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
	at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
	at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
	at ch.qos.logback.classic.Logger.filterAndLog_2(Logger.java:414)
	at ch.qos.logback.classic.Logger.debug(Logger.java:490)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.execute(QueuedThreadPool.java:724)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor.startReservedThread(ReservedThreadExecutor.java:257)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor.tryExecute(ReservedThreadExecutor.java:232)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.tryExecute(QueuedThreadPool.java:735)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.selectSubStrategy(AdaptiveExecutionStrategy.java:331)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:268)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy$$Lambda$616/0x00000008005191f0.run(Unknown Source)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038)
	at java.lang.Thread.run(java.base@17.0.3/Thread.java:833)

   Locked ownable synchronizers:
	- <0x00000007e4905158> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
@alekkol alekkol added the Bug For general bugs on Jetty side label Jul 28, 2022
@alekkol
Copy link
Author

alekkol commented Jul 28, 2022

Also reproduced on Jetty 10.0.10, but cannot reproduce on 10.0.9

@joakime
Copy link
Contributor

joakime commented Jul 28, 2022

Is there a reason you are using an old logback?
This is a known issue in Logback 1.2.x series.

For Jetty 10 and Jetty 11 we use Logback 1.3.0 series.

@joakime
Copy link
Contributor

joakime commented Jul 28, 2022

Also reproduced on Jetty 10.0.10, but cannot reproduce on 10.0.9

The class named AdaptiveExecutionStrategy came into being for release 10.0.6.
It had an update in release 10.0.8 for javadoc, no change in logic.

https://github.com/eclipse/jetty.project/commits/jetty-10.0.x/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/AdaptiveExecutionStrategy.java

There should be the same behavior between 10.0.10 and 10.0.9.

@alekkol
Copy link
Author

alekkol commented Jul 29, 2022

@joakime Logback 1.2.* is not abandoned and in fact is being developing in parallel with 1.3.* Migration to the latest version is not an option right now due to dependencies issues (e.g. 1.3.* has the Jakarta API in transitive dependencies).
As for the deadlock issue, using an exclusive lock in the toString() implementation and leaking this reference into uncontrolled code leads to dangerous situation: toString() can be called later in any thread under any locks.
As a simple fix this reference can be replaced with explicit toString() call:
LOG.debug("{} tryProduce {}", this, wasPending) -> LOG.debug("{} tryProduce {}", this.toString(), wasPending)
If you find these argument having sense I can prepare the PR with the fixes.

@joakime
Copy link
Contributor

joakime commented Jul 29, 2022

Lets look at the history of this lock in toString().
As if this was a concern, it would happen for far more of a range then just 10.0.10 thru 10.0.9.
It's been present in every release of Jetty 9.4.4 onwards for the past 7 years (even includes all releases of Jetty 10 and Jetty 11).

July 29, 2022 (Today)

Last seen in Jetty 10.0.11
First seen in Jetty 10.0.6

The current AdaptiveExecutionStrategy.toString()

https://github.com/eclipse/jetty.project/blob/4a6c2744a0d4ff404daee0a521e88ae386356e06/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/AdaptiveExecutionStrategy.java#L522-L528

June 16, 2021

Last seen in Jetty 10.0.5
First seen in Jetty 10.0.0

https://github.com/eclipse/jetty.project/blob/a492473a73fee96680baf56aa9b3223d83502f0c/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/EatWhatYouKill.java#L436-L442

This is what it looked like before the class was renamed in Jetty 10.0.6.

The rename of EatWhatYouKill to AdaptiveExecutionStrategy occurred in commit a415606

July 27, 2020

Last seen in Jetty 10.0.0
First seen in Jetty 10.0.0

https://github.com/eclipse/jetty.project/blob/8d69fc41a7f8c26fcd90fc0f7300f81eaede06be/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/EatWhatYouKill.java#L438-L441

August 8, 2017

Last seen in Jetty 10.0.0-alpha
First seen in Jetty 9.4.7

https://github.com/eclipse/jetty.project/blob/a105be95e44fae91fcf8b85b5d6d0f78fd143a04/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/EatWhatYouKill.java#L266-L272

This was a change in commit a105be9

March 31, 2017

Last seen in Jetty 9.4.6
First seen in Jetty 9.4.4

https://github.com/eclipse/jetty.project/blob/f06c7c358f23674d33b629b3d5f402042ffd4656/jetty-util/src/main/java/org/eclipse/jetty/util/thread/strategy/EatWhatYouKill.java#L336-L345

When the class was first created in commit f06c7c3

@joakime
Copy link
Contributor

joakime commented Jul 29, 2022

@joakime Logback 1.2.* is not abandoned and in fact is being developing in parallel with 1.3.* Migration to the latest version is not an option right now due to dependencies issues (e.g. 1.3.* has the Jakarta API in transitive dependencies).

That jakarta dependency might look scary at first, but follow the trail ...

This references jakarta.servlet:jakarta-servlet-api:<ver-from-parent> declared as <optional>true</optional> so it won't pull it in without you specifically declaring it.

This references jakarta.servlet:jakarta-servlet-api:5.0.0

Servlet 5.0 is part of the Jakarta EE9 effort which had namespace changes from javax.servlet to jakarta.servlet (that happened in Jakarta EE9 as part of the "jakarta ee big bang"). Servlet 5.0 is compatible with Jetty 11, but not Jetty 10, which you are using now.

The logback-core has no actual direct usage of servlet.
The existence of those dependencies is to satisfy the jpms requirements for logback-classic and logback-access.
logback 1.3.0 changed to using jakarta.servlet in commit qos-ch/logback@2ff2d4e for release 1.3.0-alpha11 (and was motivated by the changes in Spring 6.0)

But servlet is still optional for logback, not mandatory.
If you don't use the servlet features in logback-classic or logback-access then use logback without worry.

If you use logback-access, then note that logback-access has other issues, so don't use it (any version of logback-access) until they fix the open bugs with it.

logback-access bugs:

Jetty bugs about logback-access:

We are heavy users of logback ourselves, in fact we prefer it over log4j2 and alternatives, and have not experienced this deadlock issue ourselves, even during extreme load testing (think saturated multi gigabit network interfaces during testing). A deadlock would definitely be noticed.
We use logback 1.2 in Jetty 9.x (which is now at End of Community Support) and logback 1.3+ for Jetty 10/11/12 in our own testing.

Have you considered that this deadlock could be due to a specific configuration quirk on your logback setup/configuration?
This has happened before, see LOGBACK-1422 (it shows a few different examples of configurations that can trigger it, and even some workarounds/fixes)

@joakime
Copy link
Contributor

joakime commented Jul 29, 2022

As a simple fix this reference can be replaced with explicit toString() call:
LOG.debug("{} tryProduce {}", this, wasPending) -> LOG.debug("{} tryProduce {}", this.toString(), wasPending)
If you find these argument having sense I can prepare the PR with the fixes.

This is a simple enough change, I can make it.
But I have doubts it will change anything.
As that specific line is not ever within a Lock.

Ideally, a testcase that can replicate this is an important first step.
If it is an issue, we should be able to replicate it without the need for logback deps.
Using the jetty-slf4j-impl, you can instantiate a new JettyLogger with a test/custom Appender that does the locking in the same way as logback.
Then attempt to cause the deadlock in the identified code.
Then, finally, a PR to fix it can follow.

joakime added a commit that referenced this issue Jul 29, 2022
…eExecutionStrategy

Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
@joakime
Copy link
Contributor

joakime commented Jul 29, 2022

When we break down your threaddump, we see ...

"embedded-jetty-177" #177 prio=5 os_prio=31 cpu=4.44ms elapsed=391.19s tid=0x000000011a9fe800 nid=0x1490b waiting on condition  [0x000000028b73d000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.3/Native Method)
	- parking to wait for  <0x00000007e4905158> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	...
	at org.eclipse.jetty.util.thread.AutoLock.lock(AutoLock.java:44)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.toString(AdaptiveExecutionStrategy.java:524)
    ...
	at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:82)
	- locked <0x00000007e4864560> (a com.workato.agent.log.CloudAppender)
	at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
	
"embedded-jetty-178" #178 prio=5 os_prio=31 cpu=0.86ms elapsed=391.19s tid=0x000000012a640000 nid=0xb607 waiting for monitor entry  [0x000000028bb55000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:63)
	- waiting to lock <0x00000007e4864560> (a com.workato.agent.log.CloudAppender)
	at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
	...	
	at ch.qos.logback.classic.Logger.debug(Logger.java:490)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.execute(QueuedThreadPool.java:724)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor.startReservedThread(ReservedThreadExecutor.java:257)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor.tryExecute(ReservedThreadExecutor.java:232)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.tryExecute(QueuedThreadPool.java:735)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.selectSubStrategy(AdaptiveExecutionStrategy.java:331)
	...

The lock created by com.workato.agent.log.CloudAppender seems to be the cause, not the lock in AdaptiveExecutionStrategy (at least that's what we can tell from your limited stacktrace)

@alekkol
Copy link
Author

alekkol commented Jul 29, 2022

The lock created by com.workato.agent.log.CloudAppender seems to be the cause, not the lock in AdaptiveExecutionStrategy (at least that's what we can tell from your limited stacktrace)

This class doesn't contain any synchronized methods. It extends ch.qos.logback.core.AppenderBase with synchronized method so that's why its instance actually hold the lock.

I agree that our setup may be not the most common one since we use the custom Logback appender and the custom Jetty connector. But we don't use any Jetty hacks just documented API and there is the proof that the deadlock can be formally reproduced.
I'm trying to reproduce the issue with the mininal test scenario but it'll take some time.

Just the idea: maybe the ReadWriteLock will work better for org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy since reading operations won't cause any exclusive locking on this instance.

@sbordet
Copy link
Contributor

sbordet commented Jul 29, 2022

Sorry but the logging framework should deal with these issues, as toString() is entitled to lock and it must be possible to log with the same lock held.

The problem is the lock on the CloudAppender like @joakime said.

If that is out of your control because it's performed by the Logback implementation, then it's a bug in Logback.

From your description, I bet you can reproduce this bug without any dependency on Jetty, just by implementing a toString() that grabs a lock. Which would confirm it's a Logback issue.

@alekkol
Copy link
Author

alekkol commented Jul 29, 2022

@sbordet A servlet container as a universal tool is expected to be able to correctly work in different environments. In this scenario, one of the critical classes from Jetty internals leaks its reference into third-party SLF4J API implementation. This implementation is out of control and can acquire the exclusive lock at any time and interrupt normal server behavior. It's a common practice to pass this into a logging API but not when it can be locked for debug purposes. Doubt anyone expects that a simple toString() call can be locked forever awaiting the most recent object state.
As an analogy: the same way should work for external callbacks passed from the external code, if they fail the server should continue to operate normally.

@sbordet
Copy link
Contributor

sbordet commented Jul 29, 2022

@alekkol the same Jetty code with another logging library that is not Logback does not have any problem, because the other logging libraries do not have the Logback bug.

It is not a Jetty issue.

Try, and you will be able to reproduce the bug using Logback without any Jetty dependency.

@alekkol
Copy link
Author

alekkol commented Jul 29, 2022

@sbordet Of course, it can be reproduced with the Logback library only. But the same is true for the opposite: we have been using this logger for years and never had the same issues before. It appeared only after migrating to Jetty from another servlet container.
I can agree that calling the synchronized method for every logging event is not the best solution for Logback but the same is true for toString() with exclusive locks in Jetty. And both of these implementation details leads to deadlock finally.

@sbordet
Copy link
Contributor

sbordet commented Jul 29, 2022

The lock in toString() is necessary for correctness, because it accesses internal state that is protected by that lock.

We are not the only ones grabbing a lock in toString(), even the OpenJDK code does it, for example:
https://github.com/openjdk/jdk/blob/jdk-20%2B0/src/java.base/share/classes/sun/nio/ch/SocketChannelImpl.java#L1507

You agree that you are using a buggy logging library. File an issue there, or change the buggy library you are using.

@alekkol
Copy link
Author

alekkol commented Jul 29, 2022

File an issue there, or change the buggy library you are using.

That's why this issue was created. Both Jetty and Logback have the same bug - they call alien methods under the exclusive lock section provoking deadlocks. Jetty does it in AdaptiveExecutionStrategy.selectSubStrategy method, Logback in AppenderBase.doAppend.

We are not the only ones grabbing a lock in toString(), even the OpenJDK code does it, for example:
https://github.com/openjdk/jdk/blob/jdk-20%2B0/src/java.base/share/classes/sun/nio/ch/SocketChannelImpl.java#L1507

This example is not relevant since SocketChannelImpl doesn't call external methods under this lock.

@sbordet
Copy link
Contributor

sbordet commented Jul 29, 2022

That's why this issue was created.

You have created it in the wrong project. It is not a Jetty issue.

And we are not calling alien methods.
We are calling a logging API (SLF4J) that you have bound to a buggy implementation.
Change the logging implementation and your problems will be solved.

If we start implementing workarounds in Jetty for other libraries bugs, we would not have time to do any work on Jetty.

@alekkol
Copy link
Author

alekkol commented Jul 29, 2022

And we are not calling alien methods.
We are calling a logging API (SLF4J) that you have bound to a buggy implementation.
Change the logging implementation and your problems will be solved.

Why "calling a logging API (SLF4J)" in a critical section in Jetty is not a bug but calling Object.toString() in the synchronized method in Logback is? Using this logic we can say that the Logback was "bound to a buggy implementation" of the servlet container. The fact that the issue in Logback was created earlier than this one can not be used as an argument of the problem's origin on their side.
Any implementation of SLF4J API can call Object.toString() on a function argument inside a synchronized block because there are no explicit restrictions for that. And it can cause a sporadic Jetty outage due to the intermittent nature of the bug. Right now anyone can bootstrap a server with the latest version of Jetty and SLF4J+Logback and get unreliable software. I believe that it's not acceptable for mission-critical libraries.

If we start implementing workarounds in Jetty for other libraries bugs, we would not have time to do any work on Jetty.

It's a workaround for the Jetty bug. And it should be pretty easy to make this error happen rarely or even never by just removing Jetty critical classes references leakage into external API.

@sbordet
Copy link
Contributor

sbordet commented Sep 24, 2022

Not a Jetty issue.

@sbordet sbordet closed this as completed Sep 24, 2022
@diziaq
Copy link

diziaq commented Nov 3, 2022

Guess somethimes my app is having somethig similar after Jetty version up, I'm trying to investigate if it is related.

@alekkol
The reproduction algorithm is not clear enough, so I cannot trace the bug.
Could you elaborate on how you reproduce the bug?

It seems the two locks are always being "logically" nested, because
toString invocation always follows AFTER entering doAppend method (i.e. sync on AppenderBase monitor).

Correct me if I'm wrong, the order of key events is:

  1. lock A: enter doAppend => sync on AppenderBase
  2. lock B: enter toString => AutoLock in AdaptiveExecutionStrategy
  3. unlock B: return from toString => AutoLock in AdaptiveExecutionStrategy
  4. unlock A: return from doAppend => unsync on AppenderBase

What is logically the possible scenario of Dead lock?

@joakime
Copy link
Contributor

joakime commented Nov 3, 2022

@diziaq that logic is a red herring, the problem has nothing to do with the AdaptiveExecutionStrategy.
The AutoLock on AdaptiveExecutionStrategy is unrelated to the issue, and was just caught up in the thread dump.

In this specific case (of the original issue) the custom appender (CloudAppender) is causing this issue entirely.

The original issue's CloudAppender (and CircularBufferAppender) chose to implement their solution on ch.qos.logback.core.AppenderBase which has a synchronize modifier on Appender.doAppend() and that fact is introducing the bug in their implementation, that they have to deal with.

See:

They could have chosen ch.qos.logback.core.UnsynchronizedAppenderBase instead and not had this issue. (BTW, most of the logger implementations in Logback use the UnsynchronizedAppenderBase)

@artemik
Copy link

artemik commented Sep 6, 2024

While Logback does seemingly have its issues, I don't think I like Jetty maintainers responses here.

  1. I'm not so sure the custom CloudAppender itself caused the issue. Like @alekkol explained, the locking on CloudAppender comes from the parent Logback's AppenderBase.doAppend. It seems that what exactly CloudAppender is doing is irrelevant for the issue. Instead, what's important is that Logback internals, under a synchronized method, call toString() on a logging argument that happens to be a Jetty object whose toString() in turn uses both its own locking and a logger call.
  2. If we relate to the JCIP 10.1.3: "Invoking an alien method with a lock held is asking for liveness trouble. The alien method might acquire other locks (risking deadlock) or block for an unexpectedly long time, stalling other threads that need the lock you hold"
    • I do agree that Logback does seem to violate it, because, from under a synchronized method, it calls toString() on an "alien" object. Logback seems to agree with it: https://jira.qos.ch/browse/LOGBACK-1492
    • But in exactly the same way, Jetty, when having a lock (AutoLock), calls an "alien" log.debug... method. Hence the issue happens, because the two violate the rule. @sbordet: "we are not calling alien methods. We are calling a logging API (SLF4J)". But it is an alien method, from Jetty perspective. Alien = code out of your control, i.e. not in your codebase. It's exactly that here. And logging API indeed may be buggy, but that's actually what JCIP 10.1.3 warns us about and recommends to avoid?
  3. Just as a whole, "Jetty + Logback might cause deadlock" combination doesn't sound good, regardless of whose fault that is, because both are popular libraries in the community, and it's certainly not very welcoming for somebody who just transitioned to Jetty. Isn't it something that Jetty maintainers should be interested to help with? Instead, I see "not a Jetty issue" and pushing users, who were kind to spend time to highlight an issue (related to Jetty, one way or another) and who are a minority (compared to Jetty and Logback) to go and deal with Logback team. Aren't we all in the same boat as a community?
  4. @sbordet: "If we start implementing workarounds in Jetty for other libraries bugs, we would not have time to do any work on Jetty." Yet, @joakime says Jetty team is heavy users of Logback anyway. So isn't that something you'd benefit from anyway, but you're not willing to help? Like I said, aren't we in the same boat? (though again, it still seems to me that technically speaking Jetty violates JCIP 10.1.3 as well. Exactly because both libs violate, the issue occurs. It seems to be mentioned under Logback ticket discussion as well, and there is a fix suggestion which would alleviate the issue even if other user code violates JCIP 10.1.3. So similarly, it seems Jetty could prepare logging body, but call the actual logging outside of a lock. Not because of other libraries bugs, but because of JCIP 10.1.3.
  5. Please don't get me wrong, I do agree that logging libraries regardless should make it safe to call log.x(..., arg) letting it safely convert arguments to strings. And regardless of JCIP strictness, choosing between Jetty and Logback, I'd prefer Logback to fix the issue. But as a whole, as you see, practically things aren't that straightforward, especially considering that locks under toString() is rather a peculiar behavior to me. The fact that JDK also does it somewhere doesn't make anybody's life easier.
  6. Why Jetty team haven't experience that issue before unlike @alekkol and whether it means the issue doesn't actually exist - I don't know, but the reasonings and stacktraces seem to make sense to me. And please forgive me if I missed something. I came across this ticket randomly but which got my attention and will now make me think twice when choosing both Jetty or Logback for my projects.

@gregw
Copy link
Contributor

gregw commented Sep 6, 2024

The "fix" of jetty not logging with locks held is not a fix, as it makes those logs non deterministic and thus not useful. If you don't want those logs then turn off that logger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants