
Fixed deadlock when rolling the file under a heavy load #24980

Merged · 5 commits · Jun 9, 2024

Conversation

@dmatej (Contributor) commented Jun 5, 2024

  • Fixed the LogRecordBuffer configuration - it always used the defaults.
    • That caused overload, with no way to configure maxWaitTime or increase the buffer size.
  • Fixed a deadlock - don't use loggers in the critical section.
    • The deadlock can occur when we are rolling the file while the log record buffer is full AND we log anything from the rolling thread.
    • Solution: if you really need to log something there, use an asynchronous thread (see the sketch after this list).
    • Under a heavy load (1 record every 3 microseconds) such a log record can arrive even after 5000 other records are already in the buffer or waiting to be added to it; it is just another one, although usually it will be the first.
  • Moved from synchronized blocks to reentrant locks + fixed the deadlock.
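
A rough sketch of the deadlock and the asynchronous-logging workaround described above (hypothetical class and method names, not the actual GJULE code): the pump thread needs the roll lock to write records out, so if the rolling thread blocks on a full buffer while holding that lock, nothing can ever drain the buffer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class LogBufferSketch {

    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(5_000);
    private final ReentrantLock rollLock = new ReentrantLock();
    // Dedicated thread for records produced while the roll lock is held.
    private final ExecutorService asyncLogger = Executors.newSingleThreadExecutor();

    /** Producers block here when the buffer is full. */
    public void publish(String record) throws InterruptedException {
        buffer.put(record);
    }

    /** The pump thread drains the buffer; writing to the file requires the roll lock. */
    public void pump() throws InterruptedException {
        rollLock.lock();
        try {
            String record = buffer.take();
            // write the record to the current server.log ...
        } finally {
            rollLock.unlock();
        }
    }

    /** Rolls server.log; called while producers may be filling the buffer. */
    public void rollFile() {
        rollLock.lock();
        try {
            // rename server.log and open a new file ...

            // Calling publish() here directly would deadlock when the buffer is
            // full: the pump cannot drain it because it is waiting for rollLock,
            // and we would block forever on buffer.put() while holding rollLock.
            // Handing the record to another thread avoids that.
            asyncLogger.execute(() -> {
                try {
                    publish("Rolled the log file");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        } finally {
            rollLock.unlock();
        }
    }
}
```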

dmatej added 3 commits June 5, 2024 16:56
Signed-off-by: David Matějček <david.matejcek@omnifish.ee>
- When the buffer is full, every new log record is blocked. If this happens to
  records coming from the thread moving the server.log file, everything
  stops forever.

Signed-off-by: David Matějček <david.matejcek@omnifish.ee>
- The deadlock can occur when we are rolling the file while the log record
  buffer is full AND we log anything from this thread.
- Solution: if you really need to log something, use an asynchronous thread.
- Under a heavy load (1 record every 3 microseconds) such a log record can
  arrive even after 5000 other records are already in the buffer or waiting
  to be added to it. It is just another one.

Signed-off-by: David Matějček <david.matejcek@omnifish.ee>
@dmatej dmatej added the bug Something isn't working label Jun 5, 2024
@dmatej dmatej self-assigned this Jun 5, 2024
@dmatej dmatej added this to the 7.0.16 milestone Jun 5, 2024
@dmatej dmatej linked an issue Jun 6, 2024 that may be closed by this pull request
@dmatej dmatej marked this pull request as ready for review June 6, 2024 09:53
@pzygielo pzygielo removed their request for review June 6, 2024 15:00
@dmatej dmatej requested a review from avpinchuk June 6, 2024 18:50
@avpinchuk (Contributor) left a comment

The old code did not guarantee visibility of the variables. Each new thread could see the old values (or a mix of old and new values) and trigger another reconfiguration.

This code is now safe, although somewhat slower :)
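
A minimal sketch of the visibility issue being described, with hypothetical field and method names: without a common lock (or volatile fields), a thread may read a stale maxWaitTime or capacity and decide to reconfigure again; guarding both reads and writes with one lock makes every write visible to every reader, at the cost of some contention (hence "somewhat slower").

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BufferConfig {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int capacity = 5_000;
    private long maxWaitTimeMillis;

    void reconfigure(int newCapacity, long newMaxWaitTimeMillis) {
        lock.writeLock().lock();
        try {
            this.capacity = newCapacity;
            this.maxWaitTimeMillis = newMaxWaitTimeMillis;
        } finally {
            lock.writeLock().unlock();
        }
    }

    long maxWaitTimeMillis() {
        lock.readLock().lock();
        try {
            return maxWaitTimeMillis;   // guaranteed to see the latest written value
        } finally {
            lock.readLock().unlock();
        }
    }
}
```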

@avpinchuk (Contributor) commented Jun 6, 2024

Hmmm, if we move the check under the write lock, we have a deadlock...

UPD. The pollOrWait() method blocked on ABQ.take() while holding the read lock. Then reconfigure() tried to acquire the write lock and waited indefinitely.
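
Reduced to a sketch (hypothetical names, not the actual code), the scenario looks like this: the consumer parks inside ArrayBlockingQueue.take() while still holding the read lock, so reconfigure() can never obtain the write lock.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.logging.LogRecord;

class PollOrWaitDeadlockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile BlockingQueue<LogRecord> queue = new ArrayBlockingQueue<>(5_000);

    LogRecord pollOrWait() throws InterruptedException {
        lock.readLock().lock();
        try {
            // If the queue is empty, take() parks this thread indefinitely
            // WHILE the read lock is still held.
            return queue.take();
        } finally {
            lock.readLock().unlock();
        }
    }

    void reconfigure(int newCapacity) {
        lock.writeLock().lock();   // waits forever while pollOrWait() is parked
        try {
            queue = new ArrayBlockingQueue<>(newCapacity);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```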

Signed-off-by: David Matějček <david.matejcek@omnifish.ee>
@dmatej (Contributor, Author) commented Jun 6, 2024

Good lesson; I read the javadoc again, and also - I should never push without at least running the module's tests.

@dmatej dmatej marked this pull request as draft June 6, 2024 21:56
@dmatej (Contributor, Author) commented Jun 6, 2024

Well, the tests pass now, but I did some manual testing and I have another deadlock. Also I see too many logging pumps for STDOUT+STDERR; it seems every reconfiguration creates another one ...

EDIT: Oh, I got it. Those pollOrWait calls are waiting forever with the lock held, so we can never get the write lock!

And I also forgot about the possibility of decreasing the size of a full buffer.
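
One way to break such a cycle, sketched under the assumption that the wait is bounded (hypothetical names; this is not necessarily the approach the PR settled on): poll with a timeout and re-read the queue reference on every iteration, so no thread parks indefinitely while a reconfiguration needs exclusive access.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.logging.LogRecord;

class BoundedWaitSketch {
    private volatile BlockingQueue<LogRecord> queue = new ArrayBlockingQueue<>(5_000);

    /** Waits only briefly per attempt, so it never blocks a reconfiguration forever. */
    LogRecord pollOrWait() throws InterruptedException {
        while (true) {
            // Re-read the (possibly replaced) queue on every iteration.
            LogRecord record = queue.poll(100, TimeUnit.MILLISECONDS);
            if (record != null) {
                return record;
            }
        }
    }

    /** Swapping the queue no longer has to wait for a parked consumer. */
    void reconfigure(int newCapacity) {
        // Note: shrinking while the old buffer is full needs extra care,
        // since the pending records would not all fit into the new queue.
        queue = new ArrayBlockingQueue<>(newCapacity);
    }
}
```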

Signed-off-by: Alexander Pinčuk <alexander.v.pinchuk@gmail.com>
@dmatej dmatej marked this pull request as ready for review June 8, 2024 12:02
@arjantijms arjantijms merged commit 0748496 into eclipse-ee4j:master Jun 9, 2024
2 checks passed
@dmatej dmatej deleted the deadlock-gjule branch June 10, 2024 20:02
Labels: bug (Something isn't working)
Projects: None yet
Development: Successfully merging this pull request may close these issues:
Detected deadlock in logging - rolling the file
3 participants