Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock error since version 1.14.0 #3385

Closed
joeosburn opened this issue Jan 16, 2024 · 10 comments · Fixed by #3426
Closed

Deadlock error since version 1.14.0 #3385

joeosburn opened this issue Jan 16, 2024 · 10 comments · Fixed by #3426
Milestone

Comments

@joeosburn
Copy link

I've been using DataDog for several years without issue on a ruby project. Since version 1.14.0, up to the most recent version, 1.19.0, I've run into a deadlocking issue related to the logger configuration I use.

I now get this error:

2024-01-16T22:19:21.767Z pid=249 tid=5qt WARN: ThreadError: deadlock; recursive locking
2024-01-16T22:19:21.768Z pid=249 tid=5qt WARN: /Users/joe/.rvm/gems/ruby-3.2.2/gems/ddtrace-1.19.0/lib/datadog/core/configuration.rb:228:in `synchronize'
/Users/joe/.rvm/gems/ruby-3.2.2/gems/ddtrace-1.19.0/lib/datadog/core/configuration.rb:228:in `safely_synchronize'
/Users/joe/.rvm/gems/ruby-3.2.2/gems/ddtrace-1.19.0/lib/datadog/core/configuration.rb:195:in `components'
/Users/joe/.rvm/gems/ruby-3.2.2/gems/ddtrace-1.19.0/lib/datadog/tracing.rb:134:in `components'
/Users/joe/.rvm/gems/ruby-3.2.2/gems/ddtrace-1.19.0/lib/datadog/tracing.rb:138:in `tracer'
/Users/joe/.rvm/gems/ruby-3.2.2/gems/ddtrace-1.19.0/lib/datadog/tracing.rb:76:in `correlation'
/Users/joe/projects/api/config/api.rb:67:in `block (2 levels) in logger'
...

It appears to be related to the interaction of DataDog within my logger code.

Here is what my DataDog configuration looks like:

Datadog.configure do |c|
  ...
  c.logger.instance = Api.logger
  ...
end

If I comment out the c.logger.instance = line, the error goes away.

In my Api.logger code, I have this code:

correlation = Datadog::Tracing.correlation
data[:dd] = {
  trace_id: correlation.trace_id,
  span_id: correlation.span_id
}

This is because my logger instance needs tracing data from DataDog. Again, if I comment this code out, the error goes away.

This was working fine until version 1.14.0 and then something changed. I am currently on Ruby 3.2.2

@benhutton
Copy link

to add some details here,

We maintain a ruby app that uses Ougai for structured logging. We have followed your instructions for adding trace correlation to our structured logging.

So we are adding Datadog trace correlation data to the Ougai logger, and we are configuring Datadog to use the Ougai logger as its logger.

It seems that something changed in 1.14.0 to make that no longer supported.

Is this a bug in this gem, or is there something we need to change about our usage of it?

@ivoanjo
Copy link
Member

ivoanjo commented Jan 19, 2024

Hey! This is definitely an interesting one. The ddtrace logger can get called during initialization, and by chaining it back to ddtrace, it hits this issue. 😅

I'm guessing even if we didn't have locking, we might've had some recursion on our hands... 🤔

What may be happening here is that something is trying to log information in 1.19 that didn't before in 1.14.

To investigate this further, do you mind sharing the full backtrace from the deadlock? The backtrace you shared unfortunately doesn't show what thread deadlocked and in what part of the code it was when it tried to log.

@benhutton
Copy link

@ivoanjo take a look at this: https://gist.github.com/benhutton/c28099f46d6a582c3c9e567dc481d32e

That's a backtrace I got when trying to book up a console on heroku. I think the "real" error is the bottom one, then it triggered the ones above it.

Also, to be clear, 1.13.1 is the last version that work. Whatever broke it broke it in 1.14.0.

@ivoanjo
Copy link
Member

ivoanjo commented Jan 19, 2024

Ahh, it's the environment logger... that would fit with the change in #3020 that we merged for 1.14.0. That information used to be logged later in the gem life cycle, and we moved it to earlier.

It's possible to disable this behavior by setting the DD_TRACE_STARTUP_LOGS env variable to false and/or via code in your configure block:

Datadog.configure { |c| c.diagnostics.startup_logs.enabled = false }

@joeosburn
Copy link
Author

That solution gets us going. What's the long term fix for this scenario?

@ivoanjo
Copy link
Member

ivoanjo commented Jan 22, 2024

Great! Let me bring this back to the team and see what we can do to avoid this sharp edge.

ivoanjo added a commit that referenced this issue Jan 31, 2024
**What does this PR do?**

This PR fixes #3385 by preventing components initialization when
calling the `Datadog::Tracing.correlation` API.

Instead, we use the `allow_initialization: false` option, which was
added exactly to help break these kinds of initialization cycles.

**Motivation:**

This fixes a deadlock issue such as:

```
lib/datadog/core/configuration.rb:228:in `synchronize': deadlock; recursive locking (ThreadError)
  from lib/datadog/core/configuration.rb:228:in `safely_synchronize'
  from lib/datadog/core/configuration.rb:195:in `components'
  from lib/datadog/tracing.rb:134:in `components'
  from lib/datadog/tracing.rb:138:in `tracer'
  from lib/datadog/tracing.rb:76:in `correlation'
  from repro.rb:5:in `add'
  from logger-1.6.0/lib/logger.rb:691:in `debug'
  from lib/datadog/core/remote/component.rb:29:in `initialize'
  from lib/datadog/core/remote/component.rb:156:in `new'
  from lib/datadog/core/remote/component.rb:156:in `build'
  from lib/datadog/core/configuration/components.rb:90:in `initialize'
  from datadog-ci-0.7.0/lib/datadog/ci/configuration/components.rb:31:in `initialize'
  from lib/datadog/core/configuration.rb:248:in `new'
  from lib/datadog/core/configuration.rb:248:in `build_components'
  from lib/datadog/core/configuration.rb:89:in `block in configure'
  from lib/datadog/core/configuration.rb:230:in `block in safely_synchronize'
  from lib/datadog/core/configuration.rb:228:in `synchronize'
  from lib/datadog/core/configuration.rb:228:in `safely_synchronize'
  from lib/datadog/core/configuration.rb:84:in `configure'
  from lib/datadog/tracing/contrib/extensions.rb:64:in `configure'
  from repro.rb:9:in `<main>'
```

that happens when a logger is configured to also print correlation

**Additional Notes:**

While it may seem that I've changed the API semantics by returning
an `Identifier` when there is no tracer, the previous code was
actually misleading.

There is actually no way (that I know of) for `tracer` to be
`nil` on dd-trace-rb currently. It can be disabled, but an instance
will still exist.

The disabled instance, when called, returns an empty `Identifier`,
so here I'm replicating that behavior, without needing the
components to be initialized.

I guess this is open for debate? We could return `nil` instead
in the future.

**How to test the change?**

This tiny snippet can be used to simulate the exact logger issue
reported by the customer:

```ruby
require 'ddtrace'

class TestLogger < Datadog::Core::Logger
  def add(severity, message = nil, progname = nil, &block)
    puts Datadog::Tracing.correlation
  end
end

Datadog.configure do |c|
  c.logger.instance = TestLogger.new(STDOUT)
end
```

Fixes #3385
@ivoanjo
Copy link
Member

ivoanjo commented Jan 31, 2024

PR to fix this #3426 :)

@joeosburn
Copy link
Author

Thank you.

@TonyCTHsu TonyCTHsu added this to the 1.21.0 milestone Feb 5, 2024
@TonyCTHsu
Copy link
Contributor

👋 @joeosburn , 1.20.0 has been released! Give it a try 👍

@joeosburn
Copy link
Author

Success! Works great.

@ivoanjo ivoanjo modified the milestones: 1.21.0, 1.20.0 Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants