Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

completed task shows warning for having lost it's waker #345

Open
cedricje opened this issue May 2, 2022 · 13 comments
Open

completed task shows warning for having lost it's waker #345

cedricje opened this issue May 2, 2022 · 13 comments
Labels
A-instrumentation Area: instrumentation. S-bug Severity: bug

Comments

@cedricje
Copy link

cedricje commented May 2, 2022

What crate(s) in this repo are involved in the problem?

tokio-console

What is the issue?

tokio-console shows a warning for a task having lost it's waker. But the task should be finished.
I think it's related to the task spawning an other task

How can the bug be reproduced?

The problem can be reproduced with the following sample application.
Connect to 127.0.0.1:5555 and it will spawn 2 tasks. The first one should be completed after 1 second. But in tokio-console this one will be shown as having lost it's waker.

use tokio::net::TcpListener;

#[tokio::main]
async fn main() {
    console_subscriber::init();

    if let Ok(listener) = TcpListener::bind("127.0.0.1:5555").await {
        loop {
            if let Ok((_, _)) = listener.accept().await {
                let handle = tokio::task::spawn(async move {
                    tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
                    tokio::task::spawn(async move {
                        tokio::time::sleep(tokio::time::Duration::from_secs(600)).await;
                    });
                });

                let _ = handle.await;
                println!("done");
            }
        }
    }
}

Logs, error output, etc

image

Versions

console crates versions:

tokio-console-warning v0.1.0 (/home/cedric/src/rust-test/tokio-console-warning)
├── console-subscriber v0.1.5
│   ├── console-api v0.2.0

tokio-console is built from git, version:

git describe
tokio-console-v0.1.4-5-g472ff52

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

maybe

@cedricje cedricje added the S-bug Severity: bug label May 2, 2022
@hawkw
Copy link
Member

hawkw commented May 4, 2022

I think the issue is that the task's span is (incorrectly) being kept alive by the child task's span. We'll probably want to change the tokio instrumentation so that task spans have parent: None and instead follow from the current span, so that they don't keep the current task's span alive incorrectly.

@hawkw hawkw added the A-instrumentation Area: instrumentation. label May 4, 2022
@WaffleLapkin
Copy link

I think I've encountered this issue. I also want to highlight that the task that has completed (I've .awaited the JoinHandle) and that "has lost its waker" according to tokio-console warning is still being polled (or at least tokio-console says so).

Screenshots

2022-07-04_13-40
2022-07-04_13-40_1

I also don't think that this issue requires using spawn inside of spawn since for me this occurs with just a single spawn deep inside of #[tokio::main] marked function. Though there are a lot of code, so I'm not sure if I'll be able to make a good MRE.

@hawkw
Copy link
Member

hawkw commented Jul 6, 2022

Another option that could be worth considering is giving the task spans some way of explicitly indicating that the task has terminated (e.g. it could record a field value like task.terminated = true). This way, we would know when a task has completed, even when the task's span is kept alive because it has children. Unlike the solution of changing task spans to set parent: None, this would also resolve cases where the task span is kept alive by a child span that is not a task...which is what I suspect is going on in @WaffleLapkin's case:

I also don't think that this issue requires using spawn inside of spawn since for me this occurs with just a single spawn deep inside of #[tokio::main] marked function. Though there are a lot of code, so I'm not sure if I'll be able to make a good MRE.

@landaire
Copy link

landaire commented Jul 9, 2022

I'm sorry for the off-topic discussion but searching for this warning brought me here. I'm new to async in Rust and saw this warning while debugging an issue. Can anyone point me towards a resource for why this warning occurs (I.e. what pattern in async Rust leads to such a scenario and how to fix it)? I'm trying to determine if it's a console bug as indicated in this issue, or a real bug in my own code.

@kornelski
Copy link

I'm seeing this when using actix-web. tokio-console reports lost wakers in:

~/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/actix-rt-2.7.0/src/runtime.rs:80:20
~/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/actix-rt-2.7.0/src/lib.rs:207:5

which look like regular uses of spawn_local() and local_set.block_on().

I also get one in ~/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/hyper-0.14.23/src/common/exec.rs:49:21

and another reported leak is in tokio itself:

~/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/tokio-1.23.0/src/task/local.rs:586:35

Is this something I should be worried about? My server leaks about 20 wakers almost instantly, and it usually goes up to 50 before it becomes quite unresponsive. I don't know if that's the cause, or a symptom of some other issue, or just something unrelated.

@traviscross
Copy link

I'm seeing this when holding on to a WeakSender whose associated Sender was created in the exited task. Here's a minimal reproduction:

use std::sync::{Arc, Mutex};
use tokio::{sync::mpsc, task::JoinSet};

async fn handler(holder: Arc<Mutex<Vec<mpsc::WeakSender<()>>>>) {
  let (handler_tx, _) = mpsc::channel::<()>(1);
  let mut holder = holder.lock().unwrap();
  holder.push(handler_tx.downgrade()); // Push `WeakSender` to Vec.
  drop(holder);
  panic!(); // Not required to reproduce.
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
  console_subscriber::init();
  let mut set = JoinSet::new();
  let holder = Arc::new(Mutex::new(Vec::new()));
  for _ in 0..16 {
    set.spawn(handler(holder.clone()));
  }
  // Properly await all exited tasks.
  while let Some(_) = set.join_next().await {}
  std::future::pending::<()>().await;
}

When run, tokio-console shows "16 tasks have lost their waker", and for each waker shows "This task has lost its waker, and will never be woken again".

I would expect this to leak the 16 WeakSender handles, but given that all tasks were properly awaited, I would not expect any tasks to leak.

hds added a commit that referenced this issue Nov 17, 2023
In the Tokio instrumentation, a tracing span is created for each task
which is spawned. Since the new span is created within the context of
where `tokio::spawn()` (or similar) is called from, it gets a contextual
parent attached.

In tracing, when a span has a child span (either because the child was
created in the context of the parent, or because the parent was set
explicitly) then that span will not be closed until the child has
closed.

The result in the console subscriber is that a task which spawns another
task won't have a `dropped_at` time set until the spawned task exits,
even if the parent task exits much earlier. This causes Tokio Console to
show an incorrect lost waker warning (#345). It also affects other spans
that are entered when a task is spawned (#412).

The solution is to modify the instrumentation in Tokio so that task
spans are explicit roots (`parent: None`). This will be done as part of
enriching the Tokio instrumentation (tokio-rs/tokio#5792).

This change adds functionality to the test framework within
`console-subscriber` so that the state of a task can be set as an
expectation. The state is calculated based on 4 values:
* `console_api::tasks::Stats::dropped_at`
* `console_api::tasks::Stats::last_wake`
* `console_api::PollStats::last_poll_started`
* `console_api::PollStats::last_poll_ended`

It can then be tested that a task that spawns another task and then ends
actually goes to the `Completed` state, even if the spawned task is
still running. As of Tokio 1.33.0, this test fails, but the PR FIXME:TBD
fixes this and the test should pass from Tokio 1.34 onwards.
hds added a commit to tokio-rs/tokio that referenced this issue Nov 18, 2023
In Tokio, tasks are optionally instrumented with tracing spans to allow
analysis of the runtime behavior to be performed with tools like
tokio-console.

The span that is created for each task gets currently follows the
default tracing behavior and has a contextual parent attached to it
based on the span that is actual when `tokio::spawn` or similar is
called.

However, in tracing, a span will remain "alive" until all its children
spans are closed. This doesn't match how spawned tasks work. A task may
outlive the context in which is was spawned (and frequently does). This
causes tasks which spawn other - longer living - tasks to appear in
`tokio-console` as having lost their waker when instead they should be
shown as completed (tokio-rs/console#345). It can also cause undesired
behavior for unrelated tracing spans if a subscriber is receiving both
the other spans as well as Tokio's instrumentation.

To fix this mismatch in behavior, the task span has `parent: None` set
on it, making it an explicit root - it has no parent. The same was
already done for all spans representing resources in #6107. This change
is made within the scope of #5792.

Due to a defect in the currently available `tracing-mock` crate, it is
not possible to test this change at a tracing level
(tokio-rs/tracing#2440). Instead, a test for the `console-subscriber`
has been written which shows that this change fixes the defect as
observed in `tokio-console` (tokio-rs/console#490).
hds added a commit that referenced this issue Nov 18, 2023
In the Tokio instrumentation, a tracing span is created for each task
which is spawned. Since the new span is created within the context of
where `tokio::spawn()` (or similar) is called from, it gets a contextual
parent attached.

In tracing, when a span has a child span (either because the child was
created in the context of the parent, or because the parent was set
explicitly) then that span will not be closed until the child has
closed.

The result in the console subscriber is that a task which spawns another
task won't have a `dropped_at` time set until the spawned task exits,
even if the parent task exits much earlier. This causes Tokio Console to
show an incorrect lost waker warning (#345). It also affects other spans
that are entered when a task is spawned (#412).

The solution is to modify the instrumentation in Tokio so that task
spans are explicit roots (`parent: None`). This will be done as part of
enriching the Tokio instrumentation (tokio-rs/tokio#5792).

This change adds functionality to the test framework within
`console-subscriber` so that the state of a task can be set as an
expectation. The state is calculated based on 4 values:
* `console_api::tasks::Stats::dropped_at`
* `console_api::tasks::Stats::last_wake`
* `console_api::PollStats::last_poll_started`
* `console_api::PollStats::last_poll_ended`

It can then be tested that a task that spawns another task and then ends
actually goes to the `Completed` state, even if the spawned task is
still running. As of Tokio 1.34.0, this test fails, but the PR
tokio-rs/tokio#XXXX fixes this and the test should pass from Tokio 1.35
onwards.
hds added a commit that referenced this issue Nov 18, 2023
In the Tokio instrumentation, a tracing span is created for each task
which is spawned. Since the new span is created within the context of
where `tokio::spawn()` (or similar) is called from, it gets a contextual
parent attached.

In tracing, when a span has a child span (either because the child was
created in the context of the parent, or because the parent was set
explicitly) then that span will not be closed until the child has
closed.

The result in the console subscriber is that a task which spawns another
task won't have a `dropped_at` time set until the spawned task exits,
even if the parent task exits much earlier. This causes Tokio Console to
show an incorrect lost waker warning (#345). It also affects other spans
that are entered when a task is spawned (#412).

The solution is to modify the instrumentation in Tokio so that task
spans are explicit roots (`parent: None`). This will be done as part of
enriching the Tokio instrumentation (tokio-rs/tokio#5792).

This change adds functionality to the test framework within
`console-subscriber` so that the state of a task can be set as an
expectation. The state is calculated based on 4 values:
* `console_api::tasks::Stats::dropped_at`
* `console_api::tasks::Stats::last_wake`
* `console_api::PollStats::last_poll_started`
* `console_api::PollStats::last_poll_ended`

It can then be tested that a task that spawns another task and then ends
actually goes to the `Completed` state, even if the spawned task is
still running. As of Tokio 1.34.0, this test fails, but the PR
tokio-rs/tokio#6158 fixes this and the test should pass from Tokio 1.35
onwards.
hds added a commit to tokio-rs/tokio that referenced this issue Nov 19, 2023
In Tokio, tasks are optionally instrumented with tracing spans to allow
analysis of the runtime behavior to be performed with tools like
tokio-console.

The span that is created for each task gets currently follows the
default tracing behavior and has a contextual parent attached to it
based on the span that is actual when `tokio::spawn` or similar is
called.

However, in tracing, a span will remain "alive" until all its children
spans are closed. This doesn't match how spawned tasks work. A task may
outlive the context in which is was spawned (and frequently does). This
causes tasks which spawn other - longer living - tasks to appear in
`tokio-console` as having lost their waker when instead they should be
shown as completed (tokio-rs/console#345). It can also cause undesired
behavior for unrelated tracing spans if a subscriber is receiving both
the other spans as well as Tokio's instrumentation.

To fix this mismatch in behavior, the task span has `parent: None` set
on it, making it an explicit root - it has no parent. The same was
already done for all spans representing resources in #6107. This change
is made within the scope of #5792.

Due to a defect in the currently available `tracing-mock` crate, it is
not possible to test this change at a tracing level
(tokio-rs/tracing#2440). Instead, a test for the `console-subscriber`
has been written which shows that this change fixes the defect as
observed in `tokio-console` (tokio-rs/console#490).
@hds
Copy link
Collaborator

hds commented Nov 19, 2023

This issue is now fixed in the Tokio instrumentation on master (tokio-rs/tokio#6158). Let's wait until the next Tokio release before we close the issue.

@NULLx76
Copy link

NULLx76 commented Dec 11, 2023

I think this can be closed now, the new tokio version includes the fix

b-yap added a commit to pendulum-chain/spacewalk that referenced this issue May 7, 2024
b-yap added a commit to pendulum-chain/spacewalk that referenced this issue May 9, 2024
b-yap added a commit to pendulum-chain/spacewalk that referenced this issue May 13, 2024
b-yap added a commit to pendulum-chain/spacewalk that referenced this issue May 22, 2024
* first iteration:
add `tracing` feature of tokio, and the console-subscriber

* update on tokio version, based on:
tokio-rs/console#345 (comment)

* use `tracing` library, instead of log.

* remove unnecessary logs;
add necessary logs;
add timeout in `read_message_from_stellar` because it gets stuck

* cleanup zombie task

* Update README.md

add documentation of the `tokio-console`.

* remove duplicate `Parachain Block Listener`

* https://github.com/pendulum-chain/spacewalk/pull/517/files#r1600267305

* #517 (comment),
#517 (comment),

* https://github.com/pendulum-chain/spacewalk/actions/runs/9094123182/job/24994504193?pr=517

* remocehttps://github.com/pendulum-chain/spacewalk/actions/runs/9095519314/job/24998905803?pr=517

* https://github.com/pendulum-chain/spacewalk/actions/runs/9096987912/job/25003758078?pr=517

* https://github.com/pendulum-chain/spacewalk/actions/runs/9098087121/job/25007476694?pr=517

* https://github.com/pendulum-chain/spacewalk/actions/runs/9108826563/job/25040418069?pr=517
hds added a commit to tokio-rs/tokio that referenced this issue Jul 26, 2024
When instrumenting resources in Tokio, a span is created for each
resource. Previously, all resources inherited the currently active span
as their parent (tracing default). However, this would keep that parent
span alive until the resource (and its span) were dropped. This is often
not correct, as a resource may be created in a task and then sent
elsewhere, while the originating task ends.

This artificial extension of the parent span's lifetime would make it
look like that task was still alive (but idle) in any system reading the
tracing instrumentation in Tokio, for example Tokio Console as reported
in tokio-rs/console#345.

In #6107, most of the existing resource spans were updated to
make them explicit roots, so they have no contextual parent. However,
2. were missed:
- `Sleep`
- `BatchSemaphore`

This change alters the resource spans for those 2 resources to also make
them explicit roots.

(so that the span doesn't have a parent). This
is necessary, because otherwise a resource that is created inside a task
(or some other span) and then sent elsewhere will keep that contextual
parent span open (but not necessarily active) for the lifetime of the
resource itself.
hds added a commit to tokio-rs/tokio that referenced this issue Jul 26, 2024
When instrumenting resources in Tokio, a span is created for each
resource. Previously, all resources inherited the currently active span
as their parent (tracing default). However, this would keep that parent
span alive until the resource (and its span) were dropped. This is often
not correct, as a resource may be created in a task and then sent
elsewhere, while the originating task ends.

This artificial extension of the parent span's lifetime would make it
look like that task was still alive (but idle) in any system reading the
tracing instrumentation in Tokio, for example Tokio Console as reported
in tokio-rs/console#345.

In #6107, most of the existing resource spans were updated to
make them explicit roots, so they have no contextual parent. However,
2. were missed:
- `Sleep`
- `BatchSemaphore`

This change alters the resource spans for those 2 resources to also make
them explicit roots.
hds added a commit to tokio-rs/tokio that referenced this issue Jul 29, 2024
…ts (#6727)

When instrumenting resources in Tokio, a span is created for each
resource. Previously, all resources inherited the currently active span
as their parent (tracing default). However, this would keep that parent
span alive until the resource (and its span) were dropped. This is often
not correct, as a resource may be created in a task and then sent
elsewhere, while the originating task ends.

This artificial extension of the parent span's lifetime would make it
look like that task was still alive (but idle) in any system reading the
tracing instrumentation in Tokio, for example Tokio Console as reported
in tokio-rs/console#345.

In #6107, most of the existing resource spans were updated to
make them explicit roots, so they have no contextual parent. However,
2. were missed:
- `Sleep`
- `BatchSemaphore`

This change alters the resource spans for those 2 resources to also make
them explicit roots.
@gftea
Copy link

gftea commented Aug 28, 2024

with tokio version 1.39.2 and console-subscriber v0.4.0, I see such warnings, It seems a incorrect warning

@hds
Copy link
Collaborator

hds commented Aug 28, 2024

@gftea Could you please provide a screenshot showing the tasks that are showing the warnings? And also describe why you believe that it is incorrect?

@gftea
Copy link

gftea commented Aug 28, 2024

Hi, here is one task
image

and the line in 61
image

@hds
Copy link
Collaborator

hds commented Aug 28, 2024

According to tokio console, that task has been idle for over an hour and has never been woken, which would imply that it awaited at line 62 (sig_stream.recv().await) and was never polled after that.

Do you believe the task has actually finished?

@gftea
Copy link

gftea commented Aug 28, 2024

According to tokio console, that task has been idle for over an hour and has never been woken, which would imply that it awaited at line 62 (sig_stream.recv().await) and was never polled after that.

Do you believe the task has actually finished?

yes, this is the task that waiting for TERM/KILL signals, so it would be long waiting tasks until service is shutdown by user, and I tested it, after this is reported, I do send kill signal and the task handle shutdown signal correctly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-instrumentation Area: instrumentation. S-bug Severity: bug
Projects
None yet
Development

No branches or pull requests

9 participants