When disk is full, after expanded disk ood-daemon run panic #214

Closed
Tracked by #227
lizhihongTest opened this issue Apr 15, 2023 · 3 comments

@lizhihongTest
Collaborator

Describe the bug
After the disk fills up and is then expanded, ood-daemon panics when it runs. The ood-daemon monitor process keeps restarting it, and it throws the same exception each time.

[2023-04-15 08:51:31.967738 +00:00] INFO [ThreadId(1)] [component/cyfs-debug/src/panic/panic.rs:114] stack_hash=c04118763b3f5f6d1046c0ef5df13379
[2023-04-15 08:51:31.967811 +00:00] INFO [ThreadId(1)] [component/cyfs-debug/src/panic/panic.rs:31] thread 'main' panicked at 'assertion failed: `(left != right)`
  left: `Some(ThreadId(1))`,
 right: `Some(ThreadId(1))`': /mnt/e/BuildAgentWork/4a71081484236186/src/component/cyfs-debug/src/check/reentry_checked_mutex.rs:29
0: 0x0000562f87422470
1: 0x0000562f87417400
2: 0x0000562f873afe50
3: 0x0000562f87986990
4: 0x0000562f879868b0
5: 0x0000562f87985000
6: 0x0000562f87986620
7: 0x0000562f86b5e3f0
8: 0x0000562f879abf80
9: 0x0000562f86b2d690
10: 0x0000562f86b93bc0
11: 0x0000562f86c5e590
12: 0x0000562f86c8c250
13: 0x0000562f86c5b8d0
14: 0x0000562f86c8f600
15: 0x0000562f86c97450
16: 0x0000562f86ca4880
17: 0x0000562f86cab0a0
18: 0x0000562f86c6ac20
19: 0x0000562f86bc3890
20: 0x0000562f86cab8c0
21: 0x0000562f86cabf20
22: 0x0000562f86cab780
23: 0x0000562f86ba20f0
24: 0x0000562f86cf8ce0
25: 0x0000562f86caaed0
26: 0x0000562f86b80210
27: 0x0000562f879744e0
28: 0x0000562f86cf8f50
29: 0x00007ffbe7522d10
30: 0x00007ffbe7522dc0
31: 0x0000562f86b5ed50
32: 0x0000000000000000
[2023-04-15 08:51:31.967867 +00:00] INFO [ThreadId(1)] [component/cyfs-debug/src/panic/panic.rs:32] thread 'main' panicked at 'assertion failed: `(left != right)`
  left: `Some(ThreadId(1))`,
 right: `Some(ThreadId(1))`': /mnt/e/BuildAgentWork/4a71081484236186/src/component/cyfs-debug/src/check/reentry_checked_mutex.rs:29
   0: cyfs_debug::panic::manager::PanicManager::start::{{closure}}
   1: std::panicking::rust_panic_with_hook
   2: std::panicking::begin_panic_handler::{{closure}}
   3: std::sys_common::backtrace::__rust_end_short_backtrace
   4: rust_begin_unwind
   5: core::panicking::panic_fmt
   6: core::panicking::assert_failed_inner
   7: core::panicking::assert_failed
   8: cyfs_debug::check::reentry_checked_mutex::ReenterCheckedMutex<T>::lock
   9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  11: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  12: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  13: ood_daemon::daemon::daemon::Daemon::run_check_update::{{closure}}
  14: ood_daemon::main_run::{{closure}}
  15: std::thread::local::LocalKey<T>::with
  16: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  17: async_io::driver::block_on
  18: std::thread::local::LocalKey<T>::with
  19: std::thread::local::LocalKey<T>::with
  20: std::thread::local::LocalKey<T>::with
  21: async_std::task::builder::Builder::blocking
  22: ood_daemon::main
  23: std::sys_common::backtrace::__rust_begin_short_backtrace
  24: std::rt::lang_start::{{closure}}
  25: std::rt::lang_start_internal
  26: main
  27: <unknown>
  28: __libc_start_main
  29: _start

ood-daemon_3872228_rCURRENT.log

@lizhihongTest lizhihongTest added the bug Something isn't working label Apr 15, 2023
@lurenpluto lurenpluto added the OOD-daemon The OOD-daemon basic service label Apr 15, 2023
@lurenpluto
Member

This panic is raised by the Mutex reentry check in cyfs-debug. When the code enters the error branch, it tries to lock the same Mutex again while the guard from the first lock() is still alive. So even if ReenterCheckedMutex were not used here, the second lock would still deadlock.

let ret = self.info.lock().unwrap().load_package_file(&file).map_err(|e| {
    self.info.lock().unwrap().state = ServicePackageLocalState::Invalid;
    e
})?;

Besides this bug, other similar logic in the ood-daemon service may also need to be reviewed to avoid the same kind of problem.
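
For illustration, here is a minimal sketch of one way to avoid the reentry: take the lock once, keep the guard in a named binding, and reuse it on the error path. This is not necessarily how the real code was changed. Only `info`, `load_package_file` and `ServicePackageLocalState::Invalid` come from the snippet above; `Service`, `PackageInfo`, the other enum variants and the method bodies are hypothetical stand-ins, and a plain `std::sync::Mutex` is assumed in place of `ReenterCheckedMutex`.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the real state enum; only `Invalid` appears in the issue.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ServicePackageLocalState {
    Init,
    Ready,
    Invalid,
}

// Hypothetical stand-in for the data protected by `self.info`.
pub struct PackageInfo {
    pub state: ServicePackageLocalState,
}

impl PackageInfo {
    // Dummy loader: fails on an empty file name, succeeds otherwise.
    pub fn load_package_file(&mut self, file: &str) -> Result<String, String> {
        if file.is_empty() {
            Err("cannot load package file".to_string())
        } else {
            self.state = ServicePackageLocalState::Ready;
            Ok(format!("loaded {}", file))
        }
    }
}

pub struct Service {
    info: Mutex<PackageInfo>,
}

impl Service {
    pub fn new() -> Self {
        Service {
            info: Mutex::new(PackageInfo {
                state: ServicePackageLocalState::Init,
            }),
        }
    }

    pub fn load(&self, file: &str) -> Result<String, String> {
        // Lock once and keep the guard; the error path reuses this guard
        // instead of calling `self.info.lock()` a second time.
        let mut info = self.info.lock().unwrap();
        let ret = info.load_package_file(file);
        if ret.is_err() {
            info.state = ServicePackageLocalState::Invalid;
        }
        ret
    }

    pub fn state(&self) -> ServicePackageLocalState {
        self.info.lock().unwrap().state
    }
}
```

With this shape, `Service::new().load("")` returns an error and leaves the state as `Invalid` without locking twice. In the original snippet, the temporary guard created by the first `lock()` lives until the end of the whole `let` statement, which is why the second `lock()` inside `map_err` runs while the Mutex is still held.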

@lurenpluto
Member

This issue has been fixed in commit 1f76898.

Another thing to note: when the disk containing the {cyfs} directory is completely full, many other problems may appear. We may need to construct similar extreme environments for testing to see how the services behave under these conditions.

@lurenpluto lurenpluto self-assigned this Apr 15, 2023
@lurenpluto lurenpluto added the Panic Panic related issues label Apr 15, 2023
@lizhihongTest
Collaborator Author

This issue has been tested and verified in version 1.1.0.751.
