Use mutex to avoid concurrent btrfs operations #43

Merged: 1 commit into ocurrent:master from the btrfs-maybe-crash-less branch, Nov 24, 2020

@talex5 talex5 commented Nov 18, 2020

Might help with problems such as this:

[11030132.006555] INFO: task ocluster-worker:602217 blocked for more than 120 seconds.
[11030132.015596]       Not tainted 5.4.0-40-generic #44-Ubuntu
[11030132.022547] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11030132.032061] ocluster-worker D    0 602217      1 0x00004000
[11030132.032069] Call Trace:
[11030132.032092]  __schedule+0x2e3/0x740
[11030132.032106]  ? __switch_to_asm+0x40/0x70
[11030132.032116]  ? __switch_to_asm+0x34/0x70
[11030132.032126]  schedule+0x42/0xb0
[11030132.032130]  schedule_preempt_disabled+0xe/0x10
[11030132.032132]  __mutex_lock.isra.0+0x182/0x4f0
[11030132.032142]  ? try_to_del_timer_sync+0x54/0x80
[11030132.032145]  __mutex_lock_slowpath+0x13/0x20
[11030132.032148]  mutex_lock+0x2e/0x40
[11030132.032199]  btrfs_start_delalloc_roots+0x60/0x280 [btrfs]
[11030132.032238]  flush_space+0x5dd/0x740 [btrfs]
[11030132.032281]  ? lock_extent_buffer_for_io+0x370/0x370 [btrfs]
[11030132.032325]  ? __clear_extent_bit+0x201/0x4a0 [btrfs]
[11030132.032372]  priority_reclaim_metadata_space.isra.0+0x18b/0x220 [btrfs]
[11030132.032429]  ? can_overcommit.part.0+0x5f/0xc0 [btrfs]
[11030132.032466]  btrfs_reserve_metadata_bytes+0x578/0x950 [btrfs]
[11030132.032501]  ? btrfs_truncate_inode_items+0x35e/0xdb0 [btrfs]
[11030132.032505]  ? __mutex_lock.isra.0+0x429/0x4f0
[11030132.032557]  ? __btrfs_block_rsv_release+0x1c1/0x300 [btrfs]
[11030132.032595]  btrfs_block_rsv_refill+0x7d/0xa0 [btrfs]
[11030132.032628]  evict_refill_and_join+0x39/0xd0 [btrfs]
[11030132.032670]  btrfs_evict_inode+0x417/0x4c0 [btrfs]
[11030132.032689]  evict+0xd2/0x1b0
[11030132.032698]  iput+0x148/0x210
[11030132.032708]  dentry_unlink_inode+0xc6/0x110
[11030132.032720]  d_delete+0x76/0x80
[11030132.032727]  vfs_rmdir+0x179/0x1a0
[11030132.032732]  do_rmdir+0x18c/0x1c0
[11030132.032736]  __x64_sys_rmdir+0x17/0x20
[11030132.032744]  do_syscall_64+0x57/0x190
[11030132.032747]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
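
The idea in the title, sketched for illustration only: every btrfs operation takes one process-wide lock, so at most one runs at a time. This is not the code from this PR (the diff is not shown here). The names `with_btrfs_lock` and `fake_btrfs_operation` are hypothetical, and the sleep merely stands in for a real snapshot or delete call.

```python
import threading
import time

# One lock shared by every btrfs operation in the process.
_btrfs_lock = threading.Lock()


def with_btrfs_lock(operation, *args):
    """Run `operation(*args)` while holding the single btrfs lock."""
    with _btrfs_lock:
        return operation(*args)


# --- Demo scaffolding: count how many operations overlap. ---
_counter_lock = threading.Lock()
_active = 0
max_active = 0


def fake_btrfs_operation(name):
    """Hypothetical stand-in for a slow btrfs call (snapshot, delete, ...)."""
    global _active, max_active
    with _counter_lock:
        _active += 1
        max_active = max(max_active, _active)
    time.sleep(0.01)  # pretend the filesystem is busy
    with _counter_lock:
        _active -= 1
    return name


def run_demo(n=4):
    """Start n competing operations and report the peak overlap seen."""
    threads = [
        threading.Thread(
            target=with_btrfs_lock,
            args=(fake_btrfs_operation, f"snapshot-{i}"),
        )
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active


if __name__ == "__main__":
    print("peak concurrent btrfs operations:", run_demo())
```

A lock like this only serialises operations issued by this one process. Other processes touching the same filesystem are not covered, and neither is work the kernel does on its own.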

@talex5 talex5 merged commit c22da3b into ocurrent:master Nov 24, 2020
@talex5 talex5 deleted the btrfs-maybe-crash-less branch November 24, 2020 10:20

talex5 commented Nov 26, 2020

Didn't fix it:

[1350642.668227] INFO: task ocluster-worker:1891256 blocked for more than 120 seconds.
[1350642.668294]       Not tainted 5.4.0-52-generic #57-Ubuntu
[1350642.668307] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1350642.668323] ocluster-worker D    0 1891256      1 0x00040000
[1350642.668327] Call Trace:
[1350642.668330] [c00000008c8eb480] [c00000008c8eb530] 0xc00000008c8eb530 (unreliable)
[1350642.668336] [c00000008c8eb660] [c000000000021a1c] __switch_to+0x2dc/0x450
[1350642.668340] [c00000008c8eb6d0] [c000000000eebd5c] __schedule+0x2ec/0x930
[1350642.668343] [c00000008c8eb7b0] [c000000000eec3f8] schedule+0x58/0x130
[1350642.668346] [c00000008c8eb7e0] [c000000000eeca10] schedule_preempt_disabled+0x20/0x30
[1350642.668348] [c00000008c8eb800] [c000000000eeed28] __mutex_lock.isra.0+0x2b8/0x790
[1350642.668379] [c00000008c8eb8c0] [c0080000022f12c0] btrfs_start_delalloc_roots+0x78/0x3f0 [btrfs]
[1350642.668406] [c00000008c8eb970] [c0080000023972ec] flush_space+0x694/0xa80 [btrfs]
[1350642.668433] [c00000008c8eba60] [c008000002397b0c] priority_reclaim_metadata_space+0x234/0x438 [btrfs]
[1350642.668460] [c00000008c8ebad0] [c0080000023996b0] btrfs_reserve_metadata_bytes+0x758/0xb98 [btrfs]
[1350642.668486] [c00000008c8ebbe0] [c00800000239a644] btrfs_block_rsv_refill+0xdc/0x160 [btrfs]
[1350642.668513] [c00000008c8ebc20] [c0080000022d8258] evict_refill_and_join+0x50/0x158 [btrfs]
[1350642.668541] [c00000008c8ebca0] [c0080000022e9b18] btrfs_evict_inode+0x510/0x620 [btrfs]
[1350642.668543] [c00000008c8ebd30] [c0000000004aa330] evict+0x100/0x250
[1350642.668546] [c00000008c8ebd70] [c000000000496848] do_unlinkat+0x258/0x3a0
[1350642.668549] [c00000008c8ebe20] [c00000000000b278] system_call+0x5c/0x68
[1350642.668552] INFO: task ocluster-worker:3594560 blocked for more than 120 seconds.
[1350642.668569]       Not tainted 5.4.0-52-generic #57-Ubuntu
[1350642.668581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1350642.668598] ocluster-worker D    0 3594560      1 0x00040000
[1350642.668600] Call Trace:
[1350642.668601] [c00000001f6b7310] [c00000001f6b73c0] 0xc00000001f6b73c0 (unreliable)
[1350642.668605] [c00000001f6b74f0] [c000000000021a1c] __switch_to+0x2dc/0x450
[1350642.668608] [c00000001f6b7560] [c000000000eebd5c] __schedule+0x2ec/0x930
[1350642.668611] [c00000001f6b7640] [c000000000eec3f8] schedule+0x58/0x130
[1350642.668613] [c00000001f6b7670] [c000000000ef2078] schedule_timeout+0x168/0x1c0
[1350642.668615] [c00000001f6b7740] [c000000000eee538] wait_for_completion+0xf8/0x230
[1350642.668643] [c00000001f6b77d0] [c0080000023019fc] btrfs_wait_ordered_extents+0x2d4/0x410 [btrfs]
[1350642.668671] [c00000001f6b78b0] [c008000002301cd0] btrfs_wait_ordered_roots+0x198/0x340 [btrfs]
[1350642.668698] [c00000001f6b7970] [c008000002397310] flush_space+0x6b8/0xa80 [btrfs]
[1350642.668728] [c00000001f6b7a60] [c008000002397b0c] priority_reclaim_metadata_space+0x234/0x438 [btrfs]
[1350642.668769] [c00000001f6b7ad0] [c0080000023996b0] btrfs_reserve_metadata_bytes+0x758/0xb98 [btrfs]
[1350642.668815] [c00000001f6b7be0] [c00800000239a644] btrfs_block_rsv_refill+0xdc/0x160 [btrfs]
[1350642.668865] [c00000001f6b7c20] [c0080000022d8258] evict_refill_and_join+0x50/0x158 [btrfs]
[1350642.668907] [c00000001f6b7ca0] [c0080000022e9b18] btrfs_evict_inode+0x510/0x620 [btrfs]
[1350642.668912] [c00000001f6b7d30] [c0000000004aa330] evict+0x100/0x250
[1350642.668916] [c00000001f6b7d70] [c000000000496848] do_unlinkat+0x258/0x3a0
[1350642.668922] [c00000001f6b7e20] [c00000000000b278] system_call+0x5c/0x68

talex5 added a commit to talex5/opam-repository that referenced this pull request Dec 30, 2020
CHANGES:

- Add support for nested / multi-stage builds (@talex5 ocurrent/obuilder#48 ocurrent/obuilder#49).
  This allows you to use a large build environment to create a binary and then
  copy that into a smaller runtime environment. It's also useful to get better caching
  if two things can change independently (e.g. you want to build your software and also
  a linting tool, and be able to update either without rebuilding the other).

- Add healthcheck feature (@talex5 ocurrent/obuilder#52).
  - Checks that Docker is running.
  - Does a test build using busybox.

- Clean up left-over runc containers on restart (@talex5 ocurrent/obuilder#53).
  If btrfs crashes and makes the filesystem read-only then after rebooting there will be stale runc directories.
  New jobs with the same IDs would then fail.

- Remove dependency on dockerfile (@talex5 ocurrent/obuilder#51).
  This also allows us more control over the formatting
  (e.g. putting a blank line between stages in multi-stage builds).

- Record log output from docker pull (@talex5 ocurrent/obuilder#46).
  Otherwise, it's not obvious why we've stopped at a pull step, or what is happening.

- Improve formatting of OBuilder specs (@talex5 ocurrent/obuilder#45).

- Use seccomp policy to avoid unnecessary sync operations (@talex5 ocurrent/obuilder#44).
  Sync operations are really slow on btrfs. They're also pointless,
  since if the computer crashes while we're doing a build then we'll just throw it away and start again anyway.
  Use a seccomp policy that causes all sync operations to "fail", with errno 0 ("success").
  On my machine, this reduces the time to `apt-get install -y shared-mime-info` from 18.5s to 4.7s.
  Use `--fast-sync` to enable the new behaviour (it requires runc 1.0.0-rc92).

- Use a mutex to avoid concurrent btrfs operations (@talex5 ocurrent/obuilder#43).
  Btrfs deadlocks enough as it is. Don't stress it further by trying to do two things at once.
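
For the `--fast-sync` entry above, a rough sketch of what such a seccomp section might look like in an OCI runtime configuration, written as a small Python generator. The syscall list and the `SCMP_ACT_ALLOW` default are assumptions made for illustration, since the changelog only says "all sync operations". The `errnoRet` field is what lets a rule "fail" with errno 0, which is presumably why a recent runc is required.

```python
import json

# Assumed set of sync-style syscalls; the changelog does not list them.
SYNC_SYSCALLS = [
    "fsync",
    "fdatasync",
    "msync",
    "sync",
    "syncfs",
    "sync_file_range",
]


def seccomp_policy(fast_sync):
    """Build the `linux.seccomp` section of an OCI runtime config.

    With fast_sync enabled, the listed syscalls are not executed: the
    filter returns immediately with errno 0, so the caller sees success.
    """
    syscalls = []
    if fast_sync:
        syscalls.append(
            {
                "names": SYNC_SYSCALLS,
                "action": "SCMP_ACT_ERRNO",
                "errnoRet": 0,  # report "success" without syncing
            }
        )
    return {
        # Assumption: allow everything else. A real policy may be stricter.
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": syscalls,
    }


if __name__ == "__main__":
    print(json.dumps(seccomp_policy(fast_sync=True), indent=2))
```

Skipping syncs is only reasonable under the condition the changelog gives: a build interrupted by a crash is thrown away and restarted, so nothing depends on its data having reached the disk.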

Internal changes:

- Improve handling of file redirections (@talex5 ocurrent/obuilder#46).
  Instead of making the caller do all the work of closing the file descriptors safely, add an `FD_move_safely` mode.

- Travis tests: ensure apt cache is up-to-date (@talex5 ocurrent/obuilder#50).