
fix(cstor): Fix the value of CPU_SEQID. Fixes crash on arm64, optimization on amd64. #309

Merged · 3 commits merged into mayadata-io:develop on May 15, 2020

Conversation

@sgielen commented on May 13, 2020

Within the kernel, CPU_SEQID is defined as smp_processor_id(), which returns the
ID of the current processor, between 0 and NR_CPUS. It is used in the ZFS code
to reduce lock contention.

In userspace, it is also meant to reduce lock contention, but it cannot be
defined as the current CPU ID, since there is no portable and fast way to
obtain it. Instead, it is derived from the last bits of pthread_self(), in the
hope that the value differs per thread. Since the resulting value is used to
index into arrays of size boot_ncpus, the number of CPUs in the system, the
value must be lower than boot_ncpus.

Unintentionally, max_ncpus was used instead of boot_ncpus, so the macro could
produce an index beyond the end of those arrays, resulting in an out-of-bounds
read or write.
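
For reference, the pre-fix userspace definition is essentially the following (a sketch reconstructed from the description above, not a verbatim excerpt of the cstor header):

```c
/*
 * Old userspace definition (sketch): mask the raw pthread_self() value
 * with max_ncpus - 1. Where thread IDs are not pointer-aligned, this
 * can yield any index up to max_ncpus - 1, well past the boot_ncpus - 1
 * maximum that the per-CPU arrays actually support.
 */
#define	CPU_SEQID	((uintptr_t)pthread_self() & (max_ncpus - 1))
```

For example (numbers purely illustrative): with max_ncpus = 64 but only 4 CPUs booted, any index in the range 4..63 reads or writes past the end of a boot_ncpus-sized array.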

This was hidden in almost all cases, because pthread_self() appears to return
an aligned address on amd64, meaning the last bits are always zero and
CPU_SEQID was therefore always zero. Besides hiding the bug, this also made the
lock-contention avoidance ineffective.

This commit changes the value of CPU_SEQID in three ways. First, instead of
using the least significant bits of pthread_self(), the value is shifted right
to skip those bits, so that even with aligned addresses the outcome differs per
thread. Second, boot_ncpus is used instead of max_ncpus, bounding the value by
the actual number of CPUs in the system and preventing the out-of-bounds
access. Third, since boot_ncpus is not guaranteed to be a power of two, the
bitwise AND is replaced by a modulus to achieve the same effect. The modulus is
somewhat slower, but since reduced lock contention has a much larger beneficial
effect, I expect performance to improve overall.
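
A minimal standalone sketch of the fixed definition, assuming a shift of 8 bits (the exact shift amount and spelling in the commit may differ), which prints the slot each thread would use:

```c
/* cc -pthread cpu_seqid_demo.c && ./a.out */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static int boot_ncpus = 4;	/* CPUs actually present (illustrative value) */

/*
 * Fixed definition (sketch): shift away the low, always-zero alignment
 * bits of pthread_self(), then bound the result by the real CPU count.
 */
#define	CPU_SEQID	(((uintptr_t)pthread_self() >> 8) % boot_ncpus)

static void *
worker(void *arg)
{
	(void)arg;
	/* The spread across slots depends on how pthread_self() values
	 * are laid out; the hope is that different threads differ. */
	printf("thread %#zx -> CPU_SEQID %zu\n",
	    (size_t)(uintptr_t)pthread_self(), (size_t)CPU_SEQID);
	return (NULL);
}

int
main(void)
{
	pthread_t t[4];

	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	return (0);
}
```

The modulus costs an integer division where the old mask was a single AND, which is the "somewhat slower" trade-off mentioned above.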

For more background, also see openebs/openebs#3028.

Checklist:

  • Fixes #
  • PR Title follows the convention of <type>(<scope>): <subject>
  • Has the change log section been updated?
  • Commit has unit tests
  • Commit has integration tests
  • (Optional) Are upgrade changes included in this PR? If not, mention the issue/PR to track:
  • (Optional) If documentation changes are required, which issue on https://github.com/openebs/openebs-docs is used to track them:

@vishnuitta previously approved these changes on May 14, 2020 and commented:

changes are good

@vishnuitta commented:

CPU_SEQID is used in three areas: dmu, txg and zio.
In zio, it is used to pick the root for async ZIOs. If things were fine with respect to multithreading when only '0' was used, it should still be fine even if a different root is selected.
In txg, the locks in txg_hold_open are fine, so that area is fine.
In dmu, it is used when allocating objects. I haven't fully understood what this piece of code is doing: there is an array of uint64 initialized to 0, and based on the value at the index found through CPU_SEQID, objects are traversed. By the same logic, if things were fine when only '0' was used, starting at different objects should also be fine with respect to multithreading.
cc: @pawanpraka1 @mynktl
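
To illustrate the txg call site mentioned above, here is a rough sketch based on upstream ZFS (the cstor code may differ in detail); in upstream, tx_cpu appears to be allocated with max_ncpus entries in txg_init(), which would explain why the indexing in txg_hold_open() stays in bounds for any CPU_SEQID value:

```c
/*
 * Sketch of the txg use of CPU_SEQID (names taken from upstream ZFS;
 * illustrative, not a verbatim cstor excerpt). The index only selects
 * which per-CPU lock slot to use, spreading lock contention.
 */
tx_state_t *tx = &dp->dp_tx;
tx_cpu_t *tc = &tx->tx_cpu[CPU_SEQID];	/* per-thread lock slot */

mutex_enter(&tc->tc_open_lock);
/* ... account the hold against the currently open txg ... */
```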

@mynktl (Member) previously approved these changes on May 15, 2020 and commented:

lgtm

@pawanpraka1 (Member) previously approved these changes on May 15, 2020 and commented:

looks good.

spa_async_zio_root seems to be allocated using max_ncpus, while we use CPU_SEQID in zio_nowait (https://github.com/openebs/cstor/blob/3823e2d08d881aa6014480dba823ec62bdf49f38/module/zfs/zio.c#L1797). Some of that memory will never be used, but it will not lead to any crash or corruption, since boot_ncpus will never exceed max_ncpus.
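
For context, this pattern looks roughly like the following in upstream ZFS (the cstor code at the linked line should be equivalent, but treat this as a sketch rather than a verbatim excerpt):

```c
/* spa_activate(): one "godfather" root zio per possible CPU slot. */
spa->spa_async_zio_root = kmem_alloc(max_ncpus * sizeof (void *), KM_SLEEP);
for (i = 0; i < max_ncpus; i++) {
	spa->spa_async_zio_root[i] = zio_root(spa, NULL, NULL,
	    ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE | ZIO_FLAG_GODFATHER);
}

/*
 * zio_nowait(): a parentless async zio attaches to one of those roots.
 * With CPU_SEQID now bounded by boot_ncpus, slots boot_ncpus..max_ncpus-1
 * are allocated but never selected: a little wasted memory, no corruption.
 */
pio = spa->spa_async_zio_root[CPU_SEQID];
zio_add_child(pio, zio);
```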

@mynktl added the 'bug' label on May 15, 2020
@mynktl added this to the 1.11 milestone on May 15, 2020
@mynktl (Member) commented on May 15, 2020

Hi @sgielen,

Can you add a changelog commit to this PR? See https://github.com/openebs/cstor/blob/develop/code-standard.md#adding-a-changelog

The commit can be like openebs/velero-plugin@45bceff.

Thanks!

@sgielen dismissed stale reviews from pawanpraka1, mynktl, and vishnuitta via 9410c70 on May 15, 2020 14:36
Signed-off-by: Sjors Gielen <sjors@sjorsgielen.nl>
@sgielen (Author) commented on May 15, 2020, replying to @mynktl's changelog request above:

@mynktl Done! I have included the sign-off for the DCO. Would you like me to add the sign-off to the initial commit as well, or is it OK like this?

@mynktl (Member) commented on May 15, 2020, replying to @sgielen:

You can skip it for the first commit. We do squash and merge, so the DCO from the second commit will be used for the merge commit. cc: @vishnuitta

@vishnuitta left a comment:

changes are good

@vishnuitta merged commit 565eaeb into mayadata-io:develop on May 15, 2020
@sgielen deleted the fix/cpu-seqid-out-of-bounds branch on May 15, 2020 17:28