
fix(cstor): Fix the value of CPU_SEQID. Fixes crash on arm64, optimization on amd64. #309

Merged · 3 commits merged into mayadata-io:develop on May 15, 2020

Conversation

@sgielen commented on May 13, 2020

Within the kernel, CPU_SEQID is defined as smp_processor_id(), which returns the
ID of the current processor, between 0 and NR_CPUS. It is used in the ZFS code
to reduce lock contention.

In userspace, it is also meant to reduce lock contention, but it cannot be
defined as the current CPU ID, since there is no portable and fast way to
obtain it. Instead, it is derived from the last bits of pthread_self(), in the
hope that the value differs per thread. Since the resulting value is used to
index into arrays of size boot_ncpus, the number of CPUs in the system, the
value must be lower than boot_ncpus.

Unintentionally, max_ncpus was used instead of boot_ncpus, so the macro could
produce an index beyond the end of those arrays, resulting in an out-of-bounds
read or write.
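
For reference, the pre-fix userspace definition is essentially the following (a sketch reconstructed from the description above, not a verbatim excerpt of the cstor header):

```c
/*
 * Old userspace definition (sketch): mask the raw pthread_self() value
 * with max_ncpus - 1. Where thread IDs are not pointer-aligned, this
 * can yield any index up to max_ncpus - 1, well past the boot_ncpus - 1
 * maximum that the per-CPU arrays actually support.
 */
#define	CPU_SEQID	((uintptr_t)pthread_self() & (max_ncpus - 1))
```

For example (numbers purely illustrative): with max_ncpus = 64 but only 4 CPUs booted, any index in the range 4..63 reads or writes past the end of a boot_ncpus-sized array.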

This was hidden in almost all cases, because pthread_self() appears to return
an aligned address on amd64, meaning the last bits are always zero and
CPU_SEQID was therefore always zero. Besides hiding the bug, this also made the
lock-contention avoidance ineffective.

This commit changes the value of CPU_SEQID in three ways. First, instead of
using the least significant bits of pthread_self(), the value is shifted right
to skip those bits, so that even with aligned addresses the outcome differs per
thread. Second, boot_ncpus is used instead of max_ncpus, bounding the value by
the actual number of CPUs in the system and preventing the out-of-bounds
access. Third, since boot_ncpus is not guaranteed to be a power of two, the
bitwise AND is replaced by a modulus to achieve the same effect. The modulus is
somewhat slower, but since reduced lock contention has a much larger beneficial
effect, I expect performance to improve overall.
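
A minimal standalone sketch of the fixed definition, assuming a shift of 8 bits (the exact shift amount and spelling in the commit may differ), which prints the slot each thread would use:

```c
/* cc -pthread cpu_seqid_demo.c && ./a.out */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static int boot_ncpus = 4;	/* CPUs actually present (illustrative value) */

/*
 * Fixed definition (sketch): shift away the low, always-zero alignment
 * bits of pthread_self(), then bound the result by the real CPU count.
 */
#define	CPU_SEQID	(((uintptr_t)pthread_self() >> 8) % boot_ncpus)

static void *
worker(void *arg)
{
	(void)arg;
	/* The spread across slots depends on how pthread_self() values
	 * are laid out; the hope is that different threads differ. */
	printf("thread %#zx -> CPU_SEQID %zu\n",
	    (size_t)(uintptr_t)pthread_self(), (size_t)CPU_SEQID);
	return (NULL);
}

int
main(void)
{
	pthread_t t[4];

	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	return (0);
}
```

The modulus costs an integer division where the old mask was a single AND, which is the "somewhat slower" trade-off mentioned above.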

For more background, also see openebs/openebs#3028.

Checklist:

  • Fixes #
  • PR Title follows the convention of <type>(<scope>): <subject>
  • Has the change log section been updated?
  • Commit has unit tests
  • Commit has integration tests
  • (Optional) Are upgrade changes included in this PR? If not, mention the issue/PR to track:
  • (Optional) If documentation changes are required, which issue on https://github.com/openebs/openebs-docs is used to track them:

@vishnuitta previously approved these changes on May 14, 2020 and commented:

changes are good

@vishnuitta commented:

CPU_SEQID is used in three areas: dmu, txg and zio.
In zio, it is used to pick the root for async ZIOs. If things were fine with respect to multithreading when only '0' was used, it should still be fine even if a different root is selected.
In txg, the locks in txg_hold_open are fine, so that area is fine.
In dmu, it is used when allocating objects. I haven't fully understood what this piece of code is doing: there is an array of uint64 initialized to 0, and based on the value at the index found through CPU_SEQID, objects are traversed. By the same logic, if things were fine when only '0' was used, starting at different objects should also be fine with respect to multithreading.
cc: @pawanpraka1 @mynktl
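
To illustrate the txg call site mentioned above, here is a rough sketch based on upstream ZFS (the cstor code may differ in detail); in upstream, tx_cpu appears to be allocated with max_ncpus entries in txg_init(), which would explain why the indexing in txg_hold_open() stays in bounds for any CPU_SEQID value:

```c
/*
 * Sketch of the txg use of CPU_SEQID (names taken from upstream ZFS;
 * illustrative, not a verbatim cstor excerpt). The index only selects
 * which per-CPU lock slot to use, spreading lock contention.
 */
tx_state_t *tx = &dp->dp_tx;
tx_cpu_t *tc = &tx->tx_cpu[CPU_SEQID];	/* per-thread lock slot */

mutex_enter(&tc->tc_open_lock);
/* ... account the hold against the currently open txg ... */
```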

@mynktl (Member) previously approved these changes on May 15, 2020 and commented:

lgtm

@pawanpraka1 (Member) previously approved these changes on May 15, 2020 and commented:

looks good.

spa_async_zio_root seems to be allocated using max_ncpus, while we use CPU_SEQID in zio_nowait (https://github.com/openebs/cstor/blob/3823e2d08d881aa6014480dba823ec62bdf49f38/module/zfs/zio.c#L1797). Some of that memory will never be used, but it will not lead to any crash or corruption, since boot_ncpus will never exceed max_ncpus.
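
For context, this pattern looks roughly like the following in upstream ZFS (the cstor code at the linked line should be equivalent, but treat this as a sketch rather than a verbatim excerpt):

```c
/* spa_activate(): one "godfather" root zio per possible CPU slot. */
spa->spa_async_zio_root = kmem_alloc(max_ncpus * sizeof (void *), KM_SLEEP);
for (i = 0; i < max_ncpus; i++) {
	spa->spa_async_zio_root[i] = zio_root(spa, NULL, NULL,
	    ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE | ZIO_FLAG_GODFATHER);
}

/*
 * zio_nowait(): a parentless async zio attaches to one of those roots.
 * With CPU_SEQID now bounded by boot_ncpus, slots boot_ncpus..max_ncpus-1
 * are allocated but never selected: a little wasted memory, no corruption.
 */
pio = spa->spa_async_zio_root[CPU_SEQID];
zio_add_child(pio, zio);
```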

@mynktl added the 'bug' label on May 15, 2020
@mynktl added this to the 1.11 milestone on May 15, 2020
@mynktl (Member) commented on May 15, 2020

Hi @sgielen,

Can you add a changelog commit to this PR? See https://github.com/openebs/cstor/blob/develop/code-standard.md#adding-a-changelog

The commit can be like openebs/velero-plugin@45bceff.

Thanks!

@sgielen dismissed stale reviews from pawanpraka1, mynktl, and vishnuitta via 9410c70 on May 15, 2020 14:36
Signed-off-by: Sjors Gielen <sjors@sjorsgielen.nl>
@sgielen (Author) commented on May 15, 2020, replying to @mynktl's changelog request above:

@mynktl Done! I have included the sign-off for the DCO. Would you like me to add the sign-off to the initial commit as well, or is it OK like this?

@mynktl (Member) commented on May 15, 2020, replying to @sgielen:

You can skip it for the first commit. We do squash and merge, so the DCO from the second commit will be used for the merge commit. cc: @vishnuitta

@vishnuitta left a comment:

changes are good

@vishnuitta merged commit 565eaeb into mayadata-io:develop on May 15, 2020
@sgielen deleted the fix/cpu-seqid-out-of-bounds branch on May 15, 2020 17:28