Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance trace inline asm in os_common.h #492

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

disconnect3d
Copy link

@disconnect3d disconnect3d commented Jun 14, 2020

This commit fixes an inline assembly block that uses rdtsc instruction in order to trace the SQLite performance.

The issue is that the __asm__ block is not marked as __volatile__ and so an optimizing compiler (e.g. GCC 10.1 with -O3 compilation flag) may optimize out a second call to the hwtime() function, assuming it should return the same value.

This behavior is also described in GCC docs in https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile after the following block:

The following example demonstrates a case where you need to use the volatile qualifier. It uses the x86 rdtsc instruction, which reads the computer’s time-stamp counter. Without the volatile qualifier, the optimizers might assume that the asm block will always return the same value and therefore optimize away the second call.

The issue can also bee seen on https://godbolt.org/z/v_a-Qy

This commit fixes an inline assembly block that uses `rdtsc` instructin in order to trace the SQLite performance.

The issue is that the `__asm__` block is not marked as `__volatile__` and so an optimizing compiler (e.g. GCC 10.1 with -O3 compilation flag) may optimize out a second call to the `hwtime()` function, assuming it should return the same value.

This behavior is also described in GCC docs in https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile after the following block:
> The following example demonstrates a case where you need to use the volatile qualifier. It uses the x86 rdtsc instruction, which reads the computer’s time-stamp counter. Without the volatile qualifier, the optimizers might assume that the asm block will always return the same value and therefore optimize away the second call.

The issue can also bee seen on https://godbolt.org/z/v_a-Qy
@disconnect3d
Copy link
Author

btw this code comes from sqlite and it seems newer sqlite has it fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant