-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc][docs] codify Policy on Assembler Sources #88185
Conversation
@llvm/pr-subscribers-libc Author: Nick Desaulniers (nickdesaulniers) ChangesIt would be helpful in future code reviews to document a policy with regards to Link: #87837 Full diff: https://github.com/llvm/llvm-project/pull/88185.diff 1 Files Affected:
diff --git a/libc/docs/dev/code_style.rst b/libc/docs/dev/code_style.rst
index 22a18b7a4cc1dd..a87e971f4dd8e9 100644
--- a/libc/docs/dev/code_style.rst
+++ b/libc/docs/dev/code_style.rst
@@ -186,3 +186,32 @@ We expect contributions to be free of warnings from the `minimum supported
compiler versions`__ (and newer).
.. __: https://libc.llvm.org/compiler_support.html#minimum-supported-versions
+
+Policy on Assembler sources
+===========================
+
+Coding in high level languages such as C++ provides benefits relative to low
+level languages like Assembler, such as:
+
+* Improved safety
+* Instrumentation
+
+ * Code coverage
+ * Profile collection
+* Sanitization
+* Debug info
+
+While its not impossible to have Assembler code that correctly provides all of
+the above, we do not wish to maintain such Assembler sources in llvm-libc.
+
+That said, there a few functions provided by llvm-libc that are more difficult
+to implement or maintain in C++ than Assembler. We do use inline or out-of-line
+Assembler in an intentionally minimal set of places; typically places where the
+stack or individual register state must be manipulated very carefully for
+correctness.
+
+Contributions adding Assembler for performance are not welcome. Contributors
+should strive to stick with C++ for as long as it remains reasonable to do so.
+llvm-libc maintainers reserve the right to reject Assembler contributions that
+could they feel could be better maintained if rewritten in C++, and to revisit
+this policy in the future.
|
libc/docs/dev/code_style.rst
Outdated
That said, there a few functions provided by llvm-libc that are more difficult | ||
to implement or maintain in C++ than Assembler. We do use inline or out-of-line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, there a few functions provided by llvm-libc that are more difficult | |
to implement or maintain in C++ than Assembler. We do use inline or out-of-line | |
That said, there a few functions provided by llvm-libc that are impossible | |
to reliably implement in C++. We do use inline or out-of-line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add, "...to reliably implement in C++ for all compilers supported for building the libc."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's a point worth making explicit --- functions like this were hard enough for bionic to deal with, and we only had to support gcc or clang at any given time. just the different versions of the same compiler were painful enough, without having to worry about two completely different compilers. (and we did junk several such functions as we transitioned from gcc to clang!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reworded this in c06aa9d3d5ee to:
That said, there a few functions provided by llvm-libc that are impossible
to reliably implement in C++ for all compilers supported for building
llvm-libc.
cc @dpxf |
libc/docs/dev/code_style.rst
Outdated
stack or individual register state must be manipulated very carefully for | ||
correctness. | ||
|
||
Contributions adding Assembler for performance are not welcome. Contributors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're not carving out an exception for the string/memory functions? or just punting that down the road until commercial users show up?
explicitly mention that anyone wanting to go that route should consider whether they can do it via a builtin? (like all the math functions that turn into a single instruction on modern architectures, or stuff like ffs() or abs() etc?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have commercial users of the libc's string/memory functions (i.e. Google production).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aye, but google3 is a bit of a monoculture... i meant something more like Android, where every SoC vendor will want workarounds for their SoC's (and SoCs') specific characteristics. commercial in the sense of "where people sell products based on benchmark scores", for better and -- mostly -- worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new suggested edit, in general if you can write "better" assembly than the compiler can generate, we should teach the compiler to generate the better code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on teaching compilers to generate better code.
OTOH, as a library, we do have to support older compiler versions than the latest ones, so maybe in case of performance-critical, we can consider case-by-case with:
- upstream compilers' issues / PRs to improve the generated codes,
- deprecation timeline / plan when the minimum supported compilers' versions get updated,
- something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's easier for Android where we explicitly only support one version of one compiler at any given time.
in my experience though you're unlikely to have people actually take you up on this. certainly not the same people who'll want to give you -- naming no names -- "a binary blob in ASCII form, containing the output of our proprietary compiler on source that may or may not be the same as your source", who are the usual people i have to tell to go away and improve llvm instead :-) (this is typically for libm stuff.)
the libc stuff is always the string/memory routines, and there it's going to be stuff like ARM-software/optimized-routines#68 that you're not going to want anyway. luckily it's been quite quiet for us for a decade, as arm64 was effectively single-source, arm32 was dead, and x86/x86-64 weren't commercially significant (though we do have all the usual s*e*[2-4]
and avx\d+
variants anyway). personally i'd be very interested to see how many of those you can make go away with function multiversioning. (but probably never the non-temporal stuff, and convincing people who're paid to make benchmarks go up that making benchmarks go up is useless, well, "It is difficult to get a man to understand something when his salary depends upon his not understanding it".)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The document as it stands sounds like it prohibits the existing op_x86.h's usage,
LIBC_INLINE static void repmovsb(void *dst, const void *src, size_t count) {
asm volatile("rep movsb" : "+D"(dst), "+S"(src), "+c"(count) : : "memory");
}
Is that the intent? IMO that's too strict.
I'd say there's 3 levels:
- Very short, targeted, inline-asm statements (only a couple of instructions).
- Longer inline-asm statements.
- Functions written entirely in asm (whether standalone asm files or "naked" function).
All levels should require some justification, but each should requires substantially greater justification than the previous. Level 3, I think is widely agreed, should be outright prohibited except when it's required for correctness.
More flexibility might be reasonable for the others: e.g. level 1 should perhaps be acceptable simply if the desired instruction is not readily accessible as an intrinsic today (e.g. rep movsb
, cacheline management, potentially things like specialized math instructions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! I didn't realize we did have inline asm for these kind of optimizations. @gchatelet is attending EuroLLVM this week, but maybe next week he can catch up on this thread and help us draw the distinction of precisely where we want to draw the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the time the semantics are accessible through compiler builtins (e.g. prefetch instructions) but there are situations where its impossible to access a specific CPU instruction from C++ (e.g. rep movsb
, dc zva
).
In principle, I agree with @dpxf that we should change the compiler and make sure we have a way to reach out to special semantics from the frontend (e.g. introduction of __builtin_memcpy_inline
) but in practice, the fact that we have to support old versions of both Clang and GCC makes it much harder. For instance, inline_memcpy
has a special path for Clang and a fallback for GCC. I'm not digging into the details here but this can lead to suboptimal codegen for GCC when generating memcpy
, it also requires that the GCC version is compiled with -fno-builtin-memcpy
, where the Clang version doesn't. So, if -in principle- we should improve the compiler, the effort to get both Clang and GCC inline on these intrinsics is substantial, notwithstanding the compiler release cycles, fallback code and build systems flags we'll have to maintain to support older versions.
So practically, for these admittedly rare cases, I think inline asm
strikes a good balance between portability, maintenance and functionality. The layered approach from #88185 (comment) makes perfect sense to me. How about also requiring that such PRs containing asm
statements should always be reviewed by at least two project maintainers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all for the feedback. I've updated this in 0749f082f33b30fb3fdd5f8888177c4ade17dd6b. PTAL.
I now mention:
-
instances where a specific instruction sequence does not have a corresponding compiler builtin function today.
-
Contributions adding functions implemented purely in Assembly for performance are not welcome.
-
Ideally, bugs should be filed against compiler vendors, and links to those bug reports should appear in commit messages or comments that seek to add Assembly to llvm-libc.
-
Patches containing any amount of Assembly ideally should be approved by 2 maintainers.
libc/docs/dev/code_style.rst
Outdated
stack or individual register state must be manipulated very carefully for | ||
correctness. | ||
|
||
Contributions adding Assembly for performance are not welcome. Contributors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"...are not welcome. If you feel you can write better assembly than the compiler can generate, file a bug against LLVM, as we should instead be teaching the compiler to generate the better code."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the wording here in 0749f08
Ideally, bugs should be filed against compiler vendors, and links to those bug reports should appear in commit messages or comments that seek to add Assembly to llvm-libc.
PTAL
Thanks all for the feedback so far. PTAL at the latest revision, and provide any feedback. I appreciate the discussion; there do not seem to be any strongly dissenting view points expressed in this thread so far. |
I plan to merge this Monday April 22nd unless there's additional feedback from reviewers by then. nudge nudge wink wink Edit: that's not a deadline for having a policy set in stone. We can always revisit and revise this later. It's more helpful to have some ground rules written down so we can proceed with fixing setjmp+longjmp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after fixing the typo gchatelet pointed out.
It would be helpful in future code reviews to document a policy with regards to where and when Assembler sources are appropriate. That way when reviewers point out infractions, they can point to this written policy, which may help contributors understand that it's not the solely personal preferences of reviewers but rather a previously agreed upon rule by maintainers. Link: llvm#87837 Link: llvm#88157 Link: https://discourse.llvm.org/t/hand-written-in-assembly-in-libc-setjmp-longjmp/73249/12
a9c0ce3
to
280255d
Compare
It would be helpful in future code reviews to document a policy with regards to
where and when Assembler sources are appropriate. That way when reviewers point
out infractions, they can point to this written policy, which may help
contributors understand that it's not solely the personal preferences of
individual reviewers but instead rather a previously agreed upon rule by maintainers.
Link: #87837
Link: #88157
Link: https://discourse.llvm.org/t/hand-written-in-assembly-in-libc-setjmp-longjmp/73249/12