Add CPUID submodule #37479
As I said in #36367, this should not be a public API since using this to generate code is wrong. A strictly private API would be fine and should just use the existing C function instead.
Also, this is capturing the build-time environment, which is wrong. Note that the selection of the external binary must also be done at runtime, not at package installation time or precompilation time. This is a problem shared by a lot of packages, and it must be fixed before they can be accepted into base/stdlib.
Fix detection time, do not export julia API.
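To make the detection-time point concrete, here is a minimal sketch of the difference between a value captured when a module is compiled and one filled in when it is loaded. The module name `LoaderSketch` and both constants are hypothetical, and `Sys.CPU_NAME` is only a stand-in for whatever query the loader would really perform; none of this is the code in this PR.

```julia
module LoaderSketch  # hypothetical module name, for illustration only

# In a precompiled package this value is fixed on the machine that ran
# precompilation, which is the "build time environment" problem.
const BUILD_TIME_CPU = Sys.CPU_NAME

# Leave the slot empty and fill it every time the module is loaded instead.
const RUNTIME_CPU = Ref{String}()

function __init__()
    # Runs at load time, on the machine that is actually executing the code.
    RUNTIME_CPU[] = Sys.CPU_NAME
end

end # module

println(LoaderSketch.RUNTIME_CPU[])
```

In a plain script both values agree; they can only differ when the compiled module is moved to another machine, which is the case the review is concerned about.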
Sorry, I read through again and it's just not connecting for me. How would this be used for code generation? |
For example, LoopVectorization.jl must not use (and as I understand it currently does use) anything like this to determine what code to generate. And again, having some dirty hack that should work for library loading most of the time is fine, as long as it's clear that no one outside of base should use it and the API may be deleted at any time without notice.
Fair point about detecting the features at runtime. Regarding the API, my understanding is that something is considered "public" if its documentation/docstring is exposed in the manual, which isn't the case in this pull request and wasn't my intention. Regarding the use of |
For now this can just go in the module that does the loading. That'll lower the chance of someone using it.
You can just run a julia script through C preprocessor. We do that for quite a few other headers. |
Any example to look at? |
Lines 24 to 34 in 4a412e3
Codecov Report
@@ Coverage Diff @@
## master #37479 +/- ##
==========================================
+ Coverage 87.54% 87.57% +0.03%
==========================================
Files 351 351
Lines 71009 71009
==========================================
+ Hits 62166 62188 +22
+ Misses 8843 8821 -22
Continue to review full report at Codecov.
Ok, I exported This module is not going to stay in |
fc6e8f7 to 14cd2d1
I've moved the module under the new |
I think the largest piece missing from this is any kind of mapping for aarch64 and armv7l. Once we have those pieces, this looks exactly like what I'm looking for.
I addressed the last two comments |
On the
julia> using Base.BinaryPlatforms.CPUID
julia> for f in filter(n -> startswith(String(n), "JL_AArch64"), names(CPUID; all=true))
           CPUID.test_cpu_feature(getfield(CPUID, f)) && println(f)
       end
JL_AArch64_aes
JL_AArch64_crc
JL_AArch64_sha2
@yuyichao any clue why only these features are set? In particular, the machine is armv8.1-a, shouldn't
julia> CPUID.test_cpu_feature(CPUID.JL_AArch64_v8_1a)
false
ISA version is impossible to detect (it's not a thing, basically). The only way to set it is through guessing. What is the CPU? Looking at the list, the only v8.1a I've seen is thunderx2.... |
# https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html. Keep in sync with
# `arch_march_isa_mapping`.
const ISAs_by_family = Dict(
    "x86_64" => (
Does it make sense to make these enums so you know from the type what the possible values are?
I like Dicts better so that we can add to them via Pkg hooks in the future.
I didn't mean to change the Dict, just the entries, "core2", "nehalem" etc. So you know that the keys are arches, and not say, the life work of Shakespeare, and you can reflect on the instances of the enum.
If we change the keys like "core2" from a string to an enum type, how do we add a new value to that enum type at runtime? What I'm considering is that Julia 1.6 is around for 3+ years, and in that time we want to add new ISAs, which we will be able to do from within a package by running something like Base.BinaryPlatforms.ISAs_by_family["x86_64"]["cannonlake"] = ....
We can already reflect on keys(ISAs_by_family["x86_64"]), what does an enum type give us?
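As a rough illustration of the trade-off being discussed, the sketch below uses a hypothetical table `ISAS_BY_FAMILY` with made-up feature symbols (not the PR's actual `ISAs_by_family` contents) to show a new ISA being registered at runtime and the known names being listed through `keys`. An `@enum` type is closed once defined, so the assignment step has no equivalent there.

```julia
# Hypothetical stand-in for the PR's table; feature names are plain symbols
# chosen for illustration, not the real feature lists.
const ISAS_BY_FAMILY = Dict(
    "x86_64" => Dict(
        "core2"   => Set([:sse3, :ssse3]),
        "nehalem" => Set([:sse3, :ssse3, :sse41, :sse42, :popcnt]),
    ),
)

# A package can register a new ISA after the fact...
ISAS_BY_FAMILY["x86_64"]["haswell"] =
    Set([:sse3, :ssse3, :sse41, :sse42, :popcnt, :avx, :avx2, :fma])

# ...and the set of known names stays reflectable through `keys`.
known = sort(collect(keys(ISAS_BY_FAMILY["x86_64"])))
println(known)
```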
If we want to be able to interactively add new ISAs at runtime from packages then it clearly can't be an enum. I didn't know that was desirable, especially when it seems like there are multiple data structures here that need to be kept in sync, and the features are generated from the C code, which made it look like the data here is the one source of truth. But if that's not the case, then yeah, can't use an enum.
which we will be able to do from within a package by running something like Base.BinaryPlatforms.ISAs_by_family["x86_64"]["cannonlake"] = ....
Note that nothing from this PR will be a stable API.
I changed how arch_march_isa_mapping is defined below, which should make it easier to keep that dictionary in sync with this one.
The features shown in
I'm a bit confused by the comments in Lines 14 to 15 in 8bdf569
aes is implied by aes? The feature at line 15 is called aes (like the one in line 14) but the comment mentions PMULL, is this correct?
.... but what CPU is it...
Yes it is correct. It says |
The list of features was the only information I could find in |
What's the full content of
ThunderX should be v8 |
repeated a bunch of times for all CPUs |
Yes that's a thunderx88 and we should be detecting it correctly. Why do you believe it's v8.1a? |
I was told the machine was a Thunder X2. Heh. Well this is good progress! |
Confirmed that on a Graviton2, this successfully identifies it as |
Since this detection is unreliable (it's based on known names) I would not recommend relying on it. You should check for the features you need instead.
We get to set the features we need; e.g. we're generating binaries in BB by using compiler flags like |
Well, that's the part that needs to be fixed. v8.2a comes with features (don't remember which one off the top of my head) that cannot be reliably detected. |
Fixed by whom? Is it the LLVM detection that is unreliable? |
Fixed by you, to change the compilation flags.
No, there's simply no way at the hardware level to detect which version it is. The only thing available is the feature set. And the linux kernel also does not expose all the features that LLVM may use. |
So is what you're saying that we should avoid the |
Yes. At least until someone can find a way to reliably detect all features in all |
Okay, this we can work with. So you suggest that we look through what is defined in https://github.com/JuliaLang/julia/blob/master/src/features_aarch64.h, take those as the official list of flags that can be reliably detected on AArch64, and map those to what is implied by each |
Yes. |
All of those map to the kernel ABI so that's about as stable as it can be. It's possible that more could be added later for earlier versions but doing that shouldn't break anything.
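A minimal sketch of what "check for the features you need" could look like when picking a binary: each candidate lists the features it requires, and the loader picks the most demanding candidate whose requirements are all detected. The table `CANDIDATES`, its names, its feature symbols, and the helper `best_isa` are all hypothetical; a real list would be derived from what features_aarch64.h can reliably detect.

```julia
# Hypothetical candidates, ordered from least to most demanding.
const CANDIDATES = [
    "generic"      => Set{Symbol}(),
    "with_crc"     => Set([:crc]),
    "with_atomics" => Set([:crc, :lse]),
]

# Return the name of the last candidate whose required features are all
# present in `detected`, or `nothing` if no candidate matches.
function best_isa(candidates, detected)
    best = nothing
    for (name, required) in candidates
        if issubset(required, detected)
            best = name
        end
    end
    return best
end

# Feature set reported on the machine discussed above (aes, crc, sha2).
detected = Set([:aes, :crc, :sha2])
println(best_isa(CANDIDATES, detected))
```

Because the choice depends only on detected features, it does not need to guess an ISA version such as v8.1a from the CPU name.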
As a datapoint, I looked at what Example mappings for I'm going to collect here a list of features that we should map to: I'm going to edit this comment until I'm happy with it. :) First off, some useful resources:
Notably, I see that there are no base Questions:
You can just use our definition. Lines 230 to 341 in 4864161
No, they are just required features. Soft floating point is not supported. I'm not sure what can be tested about something that just isn't supposed to work.
It was "clear" for x86 only because there are changes on the vector registers that somewhat overshadow other changes. There are actually also a few other extensions, like bit manipulation, cache manipulation (prefetch, invalidate, flush), and half precision, that don't really fit into the sse/avx evolution timeline. Of course, since there are only two major players there aren't too many lines of succession. Even then, AMD is known for deprecating old and unused extensions in newer processors and actually removing them, so it's actually not as simple as the illusion one might get from just looking at the vector registers.
On AArch64, the only extra SIMD level that exists is SVE (and SVE2 I guess), so there just isn't the same evolution line that overshadows everything else. Also, SVE may or may not actually work with our compiler, and unless someone gets a chip/server from fujitsu or is working on fugaku you'll not be able to actually use the instructions.
There is at least still another useful enough line of succession on aarch64 though, which is the memory model. 8.1-atomic (i.e. Other than the atomics, there are basically about the same number of other random interesting features. They are just not buried among a million avx512 flavors........ The usable hardware with these extensions is scarce enough that I don't really know which ones people use. (if anyone wants to teach me how to hack a mid-high end phone/tablet so that I can run gdb on it I'll be very interested.............) OTOH, despite the dozens of cores and hundreds of SoC names, the actual unique features are actually pretty limited. If you have a look at the feature lists we have, most cores are basically either v8-a or v8.2-a and potentially with extensions that agree with the cortex-a line fairly well.
Apple seems to be the most aggressive player and is bumping their ISA version on every new chip since a11, but unless someone can figure out how to run linux on there it won't be very relevant for linux libraries. So in the end I'll say just going with what known cores have is probably fine. This is also how I decided to suggest thunderx originally as an additional target, since at the time it seemed like it would be the most immediate "server" platform that people can use. Feature wise this shouldn't give too much variance to support (it seems that
Thanks, that's really helpful Yichao!
It's not about testing with a platform that doesn't have
So it sounds like just two marches for now may be okay:
I'm not sure how important |
Co-authored-by: Elliot Saba <staticfloat@gmail.com>
Well, you are free to ignore these anywhere in the stack if you want copy-paste from Also, the two are always exactly the same according to the arm architecture manual so allowing both will be even more confusing. It's the same reason many other features are commented out, (e.g. |
According to Keno, rcpc is suspected to be required by |
19e5300 to d73df58
This is a small internal module which defines the API to query the Instruction Set Architecture (ISA) of the current CPU. This can be used for example to select an artifact in a JLL package which is compatible with the current CPU.
He's not 100% certain, so we're going to do some experiments to figure out exactly what is required by building |
@yuyichao I didn't see this. It would be great to have an API that provides features that are safe to rely on. A more abstracted API would be generally useful. It'd be great to have a reliable FMA_NATIVE, for example.
The answer from LLVM was wrong as well. |
It solved the issue on Macs, but it is incorrect when starting Julia with Aside from "don't generate machine specific code from Julia at all". Unless I'm mistaken, |
	@echo >> $@
endef

$(BUILDDIR)/features_h.jl: ../src/features_x86.h ../src/features_aarch32.h ../src/features_aarch64.h
Broke the out-of-dir build:
vchuravy@odin ~/b/julia-replloader [2]> make -j
make[1]: *** No rule to make target '../src/features_x86.h', needed by 'features_h.jl'. Stop.
make[1]: *** Waiting for unfinished jobs....
make: *** [/home/vchuravy/src/julia/Makefile:66: julia-base] Error 2
make: *** Waiting for unfinished jobs....
This is a small internal module which defines the API to query the Instruction
Set Architecture (ISA) of the current CPU. This can be used for example to
select an artifact in a JLL package which is compatible with the current CPU.
This is related to #37320, and might change to accommodate some needs of that PR.
Fix #36367. CC: @staticfloat.