cmd/asm: add neon, vector instructions for arm #7300
Comments
This will not happen for the 1.5 release.
I agree this would be useful, and I apologize that we haven't had a chance to do it yet. Note that if you really need the instructions you can figure out what the encodings are (for example using the GNU assembler) and then use WORD directives to insert them in your assembly. I know that's less than ideal, but it's a workaround. Right now there's more we'd like to do than we have bandwidth for, so the reality is that this one is unplanned.
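To illustrate the WORD workaround described above, here is a minimal sketch in Go that computes the machine encoding of one NEON instruction and prints it as a WORD directive that could be pasted into a Go assembly file. The helper name `encodeVADDI32` is hypothetical, and the bit layout is my reading of the 32-bit ARM encoding of integer VADD, so check the result against GNU assembler output before relying on it.

```go
package main

import "fmt"

// encodeVADDI32 is a hypothetical helper that returns what should be the
// 32-bit ARM encoding of "vadd.i32 Qd, Qn, Qm" (q0..q15).
// Assumption: the field layout below matches the A32 integer VADD encoding;
// confirm it by assembling the same instruction with the GNU assembler.
func encodeVADDI32(qd, qn, qm uint32) uint32 {
	// Each Q register aliases a pair of D registers. The encoding stores
	// the even D register number (2*q) as a 4-bit field plus one high bit.
	d, n, m := 2*qd, 2*qn, 2*qm

	word := uint32(0xF2000800)       // fixed opcode bits for integer VADD
	word |= 2 << 20                  // size = 0b10: 32-bit lanes
	word |= 1 << 6                   // Q = 1: operate on 128-bit registers
	word |= (d>>4)<<22 | (d&0xF)<<12 // destination: D bit and Vd field
	word |= (n>>4)<<7 | (n&0xF)<<16  // first operand: N bit and Vn field
	word |= (m>>4)<<5 | (m & 0xF)    // second operand: M bit and Vm field
	return word
}

func main() {
	// Print a line suitable for a Go assembly (.s) file for ARM.
	fmt.Printf("WORD $0x%08X // vadd.i32 q0, q0, q1\n", encodeVADDI32(0, 0, 1))
}
```

Keeping the mnemonic in a trailing comment next to each WORD makes the assembly easier to review, since the raw hex alone says nothing about intent.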
... however, 'implementation' in this case just means falling back to the pure Go implementation, because Go assembly does not yet support ARM NEON instructions (which are the equivalent of SSE). See golang/go#7300
This change adds a package, chacha20poly1305, which implements the ChaCha20-Poly1305 AEAD from RFC 7539.

This AEAD has several attractive features:
1. It's naturally constant time. AES-GCM needs either dedicated hardware or extreme effort to be fast and constant-time, while this design is easy to make constant-time.
2. It's fast on modern processors: it runs at 1 GB/s on my Ivy Bridge system.
3. It's seeing significant use in TLS. (A change for crypto/tls is forthcoming.)

This change merges two CLs:
https://go-review.googlesource.com/#/c/24717
https://go-review.googlesource.com/#/c/26691

I took the amd64-optimised AEAD implementation from the former because it was significantly faster. But the structure of the change is taken from the latter.

This version will be checked into x/crypto. This package will then be vendored into the stdlib so that it can be used from crypto/tls.

Change-Id: I5a60587958b7afeec81ca1091e603a7e8517000b
Reviewed-on: https://go-review.googlesource.com/30728
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This conversion used to take about 3s (for 4960x7016 RGB pixels). With the new code it takes about 300ms.

More potential for improvement: we could run this code while reading pixels via USB. I haven’t looked at the USB timing in detail, but my guess is that we could squeeze this post-processing into the time between requesting data from the device and receiving the data from the kernel. If that doesn’t work out, we could parallelize and post-process the previous buffer while reading the current buffer.

Note that we need to use the WORD instruction because the Go assembler lacks support for the NEON instructions, see golang/go#7300

related to issue #7
Any clue how much effort is needed to implement support for NEON? The "quick guide to Go's assembler" says that updating Go's assembler is "straightforward", but I'm looking for more details. Could someone point me to a PR/diff with a similar implementation that has already been done (for example, SIMD for Intel)?
@henrydcase a change adding SSE4: https://golang.org/cl/57470
Specialized code also has to feature-detect for NEON, so a flag needs to be added to internal/cpu (and correspondingly x/sys/cpu) for
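The feature-detection point can be sketched as a dispatch between a specialized routine and a generic fallback. This is only an illustration of the pattern: `hasNEON`, `sumNEON`, and `sumGeneric` are hypothetical names, the flag is a plain variable standing in for whatever internal/cpu would expose, and the "NEON" path here is ordinary Go rather than real assembly.

```go
package main

import "fmt"

// hasNEON is a hypothetical stand-in for a CPU feature flag. In real code
// it would be set once at startup from the platform's feature detection.
var hasNEON = false

// sumGeneric is the portable pure Go fallback.
func sumGeneric(xs []uint32) uint32 {
	var total uint32
	for _, x := range xs {
		total += x
	}
	return total
}

// sumNEON stands in for a routine that would be written in assembly.
// Assumption: it must return exactly the same result as sumGeneric, so
// this sketch simply delegates to it.
func sumNEON(xs []uint32) uint32 {
	return sumGeneric(xs)
}

// sum picks the specialized routine only when the feature flag says the
// CPU supports it, and otherwise falls back to the generic code.
func sum(xs []uint32) uint32 {
	if hasNEON {
		return sumNEON(xs)
	}
	return sumGeneric(xs)
}

func main() {
	fmt.Println(sum([]uint32{1, 2, 3}))
}
```

Checking the flag at the call site keeps the generic implementation available on every platform, which is the fallback behaviour mentioned earlier in this thread.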
I ran benchmarks on an M1, a 10th Gen Core i5, and a Ryzen 9 using https://github.com/SimonWaldherr/golang-benchmarks, and got interesting results.
I think the native M1 Go implementation doesn't use NEON, whereas Rosetta 2 translates SSE instructions into NEON. I read the hash/crc32 code, and only amd64.s uses SIMD instructions. So I suppose this issue is important for improving benchmark results on ARM.
Any news here?
I second your statement. It's 2023, ARM is getting more popular by the day, and ARM servers are now available on all three major cloud service providers. On top of that, Apple Silicon's performance is absolutely phenomenal; golang with NEON support on Apple Silicon would be just amazing. Rust already supports it, and I think it's high time golang supported it too.
by byron.rakitzis: