FPU detected but no SIMD optimized encryption #9215
Comments
To test I'd suggest finding (or making) a large encrypted file, bringing the machine to idle, and doing If you can't find that function in the output or it's named something different then yeah you're probably not using AES-NI. Avoiding the ARC for this test is up to you. |
Thanks, I'm seeing |
Fletcher 4 is indeed SIMD optimized, |
To add to this,
Since I use The ZFS module is |
@AttilaFueloep if you're running f09fda5 then no other changes should be needed. You can check which optimized versions are available by checking the contents of the following files.
Micro-benchmarks indicating which version was determined to be the fastest are also available for Fletcher 4 and RAIDZ.
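As a rough illustration of how the contents of such a selector file can be read, here is a minimal Python sketch. It assumes the file holds one line of space-separated implementation names with the active one in square brackets, for example `cycle [fastest] generic x86_64 aesni`. The helper name `parse_impl_line` and the sample string are illustrative assumptions, not taken from this thread, and the exact file locations vary between ZFS versions.

```python
def parse_impl_line(text):
    """Split a selector line into (active, available).

    Assumed format: space-separated names, the active one in [brackets].
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]   # strip the brackets marking the active entry
            active = token
        available.append(token)
    return active, available


# Sample line (an assumption about the format, not output from a real system).
active, available = parse_impl_line("cycle [fastest] generic x86_64 aesni")
print("active:", active)
print("available:", ", ".join(available))
```

If an accelerated name such as `aesni` is missing from the available list, the module did not detect the CPU feature, which is a different problem from the one discussed here.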
Encryption is a slightly different story: no micro-benchmarks are run for AES or GCM. When an optimized version is available it is assumed to perform better than the generic code and is used preferentially. There is one caveat: the accelerated version won't be used when first decrypting the wrapping key as part of If that's not the case, we'll certainly want to dig deeper. Note: |
@behlendorf First of all, thank you for your detailed explanations. I've no idea how I managed to miss the fact that ZoL uses illumos crypto; usually I do know that. The It took me a while to sort this out, but I've a reproducer now. On a freshly booted system with a mostly idle desktop, run the following ($pool has mountpoint=none, not sure if this matters)
While
If you do an unmount/mount cycle of
GCM seems to pick up the expected implementation. |
@AttilaFueloep I was able to reproduce this issue locally and understand the issue. I'll see about putting together a patch. |
I've had Encrypted ZFS slow down very significantly and I think I have the same issue:
|
@lovesegfault The issue here is that newly created datasets do not pick up the fastest AES implementation until the dataset is remounted or the machine is rebooted. What implementation do you see getting used if you follow DeHackEd's suggestion or the first part of my reproducer? What bandwidth are you observing, and on what hardware? I'm seeing 500 MB/s with all cores at 100% regardless of the AES implementation used. It does seem that the GCM calculations are the limiting factor. I'm currently taking a stab at speeding things up; let's see how this goes. |
Thanks for commenting, I should have a patch ready for testing by the end of the week. |
Take your time, it's easy to work around. |
@AttilaFueloep I use ZFS as my root drive and can't just unmount/remount, so this issue basically means my whole system is always slow. It's annoying, but at least everything continues to work :) |
I do as well, yet I'm still seeing the SIMD versions getting used. It's hard to tell more without knowing any details. |
@behlendorf I mentioned this in the PR, but I figured this is a better place: it seems to me that somehow the SIMD algos are still not being picked. |
When adding the SIMD compatibility code in e5db313 the decryption of a dataset wrapping key was left in a user thread context. This was done intentionally since it's a relatively infrequent operation. However, this also meant that the encryption context templates were initialized using the generic operations. Therefore, subsequent encryption and decryption operations would use the generic implementation even when executed by an I/O pipeline thread.

Resolve the issue by initializing the context templates in an I/O pipeline thread. And by updating zio_do_crypt_uio() to dispatch any encryption operations to a pipeline thread when called from the user context. For example, when performing a read from the ARC.

Tested-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#9215
Closes openzfs#9296
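The mechanism described in the commit message can be pictured with a small toy model. This is only a sketch in Python, not the ZFS code. Every name in it (`ThreadCtx`, `CtxTemplate`, `do_crypt`, `load_key_buggy`, `load_key_fixed`) is hypothetical, and it assumes nothing beyond what the message states: a template created in a user thread context gets the generic operations, and later operations reuse whatever the template holds.

```python
from dataclasses import dataclass

GENERIC = "generic"
SIMD = "aesni"


@dataclass(frozen=True)
class ThreadCtx:
    """Hypothetical stand-in for the context a piece of code runs in."""
    name: str
    simd_allowed: bool


USER_THREAD = ThreadCtx("user", simd_allowed=False)
PIPELINE_THREAD = ThreadCtx("I/O pipeline", simd_allowed=True)


class CtxTemplate:
    """Toy context template: the implementation is chosen once, at creation."""

    def __init__(self, creator):
        self.impl = SIMD if creator.simd_allowed else GENERIC


def do_crypt(template, caller):
    """Return the implementation an operation would use.

    The caller is deliberately ignored: the choice is frozen in the template.
    """
    return template.impl


def load_key_buggy(caller):
    # The template is built in whatever context loads the key (a user thread).
    return CtxTemplate(caller)


def load_key_fixed(caller):
    # Template creation is handed to a pipeline thread, whoever the caller is.
    return CtxTemplate(PIPELINE_THREAD)


buggy = load_key_buggy(USER_THREAD)
fixed = load_key_fixed(USER_THREAD)
print(do_crypt(buggy, PIPELINE_THREAD))  # generic
print(do_crypt(fixed, PIPELINE_THREAD))  # aesni
```

In this model the buggy path keeps returning the generic implementation even for a pipeline-thread caller, which mirrors the slow reads reported in this issue.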
System information
Describe the problem you're observing
It's my understanding that with the integration of #8965 SIMD support should work again and indeed I see avx2 as the fastest implementation in e.g. fletcher_4_bench, indicating that ZFS is able to use the FPU.
But reading a file off of an encrypted filesystem peaks at 500 MB/s and uses all CPU, clearly indicating AES-NI isn't used. Otherwise I'd expect well above 1 GB/s (NVMe drive, Intel i7-8750H) at a much lower CPU load. I can't tell whether SIMD optimizations are really used for checksum calculations, but it seems to me that reading from an unencrypted filesystem produces more CPU load than before, when SIMD support was working (pre 5.0). I'm not sure, though.
How would I debug this problem? Is there anything I have to tweak to get SIMD accelerated encryption back again?
I already asked on zfs-discuss but got no enlightening input.
Thanks
Attila
PS
Please refrain from starting any "evil Linux devs" discussion, all has been said in this regard already.
Describe how to reproduce the problem
Read a large file off of an encrypted filesystem and monitor throughput and CPU load.
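One way to put a number on the throughput part of this step is a short Python sketch like the one below. The function name `measure_read` and the 1 MiB chunk size are illustrative assumptions. It only times sequential reads, so CPU load still has to be watched separately with a system monitor, since the decryption work is done by kernel threads rather than by the reading process. Repeated runs may also be served from cache and look faster than the disk path.

```python
import time


def measure_read(path, chunk_size=1 << 20):
    """Read a file sequentially and report bytes, seconds and MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:   # unbuffered at the Python level
        while True:
            chunk = f.read(chunk_size)
            if not chunk:                      # empty read means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_s = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return {"bytes": total, "seconds": elapsed, "mb_per_s": mb_per_s}
```

Calling `measure_read` on a large file from the encrypted dataset and again on a similar file from an unencrypted one gives two figures that can be compared directly.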
Include any warning/errors/backtraces from the system logs
N/A