Kernel BUG with QAT during decompression #9276
Comments
Further info: trace 2 with BUG in
Also happens with ZFS git master (a49dbbb) and the newest QAT version (1.7.l.4.5.0-00034). Even a simple
Thanks, Luki, for reporting the issue. The root cause is a missing check of the return value (-ERESTARTSYS) when calling wait_for_completion_interruptible_timeout. We will submit a PR to fix this issue later.
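To illustrate the root cause described above, here is a minimal userspace sketch of the return-value handling. In the kernel, wait_for_completion_interruptible_timeout returns -ERESTARTSYS when a signal interrupts the wait, 0 on timeout, and a positive number of remaining jiffies on completion. The two helper functions are hypothetical names made up for this sketch; they are not taken from the ZFS source, and the "naive" variant is only an assumption about what a zero-only check looks like.

```c
#include <stdbool.h>

/* Kernel-internal errno value (include/linux/errno.h). It is not exposed
 * in userspace headers, so it is redefined here for the sketch. */
#define ERESTARTSYS 512

/* Hypothetical check that only tests for zero (timeout). A negative
 * -ERESTARTSYS is nonzero, so it is mistaken for a completed request. */
static bool naive_request_completed(long rc)
{
    return rc != 0;
}

/* Check that distinguishes all three outcomes:
 *   rc <  0  interrupted by a signal (-ERESTARTSYS)
 *   rc == 0  timed out
 *   rc >  0  completed, with rc jiffies of the timeout left */
static bool request_completed(long rc)
{
    return rc > 0;
}
```

With rc = -ERESTARTSYS, naive_request_completed returns true while request_completed returns false, so only the second check keeps the caller from using the result of a request that has not finished. According to its commit message, the merged fix takes a different route and makes the wait for a given QAT request uninterruptible.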
1. Fix issue: Kernel BUG with QAT during decompression openzfs#9276. Now it is uninterruptible for a specific given QAT request, but Ctrl-C interrupt still works in user-space process.
2. Copy the digest result to the buffer only when doing encryption, and vice versa for decryption.
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Closes openzfs#9276
Closes openzfs#9303
System information
Describe the problem you're observing
Kernel oops in z_uncompress or qat_compress_impl when reading a file with dd from a dataset with gzip compression and QAT enabled, after pressing Ctrl+C to abort the command. This is reproducible nearly every time on two different servers, one with a PCIe DH8950 card and one with the Intel C627 chipset on the motherboard. Aborting a file copy doesn't seem to trigger it, only dd or cat to /dev/null (so far).
This is a new system with the C627 chipset; it is not yet in production and contains no important data. It is otherwise stable for days during benchmarks with fio. I observed the same problem a year ago on a different system with the PCIe card, but that was before 0.8.0 was officially released, so I figured kinks were still being worked out.
Describe how to reproduce the problem
or:
Include any warning/errors/backtraces from the system logs
The BUG generates one of the two stack traces (50/50 chance). Haven't seen any other ones yet.
Trace 1:
Trace 2:
It appears that Ctrl+C causes qat_compress to fail and the decompression is retried in software by uncompress_func (trace 1). How do I determine which line in that function offset 0xc6 corresponds to? At other times the BUG is triggered in qat_compress_impl itself (trace 2).
Disabling the QAT on the fly via /sys/module/zfs/parameters/zfs_qat_compress_disable mitigates the issue, so this is definitely somehow related to the QAT.
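The mitigation mentioned above can also be applied from a program, for example from a test harness. This is a minimal sketch that assumes writing 1 to the parameter file disables QAT compression and writing 0 re-enables it (inferred from the parameter's name), and that the process has permission to write there. The helper names write_param and read_param are hypothetical.

```c
#include <stdio.h>

/* Write an integer flag to a module-parameter file.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int write_param(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)   /* fclose flushes, so write errors surface here */
        ok = 0;
    return ok ? 0 : -1;
}

/* Read an integer flag back from a module-parameter file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int read_param(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling write_param("/sys/module/zfs/parameters/zfs_qat_compress_disable", 1) as root would correspond to the on-the-fly change described above, and read_param on the same path can confirm the current setting.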