Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PANIC at dsl_crypt.c:2441:dsl_crypto_populate_key_nvlist() #12439

Open
rincebrain opened this issue Jul 27, 2021 · 3 comments
Open

PANIC at dsl_crypt.c:2441:dsl_crypto_populate_key_nvlist() #12439

rincebrain opened this issue Jul 27, 2021 · 3 comments
Labels
Bot: Not Stale Override for the stale bot Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@rincebrain
Copy link
Contributor

System information

Type Version/Name
Distribution Name Debian
Distribution Version 10.10
Kernel Version 4.19.0-17-amd64
Architecture x86_64
OpenZFS Version 2.0.3-8~bpo10+1

Describe the problem you're observing

Someone said to me "hey, I tried zfs send -w pool/encrypted/filesystem | zfs receive some/where and it works, but zfs send -v -w pool/encrypted/filesystem | zfs receive some/where errors out with Invalid argument, what gives", I went "I don't think I expect that to work", tried zfs send -w mypool/encroot > /dev/null and the magic smoke came out.

(Since I couldn't find any report of this before, I figured it was worth reporting for search results even if I try reproducing it on git master and it's been corrected.)

Describe how to reproduce the problem

Above.

Include any warning/errors/backtraces from the system logs

zfs get all phantasm/encroot:

NAME              PROPERTY               VALUE                                SOURCE
phantasm/encroot  type                   filesystem                           -
phantasm/encroot  creation               Fri May 28  1:15 2021                -
phantasm/encroot  used                   707G                                 -
phantasm/encroot  available              2.82T                                -
phantasm/encroot  referenced             760K                                 -
phantasm/encroot  compressratio          1.08x                                -
phantasm/encroot  mounted                no                                   -
phantasm/encroot  quota                  none                                 default
phantasm/encroot  reservation            none                                 default
phantasm/encroot  recordsize             128K                                 default
phantasm/encroot  mountpoint             /phantasm/encroot                    default
phantasm/encroot  sharenfs               off                                  default
phantasm/encroot  checksum               on                                   default
phantasm/encroot  compression            gzip-9                               local
phantasm/encroot  atime                  off                                  inherited from phantasm
phantasm/encroot  devices                on                                   default
phantasm/encroot  exec                   on                                   default
phantasm/encroot  setuid                 on                                   default
phantasm/encroot  readonly               off                                  default
phantasm/encroot  zoned                  off                                  default
phantasm/encroot  snapdir                hidden                               default
phantasm/encroot  aclmode                discard                              default
phantasm/encroot  aclinherit             restricted                           default
phantasm/encroot  createtxg              15965940                             -
phantasm/encroot  canmount               on                                   default
phantasm/encroot  xattr                  sa                                   inherited from phantasm
phantasm/encroot  copies                 1                                    default
phantasm/encroot  version                5                                    -
phantasm/encroot  utf8only               off                                  -
phantasm/encroot  normalization          none                                 -
phantasm/encroot  casesensitivity        sensitive                            -
phantasm/encroot  vscan                  off                                  default
phantasm/encroot  nbmand                 off                                  default
phantasm/encroot  sharesmb               off                                  default
phantasm/encroot  refquota               none                                 default
phantasm/encroot  refreservation         none                                 default
phantasm/encroot  guid                   8760648724915161698                  -
phantasm/encroot  primarycache           all                                  default
phantasm/encroot  secondarycache         all                                  default
phantasm/encroot  usedbysnapshots        0B                                   -
phantasm/encroot  usedbydataset          760K                                 -
phantasm/encroot  usedbychildren         707G                                 -
phantasm/encroot  usedbyrefreservation   0B                                   -
phantasm/encroot  logbias                latency                              default
phantasm/encroot  objsetid               914                                  -
phantasm/encroot  dedup                  off                                  default
phantasm/encroot  mlslabel               none                                 default
phantasm/encroot  sync                   standard                             default
phantasm/encroot  dnodesize              auto                                 local
phantasm/encroot  refcompressratio       1.00x                                -
phantasm/encroot  written                760K                                 -
phantasm/encroot  logicalused            760G                                 -
phantasm/encroot  logicalreferenced      302K                                 -
phantasm/encroot  volmode                default                              default
phantasm/encroot  filesystem_limit       none                                 default
phantasm/encroot  snapshot_limit         none                                 default
phantasm/encroot  filesystem_count       none                                 default
phantasm/encroot  snapshot_count         none                                 default
phantasm/encroot  snapdev                hidden                               default
phantasm/encroot  acltype                off                                  default
phantasm/encroot  context                none                                 default
phantasm/encroot  fscontext              none                                 default
phantasm/encroot  defcontext             none                                 default
phantasm/encroot  rootcontext            none                                 default
phantasm/encroot  relatime               off                                  default
phantasm/encroot  redundant_metadata     all                                  default
phantasm/encroot  overlay                on                                   default
phantasm/encroot  encryption             aes-256-gcm                          -
phantasm/encroot  keylocation            file:///workspace/highchurn/enc.key  local
phantasm/encroot  keyformat              raw                                  -
phantasm/encroot  pbkdf2iters            0                                    default
phantasm/encroot  encryptionroot         phantasm/encroot                     -
phantasm/encroot  keystatus              available                            -
phantasm/encroot  special_small_blocks   0                                    default
phantasm/encroot  com.sun:auto-snapshot  false                                inherited from phantasm

dmesg | tail -n mumble

[1284794.092468] VERIFY3(dp->dp_spa->spa_errata != 0) failed (0 != 0)
[1284794.092493] PANIC at dsl_crypt.c:2441:dsl_crypto_populate_key_nvlist()
[1284794.092513] Showing stack for process 35452
[1284794.092515] CPU: 15 PID: 35452 Comm: zfs Kdump: loaded Tainted: P           OE     4.19.0-17-amd64 #1 Debian 4.19.194-2
[1284794.092516] Hardware name: Supermicro Super Server/X10SDV-TLN4F, BIOS 2.1 11/22/2019
[1284794.092516] Call Trace:
[1284794.092525]  dump_stack+0x66/0x81
[1284794.092536]  spl_panic+0xd3/0xfb [spl]
[1284794.092623]  ? __raw_spin_unlock+0x5/0x10 [zfs]
[1284794.092692]  ? zap_hashbits+0xa/0x20 [zfs]
[1284794.092759]  ? zap_hash+0x36/0x210 [zfs]
[1284794.092826]  ? zap_lookup_norm+0x9a/0xd0 [zfs]
[1284794.092878]  dsl_crypto_populate_key_nvlist+0x5df/0x730 [zfs]
[1284794.092930]  dmu_send_impl+0x5e3/0xbe0 [zfs]
[1284794.092938]  ? tsd_hash_search+0x75/0xa0 [spl]
[1284794.092988]  dmu_send+0x4ba/0x7e0 [zfs]
[1284794.092991]  ? __switch_to+0x115/0x440
[1284794.092995]  ? __switch_to_asm+0x35/0x70
[1284794.092998]  ? _cond_resched+0x15/0x30
[1284794.093002]  ? __kmalloc_node+0x1ea/0x2c0
[1284794.093006]  ? spl_kmem_alloc_impl+0xd6/0xe0 [spl]
[1284794.093008]  ? _cond_resched+0x15/0x30
[1284794.093015]  ? nvt_nvpair_match+0x53/0x90 [znvpair]
[1284794.093019]  ? nvt_remove_nvpair+0x88/0x120 [znvpair]
[1284794.093022]  ? nvt_add_nvpair+0x54/0x110 [znvpair]
[1284794.093026]  ? nvs_native_nvp_op+0xd0/0xd0 [znvpair]
[1284794.093029]  ? nvs_decode_pairs+0x9d/0x120 [znvpair]
[1284794.093097]  zfs_ioc_send_new+0x17d/0x1c0 [zfs]
[1284794.093168]  ? dump_bytes_cb+0x20/0x20 [zfs]
[1284794.093237]  zfsdev_ioctl_common+0x1f1/0x620 [zfs]
[1284794.093309]  zfsdev_ioctl+0x4f/0xe0 [zfs]
[1284794.093312]  do_vfs_ioctl+0xa4/0x630
[1284794.093316]  ? do_munmap+0x33c/0x430
[1284794.093317]  ksys_ioctl+0x60/0x90
[1284794.093319]  __x64_sys_ioctl+0x16/0x20
[1284794.093323]  do_syscall_64+0x53/0x110
[1284794.093326]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1284794.093327] RIP: 0033:0x7f9cc7861427
[1284794.093329] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
[1284794.093330] RSP: 002b:00007ffe117e1298 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1284794.093332] RAX: ffffffffffffffda RBX: 00007ffe117e12c0 RCX: 00007f9cc7861427
[1284794.093333] RDX: 00007ffe117e12c0 RSI: 0000000000005a40 RDI: 0000000000000005
[1284794.093333] RBP: 00007ffe117e48b0 R08: 0000000000000002 R09: 00005613abc60340
[1284794.093334] R10: 00005613abc56010 R11: 0000000000000246 R12: 0000000000000000
[1284794.093335] R13: 0000000000005a40 R14: 0000000000005a40 R15: 00005613abc60340
[1284926.573319] INFO: task zfs:35452 blocked for more than 120 seconds.
[1284926.573345]       Tainted: P           OE     4.19.0-17-amd64 #1 Debian 4.19.194-2
[1284926.573368] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1284926.573391] zfs             D    0 35452  35451 0x80000004
[1284926.573394] Call Trace:
[1284926.573400]  __schedule+0x29f/0x840
[1284926.573404]  schedule+0x28/0x80
[1284926.573414]  spl_panic+0xf9/0xfb [spl]
[1284926.573504]  ? zap_hashbits+0xa/0x20 [zfs]
[1284926.573569]  ? zap_hash+0x36/0x210 [zfs]
[1284926.573635]  ? zap_lookup_norm+0x9a/0xd0 [zfs]
[1284926.573686]  dsl_crypto_populate_key_nvlist+0x5df/0x730 [zfs]
[1284926.573737]  dmu_send_impl+0x5e3/0xbe0 [zfs]
[1284926.573745]  ? tsd_hash_search+0x75/0xa0 [spl]
[1284926.573794]  dmu_send+0x4ba/0x7e0 [zfs]
[1284926.573799]  ? __switch_to+0x115/0x440
[1284926.573802]  ? __switch_to_asm+0x35/0x70
[1284926.573805]  ? _cond_resched+0x15/0x30
[1284926.573808]  ? __kmalloc_node+0x1ea/0x2c0
[1284926.573813]  ? spl_kmem_alloc_impl+0xd6/0xe0 [spl]
[1284926.573814]  ? _cond_resched+0x15/0x30
[1284926.573821]  ? nvt_nvpair_match+0x53/0x90 [znvpair]
[1284926.573825]  ? nvt_remove_nvpair+0x88/0x120 [znvpair]
[1284926.573828]  ? nvt_add_nvpair+0x54/0x110 [znvpair]
[1284926.573831]  ? nvs_native_nvp_op+0xd0/0xd0 [znvpair]
[1284926.573834]  ? nvs_decode_pairs+0x9d/0x120 [znvpair]
[1284926.573899]  zfs_ioc_send_new+0x17d/0x1c0 [zfs]
[1284926.573967]  ? dump_bytes_cb+0x20/0x20 [zfs]
[1284926.574033]  zfsdev_ioctl_common+0x1f1/0x620 [zfs]
[1284926.574101]  zfsdev_ioctl+0x4f/0xe0 [zfs]
[1284926.574105]  do_vfs_ioctl+0xa4/0x630
[1284926.574108]  ? do_munmap+0x33c/0x430
[1284926.574110]  ksys_ioctl+0x60/0x90
[1284926.574112]  __x64_sys_ioctl+0x16/0x20
[1284926.574115]  do_syscall_64+0x53/0x110
[1284926.574118]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1284926.574120] RIP: 0033:0x7f9cc7861427
[1284926.574125] Code: Bad RIP value.
[1284926.574126] RSP: 002b:00007ffe117e1298 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1284926.574127] RAX: ffffffffffffffda RBX: 00007ffe117e12c0 RCX: 00007f9cc7861427
[1284926.574128] RDX: 00007ffe117e12c0 RSI: 0000000000005a40 RDI: 0000000000000005
[1284926.574129] RBP: 00007ffe117e48b0 R08: 0000000000000002 R09: 00005613abc60340
[1284926.574129] R10: 00005613abc56010 R11: 0000000000000246 R12: 0000000000000000
[1284926.574130] R13: 0000000000005a40 R14: 0000000000005a40 R15: 00005613abc60340
@rincebrain rincebrain added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jul 27, 2021
@rincebrain
Copy link
Contributor Author

A few data points:

  • The pool was created on 2018-01-01, presumably on a stable 0.7.X (I don't have the relevant logs that far back, and the only features that were explicitly enabled after creation were spacemap_v2 and encryption...), with no features disabled
  • The encrypted filesystem was created on 2021-05-28 with zfs create phantasm/encroot -o encryption=on -o keylocation=file:///workspace/highchurn/enc.key -o keyformat=raw and version 2.0.3-1~bpo10+1 (DEBUG mode) on the pool, not by receiving or anything weird like that
  • feature@encryption had been flipped to enabled a month prior, when the system was running v0.8.6-1~bpo10+1 (DEBUG mode), then a toy filesystem was created and shortly destroyed with -o encryption=on
  • It does the same thing on 1b50749 (unsurprising, since I didn't think anything fundamentally had changed, but just for completeness)
  • It happily functions normally if you do send/recv from a non-DEBUG build to a new pool created on 2.x...
  • ...and still triggers this if you then try sending the received one around on a DEBUG build

If you'd like a pool in a file you can examine that will do this, here you go.

@stale
Copy link

stale bot commented Jul 31, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Jul 31, 2022
@stale stale bot closed this as completed Oct 29, 2022
@rincebrain rincebrain reopened this Oct 29, 2022
@rincebrain rincebrain added the Bot: Not Stale Override for the stale bot label Oct 29, 2022
@stale stale bot removed the Status: Stale No recent activity for issue label Oct 29, 2022
@robn
Copy link
Member

robn commented Aug 9, 2023

I hit this yesterday, doing some checks in preparation for a restore from backup.

I've been testing today, and its still present on master. However, I'm almost certain its a spurious ASSERT with --enable-debug, and works fine otherwise (my backups are on zfs-kmod-2.1.99-FreeBSD_gad0a55461, a FreeBSD 14 prerelease, which has all the debug compiled in as a matter of course).

Reproduction from scratch as follows. Create pool, encrypted dataset, put some data in, create a backup stream, and burn it all down. To be clear, it doesn't have to be a filesystem, a proper snapshot does the same thing.

zpool create tank loop0

zfs create \
  -o encryption=aes-256-gcm \
  -o keylocation=file:///tmp/zfskey \
  -o keyformat=passphrase \
    tank/enc

dd if=/dev/random of=/tank/enc/file bs=128k count=8

zfs unmount -a

zfs send -wecL tank/enc > /tmp/stream

zpool destroy tank
rm -f /tmp/zfskey

Then, create a new pool, and try to recieve the stream:

zpool create tank loop0
zfs recv tank/enc < /tmp/stream

zfs list -t all

zfs send -wecL tank/enc > /tmp/stream2

If compiled with --enable-debug, it will crash:

+ zfs send -wecL tank/enc
[    3.788982] VERIFY3(dp->dp_spa->spa_errata != 0) failed (0 != 0)
[    3.789085] PANIC at dsl_crypt.c:2442:dsl_crypto_populate_key_nvlist()
...

If not, the stream will be accepted. Sending it back out appears to yield effectively the same stream:

cat /tmp/stream | zstream dump > /tmp/dump
cat /tmp/stream2 | zstream dump > /tmp/dump2

diff -u /tmp/dump /tmp/dump2
--- /tmp/dump	2023-08-09 00:19:41.714426834 +0000
+++ /tmp/dump2	2023-08-09 00:19:41.722426834 +0000
@@ -5,7 +5,7 @@
 	creation_time = 64d2db9d
 	type = 2
 	flags = 0xc
-	toguid = 3d4286753e33efde
+	toguid = 8f983d6834869827
 	fromguid = 0
 	toname = tank/enc@--head--
 	payloadlen = 1028
@@ -34,7 +34,7 @@
 		from_ivset_guid = 0x0
 	(end crypt_keydata)

-END checksum = 204c956ee7cd3/1f568d9cad058282/784e1ceb4822b03d/e382b1aa382b10ce
+END checksum = 204d7bed72106/1f61f3165ce72beb/79dfb23bb8ce9bd/790bfb7e8346e1ee
 SUMMARY:
 	Total DRR_BEGIN records = 1 (1028 bytes)
 	Total DRR_END records = 1 (0 bytes)

So the answer may simply be that this assertion is erroneous, and should be removed. I haven't done the study to have any confidence though, and won't have time to do so for a while.

The workaround appears to be to send a snapshot, not a filesystem. For me at least, this won't be a big deal - this is a backup host carrying snapshots (no key material), so snapshots are a thing I have plenty of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bot: Not Stale Override for the stale bot Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

2 participants