
ZFS 2.0.4 build issue on Linux/sparc64: __bswapdi2 and __bswapsi2 missing #12008

Closed

koachan opened this issue May 6, 2021 · 10 comments

Labels: Type: Architecture (indicates an issue is specific to a single processor architecture), Type: Defect (incorrect behavior, e.g. crash, hang)

Comments

koachan commented May 6, 2021

System information

Type                 | Version/Name
Distribution Name    | Gentoo
Distribution Version | N/A
Linux Kernel         | 5.9.15
Architecture         | sparc64
ZFS Version          | 2.0.4
SPL Version          | 2.0.4

Describe the problem you're observing

The ZFS codebase includes some byte-swapping routines (mainly in the zstd code) that are implemented via __builtin_bswap32/__builtin_bswap64, which are then compiled into calls to the libgcc helpers __bswapsi2 and __bswapdi2. Those helpers are not available when compiling as a kernel module, so I'm getting errors like these:

ERROR: modpost: "__bswapdi2" [/zfs/module/zstd/zzstd.ko] undefined!
ERROR: modpost: "__bswapsi2" [/zfs/module/zstd/zzstd.ko] undefined!

I decided to file it here since it only happens when zstd is built as part of ZFS, but please let me know if I should take this to the zstd people instead :)
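
For reference, __bswapsi2 and __bswapdi2 are libgcc helper functions with well-defined semantics (byte-reverse a 32-bit and a 64-bit value, respectively), so a kernel module can satisfy the modpost references by providing them itself. A minimal sketch of such a fallback - the symbol names are libgcc's, but this particular implementation is an illustrative assumption, not necessarily what ZFS adopted:

#include <linux/types.h>

/* libgcc expects these exact symbol names; the bodies below are a
 * plain, portable byte reversal. */
u32 __bswapsi2(u32 x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

u64 __bswapdi2(u64 x)
{
	/* Swap each 32-bit half, then swap the halves. */
	return ((u64)__bswapsi2((u32)x) << 32) |
	       __bswapsi2((u32)(x >> 32));
}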

Describe how to reproduce the problem

Build ZFS as usual on a sparc64 machine:

$ git clone https://github.com/openzfs/zfs && cd zfs
$ git checkout zfs-2.0.4
$ ./autogen.sh && ./configure && make

Include any warning/errors/backtraces from the system logs

  MODPOST /zfs/module/Module.symvers
ERROR: modpost: "__bswapdi2" [/zfs/module/zstd/zzstd.ko] undefined!
ERROR: modpost: "__bswapsi2" [/zfs/module/zstd/zzstd.ko] undefined!
make[4]: *** [scripts/Makefile.modpost:111: /zfs/module/Module.symvers] Error 1
make[4]: *** Deleting file '/zfs/module/Module.symvers'
make[3]: *** [Makefile:1714: modules] Error 2
make[3]: Leaving directory '/usr/src/linux-5.10.27-gentoo'
make[2]: *** [Makefile:48: modules-Linux] Error 2
make[2]: Leaving directory '/zfs/module'
make[1]: *** [Makefile:883: all-recursive] Error 1
make[1]: Leaving directory '/zfs'
make: *** [Makefile:744: all] Error 2
koachan added the Status: Triage Needed and Type: Defect labels May 6, 2021
koachan (Author) commented May 7, 2021

Just noting that the latest master commit (610cb4f) still suffers from the same issue.

rincebrain (Contributor) commented May 10, 2021

So, I fixed the immediate issue of the missing functions using an implementation from elsewhere in the codebase. The resulting module loads and passes almost every test I put it through, but it fails on the minor issue that attempting to import a pool written with it on one of my x86_64 systems results in errors reading many of the files. (Those files read fine if the pool is imported again on the originating sparc64 system.)

So I'm suspicious that something in the ZFS glue code might be doing something endian-unsafe. (I'm reasonably confident the byteswap implementation is not the problem: I tried both the version from elsewhere in the codebase and the fallback version in zstd's own codebase, and got the same results.)

ikozhukhov (Contributor) commented:

I have fixed the build on sparc64 for DilOS with this commit: https://bitbucket.org/dilos/dilos-illumos/commits/0978372eb920c486b41b0a8d7d80f022a6774da8
I have tested on a sun4v T5220 and a sun4u Sun Fire V490; everything works, including zstd.
Note, though, that DilOS is an illumos-based platform.

rincebrain (Contributor) commented:

@ikozhukhov have you tried importing any pools with zstd-compressed files written from the sun4[uv] systems on an x86 system, and reading those files?

I tried overriding the definitions of clz[ds]i2 like you did, just in case, and the resulting module has the same pathological behavior.
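
(For readers unfamiliar with these: __clzsi2/__clzdi2 are libgcc count-leading-zeros helpers, in the same family as the byteswap ones above. A sketch of what such an override looks like for the 32-bit case - illustrative only, not the DilOS code:)

#include <stdint.h>

/* Count leading zeros of a 32-bit value. Returns 32 for x == 0 here,
 * although libgcc leaves the x == 0 case undefined. */
int __clzsi2(uint32_t x)
{
	int n = 0;
	if (x == 0)
		return 32;
	if (x <= 0x0000ffffu) { n += 16; x <<= 16; }
	if (x <= 0x00ffffffu) { n +=  8; x <<=  8; }
	if (x <= 0x0fffffffu) { n +=  4; x <<=  4; }
	if (x <= 0x3fffffffu) { n +=  2; x <<=  2; }
	if (x <= 0x7fffffffu) { n +=  1; }
	return n;
}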

ikozhukhov (Contributor) commented:

zstd is not working between SPARC <-> Intel, but LZ4 and GZIP are both fine:

root@sparc9:/var/tmp# uname -a
SunOS sparc9 5.11 2.0.2.85 sun4u sparc SUNW,Sun-Fire-V490

Create a pool on a sparse file:
root@sparc9:/var/tmp# mkdir -p sparc
root@sparc9:/var/tmp# cd sparc
root@sparc9:/var/tmp/sparc# truncate -s 1g disk01.data
root@sparc9:/var/tmp/sparc# zpool create -o ashift=12 -o cachefile=none tpool $PWD/disk01.data
root@sparc9:/var/tmp/sparc# zpool list tpool
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
tpool   960M   372K   960M        -         -     0%     0%  1.00x    ONLINE  -

Create some datasets with different compression algorithms:
root@sparc9:/var/tmp/sparc# zfs create -o compression=zstd tpool/zstd-fs
root@sparc9:/var/tmp/sparc# zfs create -o compression=gzip tpool/gzip-fs
root@sparc9:/var/tmp/sparc# zfs create -o compression=lz4 tpool/lz4-fs

Copy a text file to every dataset:
root@sparc9:/var/tmp/sparc# cp ../os-upgrade.log /tpool/zstd-fs
root@sparc9:/var/tmp/sparc# cp ../os-upgrade.log /tpool/gzip-fs
root@sparc9:/var/tmp/sparc# cp ../os-upgrade.log /tpool/lz4-fs

Compute MD5 of the files:
root@sparc9:/var/tmp/sparc# md5sum /tpool/*-fs/os-upgrade.log
2070688ec9252c92a5aa5a92b109183b  /tpool/gzip-fs/os-upgrade.log
2070688ec9252c92a5aa5a92b109183b  /tpool/lz4-fs/os-upgrade.log
2070688ec9252c92a5aa5a92b109183b  /tpool/zstd-fs/os-upgrade.log

Export the pool, pack the sparse file, and send it to an Intel host system:
root@sparc9:/var/tmp/sparc# zpool export tpool
root@sparc9:/var/tmp/sparc# tar cvSf tpool.tar disk01.data
disk01.data
root@sparc9:/var/tmp/sparc# bzip2 -9 tpool.tar

/* sending tpool.tar.bz2 to intel */

root@lenovo:/var/tmp/sparc# uname -a
SunOS lenovo 5.11 2.0.2.85 i86pc i386 i86pc
root@lenovo:/var/tmp/sparc# ls -l
total 681
-rw-r--r-- 1 root root 1073741824 May 10 17:36 disk01.data

Import the pool from the file:
root@lenovo:/var/tmp/sparc# zpool import -o cachefile=none -d $PWD tpool
root@lenovo:/var/tmp/sparc# zpool list tpool
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
tpool   960M  1.04M   959M        -         -     0%     0%  1.00x    ONLINE  -
root@lenovo:/var/tmp/sparc# zfs list -r tpool
NAME            USED  AVAIL     REFER  MOUNTPOINT
tpool           948K   831M      112K  /tpool
tpool/gzip-fs   124K   831M      124K  /tpool/gzip-fs
tpool/lz4-fs    132K   831M      132K  /tpool/lz4-fs
tpool/zstd-fs   124K   831M      124K  /tpool/zstd-fs

Try to compute MD5:
root@lenovo:/var/tmp/sparc# md5sum /tpool/*-fs/os-upgrade.log
2070688ec9252c92a5aa5a92b109183b  /tpool/gzip-fs/os-upgrade.log
2070688ec9252c92a5aa5a92b109183b  /tpool/lz4-fs/os-upgrade.log
md5sum: /tpool/zstd-fs/os-upgrade.log: I/O error
root@lenovo:/var/tmp/sparc# zpool export tpool
root@lenovo:/var/tmp/sparc# cd ..
root@lenovo:/var/tmp# mkdir -p intel
root@lenovo:/var/tmp# cd intel

Create a pool on the Intel host:
root@lenovo:/var/tmp/intel# truncate -s 1g disk02.data
root@lenovo:/var/tmp/intel# zpool create -o cachefile=none -o ashift=12 ipool $PWD/disk02.data

Create exactly the same datasets:
root@lenovo:/var/tmp/intel# zfs create -o compression=gzip ipool/gzip-fs
root@lenovo:/var/tmp/intel# zfs create -o compression=lz4 ipool/lz4-fs
root@lenovo:/var/tmp/intel# zfs create -o compression=zstd ipool/zstd-fs

Copy a text file to the datasets:
root@lenovo:/var/tmp/intel# cp ../os-upgrade.log /ipool/zstd-fs
root@lenovo:/var/tmp/intel# cp ../os-upgrade.log /ipool/gzip-fs
root@lenovo:/var/tmp/intel# cp ../os-upgrade.log /ipool/lz4-fs

Compute MD5 of the files:
root@lenovo:/var/tmp/intel# md5sum /ipool/*-fs/os-upgrade.log
41301166a8f6f7e6a1293a0c15f453e9  /ipool/gzip-fs/os-upgrade.log
41301166a8f6f7e6a1293a0c15f453e9  /ipool/lz4-fs/os-upgrade.log
41301166a8f6f7e6a1293a0c15f453e9  /ipool/zstd-fs/os-upgrade.log

Export the pool, pack the sparse file, and send it to the SPARC host system:
root@lenovo:/var/tmp/intel# zpool export ipool
root@lenovo:/var/tmp/intel# tar cvSf ipool.tar disk02.data 
disk02.data
root@lenovo:/var/tmp/intel# bzip2 -9 ipool.tar 
root@lenovo:/var/tmp/intel# ls -l
total 8757
-rw-r--r-- 1 root root 1073741824 May 10 17:39 disk02.data
-rw-r--r-- 1 root root    3964180 May 10 17:39 ipool.tar.bz2

/* sending ipool.tar.bz2 to sparc */

root@sparc9:/var/tmp# mkdir -p intel
root@sparc9:/var/tmp# cd intel
root@sparc9:/var/tmp/intel# ls -l
total 4700
-rw-r--r-- 1 root root 1073741824 May 10 17:39 disk02.data
root@sparc9:/var/tmp/intel# zpool import -o cachefile=none -d $PWD ipool   
root@sparc9:/var/tmp/intel# zpool list ipool
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
ipool   960M  6.03M   954M        -         -     0%     0%  1.00x    ONLINE  -

Try to compute MD5 of the files:
root@sparc9:/var/tmp/intel# md5sum /ipool/*-fs/os-upgrade.log
41301166a8f6f7e6a1293a0c15f453e9  /ipool/gzip-fs/os-upgrade.log
41301166a8f6f7e6a1293a0c15f453e9  /ipool/lz4-fs/os-upgrade.log
md5sum: /ipool/zstd-fs/os-upgrade.log: I/O error
root@sparc9:/var/tmp/intel# zpool export ipool

rincebrain (Contributor) commented:

Unfortunately, that's consistent with what I'm seeing.

So something is rotten here...

behlendorf added the Type: Architecture label and removed the Status: Triage Needed label May 10, 2021
rincebrain (Contributor) commented:

Courtesy of some help from Allan Jude...

It seems like the header information for the blocks is getting mangled (endianness?). Normally, zdb -Zddddddbbbbbb [dataset] [objid] would say
[blahblah] ZSTD:size=67387:version=10405:level=5:NORMAL

When reading, on sparc64, a file generated on x86_64, you see:
[blahblah] ZSTD:size=14916:version=2663683:level=0:NORMAL

(It seems important to note here that 10405 = 0x28a5, 2663683 = 0x28a503, and the compression level+version are stored in a 32bit struct together as 24b+8b .)

(Don't read anything into the size being different; these weren't the same object.)

Still poking around to see what's wrong - the endianness macros don't appear to be obviously broken, at least on the sparc64 side...
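
(To make the layout problem concrete, here is a self-contained sketch of how a 24-bit + 8-bit union bitfield diverges between LE and BE hosts. The struct below is hypothetical, modeled on the description above rather than the actual ZFS header definition:)

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical header modeled on the 24-bit version + 8-bit level
 * packing described above; not the actual zfs_zstd definition. */
union hdr {
	uint32_t raw;
	struct {
		uint32_t version : 24;
		uint32_t level   : 8;
	} f;
};

int main(void)
{
	union hdr h = { .raw = 0 };
	h.f.version = 0x28a5;	/* 10405 */
	h.f.level = 5;
	/* GCC allocates bitfields from the LSB on little-endian hosts
	 * (raw = 0x050028a5 here) and from the MSB on big-endian hosts
	 * (raw = 0x0028a505), so a header written through this union on
	 * one side and read through it on the other misparses both
	 * fields - consistent with the mangled version/level above. */
	printf("raw on this host: 0x%08" PRIx32 "\n", h.raw);
	return 0;
}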

rincebrain (Contributor) commented May 11, 2021

Okay, I tentatively have a patch that fixes this. I'm going to do a bunch more testing and refinement (I haven't tried compiling it on anything except sparc64, for example...), but it does allow my sparc64 system to interoperably use zstd on a pool with an x86_64 machine.

(Bonus fact: this also affected ppc64 systems, and presumably all big-endian systems - the non-interoperable zstd part, that is; ppc64 doesn't have the missing symbols this issue was opened about.)

rincebrain added a commit to rincebrain/zfs that referenced this issue May 12, 2021
As detailed in openzfs#12022 and openzfs#12008, it turns out the current zstd implementation
is quite nonportable, and results in various configurations of ondisk header
that only each platform can read.

So I've added a test which contains a dataset with a file written by
Linux/x86_64, and one written by current FBSD/ppc64.

(I'll add one for Linux/sparc64 once the unmodified git checkout finishes
building. Linux/ppc64 behaves identically to Linux/sparc64.)

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
rincebrain added a commit to rincebrain/zfs that referenced this issue May 12, 2021
As detailed in openzfs#12022 and openzfs#12008, it turns out the current zstd
implementation is quite nonportable, and results in various
configurations of ondisk header that only each platform can read.

So I've added a test which contains a dataset with a file written by
Linux/x86_64, one for Linux/sparc64, and one written by FBSD/ppc64.

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
yhaenggi commented Aug 8, 2021

What's the current status of this? I'm affected as well on sparc64.

rincebrain (Contributor) commented:

I made #12022. If you have a burning need, you could either run that or, if you absolutely don't plan to ever use zstd and don't want to run a large PR before it merges, just pick out the relevant changes for this bug from it.

I would suggest asking in the PR, because AFAIK there are no outstanding technical complaints.

rincebrain added a commit to rincebrain/zfs that referenced this issue Aug 28, 2021
It turns out that layouts of union bitfields are a pain, and the
current code results in an inconsistent layout between BE and LE
systems, leading to zstd-active datasets on one erroring out on
the other.

Switch everyone over to the LE layout, and add compatibility code
to read both.

Fixes: openzfs#12008

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
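
(In practice, "switch everyone over to the LE layout" means defining the on-disk header word as little-endian and converting explicitly, rather than trusting a union bitfield whose layout varies by host. A sketch of that idea with illustrative names, not the ZFS implementation - see #12022 for the real code:)

#include <stdint.h>
#include <endian.h>	/* le32toh(): identity on LE hosts, byteswap on BE */

/* Illustrative accessors: the 32-bit header word is defined to be
 * little-endian on disk, so every host converts it explicitly and
 * then masks the fields out, giving one layout everywhere. */
static inline uint32_t hdr_version(uint32_t raw_le)
{
	return le32toh(raw_le) & 0xffffffu;	/* low 24 bits */
}

static inline uint32_t hdr_level(uint32_t raw_le)
{
	return le32toh(raw_le) >> 24;		/* high 8 bits */
}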
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Sep 15, 2021
It turns out that layouts of union bitfields are a pain, and the
current code results in an inconsistent layout between BE and LE
systems, leading to zstd-active datasets on one erroring out on
the other.

Switch everyone over to the LE layout, and add compatibility code
to read both.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#12008
Closes openzfs#12022
jwk404 pushed a commit that referenced this issue Sep 20, 2021
As detailed in #12022 and #12008, it turns out the current zstd
implementation is quite nonportable, and results in various
configurations of ondisk header that only each platform can read.

So I've added a test which contains a dataset with a file written by
Linux/x86_64 and one written by FBSD/ppc64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12030