Bundled htscodecs build failure on ARM #1450

Closed
jmarshall opened this issue Jun 12, 2022 · 8 comments · Fixed by #1451

@jmarshall
Member

jmarshall commented Jun 12, 2022

On current develop on an ARM platform (specifically aarch64 Debian GNU/Linux 11 (bullseye)):

$ autoreconf -i

$ ./configure
checking for gcc... gcc
checking whether the C compiler works... yes
[…]
checking whether C compiler accepts -mssse3 -mpopcnt -msse4.1... no
checking whether C compiler accepts -mavx2... no
checking whether C compiler accepts -mavx512f... no
[…]
configure: creating ./config.status
config.status: creating config.mk
config.status: creating htslib.pc.tmp
config.status: creating config.h
config.status: linking htscodecs_bundled.mk to htscodecs.mk

$ make
[…]
echo '#define HTS_VERSION_TEXT "1.15.1-36-gfee3bbb"' > version.h
[…]
Updating htscodecs/htscodecs/version.h : #define HTSCODECS_VERSION_TEXT "1.2.2-35-g9cd552e"
[…]
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/arith_dynamic.o htscodecs/htscodecs/arith_dynamic.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/fqzcomp_qual.o htscodecs/htscodecs/fqzcomp_qual.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/htscodecs.o htscodecs/htscodecs/htscodecs.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/pack.o htscodecs/htscodecs/pack.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/rANS_static4x16pr.o htscodecs/htscodecs/rANS_static4x16pr.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/rANS_static32x16pr_avx2.o htscodecs/htscodecs/rANS_static32x16pr_avx2.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/rANS_static32x16pr_avx512.o htscodecs/htscodecs/rANS_static32x16pr_avx512.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/rANS_static32x16pr_sse4.o htscodecs/htscodecs/rANS_static32x16pr_sse4.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/rANS_static32x16pr.o htscodecs/htscodecs/rANS_static32x16pr.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/rANS_static.o htscodecs/htscodecs/rANS_static.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/rle.o htscodecs/htscodecs/rle.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/tokenise_name3.o htscodecs/htscodecs/tokenise_name3.c
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o htscodecs/htscodecs/utils.o htscodecs/htscodecs/utils.c
[…]
ar -rc libhts.a kfunc.o kstring.o bcf_sr_sort.o bgzf.o errmod.o faidx.o header.o hfile.o hts.o hts_expr.o hts_os.o md5.o multipart.o probaln.o realn.o regidx.o region.o sam.o synced_bcf_reader.o vcf_sweep.o tbx.o textutils.o thread_pool.o vcf.o vcfutils.o cram/cram_codecs.o cram/cram_decode.o cram/cram_encode.o cram/cram_external.o cram/cram_index.o cram/cram_io.o cram/cram_stats.o cram/mFILE.o cram/open_trace_file.o cram/pooled_alloc.o cram/string_alloc.o htscodecs/htscodecs/arith_dynamic.o htscodecs/htscodecs/fqzcomp_qual.o htscodecs/htscodecs/htscodecs.o htscodecs/htscodecs/pack.o htscodecs/htscodecs/rANS_static4x16pr.o htscodecs/htscodecs/rANS_static32x16pr_avx2.o htscodecs/htscodecs/rANS_static32x16pr_avx512.o htscodecs/htscodecs/rANS_static32x16pr_sse4.o htscodecs/htscodecs/rANS_static32x16pr.o htscodecs/htscodecs/rANS_static.o htscodecs/htscodecs/rle.o htscodecs/htscodecs/tokenise_name3.o htscodecs/htscodecs/utils.o  hfile_libcurl.o hfile_gcs.o
ranlib libhts.a
[…]
gcc -Wall -g -O2 -fvisibility=hidden  -I.  -c -o bgzip.o bgzip.c
gcc -fvisibility=hidden  -o bgzip bgzip.o libhts.a -llzma -lbz2 -lz -lm  -lcurl -lpthread
/usr/bin/ld: libhts.a(rANS_static4x16pr.o): in function `rans_enc_func':
…/rANS_static4x16pr.c:1016: undefined reference to `rans_compress_O0_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1016: undefined reference to `rans_compress_O0_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1020: undefined reference to `rans_compress_O0_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1020: undefined reference to `rans_compress_O0_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1016: undefined reference to `rans_compress_O0_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1016: undefined reference to `rans_compress_O1_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1016: undefined reference to `rans_compress_O0_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1016: undefined reference to `rans_compress_O1_32x16_neon'
/usr/bin/ld: libhts.a(rANS_static4x16pr.o): in function `rans_dec_func':
…/rANS_static4x16pr.c:1039: undefined reference to `rans_uncompress_O1_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1039: undefined reference to `rans_uncompress_O0_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1039: undefined reference to `rans_uncompress_O0_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1039: undefined reference to `rans_uncompress_O1_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1043: undefined reference to `rans_uncompress_O0_32x16_neon'
/usr/bin/ld: …/rANS_static4x16pr.c:1043: undefined reference to `rans_uncompress_O0_32x16_neon'
collect2: error: ld returned 1 exit status
make: *** [Makefile:479: bgzip] Error 1

Note that rANS_static32x16pr_avx2.o et al. are built and included in libhts.a, but rANS_static32x16pr_neon.o is not. It would appear that htscodecs_bundled.mk needs to be configurable for the host platform.

Note also that the configure script probes unnecessarily for Intel compiler target options, but does not probe for any ARM-related target options (if any are indeed needed). So the configure script could also usefully check $basic_host etc and do this target probing as appropriate for the target host.
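
The failure mode can be sketched in a few lines of C. This is a minimal illustration, not htscodecs code: the names `compress_scalar`, `compress_neon` and `pick_compressor` are hypothetical, and it assumes the dispatcher selects the NEON path from a compiler-defined macro such as `__ARM_NEON`. In the failing build the equivalent of `compress_neon` lives in a separate translation unit that is never compiled, so the dispatcher's reference has nothing to resolve to. Here it is defined in the same file so that the sketch links on any host.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not htscodecs functions. */
typedef size_t (*compress_fn)(const unsigned char *in, size_t len);

/* Portable implementation, always compiled. */
size_t compress_scalar(const unsigned char *in, size_t len) {
    (void)in;
    return len;
}

#if defined(__ARM_NEON)
/* In the layout described above, this definition would sit in its own
 * source file. If the makefile never builds that file, the reference in
 * pick_compressor() below is left undefined at link time. */
size_t compress_neon(const unsigned char *in, size_t len) {
    (void)in;
    return len;
}
#endif

/* The dispatcher decides from a compiler-defined macro alone, so it
 * refers to the NEON symbol whether or not the build system provided it. */
compress_fn pick_compressor(void) {
#if defined(__ARM_NEON)
    return compress_neon;
#else
    return compress_scalar;
#endif
}
```

On a host where `__ARM_NEON` is defined, moving the `compress_neon` definition into a file that is left out of the link should give an undefined reference error of the same shape as the one above.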

@jkbonfield
Contributor

Building the avx2 etc. files on ARM isn't a problem, as they have #ifdef guards and basically boil down to empty files. The htscodecs build system itself does this and it works fine.

What's not ideal is the lack of any ARM-specific CI in htslib when we have ARM-specific code being used. This was added to htscodecs itself, but htslib uses its own build system and I didn't spot this as a problem when reviewing things. We really need to improve Htslib's cirrus.ci here. It's easy enough just to copy over the config from htscodecs.

@jmarshall
Member Author

jmarshall commented Jun 13, 2022

Even building the #ifdefed-out avx2 etc translation units unnecessarily doesn't work completely fine:

ar -rc libhts.a kfunc.o kstring.o bcf_sr_sort.o bgzf.o errmod.o faidx.o header.o hfile.o hts.o hts_expr.o hts_os.o md5.o multipart.o probaln.o realn.o regidx.o region.o sam.o synced_bcf_reader.o vcf_sweep.o tbx.o textutils.o thread_pool.o vcf.o vcfutils.o cram/cram_codecs.o cram/cram_decode.o cram/cram_encode.o cram/cram_external.o cram/cram_index.o cram/cram_io.o cram/cram_stats.o cram/mFILE.o cram/open_trace_file.o cram/pooled_alloc.o cram/string_alloc.o htscodecs/htscodecs/arith_dynamic.o htscodecs/htscodecs/fqzcomp_qual.o htscodecs/htscodecs/htscodecs.o htscodecs/htscodecs/pack.o htscodecs/htscodecs/rANS_static4x16pr.o htscodecs/htscodecs/rANS_static32x16pr_avx2.o htscodecs/htscodecs/rANS_static32x16pr_avx512.o htscodecs/htscodecs/rANS_static32x16pr_sse4.o htscodecs/htscodecs/rANS_static32x16pr.o htscodecs/htscodecs/rANS_static.o htscodecs/htscodecs/rle.o htscodecs/htscodecs/tokenise_name3.o htscodecs/htscodecs/utils.o  plugin.o
ranlib: file: libhts.a(rANS_static32x16pr_avx2.o) has no symbols
ranlib: file: libhts.a(rANS_static32x16pr_avx512.o) has no symbols
ranlib: file: libhts.a(rANS_static32x16pr_sse4.o) has no symbols
ranlib libhts.a
ranlib: file: libhts.a(rANS_static32x16pr_avx2.o) has no symbols
ranlib: file: libhts.a(rANS_static32x16pr_avx512.o) has no symbols
ranlib: file: libhts.a(rANS_static32x16pr_sse4.o) has no symbols

I don't know if there are platforms on which this would cause ar or ranlib to fail. Ideally, however, the platform-dependent parts of HTSCODECS_OBJS would be built up by the AX_CHECK_COMPILE_FLAG stanzas in configure.ac.

@jkbonfield
Contributor

I did think about that, but this seemed like the far easier solution given all the platforms we've tried it on so far work just fine. Cross that bridge when (if!) we ever come to it.

@jmarshall
Member Author

It's so easy to do, it might as well be done right and forestall any bug reports about the warnings.

Anyway, the point of this issue is that a bundled htscodecs build currently does actually fail on ARM due to rANS_static32x16pr_neon.o not being built at all. (And to fix that, one might as well do the right thing anyway.)

@daviesrob
Member

A simple solution to the ranlib warnings would be to ensure that the translation units always include at least one symbol, which could just be a string saying that avx2 or whatever wasn't included.
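
A rough sketch of that idea, under some assumptions: the guard macro `__AVX2__` stands in for whatever condition the real files test, and `avx2_kernel_stub` and `simd_unit_note` are hypothetical names invented for this example. The point is only that the `#else` branch still emits one small symbol, so the object file is never empty.

```c
#if defined(__AVX2__)
/* The real AVX2 kernels would be defined here; they supply the
 * object file's symbols. This function is a hypothetical stand-in. */
int avx2_kernel_stub(int x) {
    return x;
}
const char simd_unit_note[] = "AVX2 code compiled in";
#else
/* Nothing else survives preprocessing on this target, so export one
 * small string; the archive member then always has a symbol. */
const char simd_unit_note[] = "AVX2 code not compiled in";
#endif
```

The cost is a few bytes of read-only data per skipped file. This keeps the object files in the archive, whereas the alternative discussed in this thread leaves them out of the build altogether.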

jmarshall added a commit to jmarshall/htslib that referenced this issue Jun 13, 2022
Add test compilations to detect ARM Neon support to configure.ac and
hts_probe_cc.sh.

If compiler support is present, add rANS_static32x16pr_neon.c to
$(HTSCODECS_SOURCES) in htscodecs_bundled.mk. Fixes samtools#1450.

In htscodecs_bundled.mk, only add rANS_static32x16pr_avx2.c et al
to $(HTSCODECS_SOURCES) if the respective AVX2, AVX512, SSE4 support
is present. As building these files already uses GNU Make-specific
constructs and the $(HTS_CFLAGS_AVX2) variables are either empty or an
option string, this is easily achieved via `$(if $(HTS_CFLAGS_AVX2),...)`.

There is no compiler flag required for Neon, so invent HTS_HAVE_NEON
and use it to control building rANS_static32x16pr_neon.c without adding
any bespoke compilation options for it.
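
The probe itself is not quoted in this thread. As a hedged sketch of what such a test compilation can key on: the program below uses a few `arm_neon.h` intrinsics when the compiler defines `__ARM_NEON`. The names `neon_probe` and `SKETCH_HAVE_NEON` are hypothetical (the latter only mirrors the role of HTS_HAVE_NEON). A real configure probe would presumably drop the fallback branch, so that a failed compile signals "no Neon"; the fallback is kept here so that the sketch builds on any host.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>

#define SKETCH_HAVE_NEON 1

/* Exercises a few Neon intrinsics; compiling this branch at all is
 * the evidence that the compiler supports them without extra flags. */
int neon_probe(void) {
    int32x4_t v = vdupq_n_s32(21);   /* four lanes, each 21 */
    v = vaddq_s32(v, v);             /* lane-wise add: each lane 42 */
    return vgetq_lane_s32(v, 0);     /* read lane 0 */
}
#else

#define SKETCH_HAVE_NEON 0

/* Portable fallback so that the sketch still compiles elsewhere. */
int neon_probe(void) {
    return 21 + 21;
}
#endif
```

Because no option string is needed, the result is a plain yes/no value rather than a set of flags, which matches the reason given above for introducing a separate variable.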
@jmarshall
Member Author

Fixed by #1451; as building these SIMD files already uses GNU Make-specific constructs, omitting the unneeded translation units is even more trivial than it would otherwise be.

daviesrob pushed a commit that referenced this issue Jun 13, 2022
@daviesrob
Member

Thanks for the fix. Hopefully builds on new Macs will work a bit better now.

@jmarshall
Member Author

Or, in my case, a build on a Raspberry Pi…
