Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect ARM Neon support and only build appropriate SIMD object files #1451

Merged
merged 2 commits into from
Jun 13, 2022

Conversation

jmarshall
Copy link
Member

Add test compilations to detect ARM Neon support to configure.ac and hts_probe_cc.sh. If compiler support is present, add rANS_static32x16pr_neon.c to $(HTSCODECS_SOURCES) in htscodecs_bundled.mk. Fixes #1450.

Similarly only add rANS_static32x16pr_avx2.c et al to $(HTSCODECS_SOURCES) if the respective AVX2, AVX512, SSE4 support is present. As building these files already uses GNU Make-specific constructs (namely target-specific variables in Makefile) and the $(HTS_CFLAGS_AVX2) Make variables already exist and are either empty or a non-empty option string, this is easily achieved via $(if $(HTS_CFLAGS_AVX2),...).

There is no compiler flag required for Neon, so invent HTS_HAVE_NEON and use it to control building rANS_static32x16pr_neon.c without adding any bespoke compilation options for it.

Add test compilations to detect ARM Neon support to configure.ac and
hts_probe_cc.sh.

If compiler support is present, add rANS_static32x16pr_neon.c to
$(HTSCODECS_SOURCES) in htscodecs_bundled.mk. Fixes samtools#1450.

In htscodecs_bundled.mk, only add rANS_static32x16pr_avx2.c et al
to $(HTSCODECS_SOURCES) if the respective AVX2, AVX512, SSE4 support
is present. As building these files already uses GNU Make-specific
constructs and the $(HTS_CFLAGS_AVX2) variables are either empty or an
option string, this is easily achieved via `$(if $(HTS_CFLAGS_AVX2),...)`.

There is no compiler flag required for Neon, so invent HTS_HAVE_NEON
and use it to control building rANS_static32x16pr_neon.c without adding
any bespoke compilation options for it.
@daviesrob
Copy link
Member

Well, that worked better than my effort, which discovered that empty translation units are not allowed in pedantic mode. Would you mind if I pushed a commit here to add an ARM CI test?

@jmarshall
Copy link
Member Author

No worries, go ahead.

@daviesrob
Copy link
Member

Cheers. I will once I've tuned it up suitably. Sadly going for Address Sanitizer looks like it was a step too far. It seems to be really slow on arm.

Tries to be as strict as possible.  Unfortunately Address Sanitizer
appears to be very slow on this platform at the moment, so has
been left out for now.  It would be good to add it later should
its performance improve.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bundled htscodecs build failure on ARM
2 participants