Illegal Instruction vinserti128 in set_host_implementation #371

Closed
mkitti opened this issue Jun 29, 2023 · 11 comments
Comments

@mkitti
Contributor

mkitti commented Jun 29, 2023

When using c-blosc from Julia, I encountered the following problem with c-blosc 1.21.4. I do not encounter the issue with c-blosc 1.21.2.

Thread 1 "julia" received signal SIGILL, Illegal instruction.
0x00007fffbd40cadc in set_host_implementation () from /home/mkitti/.julia/artifacts/b1d485d780339a5f9a3edac1f97a961bf359ea5f/lib/libblosc.so
(gdb) bt
#0  0x00007fffbd40cadc in set_host_implementation () from /home/mkitti/.julia/artifacts/b1d485d780339a5f9a3edac1f97a961bf359ea5f/lib/libblosc.so
#1  0x00007ffff7c99f68 in __pthread_once_slow (once_control=0x7fffbd60f5e8 <implementation_initialized>, init_routine=0x7fffbd40c7a0 <set_host_implementation>) at ./nptl/pthread_once.c:116
#2  0x00007fffbd40cb79 in blosc_internal_shuffle () from /home/mkitti/.julia/artifacts/b1d485d780339a5f9a3edac1f97a961bf359ea5f/lib/libblosc.so
#3  0x00007fffbd401f77 in blosc_c () from /home/mkitti/.julia/artifacts/b1d485d780339a5f9a3edac1f97a961bf359ea5f/lib/libblosc.so
#4  0x00007fffbd403a31 in do_job () from /home/mkitti/.julia/artifacts/b1d485d780339a5f9a3edac1f97a961bf359ea5f/lib/libblosc.so
#5  0x00007fffbd404c80 in blosc_compress () from /home/mkitti/.julia/artifacts/b1d485d780339a5f9a3edac1f97a961bf359ea5f/lib/libblosc.so
#6  0x00007fffe04bfd3f in blosc_compress () at /home/mkitti/.julia/packages/Blosc/jk4Np/src/Blosc.jl:36
#7  julia_blosc_filter_33 () at /home/mkitti/.julia/dev/HDF5/filters/H5Zblosc/src/H5Zblosc.jl:109

This is due to the instruction vinserti128 according to gdb's layout asm.

vinserti128 is an AVX2 instruction.
https://www.felixcloutier.com/x86/vinserti128:vinserti32x4:vinserti64x2:vinserti32x8:vinserti64x4

I encountered this on an old machine using an AMD FX-8350 processor which lacks AVX2:

$ cat /proc/cpuinfo 
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD FX(tm)-8350 Eight-Core Processor
stepping	: 0
microcode	: 0x6000852
cpu MHz		: 1517.134
cache size	: 2048 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 16
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs		: fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed
bogomips	: 8669.48
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro
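
The flags line above lists avx but not avx2. When checking such a line programmatically, a plain substring search is not enough, because "avx" is a prefix of "avx2". A small sketch of a whole-token check (the `has_cpu_flag` helper is a hypothetical name used for illustration only, not something from c-blosc):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole
   whitespace-delimited token in `flags`, 0 otherwise. */
static int has_cpu_flag(const char *flags, const char *name)
{
    size_t len = strlen(name);
    const char *p = flags;

    if (len == 0) {
        return 0;
    }
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after whitespace. */
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* ... and must end at the end of the string or right before whitespace. */
        int ends = p[len] == '\0' || p[len] == ' ' ||
                   p[len] == '\t' || p[len] == '\n';
        if (starts && ends) {
            return 1;
        }
        p += 1;  /* keep searching after this partial match */
    }
    return 0;
}
```

With an excerpt of the FX-8350 flags such as "aes xsave avx f16c", looking up "avx" succeeds while looking up "avx2" does not.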
@mkitti
Contributor Author

mkitti commented Jun 29, 2023

AVX2 is detected as not available ...

$ BLOSC_PRINT_SHUFFLE_ACCEL=1 julia --project=. test.jl
Shuffle CPU Information:
SSE2 available: True
SSE3 available: True
SSSE3 available: True
SSE4.1 available: True
SSE4.2 available: True
AVX2 available: False
AVX512BW available: False
XSAVE available: True
XSAVE enabled: True
XMM state enabled: True
YMM state enabled: True
ZMM state enabled: False
Invalid instruction at 0x7f23ff40cadc: 0xc4, 0xe3, 0x75, 0x38, 0xc0, 0x01, 0xc5, 0xfa, 0x7f, 0x05, 0xde, 0x2a, 0x20, 0x00, 0xc4
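
The first six bytes of the reported invalid instruction can be decoded by hand to confirm which instruction it is. The sketch below is an illustration only: the `vex3_fields` struct and the `decode_vex3` and `is_vinserti128` helpers are hypothetical names, not part of c-blosc. It unpacks a three-byte VEX prefix and checks for the VEX.256.66.0F3A.W0 38 encoding that the x86 reference linked above gives for vinserti128:

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a three-byte VEX prefix plus the opcode byte that follows it. */
typedef struct {
    unsigned map;     /* opcode map: 1 = 0F, 2 = 0F38, 3 = 0F3A */
    unsigned w;       /* VEX.W */
    unsigned vvvv;    /* extra source register, stored inverted in the prefix */
    unsigned l;       /* vector length: 0 = 128-bit, 1 = 256-bit */
    unsigned pp;      /* implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
    unsigned opcode;  /* opcode byte after the prefix */
} vex3_fields;

/* Returns 1 and fills `out` if `b` starts with a three-byte VEX prefix (0xC4). */
static int decode_vex3(const uint8_t *b, size_t n, vex3_fields *out)
{
    if (n < 4 || b[0] != 0xC4) {
        return 0;
    }
    out->map    = b[1] & 0x1Fu;
    out->w      = (b[2] >> 7) & 1u;
    out->vvvv   = (unsigned)(~(b[2] >> 3)) & 0xFu;
    out->l      = (b[2] >> 2) & 1u;
    out->pp     = b[2] & 3u;
    out->opcode = b[3];
    return 1;
}

/* vinserti128 is encoded as VEX.256.66.0F3A.W0 38 /r ib. */
static int is_vinserti128(const uint8_t *b, size_t n)
{
    vex3_fields f;
    return decode_vex3(b, n, &f) &&
           f.map == 3 && f.w == 0 && f.l == 1 && f.pp == 1 && f.opcode == 0x38;
}
```

Applied to 0xc4, 0xe3, 0x75, 0x38, 0xc0, 0x01, the fields come out as map 0F3A, W=0, L=1 (256-bit), pp=66 and opcode 0x38, which matches vinserti128. The next instruction starts with 0xc5, a two-byte VEX prefix, which this sketch deliberately does not handle.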

mkitti added a commit to mkitti/HDF5.jl that referenced this issue Jun 29, 2023
Also expand definitions for dataspace with two positional arguments.
Pin Blosc_jll to 1.21.2 if AVX2 is not detected. See Blosc/c-blosc#371
@FrancescAlted
Member

Thanks for the heads up. I think this is probably a packaging problem introduced in eb981b7, which was merged between 1.21.2 and 1.21.4. Do you agree, @t20100?

@t20100
Contributor

t20100 commented Jun 29, 2023

Hi, yes, looking at the diff, it's most likely related to PR #352.

PR #352 is a fix for PR #347, which was included in c-blosc v1.21.2. PR #347 changed the way AVX2-related macros are used (and was disabling AVX2, hence the fix in PR #352).
From a look at the code, it's possible that the issue at hand was already there before v1.21.2 and was hidden by the issue introduced in PR #347, which was then fixed by PR #352.

@mkitti, did you, or could you, check with a version of c-blosc older than v1.21.2?

@mkitti
Contributor Author

mkitti commented Jun 29, 2023

1.21.1 and 1.21.0 both work fine.

@t20100
Contributor

t20100 commented Jun 29, 2023

Thanks for the testing!
So, compiling shuffle.c with __AVX2__ and __SSE2__ looks to be the problem. I don't know exactly why, however.

In #347 (comment) I proposed an alternative to modifying shuffle.c and compiling it with __AVX2__ flags to support the universal2 build; that alternative was rejected at the time:

An alternative could be to define blosc_internal_*_avx2 and blosc_internal_*_sse2 functions even when AVX2/SSE2 are not defined, with a body calling e.g. abort().

@FrancescAlted if you think it is a good solution, I can propose a PR.
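
One way to picture the stub idea: the AVX2 entry point always exists, so the dispatching file can refer to it without being built with AVX2 flags, and the stub body is only a safety net that the runtime dispatcher should never reach. The sketch below is a guess at the shape of such a change, not the actual c-blosc code; all `sketch_*` names and the function signature are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*shuffle_fn)(size_t typesize, size_t nbytes,
                           const uint8_t *src, uint8_t *dst);

/* Portable byte shuffle: gathers byte j of every element into one run. */
static void sketch_shuffle_generic(size_t typesize, size_t nbytes,
                                   const uint8_t *src, uint8_t *dst)
{
    size_t nelem = nbytes / typesize;
    for (size_t j = 0; j < typesize; j++) {
        for (size_t i = 0; i < nelem; i++) {
            dst[j * nelem + i] = src[i * typesize + j];
        }
    }
    /* Copy any trailing bytes that do not form a whole element. */
    for (size_t k = nelem * typesize; k < nbytes; k++) {
        dst[k] = src[k];
    }
}

/* Always defined, so the dispatcher links whether or not AVX2 was enabled. */
static void sketch_shuffle_avx2(size_t typesize, size_t nbytes,
                                const uint8_t *src, uint8_t *dst)
{
#if defined(__AVX2__)
    /* A real build would run the intrinsics kernel here; this sketch
       simply reuses the portable loop as a placeholder. */
    sketch_shuffle_generic(typesize, nbytes, src, dst);
#else
    (void)typesize; (void)nbytes; (void)src; (void)dst;
    fprintf(stderr, "AVX2 shuffle called in a build without AVX2\n");
    abort();
#endif
}

/* The dispatcher only hands out the AVX2 entry when the CPU reports AVX2. */
static shuffle_fn sketch_select_shuffle(int cpu_has_avx2)
{
    return cpu_has_avx2 ? sketch_shuffle_avx2 : sketch_shuffle_generic;
}
```

The point of the design is that only the file holding the AVX2 kernels needs AVX2 compiler flags; the file that performs the runtime selection can then be compiled for the baseline instruction set.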

@mkitti
Contributor Author

mkitti commented Jun 29, 2023

This is from objdump -d from 1.21.4:

000000000000c7a0 <set_host_implementation>:
    c7a0:	55                   	push   %rbp
    c7a1:	31 f6                	xor    %esi,%esi
    c7a3:	89 f0                	mov    %esi,%eax
    c7a5:	89 f1                	mov    %esi,%ecx
    c7a7:	48 89 e5             	mov    %rsp,%rbp
    c7aa:	41 57                	push   %r15
    c7ac:	41 56                	push   %r14
    c7ae:	41 55                	push   %r13
    c7b0:	41 54                	push   %r12
    c7b2:	53                   	push   %rbx
    c7b3:	48 83 e4 e0          	and    $0xffffffffffffffe0,%rsp
    c7b7:	48 83 ec 20          	sub    $0x20,%rsp
    c7bb:	0f a2                	cpuid  
    c7bd:	89 c7                	mov    %eax,%edi
    c7bf:	89 f1                	mov    %esi,%ecx
    c7c1:	b8 01 00 00 00       	mov    $0x1,%eax
    c7c6:	0f a2                	cpuid  
    c7c8:	89 c8                	mov    %ecx,%eax
    c7ca:	c1 ea 1a             	shr    $0x1a,%edx
    c7cd:	83 e0 01             	and    $0x1,%eax
    c7d0:	41 89 d7             	mov    %edx,%r15d
    c7d3:	88 44 24 15          	mov    %al,0x15(%rsp)
    c7d7:	89 c8                	mov    %ecx,%eax
    c7d9:	41 83 e7 01          	and    $0x1,%r15d
    c7dd:	c1 e8 09             	shr    $0x9,%eax
    c7e0:	83 e0 01             	and    $0x1,%eax
    c7e3:	88 44 24 14          	mov    %al,0x14(%rsp)
    c7e7:	89 c8                	mov    %ecx,%eax
    c7e9:	c1 e8 13             	shr    $0x13,%eax
    c7ec:	83 e0 01             	and    $0x1,%eax
    c7ef:	88 44 24 13          	mov    %al,0x13(%rsp)
    c7f3:	89 c8                	mov    %ecx,%eax
    c7f5:	c1 e8 14             	shr    $0x14,%eax
    c7f8:	83 e0 01             	and    $0x1,%eax
    c7fb:	88 44 24 12          	mov    %al,0x12(%rsp)
    c7ff:	89 c8                	mov    %ecx,%eax
    c801:	c1 e9 1b             	shr    $0x1b,%ecx
    c804:	c1 e8 1a             	shr    $0x1a,%eax
    c807:	83 e0 01             	and    $0x1,%eax
    c80a:	88 44 24 18          	mov    %al,0x18(%rsp)
    c80e:	89 c8                	mov    %ecx,%eax
    c810:	83 e0 01             	and    $0x1,%eax
    c813:	83 ff 06             	cmp    $0x6,%edi
    c816:	88 44 24 17          	mov    %al,0x17(%rsp)
    c81a:	0f 8e f0 02 00 00    	jle    cb10 <set_host_implementation+0x370>
    c820:	b8 07 00 00 00       	mov    $0x7,%eax
    c825:	89 f1                	mov    %esi,%ecx
    c827:	0f a2                	cpuid  
    c829:	41 89 de             	mov    %ebx,%r14d
    c82c:	c1 eb 1e             	shr    $0x1e,%ebx
    c82f:	41 c1 ee 05          	shr    $0x5,%r14d
    c833:	83 e3 01             	and    $0x1,%ebx
    c836:	41 83 e6 01          	and    $0x1,%r14d
    c83a:	0f b6 44 24 18       	movzbl 0x18(%rsp),%eax
    c83f:	22 44 24 17          	and    0x17(%rsp),%al
    c843:	88 44 24 16          	mov    %al,0x16(%rsp)
    c847:	74 74                	je     c8bd <set_host_implementation+0x11d>
    c849:	0f b6 44 24 15       	movzbl 0x15(%rsp),%eax
    c84e:	44 89 f2             	mov    %r14d,%edx
    c851:	09 da                	or     %ebx,%edx
    c853:	44 09 f8             	or     %r15d,%eax
    c856:	0a 44 24 14          	or     0x14(%rsp),%al
    c85a:	0a 44 24 13          	or     0x13(%rsp),%al
    c85e:	0a 44 24 12          	or     0x12(%rsp),%al
    c862:	0f b6 c0             	movzbl %al,%eax
    c865:	09 c2                	or     %eax,%edx
    c867:	74 4f                	je     c8b8 <set_host_implementation+0x118>
    c869:	31 c9                	xor    %ecx,%ecx
    c86b:	0f 01 d0             	xgetbv 
    c86e:	48 c1 e2 20          	shl    $0x20,%rdx
    c872:	89 c0                	mov    %eax,%eax
    c874:	48 09 c2             	or     %rax,%rdx
    c877:	48 89 d0             	mov    %rdx,%rax
    c87a:	48 89 d1             	mov    %rdx,%rcx
    c87d:	83 e2 70             	and    $0x70,%edx
    c880:	48 d1 e8             	shr    %rax
    c883:	48 c1 e9 02          	shr    $0x2,%rcx
    c887:	83 e0 01             	and    $0x1,%eax
    c88a:	83 e1 01             	and    $0x1,%ecx
    c88d:	0f b6 f8             	movzbl %al,%edi
    c890:	89 7c 24 0c          	mov    %edi,0xc(%rsp)
    c894:	0f b6 f9             	movzbl %cl,%edi
    c897:	89 7c 24 08          	mov    %edi,0x8(%rsp)
    c89b:	31 ff                	xor    %edi,%edi
    c89d:	48 83 fa 70          	cmp    $0x70,%rdx
    c8a1:	40 0f 94 c7          	sete   %dil
    c8a5:	21 c1                	and    %eax,%ecx
    c8a7:	89 7c 24 04          	mov    %edi,0x4(%rsp)
    c8ab:	88 4c 24 16          	mov    %cl,0x16(%rsp)
    c8af:	eb 24                	jmp    c8d5 <set_host_implementation+0x135>
    c8b1:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
    c8b8:	c6 44 24 16 00       	movb   $0x0,0x16(%rsp)
    c8bd:	c7 44 24 04 00 00 00 	movl   $0x0,0x4(%rsp)
    c8c4:	00 
    c8c5:	c7 44 24 08 00 00 00 	movl   $0x0,0x8(%rsp)
    c8cc:	00 
    c8cd:	c7 44 24 0c 00 00 00 	movl   $0x0,0xc(%rsp)
    c8d4:	00 
    c8d5:	48 8d 3d 61 0a 00 00 	lea    0xa61(%rip),%rdi        # d33d <_fini+0x595>
    c8dc:	e8 1f 4d ff ff       	call   1600 <getenv@plt>
    c8e1:	48 85 c0             	test   %rax,%rax
    c8e4:	0f 84 4e 01 00 00    	je     ca38 <set_host_implementation+0x298>
    c8ea:	4c 8d 2d 34 0a 00 00 	lea    0xa34(%rip),%r13        # d325 <_fini+0x57d>
    c8f1:	48 8d 3d 5f 0a 00 00 	lea    0xa5f(%rip),%rdi        # d357 <_fini+0x5af>
    c8f8:	4c 8d 25 21 0a 00 00 	lea    0xa21(%rip),%r12        # d320 <_fini+0x578>
    c8ff:	e8 5c 4d ff ff       	call   1660 <puts@plt>
    c904:	45 84 ff             	test   %r15b,%r15b
    c907:	48 8d 3d 62 0a 00 00 	lea    0xa62(%rip),%rdi        # d370 <_fini+0x5c8>
    c90e:	4c 89 ee             	mov    %r13,%rsi
    c911:	49 0f 45 f4          	cmovne %r12,%rsi
    c915:	31 c0                	xor    %eax,%eax
    c917:	e8 a4 4d ff ff       	call   16c0 <printf@plt>
    c91c:	80 7c 24 15 00       	cmpb   $0x0,0x15(%rsp)
    c921:	48 8d 3d 5d 0a 00 00 	lea    0xa5d(%rip),%rdi        # d385 <_fini+0x5dd>
    c928:	4c 89 e6             	mov    %r12,%rsi
    c92b:	49 0f 44 f5          	cmove  %r13,%rsi
    c92f:	31 c0                	xor    %eax,%eax
    c931:	e8 8a 4d ff ff       	call   16c0 <printf@plt>
    c936:	80 7c 24 14 00       	cmpb   $0x0,0x14(%rsp)
    c93b:	48 8d 3d 42 0a 00 00 	lea    0xa42(%rip),%rdi        # d384 <_fini+0x5dc>
    c942:	4c 89 e6             	mov    %r12,%rsi
    c945:	49 0f 44 f5          	cmove  %r13,%rsi
    c949:	31 c0                	xor    %eax,%eax
    c94b:	e8 70 4d ff ff       	call   16c0 <printf@plt>
    c950:	80 7c 24 13 00       	cmpb   $0x0,0x13(%rsp)
    c955:	48 8d 3d 3d 0a 00 00 	lea    0xa3d(%rip),%rdi        # d399 <_fini+0x5f1>
    c95c:	4c 89 e6             	mov    %r12,%rsi
    c95f:	49 0f 44 f5          	cmove  %r13,%rsi
    c963:	31 c0                	xor    %eax,%eax
    c965:	e8 56 4d ff ff       	call   16c0 <printf@plt>
    c96a:	80 7c 24 12 00       	cmpb   $0x0,0x12(%rsp)
    c96f:	48 8d 3d 39 0a 00 00 	lea    0xa39(%rip),%rdi        # d3af <_fini+0x607>
    c976:	4c 89 e6             	mov    %r12,%rsi
    c979:	49 0f 44 f5          	cmove  %r13,%rsi
    c97d:	31 c0                	xor    %eax,%eax
    c97f:	e8 3c 4d ff ff       	call   16c0 <printf@plt>
    c984:	45 85 f6             	test   %r14d,%r14d
    c987:	48 8d 3d 37 0a 00 00 	lea    0xa37(%rip),%rdi        # d3c5 <_fini+0x61d>
    c98e:	4c 89 e6             	mov    %r12,%rsi
    c991:	49 0f 44 f5          	cmove  %r13,%rsi
    c995:	31 c0                	xor    %eax,%eax
    c997:	e8 24 4d ff ff       	call   16c0 <printf@plt>
    c99c:	85 db                	test   %ebx,%ebx
    c99e:	48 8d 3d 34 0a 00 00 	lea    0xa34(%rip),%rdi        # d3d9 <_fini+0x631>
    c9a5:	4c 89 e6             	mov    %r12,%rsi
    c9a8:	49 0f 44 f5          	cmove  %r13,%rsi
    c9ac:	31 c0                	xor    %eax,%eax
    c9ae:	e8 0d 4d ff ff       	call   16c0 <printf@plt>
    c9b3:	80 7c 24 18 00       	cmpb   $0x0,0x18(%rsp)
    c9b8:	48 8d 3d 32 0a 00 00 	lea    0xa32(%rip),%rdi        # d3f1 <_fini+0x649>
    c9bf:	4c 89 e6             	mov    %r12,%rsi
    c9c2:	49 0f 44 f5          	cmove  %r13,%rsi
    c9c6:	31 c0                	xor    %eax,%eax
    c9c8:	e8 f3 4c ff ff       	call   16c0 <printf@plt>
    c9cd:	80 7c 24 17 00       	cmpb   $0x0,0x17(%rsp)
    c9d2:	48 8d 3d 2d 0a 00 00 	lea    0xa2d(%rip),%rdi        # d406 <_fini+0x65e>
    c9d9:	4c 89 e6             	mov    %r12,%rsi
    c9dc:	49 0f 44 f5          	cmove  %r13,%rsi
    c9e0:	31 c0                	xor    %eax,%eax
    c9e2:	e8 d9 4c ff ff       	call   16c0 <printf@plt>
    c9e7:	8b 44 24 0c          	mov    0xc(%rsp),%eax
    c9eb:	48 8d 3d 27 0a 00 00 	lea    0xa27(%rip),%rdi        # d419 <_fini+0x671>
    c9f2:	4c 89 ee             	mov    %r13,%rsi
    c9f5:	85 c0                	test   %eax,%eax
    c9f7:	49 0f 45 f4          	cmovne %r12,%rsi
    c9fb:	31 c0                	xor    %eax,%eax
    c9fd:	e8 be 4c ff ff       	call   16c0 <printf@plt>
    ca02:	8b 54 24 08          	mov    0x8(%rsp),%edx
    ca06:	48 8d 3d 23 0a 00 00 	lea    0xa23(%rip),%rdi        # d430 <_fini+0x688>
    ca0d:	4c 89 ee             	mov    %r13,%rsi
    ca10:	85 d2                	test   %edx,%edx
    ca12:	49 0f 45 f4          	cmovne %r12,%rsi
    ca16:	31 c0                	xor    %eax,%eax
    ca18:	e8 a3 4c ff ff       	call   16c0 <printf@plt>
    ca1d:	8b 4c 24 04          	mov    0x4(%rsp),%ecx
    ca21:	4c 89 e6             	mov    %r12,%rsi
    ca24:	48 8d 3d 1c 0a 00 00 	lea    0xa1c(%rip),%rdi        # d447 <_fini+0x69f>
    ca2b:	85 c9                	test   %ecx,%ecx
    ca2d:	49 0f 44 f5          	cmove  %r13,%rsi
    ca31:	31 c0                	xor    %eax,%eax
    ca33:	e8 88 4c ff ff       	call   16c0 <printf@plt>
    ca38:	80 7c 24 16 00       	cmpb   $0x0,0x16(%rsp)
    ca3d:	41 0f b6 c7          	movzbl %r15b,%eax
    ca41:	74 0b                	je     ca4e <set_host_implementation+0x2ae>
    ca43:	89 c2                	mov    %eax,%edx
    ca45:	83 ca 02             	or     $0x2,%edx
    ca48:	45 85 f6             	test   %r14d,%r14d
    ca4b:	0f 45 c2             	cmovne %edx,%eax
    ca4e:	a8 02                	test   $0x2,%al
    ca50:	0f 85 ca 00 00 00    	jne    cb20 <set_host_implementation+0x380>
    ca56:	48 8d 15 13 a4 ff ff 	lea    -0x5bed(%rip),%rdx        # 6e70 <blosc_internal_bshuf_untrans_bit_elem_scal>
    ca5d:	48 8d 35 cc d1 ff ff 	lea    -0x2e34(%rip),%rsi        # 9c30 <blosc_internal_bshuf_untrans_bit_elem_sse2>
    ca64:	83 e0 01             	and    $0x1,%eax
    ca67:	48 8d 3d 82 ca ff ff 	lea    -0x357e(%rip),%rdi        # 94f0 <blosc_internal_bshuf_trans_bit_elem_sse2>
    ca6e:	48 8d 0d fb 97 ff ff 	lea    -0x6805(%rip),%rcx        # 6270 <blosc_internal_unshuffle_generic>
    ca75:	4c 8d 05 84 97 ff ff 	lea    -0x687c(%rip),%r8        # 6200 <blosc_internal_shuffle_generic>
    ca7c:	48 0f 44 f2          	cmove  %rdx,%rsi
    ca80:	48 8d 15 d9 9e ff ff 	lea    -0x6127(%rip),%rdx        # 6960 <blosc_internal_bshuf_trans_bit_elem_scal>
    ca87:	48 8d 05 a7 08 00 00 	lea    0x8a7(%rip),%rax        # d335 <_fini+0x58d>
    ca8e:	48 0f 44 fa          	cmove  %rdx,%rdi
    ca92:	48 8d 15 f7 be ff ff 	lea    -0x4109(%rip),%rdx        # 8990 <blosc_internal_unshuffle_sse2>
    ca99:	48 0f 44 d1          	cmove  %rcx,%rdx
    ca9d:	48 8d 0d 4c ba ff ff 	lea    -0x45b4(%rip),%rcx        # 84f0 <blosc_internal_shuffle_sse2>
    caa4:	49 0f 44 c8          	cmove  %r8,%rcx
    caa8:	4c 8d 05 7c 08 00 00 	lea    0x87c(%rip),%r8        # d32b <_fini+0x583>
    caaf:	4c 0f 44 c0          	cmove  %rax,%r8
    cab3:	48 89 7c 24 18       	mov    %rdi,0x18(%rsp)
    cab8:	4c 89 05 01 2b 20 00 	mov    %r8,0x202b01(%rip)        # 20f5c0 <host_implementation>
    cabf:	c5 fa 7e 54 24 18    	vmovq  0x18(%rsp),%xmm2
    cac5:	48 89 4c 24 18       	mov    %rcx,0x18(%rsp)
    caca:	c5 fa 7e 5c 24 18    	vmovq  0x18(%rsp),%xmm3
    cad0:	c4 e3 e9 22 c6 01    	vpinsrq $0x1,%rsi,%xmm2,%xmm0
    cad6:	c4 e3 e1 22 ca 01    	vpinsrq $0x1,%rdx,%xmm3,%xmm1
    cadc:	c4 e3 75 38 c0 01    	vinserti128 $0x1,%xmm0,%ymm1,%ymm0
    cae2:	c5 fa 7f 05 de 2a 20 	vmovdqu %xmm0,0x202ade(%rip)        # 20f5c8 <host_implementation+0x8>
    cae9:	00 
    caea:	c4 e3 7d 39 05 e4 2a 	vextracti128 $0x1,%ymm0,0x202ae4(%rip)        # 20f5d8 <host_implementation+0x18>
    caf1:	20 00 01 
    caf4:	c5 f8 77             	vzeroupper 
    caf7:	48 8d 65 d8          	lea    -0x28(%rbp),%rsp
    cafb:	5b                   	pop    %rbx
    cafc:	41 5c                	pop    %r12
    cafe:	41 5d                	pop    %r13
    cb00:	41 5e                	pop    %r14
    cb02:	41 5f                	pop    %r15
    cb04:	5d                   	pop    %rbp
    cb05:	c3                   	ret    
    cb06:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
    cb0d:	00 00 00 
    cb10:	31 db                	xor    %ebx,%ebx
    cb12:	45 31 f6             	xor    %r14d,%r14d
    cb15:	e9 20 fd ff ff       	jmp    c83a <set_host_implementation+0x9a>
    cb1a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
    cb20:	48 8d 35 e9 fa ff ff 	lea    -0x517(%rip),%rsi        # c610 <blosc_internal_bshuf_untrans_bit_elem_avx2>
    cb27:	48 8d 3d 72 f9 ff ff 	lea    -0x68e(%rip),%rdi        # c4a0 <blosc_internal_bshuf_trans_bit_elem_avx2>
    cb2e:	48 8d 15 cb f2 ff ff 	lea    -0xd35(%rip),%rdx        # be00 <blosc_internal_unshuffle_avx2>
    cb35:	48 8d 0d a4 ef ff ff 	lea    -0x105c(%rip),%rcx        # bae0 <blosc_internal_shuffle_avx2>
    cb3c:	4c 8d 05 ed 07 00 00 	lea    0x7ed(%rip),%r8        # d330 <_fini+0x588>
    cb43:	e9 6b ff ff ff       	jmp    cab3 <set_host_implementation+0x313>
    cb48:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
    cb4f:	00 

At cadc you can see that it uses vinserti128.
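
Reading the listing from the top, the function appears to query CPUID leaf 1 and leaf 7 and then XGETBV before choosing an implementation: bit 5 of EBX from leaf 7 is the AVX2 flag, bit 27 of ECX from leaf 1 is OSXSAVE, and bits 1 and 2 of XCR0 are the XMM and YMM state. A rough C equivalent follows, assuming GCC or Clang with `<cpuid.h>` on x86. The helper names are made up for illustration and this is not the c-blosc source.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID.(EAX=7,ECX=0):EBX bit 5 reports AVX2. */
static int avx2_bit(uint32_t leaf7_ebx)
{
    return (int)((leaf7_ebx >> 5) & 1u);
}

/* CPUID.(EAX=1):ECX bit 27 reports that the OS has enabled XSAVE/XGETBV. */
static int osxsave_bit(uint32_t leaf1_ecx)
{
    return (int)((leaf1_ecx >> 27) & 1u);
}

/* XCR0 bits 1 (XMM) and 2 (YMM) must both be set for AVX state to be saved. */
static int ymm_state_enabled(uint64_t xcr0)
{
    return (xcr0 & 0x6u) == 0x6u;
}

/* Returns 1 only if the CPU reports AVX2 and the OS saves YMM registers. */
static int cpu_usable_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !osxsave_bit(ecx)) {
        return 0;
    }
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) || !avx2_bit(ebx)) {
        return 0;
    }
    /* XGETBV with ECX = 0 reads XCR0; safe here because OSXSAVE is set. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ymm_state_enabled(((uint64_t)hi << 32) | lo);
#else
    return 0;  /* not an x86 target */
#endif
}
```

In the listing, the instructions from cab3 to caf4, which store the selected function pointers into host_implementation, are reached from both the AVX2 branch (cb20 jumps back to cab3) and the non-AVX2 branch, so the vinserti128 at cadc runs no matter what the detection reports.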

@FrancescAlted
Member

> Thanks for the testing! So, compiling shuffle.c with __AVX2__ and __SSE2__ looks to be the problem. I don't know exactly why, however.
>
> In #347 (comment) I proposed an alternative to modifying shuffle.c and compiling it with __AVX2__ flags to support the universal2 build; that alternative was rejected at the time:
>
> > An alternative could be to define blosc_internal_*_avx2 and blosc_internal_*_sse2 functions even when AVX2/SSE2 are not defined, with a body calling e.g. abort().
>
> @FrancescAlted if you think it is a good solution, I can propose a PR.

Yes, I do think that now that we have @mkitti in the loop for testing, it is a good time to address this. Please proceed when you have the opportunity. Thanks!

mkitti added a commit to JuliaIO/HDF5.jl that referenced this issue Jul 6, 2023
* Fix #1083 maxdims -> max_dims in dataspace documentation

* Fix #1084, create_dataset with Type and Dataspace

Also expand definitions for dataspace with two positional arguments.
Pin Blosc_jll to 1.21.2 if AVX2 is not detected. See Blosc/c-blosc#371

* Formatter
@DennisHeimbigner

DennisHeimbigner commented Jul 22, 2023

I am having the same problem under Ubuntu 22 on VirtualBox.
I do not see the corresponding PR with git br -r; is there any way I can access it?

@mkitti
Contributor Author

mkitti commented Aug 17, 2023

I believe this is the corresponding PR:

#373

mkitti added a commit to mkitti/HDF5.jl that referenced this issue Aug 20, 2023
…uliaIO#1086)

* Fix JuliaIO#1083 maxdims -> max_dims in dataspace documentation

* Fix JuliaIO#1084, create_dataset with Type and Dataspace

Also expand definitions for dataspace with two positional arguments.
Pin Blosc_jll to 1.21.2 if AVX2 is not detected. See Blosc/c-blosc#371

* Formatter
mkitti added a commit to JuliaIO/HDF5.jl that referenced this issue Aug 23, 2023
* Fix _typed_load fast path for Julia 1.10 (currently nightly) (#1075)

* Fix _typed_load fast path for Julia 1.10 (currently nightly) by thresholding on `1.10.0-DEV.1390`
* Use `Libc.memcpy` after that threshold

* Add initial support for H5Dchunk_iter (#1031)

* Add initial support for H5Dchunk_iter

* Implement h5d_chunk_iter_helper

* Implement HDF5.get_all_chunk_info

* Make tests pass via HDF5 1.14.0

* Apply formatting

* Test filters with filter_mask via H5Dchunk_iter

* Require functions to return an integer

* Provide index based chunk iteration, rename to HDF5.get_chunk_info_all

* Fix formatting

* Fix documentation

* Fix documentation

* Improve testing

* Always define _get_chunk_info_all_by_iter for documenter

* Update src/datasets.jl

Co-authored-by: Simon Byrne <simonbyrne@gmail.com>

* Precompile get_chunk_info_all implementations before benchmarking

* Fix documentation

* Fix tests

* Formatting

---------

Co-authored-by: Simon Byrne <simonbyrne@gmail.com>

* Simplify formatter check (#1078)

This will display the diff, and return a non-zero exit code if there are changes

* Fix #1083 maxdims -> max_dims in dataspace documentation (#1085)

* Update light and dark logos for readme (#1087)

* Fixup readme logo links and readme style (#1088)

* Fixup readme logo links and readme style

* Tweak

* Tweak logo centering

* Upload curves directly instead of fonts for logo independence (#1089)

* Allow create_dataset to take a Type and Dataspace, Fix #1084 (#1086)

* Fix #1083 maxdims -> max_dims in dataspace documentation

* Fix #1084, create_dataset with Type and Dataspace

Also expand definitions for dataspace with two positional arguments.
Pin Blosc_jll to 1.21.2 if AVX2 is not detected. See Blosc/c-blosc#371

* Formatter

* Attempt to fix documentation (#1091)

* doc typo: attributes(parent), not attribute(parent) (#1095)

* Fix tests for Julia 1.3

* Fix tests for Julia 1.3, Windows SZIP is broken again

* Use static if block to fix 1.3

* Fix version and formatting

* Fix tests

* Bump version to 0.16.16

* Formatting

---------

Co-authored-by: Simon Byrne <simonbyrne@gmail.com>
Co-authored-by: Mustafa M <mus-m@outlook.com>
Co-authored-by: Steven G. Johnson <stevenj@mit.edu>
@mgood7123

Can this be closed?

@mkitti
Contributor Author

mkitti commented Nov 12, 2023

Yes.

6 participants
@FrancescAlted @DennisHeimbigner @mkitti @t20100 @mgood7123 and others