Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mach compilation again; fold_constant has to be the same section as crc16_t10dif_copy_pmull #226

Closed
wants to merge 1 commit into from

Conversation

cielavenir
Copy link
Contributor

No description provided.

…as crc16_t10dif_copy_pmull

Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
@cielavenir
Copy link
Contributor Author

~/devel/isa-l ((3c78474...)) $ qemu-aarch64-static ./crc16_t10dif_test
Test crc16_t10dif_test .............................................................Test done: Pass
~/devel/isa-l ((3c78474...)) $ qemu-aarch64-static ./crc16_t10dif_copy_test 
Test crc16_t10dif_copy_test:
....Test done: Pass

@@ -379,13 +379,14 @@ v_br1 .req v5
.size crc16_t10dif_copy_pmull, .-crc16_t10dif_copy_pmull
#endif

ASM_DEF_RODATA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want them in the .text section now? They were in .rodata before. You show a cross build in the comments but cross build seems to work fine when this is in rodata like other constants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked for gcc but did not work for (apple-)clang.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ make -f Makefile.unx CC=arm64-apple-darwin-gcc AR=arm64-apple-darwin-ar arch=aarch64 host_cpu=aarch64 DEFINES="-fno-stack-check" lib programs/igzip -j8
mkdir -p bin
  ---> Building crc/aarch64/crc16_t10dif_pmull.S  aarch64 
  ---> Building crc/aarch64/crc16_t10dif_copy_pmull.S  aarch64 
crc/aarch64/crc16_t10dif_pmull.S:222:9: error: unknown AArch64 fixup kind!
        ldr q_fold_const, fold_constant
        ^
crc/aarch64/crc16_t10dif_copy_pmull.S:231:9: error: unknown AArch64 fixup kind!
        ldr q_fold_const, fold_constant
        ^
make: *** [bin/crc16_t10dif_pmull.o] Error 1
make: *** Waiting for unfinished jobs....
make: *** [bin/crc16_t10dif_copy_pmull.o] Error 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has been confirmed by #226 (comment)

@cielavenir
Copy link
Contributor Author

cielavenir commented Oct 31, 2022

@kirbyzhou would you mind taking a look? I mean, does this improve compilation?

@kirbyzhou
Copy link

@kirbyzhou would you mind taking a look? I mean, does this improve compilation?

I'll try it in a 2 or 3 days. I've been busy recently.

@kirbyzhou
Copy link

It can be compiled with a lot of warnings:

ld: warning: -undefined dynamic_lookup may not work with chained fixups
ld: warning: arm64 function not 4-byte aligned: _gf_vect_dot_prod_neon from erasure_code/aarch64/.libs/gf_vect_dot_prod_neon.o
ld: warning: arm64 function not 4-byte aligned: .Lloop128_init from erasure_code/aarch64/.libs/gf_vect_dot_prod_neon.o
ld: warning: arm64 function not 4-byte aligned: .Lloop128 from erasure_code/aarch64/.libs/gf_vect_dot_prod_neon.o
ld: warning: arm64 function not 4-byte aligned: .Lloop128_vects from erasure_code/aarch64/.libs/gf_vect_dot_prod_neon.o
...
ld: warning: arm64 function not 4-byte aligned: .Lloop16_vects from raid/aarch64/.libs/pq_check_neon.o
ld: warning: arm64 function not 4-byte aligned: .Lloop16_vects_end from raid/aarch64/.libs/pq_check_neon.o
ld: warning: arm64 function not 4-byte aligned: .Lloop16_end from raid/aarch64/.libs/pq_check_neon.o
ld: warning: arm64 function not 4-byte aligned: .Lerror from raid/aarch64/.libs/pq_check_neon.o
copying selected object files to avoid basename conflicts...
  CCLD     programs/igzip

The compression level=1,2,default is broken, only level=0,3 works

% time ./igzip -3 -c -z  /Users/kirbyzhou/Downloads/centos7-boot.iso  > /dev/null
./igzip -3 -c -z /Users/kirbyzhou/Downloads/centos7-boot.iso > /dev/null  1.79s user 0.09s system 97% cpu 1.914 total

% time ./igzip -2  -c -z  /Users/kirbyzhou/Downloads/centos7-boot.iso  > /dev/null
zsh: segmentation fault  ./igzip -2 -c -z /Users/kirbyzhou/Downloads/centos7-boot.iso > /dev/null
./igzip -2 -c -z /Users/kirbyzhou/Downloads/centos7-boot.iso > /dev/null  0.00s user 0.01s system 36% cpu 0.034 total

% time ./igzip -1  -c -z  /Users/kirbyzhou/Downloads/centos7-boot.iso  > /dev/null
zsh: segmentation fault  ./igzip -1 -c -z /Users/kirbyzhou/Downloads/centos7-boot.iso > /dev/null
./igzip -1 -c -z /Users/kirbyzhou/Downloads/centos7-boot.iso > /dev/null  0.00s user 0.01s system 37% cpu 0.034 total

% time ./igzip -0 -c -z  /Users/kirbyzhou/Downloads/centos7-boot.iso  > /dev/null
./igzip -0 -c -z /Users/kirbyzhou/Downloads/centos7-boot.iso > /dev/null  1.31s user 0.07s system 98% cpu 1.410 total

% time ./igzip  -c -z  /Users/kirbyzhou/Downloads/centos7-boot.iso  > /dev/null 
zsh: segmentation fault  ./igzip -c -z /Users/kirbyzhou/Downloads/centos7-boot.iso > /dev/null
./igzip -c -z /Users/kirbyzhou/Downloads/centos7-boot.iso > /dev/null  0.00s user 0.02s system 43% cpu 0.044 total

The decompression speed is the same as gzip of system and cloudera.

isal-l % time ./igzip -c -d /tmp/x.gz | wc -c 
 531628032
./igzip -c -d /tmp/x.gz  0.24s user 0.14s system 13% cpu 2.774 total
wc -c  2.54s user 0.05s system 93% cpu 2.771 total

macos-system % time gzip -c -d /tmp/x.gz | wc -c 
 531628032
gzip -c -d /tmp/x.gz  0.20s user 0.14s system 12% cpu 2.615 total
wc -c  2.55s user 0.05s system 99% cpu 2.613 total

cloudera % time ./minigzip -c -d /tmp/x.gz | wc -c
 531628032
./minigzip -c -d /tmp/x.gz  0.28s user 0.20s system 18% cpu 2.635 total
wc -c  2.54s user 0.06s system 99% cpu 2.628 total

@cielavenir
Copy link
Contributor Author

It can be compiled with a lot of warnings

Could you tell me the Xcode version used?

@cielavenir
Copy link
Contributor Author

Also, is it true that you used master, not my patch?

@cielavenir cielavenir closed this Nov 17, 2022
@cielavenir cielavenir reopened this Nov 17, 2022
@kirbyzhou
Copy link

Xcode version 14.1

commit 3c784749380f7f167ff7777fd0414f138f9debd5 (HEAD -> fix_mach2)
Author: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>
Date:   Sun Oct 30 18:46:04 2022 +0900

    Fix mach compilation again; fold_constant has to be the same section as crc16_t10dif_copy_pmull
    
    Signed-off-by: Taiju Yamada <tyamada@bi.a.u-tokyo.ac.jp>

commit fec429e1b952405dc539dea93b44ffa4151490a8 (origin/master, origin/HEAD, master)
Author: Greg Tucker <greg.b.tucker@intel.com>
Date:   Fri Oct 28 17:05:01 2022 -0700

    build: Add top-level read-only permissions to ci actions
    
    This is recommended by ossf scorecard:
    https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions
    
    Change-Id: I48a36cc6625fa3f1e6babb9edbe81c9522f41a13
    Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>

@cielavenir
Copy link
Contributor Author

cielavenir commented Nov 28, 2022

@kirbyzhou could you check whether fec429e compiles on Xcode 14?

Edit: resolved #226 (comment)

@akotlar
Copy link

akotlar commented Jun 7, 2023

Any updates on this? Do we need this for isal support on Apple silicon? Install of the python library seems to be fine through a conda-installed Python 3.11, but I'm not clear whether it's safe to use without this patch.

@cielavenir
Copy link
Contributor Author

cielavenir commented Jun 9, 2023

@akotlar could you check if 3c78474 and fec429e compiles on mac?

I believe fec429e does not, but I need another witness.

Edit: resolved #226 (comment)

@pablodelara
Copy link
Contributor

Any more info here?

@cielavenir
Copy link
Contributor Author

cielavenir commented Jan 5, 2024

I need someone who can check #226 (comment) , which takes less than 30min.

Otherwise I need to buy a M1 mac as github-hosted M1 runner is not free and cannot be used for experiments.

Edit: resolved #226 (comment)

@akotlar
Copy link

akotlar commented Jan 5, 2024

Sorry missed this notification. I’ll double check thanks!

@pablodelara
Copy link
Contributor

@akotlar thanks for looking into it. Let us know asap, thanks!

@pablodelara
Copy link
Contributor

We are releasing this week, so I'll defer this to next release, unless it is clear this is fixing the issue and not breaking any other ARM architectures.

@rhpvorderman rhpvorderman mentioned this pull request Jan 31, 2024
@jimharris
Copy link

SPDK ran into this general problem as well, see spdk/spdk#3247.

We pushed a fix to our isa-l submodule fork which got me unblocked for now. But it would be great to have this fixed in isa-l upstream.

@pablodelara
Copy link
Contributor

@liuqinfei could you check this is OK with you?

@liuqinfei
Copy link
Contributor

liuqinfei commented Feb 12, 2024 via email

@liuqinfei
Copy link
Contributor

I have verified the latest master branch with this PR on kunpeng Aarch64 platform. All of the func test cases are passed.

[root@node1 isa-l]# cat /etc/os-release
NAME="openEuler"
VERSION="22.03 (LTS-SP1)"
ID="openEuler"
VERSION_ID="22.03"
PRETTY_NAME="openEuler 22.03 (LTS-SP1)"
ANSI_COLOR="0;31"

[root@node1 isa-l]# uname -r
5.10.0-136.12.0.86.oe2203sp1.aarch64

[root@node1 isa-l]# make check
make --no-print-directory check-am
make --no-print-directory erasure_code/gf_vect_mul_test erasure_code/erasure_code_test erasure_code/gf_inverse_test erasure_code/erasure_code_update_test raid/xor_gen_test raid/pq_gen_test raid/xor_check_test raid/pq_check_test crc/crc16_t10dif_test crc/crc16_t10dif_copy_test crc/crc64_funcs_test crc/crc32_funcs_test igzip/igzip_rand_test igzip/igzip_wrapper_hdr_test igzip/checksum32_funcs_test mem/mem_zero_detect_test
make --no-print-directory check-TESTS
PASS: erasure_code/gf_vect_mul_test
PASS: erasure_code/erasure_code_test
PASS: erasure_code/gf_inverse_test
PASS: erasure_code/erasure_code_update_test
PASS: raid/xor_gen_test
PASS: raid/pq_gen_test
PASS: raid/xor_check_test
PASS: raid/pq_check_test
PASS: crc/crc16_t10dif_test
PASS: crc/crc16_t10dif_copy_test
PASS: crc/crc64_funcs_test
PASS: crc/crc32_funcs_test
PASS: igzip/igzip_rand_test
PASS: igzip/igzip_wrapper_hdr_test
PASS: igzip/checksum32_funcs_test
PASS: mem/mem_zero_detect_test

Testsuite summary for libisal 2.31.0

TOTAL: 16

PASS: 16

SKIP: 0

XFAIL: 0

FAIL: 0

XPASS: 0

ERROR: 0

============================================================================

@liuqinfei
Copy link
Contributor

Hi, @jimharris I read your PR3247 for spdk. The issue is interesting. It seems that the behavior ldr data in .rodata section should be forbidden with (apple-)clang. For this case, GCC works but (apple-)clang fails. So, it seems that it is better to fix the (apple-)clang. Right?

@cielavenir
Copy link
Contributor Author

So, with macos-14 runner, I can run CI finally. Though I needed to add libtool explicitly cf #277

So this pull request is valid at least (unless clang should be fixed as Liuqinfei says)

However igzip_rand_test segfaults (other 15 tests pass though), which should be investigated. But I think the investigation is super-difficult without local arm64 mac.

@cielavenir
Copy link
Contributor Author

@yuhaoth
#164 (comment)

x18 problem should be fix . I will raise an issue later.

This looks forgotten. Could you take a look?

~/devel/isa-l (master) $ ag declare_generic_reg.*18
igzip/aarch64/igzip_deflate_finish_aarch64.S
100:	declare_generic_reg	m_bit_count,	18,w

igzip/aarch64/igzip_deflate_body_aarch64.S
98:	declare_generic_reg	m_bit_count,	18,w

igzip/aarch64/igzip_decode_huffman_code_block_aarch64.S
276:	declare_generic_reg	start_out,	18,x

igzip/aarch64/isal_deflate_icf_finish_hash_hist.S
120:	declare_generic_reg	file_start,		18,x

igzip/aarch64/isal_deflate_icf_body_hash_hist.S
110:	declare_generic_reg	level_buf,	18,x

I remember I tried but in igzip_decode_huffman_code_block_aarch64.S all x0 to x28 are occupied. I'm pretty not sure if x18 can be simply switched to x29.

@cielavenir
Copy link
Contributor Author

@yuhaoth seems like x29 is safe. And actually, thanks to github M1-mac runner, I was avoid x18 by my own. With M1-mac github runner CI fixed, I published #278

@kirbyzhou , 278 should not have segmentation fault anymore.

@cielavenir
Copy link
Contributor Author

closing to avoid confusion

@cielavenir cielavenir closed this Mar 4, 2024
@pablodelara
Copy link
Contributor

Reopening this PR, then #278 can be rebased. @liuqinfei, are you OK with this fix?

@pablodelara pablodelara reopened this Mar 6, 2024
@liuqinfei
Copy link
Contributor

Reopening this PR, then #278 can be rebased. @liuqinfei, are you OK with this fix?

As i verified, this patch works well on arm64. If no deep study on clang, it's ok to accept this.

@pablodelara
Copy link
Contributor

PR merged in the repo now, thanks!

@pablodelara pablodelara closed this Mar 7, 2024
@chenrui333
Copy link

The patch works for me, any chance of cutting a new release? cc @pablodelara

relates to Homebrew/homebrew-core#168488

@cielavenir
Copy link
Contributor Author

@chenrui333 actually you should include #278 whole content

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants