Minor performance fix for NEON RAID-Z
The NEON code replicates the SSE code too closely, including
a masked 16-bit shift. But NEON, like AltiVec (#9539), has an
unsigned 8-bit shift, so use that instead and drop the masking.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9725
rdolbeau authored and behlendorf committed Dec 18, 2019
1 parent fe56484 commit 118fc3e
Showing 1 changed file with 2 additions and 4 deletions.
6 changes: 2 additions & 4 deletions module/zfs/vdev_raidz_math_aarch64_neon_common.h
@@ -479,10 +479,8 @@ typedef struct v {
 	/* upper part */ \
 	"and v14.16b," VR0(r) ".16b,v15.16b\n" \
 	"and v13.16b," VR1(r) ".16b,v15.16b\n" \
-	"sshr " VR0(r) ".8h," VR0(r) ".8h,#4\n" \
-	"sshr " VR1(r) ".8h," VR1(r) ".8h,#4\n" \
-	"and " VR0(r) ".16b," VR0(r) ".16b,v15.16b\n" \
-	"and " VR1(r) ".16b," VR1(r) ".16b,v15.16b\n" \
+	"ushr " VR0(r) ".16b," VR0(r) ".16b,#4\n" \
+	"ushr " VR1(r) ".16b," VR1(r) ".16b,#4\n" \
 	\
 	"tbl v12.16b,{v10.16b}," VR0(r) ".16b\n" \
 	"tbl v10.16b,{v10.16b}," VR1(r) ".16b\n" \
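Why the two instruction sequences are interchangeable can be checked with a small scalar model. The sketch below is not part of the commit; it is a portable C emulation of one 16-bit lane, assuming (as the surrounding code suggests) that v15 holds 0x0f in every byte. The helper names `old_masked_shift`, `new_byte_shift` and `count_mismatches` are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/*
 * Old sequence, modelled on one 16-bit lane:
 *   sshr Vd.8h, Vn.8h, #4   (arithmetic shift of the whole halfword)
 *   and  Vd.16b, Vd.16b, v15.16b   (assumed: v15 = 0x0f in every byte)
 */
static uint16_t old_masked_shift(uint16_t lane)
{
	/* Emulate the arithmetic shift portably by replicating the sign bit. */
	uint16_t shifted = (uint16_t)(lane >> 4);
	if (lane & 0x8000)
		shifted |= 0xf000;
	/* The mask discards both the sign fill and the bits that crossed bytes. */
	return (uint16_t)(shifted & 0x0f0f);
}

/*
 * New sequence, modelled on the same lane:
 *   ushr Vd.16b, Vn.16b, #4   (each byte shifted independently, zero fill)
 */
static uint16_t new_byte_shift(uint16_t lane)
{
	uint8_t lo = (uint8_t)(lane & 0xff);
	uint8_t hi = (uint8_t)(lane >> 8);
	return (uint16_t)(((hi >> 4) << 8) | (lo >> 4));
}

/* Count the 16-bit lane values on which the two sequences disagree. */
static unsigned count_mismatches(void)
{
	unsigned mismatches = 0;
	for (uint32_t x = 0; x <= 0xffff; x++) {
		if (old_masked_shift((uint16_t)x) != new_byte_shift((uint16_t)x))
			mismatches++;
	}
	return mismatches;
}
```

Calling `count_mismatches()` walks all 65536 lane values and returns 0: in both versions each output byte ends up holding the high nibble of the corresponding input byte, so the per-byte unsigned shift makes the two `and` instructions redundant.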
