Feature/add arm support #286

Closed
wants to merge 53 commits into from
2d89df4
move x86 arch and SIMD types to x86 arch folder
markos Sep 17, 2020
6a40793
move cpuid stuff to util/arch/x86
markos Sep 17, 2020
ea721c9
move crc32 SSE42 implementation to util/arch/x86
markos Sep 18, 2020
956b001
move masked_move* AVX2 implementation to util/arch/x86
markos Sep 18, 2020
8ed5f4a
fix include paths for masked_move
markos Sep 18, 2020
aac1f0f
move x86 bitutils.h implementations to util/arch/x86/bitutils.h
markos Sep 22, 2020
6581aae
move x86 popcount.h implementations to util/arch/x86/popcount.h
markos Sep 22, 2020
9f3ad89
move andn helper function to bitutils.h
markos Sep 22, 2020
e915d84
no need to check for WIN32*
markos Sep 22, 2020
e8e188a
move x86 implementations of simd_utils.h to util/arch/x86/
markos Sep 22, 2020
f7a6b89
add some set*() functions, harmonize names, rename setAxB to set1_AxB…
markos Sep 23, 2020
5333467
fix names, use own intrinsic instead of explicit _mm* ones
markos Sep 23, 2020
04fbf24
Revert "move x86 popcount.h implementations to util/arch/x86/popcount.h"
markos Sep 23, 2020
f0e70bc
Revert "Revert "move x86 popcount.h implementations to util/arch/x86/…
markos Sep 24, 2020
b1170bc
add arm checks in platform.cmake
markos Oct 6, 2020
5952c64
add necessary modifications to CMake system to enable building on ARM…
markos Oct 6, 2020
e91082d
use right intrinsic
markos Oct 6, 2020
9a04942
minor fix
markos Oct 7, 2020
4c924cc
add arm architecture basic defines
markos Oct 7, 2020
5d773dd
use C implementation of popcount for arm
markos Oct 7, 2020
d2cf1a7
move cpuid_flags.h header to common
markos Oct 8, 2020
1c2c73b
add C implementation of pdep64()
markos Oct 8, 2020
a921217
add arm bitutils.h header
markos Oct 8, 2020
31ac671
add ARM version of simd_utils.h
markos Oct 13, 2020
5b425bd
add arm simple cpuid_flags
markos Oct 15, 2020
c5a7f4b
add ARM simd_utils vectorized functions for 128-bit vectors
markos Oct 15, 2020
45bfed9
add scalar versions of the vectorized functions for architectures tha…
markos Oct 15, 2020
e7e1308
fix compilation paths for cpuid_flags for x86
markos Oct 16, 2020
83977db
split arch-agnostic simd_utils.h functions into the common file
markos Oct 16, 2020
4bce012
Revert "move x86 popcount.h implementations to util/arch/x86/popcount.h"
markos Oct 16, 2020
c4db636
scalar implementations of diffrich256 and diffrich384
markos Oct 16, 2020
149ea93
don't redefine function on x86
markos Oct 16, 2020
0bef151
don't use SSE directly in the tests
markos Oct 30, 2020
5482429
fix ARM implementations
markos Oct 30, 2020
547f79b
small optimization in storecompress*()
markos Oct 30, 2020
592b190
needed for ARM vector type conversions
markos Oct 30, 2020
18296ee
fix 32-bit/64-bit detection
markos Nov 5, 2020
7b8cf97
add extra instructions (currently arm-only), fix order of elements in…
markos Nov 5, 2020
3390418
add compress128 function and implementation
markos Nov 5, 2020
501f60e
add some debug info
markos Nov 5, 2020
62fed20
add some debug and minor optimizations in unit test
markos Nov 5, 2020
c4f1372
remove debug from functions
markos Nov 5, 2020
606c53a
fix compiler flag testcase
markos Nov 24, 2020
1c26f04
when building in debug mode, vgetq_lane_*() and vextq_*() need immedi…
markos Nov 24, 2020
d763652
helper functions to print a m128 vector in debug mode
markos Nov 24, 2020
17ab42d
small optimization that was for some reason failing in ARM, should be…
markos Nov 24, 2020
259c257
define debug vector print functions to NULL in non-debug mode
markos Dec 3, 2020
38477b0
fix movq and load_m128_from_u64a and resp. test for NEON
markos Dec 3, 2020
c38722a
add ARM platform
markos Dec 3, 2020
39945b7
clear zones array
markos Dec 3, 2020
773dc6f
optimize *shiftbyte_m128() functions to use palign instead of variabl…
markos Dec 7, 2020
e088c6a
remove forgotten printf
markos Dec 7, 2020
61b963a
fix x86 compilation
markos Dec 8, 2020
add some debug and minor optimizations in unit test
markos committed Nov 5, 2020

commit 62fed20ad051848c39d735900b978ffe261a51d3
29 changes: 22 additions & 7 deletions unit/internal/state_compress.cpp
@@ -98,8 +98,8 @@ TEST(state_compress, m128_1) {
     char buf[sizeof(m128)] = { 0 };

     for (u32 i = 0; i < 16; i++) {
-        char mask_raw[16] = { 0 };
-        char val_raw[16] = { 0 };
+        char ALIGN_ATTR(16) mask_raw[16] = { 0 };
+        char ALIGN_ATTR(16) val_raw[16] = { 0 };

         memset(val_raw, (i << 4) + 3, 16);

@@ -109,17 +109,32 @@ TEST(state_compress, m128_1) {
         mask_raw[15 - i] = 0xff;
         val_raw[15 - i] = i;

-        m128 val;
-        m128 mask;
-
-        memcpy(&val, val_raw, sizeof(val));
-        memcpy(&mask, mask_raw, sizeof(mask));
+        m128 val = load128(val_raw);
+        m128 mask = load128(mask_raw);

         storecompressed128(&buf, &val, &mask, 0);

         m128 val_out;
         loadcompressed128(&val_out, &buf, &mask, 0);

+        int8_t ALIGN_ATTR(16) data[16];
+        store128(data, val);
+        printf("val: ");
+        for (int j=0; j < 16; j++) printf("%02x ", data[j]);
+        printf("\n");
+        store128(data, mask);
+        printf("mask: ");
+        for (int j=0; j < 16; j++) printf("%02x ", data[j]);
+        printf("\n");
+        store128(data, and128(val, mask));
+        printf("and128(val, mask): ");
+        for (int j=0; j < 16; j++) printf("%02x ", data[j]);
+        printf("\n");
+        store128(data, val_out);
+        printf("val_out: ");
+        for (int j=0; j < 16; j++) printf("%02x ", data[j]);
+        printf("\n");
+
         EXPECT_TRUE(!diff128(and128(val, mask), val_out));

         mask_raw[i] = 0x0f;
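
The debug output added above repeats the same print loop four times. It also passes `int8_t` values to `%02x`, so any byte of 0x80 or above (for example the 0xff mask bytes) is promoted to a negative `int` and typically prints as `ffffffff` rather than `ff`. One possible way to fold the loops into a helper is sketched below. This is not what the PR does: `hex_bytes` and `dump16` are hypothetical names, and the sketch works on plain byte buffers so that it stands alone without Hyperscan's `m128`, `store128`, or `ALIGN_ATTR`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the PR): format n bytes as two-digit hex,
// each followed by a space. Bytes are read as unsigned char so that values
// >= 0x80 are not sign-extended when promoted for printing.
static std::string hex_bytes(const void *p, std::size_t n) {
    assert(p != nullptr);
    const unsigned char *bytes = static_cast<const unsigned char *>(p);
    std::string out;
    char tmp[4]; // two hex digits + space + terminating NUL
    for (std::size_t i = 0; i < n; i++) {
        std::snprintf(tmp, sizeof(tmp), "%02x ",
                      static_cast<unsigned>(bytes[i]));
        out += tmp;
    }
    return out;
}

// Hypothetical helper: print a labelled 16-byte buffer on a single line.
static void dump16(const char *label, const void *p) {
    std::printf("%s: %s\n", label, hex_bytes(p, 16).c_str());
}
```

In the test, each four-line print block would then shrink to a `store128(data, ...)` followed by one call such as `dump16("val_out", data)`, assuming `data` stays a 16-byte aligned buffer as in the diff.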