From cb7a9674142c137367bf75a01b79c6e214a73199 Mon Sep 17 00:00:00 2001
From: Jinpeng
Date: Thu, 15 Aug 2024 03:00:42 -0700
Subject: [PATCH] adding some bad parquet files (#58)

* adding some bad parquet files

* move to bad-data

* fmt?

---------

Co-authored-by: jp0317
Co-authored-by: mwish
---
 bad_data/ARROW-RS-GH-6229-DICTHEADER.parquet | Bin 0 -> 533 bytes
 bad_data/ARROW-RS-GH-6229-LEVELS.parquet | Bin 0 -> 609 bytes
 bad_data/README.md | 6 +++++-
 3 files changed, 5 insertions(+), 1 deletion(-)
 create mode 100755 bad_data/ARROW-RS-GH-6229-DICTHEADER.parquet
 create mode 100644 bad_data/ARROW-RS-GH-6229-LEVELS.parquet

diff --git a/bad_data/ARROW-RS-GH-6229-DICTHEADER.parquet b/bad_data/ARROW-RS-GH-6229-DICTHEADER.parquet
new file mode 100755
index 0000000000000000000000000000000000000000..7d14d5ec7995c3b3826f767723824c0e1f04a47b
GIT binary patch
literal 533
zcmY+Cze>YU6vj`kN&lo#tM=R`5a?uZsDlyRGnBRvN-1s%rIeteNo}cvtMm=roq~h2
zkKyPb_y*2SBItKR#R&IzPj0@G@0^^Z9}M=G_(j8N)_hzRxI`p~$k(uu+SF+U=)j=w
zuL5EbAd$+z1QlQ*bro?9tb;1p0GnV7)Bpns>cCcKogdW-d7t60Y=f}f8gv8Y91Pm6
z-ch#~prq>J?CI^`7}2(Mt!k0^e5d-#E95j`sk1<(EMz5!MX1h5MzFZrWRblqd=kv;E2dau$w6v~Z-bJEdbjD^Lvu+3yNKc6JSi|7VUxS=PbI9h76==7gf
z8BgLkO6J4yePYuh%Oubv-vZYco>r7l2a$!J#nihWz{;K5+
pQjE~MFE1VP4P;4aL#7Sr>MKZ}cFe2MYP%)sCB0&r{P#Xk`oC+A9#E0>L
zd<w$Q$h5N8yYuE!~Bm7t*>!wr1ke1GfaIK(tf>tnP}3#c%JNH
z0B8+zXjOnUxC6KW7(hvmz;yxcX;=cdqlon;Vs%X|>1H1C)>t*xEoxTCC1^YX1Vy8f
zyLx%@qk#C;GO1|g^LCyFWnW4OFsFOaB>IgqS>HzXbK9m8KIy&>$wfUKaMy`WJhAy7
z7ks(eA)oMMEz=HZ#nr#Qya=FJWcu;6H=ixTI2%l-N2_qY7*5_#bGd?3$TlCqYgX!y
OhDlH6C&ieDkLwo|hlFAP

literal 0
HcmV?d00001

diff --git a/bad_data/README.md b/bad_data/README.md
index 885af61..30802a5 100644
--- a/bad_data/README.md
+++ b/bad_data/README.md
@@ -21,7 +21,11 @@ These are files used for reproducing various bugs that have been reported.
 * PARQUET-1481.parquet: tests a case where a schema Thrift value has been
-  corrupted
+  corrupted.
+* ARROW-RS-GH-6229-DICTHEADER.parquet: tests a case where the number of values
+  stored in dictionary page header is negative.
+* ARROW-RS-GH-6229-LEVELS.parquet: tests a case where a page has insufficient
+  repetition levels.
 * ARROW-GH-41321.parquet: test case of https://github.com/apache/arrow/issues/41321
   where decoded rep / def levels is less than num_values in page_header.
 * ARROW-GH-41317.parquet: test case of https://github.com/apache/arrow/issues/41317