Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding some bad parquet files #58

Merged
merged 4 commits into from
Aug 15, 2024
Merged

adding some bad parquet files #58

merged 4 commits into from
Aug 15, 2024

Conversation

jp0317
Copy link
Contributor

@jp0317 jp0317 commented Aug 13, 2024

Hi, these two bad files can help reproduce the two examples mentioned in apache/arrow-rs#6228.

@mapleFU
Copy link
Member

mapleFU commented Aug 14, 2024

Just wonder how these file is generated? 🤔 These files are little, which is good for our testing

@mapleFU
Copy link
Member

mapleFU commented Aug 15, 2024

@jp0317 I've check the bad-dict file, it throw here in C++ code:

  if (col_start < 0 || col_length < 0) {
    throw ParquetException("Invalid column metadata (corrupt file?)");
  }

  if (AddWithOverflow(col_start, col_length, &col_end) || col_end > source_size) {
    throw ParquetException("Invalid column metadata (corrupt file?)");
  }

So maybe arrow-rs can also check this? ( also cc @alamb )

@mapleFU
Copy link
Member

mapleFU commented Aug 15, 2024

@pitrou Would you mind check the naming of the 2 new bad file here?

Besides, the bad-levels file is similiar to existing bad level file, but I think one more bad file is ok. Want to hear what do you think of this

@alamb
Copy link

alamb commented Aug 15, 2024

So maybe arrow-rs can also check this? ( also cc @alamb )

I agree we should add tests to parquet-rs to test these bad files. I'll see what i can do

@alamb
Copy link

alamb commented Aug 15, 2024

So maybe arrow-rs can also check this? ( also cc @alamb )

I made a PR to check these files with Rust: apache/arrow-rs#6262

@jp0317
Copy link
Contributor Author

jp0317 commented Aug 16, 2024

@jp0317 I've check the bad-dict file, it throw here in C++ code:

  if (col_start < 0 || col_length < 0) {
    throw ParquetException("Invalid column metadata (corrupt file?)");
  }

  if (AddWithOverflow(col_start, col_length, &col_end) || col_end > source_size) {
    throw ParquetException("Invalid column metadata (corrupt file?)");
  }

So maybe arrow-rs can also check this? ( also cc @alamb )

looks like the col start and length are checked against negative here but isn't checked against the file length.

Beside, it seems we'd better check data_len against file length instead of "with_capacity() then take() here

@mapleFU
Copy link
Member

mapleFU commented Aug 16, 2024

I've an issue here: apache/arrow-rs#6255 , glad to learn some rust code lol

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants