Fix the bug COALESCING reading does not work for v2 parquet/orc datasource #5171

Merged: 5 commits, Apr 13, 2022

Conversation

wbo4958
Collaborator

@wbo4958 wbo4958 commented Apr 8, 2022

COALESCING reading does not work for the v2 datasource. ORC and Parquet reads automatically fall back to multi-threaded reading, even when the query uses no input_xxx expressions and ignoreCorruptFiles is not set, because the flag queryUsesInputFile is always true. As a result, with the definition

  override val canUseCoalesceFilesReader: Boolean =
    rapidsConf.isParquetCoalesceFileReadEnabled && !(queryUsesInputFile || ignoreCorruptFiles)

canUseCoalesceFilesReader will always be false.
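
To make the effect of the stuck flag concrete, here is a minimal, self-contained sketch of the same boolean expression. `ReaderFlags`, its field names, and `CoalesceFlagDemo` are hypothetical stand-ins for illustration only, not the plugin's actual classes; only the shape of the expression is taken from the snippet above.

```scala
// Hypothetical, simplified model of the reader-selection flag.
// It is NOT the plugin's real class; only the boolean expression mirrors the PR description.
final case class ReaderFlags(
    coalesceEnabled: Boolean,    // stands in for rapidsConf.isParquetCoalesceFileReadEnabled
    queryUsesInputFile: Boolean, // should be true only when the query uses input_xxx expressions
    ignoreCorruptFiles: Boolean) {

  // Same shape as the expression quoted above.
  def canUseCoalesceFilesReader: Boolean =
    coalesceEnabled && !(queryUsesInputFile || ignoreCorruptFiles)
}

object CoalesceFlagDemo {
  def main(args: Array[String]): Unit = {
    // Behavior described in this PR: the flag is always true for v2 datasources.
    val stuck = ReaderFlags(coalesceEnabled = true, queryUsesInputFile = true, ignoreCorruptFiles = false)
    // Expected behavior when the query has no input_xxx expressions.
    val expected = stuck.copy(queryUsesInputFile = false)

    println(s"flag stuck at true:   ${stuck.canUseCoalesceFilesReader}")
    println(s"flag computed per query: ${expected.canUseCoalesceFilesReader}")
  }
}
```

With `queryUsesInputFile` stuck at true, the conjunction can never hold no matter how the coalescing config is set, which is why the reader silently falls back to the multi-threaded path.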

Thanks to @firestarman for reporting the issue.

To close #5215

@wbo4958
Collaborator Author

wbo4958 commented Apr 8, 2022

build

…ource

Signed-off-by: Bobby Wang <wbo4958@gmail.com>
Collaborator

@tgravescs tgravescs left a comment

This might be hard to integration test, but it would be nice to have a unit test for it.

@wbo4958 wbo4958 requested a review from tgravescs April 8, 2022 13:27
@wbo4958 wbo4958 marked this pull request as draft April 11, 2022 12:02
@wbo4958 wbo4958 changed the title Fix the bug COALESCING reading does not work for v2 parquet/orc datasource [draft] Fix the bug COALESCING reading does not work for v2 parquet/orc datasource Apr 11, 2022
@tgravescs
Collaborator

Was this discovered purely through our internal testing, or was it reported by a customer? Please file an issue as well.

@wbo4958
Collaborator Author

wbo4958 commented Apr 12, 2022

build

@wbo4958 wbo4958 changed the title [draft] Fix the bug COALESCING reading does not work for v2 parquet/orc datasource Fix the bug COALESCING reading does not work for v2 parquet/orc datasource Apr 12, 2022
@wbo4958 wbo4958 marked this pull request as ready for review April 12, 2022 09:33
@wbo4958
Collaborator Author

wbo4958 commented Apr 13, 2022

build

@wbo4958
Collaborator Author

wbo4958 commented Apr 13, 2022

@tgravescs could you help review it?

@jlowe jlowe added this to the Apr 4 - Apr 15 milestone Apr 13, 2022
@sameerz sameerz added the bug Something isn't working label Apr 13, 2022
@wbo4958 wbo4958 merged commit 720f68f into NVIDIA:branch-22.06 Apr 13, 2022
@wbo4958 wbo4958 deleted the v2-coalescing branch April 13, 2022 23:23
Development

Successfully merging this pull request may close these issues.

[BUG] Coalescing reading is not working for v2 parquet/orc datasource
4 participants