Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pushdown nested fields #3

Merged
merged 2 commits into from
May 29, 2019
Merged

Pushdown nested fields #3

merged 2 commits into from
May 29, 2019

Conversation

JamesRTaylor
Copy link

Please review, @billonahill. I've adapted the patch from qqibrow here to be on top of Presto 309. I'm hoping this will address the perf issue for pruning of nested fields and close the gap between Presto & BQ.

cc @puneetjaiswal

@JamesRTaylor JamesRTaylor requested a review from billonahill May 29, 2019 17:03
@billonahill
Copy link

@JamesRTaylor were you able to verify the effectiveness of this patch in staging?

@JamesRTaylor
Copy link
Author

Next step post-merge is to deploy this to next-staging on top of the branch I've already been working with so that we can test perf.

@billonahill
Copy link

billonahill commented May 29, 2019

Got it. This is a big patch and I haven't been totally up to speed with all the upstream PRs and discussions, but looking into this. Can you give a tl;dr; of some of the key areas of the code to focus on?

And which parts have we considerably changed from the upstream PR?

@billonahill
Copy link

billonahill commented May 29, 2019

👍

I gave this a cursory look but would need quite a bit more time to familiarize with the code and I don't want to block on that. Some of the comments I have would be better suited for the OSS patch, like missing tests and questions about the API/object model, so they might not be that constructive here.

I recommend putting it in staging and we can test it.

@JamesRTaylor
Copy link
Author

The main part of the patch is threading through a map of NestedColumn objects to the Hive connector and then using that in ParquetPageSource to only read what's projected. I'll get another PR together for prestoinfra to push this to next staging so we can evaluate the performance.

@JamesRTaylor JamesRTaylor merged commit 398cef2 into lyft-309 May 29, 2019
@JamesRTaylor JamesRTaylor deleted the pushdown-nested-fields branch May 29, 2019 21:16
puneetjaiswal pushed a commit that referenced this pull request Jul 31, 2019
* Pushdown dereference expression to paruqet reader (qqibrow)

* Fix TestMergeNestedColumns
bearcage pushed a commit that referenced this pull request Aug 30, 2019
* Pushdown dereference expression to paruqet reader (qqibrow)

* Fix TestMergeNestedColumns
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants