
[native] Add caching of parsed Types #21325

Merged: 1 commit into prestodb:master from the cache_type_conversion branch on Nov 13, 2023.

Conversation

kevinwilfong (Contributor):

We've seen cases of queries that spend a large amount of time just parsing types when converting the Presto plan to Velox. This appears to be because the converter re-parses the same large Row types that are used across many field accesses.

Adding a cache scoped to a single request shows a substantial decrease in the time the conversion takes.

Notably, this helps with the timeouts we're seeing on calls from the coordinator to create tasks on the Workers.
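
As a rough illustration of the approach (a sketch written for this description, not the PR's exact code; parseTypeSignature and the member layout below are assumptions), a per-request parser can memoize parsed type signatures so a large Row type that shows up in many field accesses is only parsed once:

#include <string>
#include <unordered_map>

#include "velox/type/Type.h"

namespace velox = facebook::velox;

// Assumed existing entry point that performs the expensive (Antlr-based) parse.
velox::TypePtr parseTypeSignature(const std::string& text);

class TypeParser {
 public:
  // Returns the Velox type for a Presto type signature, reusing a previously
  // parsed result when the same signature string is seen again.
  velox::TypePtr parse(const std::string& text) const {
    auto it = cache_.find(text);
    if (it != cache_.end()) {
      return it->second;
    }
    auto type = parseTypeSignature(text);
    cache_.emplace(text, type);
    return type;
  }

 private:
  // mutable so parse() can stay const while it populates the cache; the
  // parser (and hence the cache) lives only for the duration of one request.
  mutable std::unordered_map<std::string, velox::TypePtr> cache_;
};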

kevinwilfong requested a review from a team as a code owner on November 6, 2023 23:21
kevinwilfong marked this pull request as a draft on November 6, 2023 23:21
kevinwilfong force-pushed the cache_type_conversion branch 2 times, most recently from 2d8fc42 to 17b39fa, on November 7, 2023 21:31
kevinwilfong marked this pull request as ready for review on November 8, 2023 19:11
xiaoxmeng (Contributor) left a comment:


@kevinwilfong nice catch. Thanks for the optimization!

@@ -61,6 +67,7 @@ class VeloxExprConverter {
const protocol::CallExpression& pexpr) const;

velox::memory::MemoryPool* pool_;
xiaoxmeng (Contributor):
Mark pool_ and typeParser_ as consts? Thanks!
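
The suggested change would look something like the following (typeParser_'s declared type is not shown in this hunk; TypeParser* is an assumption for illustration):

velox::memory::MemoryPool* const pool_;
TypeParser* const typeParser_;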

@@ -35,7 +37,7 @@ class VeloxQueryPlanConverterBase {
explicit VeloxQueryPlanConverterBase(
xiaoxmeng (Contributor):
NYC: drop explicit as the ctor takes more than one input? Thanks!

@@ -218,6 +220,7 @@ class VeloxQueryPlanConverterBase {
velox::memory::MemoryPool* pool_;
xiaoxmeng (Contributor):
NYC: mark pool_ and queryCtx_ as consts?

velox::memory::MemoryPool* const pool_;
velox::core::QueryCtx* const queryCtx_;

xiaoxmeng merged commit d1c5d83 into prestodb:master on Nov 13, 2023. All 59 checks passed.
majetideepak (Collaborator) commented on Nov 15, 2023:

@kevinwilfong I am adding Presto type parser support using Flex/Bison in Velox (facebookincubator/velox#7568). The end goal is to replace Antlr with it and remove a dependency.
I will add support for caching.
Is there a benchmark to evaluate the performance?

velox::TypePtr parse(const std::string& text) const;

private:
mutable std::unordered_map<std::string, velox::TypePtr> cache_;
majetideepak (Collaborator):
Any reason not to use the SimpleLRUCache from Velox?
We use that to cache file handles
https://github.com/facebookincubator/velox/blob/main/velox/connectors/hive/FileHandle.h#L62

majetideepak (Collaborator):
I am worried that without a bound, the cache might grow too big in a production system.
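
For illustration, a size-bounded variant could look like the sketch below. This is a generic LRU written for this discussion, not the actual SimpleLRUCache API from Velox; the class name and capacity handling are assumptions:

#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

#include "velox/type/Type.h"

namespace velox = facebook::velox;

// A small LRU-bounded map from type signature strings to parsed types.
class BoundedTypeCache {
 public:
  explicit BoundedTypeCache(std::size_t maxEntries) : maxEntries_(maxEntries) {}

  // Returns the cached type, or nullptr if the signature has not been seen.
  velox::TypePtr get(const std::string& key) {
    auto it = map_.find(key);
    if (it == map_.end()) {
      return nullptr;
    }
    // Move the key to the front of the recency list.
    lru_.splice(lru_.begin(), lru_, it->second.second);
    return it->second.first;
  }

  void put(const std::string& key, velox::TypePtr type) {
    auto it = map_.find(key);
    if (it != map_.end()) {
      it->second.first = std::move(type);
      lru_.splice(lru_.begin(), lru_, it->second.second);
      return;
    }
    lru_.push_front(key);
    map_.emplace(key, std::make_pair(std::move(type), lru_.begin()));
    if (map_.size() > maxEntries_) {
      // Evict the least recently used entry to keep the cache bounded.
      map_.erase(lru_.back());
      lru_.pop_back();
    }
  }

 private:
  const std::size_t maxEntries_;
  std::list<std::string> lru_;
  std::unordered_map<
      std::string,
      std::pair<velox::TypePtr, std::list<std::string>::iterator>>
      map_;
};

With a bound like this (or with SimpleLRUCache), the cache cannot grow without limit even if a long-lived process sees many distinct type signatures.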
