-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PrestoQueryRunner #7628
Add PrestoQueryRunner #7628
Conversation
✅ Deploy Preview for meta-velox canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duanmeng Thank you for bringing in PrestoQueryRunner to Velox. Looks good % a question about type parsing.
std::vector<TypePtr> types; | ||
for (const auto& column : response_["columns"]) { | ||
names.push_back(column["name"].asString()); | ||
// TODO:types.push_back(parseTypeSignature(column["type"].asString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe parse the JSON and use logic similar to Type::create to create Type from it?
"typeSignature": {
"rawType": "row",
"typeArguments": [
{
"rawType": "array",
"typeArguments": [
{
"rawType": "map",
"typeArguments": [
{
"rawType": "varchar",
"typeArguments": [],
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type parser #7568 from @majetideepak is about to land, I will add parseType(column["type"].asString())
here to get the Velox type once it is merged.
VELOX_CHECK_EQ( | ||
response.status_code, | ||
200, | ||
"Post to {} failed: {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post -> POST
VELOX_CHECK_EQ( | ||
response.status_code, | ||
200, | ||
"Get from {} failed: {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get -> GET
@mbasmanova Hi Masha, I've resolved the comments and used the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PrestoQueryRunner( | ||
std::string coordinatorUri, | ||
std::string user, | ||
const int32_t timeout = 10'000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Drop const.
- Document this parameter.
- Perhaps, add units to the name for clarity: timeoutMillis or use an std::chrono type.
Why 10'000? Is this 10 seconds?
Why does the test need to override this to 20'000?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the const, documented this parameter, and used std::chrono type.
Why 10'000? Is this 10 seconds?
I followed the default value used in PrestoDB. I just changed it to 1000ms.
Why does the test need to override this to 20'000?
Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova Could we add the document file in the fellow-up PR? I found that I need to make code changes to show how to use PrestoQueryRunner
in Velox. I want to add a follow-up PR to make the runner configurable through the gflags
along with the document on how to use PrestoQueryRunner. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add the document file in the fellow-up PR?
Sure.
I want to add a follow-up PR to make the runner configurable through the gflags
Wondering why would we want to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering why would we want to do that.
I found that I need to make code changes to show how to use PrestoQueryRunner in Velox, say velox_aggregation_fuzzer_test
, when I am trying to write the document of how to use PrestoQueryRunner, something like the following. With the follow-up, we could control the creation of the query runner instance to switch runner only through program parameters.
diff --git a/velox/exec/tests/AggregationFuzzerTest.cpp b/velox/exec/tests/AggregationFuzzerTest.cpp
--- a/velox/exec/tests/AggregationFuzzerTest.cpp
+++ b/velox/exec/tests/AggregationFuzzerTest.cpp
+#include "velox/exec/tests/utils/PrestoQueryRunner.h"
- auto duckQueryRunner =
+ auto queryRunner =
+ std::make_unique<facebook::velox::exec::test::PrestoQueryRunner>("http://127.0.0.1:8080", "hive");
@@ -343,5 +337,5 @@ int main(int argc, char** argv) {
facebook::velox::exec::test::getCustomInputGenerators();
options.timestampPrecision =
facebook::velox::VectorFuzzer::Options::TimestampPrecision::kMilliSeconds;
- return Runner::run(initialSeed, std::move(duckQueryRunner), options);
+ return Runner::run(initialSeed, std::move(queryRunner), options);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to allow for switching between Duck and Presto query runners in the Fuzzer. Once Presto query runner works, we want to always use that and remove Duck query runner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once Presto query runner works, we want to always use that and remove Duck query runner.
Got, it. I will highlight this in the follow-up.
@mbasmanova Hi Masha, where should we add this .md file? |
Perhaps, next to PrestoQueryRunner.h, i.e. PrestoQueryRunner.md. |
velox/exec/tests/CMakeLists.txt
Outdated
@@ -288,3 +288,11 @@ add_test( | |||
COMMAND cpr_http_client_test | |||
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}) | |||
target_link_libraries(cpr_http_client_test cpr::cpr gtest gtest_main) | |||
|
|||
add_executable(presto_query_runner_test PrestoQueryRunnerTest.cpp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a separate executable for his test? We used to have a problem with CI ran out of disk space because all these executables are very large. CC: @kgpai
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova Removed this executable, and linked the runner test into velox_exec_infra_test
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgpai Krishna, would you help merge this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this work @duanmeng
} | ||
}; | ||
|
||
TEST_F(PrestoQueryRunnerTest, DISABLED_basic) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you add a comment here that this is disabled because it requires a presto co-ordinator running at localhost etc ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments.
/// method. | ||
/// @param resultType Expected type of the results. | ||
/// @return Data received from Presto. | ||
std::multiset<std::vector<velox::variant>> execute( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe in a future pr we can add in the option to pass in table name, so we can potentially have multiple queries running at same time. This can only run single threaded atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgpai How about generating a random table name when constructing the PrestoQueryRunner
, say, tmp_$randonStringName_$threadId
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep things simple and first change the CI to allow running PrestoQueryRunnerTest, then replace Duck runner in the fuzzer.
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
…tal jobs. (#8314) Summary: **What ?** Recently support for [PrestoQueryRunner ](#7628 added to AggregationFuzzer, so that it can use Presto as a source of truth. This PR sets up an experimental job that uses the Presto Java container to run Aggregation Fuzzer with Presto as source of truth. **Why ?** Running the aggregation fuzzer against Presto gives us greater confidence on the correctness of our aggregate functions. **How ?** 1. [Previously we created a docker image with support for java and Presto](#7920) 2. This still requires some changes to get Presto to work with AggregationFuzzer. These changes are listed below. 3. Adds support for Hive metastore - we do this via a file based metastore by adding the hive.properties file to $PRESTO_HOME/etc/catalog (see #8111) 4. We also set the pid file on the launcher to ensure there is only one run of the Presto server 5. Another thing to note is that we set the timezone to be ['Americas/Los_Angeles' due to details described here ](#8127). **Testing:** There is no automated way to test this. Here are manual steps I follow: 1. Create presto-java-container by running `docker-compose build presto-java` 6. Ssh into the presto-java-container 7. Copy Velox sources into the presto-java-container and build Velox 8. Launch Presto service: /opt/start-prestojava.sh 9. Run Aggregation fuzzer with --presto_url=http://127.0.0.1:8080 flag Example Run: Here is an example run https://github.com/facebookincubator/velox/actions/runs/7508139395/job/20443035814 Pull Request resolved: #8314 Reviewed By: kagamiori Differential Revision: D52701237 Pulled By: kgpai fbshipit-source-id: e4ba284aed469bc69cc95c90f708a0c92a040205
No description provided.