-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add Read Support for Presto-MaxCompute Connector #4
base: main
Are you sure you want to change the base?
Conversation
@dingxin-tech thank you for your contribution. Regarding testing: can you describe how to add tests for this plugin? Is it possible to add integration tests that run against a mocked version of MaxCompute, or through some other means? |
@tdcmeehan I have no idea how to add integration tests since MaxCompute currently does not provide external testing accounts. Do you know how other similar databases handle this? Maybe I should implement something like a MaxCompute emulator similar to the BigQuery emulator? |
@dingxin-tech ideally, Alibaba would give us a testing token to just run some minimal workload. If you could create an emulator, that would be awesome. |
Yes, I am currently working on the development of the MaxCompute-Emulator, and I am pleased to report that we have made some significant incremental progress. With the current pace, I anticipate that we will be able to complete support for Presto regression within the next two to three weeks. |
@dingxin-tech I see you've added an emulator, is this ready to review? |
Thank you for your attention. Yes, I believe this PR is ready for review. Although some critical features (such as reading the partition table, writing to the table) are not supported yet (in fact, the emulator also does not support these functionalities at the moment), I believe the current functionalities form a good initial whole for the first commit in this repository. These new features can be added in subsequent PRs to further improve the connector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great work. Can you please add integration style tests and more unit tests for the various utilities? For example, I think we should have decent unit test coverage over the Arrow to Page conversions. I also think we can add a dependency on presto-tests
, and run AbstractTestIntegrationSmokeTest
using your new query runner. Can you give it a try?
src/main/java/com/facebook/presto/maxcompute/MaxComputeInputSplit.java
Outdated
Show resolved
Hide resolved
src/main/resources/META-INF/services/com.facebook.presto.spi.Plugin
Outdated
Show resolved
Hide resolved
Co-authored-by: Timothy Meehan <tim@timdmeehan.com>
…rated smoke testing Update dependencies to support the new version of the MaxCompute simulator image, and introduce new test classes to verify basic integration behavior. At the same time, Maven configuration is enhanced to better support the Presto plug-in development process.
I have made changes based on your suggestions:
Could you please review it again? @tdcmeehan |
Can you please add a Github action to trigger the tests to run in CI? See here for reference: https://github.com/prestodb/airlift/pull/77/files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is our current thinking about adding new connectors to the core repo as opposed to third party repos? Is there anything preventing this from being a separate project?
pom.xml
Outdated
<dependency> | ||
<groupId>com.google.guava</groupId> | ||
<artifactId>guava</artifactId> | ||
<version>26.0-jre</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can push this to the current version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
@@ -0,0 +1,222 @@ | |||
<?xml version="1.0" encoding="UTF-8"?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this file belongs in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should add it ?
} | ||
return new SimpleStruct((StructTypeInfo) structTypeInfo, values); | ||
} | ||
catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you be more specific about the exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I realized that this method doesn't actually throw a checked exception, so I've removed the try-catch block. I'm not sure why I implemented it that way :)
src/main/java/com/facebook/presto/maxcompute/utils/ArrowUtils.java
Outdated
Show resolved
Hide resolved
|
||
private CommonUtils() {} | ||
|
||
public static String serialize(Serializable object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is object serialization used at all? It's mostly avoided in java these days as it's been a common source of security issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The serialize
and deserialize
methods are specifically utilized for serializing and deserializing TableBatchReadSession
objects. This is particularly important for MaxComputeSplit
, where JSON format is employed for message transmission within Presto. Given that TableBatchReadSession
contains numerous variables and has been well-implemented with the Serializable
interface, I believe utilizing these methods is a sound approach.
throw new IOException("Failed after retries", e); | ||
} | ||
try { | ||
LOG.warn("Retrying after exception: " + e.getMessage(), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actionable? Will anyone read it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method was actually not being used, so I've removed it.
There's nothing preventing this from being a separate project, however we would like to move more of our modules in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the documentation! I commented with several nits about phrasing, and a couple of suggestions about formatting. If any of my suggestions don't work for you, I'm happy to discuss them if you like and explain my reasoning.
Also, I suggest you add a new page to the Presto documentation for the new connector, by creating a new file in https://github.com/prestodb/presto/tree/master/presto-docs/src/main/sphinx/connector and adding that new file to the Connectors index file.
- `analytics.properties` for the analysis project. | ||
With this setup, `sales` and `analytics` directories will be created in Presto respectively. | ||
|
||
### Data Type Mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Data Type Mapping section would be ideal to move into the connector documentation for this new connector, in a new file in https://github.com/prestodb/presto/tree/master/presto-docs/src/main/sphinx/connector.
- Upgraded `odps.sdk.version` from 0.48.5-public to 0.49.0-public - Upgrade guava version from 26.0-jre to 33.3.1-jre
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Thank you for your input; all of your suggestions have been incorporated into the changes. I agree that it would be beneficial to add a link to the MaxCompute connector in the Presto documentation repository. However, this should be done in a separate PR, potentially initiated after this current PR has been merged. |
Sure, I've added. :) |
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc update! A few more suggestions, nothing major.
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
View file in GitHub to review it as it will be formatted when merged, everything looks good. Thanks!
This PR introduces the read support for the Presto-MaxCompute Connector, enabling users to perform queries on tables within Alibaba's MaxCompute.
While the current state of the connector does not support write operations, DDL operations, or querying partitioned tables, it is a significant step forward in integrating MaxCompute with Presto.
Key features included in this PR:
The following are yet to be implemented in future PRs: