Test Hive S3 datalake over proxy #11310

@aczajkowski aczajkowski commented Mar 3, 2022

Hive data lake over proxy test

Description

Test Hive S3 datalake over proxy

Related issues, pull requests, and links

This PR was introduced to test #11255 and adds an integration test on top of that PR.

Documentation

(X) No documentation is needed.
(V) Sufficient documentation is included in this PR.
(X) Documentation PR is available with #prnumber.
(X) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(V) No release notes entries required.
(X) Release notes entries required with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Mar 3, 2022
@aczajkowski aczajkowski requested a review from losipiuk March 3, 2022 16:18
@aczajkowski aczajkowski force-pushed the acz/hive_data_lake_over_proxy_test branch from 16f902e to 6af27ce Compare March 3, 2022 20:13
Comment on lines +27 to +24
public class TestHive2OnDataLakeOverProxy
        extends BaseTestHiveOnDataLake
Member

How many, and which tests do we run with S3 HTTP proxy?

Member Author

I think this is the first one in the Trino repository. In this particular case, all the underlying tests in the class will use S3 (MinIO) over a proxy.

Member

I understand, but the tests actually invoked look like a random selection. For example, the flush procedure tests.

Member Author

Yes, that's true. The reasoning behind my decision is that I needed:

  • a Hive data lake setup
  • tests utilising complex operations on S3 (MinIO).

BaseTestHiveOnDataLake seems to be a perfect fit, aside from the cache flush tests (which are pretty fast, so they shouldn't be a big deal).
We could of course sub-select some tests and either replicate them or make a separate base class, but I'm not sure it's worth it. In the current setup it's a natural place to add new use cases for the Hive data lake, and we would have proof that all of those use cases work over a proxy as well (which I agree might be overkill 🤷‍♂️).
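
As a rough illustration of the trade-off discussed in this thread (not the actual Trino test code), the sketch below shows how a subclass that only overrides storage configuration inherits every check from its base class, including ones unrelated to the proxy. All class names, check names, and property keys here are hypothetical placeholders rather than real Trino or Hive settings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a shared base test class.
abstract class BaseDataLakeChecks
{
    // Subclasses override this to change how storage is reached.
    // The property keys are made up for illustration only.
    protected Map<String, String> storageProperties()
    {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("example.s3.endpoint", "http://minio.example:9000");
        return properties;
    }

    // Runs every check defined in the base class; a subclass cannot opt out
    // of individual checks without overriding or restructuring them.
    public List<String> runAll()
    {
        Map<String, String> properties = storageProperties();
        String route = properties.containsKey("example.s3.proxy-host") ? "via proxy" : "direct";
        List<String> report = new ArrayList<>();
        // Hypothetical check names: one S3-heavy, one cache-flush style.
        for (String check : List.of("insertOverwrite", "flushPartitionCache")) {
            report.add(check + " (" + route + ")");
        }
        return report;
    }
}

// Uses the default, direct configuration.
class DirectDataLakeChecks
        extends BaseDataLakeChecks {}

// Only the configuration changes; all inherited checks now run over the proxy.
class ProxiedDataLakeChecks
        extends BaseDataLakeChecks
{
    @Override
    protected Map<String, String> storageProperties()
    {
        Map<String, String> properties = super.storageProperties();
        properties.put("example.s3.proxy-host", "proxy.example");
        properties.put("example.s3.proxy-port", "8888");
        return properties;
    }
}
```

Calling `new ProxiedDataLakeChecks().runAll()` reports both checks as run via the proxy, which mirrors the point above: the subclass gets full coverage for free, at the cost of also running checks (such as the cache-flush one) that do not really exercise the proxy.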

@aczajkowski aczajkowski force-pushed the acz/hive_data_lake_over_proxy_test branch 2 times, most recently from 110c38c to 1cdec88 Compare March 22, 2022 12:04
@aczajkowski aczajkowski requested a review from findepi March 22, 2022 12:05
@aczajkowski aczajkowski force-pushed the acz/hive_data_lake_over_proxy_test branch from 1cdec88 to 029d9e6 Compare March 22, 2022 18:01
@aczajkowski
Member Author

Used to prove and/or ensure that #11255 works. It seems this is more a test of the AWS S3 Cli than of Trino. Closing.

@aczajkowski aczajkowski closed this Apr 4, 2022
@aczajkowski aczajkowski deleted the acz/hive_data_lake_over_proxy_test branch September 27, 2022 15:38