Add hidden $partition column to Hive connector #3582

Merged: ebyhr merged 1 commit into trinodb:master from hive-partition-column on Jun 27, 2020

Conversation

Member

@ebyhr ebyhr commented Apr 29, 2020

Initial support for #5

@cla-bot cla-bot bot added the cla-signed label Apr 29, 2020
@@ -384,9 +385,12 @@ public int getIndex()
}
}
else {
String partitionKeyValues = String.join("/", partitionKeys.stream()
.map(partitionKey -> format("%s=%s", partitionKey.getName(), partitionKey.getValue()))
Member

Do we need some escaping of / or = here?

Member

Also, wouldn't MAP be a better type here than VARCHAR?

Member Author

I think escaping or encoding is needed. Let me update it.

Exactly, MAP type is also a candidate for this column.
@martint Do you have any opinion?
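
To make the two concerns in this thread concrete, here is a minimal sketch of why joining `key=value` pairs without escaping is ambiguous, and why a map-shaped value would not have that problem. The class name `PartitionRepresentations` and the helper `joinUnescaped` are hypothetical, written only for illustration; they are not code from this PR or from the Hive connector.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionRepresentations
{
    // Hypothetical helper: joins key=value pairs with "/" and performs no escaping,
    // which is the shape of the string construction being questioned above.
    static String joinUnescaped(Map<String, String> partitionKeys)
    {
        return partitionKeys.entrySet().stream()
                .map(entry -> entry.getKey() + "=" + entry.getValue())
                .collect(Collectors.joining("/"));
    }

    public static void main(String[] args)
    {
        // One partition key whose value happens to contain "/" and "="
        Map<String, String> single = new LinkedHashMap<>();
        single.put("col1", "x/col2=y");

        // Two partition keys with plain values
        Map<String, String> pair = new LinkedHashMap<>();
        pair.put("col1", "x");
        pair.put("col2", "y");

        // Both print col1=x/col2=y, so the string form cannot tell them apart
        System.out.println(joinUnescaped(single));
        System.out.println(joinUnescaped(pair));

        // The map form keeps them distinct: prints false
        System.out.println(single.equals(pair));
    }
}
```

The string form needs an escaping rule to stay unambiguous, while a map keeps keys and values separate at the cost of a more complex column type.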

Member

There might also be `:`-escaping and canonicalization of the representation to consider.

But since we know the partition name beforehand, why don't we just pass it here instead of reconstructing it?
Did you consider this approach?

cc @electrum

Member Author

@ebyhr ebyhr May 1, 2020

Replaced with partitionName provided by Hive split.
We may still need to add manual escaping in tests.
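
As a rough illustration of the "manual escaping in tests" mentioned here, the sketch below percent-encodes a few special characters in partition values before building the expected partition name. The class `PartitionNameEscaping`, both helper methods, and the character set `/=:%` are assumptions made for this example; the set of characters Hive actually escapes is larger, so a real test should follow Hive's own rules.

```java
public class PartitionNameEscaping
{
    // Hypothetical subset of characters to escape; not Hive's full list.
    private static final String SPECIAL_CHARACTERS = "/=:%";

    // Replaces each special character with '%' followed by its two-digit uppercase hex code.
    static String escape(String value)
    {
        StringBuilder builder = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL_CHARACTERS.indexOf(c) >= 0) {
                builder.append(String.format("%%%02X", (int) c));
            }
            else {
                builder.append(c);
            }
        }
        return builder.toString();
    }

    // Builds the expected partition name for a table partitioned by col1 and col2.
    static String expectedPartitionName(String col1, String col2)
    {
        return String.format("col1=%s/col2=%s", escape(col1), escape(col2));
    }

    public static void main(String[] args)
    {
        // prints col1=a%2Fb/col2=07%3A34
        System.out.println(expectedPartitionName("a/b", "07:34"));
    }
}
```

With a helper like this, a test could compare the escaped expected name against the value returned by the hidden column instead of formatting raw values directly.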

@ebyhr ebyhr force-pushed the hive-partition-column branch 3 times, most recently from dcb8c53 to 57ce58c Compare May 1, 2020 07:34
@martint martint requested review from findepi and losipiuk June 26, 2020 04:43
Member

@losipiuk losipiuk left a comment

LGTM; minor comments.
Please rebase as there are conflicts.

assertEquals(results.getRowCount(), 9);

assertUpdate("DROP TABLE test_partition_hidden_column");
assertFalse(getQueryRunner().tableExists(getSession(), "test_partition_hidden_column"));
Member

nit: drop this assertion

assertEquals(getPartitions("test_partition_hidden_column").size(), 9);

MaterializedResult results = computeActual(format("SELECT *, \"%s\" FROM test_partition_hidden_column", PARTITION_COLUMN_NAME));
for (int i = 0; i < results.getRowCount(); i++) {
Member

You can drop the index from the loop. IMO this looks a bit cleaner:

        for (MaterializedRow row : results.getMaterializedRows()) {
            String actualPartition = (String) row.getField(3);
            String expectedPartition = format("col1=%s/col2=%s", row.getField(1), row.getField(2));
            assertEquals(actualPartition, expectedPartition);
        }

ColumnMetadata columnMetadata = columnMetadatas.get(i);
assertEquals(columnMetadata.getName(), columnNames.get(i));
if (columnMetadata.getName().equals(PARTITION_COLUMN_NAME)) {
// $partition should be hidden column
Member

drop comment

@ebyhr ebyhr force-pushed the hive-partition-column branch from 57ce58c to 5b6eaa9 Compare June 27, 2020 06:49
@ebyhr ebyhr merged commit b3d6df3 into trinodb:master Jun 27, 2020
@ebyhr ebyhr deleted the hive-partition-column branch June 27, 2020 09:06
@ebyhr ebyhr mentioned this pull request Jun 27, 2020
8 tasks
@findepi findepi added this to the 338 milestone Jun 27, 2020
3 participants