Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add decryption functionality to presto #17479

Closed
wants to merge 2 commits into from

Conversation

shangxinli
Copy link
Collaborator

Co-authored-by: ggershinsky ggershinsky@users.noreply.github.com

Summary:
This is to port parquet-mr decryption funtionality. The main commits in parquet-mr for encryption/decryption is apache/parquet-java@65b95fb and several other fixes. This change only port the decryption only.

Test plan - (Please fill in how you tested your changes)

Please make sure your submission complies with our Development, Formatting, and Commit Message guidelines. Don't forget to follow our attribution guidelines for any code copied from other projects.

Fill in the release notes towards the bottom of the PR description.
See Release Notes Guidelines for details.

== RELEASE NOTES ==

General Changes
* ...
* ...

Hive Changes
* ...
* ...

If release note is NOT required, use:

== NO RELEASE NOTE ==

@shangxinli shangxinli force-pushed the column_encrytion_dev3 branch 5 times, most recently from 7e9d4df to 36cd4db Compare March 17, 2022 22:27
Co-authored-by: ggershinsky <ggershinsky@users.noreply.github.com>

Summary: This is to port parquet-mr decryption funtionality. The main commits in parquet-mr for encryption/decryption is apache/parquet-java@65b95fb and several other fixes. This change only port the decryption only.
@shangxinli shangxinli force-pushed the column_encrytion_dev3 branch from 36cd4db to 2b39b13 Compare March 17, 2022 23:47
@shangxinli
Copy link
Collaborator Author

@beinan @zhenxiao @vkorukanti The Parquet Decryption change is ready for review.

@shangxinli
Copy link
Collaborator Author

There are some conflict files. I will fix them for the next commit. Hopefully, it won't block your review.

Copy link
Member

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good except a couple of minor style issues


// Lambda expression below requires final variable
final ParquetDataSource parquetDataSource = buildHdfsParquetDataSource(inputStream, path, stats);
dataSource = parquetDataSource;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is dataSource never got used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used in line 263.

@@ -132,7 +135,8 @@ public ParquetReader(MessageColumnIO
boolean enableVerification,
Predicate parquetPredicate,
List<ColumnIndexStore> blockIndexStores,
boolean columnIndexFilterEnabled)
boolean columnIndexFilterEnabled,
InternalFileDecryptor fileDecryptor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might suggest to use Optional< InternalFileDecryptor > to avoid null values

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Comment on lines +304 to +305
final ParquetDataSource parquetDataSource = buildHdfsParquetDataSource(inputStream, path, fileFormatDataSourceStats);
dataSource = parquetDataSource;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are there both parquetDataSource and dataSource? is there any particular meaning?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mainly because the lampda expression below (line 311). If I use datasource in that line, then there is build error "Variable used in lambda expression should be final or effectively final".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that make sense, thx!

long firstDataPage = block.getColumns().get(0).getFirstDataPageOffset();
if (firstDataPage >= start && firstDataPage < start + length) {
footerBlocks.add(block);
Integer firstIndex = MetadataReader.findFirstNonHiddenColumnId(block);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import findFirstNonHiddenColumnId

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

long firstDataPage = block.getColumns().get(0).getFirstDataPageOffset();
if (firstDataPage >= start && firstDataPage < start + length) {
footerBlocks.add(block);
Integer firstIndex = MetadataReader.findFirstNonHiddenColumnId(block);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import findFirstNonHiddenColumnId

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

}
boolean encryptedFooterMode = EMAGIC.equals(magic);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to inline this variable

ByteArrayInputStream tempInputStream = new ByteArrayInputStream(encryptedMetadataBuffer);
byte[] columnMetaDataAAD = AesCipher.createModuleAAD(fileDecryptor.getFileAAD(), ModuleType.ColumnMetaData, rowGroup.ordinal, columnOrdinal, -1);
try {
return Util.readColumnMetaData(tempInputStream, columnDecryptionSetup.getMetaDataDecryptor(), columnMetaDataAAD);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import readColumnMetaData

// Decrypt the ColumnMetaData
InternalColumnDecryptionSetup columnDecryptionSetup = fileDecryptor.setColumnCryptoMetadata(columnPath, true, false, columnKeyMetadata, columnOrdinal);
ByteArrayInputStream tempInputStream = new ByteArrayInputStream(encryptedMetadataBuffer);
byte[] columnMetaDataAAD = AesCipher.createModuleAAD(fileDecryptor.getFileAAD(), ModuleType.ColumnMetaData, rowGroup.ordinal, columnOrdinal, -1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the -1 mean here? define a constant?

Comment on lines +297 to +299
byte[] nonce = new byte[AesCipher.NONCE_LENGTH];
from.read(nonce);
byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import

Comment on lines 62 to 63
OffsetIndex offsetIndex,
BlockCipher.Decryptor blockDecryptor,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to use Optional for these two

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@shangxinli shangxinli requested a review from a team as a code owner May 3, 2022 16:16
@shangxinli
Copy link
Collaborator Author

Due to the conflict resolution, I have to create a new PR. Please review from there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants