Add decryption functionality to presto #17479
7e9d4df to 36cd4db
Co-authored-by: ggershinsky <ggershinsky@users.noreply.github.com>
Summary: This ports the parquet-mr decryption functionality. The main parquet-mr commit for encryption/decryption is apache/parquet-java@65b95fb, along with several other fixes. This change ports decryption only.
36cd4db to 2b39b13
@beinan @zhenxiao @vkorukanti The Parquet Decryption change is ready for review.
Some files have merge conflicts. I will fix them in the next commit. Hopefully, that won't block your review.
looks good except a couple of minor style issues
// Lambda expression below requires final variable
final ParquetDataSource parquetDataSource = buildHdfsParquetDataSource(inputStream, path, stats);
dataSource = parquetDataSource;
Is dataSource ever used?
Used in line 263.
@@ -132,7 +135,8 @@ public ParquetReader(MessageColumnIO
  boolean enableVerification,
  Predicate parquetPredicate,
  List<ColumnIndexStore> blockIndexStores,
- boolean columnIndexFilterEnabled)
+ boolean columnIndexFilterEnabled,
+ InternalFileDecryptor fileDecryptor)
I'd suggest using Optional<InternalFileDecryptor> to avoid null values.
sure
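For reference, a minimal sketch of what the Optional suggestion could look like. Decryptor and Reader here are hypothetical stand-ins, not the real InternalFileDecryptor or ParquetReader; the point is only that an absent decryptor is modeled as Optional.empty() and the field is never null.

```java
import java.util.Optional;

import static java.util.Objects.requireNonNull;

public class OptionalDecryptorSketch {
    // Hypothetical stand-in for a file decryptor; not the parquet-mr type.
    public interface Decryptor {
        byte[] decrypt(byte[] input);
    }

    // Hypothetical reader that may or may not have a decryptor configured.
    public static final class Reader {
        private final Optional<Decryptor> fileDecryptor;

        public Reader(Optional<Decryptor> fileDecryptor) {
            // The Optional itself must not be null; absence is Optional.empty().
            this.fileDecryptor = requireNonNull(fileDecryptor, "fileDecryptor is null");
        }

        public byte[] readPage(byte[] raw) {
            // Decrypt when a decryptor is present, otherwise pass the bytes through.
            return fileDecryptor.map(decryptor -> decryptor.decrypt(raw)).orElse(raw);
        }
    }
}
```

Callers reading unencrypted files would pass Optional.empty(), so no null checks are needed inside the reader.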
final ParquetDataSource parquetDataSource = buildHdfsParquetDataSource(inputStream, path, fileFormatDataSourceStats);
dataSource = parquetDataSource;
Why are there both parquetDataSource and dataSource? Is there a particular reason?
It is mainly because of the lambda expression below (line 311). If I use dataSource in that line, the build fails with "Variable used in lambda expression should be final or effectively final".
I see, that makes sense, thx!
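A small self-contained sketch of the pattern discussed in this thread, under the assumption that the outer variable is reassigned so it can be cleaned up on failure. FakeDataSource and open are hypothetical names for illustration, not Presto code.

```java
import java.util.function.Supplier;

public class EffectivelyFinalCapture {
    // Hypothetical stand-in for a data source; not the Presto ParquetDataSource.
    public static final class FakeDataSource {
        private final String id;

        public FakeDataSource(String id) {
            this.id = id;
        }

        public String getId() {
            return id;
        }

        public void close() {
            // Nothing to release in this sketch.
        }
    }

    public static Supplier<String> open(String path) {
        // Assigned twice (null, then the real value), so it is NOT effectively final.
        FakeDataSource dataSource = null;
        try {
            // Assigned exactly once, so a lambda may capture it.
            final FakeDataSource opened = new FakeDataSource(path);
            dataSource = opened; // keep the outer reference for cleanup in the catch block
            // "return () -> dataSource.getId();" would fail to compile here.
            return () -> opened.getId();
        }
        catch (RuntimeException e) {
            if (dataSource != null) {
                dataSource.close();
            }
            throw e;
        }
    }
}
```

The extra final local costs one line but keeps the cleanup path and the lambda capture independent of each other.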
long firstDataPage = block.getColumns().get(0).getFirstDataPageOffset();
if (firstDataPage >= start && firstDataPage < start + length) {
footerBlocks.add(block);
Integer firstIndex = MetadataReader.findFirstNonHiddenColumnId(block);
static import findFirstNonHiddenColumnId
Sounds good
long firstDataPage = block.getColumns().get(0).getFirstDataPageOffset();
if (firstDataPage >= start && firstDataPage < start + length) {
footerBlocks.add(block);
Integer firstIndex = MetadataReader.findFirstNonHiddenColumnId(block);
static import findFirstNonHiddenColumnId
sure
}
boolean encryptedFooterMode = EMAGIC.equals(magic);
I think it's better to inline this variable
ByteArrayInputStream tempInputStream = new ByteArrayInputStream(encryptedMetadataBuffer);
byte[] columnMetaDataAAD = AesCipher.createModuleAAD(fileDecryptor.getFileAAD(), ModuleType.ColumnMetaData, rowGroup.ordinal, columnOrdinal, -1);
try {
return Util.readColumnMetaData(tempInputStream, columnDecryptionSetup.getMetaDataDecryptor(), columnMetaDataAAD);
static import readColumnMetaData
// Decrypt the ColumnMetaData
InternalColumnDecryptionSetup columnDecryptionSetup = fileDecryptor.setColumnCryptoMetadata(columnPath, true, false, columnKeyMetadata, columnOrdinal);
ByteArrayInputStream tempInputStream = new ByteArrayInputStream(encryptedMetadataBuffer);
byte[] columnMetaDataAAD = AesCipher.createModuleAAD(fileDecryptor.getFileAAD(), ModuleType.ColumnMetaData, rowGroup.ordinal, columnOrdinal, -1);
What does the -1 mean here? Could you define a named constant for it?
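One possible shape for the constant, as a rough sketch. It assumes the trailing -1 is a page ordinal that does not apply to column metadata; NO_PAGE_ORDINAL and moduleSuffix are hypothetical names, and the byte layout is a simplified illustration, not the parquet-mr implementation of createModuleAAD.

```java
import java.io.ByteArrayOutputStream;

public class ModuleAadSketch {
    // Named constant replacing the bare -1: "this module has no page ordinal".
    public static final int NO_PAGE_ORDINAL = -1;

    // Simplified stand-in for building the module-specific part of an AAD.
    // The page ordinal is appended only for modules that actually belong to a page.
    public static byte[] moduleSuffix(byte moduleType, int rowGroupOrdinal, int columnOrdinal, int pageOrdinal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(moduleType);
        writeShortLittleEndian(out, rowGroupOrdinal);
        writeShortLittleEndian(out, columnOrdinal);
        if (pageOrdinal != NO_PAGE_ORDINAL) {
            writeShortLittleEndian(out, pageOrdinal);
        }
        return out.toByteArray();
    }

    private static void writeShortLittleEndian(ByteArrayOutputStream out, int value) {
        out.write(value & 0xFF);          // low byte first
        out.write((value >>> 8) & 0xFF);  // then high byte
    }
}
```

With a constant in place, the call site reads as moduleSuffix(type, rowGroupOrdinal, columnOrdinal, NO_PAGE_ORDINAL), which answers the question without a comment.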
byte[] nonce = new byte[AesCipher.NONCE_LENGTH];
from.read(nonce);
byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH];
static import
OffsetIndex offsetIndex,
BlockCipher.Decryptor blockDecryptor,
I think it might be better to use Optional for these two
sure
Because of the conflict resolution, I had to create a new PR. Please review it there.