[SPARK-47746] Implement ordinal-based range encoding in the RocksDBStateEncoder #45905

Closed
wants to merge 4 commits into from

neilramaswamy
Contributor

What changes were proposed in this pull request?

The RocksDBStateEncoder now implements range projection by reading a list of ordering ordinals and using it to project the corresponding columns, encoded in big-endian, to the front of the Array[Byte]-encoded rows that the encoder returns.
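
To illustrate the idea, here is a minimal, self-contained sketch of ordinal-based range encoding. It is not the code from this PR: the `Row` alias, the `encodeLong`/`encodeKey` helpers, and the restriction to `Long` columns are assumptions made for the example, and the sign-bit flip is a simplification swapped in so that unsigned byte-wise comparison matches numeric order.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object OrdinalRangeEncodingSketch {
  // Hypothetical stand-in for a key row: a sequence of Long columns.
  type Row = Seq[Long]

  // Write a Long in big-endian with the sign bit flipped, so that comparing
  // the bytes as unsigned values agrees with comparing the original Longs.
  // (Assumption for this sketch; the real encoder may handle signs differently.)
  def encodeLong(v: Long): Array[Byte] =
    ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN).putLong(v ^ Long.MinValue).array()

  // Place the columns named by orderingOrdinals first (in the given order),
  // followed by the remaining columns in schema order.
  def encodeKey(row: Row, orderingOrdinals: Seq[Int]): Array[Byte] = {
    val remaining = row.indices.diff(orderingOrdinals)
    (orderingOrdinals ++ remaining).flatMap(i => encodeLong(row(i)).toSeq).toArray
  }

  // Lexicographic comparison of unsigned bytes (Java 9+).
  def compareUnsigned(a: Array[Byte], b: Array[Byte]): Int =
    java.util.Arrays.compareUnsigned(a, b)
}
```

With `orderingOrdinals = Seq(2, 0)`, keys sort first by column 2 and then by column 0, without a separate projection step that moves those columns to the front of the row.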

Why are the changes needed?

StateV2 implementations (and other state-related operators) project certain columns to the front of UnsafeRows, and then rely on the RocksDBStateEncoder to range-encode those columns. We can avoid the initial projection by just passing the RocksDBStateEncoder the ordinals to encode at the front. This should avoid any GC or codegen overheads associated with projection.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UTs. All existing UTs should pass.

Was this patch authored or co-authored using generative AI tooling?

Yes

Contributor

HeartSaVioR left a comment

Only nits.

-  private val rangeScanKeyFieldsWithIdx: Seq[(StructField, Int)] = {
-    keySchema.zipWithIndex.take(numOrderingCols)
+  private val rangeScanKeyFieldsWithOrdinal: Seq[(StructField, Int)] = {
+    orderingOrdinals.map(ordinal => {

-  private val remainingKeyFieldsWithIdx: Seq[(StructField, Int)] = {
-    keySchema.zipWithIndex.drop(numOrderingCols)
+  private val remainingKeyFieldsWithOrdinal: Seq[(StructField, Int)] = {
+    0.to(keySchema.length - 1).diff(orderingOrdinals).map(ordinal => {

Contributor

nit: same here, prefer map { ordinal => over map(ordinal => {
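
For reference, a small sketch of how the ordering ordinals and the remaining ordinals partition the key schema, written in the brace style the nit asks for. The `Field` case class is a hypothetical stand-in for Spark's `StructField`, and `partition` is an illustrative helper rather than a method from this PR.

```scala
object OrdinalPartitionSketch {
  // Hypothetical stand-in for org.apache.spark.sql.types.StructField.
  final case class Field(name: String)

  // Returns (range-scan fields, remaining fields), each paired with its ordinal
  // in the original key schema.
  def partition(
      keySchema: Seq[Field],
      orderingOrdinals: Seq[Int]): (Seq[(Field, Int)], Seq[(Field, Int)]) = {
    // Range-scan fields keep the order given by orderingOrdinals.
    val rangeScan = orderingOrdinals.map { ordinal =>
      (keySchema(ordinal), ordinal)
    }
    // Remaining fields are every other ordinal, in schema order.
    val remaining = keySchema.indices.diff(orderingOrdinals).map { ordinal =>
      (keySchema(ordinal), ordinal)
    }
    (rangeScan, remaining)
  }
}
```

Unlike the previous `take`/`drop` approach, the range-scan columns no longer need to be a prefix of the schema.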

      assert(valueRowToData(store.get(keyRow, cfName)) === 1)
    }

+   // scalastyle:off

Contributor

Is there a reason you had to disable scalastyle? When you do disable it for a valid reason, please disable only the specific rule, and re-enable it as soon as it no longer needs to be disabled.

Contributor Author

Just a typo!
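
As a reference for the reviewer's request, a scoped suppression might look like the sketch below. It assumes the project's scalastyle configuration defines a rule with the id `println`; the `report` helper is hypothetical and exists only to show the off/on pair around a single statement.

```scala
object ScalastyleScopeSketch {
  // Hypothetical helper: prints a message and returns it unchanged.
  def report(msg: String): String = {
    // Disable only the one rule that would otherwise fire (assumed id: println)...
    // scalastyle:off println
    println(msg)
    // scalastyle:on println
    // ...and re-enable it right after the statement that needs the exception.
    msg
  }
}
```

Keeping the off/on pair tight means every other rule, and the same rule everywhere else in the file, stays enforced.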

Contributor

HeartSaVioR left a comment

+1 pending CI

HeartSaVioR
Contributor

Thanks! Merging to master.
