-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Optimize UnsafeRow deserialize for scalar type #10797
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for meta-velox canceled.
|
Could you help review? Thanks! @rui-mo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deserialize rows by memoryAddress and offset
Could you add more information on where the performance improvement mainly comes from? I notice previously the deserialize methods in UnsafeRowPrimitiveBatchDeserializer create intermediate values and set it to the vector, while the new way writes to the position directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
velox/row/UnsafeRowDeserializers.h
Outdated
/// @param offsets offset of the row deserialized data. It's size should be | ||
/// equal to the deserialized rows. First offset is 0. | ||
/// @param pool the memory pool to allocate Vectors. | ||
static VectorPtr deserialize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious. Is it necessary to keep two APIs for the deserialization instead of replacing the original one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This API only supports scalar types, it is hard to support complex data type, because it is hard to get the offsets
for them. Another exists API supports all the data types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the APIs be unified so that only complex types use the row-row deserialization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, ok, I will do the refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous API read the rows from several std::string_view
, but the new API requires all the row in one buffer started with memoryAddress
, for this public override API, it supplies ByteInputStream
which consists several buffers, so we cannot get all the rows in one buffer, so I keep the two APIs but the new API supports the complex data type by revoking the old API.
https://github.com/facebookincubator/velox/blob/main/velox/serializers/UnsafeRowSerializer.cpp#L135
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users could use any of them as they need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here requires the old API, because the serialize format is
| row size | serialized row | row size | serialized row
https://github.com/facebookincubator/velox/blob/main/velox/serializers/UnsafeRowSerializer.cpp#L145
This optimization requires the serialized data is in contiguous memory.
So we should keep the 2 APIs.
Added to the PR description. |
Anymore comments? @rui-mo |
velox/row/UnsafeRowDeserializers.cpp
Outdated
|
||
int32_t getTotalStringSize( | ||
int32_t columnIdx, | ||
int32_t numRows, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems numRows is 'vector_size_t' type in the 'deserialize' method. Perhaps also fix this in the createFlatVectorFast functions.
velox/row/UnsafeRowDeserializers.cpp
Outdated
for (auto pos = 0; pos < numRows; pos++) { | ||
if (!isNullAt(memoryAddress + offsets[pos], columnIdx)) { | ||
const uint8_t* srcptr = (memoryAddress + offsets[pos] + fieldOffset); | ||
uint8_t* destptr = rawValues + (pos << shift); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use camel case for variable names. srcptr-> srcPtr, wordoffset -> wordOffset, destptr -> destPtr, ...
velox/row/UnsafeRowDeserializers.cpp
Outdated
int32_t wordoffset = static_cast<int32_t>(offsetAndSize >> 32); | ||
uint8_t bytesValue[length]; | ||
memcpy(bytesValue, memoryAddress + offsets[pos] + wordoffset, length); | ||
uint8_t bytesValue2[16]{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: any better naming for 'bytesValue2'?
velox/row/UnsafeRowDeserializers.h
Outdated
/// @param offsets offset of the row deserialized data. It's size should be | ||
/// equal to the deserialized rows. First offset is 0. | ||
/// @param pool the memory pool to allocate Vectors. | ||
static VectorPtr deserialize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the APIs be unified so that only complex types use the row-row deserialization?
Could you help review again? Thanks! @rui-mo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Added some minors.
velox/row/UnsafeRowDeserializers.h
Outdated
/// Deserialize fast when all the column types are primitive type, otherwise, | ||
/// deserialize rows one by one, as previously. Fast deserialization | ||
/// computes the start memoryAddress of one data by the column index and row | ||
/// number, and set all the data of one column at once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment suggestion:
/// Deserializes rows that are stored in contiguous memory.
/// If all column types are primitive, fast deserialization is used.
/// Otherwise, rows are deserialized one by one as before.
/// Fast deserialization calculates the starting memory address of data
/// based on the column index and row number, setting all data for one column at once.
velox/row/UnsafeRowDeserializers.h
Outdated
/// deserialize rows one by one, as previously. Fast deserialization | ||
/// computes the start memoryAddress of one data by the column index and row | ||
/// number, and set all the data of one column at once. | ||
/// @param memoryAddress the start memory address of the serialized rows . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove extra white-space after 'rows'.
velox/row/UnsafeRowDeserializers.h
Outdated
/// @param type the element type. | ||
/// @param offsets offset of each row serialized data. It's size should be | ||
/// equal to the deserialized rows. First offset is 0. | ||
/// @param pool the memory pool to allocate Vector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memory pool used to allocate vector.
velox/row/UnsafeRowDeserializers.cpp
Outdated
memory::MemoryPool* pool) { | ||
if (fastSupported(type)) { | ||
return deserializeFast(memoryAddress, type, offsets, pool); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need for 'else' after return.
columnIdx, numRows, fieldOffset, offsets, memoryAddress); | ||
char* rawBuffer = column->getRawStringBufferWithSpace(totalSize, true); | ||
for (auto row = 0; row < numRows; row++) { | ||
if (!isNullAt(memoryAddress + offsets[row], columnIdx)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 'memoryAddress + offsets[row]' is used several times, any chance to make it a variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not. It uses 3 variables and used at hot spot branch, make it to a function does not make code clear.
velox/row/UnsafeRowDeserializers.cpp
Outdated
*(int64_t*)(memoryAddress + offsets[row] + fieldOffset); | ||
const int32_t length = static_cast<int32_t>(offsetAndSize); | ||
const int32_t wordOffset = static_cast<int32_t>(offsetAndSize >> 32); | ||
auto valueSrcPtr = memoryAddress + offsets[row] + wordOffset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use auto* to indicate a pointer.
velox/row/UnsafeRowDeserializers.h
Outdated
* @param pool the memory pool to allocate Vectors from | ||
*data to a array. | ||
* @return a VectorPtr | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not your change, but it would be better to not use the old style /* */ for comments.
velox/row/UnsafeRowDeserializers.cpp
Outdated
const vector_size_t numRows = offsets.size(); | ||
for (auto i = 0; i < numRows; i++) { | ||
const auto length = | ||
(i == numRows - 1 ? offsets[i] : offsets[i + 1] - offsets[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (i == numRows - 1), the length is 'offsets[i]'. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching it, the offsets size should be numRows + 1, then we can get the correct last row length.
velox/row/UnsafeRowDeserializers.cpp
Outdated
namespace facebook::velox::row { | ||
namespace { | ||
|
||
inline int64_t getFieldOffset(int64_t nullBitsetWidthInBytes, int32_t index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add some comments for this method?
velox/row/UnsafeRowDeserializers.cpp
Outdated
} | ||
|
||
inline bool isNullAt(const uint8_t* memoryAddress, int32_t index) { | ||
return bits::isBitSet(memoryAddress, index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function just calls another function with the original parameters. Wondering if we need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need it. It is more clear to indicate it is the null condition.
velox_row_fast | ||
velox_vector_fuzzer | ||
Folly::folly | ||
${FOLLY_BENCHMARK}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Folly::follybenchmark
Deserialize rows by memoryAddress and offset, and add the tests and benchmark, the performance improves 3 times.
Current deserialization
StructBatchIterator
deserializes rows one by one, get the data of the columns of one row, only set one value each time, but current implementation can get the data of one column by memory address of the rows and offset of each column, so we can set all rows of one column at once.