Add new format short key index #1572
In this patch, we create a new format for the short key index. In the original code, the index is stored in a RowCursor-like format, which is not efficient to compare. Now we encode multiple columns into a single binary string, and we ensure that this binary sorts in the same order as the key columns.
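To illustrate what "sorted same with the key columns" can mean, here is a minimal sketch of an order-preserving encoding. It is not the encoding used in this patch: the helper names `encode_int32` and `encode_key`, the two-column key shape (int32, string), and the sign-bit-flip plus big-endian layout are all assumptions chosen for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not from the patch): append a signed 32-bit integer
// so that bytewise comparison of the output matches numeric comparison.
// Flipping the sign bit maps INT32_MIN..INT32_MAX onto 0..UINT32_MAX, and
// writing the most significant byte first makes memcmp order numeric order.
inline void encode_int32(int32_t value, std::string* out) {
    uint32_t bits = static_cast<uint32_t>(value) ^ 0x80000000u;
    for (int shift = 24; shift >= 0; shift -= 8) {
        out->push_back(static_cast<char>((bits >> shift) & 0xFF));
    }
}

// Hypothetical two-column key (int32, string). The string is the last
// column here, so its raw bytes are appended without a terminator.
inline std::string encode_key(int32_t k1, const std::string& k2) {
    std::string out;
    encode_int32(k1, &out);
    out.append(k2);
    return out;
}
```

With an encoding of this kind, comparing two keys reduces to one bytewise comparison (for example `encode_key(1, "z") < encode_key(2, "a")`) instead of a column-by-column comparison.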
// How many rows in this segment
optional uint32 num_segment_rows = 6;
// Total bytes for this segment
optional uint32 segment_bytes = 7;
What's the usage of segment_id and segment_bytes?
what is the difference between num_items and num_segment_rows?
Can we also add a version field so that we can easily evolve the format in the future?
what's the usage of segment_id and segment_bytes?
segment_id identifies the segment within this rowset.
segment_bytes stores how many bytes are in this segment.
I put these fields here because, in the old version, the storage layer gets this information from the index. They are here for now and may be moved elsewhere later.
num_items is the number of index items; num_segment_rows is the total number of rows in the segment.
About version, we can add one.
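The difference between num_items and num_segment_rows can be made concrete with a small sketch. It assumes, without confirmation from this thread, that the index is sparse and holds one item per block of num_rows_per_block rows, with a trailing partial block also getting an item. The function name `expected_num_items` is hypothetical.

```cpp
#include <cstdint>

// Assumption (not confirmed by the patch): one index item is written per
// block of num_rows_per_block rows, including a final partial block.
// Returns 0 when num_rows_per_block is 0 to avoid dividing by zero.
inline uint32_t expected_num_items(uint32_t num_segment_rows,
                                   uint32_t num_rows_per_block) {
    if (num_rows_per_block == 0) {
        return 0;
    }
    uint32_t full_blocks = num_segment_rows / num_rows_per_block;
    uint32_t has_partial = (num_segment_rows % num_rows_per_block != 0) ? 1 : 0;
    return full_blocks + has_partial;
}
```

Under that assumption, a segment of 2500 rows with 1024 rows per block would carry 3 index items, while num_segment_rows stays 2500.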
ASSERT_EQ(val, check_val);
}
}
better to wrap the following block in a loop, say test 10000 random pairs
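One way to follow this suggestion is sketched below. It does not use the test fixture from the patch: `encode_sortable_int32` is a stand-in encoder (sign bit flipped, big-endian), and `count_order_mismatches` is a hypothetical helper that counts random pairs whose encoded order disagrees with their numeric order. In the real test, the body of the loop would be the existing encode/decode checks.

```cpp
#include <cstdint>
#include <random>
#include <string>

// Stand-in encoder, assumed for illustration only: flips the sign bit and
// writes the bytes most significant first, so bytewise order is numeric order.
inline std::string encode_sortable_int32(int32_t value) {
    uint32_t bits = static_cast<uint32_t>(value) ^ 0x80000000u;
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((bits >> shift) & 0xFF));
    }
    return out;
}

// Draws num_pairs random pairs and returns how many of them compare
// differently after encoding than they do as integers. A fixed seed keeps
// any failure reproducible.
inline int count_order_mismatches(int num_pairs, uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int32_t> dist(INT32_MIN, INT32_MAX);
    int mismatches = 0;
    for (int i = 0; i < num_pairs; ++i) {
        int32_t a = dist(rng);
        int32_t b = dist(rng);
        std::string ea = encode_sortable_int32(a);
        std::string eb = encode_sortable_int32(b);
        bool same_order = ((a < b) == (ea < eb)) && ((a == b) == (ea == eb));
        if (!same_order) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A test would then assert that `count_order_mismatches(10000, seed)` is zero for some fixed seed.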
// equal with or larger than given key.
// NOTE: This function holds that without common prefix key, the one
// who has more length it the bigger one. Two key is the same only
// when their length are equal
This comment should be revised. It is confusing.
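One possible reading of the rule the code comment is trying to state: compare the bytes the two keys share, and if those are identical, the longer key is the larger one, so two keys are equal only when their lengths also match. The sketch below expresses that reading. `compare_keys` and `lower_bound_index` are hypothetical names, and the actual function in the patch may behave differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Compares two encoded keys bytewise over their shared length, then uses
// length as the tie-breaker: a key that is a strict prefix of another is
// the smaller one. Returns -1, 0, or 1.
inline int compare_keys(const std::string& a, const std::string& b) {
    std::size_t n = std::min(a.size(), b.size());
    int r = std::memcmp(a.data(), b.data(), n);
    if (r != 0) {
        return r < 0 ? -1 : 1;
    }
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;
}

// Returns the position of the first item that is equal to or larger than
// key, or sorted_items.size() if every item is smaller. The input must
// already be sorted according to compare_keys.
inline std::size_t lower_bound_index(const std::vector<std::string>& sorted_items,
                                     const std::string& key) {
    auto it = std::lower_bound(
        sorted_items.begin(), sorted_items.end(), key,
        [](const std::string& item, const std::string& k) {
            return compare_keys(item, k) < 0;
        });
    return static_cast<std::size_t>(it - sorted_items.begin());
}
```

Under this reading, a lookup with a short prefix such as "ab" lands on the first item that starts with "ab", because the prefix sorts before every longer key that extends it.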
class ShortKeyIndexBuilder {
public:
    ShortKeyIndexBuilder(uint32_t segment_id,
                         uint32_t num_rows_per_block) {
what are these two arguments for?
Closing this PR because this patch is contained in #1577.
…#17851)" (apache#1572) move load big lateral view from p1 to p2, this case takes a long time to execute Co-authored-by: Pxl <pxl290@qq.com>