-
Notifications
You must be signed in to change notification settings - Fork 712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement LAST_INSERT_IDS function #1469
Implement LAST_INSERT_IDS function #1469
Conversation
72bf3b9
to
5fd1a67
Compare
7b83325
to
047f48a
Compare
@hermanlee has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
I haven't gotten very far with the review yet, but I ran a few experiments and hit an assertion with the following backtrace:
To reproduce:
Test script
Run the test script concurrently against itself, i.e. two connections each inserting 64K rows and selecting last_insert_ids() after. |
Reproduced with myqslslap, two threads, inserting 128 rows at a time |
It seems there is no bug, but the assert is written incorrectly. I was expecting the left side of |
047f48a
to
b1d32c3
Compare
@laurynas-biveinis has updated the pull request. You must reimport the pull request before landing. |
Adjusted the assert, removed unused iterator fields, rebased |
@hermanlee has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Can you add a test that tests the concurrency of the code and validates the list of IDs matches the values generated by the query? You can probably do this via mysqlslap where each client does the following:
Run 4 threads of each transaction, or maybe dedicate some threads to committing transactions rows and others to rolling them back. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Just some requests for extra test and additional assertions.
if (delta == 1) { | ||
++current_run_of_ones_len; | ||
} else { | ||
if (current_run_of_ones_len > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can probably make this } else if (current_run_of_ones_len > 0) {
and not need the additional level of indentation.
Alternatively, to dedup the data.reserve()
and write(delta)
, you could have this conditional only process the run_of_ones data. It might be more efficient to call data.reserve() with larger counts. inserts tend to be concurrent and if the increments that are not delta = 1
ends up calling reserve significantly. Maybe initialize reserve 128B and then let the system reserve as needed? This simplifies the code to:
} else {
if (current_run_of_ones_len > 0) {
write(current_run_of_ones_len, true);
current_run_of_ones_len = 0;
}
write(delta, false);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worried that unconditional reserve(128)
will be wasteful in too many cases, where we only have a single value inserted and when the deltas are only equal to 1, and we don't touch the vector at all. Maybe we can put reserve(128)
into this else branch. The first time it runs, it will do that, and will be a no-op afterwards.
is_run_of_ones_length || | ||
number == 0U); | ||
if (!continues) | ||
is_run_of_ones_length = next_lowest_byte & RUN_OF_ONES_MASK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe assert(is_runs_of_ones_length == !!(next_lowest_byte & RUN_OF_ONES_MASK))
in the case of continues
is true? The write path sets the ones_mask for all bytes, or for none of the bytes.
} | ||
|
||
void advance() noexcept { | ||
if (stream_itr == container.data.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment that the last bytes stored in the container is never a run-of-ones, so that if we are at the end of the container, then remaining_run_of_ones_len should be 0?
Also, maybe assert (remaining_runs_of_ones_len == 0);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, also added an assert that it's not the end at the if (is_run_of_ones_length)
below
number |= (next_lowest_byte << next_byte_pos); | ||
next_byte_pos += DATA_PER_BYTE_LEN; | ||
} while (continues); | ||
assert(number > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also assert that if this is run_of_ones, then there should be more data in the container because the end of the container should never be a run_of_ones.
b1d32c3
to
cc617cc
Compare
@laurynas-biveinis has updated the pull request. You must reimport the pull request before landing. |
@hermanlee , addressed all review comments and rebased |
It just occurred to me that the 2nd and further bytes in each multibyte-encoded number can use 7 instead of 6 bits for data for better compression, but I'm not sure it's worth it |
@hermanlee has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing an assert on ubsan builds when running the stress test:
----------SERVER LOG START-----------
/data/sandcastle/boxes/trunk-git-mysql/sql/compressed_uint64_fifo.h:94:37: runtime error: shift exponent 36 is too large for 32-bit type 'int'
#0 0x89bd175 in compressed_uint64_fifo::const_iterator::read() /data/sandcastle/boxes/trunk-git-mysql/sql/compressed_uint64_fifo.h:94:37
#1 0x89bd020 in compressed_uint64_fifo::const_iterator::advance() /data/sandcastle/boxes/trunk-git-mysql/sql/compressed_uint64_fifo.h:120:52
#2 0x89bc9fd in compressed_uint64_fifo::const_iterator::operator++() /data/sandcastle/boxes/trunk-git-mysql/sql/compressed_uint64_fifo.h:153:7
#3 0x89bc8fb in Table_function_last_insert_ids::fill_result_table() /data/sandcastle/boxes/trunk-git-mysql/sql/table_function.cc:801:33
#4 0x837e70f in MaterializedTableFunctionIterator::Init() /data/sandcastle/boxes/trunk-git-mysql/sql/iterators/composite_iterators.cc:1979:25
#5 0x8936090 in Query_expression::ExecuteIteratorQuery(THD*) /data/sandcastle/boxes/trunk-git-mysql/sql/sql_union.cc:1763:26
#6 0x8936577 in Query_expression::execute(THD*) /data/sandcastle/boxes/trunk-git-mysql/sql/sql_union.cc:1825:10
#7 0x8861e8d in Sql_cmd_dml::execute_inner(THD*) /data/sandcastle/boxes/trunk-git-mysql/sql/sql_select.cc:868:15
#8 0x8860e3b in Sql_cmd_dml::execute(THD*) /data/sandcastle/boxes/trunk-git-mysql/sql/sql_select.cc:616:9
#9 0x87aaa4d in mysql_execute_command(THD*, bool, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/sql/sql_parse.cc:5498:29
#10 0x87a664d in dispatch_sql_command(THD*, Parser_state*, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/sql/sql_parse.cc:6285:21
#11 0x87a2b79 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /data/sandcastle/boxes/trunk-git-mysql/sql/sql_parse.cc:2566:7
#12 0x87a49d0 in do_command(THD*) /data/sandcastle/boxes/trunk-git-mysql/sql/sql_parse.cc:1746:18
#13 0x8a53cc3 in handle_connection(void*) (.__uniq.199618011249897873951969475570011787283) /data/sandcastle/boxes/trunk-git-mysql/sql/conn_handler/connection_handler_per_thread.cc:307:13
#14 0xa38dff3 in pfs_spawn_thread(void*) (.__uniq.62711780575692142442886335758301887438) /data/sandcastle/boxes/trunk-git-mysql/storage/perfschema/pfs.cc:3022:3
#15 0x7f3c85a9abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434:8
#16 0x7f3c85b2cf5b in __GI___clone3 /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /data/sandcastle/boxes/trunk-git-mysql/sql/compressed_uint64_fifo.h:94:37 in
safe_process[449604]: Child process: 449609, exit: 1
|
||
--let $count=200 | ||
--let $table=clients_finished | ||
--let $wait_timeout=60 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed a larger timeout for asan debug builds. Can you set this to 600?
Fixing, |
This is a multiple-value counterpart of the existing LAST_INSERT_ID function. Its results are returned as a table. - Hardcode the function name LAST_INSERT_IDS into the lexer and the parser due to how MySQL supports table-returning functions, patch the parser for the function. - To keep the autoincrement value set efficiently, implement a data structure that 1) uses delta encoding for consecutive values; 2) special-cases encoding of consecutive deltas equal to one into a single number. - Maintain the last_insert_ids set in THD for the previous and the current statement, similar to existing first_successful_insert_id_in_prev_stmt and first_successful_insert_id_in_cur_stmt fields. - Add main.last_insert_ids and main.last_insert_ids_stress tests. - Since MyISAM is an engine that supports composite primary keys with autoincrement suffixes, this scenario is supported as well by allowing the values to be actually non-monotonic by delta encoding writing large deltas that make the value wrap around. For this case add a main.last_insert_ids_myisam test.
cc617cc
to
9c1b380
Compare
Addressed both review comments, ready for review again (no rebase this time because #1488 hasn't landed yet) |
@laurynas-biveinis has updated the pull request. You must reimport the pull request before landing. |
@hermanlee has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
I think there might be one more location that needs to be updated:
|
...
The line numbers seem to be off - is this a run from the latest push? |
@hermanlee has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Sorry, I think I accidentally reverted the change during the import process. Let me resolve it again. |
mysql> CREATE TABLE t1 (pk INT AUTO_INCREMENT PRIMARY KEY); mysql> set sql_mode='NO_AUTO_VALUE_ON_ZERO'; mysql> mysql> mysql> SELECT LAST_INSERT_ID(); These two behaviors are different.. do we need to align or? |
I am not sure. autoincrement value of zero is forbidden, and here you are just inserting zero as a regular value. |
"SELECT LAST_INSERT_ID();" return 1 value while "SELECT * FROM LAST_INSERT_IDS() AS ids;" return 0 value. maybe both should return 1 value or 0 value? |
|
There's a lot of undefined behavior in LAST_INSERT_ID(). I don't think LAST_INSERT_IDS() returning 0 is valid though, because 0 is never an ID assigned by auto-increment. I think we can only rely on results from LAST_INSERT_ID(), and subsequently LAST_INSERT_IDS(), where insert successfully inserted at least one row. |
This pull request has been merged in 00c726c. |
This is a multiple-value counterpart of the existing LAST_INSERT_ID function.
Its results are returned as a table.
to how MySQL supports table-returning functions, patch the parser for the
function.
that 1) uses delta encoding for consecutive values; 2) special-cases encoding
of consecutive delta equal to one values into a single number.
statement, similar to existing first_successful_insert_id_in_prev_stmt and
first_successful_insert_id_in_cur_stmt fields.
autoincrement suffixes, this scenario is supported as well by allowing the
values to be actually non-monotonic by delta encoding writing large deltas
that make the value wrap around. For this case add a
main.last_insert_ids_myisam test.