-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix pause timer calculation. #130
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use more explanation. Even now, I don't understand what's wrong with the old code and how this change fixes it.
Previous code: I added this to the description. |
Extending the deadline from producer_plugin is not correct. It's reasonable if the deadline is from --max-transaction-time, but not if it's because we need to ship out a block on time. |
That is why it checks |
|
It does not, it is the deadline calculated at the start of the transaction. We want to honor that deadline regardless how long we are paused. We want to make sure we ship a block out on time. The |
I agree that respecting |
Often the producer plugin deadline is less restrictive because it is wall clock and not block cpu, but you are correct that it could be more restrictive, especially if cpu effort is set low. I was trying to avoid passing into push_transaction an indication if this was a block deadline. I do need to honor that however. I'll put in some more work on this. Thanks for your thorough review. |
Summary of the problem as I understand it:
|
Nicely stated. I'll look at this some more, but maybe the correct thing would be to provide push_transaction with the max-transaction-time and a block deadline and then let transaction_context do all this calculation. This would simplify the producer_plugin and move all this into transaction_context. |
…or the wall clock block deadline. Pass both block_deadline and max_transaction_time to transaction_context so it can honor both of these correctly.
…line as calling before is a no-op.
plugins/producer_plugin/include/eosio/producer_plugin/subjective_billing.hpp
Outdated
Show resolved
Hide resolved
BOOST_CHECK( cpu_limit <= 150'000 ); | ||
|
||
// Verify that the restriction on the transaction of 24'999 is honored even though there is wall clock time to | ||
// load the wasm. If this test fails it is possible that the wasm loaded faster or slower than expected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need two tests
- The 24999 limit is extended by a pause (this check)
- The block deadline is not extended by a pause (I don't see this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and one more:
- The cpu limit is is originally before the block deadline, but is extended past it by a pause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests.
unittests/api_tests.cpp
Outdated
// If this check fails but duration is >= 25'000 (previous check did not fail), then the check here is likely | ||
// because it took longer than 2.5 ms for checktime to trigger, trace to be created, and to get to the now() call. | ||
// Timer is paused during WASM load, so checktime will not trigger until after WASM is loaded. | ||
BOOST_CHECK( dur < 150'000 ); // should not run to max_transaction_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems too long and does not match the comment of 2.5 ms. For this test, I would expect to see:
- The base timeout (25 ms) is an upper bound on the wasm load time
- The expected duration is an upper bound on timeout + checktime overhead
- The expected duration is a lower bound on timeout + WASM compile time (not satisfied)
A high value here means that this test will always pass even if the block deadline is extended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If random noise makes this test too flaky, then we might have to give up on testing this condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I separated this test into two. One which gives time for the WASM to load but is limited to block deadline. This test I tweaked the values and comments to more closely reflect this particular test. Turns out on my machine I was seeing WASM load times of 35ms.
unittests/api_tests.cpp
Outdated
@@ -1000,8 +1000,8 @@ BOOST_AUTO_TEST_CASE(checktime_pause_max_trx_cpu_extended_test) { try { | |||
// This assumes that loading the WASM takes at least 1.5 ms | |||
// If this check fails but duration is >= 24'999 (previous check did not fail), then the check here is likely | |||
// because WASM took less than 1.5 ms to load. | |||
BOOST_CHECK( dur > 26'500 ); | |||
BOOST_CHECK( dur < 150'000 ); // Should not run to block_cpu_usage deadline | |||
BOOST_CHECK_MESSAGE( dur > 26'500, "elapsed " << std::to_string(dur) << "us" ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BOOST_TEST will print dur
automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
api_tests.cpp(1102): error: in "api_tests/checktime_pause_block_deadline_not_extended_test": check dur < 30'000 has failed
It wasn't for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Odd. That's what I'd expect from BOOST_CHECK
. BOOST_TEST
is supposed to decompose the expression and has been the recommended macro to use for a long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you meant the to_string
is not needed. Removed the to_string
calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mandel/unittests/api_tests.cpp(1003): error: in "api_tests/checktime_pause_max_trx_cpu_extended_test": check dur < 26'500 has failed [28130 >= 26500]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BOOST_TEST( dur < 26'500 );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you get it to print that? Some special command line arg to unit_test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you said BOOST_TEST
. Ah, need to switch to using that in the future. Thanks.
Fix for the long-standing issue with not correctly accounting for WASM load times. We pause the billing timer when loading WASMs. However, we were not using the correct deadline after restarting the paused timer which sometimes would cause the transaction to fail even though it had not actually run its allotted time.
Previous code:
deadline
was the user provided deadline which was either the transaction deadline calculated frommax-transaction-time
or the wall clock block deadline passed in from theproducer_plugin
. When setting the timer again, after pausing, it would restrict the run time to the deadline even if the deadline was the original transaction deadline as opposed to the block deadline. This would restrict the amount of time the transaction was allowed to run on "first call" (when WASM loaded). The transaction could then fail because it ran "too long" even though it had not actually executed formax-transaction-time
.Now:
Node configurable wall clock block deadline (configurable via
cpu-effort-percent
) and node configurablemax-transaction-time
are now provided to the transaction processor (transaction_context
) so when pausing for WASM load times, the wall clock block deadline can be honored separately from themax-transaction-time
.