Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_utils: macro for waiting on condition #12586

Merged
merged 2 commits into from
Aug 4, 2023

Conversation

andrwng
Copy link
Contributor

@andrwng andrwng commented Aug 3, 2023

Currently we have cooperative_spin_wait_with_timeout that throws if a condition
isn't met after a given timeout. The way this typically shows up in tests is
the exception is caught by the test fiber, the test's destructor is called, and
the generic timeout exception is logged after everything is cleaned up. This
makes it difficult to track down issues because it's rarely clear what timed
out.

This patch adds a new boost_require_eventually macro to BOOST_FAIL on timeout,
indicating the log line that failed.

Tests that use this will timeout like:

INFO  2023-08-03 12:53:16,023 [shard 0] cluster - members_manager.cc:373 - applying finish_reallocations_cmd, offset: 10, node id: 1
/home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc(272): fatal error: in "test_delete_from_stm_consume": Timed out at /home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc:272
INFO  2023-08-03 12:53:16,638 [shard 0] kafka - server.cc:278 - kafka rpc protocol - Stopping 1 listeners
INFO  2023-08-03 12:53:16,638 [shard 0] kafka - server.cc:287 - kafka rpc protocol - Shutting down 2 connections

Instead of logs like below that don't indicate what timed out:

INFO  2023-08-03 12:56:48,354 [shard 0] cluster - members_manager.cc:373 - applying finish_reallocations_cmd, offset: 10, node id: 1
INFO  2023-08-03 12:56:48,950 [shard 0] kafka - server.cc:278 - kafka rpc protocol - Stopping 1 listeners
INFO  2023-08-03 12:56:48,950 [shard 0] kafka - server.cc:287 - kafka rpc protocol - Shutting down 2 connections
...
INFO  2023-08-03 12:56:49,003 [shard 0] kafka - server.cc:287 - kafka rpc protocol - Shutting down 0 connections
unknown location(0): fatal error: in "test_delete_from_stm_consume": seastar::timed_out_error: timedout
/home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc(270): last checkpoint
/home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc(256): Leaving test case "test_delete_from_stm_consume"; testing time: 4194410us

I don't love that boost permeates the interface, but until we move to
another test framework, it seems reasonable to tie the implementation
and the macro name to boost.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

andrwng added 2 commits August 3, 2023 13:42
Currently we have cooperative_spin_wait_with_timeout that throws if a condition
isn't met after a given timeout. The way this typically shows up in tests is
the exception is caught by the test fiber, the test's destructor is called, and
the generic timeout exception is logged after everything is cleaned up. This
makes it difficult to track down issues because it's rarely clear what timed
out.

This patch adds a new boost_require_eventually macro to BOOST_FAIL on timeout,
indicating the log line that failed.

Tests that use this will timeout like:
```
INFO  2023-08-03 12:53:16,023 [shard 0] cluster - members_manager.cc:373 - applying finish_reallocations_cmd, offset: 10, node id: 1
/home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc(272): fatal error: in "test_delete_from_stm_consume": Timed out at /home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc:272
INFO  2023-08-03 12:53:16,638 [shard 0] kafka - server.cc:278 - kafka rpc protocol - Stopping 1 listeners
INFO  2023-08-03 12:53:16,638 [shard 0] kafka - server.cc:287 - kafka rpc protocol - Shutting down 2 connections
```

Instead of logs like below that don't indicate what timed out:
```
INFO  2023-08-03 12:56:48,354 [shard 0] cluster - members_manager.cc:373 - applying finish_reallocations_cmd, offset: 10, node id: 1
INFO  2023-08-03 12:56:48,950 [shard 0] kafka - server.cc:278 - kafka rpc protocol - Stopping 1 listeners
INFO  2023-08-03 12:56:48,950 [shard 0] kafka - server.cc:287 - kafka rpc protocol - Shutting down 2 connections
...
INFO  2023-08-03 12:56:49,003 [shard 0] kafka - server.cc:287 - kafka rpc protocol - Shutting down 0 connections
unknown location(0): fatal error: in "test_delete_from_stm_consume": seastar::timed_out_error: timedout
/home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc(270): last checkpoint
/home/awong/Repos/redpanda/src/v/cloud_storage/tests/delete_records_e2e_test.cc(256): Leaving test case "test_delete_from_stm_consume"; testing time: 4194410us
```

I don't love that boost permeates the interface, but until we move to
another test framework, it seems reasonable to tie the implementation
and the macro name to boost.
This will make it easier to figure out what is timing out, when it
happens.
@andrwng andrwng force-pushed the test-eventually-macro branch from 56ee58a to de08acd Compare August 3, 2023 20:43
@dotnwat dotnwat merged commit e99925d into redpanda-data:dev Aug 4, 2023
ztlpn added a commit that referenced this pull request Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants