test_cli.test_params_yaml flaky #166
Looks like the tests passed before eProsima/Fast-DDS#411 was merged.
Causing test failures (ros2/build_farmer#166)
I was debugging the problem. @sloretz, as you said, it is related to the commit that changed the default reliability settings of the discovery endpoints. In the simple scenario of one publisher and one subscriber, discovery now takes about 100 milliseconds longer, although performance improves as the number of publishers and subscribers increases. Your test creates a client and immediately sends the request, so the request is now sent before the publisher and subscriber have discovered each other. Sniffing the RTPS messages, I saw that the publisher is VOLATILE: after it sends the request to no one, it discards the request internally. When the subscriber is discovered, the publisher no longer has the request.
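The race described above can be reproduced with a toy model. The Python sketch below assumes, as the comment describes, that a VOLATILE writer keeps no history for readers matched after a write, so a request written before discovery completes is simply lost. `VolatileWriter`, `send_request_racy` and `send_request_after_match` are hypothetical names for illustration only, not Fast-RTPS or rclpy APIs; the second function shows the general shape of a fix (wait for the request reader to be matched, then send).

```python
import threading
from typing import List


class VolatileWriter:
    """Hypothetical stand-in for a VOLATILE request writer (not a Fast-RTPS class)."""

    def __init__(self) -> None:
        self._inboxes: List[List[str]] = []
        self._lock = threading.Lock()
        self._matched = threading.Event()

    def on_reader_matched(self, inbox: List[str]) -> None:
        # Called when discovery matches a remote reader, e.g. the service's
        # request subscriber.
        with self._lock:
            self._inboxes.append(inbox)
        self._matched.set()

    def write(self, sample: str) -> int:
        # VOLATILE durability as modeled here: no history is kept for late
        # joiners, so a sample written while no reader is matched is lost.
        with self._lock:
            for inbox in self._inboxes:
                inbox.append(sample)
            return len(self._inboxes)

    def wait_for_match(self, timeout: float) -> bool:
        # Block until at least one reader is matched or the timeout expires.
        return self._matched.wait(timeout)


def send_request_racy() -> List[str]:
    """Mimics the test: send immediately, discovery completes afterwards."""
    server_inbox: List[str] = []
    writer = VolatileWriter()
    writer.write("get_parameters")          # no reader matched yet: dropped
    writer.on_reader_matched(server_inbox)  # discovery finishes too late
    return server_inbox


def send_request_after_match(discovery_delay: float, timeout: float = 1.0) -> List[str]:
    """Waits for the request reader to be matched before sending."""
    server_inbox: List[str] = []
    writer = VolatileWriter()
    # Simulate discovery completing on another thread after a delay.
    discovery = threading.Timer(discovery_delay, writer.on_reader_matched, args=(server_inbox,))
    discovery.start()
    if not writer.wait_for_match(timeout):
        discovery.cancel()
        raise TimeoutError("request reader was never matched")
    writer.write("get_parameters")
    discovery.join()
    return server_inbox
```

In this model `send_request_racy()` returns an empty list, while `send_request_after_match(0.05)` returns `['get_parameters']`.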
@richiware Thanks for looking into it. It sounds like this is a problem with the tests rather than with Fast-RTPS.
I just got a failure in … Was this just a case of flakiness or a consistent issue? It looks like it is in the nightly repeated job, so I'm guessing flaky: https://ci.ros2.org/view/nightly/job/nightly_linux_repeated/1314/testReport/
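Whether a failure is flakiness or a consistent issue can be checked by rerunning the test many times, in the spirit of the repeat-until-fail and repeat-until-pass CI builds used in this thread. The Python sketch below is a hypothetical local harness, not the ci.ros2.org job configuration; `test` is assumed to be any zero-argument callable returning `True` when one run passes. Note that "passing" only means no failure was observed within `max_runs`, so a rarely failing test needs a generous run count.

```python
from typing import Callable, Optional

TestFn = Callable[[], bool]  # hypothetical: returns True when one run passes


def repeat_until_fail(test: TestFn, max_runs: int) -> Optional[int]:
    """Return the 1-based index of the first failing run, or None if all pass."""
    for run in range(1, max_runs + 1):
        if not test():
            return run
    return None


def repeat_until_pass(test: TestFn, max_runs: int) -> Optional[int]:
    """Return the 1-based index of the first passing run, or None if all fail."""
    for run in range(1, max_runs + 1):
        if test():
            return run
    return None


def classify(test: TestFn, max_runs: int = 20) -> str:
    """Label a test 'passing', 'consistent failure', or 'flaky'."""
    if repeat_until_fail(test, max_runs) is None:
        return "passing"  # no failure observed within max_runs
    if repeat_until_pass(test, max_runs) is None:
        return "consistent failure"  # failed on every rerun
    return "flaky"  # both outcomes were observed
```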
@wjwwood it's been failing since eProsima/Fast-DDS#411. I tried to root-cause it but couldn't figure it out. It appears to be a race condition in the test itself rather than a Fast-RTPS issue.
The test seems to fail with Fast-RTPS only. Repeat-until-pass builds: Repeat-until-fail builds: The failing test waits for the client to receive a response to the parameter request it sent, but the service side never receives the request. The most likely reason I can see is that the client sends the request before the server's subscription to the request topic has been matched (basically sending the request to …). I thought the following snippet would already ensure that the publisher and subscriber on the request topic have actually been matched before the request is sent: https://github.com/ros2/rmw_fastrtps/blob/ea4fb7cfaa671a2118a7d54203c786d890d8ed38/rmw_fastrtps_shared_cpp/src/rmw_service_server_is_available.cpp#L107-L114 This logic was added in ros2/rmw_fastrtps/pull/238. I … Comparing that logic with the code in the publisher/subscriber listeners, which was added in ros2/rmw_fastrtps#234, the logic using a set of endpoints seems more robust in case events fire multiple times. With the patch in ros2/rmw_fastrtps/pull/262 the test seems to be at least less flaky: Repeat-until-pass builds: Repeat-until-fail builds:
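Why a set of endpoints is more robust when match events can arrive more than once can be shown with a small language-neutral toy. In the Python sketch below, a counter-based tracker is assumed purely for contrast; both classes and `replay` are hypothetical and are not the rmw_fastrtps listener code (which is C++). A duplicated "matched" event inflates the counter, so it keeps reporting a match after the only remote reader is gone, whereas a set keyed by remote endpoint GUID is idempotent.

```python
from typing import Iterable, Set, Tuple, Union

Event = Tuple[str, str]  # (kind, remote GUID), kind is "matched" or "unmatched"


class CountingMatchTracker:
    """Tracks matches with a plain counter (fragile if events repeat)."""

    def __init__(self) -> None:
        self.count = 0

    def on_matched(self, guid: str) -> None:
        self.count += 1  # a repeated event for the same GUID is counted twice

    def on_unmatched(self, guid: str) -> None:
        self.count -= 1

    def is_matched(self) -> bool:
        return self.count > 0


class SetMatchTracker:
    """Tracks matches by remote endpoint GUID (idempotent under repeats)."""

    def __init__(self) -> None:
        self.remote_guids: Set[str] = set()

    def on_matched(self, guid: str) -> None:
        self.remote_guids.add(guid)  # adding an existing GUID is a no-op

    def on_unmatched(self, guid: str) -> None:
        self.remote_guids.discard(guid)  # no error if already removed

    def is_matched(self) -> bool:
        return bool(self.remote_guids)


def replay(tracker: Union[CountingMatchTracker, SetMatchTracker],
           events: Iterable[Event]) -> bool:
    """Feed a sequence of discovery events and report the final match state."""
    for kind, guid in events:
        if kind == "matched":
            tracker.on_matched(guid)
        else:
            tracker.on_unmatched(guid)
    return tracker.is_matched()
```

For the sequence `[("matched", "A"), ("matched", "A"), ("unmatched", "A")]`, the counting tracker reports `True` (a stale match, which would let a request be sent too early), while the set-based tracker reports `False`.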
Failures in nightly_linux-aarch64_debug #734, nightly_linux-aarch64_extra_rmw_release #266, nightly_linux-aarch64_release #694, nightly_linux_debug #1109, nightly_linux_extra_rmw_release #262, nightly_linux_release #1091, nightly_osx_debug #1165, nightly_osx_extra_rmw_release #303, nightly_osx_release #1172, nightly_win_deb #1165, nightly_win_extra_rmw_rel #269, nightly_win_rel #1104, nightly_xenial_linux-aarch64_release #264, nightly_xenial_linux_release #255
Failures in nightly_linux-aarch64_extra_rmw_release #266, nightly_linux-aarch64_release #694, nightly_linux_extra_rmw_release #262, nightly_linux_release #1091, nightly_win_extra_rmw_rel #269, nightly_win_rel #1104, nightly_xenial_linux-aarch64_release #264, nightly_xenial_linux_release #255
@dirk-thomas narrowed it down to something in Fast-RTPS 1.7.1. @richiprosima FYI
Fast-RTPS f2516be21df726baac4da22c39a6395f06164594
Fast-RTPS b48ce9d2fba6fc94e756da01c58b72f2ad848238
Fast-RTPS 846307b9baaaff49ad5d3aee68b9cb509851e4f3
Fast-RTPS d2ec94e7a665d689b24e8ae71c9b83c9efec45db
Fast-RTPS bc61fa3fab63ced48416db86ebb8df4e0f0cebdf
Fast-RTPS 92f2e37ef682988e5cfc4f320de4712630893009
Fast-RTPS 5827bb94408f3d5d2284d2e77b482ff84738ef3f
Fast-RTPS cb0a6e237eee4985ef8b1ffe6afbebd8e4116d04