Support service 2/2 --- rosbag2 service play #1481
Conversation
void PlayerImpl::PlayerClient::async_send_request(const rclcpp::SerializedMessage & message)
{
Comments from MichaelOrlov
I have a concern about PlayerClient::async_send_request. We can't really send the request asynchronously, at least with the current design, because we need to wait for the future returned from client_->async_send_request(..): we can't delete ros_message until the future completes. Otherwise it would be a dangling pointer to request_addr, the memory could be reused, and we might read corrupted memory during client_->async_send_request(..). Also, according to the notes in the client_->async_send_request(..) API, we need to call client_->remove_pending_request(future) if we waited for the future and a timeout happened.
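A minimal sketch of that pattern, assuming a typed rclcpp::Client and the example_interfaces AddTwoInts service purely for illustration (this is not the PR's code): keep the request alive through a shared_ptr until the future completes, and remove the pending request if the wait times out, as the async_send_request() API notes recommend.

#include <chrono>
#include <future>
#include <memory>

#include "example_interfaces/srv/add_two_ints.hpp"  // assumed service type, illustration only
#include "rclcpp/rclcpp.hpp"

using AddTwoInts = example_interfaces::srv::AddTwoInts;

void send_request_and_wait(
  const rclcpp::Client<AddTwoInts>::SharedPtr & client,
  const AddTwoInts::Request::SharedPtr & request,
  std::chrono::seconds timeout)
{
  // The client keeps a copy of the shared_ptr, so the request buffer stays
  // valid while the middleware serializes and sends it.
  auto future_and_id = client->async_send_request(request);

  if (future_and_id.future.wait_for(timeout) != std::future_status::ready) {
    // Timed out: per the rclcpp API notes, drop the pending request so the
    // client's internal bookkeeping does not grow without bound.
    client->remove_pending_request(future_and_id.request_id);
  }
}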
Yes. I will consider how to resolve this problem based on your suggestion.
The comments are from MichaelOrlov (#1414 (review)): "I have made a first pass of the review and here are my major notes and proposals:"
Your suggestion is to create a worker thread for management. I'm considering whether it's possible to avoid using a thread and, instead, check whether the current request future queue has reached its limit before each async_send_request call. During this check, completed and timed-out futures would be removed. If, at that point, the queue hasn't reached its limit, the request is sent and its future is added to the queue; otherwise, an error is returned and the request isn't sent. Besides, there are two parameters involved, request_future_timeout_ and maximum_request_future_queue_.
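A rough sketch of that "prune before sending" idea, with illustrative names that are not taken from the PR: before registering a new request future, drop completed and timed-out entries, and refuse the request if the queue is still full. A real implementation would also call remove_pending_request() on the rclcpp client for the timed-out entries, as noted above.

#include <chrono>
#include <cstddef>
#include <future>
#include <list>
#include <utility>

template<typename FutureT>
class PendingRequestQueue
{
public:
  PendingRequestQueue(std::size_t max_size, std::chrono::seconds timeout)
  : max_size_(max_size), timeout_(timeout) {}

  // Returns true if the future was registered; false if the queue is still
  // full after removing completed and timed-out entries.
  bool try_register(FutureT && future)
  {
    const auto now = std::chrono::steady_clock::now();
    futures_.remove_if(
      [now, this](const auto & entry) {
        const bool completed =
          entry.second.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
        const bool timed_out = (now - entry.first) > timeout_;
        return completed || timed_out;
      });
    if (futures_.size() >= max_size_) {
      return false;  // caller reports an error and skips sending the request
    }
    futures_.emplace_back(now, std::move(future));
    return true;
  }

private:
  std::size_t max_size_;
  std::chrono::seconds timeout_;
  std::list<std::pair<std::chrono::steady_clock::time_point, FutureT>> futures_;
};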
Force-pushed from cfea3e0 to 1c73a3f.
The handling of the review comments for this PR is in commit 1c73a3f.
Force-pushed from 1c73a3f to 54f82cb.
Due to my mistake, commit d6ecb16 includes the corrections for the review comments.
Force-pushed from 54f82cb to 6ee96bc.
I rebased the code.
@Barry-Xu-2018 can you rebase the code? I will start the review.
Yeah. I'll start the rebase.
Force-pushed from b707ee4 to 8351852.
The generic client PR ros2/rclcpp#2358 isn't merged yet, so the build still fails.
50% reviewed; I need more time to complete the first review.
std::chrono::seconds request_future_timeout_;
size_t maximum_request_future_queue_;
I do believe those are really user-dependent configurations; wouldn't it be better to let the user configure them when playing the service bag as clients? What do you think?
Furthermore, if rosbag2 plays back at a really low rate, or the server is working with use_sim_time enabled, the server could take a really long time to respond to the rosbag2 service client. That means the user cannot receive the service response from the server. I think this needs to be well explained, at least when we use ros2 bag play with services. At a minimum, these options should be configurable by the user, and they should understand that this timeout is based on the monotonic clock.
I do believe those are really user-dependent configurations; wouldn't it be better to let the user configure them when playing the service bag as clients? What do you think?
Totally agree.
Maybe request_future_timeout should be configurable by the user. We also don't know in advance how long a service call will take. Currently it is set to 5 minutes, which should suffice for many cases; however, for cases with special requirements, we should also provide a way to configure it.
At the beginning, Michael suggested this approach (#1481 (comment)). I used another way and wanted to hear opinions, so I neglected to consider allowing users to set the relevant parameters.
Let's hear Michael's opinion on this.
Hmm, the async behavior as implemented now looks good to me, with the max queue depth and a timeout to reap pending requests that take a long time. If we used a thread to do it synchronously, we might end up with thousands of threads waiting for responses from the service server, since we have no idea how long a service takes to complete.
I will defer this to @MichaelOrlov; what do you think?
Yeah. The maximum queue depth and the timeout duration should be customizable by the user. Currently this interface is not reflected in this PR (which way to configure it, a command-line parameter or an environment variable, needs further discussion), but it will be added in subsequent updates.
Besides all the comments in the source code, there is one thing I would like to discuss regarding user experience.
The current implementation creates generic clients based on the service type and sends requests based on the service introspection event messages. That is good for sending the requests from the bag file to the service server.
But it does not show any result from the server: internally it caches the future, and the results may become ready before the timeout, but they are never printed to the user.
I think the current behavior is okay, in that the user can play the requests from the bag file, and the results can be checked on the server side.
What do you think?
}

if (request_futures_list_.size() == maximum_request_future_queue_) {
  return true;
Same here: print a warning to let the user know that the queue size is not enough, so that the user can configure it once this option is ready.
This function is called from PlayerServiceClientManager::register_request_future(). If the queue is full, registering the request future fails, so I will output a warning in that function.
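A small sketch of what that warning could look like; the function and parameter names here are illustrative, not necessarily what the PR ends up using.

#include <cstddef>

#include "rcutils/logging_macros.h"

// Hypothetical free-standing version of the check done in
// PlayerServiceClientManager::register_request_future().
bool queue_has_room(std::size_t queue_size, std::size_t maximum_request_future_queue)
{
  if (queue_size >= maximum_request_future_queue) {
    RCUTILS_LOG_WARN_NAMED(
      "rosbag2_transport",
      "Request future queue is full (limit %zu); the request will not be sent. "
      "Consider increasing the maximum request future queue size.",
      maximum_request_future_queue);
    return false;
  }
  return true;
}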
type == service_msgs::msg::ServiceEventInfo::REQUEST_RECEIVED) &&
  client_side_type_ == introspection_type::METADATA)
{
  return false;
I think it would be better for the user to know that the service data they are trying to replay does not have any contents, and that it is therefore impossible to replay the data from this rosbag2 database.
There can be multiple clients for the same service name, so the current logic is incorrect.
I need to reconsider the logic of this part.
if (message.size() <= rosbag2_cpp::get_serialization_size_for_service_metadata_event()) {
  client_side_type_ = introspection_type::METADATA;
  RCUTILS_LOG_WARN_ONCE_NAMED(
    ROSBAG2_TRANSPORT_PACKAGE_NAME,
    "The configuration of introspection for '%s' client is metadata !",
    service_name_.c_str());
  return false;
} else {
  client_side_type_ = introspection_type::CONTENTS;
}
I would move this implementation into rosbag2_cpp service_utils.hpp/cpp, e.g. as a get_service_event_type() that returns the service introspection event type based on the content length?
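A sketch of what such a helper in rosbag2_cpp's service_utils could look like; the enum and function names below are illustrative only.

#include <cstddef>

namespace rosbag2_cpp
{

enum class service_event_introspection_type { METADATA, CONTENTS };

// Assumed helper: decide the introspection level a serialized service event was
// recorded with, judging purely by its serialized size versus the size of a
// metadata-only event.
inline service_event_introspection_type get_service_event_type(
  std::size_t serialized_message_size, std::size_t metadata_only_event_size)
{
  return serialized_message_size <= metadata_only_event_size ?
         service_event_introspection_type::METADATA :
         service_event_introspection_type::CONTENTS;
}

}  // namespace rosbag2_cpp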
type == service_msgs::msg::ServiceEventInfo::REQUEST_SENT)
{
  if (message.size() <= rosbag2_cpp::get_serialization_size_for_service_metadata_event()) {
    client_side_type_ = introspection_type::METADATA;
Isn't this a bad assumption? This rosbag2 database could contain multiple clients' data for the same service name. I do not think we can cache the introspection type in the generic client here unless we manage it for each client ID.
The assumptions we can make here:
- We are not sure whether the server or the clients enabled contents or metadata; some did, some did not.
- There may be multiple clients for the same service name; this is likely.
- We do not want duplicated requests from the same transaction with REQUEST_SENT and REQUEST_RECEIVED.
- The introspection flag can be changed dynamically, and user applications may do so on both the server and the clients.
- The service introspection event does not have a unique GID for a transaction (rmw#357), so we cannot tell identical transactions apart.
How about the following for now, with some note?
- Always try to send requests based on the service server data when CONTENTS is enabled. The expectation is that we can guide the user to enable service introspection on the service server side; that is much easier than enabling it on all clients.
- When ros2/rmw#357 is solved, we can control more flexibly with the client ID whether a request is played, using both server and client data, and we can also avoid request duplication.
Isn't this a bad assumption? This rosbag2 database could have multiple clients' data for the same service name.
Yes. I need to reconsider the logic of the code.
Always try to send requests based on the service server data when CONTENTS is enabled. The expectation is that we can guide the user to enable service introspection on the service server side; that is much easier than enabling it on all clients.
If introspection is enabled on the service side, send the request received on the service side. This can simplify the code logic.
When ros2/rmw#357 is solved, we can control more flexibly with the client ID whether a request is played, using both server and client data, and we can also avoid request duplication.
Yes. I will reconsider this part of the logic based on the client ID.
After considering it, we cannot simply "always try to send requests based on the service server data when CONTENTS is enabled".
e.g. in the recorded data:
1. request_sent (for client 1)
2. request_received (for client 1)
3. ...
While handling item 1, the code doesn't yet know whether the service enabled introspection, but it must decide whether to send this request now. In that case the code has to use the request data from request_sent, and it has to make this judgement for each client. So I changed the logic of the code in 05c1e95.
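A hypothetical sketch of that per-client bookkeeping (not the code from 05c1e95): track the introspection level separately for every client GID seen in the service event stream, instead of caching a single value per service. service_msgs/msg/ServiceEventInfo carries a 16-byte client_gid that can serve as the key.

#include <array>
#include <cstdint>
#include <map>

enum class introspection_type { UNKNOWN, METADATA, CONTENTS };

using client_gid_t = std::array<std::uint8_t, 16>;

class ClientIntrospectionCache
{
public:
  introspection_type get(const client_gid_t & gid) const
  {
    auto it = type_per_client_.find(gid);
    return it == type_per_client_.end() ? introspection_type::UNKNOWN : it->second;
  }

  void update(const client_gid_t & gid, introspection_type type)
  {
    // Introspection can be toggled dynamically, so always overwrite.
    type_per_client_[gid] = type;
  }

private:
  std::map<client_gid_t, introspection_type> type_per_client_;
};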
I didn't have time to review this PR (though I had a look at the dependency PR ros2/rclcpp#2358), but I have to wonder at the value of playing back services and of that being part of rosbag itself. I personally don't see why you'd want this, as conceptually it brings up the problem @fujitatomoya is pointing out: there's no way to "play back" what happens with the responses rosbag will receive after making requests. And I don't see a general way of handling this other than perhaps printing the responses to the console, which I think has limited value. It really only makes sense when the responses are empty or informational only and don't drive other behavior in your system. Instead, if you actually wanted to emulate playback of service requests and do something with the responses, I'd imagine it would be better to write a script that is specific to your system, so it can subscribe to the replayed service introspection topic, make the appropriate requests, and do something meaningful with the responses. I'll acknowledge that just because I can't imagine a good use for this generic feature (generic in the common sense, not specifically the type-agnostic sense) doesn't mean there isn't one.
On the other hand, having the record feature and having the info in the bag file for analysis or custom playback scripts seems really useful, and I have no issues with that one.
Originally we thought this is still useful for debugging service servers based on the recorded data during development. That is the major use case that we have. @Barry-Xu-2018 any thoughts?
👍
One case I can think of is a sequence of requests causing issues in the system. The recorded requests can accurately reproduce the scenario in which the problem occurred (without caring about the responses from the service). Besides, in the past, community members have raised the need for recording and playing back services: #773
I guess I'm just wondering how often the responses don't matter... Seems like a topic might be better in that case.
Right, but I wonder if they thought through the implications of playing back services; it doesn't really make sense. That's why I was suggesting that, rather than having this as a feature of rosbag, it should be done using user-written scripts that understand the responses. I suppose that having this built into rosbag, where it's only useful when the responses don't matter, is fine, but I think it's useful in fewer situations than people realize.
IMO, playing back service requests can still be useful even without checking the responses.
Thoughts / opinions?
I suppose playback of state change requests with printing of the results (so you can at least see when a state change fails) is a good use case for this kind of feature. I certainly don't want to stand in the way of the feature, I just wanted to see what concrete use cases we have for it (the state change playback is a good example I think). ros2 service echo can work, but it would be nice if rosbag could optionally print them, I think.
For what it's worth, my team and I are limiting ourselves to using topics exclusively BECAUSE services aren't logged. We have several use cases for services, but instead we are implementing a crude workaround using topics so we can be sure our system will behave the same in replay as it did during the test.
About test_burst.cpp, I have submitted commit 87526a7.
@fujitatomoya Could you please try to reproduce the failure with burst on your local setup after @Barry-Xu-2018's latest fixes? I was not able to reproduce it with the CycloneDDS or FastRTPS DDS implementations, even with a very heavy load on my machine.
Hmmm, still flaky. I can reproduce https://ci.ros2.org/job/ci_linux/20847/testReport/ only with connextdds (it never happened with cyclone nor fast-dds). @Barry-Xu-2018 could you try?
I have a suspicion that it could be
Okay, I'll try it. But my local environment doesn't have connextdds set up, so this will take some time.
@MichaelOrlov I guess so too. It seems that the 2nd service request from the rosbag2 player burst cannot be received on the player side; it looks like one response is always missing... I guess we do not want a flaky test in here, so maybe skip the connextdds case in this specific test, and leave
Since RTI has very restrictive and company/developer-unfriendly licensing, I would refrain from installing it on my machine. I would suggest disabling service support for RTI DDS, or just this particular test, creating a follow-up issue, and letting the folks from RTI handle it themselves.
I am not sure why CI is
Agreed. This could be really deep inside the DDS implementation. At this moment, we can skip this particular test.
@Barry-Xu-2018 can you add the patch to skip connextdds? Just to note, I can test locally tomorrow and start CI in the morning.
@fujitatomoya @Barry-Xu-2018 If you don't mind, I can push a fix disabling the failing test for connextdds.
- The test `burst_bursting_only_filtered_services` fails only with rmw_connext_dds for unknown reasons and tends to be flaky. Sometimes we receive only one service request instead of 2. Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
I can reproduce this flaky result with connextdds in my local environment.
@Barry-Xu-2018 Likely, you will be able to increase the reproduction rate if you run your machine in power save mode and run something in parallel in another terminal window.
…g requests" This reverts commit aaaac74. Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
- The test `burst_bursting_only_filtered_services` fails only with rmw_connext_dds for unknown reasons and tends to be flaky. Sometimes we receive only one service request instead of 2. Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Force-pushed from 48e45d2 to 6c6c00a.
Re-running CI after disabling the failing test for connextdds.
@MichaelOrlov @Barry-Xu-2018 all green 🚀🚀🚀 really appreciate your effort!
@MichaelOrlov after merging this PR, can you take care of #1608 too? I will update the doc to resolve ros2/ros2_documentation#4274.
Uff finally we merged it. |
The honor is ours, let's keep it up 😄
Split #1414 into 2 PRs.
This is the second PR; it implements service play.
Note: this PR depends on #1480.
There is a known bug when enabling introspection on both the service side and the client side: rmw_fastrtps and rmw_cyclonedds have this issue, while rmw_connextdds does not.