busy loop consumes CPU 100% #666
Comments
Stack trace via gdb (ROS: Humble):

```
(gdb) n
(gdb) n
(gdb) s
(gdb) thread 4
(gdb) up
```
CC: @MiguelCompany
@iuhilnehc-ynos if you have bandwidth, could you take a look at this?
Could you check if the internal application can still reproduce this issue with the latest version of the Humble distro?
From the debug information, we can see that the `shared_lock` is blocked waiting for the mutex:

```
#6  0x00007f9cac8590ee in eprosima::shared_lock<eprosima::shared_mutex>::shared_lock (this=0x7f9caa84fd80, m=...)
    at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/include/fastrtps/utils/shared_mutex.hpp:175
175         m->lock_shared();
(gdb) p m
$33 = (eprosima::shared_lock<eprosima::shared_mutex>::mutex_type &) @0x55828b635580: {mut = {<std::__mutex_base> = {_M_mutex = {__data = {
          __lock = 0, __count = 0, __owner = 0, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0,
          __next = 0x0}}, __size = '\000' <repeats 12 times>, "\002", '\000' <repeats 26 times>, __align = 0}}, },
  gate1 = {_M_cond = {_M_cond = {__data = {__wseq = {__value64 = 2, __value32 = {__low = 2, __high = 0}}, __g1_start = {__value64 = 0,
          __value32 = {__low = 0, __high = 0}}, __g_refs = {2, 0}, __g_size = {0, 0}, __g1_orig_size = 0, __wrefs = 8, __g_signals = {0, 0}},
        __size = "\002", '\000' <repeats 15 times>, "\002", '\000' <repeats 19 times>, "\b\000\000\000\000\000\000\000\000\000\000",
        __align = 2}}},
  gate2 = {_M_cond = {_M_cond = {__data = {__wseq = {__value64 = 2, __value32 = {__low = 2, __high = 0}},
          __g1_start = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g_refs = {2, 0}, __g_size = {0, 0}, __g1_orig_size = 0,
          __wrefs = 8, __g_signals = {0, 0}},
        __size = "\002", '\000' <repeats 15 times>, "\002", '\000' <repeats 19 times>, "\b\000\000\000\000\000\000\000\000\000\000",
        __align = 2}}},
  state = 2147483649, static write_entered = 2147483648, static n_readers = 2147483647}
(gdb) up
#7  0x00007f9cac9747aa in eprosima::fastrtps::rtps::RTPSParticipantImpl::find_local_reader (this=0x55828b635080, reader_guid=...)
    at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/participant/RTPSParticipantImpl.cpp:1158
1158        shared_lock<shared_mutex> (endpoints_list_mutex);
```

There was a PR (eProsima/Fast-DDS#2976) to deal with a similar issue by breaking this waiting lock.
Does this happen on Rolling? It is worth checking whether the problem is still present on Rolling, which uses the Fast-DDS master branch.
eProsima/Fast-DDS#2976 fixed this. Actually, I don't know how to reproduce this issue.
Does this mean we can close this issue, since it was fixed by eProsima/Fast-DDS#2976 and its backport to humble (eProsima/Fast-DDS#3091)?
Thanks for your reply. Please give us some time to make sure this has been addressed in Humble; we will confirm that the problem cannot be observed with a Humble source build.
@MiguelCompany this has been confirmed by our internal test: no problem is observed with the current Humble source build. Thanks 👍
Bug report
Required Info:
Steps to reproduce issue
This problem is only reproducible in an internal application evaluation. So far we have been unable to develop a simple, self-contained application that reproduces it.
Expected behavior
The busy loop does not happen, and all application nodes can start up without errors or problems.
Actual behavior
A busy loop happens during application initialization; it loops forever and consumes 100% CPU.
Additional information
According to https://github.com/ros2/ros2/blob/e3d99dcd430b46455f7ca2e134f643810ad34e97/ros2.repos#L36-L37, https://github.com/eProsima/Fast-DDS/tree/v2.6.2 is used.
The busy loop starts here:
https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp#L1268
It then processes the following:
https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp#L1305
change_to_process always points to the same address, and
https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp#L1321
will always be false, so control goes back to the top and the next loop iteration starts. This mutex is locked by the WriterProxy::WriterProxy constructor:
https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/reader/WriterProxy.cpp#L102
I believe this mutex is locked here:
https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/writer/StatefulWriter.cpp#L1867