Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

busy loop consumes CPU 100% #666

Closed
fujitatomoya opened this issue Feb 2, 2023 · 11 comments
Closed

busy loop consumes CPU 100% #666

fujitatomoya opened this issue Feb 2, 2023 · 11 comments
Assignees

Comments

@fujitatomoya
Copy link
Collaborator

Bug report

Required Info:

Steps to reproduce issue

This problem only can be reproducible in internal application evaluation. So far unable to develop simple self-contained reproducible application.

Expected behavior

Busy loop does not happen, and all application nodes can start up w/o error or problems.

Actual behavior

Busy loop happens during application initialization, that will loop forever that consumes CPU 100%.

Additional information

according to https://github.com/ros2/ros2/blob/e3d99dcd430b46455f7ca2e134f643810ad34e97/ros2.repos#L36-L37, https://github.com/eProsima/Fast-DDS/tree/v2.6.2 is used.

busy loop starts from here,

https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp#L1268

then, process following,

https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp#L1305

change_to_process points always the same address,

and,

https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp#L1321

will always be false, then end up having the next loop from the top. this mutex is locked by WriterProxy::WriterProxy constructor.

https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/reader/WriterProxy.cpp#L102

(gdb) s
eprosima::fastrtps::rtps::Endpoint::getMutex (this=0x55828b720960) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/include/fastdds/rtps/Endpoint.h:92
92	        return mp_mutex;
(gdb) p mp_mutex
$23 = {<std::__recursive_mutex_base> = {_M_mutex = {__data = {__lock = 2, __count = 2, __owner = 634358, __nusers = 1, __kind = 1, 
        __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = "\002\000\000\000\002\000\000\000\366\255\t\000\001\000\000\000\001", '\000' <repeats 22 times>, 
      __align = 8589934594}}, <std::__timed_mutex_impl<std::recursive_timed_mutex>> = {<No data fields>}, <No data fields>}

I believe this mutex is locked on here,

https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/writer/StatefulWriter.cpp#L1867

@fujitatomoya
Copy link
Collaborator Author

stack trace via gdb

ROS: humble
Install type: built from source, and apt installed binary packages.
target node: /opt/ros/humble/lib/topic_tools/relay
process id: 634333
LPW in a busy loop: 634406

(gdb) n
1306 while (nullptr != change_to_process)
(gdb)
1309 if (!async_mode.fast_check_is_there_slot_for_change(change_to_process))
(gdb)
1314 if (nullptr == current_writer || current_writer->getGuid() != change_to_process->writerGUID)
(gdb)
1316 auto writer_it = writers_.find(change_to_process->writerGUID);
(gdb)
1317 assert(writers_.end() != writer_it);
(gdb)
1319 current_writer = writer_it->second;
(gdb)
1322 if (!current_writer->getMutex().try_lock())
(gdb)
1324 break;
(gdb)
1381 async_mode.group.sender(nullptr, nullptr);
(gdb)
1269 while (async_mode.running) // loop starts here
(gdb) n
1272 if (0 != async_mode.writers_interested_in_remove)
(gdb) n
1277 std::unique_lockstd::mutex lock(mutex_);
(gdb) n
1278 fastrtps::rtps::CacheChange_t* change_to_process = nullptr;
(gdb) n
1282 std::unique_lockstd::mutex in_lock(async_mode.changes_interested_mutex);
(gdb) n
1284 sched.add_interested_changes_to_queue_nts();
(gdb) n
1286 while (async_mode.running &&
(gdb) n
1287 (async_mode.force_wait() || nullptr == (change_to_process = sched.get_next_change_nts())))
(gdb) s
eprosima::fastdds::rtps::FlowControllerAsyncPublishMode::force_wait (this=0x55828b641148) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:229
229 return false;
(gdb)
230 }
(gdb)
eprosima::fastdds::rtps::FlowControllerImpl<eprosima::fastdds::rtps::FlowControllerSyncPublishMode, eprosima::fastdds::rtps::FlowControllerFifoSchedule>::run (this=0x55828b640a60) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:1286
1286 while (async_mode.running &&
(gdb)
1287 (async_mode.force_wait() || nullptr == (change_to_process = sched.get_next_change_nts())))
(gdb)
eprosima::fastdds::rtps::FlowControllerFifoSchedule::get_next_change_nts (this=0x55828b640ac8) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:411
411 return queue_.get_next_change();
(gdb)
eprosima::fastdds::rtps::FlowQueue::get_next_change (this=0x55828b640ac8) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:75
75 if (!is_empty())
(gdb)
eprosima::fastdds::rtps::FlowQueue::is_empty (this=0x55828b640ac8) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:58
58 return new_ones_.is_empty() && old_ones_.is_empty();
(gdb)
eprosima::fastdds::rtps::FlowQueue::ListInfo::is_empty (this=0x55828b640e08) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:125
125 assert((&tail == head.writer_info.next && &head == tail.writer_info.previous) ||
(gdb)
127 return &tail == head.writer_info.next;
(gdb) p &tail
$17 = (eprosima::fastrtps::rtps::CacheChange_t *) 0x55828b640ed8
(gdb) p head.writer_info.next
$18 = (eprosima::fastrtps::rtps::CacheChange_t * volatile) 0x55828b640ed8
(gdb) s
128 }
(gdb)
eprosima::fastdds::rtps::FlowQueue::is_empty (this=0x55828b640ac8) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:58
58 return new_ones_.is_empty() && old_ones_.is_empty();
(gdb)
eprosima::fastdds::rtps::FlowQueue::ListInfo::is_empty (this=0x55828b640fa8) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:125
125 assert((&tail == head.writer_info.next && &head == tail.writer_info.previous) ||
(gdb)
127 return &tail == head.writer_info.next;
(gdb) p &tail
$19 = (eprosima::fastrtps::rtps::CacheChange_t *) 0x55828b641078
(gdb) p head.writer_info.next
$20 = (eprosima::fastrtps::rtps::CacheChange_t * volatile) 0x55828b6e76c0
(gdb) s
128 }
(gdb)
eprosima::fastdds::rtps::FlowQueue::is_empty (this=0x55828b640ac8) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:59
59 }
(gdb)
eprosima::fastdds::rtps::FlowQueue::get_next_change (this=0x55828b640ac8) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:75
75 if (!is_empty())
(gdb) s
77 return !new_ones_.is_empty() ?
(gdb) finish
Run till exit from #0 eprosima::fastdds::rtps::FlowQueue::get_next_change (this=0x55828b640ac8)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:77
eprosima::fastdds::rtps::FlowControllerFifoSchedule::get_next_change_nts (this=0x55828b640ac8) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:412
412 }
Value returned is $21 = (eprosima::fastrtps::rtps::CacheChange_t *) 0x55828b6e76c0
(gdb) s
eprosima::fastdds::rtps::FlowControllerImpl<eprosima::fastdds::rtps::FlowControllerSyncPublishMode, eprosima::fastdds::rtps::FlowControllerFifoSchedule>::run (this=0x55828b640a60) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:1303
1303 }
(gdb) p change_to_process
$22 = (eprosima::fastrtps::rtps::CacheChange_t *) 0x55828b6e76c0
// The value is always 0x55828b6e76c0. Because "Remove previously from queue" codes in line 1331 does not work.

(gdb) n
1305 fastrtps::rtps::RTPSWriter* current_writer = nullptr;
(gdb)
1306 while (nullptr != change_to_process)
(gdb)
1309 if (!async_mode.fast_check_is_there_slot_for_change(change_to_process))
(gdb)
1314 if (nullptr == current_writer || current_writer->getGuid() != change_to_process->writerGUID)
(gdb)
1316 auto writer_it = writers_.find(change_to_process->writerGUID);
(gdb)
1317 assert(writers_.end() != writer_it);
(gdb)
1319 current_writer = writer_it->second;
(gdb)
1322 if (!current_writer->getMutex().try_lock())
// (!current_writer->getMutex().try_lock()) is always false

(gdb) s
eprosima::fastrtps::rtps::Endpoint::getMutex (this=0x55828b720960) at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/include/fastdds/rtps/Endpoint.h:92
92 return mp_mutex;
(gdb) p mp_mutex
$23 = {std::__recursive_mutex_base = {_M_mutex = {__data = {__lock = 2, __count = 2, __owner = 634358, __nusers = 1, __kind = 1,
__spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = "\002\000\000\000\002\000\000\000\366\255\t\000\001\000\000\000\001", '\000' <repeats 22 times>,
__align = 8589934594}}, <std::__timed_mutex_implstd::recursive_timed_mutex> = {}, }
// This mutex is owned by 634358

(gdb) thread 4
[Switching to thread 4 (Thread 0x7f9caa851640 (LWP 634358))]
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55828b6355d0)
at ./nptl/futex-internal.c:57
57 in ./nptl/futex-internal.c
(gdb) up
#1 __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x55828b6355d0)
at ./nptl/futex-internal.c:87
87 in ./nptl/futex-internal.c
(gdb) info locals
err =
clockbit = 256
op = 393
err =
clockbit =
op =
(gdb) up
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55828b6355d0, expected=expected@entry=0,
clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
139 in ./nptl/futex-internal.c
(gdb) info locals
No locals.
(gdb) up
#3 0x00007f9cad91eac1 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55828b635580, cond=0x55828b6355a8)
at ./nptl/pthread_cond_wait.c:503
503 ./nptl/pthread_cond_wait.c: No such file or directory.
(gdb) info locals
spin = 0
buffer = {__routine = 0x7f9cad91e7a0 <__condvar_cleanup_waiting>, __arg = 0x7f9caa84fc70, __canceltype = -1434125168, __prev = 0x0}
cbuffer = {wseq = 0, cond = 0x55828b6355a8, mutex = 0x55828b635580, private = 0}
err =
g = 0
flags =
g1_start =
maxspin = 0
signals =
result = 0
wseq = 0
seq = 0
private = 0
maxspin =
err =
result =
wseq =
g =
seq =
flags =
private =
signals =
done =
g1_start =
--Type for more, q to quit, c to continue without paging--q
Quit
(gdb) up
#4 __pthread_cond_wait (cond=0x55828b6355a8, mutex=0x55828b635580) at ./nptl/pthread_cond_wait.c:627
627 in ./nptl/pthread_cond_wait.c
(gdb) up
#5 0x00007f9cac858b3f in eprosima::shared_mutex::lock_shared (this=0x55828b635580)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/include/fastrtps/utils/shared_mutex.hpp:100
100 gate1
.wait(lk);
(gdb) p lk
$30 = {_M_device = 0x55828b635580, M_owns = true}
(gdb) p gate1

$31 = {_M_cond = {_M_cond = {__data = {__wseq = {__value64 = 2, __value32 = {__low = 2, __high = 0}}, __g1_start = {__value64 = 0,
__value32 = {__low = 0, __high = 0}}, __g_refs = {2, 0}, __g_size = {0, 0}, __g1_orig_size = 0, __wrefs = 8, __g_signals = {0,
0}},
__size = "\002", '\000' <repeats 15 times>, "\002", '\000' <repeats 19 times>, "\b\000\000\000\000\000\000\000\000\000\000",
_align = 2}}}
(gdb) p mut

$32 = {std::__mutex_base = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 2, __kind = 0, __spins = 0,
__elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 12 times>, "\002", '\000' <repeats 26 times>,
align = 0}}, }
(gdb) up
#6 0x00007f9cac8590ee in eprosima::shared_lockeprosima::shared_mutex::shared_lock (this=0x7f9caa84fd80, m=...)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/include/fastrtps/utils/shared_mutex.hpp:175
175 m
->lock_shared();
(gdb) p m
$33 = (eprosima::shared_lockeprosima::shared_mutex::mutex_type &) @0x55828b635580: {mut
= {std::__mutex_base = {_M_mutex = {__data = {
__lock = 0, __count = 0, __owner = 0, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = '\000' <repeats 12 times>, "\002", '\000' <repeats 26 times>, _align = 0}}, },
gate1
= {_M_cond = {_M_cond = {__data = {__wseq = {__value64 = 2, __value32 = {__low = 2, __high = 0}}, __g1_start = {__value64 = 0,
__value32 = {__low = 0, __high = 0}}, __g_refs = {2, 0}, __g_size = {0, 0}, __g1_orig_size = 0, __wrefs = 8, __g_signals = {0,
0}},
__size = "\002", '\000' <repeats 15 times>, "\002", '\000' <repeats 19 times>, "\b\000\000\000\000\000\000\000\000\000\000",
_align = 2}}}, gate2 = {_M_cond = {_M_cond = {__data = {__wseq = {__value64 = 2, __value32 = {__low = 2, __high = 0}},
__g1_start = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g_refs = {2, 0}, __g_size = {0, 0}, __g1_orig_size = 0,
__wrefs = 8, __g_signals = {0, 0}},
_size = "\002", '\000' <repeats 15 times>, "\002", '\000' <repeats 19 times>, "\b\000\000\000\000\000\000\000\000\000\000",
align = 2}}}, state = 2147483649, static write_entered = 2147483648, static n_readers
= 2147483647}
(gdb) up
#7 0x00007f9cac9747aa in eprosima::fastrtps::rtps::RTPSParticipantImpl::find_local_reader (this=0x55828b635080, reader_guid=...)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/participant/RTPSParticipantImpl.cpp:1158
1158 shared_lock<shared_mutex> (endpoints_list_mutex);
(gdb) p shared_mutex
Attempt to use a type name as an expression
(gdb) p endpoints_list_mutex
$34 = {mut
= {std::__mutex_base = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 2, __kind = 0, __spins = 0,
__elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 12 times>, "\002", '\000' <repeats 26 times>,
_align = 0}}, }, gate1 = {_M_cond = {_M_cond = {__data = {__wseq = {__value64 = 2, __value32 = {__low = 2,
__high = 0}}, __g1_start = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g_refs = {2, 0}, __g_size = {0, 0},
__g1_orig_size = 0, __wrefs = 8, __g_signals = {0, 0}},
__size = "\002", '\000' <repeats 15 times>, "\002", '\000' <repeats 19 times>, "\b\000\000\000\000\000\000\000\000\000\000",
_align = 2}}}, gate2 = {_M_cond = {_M_cond = {__data = {__wseq = {__value64 = 2, __value32 = {__low = 2, __high = 0}},
__g1_start = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g_refs = {2, 0}, __g_size = {0, 0}, __g1_orig_size = 0,
__wrefs = 8, __g_signals = {0, 0}},
_size = "\002", '\000' <repeats 15 times>, "\002", '\000' <repeats 19 times>, "\b\000\000\000\000\000\000\000\000\000\000",
align = 2}}}, state = 2147483649, static write_entered = 2147483648, static n_readers
= 2147483647}
(gdb) up
#8 0x00007f9cac993a3f in eprosima::fastrtps::rtps::RTPSDomainImpl::find_local_reader (reader_guid=...)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/RTPSDomain.cpp:537
537 return participant.second->find_local_reader(reader_guid);
(gdb) up
#9 0x00007f9cac874711 in eprosima::fastrtps::rtps::StatefulWriter::intraprocess_heartbeat (this=0x55828b720960,
reader_proxy=0x55828b632c50, liveliness=false)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/writer/StatefulWriter.cpp:498
498 RTPSReader* reader = RTPSDomainImpl::find_local_reader(reader_proxy->guid());
(gdb) up
#10 0x00007f9cac87a74f in operator() (__closure=0x7f9ca0000c00, remote_reader=0x55828b632c50)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/writer/StatefulWriter.cpp:1912
1912 intraprocess_heartbeat(remote_reader);
(gdb) up
#11 0x00007f9cac88096e in std::__invoke_impl<bool, eprosima::fastrtps::rtps::StatefulWriter::process_acknack(const eprosima::fastrtps::rtps::GUID_t&, const eprosima::fastrtps::rtps::GUID_t&, uint32_t, const SequenceNumberSet_t&, bool, bool&)::<lambda(eprosima::fastrtps::rtps::ReaderProxy*)>&, eprosima::fastrtps::rtps::ReaderProxy*>(std::__invoke_other, struct {...} &) (__f=...)
at /usr/include/c++/11/bits/invoke.h:61
61 { return std::forward<_Fn>(__f)(std::forward<_Args>(__args)...); }
(gdb) up
#12 0x00007f9cac87f7fd in std::__invoke_r<bool, eprosima::fastrtps::rtps::StatefulWriter::process_acknack(const eprosima::fastrtps::rtps::GUID_t&, const eprosima::fastrtps::rtps::GUID_t&, uint32_t, const SequenceNumberSet_t&, bool, bool&)::<lambda(eprosima::fastrtps::rtps::ReaderProxy*)>&, eprosima::fastrtps::rtps::ReaderProxy*>(struct {...} &) (__fn=...) at /usr/include/c++/11/bits/invoke.h:142
142 return std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
(gdb) up
#13 0x00007f9cac87da79 in std::_Function_handler<bool(eprosima::fastrtps::rtps::ReaderProxy*), eprosima::fastrtps::rtps::StatefulWriter::process_acknack(const eprosima::fastrtps::rtps::GUID_t&, const eprosima::fastrtps::rtps::GUID_t&, uint32_t, const SequenceNumberSet_t&, bool, bool&)::<lambda(eprosima::fastrtps::rtps::ReaderProxy*)> >::_M_invoke(const std::_Any_data &, eprosima::fastrtps::rtps::ReaderProxy &&) (
__functor=..., __args#0=@0x7f9caa850040: 0x55828b632c50) at /usr/include/c++/11/bits/std_function.h:290
290 return std::__invoke_r<_Res>(
_Base::_M_get_pointer(__functor),
(gdb) up
#14 0x00007f9cac88604f in std::function<bool (eprosima::fastrtps::rtps::ReaderProxy*)>::operator()(eprosima::fastrtps::rtps::ReaderProxy*) const (this=0x7f9caa8500e0, __args#0=0x55828b632c50) at /usr/include/c++/11/bits/std_function.h:590
590 return _M_invoker(M_functor, std::forward<ArgTypes>(args)...);
(gdb) up
#15 0x00007f9cac872101 in eprosima::fastrtps::rtps::for_matched_readers(eprosima::fastrtps::ResourceLimitedVector<eprosima::fastrtps::rtps::ReaderProxy*, std::integral_constant<bool, false>, eprosima::fastrtps::ResourceLimitedContainerConfig, std::allocatoreprosima::fastrtps::rtps::ReaderProxy*, std::vector<eprosima::fastrtps::rtps::ReaderProxy*, std::allocatoreprosima::fastrtps::rtps::ReaderProxy* > >&, std::function<bool (eprosima::fastrtps::rtps::ReaderProxy*)>) (reader_vector_1=..., fun=...)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/writer/StatefulWriter.cpp:78
78 if (fun(remote_reader))
(gdb) up
#16 0x00007f9cac872198 in eprosima::fastrtps::rtps::for_matched_readers(eprosima::fastrtps::ResourceLimitedVector<eprosima::fastrtps::rtps::ReaderProxy*, std::integral_constant<bool, false>, eprosima::fastrtps::ResourceLimitedContainerConfig, std::allocatoreprosima::fastrtps::rtps::ReaderProxy*, std::vector<eprosima::fastrtps::rtps::ReaderProxy*, std::allocatoreprosima::fastrtps::rtps::ReaderProxy* > >&, eprosima::fastrtps::ResourceLimitedVector<eprosima::fastrtps::rtps::ReaderProxy*, std::integral_constant<bool, false>, eprosima::fastrtps::ResourceLimitedContainerConfig, std::allocatoreprosima::fastrtps::rtps::ReaderProxy*, std::vector<eprosima::fastrtps::rtps::ReaderProxy*, std::allocatoreprosima::fastrtps::rtps::ReaderProxy* > >&, std::function<bool (eprosima::fastrtps::rtps::ReaderProxy*)>) (reader_vector_1=...,
reader_vector_2=..., fun=...)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/writer/StatefulWriter.cpp:92
92 if (for_matched_readers(reader_vector_1, fun))
(gdb) up
#17 0x00007f9cac872296 in eprosima::fastrtps::rtps::for_matched_readers(eprosima::fastrtps::ResourceLimitedVector<eprosima::fastrtps::rtps::ReaderProxy*, std::integral_constant<bool, false>, eprosima::fastrtps::ResourceLimitedContainerConfig, std::allocatoreprosima::fastrtps::rtps::ReaderProxy*, std::vector<eprosima::fastrtps::rtps::ReaderProxy*, std::allocatoreprosima::fastrtps::rtps::ReaderProxy* > >&, eprosima::fastrtps::ResourceLimitedVector<eprosima::fastrtps::rtps::ReaderProxy*, std::integral_constant<bool, false>, eprosima::fastrtps::ResourceLimitedContainerConfig, std::allocatoreprosima::fastrtps::rtps::ReaderProxy*, std::vector<eprosima::fastrtps::rtps::ReaderProxy*, std::allocatoreprosima::fastrtps::rtps::ReaderProxy* > >&, eprosima::fastrtps::ResourceLimitedVector<eprosima::fastrtps::rtps::ReaderProxy*, std::integral_constant<bool, false>, eprosima::fastrtps::ResourceLimitedContainerConfig, std::allocatoreprosima::fastrtps::rtps::ReaderProxy*, std::vector<eprosima::fastrtps::rtps::ReaderProxy*, std::allocatoreprosima::fastrtps::rtps::ReaderProxy* > >&, std::function<bool (eprosima::fastrtps::rtps::ReaderProxy*)>) (reader_vector_1=..., reader_vector_2=..., reader_vector_3=..., fun=...)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/writer/StatefulWriter.cpp:105
105 if (for_matched_readers(reader_vector_1, reader_vector_2, fun))
(gdb) up
#18 0x00007f9cac87a972 in eprosima::fastrtps::rtps::StatefulWriter::process_acknack (this=0x55828b720960, writer_guid=...,
reader_guid=..., ack_count=1, sn_set=..., final_flag=false, result=@0x7f9caa8502a8: true)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/writer/StatefulWriter.cpp:1867
1867 for_matched_readers(matched_local_readers
, matched_datasharing_readers
, matched_remote_readers
,
(gdb) up
#19 0x00007f9cac926269 in eprosima::fastrtps::rtps::WriterProxy::perform_initial_ack_nack (this=0x55828b77ef00)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/reader/WriterProxy.cpp:522
522 writer->process_acknack(guid(), reader
->getGuid(), 1, SequenceNumberSet_t(), false, tmp);
(gdb) up
#20 0x00007f9cac92475d in operator() (__closure=0x55828b7805d0)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/reader/WriterProxy.cpp:102
102 return perform_initial_ack_nack();
// This code is in a constructor. This trouble can only occur at launch a node.

(gdb) up
#21 0x00007f9cac926cce in std::__invoke_impl<bool, eprosima::fastrtps::rtps::WriterProxy::WriterProxy(eprosima::fastrtps::rtps::StatefulReader*, const eprosima::fastrtps::rtps::RemoteLocatorsAllocationAttributes&, const eprosima::fastrtps::ResourceLimitedContainerConfig&)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
61 { return std::forward<_Fn>(__f)(std::forward<_Args>(__args)...); }
(gdb) up
#22 0x00007f9cac926b2d in std::__invoke_r<bool, eprosima::fastrtps::rtps::WriterProxy::WriterProxy(eprosima::fastrtps::rtps::StatefulReader*, const eprosima::fastrtps::rtps::RemoteLocatorsAllocationAttributes&, const eprosima::fastrtps::ResourceLimitedContainerConfig&)::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/11/bits/invoke.h:142
142 return std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
(gdb) up
#23 0x00007f9cac9268c7 in std::_Function_handler<bool(), eprosima::fastrtps::rtps::WriterProxy::WriterProxy(eprosima::fastrtps::rtps::StatefulReader*, const eprosima::fastrtps::rtps::RemoteLocatorsAllocationAttributes&, const eprosima::fastrtps::ResourceLimitedContainerConfig&)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/11/bits/std_function.h:290
290 return std::__invoke_r<_Res>(_Base::_M_get_pointer(__functor),
(gdb) up
#24 0x00007f9cac85433a in std::function<bool ()>::operator()() const (this=0x55828b7805d0) at /usr/include/c++/11/bits/std_function.h:590
590 return _M_invoker(_M_functor, std::forward<ArgTypes>(args)...);
(gdb) up
#25 0x00007f9cac853ce5 in eprosima::fastrtps::rtps::TimedEventImpl::trigger (this=0x55828b7805c0, current_time=..., cancel_time=...)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/resources/TimedEventImpl.cpp:99
99 bool restart = callback
();
(gdb) up
#26 0x00007f9cac84d38e in eprosima::fastrtps::rtps::ResourceEvent::do_timer_actions (this=0x55828b635468)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/resources/ResourceEvent.cpp:255
255 tp->trigger(current_time
, cancel_time);
(gdb) up
#27 0x00007f9cac84cd53 in eprosima::fastrtps::rtps::ResourceEvent::event_service (this=0x55828b635468)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/resources/ResourceEvent.cpp:158
158 do_timer_actions();
(gdb) p mutex

$35 = {std::__mutex_base = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0,
__elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>,
__align = 0}}, <std::__timed_mutex_implstd::timed_mutex> = {}, }
(gdb) up
#28 0x00007f9cac853021 in std::__invoke_impl<void, void (eprosima::fastrtps::rtps::ResourceEvent::)(), eprosima::fastrtps::rtps::ResourceEvent*> (
__f=@0x55828b635e20: (void (eprosima::fastrtps::rtps::ResourceEvent::)(eprosima::fastrtps::rtps::ResourceEvent * const)) 0x7f9cac84cd16 eprosima::fastrtps::rtps::ResourceEvent::event_service(), __t=@0x55828b635e18: 0x55828b635468) at /usr/include/c++/11/bits/invoke.h:74
74 return ((std::forward<_Tp>(__t)).__f)(std::forward<_Args>(__args)...);
(gdb) up
#29 0x00007f9cac852f73 in std::__invoke<void (eprosima::fastrtps::rtps::ResourceEvent::
)(), eprosima::fastrtps::rtps::ResourceEvent*> (
__fn=@0x55828b635e20: (void (eprosima::fastrtps::rtps::ResourceEvent::)(eprosima::fastrtps::rtps::ResourceEvent * const)) 0x7f9cac84cd16 eprosima::fastrtps::rtps::ResourceEvent::event_service()) at /usr/include/c++/11/bits/invoke.h:96
96 return std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
(gdb)
#30 0x00007f9cac852ed3 in std::thread::_Invoker<std::tuple<void (eprosima::fastrtps::rtps::ResourceEvent::
)(), eprosima::fastrtps::rtps::ResourceEvent*> >::_M_invoke<0ul, 1ul> (this=0x55828b635e18) at /usr/include/c++/11/bits/std_thread.h:253
253 { return std::__invoke(std::get<_Ind>(std::move(_M_t))...); }
(gdb)
#31 0x00007f9cac852e88 in std::thread::_Invoker<std::tuple<void (eprosima::fastrtps::rtps::ResourceEvent::)(), eprosima::fastrtps::rtps::ResourceEvent> >::operator() (this=0x55828b635e18) at /usr/include/c++/11/bits/std_thread.h:260
260 return _M_invoke(_Indices());
(gdb)
#32 0x00007f9cac852e68 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (eprosima::fastrtps::rtps::ResourceEvent::)(), eprosima::fastrtps::rtps::ResourceEvent> > >::_M_run (this=0x55828b635e10) at /usr/include/c++/11/bits/std_thread.h:211
211 _M_run() { _M_func(); }
(gdb)
#33 0x00007f9cadbaf2c3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
(gdb)
#34 0x00007f9cad91fb43 in start_thread (arg=) at ./nptl/pthread_create.c:442
442 ./nptl/pthread_create.c: No such file or directory.
(gdb)
#35 0x00007f9cad9b1a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
81 ../sysdeps/unix/sysv/linux/x86_64/clone3.S: No such file or directory.
(gdb)
Initial frame selected; you cannot go up.

@fujitatomoya
Copy link
Collaborator Author

CC: @MiguelCompany

@fujitatomoya
Copy link
Collaborator Author

@iuhilnehc-ynos if you have bandwidth, could you take a look at this?

@iuhilnehc-ynos
Copy link
Contributor

@fujitatomoya

Could you check if the internal application can still reproduce this issue with the latest version of humble distro?
There are a lot of patches upgraded from v2.6.2 to v2.6.4.

$ dpkg -l | grep -i fastrtps
ii  ros-humble-fastrtps                               2.6.4-1jammy.20230117.223829            amd64        *eprosima Fast DDS* (formerly Fast RTPS) is a C++ implementation of the DDS (Data Distribution Service) standard of the OMG (Object Management Group).

@iuhilnehc-ynos
Copy link
Contributor

From the debug information, we can see that the shared_lock for the endpoints_list_mutex is waiting.

#6 0x00007f9cac8590ee in eprosima::shared_lockeprosima::shared_mutex::shared_lock (this=0x7f9caa84fd80, m=...)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/include/fastrtps/utils/shared_mutex.hpp:175
175 m->lock_shared();
(gdb) p m
$33 = (eprosima::shared_lockeprosima::shared_mutex::mutex_type &) @0x55828b635580: {mut = {std::__mutex_base = {_M_mutex = {__data = {
__lock = 0, __count = 0, __owner = 0, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = '\000' <repeats 12 times>, "\002", '\000' <repeats 26 times>, _align = 0}}, },
gate1 = {_M_cond = {_M_cond = {__data = {__wseq = {__value64 = 2, __value32 = {__low = 2, __high = 0}}, __g1_start = {__value64 = 0,
__value32 = {__low = 0, __high = 0}}, __g_refs = {2, 0}, __g_size = {0, 0}, __g1_orig_size = 0, __wrefs = 8, __g_signals = {0,
0}},
__size = "\002", '\000' <repeats 15 times>, "\002", '\000' <repeats 19 times>, "\b\000\000\000\000\000\000\000\000\000\000",
_align = 2}}}, gate2 = {_M_cond = {_M_cond = {__data = {__wseq = {__value64 = 2, __value32 = {__low = 2, __high = 0}},
__g1_start = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g_refs = {2, 0}, __g_size = {0, 0}, __g1_orig_size = 0,
__wrefs = 8, __g_signals = {0, 0}},
_size = "\002", '\000' <repeats 15 times>, "\002", '\000' <repeats 19 times>, "\b\000\000\000\000\000\000\000\000\000\000",
align = 2}}}, state = 2147483649, static write_entered = 2147483648, static n_readers = 2147483647}
(gdb) up
#7 0x00007f9cac9747aa in eprosima::fastrtps::rtps::RTPSParticipantImpl::find_local_reader (this=0x55828b635080, reader_guid=...)
at /home/ubuntu/hello-ros2/ros2_humble/src/eProsima/Fast-DDS/src/cpp/rtps/participant/RTPSParticipantImpl.cpp:1158
1158 shared_lock<shared_mutex> (endpoints_list_mutex);

There was a PR(eProsima/Fast-DDS#2976) to deal with this similar issue, it expects that
the reader will receive the requested lock, even if a writer is waiting.

After breaking this waiting lock endpoints_list_mutex, the lock in the StatefulWriter::process_acknack called at https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/reader/WriterProxy.cpp#L522 will be unlocked later, and then https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp#L1321 can try_lock sucessfully.

@fujitatomoya
Copy link
Collaborator Author

Does this happen to rolling? it is worth to check if the problem still stays in rolling which uses Fast-DDS master branch.

@iuhilnehc-ynos
Copy link
Contributor

iuhilnehc-ynos commented Mar 15, 2023

eProsima/Fast-DDS#2976 fixed for the master at first and then backported 2.7.x and 2.6.x.
As we know, rolling use the master branch for Fast-DDS.

Actually, I don't know how to reproduce this issue.

@MiguelCompany
Copy link
Collaborator

Actually, I don't know how to reproduce this issue.

Does this mean we can close this issue, since it was fixed by eProsima/Fast-DDS#2976 and its backport to humble (eProsima/Fast-DDS#3091)?

@iuhilnehc-ynos
Copy link
Contributor

@MiguelCompany

Thanks for your reply.
I think it's better to wait for the final confirmation from @fujitatomoya , as it's reported by an internal application(maybe the internal application we all can't get.) that using ros2 tool(/opt/ros/humble/lib/topic_tools/relay).

@fujitatomoya
Copy link
Collaborator Author

@MiguelCompany

let us have some time to make sure this has been addressed to humble, we will confirm that the problem cannot be observed with humble source build.

@fujitatomoya fujitatomoya self-assigned this Mar 15, 2023
@fujitatomoya
Copy link
Collaborator Author

@MiguelCompany this has been confirmed our internal test, no problem observed with current humble source build. thanks 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants