Not call Domain::removePublisher while rmw_publisher_allocate return failure in rmw_fastrtps_cpp/dynamic_cpp::create_publisher #433

Closed
Barry-Xu-2018 opened this issue Sep 17, 2020 · 5 comments · Fixed by #434

@Barry-Xu-2018
Contributor

Bug report

- Operating System: Ubuntu 20.04
- Installation type: source
- Version or commit hash: 4dc7379
- DDS implementation: fastrtps
- Client library (if applicable): rcl

Steps to reproduce issue

$ colcon build --cmake-args -DCMAKE_BUILD_TYPE=Debug --packages-up-to rcl
$ . install/setup.bash
$ valgrind --leak-check=full build/rcl/test/test_node__rmw_fastrtps_cpp
$ valgrind --leak-check=full build/rcl/test/test_node__rmw_fastrtps_dynamic_cpp

Expected behavior

No memory leak.

Actual behavior

valgrind reports a memory leak:
==225597== 344 (240 direct, 104 indirect) bytes in 2 blocks are definitely lost in loss record 16 of 19
==225597== at 0x48400F3: operator new(unsigned long, std::nothrow_t const&) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==225597== by 0x50CD229: rmw_fastrtps_cpp::create_publisher(CustomParticipantInfo const*, rosidl_message_type_support_t const*, char const*, rmw_qos_profile_t const*, rmw_publisher_options_t const*, bool, bool) (publisher.cpp:125)
==225597== by 0x50C63DD: init_context_impl(rmw_context_t*) (init_rmw_context_impl.cpp:107)
==225597== by 0x50C69C7: rmw_fastrtps_cpp::increment_context_impl_ref_count(rmw_context_t*) (init_rmw_context_impl.cpp:195)
==225597== by 0x50E09B1: rmw_create_node (rmw_node.cpp:60)
==225597== by 0x486D4C0: rcl_node_init (node.c:257)
==225597== by 0x176895: TestNodeFixture__rmw_fastrtps_cpp_test_rcl_node_init_with_internal_errors_Test::TestBody() (test_node.cpp:524)
==225597== by 0x1F0275: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2447)
==225597== by 0x1E922C: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2483)
==225597== by 0x1C5925: testing::Test::Run() (gtest.cc:2522)
==225597== by 0x1C631E: testing::TestInfo::Run() (gtest.cc:2703)
==225597== by 0x1C6A0F: testing::TestCase::Run() (gtest.cc:2825)

@Barry-Xu-2018
Contributor Author

Barry-Xu-2018 commented Sep 17, 2020

After investigating, I found the cause.

When rmw_publisher_allocate() fails:

rmw_publisher = rmw_publisher_allocate();
if (!rmw_publisher) {
  RMW_SET_ERROR_MSG("failed to allocate publisher");
  return nullptr;
}

the cleanup function is called:

auto cleanup_info = rcpputils::make_scope_exit(
  [info, participant]() {
    if (info->type_support_) {
      _unregister_type(participant, info->type_support_);
    }
    delete info->listener_;
    delete info;
  });

The cleanup above does not remove the publisher that has already been created.
As a result, the call to _unregister_type() fails because the publisher still uses the type. The publisher is created just before the failing allocation:

info->publisher_ = Domain::createPublisher(
  participant,
  publisherParam,
  info->listener_);
if (!info->publisher_) {
  RMW_SET_ERROR_MSG("create_publisher() could not create publisher");
  return nullptr;
}
info->publisher_gid = rmw_fastrtps_shared_cpp::create_rmw_gid(
  eprosima_fastrtps_identifier, info->publisher_->getGuid());
rmw_publisher = rmw_publisher_allocate();
if (!rmw_publisher) {
  RMW_SET_ERROR_MSG("failed to allocate publisher");
  return nullptr;
}
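
To illustrate the failure path, here is a minimal, self-contained sketch. It does not use the real Fast-RTPS or rcpputils APIs: ToyParticipant, ScopeExit and try_create are hypothetical stand-ins, and the only behaviour assumed is the one described above (a type cannot be unregistered while a publisher still uses it).

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a participant: registered type names plus a
// count of publishers that still use them.
struct ToyParticipant {
  std::set<std::string> types;
  int publishers = 0;

  // Assumed behaviour: unregistering fails while a publisher is in use.
  bool unregister_type(const std::string & name) {
    if (publishers > 0) {return false;}
    return types.erase(name) > 0;
  }
};

// Minimal scope guard, standing in for rcpputils::make_scope_exit.
class ScopeExit {
public:
  explicit ScopeExit(std::function<void()> fn) : fn_(std::move(fn)) {}
  ~ScopeExit() {if (fn_) {fn_();}}
  void cancel() {fn_ = nullptr;}
  ScopeExit(const ScopeExit &) = delete;
  ScopeExit & operator=(const ScopeExit &) = delete;

private:
  std::function<void()> fn_;
};

// Simulates create_publisher(): register the type, create the publisher,
// then run a final allocation step that may fail.
bool try_create(ToyParticipant & p, bool remove_publisher_in_cleanup, bool allocation_ok) {
  p.types.insert("Msg");
  ScopeExit cleanup([&p, remove_publisher_in_cleanup]() {
    if (remove_publisher_in_cleanup && p.publishers > 0) {
      --p.publishers;  // stands in for Domain::removePublisher
    }
    p.unregister_type("Msg");  // stands in for _unregister_type
  });
  ++p.publishers;  // stands in for Domain::createPublisher succeeding
  if (!allocation_ok) {
    return false;  // early return: the cleanup guard runs here
  }
  cleanup.cancel();  // success: keep the publisher and the type
  return true;
}
```

With remove_publisher_in_cleanup set to false, a failed allocation leaves both the publisher and the type behind in this toy model. With it set to true, the participant ends up empty.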

Barry-Xu-2018 added a commit to Barry-Xu-2018/rmw_fastrtps that referenced this issue Sep 17, 2020
@Barry-Xu-2018
Contributor Author

In addition, I found an issue in the current subscription implementation.

In subscription.cpp, the cleanup action does include Domain::removeSubscriber:

auto cleanup_info = rcpputils::make_scope_exit(
  [info, participant]() {
    if (info->type_support_) {
      _unregister_type(participant, info->type_support_);
    }
    if (info->subscriber_) {
      if (!Domain::removeSubscriber(info->subscriber_)) {
        RMW_SAFE_FWRITE_TO_STDERR(
          "Failed to remove subscriber after '"
          RCUTILS_STRINGIFY(__function__) "' failed.\n");
      }
    }

But the order of these calls looks wrong. The call chain is:
_unregister_type() -> eprosima::fastrtps::Domain::unregisterType() -> ParticipantImpl::unregisterType()

https://github.com/eProsima/Fast-DDS/blob/8093300bebe636ef88e96ed77651253c2943c51d/src/cpp/fastrtps_deprecated/participant/ParticipantImpl.cpp#L420-L458

If the subscriber still exists, the type will not be unregistered. So Domain::removeSubscriber() should be called before _unregister_type().
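
The ordering problem can be shown with a small toy model. This is only a sketch: ToyTypeRegistry and both cleanup functions are hypothetical, and the one assumption taken from the linked ParticipantImpl::unregisterType() code is that a type is not unregistered while an endpoint still uses it.

```cpp
#include <map>
#include <string>

// Hypothetical model: each registered type name maps to the number of
// subscribers that still use it.
struct ToyTypeRegistry {
  std::map<std::string, int> users;

  void register_type(const std::string & t) {users.emplace(t, 0);}
  void add_subscriber(const std::string & t) {++users.at(t);}
  void remove_subscriber(const std::string & t) {--users.at(t);}

  // Assumed behaviour: refuse to unregister while the type is in use.
  bool unregister_type(const std::string & t) {
    auto it = users.find(t);
    if (it == users.end() || it->second > 0) {return false;}
    users.erase(it);
    return true;
  }
};

// Order used in the cleanup quoted above: unregister first, then remove.
bool cleanup_unregister_first(ToyTypeRegistry & r, const std::string & t) {
  bool unregistered = r.unregister_type(t);
  r.remove_subscriber(t);
  return unregistered;
}

// Proposed order: remove the subscriber first, then unregister the type.
bool cleanup_remove_first(ToyTypeRegistry & r, const std::string & t) {
  r.remove_subscriber(t);
  return r.unregister_type(t);
}
```

In this model, cleanup_unregister_first() leaves the type registered even though the subscriber is removed afterwards, while cleanup_remove_first() releases both.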

@fujitatomoya
Collaborator

@Barry-Xu-2018

If the subscriber still exists, the type will not be unregistered. So Domain::removeSubscriber() should be called before _unregister_type().

I think so too. If any publication or subscription still uses the type_support, that type_support cannot be unregistered or deleted.

@Barry-Xu-2018
Contributor Author

@fujitatomoya

I think so too. If any publication or subscription still uses the type_support, that type_support cannot be unregistered or deleted.

Thanks for the confirmation.
I will submit a new patch to fix this issue.

@Barry-Xu-2018
Contributor Author

@fujitatomoya

I have added a fix in #437.
