Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client_config::reliable_topic_config_map_ should not be modified concurrently [API-384] #849

Merged
merged 14 commits into from
Apr 28, 2021

Conversation

ihsandemir
Copy link
Collaborator

@ihsandemir ihsandemir commented Apr 16, 2021

client_config::reliable_topic_config_map_ can be modified during client execution and hence can be modified through different user threads. Hence, we have to use a concurrent map. Normally, we should use the client_config as an immutable config of parameters, hence it should not be modified once the client started. Hence, the solution is to use a different API in the internal implementation config::reliable_topic_config *find_reliable_topic_config(const std::string &name) which does not modify the map but uses the default batch size.

fixes #848

@ihsandemir ihsandemir added this to the 4.1 milestone Apr 16, 2021
@ihsandemir ihsandemir requested review from yuce, sancar and yemreinci April 16, 2021 05:22
@ihsandemir ihsandemir self-assigned this Apr 16, 2021
@devOpsHazelcast
Copy link
Contributor

Windows test PASSed.

@devOpsHazelcast
Copy link
Contributor

Linux test PASSed.

@sancar
Copy link
Contributor

sancar commented Apr 16, 2021

This issue does not seem to be specific to reliable_topic.
For example, flakeIdGenerator also uses the map from config on the proxy and modifies it on some cases.

What about introducing private methods to config which does not modify internal maps and return optional, which will be used by proxies.
Not sure if this causes a cyclic dependency because related proxies should be a friend class of the config.

@ihsandemir
Copy link
Collaborator Author

private

Good catch. Yes, i see that the same problems for flake_id_generator_config and near_cache_config. The private method returning a raw pointer or nullptr may also work, let me try doing that way and see hot it works. I don't think it is cyclic dependency, we just forward declare a class to be a friend and hence the config will be used as the same from the other classes.

… map can be modified during client execution and hence can be modified through different user threads. Hence, the solution is to use a different API in the internal implementation `config::reliable_topic_config *find_reliable_topic_config(const std::string &name)` which does not modify the map but uses the default batch size.

fixes hazelcast#848
@ihsandemir ihsandemir changed the title client_config::reliable_topic_config_map_ should be concurrent map client_config::reliable_topic_config_map_ should not be modified concurrently Apr 16, 2021
@devOpsHazelcast
Copy link
Contributor

Windows test PASSed.

@devOpsHazelcast
Copy link
Contributor

Linux test PASSed.

@devOpsHazelcast
Copy link
Contributor

Windows test FAILed.

@devOpsHazelcast
Copy link
Contributor

Linux test FAILed.

@devOpsHazelcast
Copy link
Contributor

Linux test PASSed.

@devOpsHazelcast
Copy link
Contributor

Windows test PASSed.

yemreinci
yemreinci previously approved these changes Apr 19, 2021
Copy link
Contributor

@yemreinci yemreinci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK, just 2 suggestions.

hazelcast/src/hazelcast/client/config.cpp Outdated Show resolved Hide resolved
hazelcast/src/hazelcast/client/config.cpp Outdated Show resolved Hide resolved
sancar
sancar previously approved these changes Apr 19, 2021
…g to do so causes the error

```
terminate called after throwing an instance of 'std::system_error'
  what():  Resource deadlock avoided
```
as described at this [SO question](https://stackoverflow.com/questions/56007421/joining-a-thread-resource-deadlock-avoided)

The user provided a [stack trace](https://hazelcastcommunity.slack.com/archives/C01NC9N8CAE/p1618833418016000?thread_ts=1618329061.009600&cid=C01NC9N8CAE) where it happens:
```
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f7fdd67842a in __GI_abort () at abort.c:89
#2  0x00007f7fddf8f0ad in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
hazelcast#3  0x00007f7fddf8d066 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
hazelcast#4  0x00007f7fddf8c089 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
hazelcast#5  0x00007f7fddf8c9dd in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
hazelcast#6  0x00007f7fdd9f2f33 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
hazelcast#7  0x00007f7fdd9f3437 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
hazelcast#8  0x00007f7fdf528395 in hazelcast::client::topic::impl::reliable::ReliableTopicExecutor::stop (this=0x7f7fd8012388)
    at /tmp/hazelcast-cpp-client/hazelcast/src/hazelcast/client/proxy.cpp:1726
hazelcast#9  0x00007f7fdf52820c in hazelcast::client::topic::impl::reliable::ReliableTopicExecutor::~ReliableTopicExecutor (this=0x7f7fd8012388, __in_chrg=<optimized out>)
    at /tmp/hazelcast-cpp-client/hazelcast/src/hazelcast/client/proxy.cpp:1715
hazelcast#10 0x00005559cb56768d in hazelcast::client::reliable_topic::MessageRunner<hazelcast::client::topic::reliable_listener>::~MessageRunner (this=0x7f7fd80122c0,
    __in_chrg=<optimized out>) at /usr/local/hazelcast/include/hazelcast/client/reliable_topic.h:134
hazelcast#11 0x00005559cb5676e6 in hazelcast::client::reliable_topic::MessageRunner<hazelcast::client::topic::reliable_listener>::~MessageRunner (this=0x7f7fd80122c0,
    __in_chrg=<optimized out>) at /usr/local/hazelcast/include/hazelcast/client/reliable_topic.h:134
---Type <return> to continue, or q <return> to quit---
hazelcast#12 0x00005559cb58191a in std::_Sp_counted_ptr<hazelcast::client::reliable_topic::MessageRunner<hazelcast::client::topic::reliable_listener>*, (__gnu_cxx::_Lock_policy)2>::_M_dispose
    (this=0x7f7fd8011d80) at /usr/include/c++/6/bits/shared_ptr_base.h:372
hazelcast#13 0x00005559cb54e588 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f7fd8011d80) at /usr/include/c++/6/bits/shared_ptr_base.h:150
hazelcast#14 0x00005559cb548095 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f7fc9fe2d10, __in_chrg=<optimized out>)
    at /usr/include/c++/6/bits/shared_ptr_base.h:662
hazelcast#15 0x00005559cb5569c0 in std::__shared_ptr<hazelcast::client::execution_callback<hazelcast::client::rb::read_result_set>, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (
    this=0x7f7fc9fe2d08, __in_chrg=<optimized out>) at /usr/include/c++/6/bits/shared_ptr_base.h:928
hazelcast#16 0x00005559cb5569f8 in std::shared_ptr<hazelcast::client::execution_callback<hazelcast::client::rb::read_result_set> >::~shared_ptr (this=0x7f7fc9fe2d08,
    __in_chrg=<optimized out>) at /usr/include/c++/6/bits/shared_ptr.h:93
hazelcast#17 0x00005559cb556a38 in hazelcast::client::topic::impl::reliable::ReliableTopicExecutor::Message::~Message (this=0x7f7fc9fe2cf0, __in_chrg=<optimized out>)
    at /usr/local/hazelcast/include/hazelcast/client/topic/impl/reliable/ReliableTopicExecutor.h:45
hazelcast#18 0x00007f7fdf528699 in hazelcast::client::topic::impl::reliable::ReliableTopicExecutor::Task::run (this=0x7f7fc9fe2e20)
    at /tmp/hazelcast-cpp-client/hazelcast/src/hazelcast/client/proxy.cpp:1741
hazelcast#19 0x00007f7fdf52808a in hazelcast::client::topic::impl::reliable::ReliableTopicExecutor::<lambda()>::operator()(void) const (__closure=0x7f7fd8012288)
    at /tmp/hazelcast-cpp-client/hazelcast/src/hazelcast/client/proxy.cpp:1711
hazelcast#20 0x00007f7fdf53d11a in std::_Bind_simple<hazelcast::client::topic::impl::reliable::ReliableTopicExecutor::ReliableTopicExecutor(std::shared_ptr<hazelcast::client::ringbuffer>, haze---Type <return> to continue, or q <return> to quit---
lcast::logger&)::<lambda()>()>::_M_invoke<>(std::_Index_tuple<>) (this=0x7f7fd8012288) at /usr/include/c++/6/functional:1391
hazelcast#21 0x00007f7fdf53a55d in std::_Bind_simple<hazelcast::client::topic::impl::reliable::ReliableTopicExecutor::ReliableTopicExecutor(std::shared_ptr<hazelcast::client::ringbuffer>, hazelcast::logger&)::<lambda()>()>::operator()(void) (this=0x7f7fd8012288) at /usr/include/c++/6/functional:1380
hazelcast#22 0x00007f7fdf5394ac in std::thread::_State_impl<std::_Bind_simple<hazelcast::client::topic::impl::reliable::ReliableTopicExecutor::ReliableTopicExecutor(std::shared_ptr<hazelcast::client::ringbuffer>, hazelcast::logger&)::<lambda()>()> >::_M_run(void) (this=0x7f7fd8012280) at /usr/include/c++/6/thread:197
hazelcast#23 0x00007f7fddfb7e6f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
hazelcast#24 0x00007f7fde4904a4 in start_thread (arg=0x7f7fc9fe3700) at pthread_create.c:456
hazelcast#25 0x00007f7fdd72cd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
```

From this stack trace we can see that the join is called from within the same thread which caused the problem.
@ihsandemir ihsandemir dismissed stale reviews from sancar and yemreinci via 143b7de April 21, 2021 03:24
@devOpsHazelcast
Copy link
Contributor

Linux test FAILed.

@ihsandemir
Copy link
Collaborator Author

verify

@devOpsHazelcast
Copy link
Contributor

Windows test FAILed.

@devOpsHazelcast
Copy link
Contributor

Linux test FAILed.

@devOpsHazelcast
Copy link
Contributor

Linux test FAILed.

@devOpsHazelcast
Copy link
Contributor

Windows test FAILed.

Co-authored-by: yemreinci <18687880+yemreinci@users.noreply.github.com>
@devOpsHazelcast
Copy link
Contributor

Windows test FAILed.

@devOpsHazelcast
Copy link
Contributor

Linux test FAILed.

@ihsandemir
Copy link
Collaborator Author

Solution is verified to be working by the community user who originally reported the issue #848 .

@ihsandemir ihsandemir requested a review from yemreinci April 27, 2021 04:30
@@ -37,7 +37,7 @@ namespace hazelcast {
BlockingConcurrentQueue(size_t max_queue_capacity) : capacity_(max_queue_capacity), is_interrupted_(false) {
Copy link
Contributor

@sancar sancar Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this class is not used anymore. Only in the tests. Should we remove it ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it and related tests.

sancar
sancar previously approved these changes Apr 27, 2021
@devOpsHazelcast
Copy link
Contributor

Windows test PASSed.

@devOpsHazelcast
Copy link
Contributor

Linux test PASSed.

yemreinci
yemreinci previously approved these changes Apr 27, 2021
@devOpsHazelcast
Copy link
Contributor

Windows test PASSed.

@devOpsHazelcast
Copy link
Contributor

Linux test PASSed.

@ihsandemir ihsandemir dismissed stale reviews from yemreinci and sancar via 70e8c36 April 28, 2021 02:57
@ihsandemir ihsandemir requested review from sancar and yemreinci April 28, 2021 02:58
@devOpsHazelcast
Copy link
Contributor

Windows test PASSed.

@devOpsHazelcast
Copy link
Contributor

Linux test PASSed.

@ihsandemir ihsandemir merged commit 6d4a8ff into hazelcast:master Apr 28, 2021
@ihsandemir ihsandemir deleted the rel_topic_fix branch April 28, 2021 07:37
@degerhz degerhz changed the title client_config::reliable_topic_config_map_ should not be modified concurrently client_config::reliable_topic_config_map_ should not be modified concurrently [API-384] Apr 28, 2021
@devOpsHazelcast
Copy link
Contributor

Linux test PASSed.

@devOpsHazelcast
Copy link
Contributor

Windows test PASSed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ReliableTopic listener stops with an unexpected exception
4 participants