Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docdb] QLTabletTest.OperationMemTracking is flaky #3254

Closed
bmatican opened this issue Jan 6, 2020 · 1 comment
Closed

[docdb] QLTabletTest.OperationMemTracking is flaky #3254

bmatican opened this issue Jan 6, 2020 · 1 comment
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@bmatican
Copy link
Contributor

bmatican commented Jan 6, 2020

Jira Link: DB-1608
Started off trying to fix a TSAN race:

Location is global 'fLI::FLAGS_tablet_inject_latency_on_apply_write_txn_ms' of size 4 at 0x7fed75541e40 (libtablet.so+0x0000002aee40)

After that though, test is still flaky. Down to 13/500 failures with

ybd tsan --cxx-test client_ql-tablet-test --gtest_filter QLTabletTest.OperationMemTracking -n 500 --tp 4

Current set of errors seen:

7 times

../../src/yb/client/ql-tablet-test.cc:1020: Failure
Value of: tracked_by_log_cache
  Actual: false
Expected: true

4 times

../../src/yb/client/ql-tablet-test.cc:1009: Failure
Failed
Bad status: IO error (yb/client/batcher.cc:231): Errors occured while reaching out to the tablet servers

1 time

../../src/yb/client/ql-tablet-test.cc:1003: Failure
Expected: (operation_tracker_consumption + log_cache_consumption) <= (kValueSize * 2), actual: 132508 vs 131072

1 time

WARNING: ThreadSanitizer: data race (pid=16953)
  Atomic read of size 1 at 0x7f6d0e93d8a0 by thread T26 (mutexes: write M334809244937517040):
    #0 pthread_rwlock_wrlock /n/jenkins/thirdparty/yugabyte-thirdparty-2019-12-07T23_26_21/thirdparty/build/common/llvm-7.0.1.src/../../../src/llvm-7.0.1.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1325 (ql-tablet-test+0x4789b3)
    #1 google::LogMessage::Flush() <null> (libglog.so.0+0x10e09)
    #2 google::LogMessage::~LogMessage() <null> (libglog.so.0+0x10c0e)
    #3 yb::KernelStackWatchdog::RunThread() /n/users/bogdan/code/yugabyte-db/build/tsan-clang-dynamic-ninja/../../src/yb/util/kernel_stack_watchdog.cc:139:13 (libyb_util.so+0x2bfad6)
    #4 decltype(*(std::__1::forward<yb::KernelStackWatchdog*&>(fp0)).*fp()) std::__1::__invoke<void (yb::KernelStackWatchdog::*&)(), yb::KernelStackWatchdog*&, void>(void (yb::KernelStackWatchdog::*&)(), yb::KernelStackWatchdog*&) /n/jenkins/thirdparty/yugabyte-thirdparty-2019-12-07T23_26_21/thirdparty/installed/tsan/libcxx/include/c++/v1/type_traits:4286:1 (libyb_util.so+0x2c0f53)
    #5 std::__1::__bind_return<void (yb::KernelStackWatchdog::*)(), std::__1::tuple<yb::KernelStackWatchdog*>, std::__1::tuple<>, __is_valid_bind_return<void (yb::KernelStackWatchdog::*)(), std::__1::tuple<yb::KernelStackWatchdog*>, std::__1::tuple<> >::value>::type std::__1::__apply_functor<void (yb::KernelStackWatchdog::*)(), std::__1::tuple<yb::KernelStackWatchdog*>, 0ul, std::__1::tuple<> >(void (yb::KernelStackWatchdog::*&)(), std::__1::tuple<yb::KernelStackWatchdog*>&, std::__1::__tuple_indices<0ul>, std::__1::tuple<>&&) /n/jenkins/thirdparty/yugabyte-thirdparty-2019-12-07T23_26_21/thirdparty/installed/tsan/libcxx/include/c++/v1/functional:2216 (libyb_util.so+0x2c0f53)
    #6 std::__1::__bind_return<void (yb::KernelStackWatchdog::*)(), std::__1::tuple<yb::KernelStackWatchdog*>, std::__1::tuple<>, __is_valid_bind_return<void (yb::KernelStackWatchdog::*)(), std::__1::tuple<yb::KernelStackWatchdog*>, std::__1::tuple<> >::value>::type std::__1::__bind<void (yb::KernelStackWatchdog::*)(), yb::KernelStackWatchdog*>::operator()<>() /n/jenkins/thirdparty/yugabyte-thirdparty-2019-12-07T23_26_21/thirdparty/installed/tsan/libcxx/include/c++/v1/functional:2249 (libyb_util.so+0x2c0f53)
    #7 decltype(std::__1::forward<std::__1::__bind<void (yb::KernelStackWatchdog::*)(), yb::KernelStackWatchdog*>&>(fp)()) std::__1::__invoke<std::__1::__bind<void (yb::KernelStackWatchdog::*)(), yb::KernelStackWatchdog*>&>(std::__1::__bind<void (yb::KernelStackWatchdog::*)(), yb::KernelStackWatchdog*>&) /n/jenkins/thirdparty/yugabyte-thirdparty-2019-12-07T23_26_21/thirdparty/installed/tsan/libcxx/include/c++/v1/type_traits:4345 (libyb_util.so+0x2c0f53)
    #8 void std::__1::__invoke_void_return_wrapper<void>::__call<std::__1::__bind<void (yb::KernelStackWatchdog::*)(), yb::KernelStackWatchdog*>&>(std::__1::__bind<void (yb::KernelStackWatchdog::*)(), yb::KernelStackWatchdog*>&) /n/jenkins/thirdparty/yugabyte-thirdparty-2019-12-07T23_26_21/thirdparty/installed/tsan/libcxx/include/c++/v1/__functional_base:349 (libyb_util.so+0x2c0f53)
    #9 std::__1::__function::__func<std::__1::__bind<void (yb::KernelStackWatchdog::*)(), yb::KernelStackWatchdog*>, std::__1::allocator<std::__1::__bind<void (yb::KernelStackWatchdog::*)(), yb::KernelStackWatchdog*> >, void ()>::operator()() /n/jenkins/thirdparty/yugabyte-thirdparty-2019-12-07T23_26_21/thirdparty/installed/tsan/libcxx/include/c++/v1/functional:1562:12 (libyb_util.so+0x2c0e5d)
    #10 std::__1::function<void ()>::operator()() const /n/jenkins/thirdparty/yugabyte-thirdparty-2019-12-07T23_26_21/thirdparty/installed/tsan/libcxx/include/c++/v1/functional:1913:12 (libyb-redis.so+0xe7cf4)
    #11 yb::Thread::SuperviseThread(void*) /n/users/bogdan/code/yugabyte-db/build/tsan-clang-dynamic-ninja/../../src/yb/util/thread.cc:739:3 (libyb_util.so+0x37a859)

  Previous write of size 1 at 0x7f6d0e93d8a0 by main thread:
    #0 pthread_rwlock_destroy /n/jenkins/thirdparty/yugabyte-thirdparty-2019-12-07T23_26_21/thirdparty/build/common/llvm-7.0.1.src/../../../src/llvm-7.0.1.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1286 (ql-tablet-test+0x478394)
    #1 glog_internal_namespace_::Mutex::~Mutex() <null> (libglog.so.0+0x18b48)
    #2 cxa_at_exit_wrapper(void*) /n/jenkins/thirdparty/yugabyte-thirdparty-2019-12-07T23_26_21/thirdparty/build/common/llvm-7.0.1.src/../../../src/llvm-7.0.1.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:386 (ql-tablet-test+0x472253)

  Location is global 'google::log_mutex' of size 64 at 0x7f6d0e93d8a0 (libglog.so.0+0x00000022e8a0)

  Mutex M334809244937517040 is already destroyed.
@bmatican bmatican added kind/failing-test Tests and testing infra area/docdb YugabyteDB core features labels Jan 6, 2020
@bmatican bmatican self-assigned this Jan 6, 2020
bmatican added a commit that referenced this issue Jan 13, 2020
Summary: Test was changing the value while background threads were potentially accessing it.

Test Plan: `ybd tsan --cxx-test client_ql-tablet-test --gtest_filter QLTabletTest.OperationMemTracking -n 500 --tp 4`

Reviewers: sergei, mikhail

Reviewed By: mikhail

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7760
@bmatican
Copy link
Contributor Author

Found another race due to an unprotected, not atomic write from the test

  Previous write of size 4 at 0x7f23d64e22d0 by main thread:
    #0 yb::client::QLTabletTest_OperationMemTracking_Test::TestBody() /nfusr/centos-gcp-cloud/jenkins-worker-f66/jenkins/jenkins-github-yugabyte-db-centos-master-clang-tsan-1255/build/tsan-clang-dynamic-ninja/../../src/yb/client/ql-tablet-test.cc:968:58 (ql-tablet-test+0x4f6723)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

2 participants