Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_roles/role_clients_tests: use SEASTAR_THREAD_TEST_CASE #18052

Merged
merged 1 commit into from
Apr 25, 2024

Conversation

andijcr
Copy link
Contributor

@andijcr andijcr commented Apr 24, 2024

use of setenv under seastar requires to run under a initialized seastar context, since internal memory management functions will be called

https://buildkite.com/redpanda/redpanda/builds/48184#018f0cd6-3087-493c-bed2-d02e69fc3496

TRACE 2024-04-23 23:14:55,759 [shard 0:main] http - / - client.cc:314 - chunk received, chunk length 505
DEBUG 2024-04-23 23:14:55,759 [shard 0:main] http - iobuf_body.cc:82 - reader - finish called
test_cloud_roles_rpunit: /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1155: void seastar::memory::cpu_pages::shrink(void *, size_t): Assertion `object_cpu_id(ptr) == cpu_id' failed.
Aborting.
Backtrace:
  0xac39f3
  0xb1850b

backported here: #18042

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

use of setenv under seastar requires to run under a initialized seastar
context, since internal memory management functions will be called
@dotnwat
Copy link
Member

dotnwat commented Apr 24, 2024

i updated the cover letter

@travisdowns
Copy link
Member

travisdowns commented Apr 24, 2024

Can you give any more context around "since internal memory management functions will be called"?

Seastar does expect that allocations can happen before the seastar heap is initialized, and tries to use the original heap in that case (then there's a second stage where it initializes a mini heap, then finally the full heap). Is it specific to realloc, e.g., calling realloc inside a seastar context fails if the original allocation was outside?

(this comment is non-blocking, I'm mostly just curious)

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Apr 24, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/48219#018f10fa-4b32-4bb5-ac61-d78d07c2604e:

"rptest.tests.rbac_upgrade_test.UpgradeMigrationCreatingDefaultRole.test_rbac_migration"

new failures in https://buildkite.com/redpanda/redpanda/builds/48219#018f1100-d77f-415a-925a-7365f495bf16:

"rptest.tests.rbac_upgrade_test.UpgradeMigrationCreatingDefaultRole.test_rbac_migration"

@andijcr
Copy link
Contributor Author

andijcr commented Apr 24, 2024

Can you give any more context around "since internal memory management functions will be called"?

Seastar does expect that allocations can happen before the seastar heap is initialized, and tries to use the original heap in that case (then there's a second stage where it initializes a mini heap, then finally the full heap). Is it specific to realloc, e.g., calling realloc inside a seastar context fails if the original allocation was outside?

(this comment is non-blocking, I'm mostly just curious)

didn't investigate why it's triggering this assertion, but this is the relevant stacktrace when executing as BOOST TEST

[Backtrace #8]
seastar::memory::cpu_pages::shrink(void*, unsigned long) at /home/andrea/redpanda/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1155
 (inlined by) seastar::memory::shrink(void*, unsigned long) at /home/andrea/redpanda/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1732
 (inlined by) realloc at /home/andrea/redpanda/redpanda/vbuild/release/clang/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:2223
[Backtrace #9]
__add_to_environ at ./stdlib/setenv.c:156
[Backtrace #10]
ask_authority_host_read_test::test_method() at /home/andrea/redpanda/redpanda/src/v/cloud_roles/tests/role_client_tests.cc:165
[Backtrace #11]
ask_authority_host_read_test_invoker() at /home/andrea/redpanda/redpanda/src/v/cloud_roles/tests/role_client_tests.cc:154

the surprising thing is that yesterday locally it didn't fail, but it started today once i updated vtools.

@dotnwat
Copy link
Member

dotnwat commented Apr 24, 2024

Seastar does expect that allocations can happen before the seastar heap is initialized

Good point I guess we should make sure we aren't masking some other / new bug. This is the closet I could find in Seastar log, but its old enough that we'd have this patch in our fork. It's interesting because it also mentions setenv/shrink.

commit 2199ade2fa938f9d81f7eb7d52441c7023cabaa3
Author: Kefu Chai <tchaikov@gmail.com>
Date:   Mon Oct 4 23:52:40 2021 +0800

    memory: always allocate buf using "malloc" for non reactor

    before this change, there is chance that Seastar reallocate a memory
    buffer allocated by the c runtime even before "original_malloc_func"
    is initialized. the runtime might want to allocate memory for some TLS
    variables or just for itself, like env variables.

    but we might want to call functions like setenv() from a reactor thread.
    and setenv() could reallocates the memory buffer previously allocated by
    the "main" thread. in this case, cpu_pages::shrink() panics if it is
    asked to resize a memory chunk allocated by another reactor thread.

    in this change,  __libc_malloc and friends are not overridden anymore.
    instead of using dlsym(RTLD_NEXT) for original_malloc_func, use
    __libc_malloc.

    the same applies to other malloc related functions.

    because musl libc does not define __malloc_trim() yet, see
    https://git.musl-libc.org/cgit/musl/tree/src/include/stdlib.h .
    the malloc_trim() implementation calls __malloc_trim() only if __GLIBC__
    is defined.

    the solution was proposed by Avi Kivity <avi@scylladb.com>

@dotnwat
Copy link
Member

dotnwat commented Apr 24, 2024

Could this be a case where realloc has no knowledge about original_malloc_func and is invoking the seastar malloc in some case?

@andijcr
Copy link
Contributor Author

andijcr commented Apr 24, 2024

should we also block the backport? #18042

@andijcr
Copy link
Contributor Author

andijcr commented Apr 24, 2024

ducktape failure is #18033

@travisdowns
Copy link
Member

@dotnwat yeah, that's the one, and this change was reverted right after, so the current state is actually the "before" state that Kefu describes in that change. However, I think the bug (in seastar) here is just that seastar does not support allocating a pointer on one shard than reallocating it smaller (but not zero) on another shard. This should work, just like cross-shard frees, but based on an inspection of the code it doesn't.

@travisdowns
Copy link
Member

Indeed, this test cases crashes if run with 2+ shards:

SEASTAR_TEST_CASE(test_cross_thread_realloc) {
    // needs at least 2 shards to test what we want to test but will
    // still pass fine on 1 shard

    void *p = malloc(100);

    auto other_shard = (this_shard_id() + 1) % smp::count;

    return smp::submit_to(other_shard, [p]{
        BOOST_REQUIRE(realloc(p, 50));
    });
}

@travisdowns
Copy link
Member

scylladb/seastar#2202

@andijcr
Copy link
Contributor Author

andijcr commented Apr 25, 2024

/ci-repeat 1 skip-redpanda-build

@andijcr andijcr merged commit 9c06089 into redpanda-data:dev Apr 25, 2024
20 checks passed
@andijcr andijcr deleted the fix/unit_test/setenv branch April 25, 2024 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants