Problems with function wrapping when multiple addresses for the same name exist #7004

Open
ShangzhiXu opened this issue Sep 24, 2024 · 4 comments

Comments

@ShangzhiXu

Describe the bug
Hi there! First, thanks for your great work. I recently tried to use DynamoRIO to wrap functions like this:

static void module_load_event(void *drcontext, const module_data_t *mod, bool loaded) {
    dr_printf("Module loaded: %s\n", mod->full_path);
    dr_printf("total functions: %d\n", num_functions);
    for (int i = 0; i < num_functions; i++) {
        app_pc towrap = (app_pc)dr_get_proc_address(mod->handle, function_names[i]);
        if (towrap != NULL) {
            dr_printf("Wrapping function: %s at address %p\n", function_names[i], towrap);
            drwrap_wrap_ex(towrap, generic_wrap_pre, generic_wrap_post, (void *)function_names[i], 0);
        } else {
            dr_printf("Failed to locate function: %s in module %s\n", function_names[i], mod->full_path);
        }
    }
}

The output is:

Module loaded: /usr/lib/x86_64-linux-gnu/libc.so.6
total functions: 3
Wrapping function: memcpy at address 0x00007f0dfca0ecb0
Wrapping function: printf at address 0x00007f0dfc9be5b0
Wrapping function: strcpy at address 0x00007f0dfcac1b70

The output shows that DynamoRIO can wrap 'memcpy', 'strcpy', and 'printf'. The problem appears when I use this pre-call hook:

static void generic_wrap_pre(void *wrapcxt, void **user_data) {
    const char *func_name = (const char *)*user_data;
    dr_printf("Function %s is called\n", func_name);  // Added logging
    call_stack.push(func_name);
}

When 'memcpy', 'strcpy', and 'printf' are called in the target binary, only printf is traced; the other two are not.
To Reproduce
Steps to reproduce the behavior:
Like above

My command is:
../DynamoRIO-Linux-10.0.19811/bin64/drrun -c client_printGraph.so -- test/buffer_overflow

Versions

  • What version of DynamoRIO are you using? master version
  • Does the latest build from https://github.com/DynamoRIO/dynamorio/releases solve the problem? no
  • What operating system version are you running on? ("Windows 10" is not sufficient: give the release number.) Linux 6.1.0-23-amd64
  • Is your application 32-bit or 64-bit? 64bit
@derekbruening
Contributor

Please clarify "can't be traced". Do you observe the application's code reaching the memcpy libc entry address? Are you sure it's not just that every use of memcpy in the application's own code is inlined, so control never reaches libc memcpy? Debug build DR logs can be used to see all addresses encountered: https://dynamorio.org/page_logging.html

@ShangzhiXu
Author

ShangzhiXu commented Sep 26, 2024

Thank you so much for your response! I believe I've identified the issue.
To clarify, "can't be traced" means that memcpy was called in my target program but DynamoRIO did not record the call.

The cause is this:
My glibc actually contains two memcpy symbols with different versions:

# readelf -s /usr/lib/x86_64-linux-gnu/libc.so.6 | grep memcpy
   497: 00000000000b1270     9 FUNC    WEAK   DEFAULT   16 wmemcpy@@GLIBC_2.2.5
  2724: 00000000000a2cb0    40 FUNC    GLOBAL DEFAULT   16 memcpy@GLIBC_2.2.5
  2726: 000000000009bdb0   265 IFUNC   GLOBAL DEFAULT   16 memcpy@@GLIBC_2.14

My target program links against memcpy@@GLIBC_2.14 by default. Its PLT entry looks like this:

0000000000001040 <memcpy@plt>:
    1040:       ff 25 c2 2f 00 00       jmp    *0x2fc2(%rip)        # 4008 <memcpy@GLIBC_2.14>
    1046:       68 01 00 00 00          push   $0x1
    104b:       e9 d0 ff ff ff          jmp    1020 <_init+0x20>

But if I use drwrap like this:

        app_pc towrap = (app_pc)dr_get_proc_address(mod->handle, function_names[i]);
        if (towrap != NULL) {
            drwrap_wrap_ex(towrap, generic_wrap_pre, generic_wrap_post, (void *)function_names[i], 0);
        }

it wraps memcpy@GLIBC_2.2.5, so calls that go to memcpy@@GLIBC_2.14 are never seen.
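
The mismatch between the two symbol versions can also be checked outside DynamoRIO. Here is a minimal sketch, assuming an x86-64 Linux system with glibc, where dlvsym() is available and the version names match the readelf output above. The helper names lookup_versioned and print_memcpy_versions are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns the address bound to name@version in the
 * libraries already loaded into this process, or NULL if there is none. */
void *lookup_versioned(const char *name, const char *version) {
    assert(name != NULL && version != NULL);
    return dlvsym(RTLD_DEFAULT, name, version);
}

/* Prints the address of each memcpy version from the readelf listing.
 * The version strings are specific to x86-64 glibc (an assumption on
 * any other platform). Returns how many versions were found. */
int print_memcpy_versions(void) {
    static const char *const versions[] = { "GLIBC_2.2.5", "GLIBC_2.14" };
    int found = 0;
    for (size_t i = 0; i < sizeof(versions) / sizeof(versions[0]); i++) {
        void *addr = lookup_versioned("memcpy", versions[i]);
        if (addr != NULL) {
            printf("memcpy@%s -> %p\n", versions[i], addr);
            found++;
        } else {
            printf("memcpy@%s not found\n", versions[i]);
        }
    }
    return found;
}
```

Calling print_memcpy_versions() from a small test program should show whether the two versions resolve to different addresses on a given system. Because memcpy@@GLIBC_2.14 is an IFUNC, the address reported for it is the one its resolver selects at run time, so it need not equal the load base plus 0x9bdb0. Older glibc releases may need -ldl at link time.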

To resolve this, I created a custom shared library (override_memcpy.c) to force memcpy@GLIBC_2.2.5 using LD_PRELOAD. After doing this, DynamoRIO successfully reported the memcpy calls.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Interposed memcpy that forwards every call to memcpy@GLIBC_2.2.5. */
void *memcpy(void *dest, const void *src, size_t n) {
    static void *(*original_memcpy)(void *, const void *, size_t) = NULL;

    if (!original_memcpy) {
        // Use `dlvsym` to look up `memcpy` with the specific version `GLIBC_2.2.5`
        original_memcpy = dlvsym(RTLD_NEXT, "memcpy", "GLIBC_2.2.5");
        if (!original_memcpy) {
            // Nothing can be copied without the target, so abort rather than
            // return as if the copy had happened
            fprintf(stderr, "Failed to find memcpy@GLIBC_2.2.5\n");
            abort();
        }
    }
    return original_memcpy(dest, src, n);
}

I then set LD_PRELOAD=./override_memcpy.so to force my target program to use memcpy@GLIBC_2.2.5.
After that, DynamoRIO was able to trace memcpy correctly.

@derekbruening
Contributor

It sounds like you want to use drsym_enumerate_symbols_ex() to walk all symbols and find all memcpy copies; or possibly have drsym_lookup_symbol() or dr_get_proc_address() support iteration instead of returning just one.

If you try drsym_enumerate_symbols_ex() and it works, could you submit a PR to improve the drwrap and drsym_lookup_symbol()/dr_get_proc_address() docs so that others will be aware of the possibility of multiple symbols?

@derekbruening derekbruening changed the title Problems with function wrapping Problems with function wrapping when multiple addresses for the same name exist Sep 26, 2024
@ShangzhiXu
Author

ShangzhiXu commented Sep 27, 2024

Thanks! I think I got it working with drsym_enumerate_symbols_ex().
In my target program, the PLT entry still looks like this:

0000000000001060 <memcpy@plt>:
    1060:       ff 25 b2 2f 00 00       jmp    *0x2fb2(%rip)        # 4018 <memcpy@GLIBC_2.14>
    1066:       68 03 00 00 00          push   $0x3
    106b:       e9 b0 ff ff ff          jmp    1020 <_init+0x20>

I used drsym_enumerate_symbols_ex() like this:

static bool symbol_filter(drsym_info_t *info, drsym_error_t status, void *data) {
    if (strcmp(info->name, "memcpy") == 0) {
        app_pc start = (app_pc)data; // Assuming data is the start address of the module
        app_pc func_pc = start + info->start_offs; // Correct pointer arithmetic
        // Wrap
        drwrap_wrap_ex(func_pc, generic_wrap_pre, generic_wrap_post, (void *)"memcpy", 0);
        dr_printf("Wrapped function: %s at address: %p\n", info->name, func_pc);
    }
    return true;
}

/* Event called when module is loaded */
static void module_load_event(void *drcontext, const module_data_t *mod, bool loaded) {
    if (loaded) {
        drsym_error_t sym_result;
        sym_result = drsym_enumerate_symbols_ex(mod->full_path, symbol_filter, sizeof(drsym_info_t), (void*)mod->start, DRSYM_DEMANGLE);
        if (sym_result != DRSYM_SUCCESS) {
            dr_printf("Failed to enumerate symbols for module %s\n", mod->full_path);
        }
    }
}

In the output, I found that three different memcpy symbols are wrapped:

Wrapped function: memcpy at address: 0x00007f5cbc5ec560
Wrapped function: memcpy at address: 0x00007f5cbbe0bdb0

In the DynamoRIO debug log, I found 0x00007f84f2865399 e8 c2 b1 01 00 call $0x00007f84f2880560 %rsp, which means the memcpy that actually gets called is now wrapped successfully.
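
The callback above hard-codes "memcpy". To cover the whole function_names list from the first comment, the name check could be factored into a helper. Below is a minimal sketch in plain C. The names wanted_names, symbol_matches, and find_wanted are hypothetical, and accepting an "@VERSION" suffix is an assumption, since this thread does not establish whether drsym reports names with such a suffix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Names to wrap; mirrors the three functions from the first comment. */
const char *const wanted_names[] = { "memcpy", "printf", "strcpy" };
const size_t num_wanted = sizeof(wanted_names) / sizeof(wanted_names[0]);

/* Hypothetical helper: true if sym is exactly base, or base followed by
 * an '@' version suffix such as "@GLIBC_2.2.5" or "@@GLIBC_2.14". */
bool symbol_matches(const char *sym, const char *base) {
    assert(sym != NULL && base != NULL);
    size_t len = strlen(base);
    if (strncmp(sym, base, len) != 0)
        return false;
    return sym[len] == '\0' || sym[len] == '@';
}

/* Returns the matching entry of wanted_names, or NULL if sym is not one
 * of them. The returned pointer refers to a string literal, so it stays
 * valid for the lifetime of the process. */
const char *find_wanted(const char *sym) {
    for (size_t i = 0; i < num_wanted; i++) {
        if (symbol_matches(sym, wanted_names[i]))
            return wanted_names[i];
    }
    return NULL;
}
```

Inside symbol_filter, the strcmp() check could then become a call to find_wanted(info->name), wrapping only when the result is non-NULL and passing that result as the user data. Prefix-only matches such as wmemcpy or memcpy_chk are rejected by the check on the character after the base name.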

I'll try my best to submit a PR to improve the drwrap documentation.
