realloc calling free #447

Closed
rgowrishankar opened this issue Jul 20, 2016 · 14 comments

@rgowrishankar

rgowrishankar commented Jul 20, 2016

In my program, I am seeing an intermittent crash:

#0  0x00007fc34859b5f7 in raise () from /lib64/libc.so.6
#1  0x00007fc34859cce8 in abort () from /lib64/libc.so.6
#2  0x00007fc3485db317 in __libc_message () from /lib64/libc.so.6
#3  0x00007fc3485e3023 in _int_free () from /lib64/libc.so.6
#4  0x00007fc3485e51b0 in _int_realloc () from /lib64/libc.so.6
#5  0x00007fc3485e5f22 in realloc () from /lib64/libc.so.6
#6  0x00007fc34a56d79d in sdsMakeRoomFor () from /lib64/libhiredis.so.0.12
#7  0x00007fc34a56d90a in sdscatlen () from /lib64/libhiredis.so.0.12
#8  0x00007fc34a56d2a0 in __redisAppendCommand () from /lib64/libhiredis.so.0.12
#9  0x00007fc34a56fac2 in __redisAsyncCommand () from /lib64/libhiredis.so.0.12
#10 0x00007fc34a570498 in redisvAsyncCommand () from /lib64/libhiredis.so.0.12
#11 0x00007fc34a57055a in redisAsyncCommand () from /lib64/libhiredis.so.0.12

From the realloc documentation:
if size is equal to zero, and ptr is not NULL, then the call is equivalent to free(ptr)
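For illustration, a minimal standalone snippet of the quoted behavior (nothing hiredis-specific, just the documented allocator semantics):

    #include <stdlib.h>

    int main(void) {
        char *p = malloc(32);
        /* Per the documentation quoted above: with size == 0 and a non-NULL
         * pointer, this call is equivalent to free(p). glibc returns NULL
         * here; other implementations may return a minimal allocation. */
        void *q = realloc(p, 0);
        free(q);   /* free(NULL) is a no-op, so this is safe either way */
        return 0;
    }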

I am sending in a string like "hmset aa bb cc" (though longer). I will try to write a smaller program that can reproduce it, but I am not sure if I will be able to reproduce it consistently. I am running with version 0.12 on the CentOS 7 platform. I could not find a debug package for CentOS. If you can give me instructions on creating debug symbols, I can load the coredump into GDB and gather more information.
Thanks,

Ramkumar

@michael-grunder
Collaborator

Hey,

I'm not sure where you would get a debug package for hiredis on CentOS but it should be really easy to just build from source.

Just download the source and build it with a make noopt command. Once you've got it built, you should have the .so and .a files, and you can link directly against them from within your program:

cc [your .c files] /full/path/to/libhiredis.a
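Roughly (assuming you grab the source from the GitHub repo):

git clone https://github.com/redis/hiredis.git
cd hiredis
make noopt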

Or if you can create a small program where you can consistently reproduce the problem, just post it here and I can run it in a CentOS VM.

Cheers
Mike

@rgowrishankar
Author

Is there a way to load the symbols in gdb to look at an existing coredump?

@michael-grunder
Collaborator

I've never really tried to do this, but it looks like you can, yes:

First build hiredis (I think you'll need to keep the same optimization setting for the build).

Then use objcopy to get just the symbols:

objcopy --only-keep-symbols libhiredis.so hiredis.symbols

Finally start gdb like so:

gdb --core <your.core.file> --symbols=hiredis.symbols

This worked for me in a CentOS VM, but I only tried it with a small test program, not against libhiredis itself.

@badboy
Contributor

badboy commented Jul 21, 2016

I'm not sure what exactly is failing there. realloc will also call free on the given pointer if it has to re-allocate and move the data (see the sketch below the list).

I only see two ways this can lead to a failure:

  1. An already-broken pointer is passed, making it fail in completely unpredictable ways.
  2. The pointer is concurrently modified from somewhere else, making it a data race.
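As a standalone illustration of the first point (nothing hiredis-specific here), realloc ends up inside free whenever the block has to move, so a broken pointer can abort with a backtrace much like the one above:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *buf = malloc(16);
        if (!buf) return 1;
        strcpy(buf, "hello");

        /* Growing the block may force realloc to allocate elsewhere, copy
         * the data over, and free(buf) internally. */
        char *grown = realloc(buf, 1 << 20);
        if (!grown) { free(buf); return 1; }

        if (grown != buf)
            printf("block moved: the old pointer was freed by realloc\n");

        /* Using (or freeing) buf after a move is undefined behavior; if buf
         * were already corrupted, the internal free would abort. */
        free(grown);
        return 0;
    }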

@rgowrishankar
Author

rgowrishankar commented Jul 22, 2016

I allocate a new char*, set it to the command that needs to be sent to redis, and pass it to redisAsyncCommand as both the privdata and the format string. For example:
char *message; asprintf(&message, "command"); redisAsyncCommand(ac, process_callback, message, message);
and I free message in process_callback.
What is the preferred way of calling redisAsyncCommand?
As a side note, if I use a char message[100] array instead of a dynamically allocated char* to pass the message to redisAsyncCommand, it should still work, right?
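For reference, a compilable sketch of the pattern I described above (process_callback and send_command are simplified placeholder names, error handling is trimmed, and the context still has to be attached to an event loop before anything is sent):

    #define _GNU_SOURCE           /* for asprintf */
    #include <stdio.h>
    #include <stdlib.h>
    #include <hiredis/async.h>

    /* Reply callback: privdata is the same heap string that was used as the
     * command text, so it is freed here, exactly once, after the reply
     * arrives. The reply itself is owned and freed by hiredis. */
    static void process_callback(redisAsyncContext *ac, void *reply, void *privdata) {
        (void)ac;
        (void)reply;
        free(privdata);
    }

    int send_command(redisAsyncContext *ac, const char *field, const char *value) {
        char *message = NULL;
        if (asprintf(&message, "HMSET aa %s %s", field, value) < 0)
            return REDIS_ERR;
        /* The same pointer is passed as privdata and as the format string.
         * Caveat: hiredis treats the format argument printf-style, so the
         * data must not contain stray '%' characters. */
        return redisAsyncCommand(ac, process_callback, message, message);
    }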

Looking at the code, in __redisAppendCommand(redisContext *c, const char *cmd, size_t len):
the redisContext is passed to redisAsyncCommand by my code, cmd is built inside redisvAsyncCommand via a call to redisvFormatCommand, and len is also returned by redisvFormatCommand. I don't think the message I pass into redisAsyncCommand is causing any corruption. I will try to write a small program that does redis transactions to try to reproduce the issue.

@michael-grunder
Collaborator

Also run the process through valgrind if you haven't yet; that might give you the hints you need.
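Something like this, adjusting the binary path:

valgrind --leak-check=full --track-origins=yes ./your_program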

And if you can post a small program that reproduces the problem, I'm happy to help you track it down.

@rgowrishankar
Author

I ran my original program with valgrind and I see the following with version 0.12:
==00:00:31:47.008 4575== at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==00:00:31:47.008 4575== by 0x4E3A647: sdsnewlen (in /usr/lib64/libhiredis.so.0.12)
==00:00:31:47.008 4575== by 0x4E3A0C4: redisBufferWrite (in /usr/lib64/libhiredis.so.0.12)
==00:00:31:47.008 4575== by 0x4E3D3E5: redisAsyncHandleWrite (in /usr/lib64/libhiredis.so.0.12)
==00:00:31:47.008 4575== by 0x591A48: redis_source_dispatch (glib.h:95)
==00:00:31:47.008 4575== by 0x508B999: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.4000.0)

==00:00:31:47.020 4575== 82 bytes in 3 blocks are possibly lost in loss record 1,357 of 2,306
==00:00:31:47.020 4575== at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==00:00:31:47.020 4575== by 0x4E3A647: sdsnewlen (in /usr/lib64/libhiredis.so.0.12)
==00:00:31:47.020 4575== by 0x4E3CBB4: ??? (in /usr/lib64/libhiredis.so.0.12)
==00:00:31:47.020 4575== by 0x4E3D497: redisvAsyncCommand (in /usr/lib64/libhiredis.so.0.12)
==00:00:31:47.020 4575== by 0x4E3D559: redisAsyncCommand (in /usr/lib64/libhiredis.so.0.12)

==00:00:31:47.061 4575== 2,390 bytes in 2 blocks are possibly lost in loss record 2,179 of 2,306
==00:00:31:47.061 4575== at 0x4C2BB4A: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==00:00:31:47.061 4575== by 0x4E3A79C: sdsMakeRoomFor (in /usr/lib64/libhiredis.so.0.12)
==00:00:31:47.061 4575== by 0x4E3A909: sdscatlen (in /usr/lib64/libhiredis.so.0.12)
==00:00:31:47.061 4575== by 0x4E3DA3B: redisReaderFeed (in /usr/lib64/libhiredis.so.0.12)
==00:00:31:47.061 4575== by 0x4E39F4C: redisBufferRead (in /usr/lib64/libhiredis.so.0.12)
==00:00:31:47.061 4575== by 0x4E3D336: redisAsyncHandleRead (in /usr/lib64/libhiredis.so.0.12)
==00:00:31:47.061 4575== by 0x591A60: redis_source_dispatch (glib.h:100)

@rgowrishankar
Author

A different crash that I am seeing has the following backtrace
, "stacktrace":
[ { "crash_thread": true
, "frames":
[ { "address": 140439883885772
, "build_id": "e36c6455b21c1cb68020709a8d5466dcfd2d47f2"
, "build_id_offset": 523468
, "function_name": "__libc_free"
, "file_name": "/lib64/libc.so.6"
}
, { "address": 140439916953504
, "build_id": "cdaee6b30ac02a9c7e4538e32d46094836a90e61"
, "build_id_offset": 28576
, "function_name": "_dictClear"
, "file_name": "/lib64/libhiredis.so.0.12"
}
, { "address": 140439916954293
, "build_id": "cdaee6b30ac02a9c7e4538e32d46094836a90e61"
, "build_id_offset": 29365
, "function_name": "__redisAsyncFree"
, "file_name": "/lib64/libhiredis.so.0.12"
}
, { "address": 5840993
, "build_id": "905f073aa6729f53933b1d05d2e6ecc74f934c22"
, "build_id_offset": 1646689
, "function_name": "redis_source_dispatch"
, "file_name": "/usr/sbin/ovs-vswitchd"
}
, { "address": 140439913871770
, "build_id": "659986b68d0e9d920a9e693546262250b895dca8"
, "build_id_offset": 301466
, "function_name": "g_main_context_dispatch"
, "file_name": "/lib64/libglib-2.0.so.0"

@michael-grunder
Collaborator

Hey,

The second error looks like maybe you are double freeing the redisAsyncContext. Are you perhaps disconnecting, and then making an explicit call to redisAsyncFree? Hiredis will free your context for you when you call redisAsyncDisconnect.

I'm just guessing at this point because there isn't really enough to go on without an example program to replicate the error.
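To illustrate (a rough sketch; setup and shutdown_connection are placeholder names): pick one teardown path, not both.

    #include <stdio.h>
    #include <hiredis/async.h>

    /* Called by hiredis right before it frees the context itself. */
    static void on_disconnect(const redisAsyncContext *ac, int status) {
        if (status != REDIS_OK)
            fprintf(stderr, "disconnected with error: %s\n", ac->errstr);
        /* Don't free ac here; just drop any stored pointer to it. */
    }

    void setup(redisAsyncContext *ac) {
        redisAsyncSetDisconnectCallback(ac, on_disconnect);
    }

    void shutdown_connection(redisAsyncContext *ac) {
        /* Graceful path: pending replies are delivered, then hiredis frees
         * the context on its own. */
        redisAsyncDisconnect(ac);

        /* The other path would be redisAsyncFree(ac), which frees the
         * context immediately. Calling both is a double free. */
    }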

@rgowrishankar
Author

Hi Michael,

Thanks for the reply. The new crash is not happening during cleanup; it happens during normal operation. I am running a standalone program to try to reproduce the crash.

I do have a question about using hiredis in a multi-threaded program. I have multiple threads generating data that needs to go into redis, and they all call redisAsyncCommand on the same global redisAsyncContext. As long as the context is protected by a global lock, do you see any issues? Or should only one thread use a given redisAsyncContext?

Ram

@rgowrishankar
Author

memleak_hiredis.txt

I have attached a sample program that shows the memory leak.
To run it:
gcc memleak_hiredis.c -I /usr/include/hiredis -I /usr/include/glib-2.0/ -I /usr/lib64/glib-2.0/include/ -l hiredis -l glib-2.0 -l pthread -g -O0
valgrind --leak-check=yes ~/a.out

There are two glib contexts, one for doing async writes and the other for subscription.
The child thread opens two connections to redis and subscribes on one of them.
The main thread uses the other connection to send the data via async commands.
==5648==
==5648== HEAP SUMMARY:
==5648== in use at exit: 5,285 bytes in 58 blocks
==5648== total heap usage: 669 allocs, 611 frees, 26,361 bytes allocated
==5648==
==5648== 9 bytes in 1 blocks are possibly lost in loss record 7 of 58
==5648== at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5648== by 0x4E3AF2B: sdsnewlen (sds.c:55)
==5648== by 0x4E3AFC6: sdsempty (sds.c:71)
==5648== by 0x4E3A8E0: redisBufferWrite (hiredis.c:808)
==5648== by 0x4E3EC7D: redisAsyncHandleWrite (async.c:548)
==5648== by 0x40131D: redis_source_dispatch (glib.h:95)
==5648== by 0x508D7A9: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x508DAF7: ??? (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x508DBAB: g_main_context_iteration (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x4014F5: process_pending_events (memleak_hiredis.c:63)
==5648== by 0x401A82: redis_notifications_handler (memleak_hiredis.c:267)
==5648== by 0x5382DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==5648==
==5648== 9 bytes in 1 blocks are possibly lost in loss record 8 of 58
==5648== at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5648== by 0x4E3AF2B: sdsnewlen (sds.c:55)
==5648== by 0x4E3AFC6: sdsempty (sds.c:71)
==5648== by 0x4E3A8E0: redisBufferWrite (hiredis.c:808)
==5648== by 0x4E3EC7D: redisAsyncHandleWrite (async.c:548)
==5648== by 0x40131D: redis_source_dispatch (glib.h:95)
==5648== by 0x508D7A9: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x508DAF7: ??? (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x508DBAB: g_main_context_iteration (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x4014E1: process_pending_events (memleak_hiredis.c:62)
==5648== by 0x401A82: redis_notifications_handler (memleak_hiredis.c:267)
==5648== by 0x5382DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==5648==
==5648== 21 bytes in 1 blocks are possibly lost in loss record 17 of 58
==5648== at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5648== by 0x4E3AF2B: sdsnewlen (sds.c:55)
==5648== by 0x4E3EF2E: __redisAsyncCommand (async.c:611)
==5648== by 0x4E3F13C: redisvAsyncCommand (async.c:658)
==5648== by 0x4E3F204: redisAsyncCommand (async.c:667)
==5648== by 0x4016FD: init_redis_subscribe (memleak_hiredis.c:119)
==5648== by 0x4017FA: init_everything (memleak_hiredis.c:149)
==5648== by 0x401A64: redis_notifications_handler (memleak_hiredis.c:263)
==5648== by 0x5382DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==5648== by 0x568DCEC: clone (in /usr/lib64/libc-2.17.so)
==5648==
==5648== 135 bytes in 1 blocks are possibly lost in loss record 51 of 58
==5648== at 0x4C2BB4A: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5648== by 0x4E3B170: sdsMakeRoomFor (sds.c:142)
==5648== by 0x4E3B3DB: sdscatlen (sds.c:241)
==5648== by 0x4E40545: redisReaderFeed (read.c:462)
==5648== by 0x4E3A794: redisBufferRead (hiredis.c:772)
==5648== by 0x4E3EBC7: redisAsyncHandleRead (async.c:526)
==5648== by 0x401354: redis_source_dispatch (glib.h:100)
==5648== by 0x508D7A9: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x508DAF7: ??? (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x508DBAB: g_main_context_iteration (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x4014E1: process_pending_events (memleak_hiredis.c:62)
==5648== by 0x401A82: redis_notifications_handler (memleak_hiredis.c:267)
==5648==
==5648== 1,223 bytes in 1 blocks are possibly lost in loss record 58 of 58
==5648== at 0x4C2BB4A: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5648== by 0x4E3B170: sdsMakeRoomFor (sds.c:142)
==5648== by 0x4E3B3DB: sdscatlen (sds.c:241)
==5648== by 0x4E40545: redisReaderFeed (read.c:462)
==5648== by 0x4E3A794: redisBufferRead (hiredis.c:772)
==5648== by 0x4E3EBC7: redisAsyncHandleRead (async.c:526)
==5648== by 0x401354: redis_source_dispatch (glib.h:100)
==5648== by 0x508D7A9: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x508DAF7: ??? (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x508DBAB: g_main_context_iteration (in /usr/lib64/libglib-2.0.so.0.4200.2)
==5648== by 0x4014F5: process_pending_events (memleak_hiredis.c:63)
==5648== by 0x401A82: redis_notifications_handler (memleak_hiredis.c:267)
==5648==
==5648== LEAK SUMMARY:
==5648== definitely lost: 0 bytes in 0 blocks
==5648== indirectly lost: 0 bytes in 0 blocks
==5648== possibly lost: 1,397 bytes in 5 blocks
==5648== still reachable: 3,888 bytes in 53 blocks
==5648== suppressed: 0 bytes in 0 blocks
==5648== Reachable blocks (those to which a pointer was found) are not shown.
==5648== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5648==
==5648== For counts of detected and suppressed errors, rerun with: -v
==5648== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 1 from 1)

@badboy
Contributor

badboy commented Aug 27, 2016

All those "possibly lost" messages are likely false positives. The used string library, sds, does some fancy pointer shuffling to make it look like normal char pointers by pointing right after the actual header of the data, which means it's pointing somewhere in the middle of an allocated block. valgrind can't detect this.

As the example does not lead to a crash, I still have no idea what's causing the crash in your application.

Regarding a single globally-locked context used from multiple threads: I don't know glib's internals. It might be that it's not too happy to get write events from multiple threads, even if those events are mutually exclusive.

@jkarneges

jkarneges commented Dec 10, 2016

(EDIT: This may be a false alarm. I see in the code that hiredis grows a read buffer to 1024 bytes, and so in our small tests of sending a few commands it looked like a leak but maybe isn't. Leaving this comment here in case it helps another person.)

We're seeing similar "possibly lost" indications from valgrind, with a process that continues to grow in size (to hundreds of MB, until our process manager restarts it). It is a relatively small program so I think there is possibly a real leak here.

It's only possible to see the "possibly lost" blocks if we abort the process uncleanly. Normally our program cleans up on termination, and if we terminate cleanly then there are no lost blocks reported (not even possibly lost).

Any chance the realloc is repeatedly extending a buffer when it shouldn't need to? This could explain how a buffer could get longer and longer while running, yet still get freed on clean termination since there's only one pointer to manage it, as opposed to losing many pointers that still point to things.

==31462== 4,818 bytes in 16 blocks are possibly lost in loss record 419 of 438
==31462==    at 0x4C2FD5F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31462==    by 0x50A57FD: sdsMakeRoomFor (in /usr/lib/x86_64-linux-gnu/libhiredis.so.0.13)
==31462==    by 0x50A5939: sdscatlen (in /usr/lib/x86_64-linux-gnu/libhiredis.so.0.13)
==31462==    by 0x50A89AB: redisReaderFeed (in /usr/lib/x86_64-linux-gnu/libhiredis.so.0.13)
==31462==    by 0x50A4FAC: redisBufferRead (in /usr/lib/x86_64-linux-gnu/libhiredis.so.0.13)
==31462==    by 0x50A8296: redisAsyncHandleRead (in /usr/lib/x86_64-linux-gnu/libhiredis.so.0.13)

I should note this is with hiredis 0.13.3, from the Ubuntu 16.04 package.

@dagostinelli

FYI: This is the same issue as #493
