-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory corruption while calling SysV msgsnd & msgrcv #551
Conversation
Here is a snippet of man page for msgsnd & msgrcv, ``` int msgsnd(int msqid, const void *msgp, size_t msgsz, int msgflg); ssize_t msgrcv(int msqid, void *msgp, size_t msgsz, long msgtyp, int msgflg); DESCRIPTION The msgsnd() and msgrcv() system calls are used to send messages to, and receive messages from, a System V message queue. The calling process must have write permission on the message queue in order to send a message, and read permission to receive a message. The msgp argument is a pointer to a caller-defined structure of the following general form: struct msgbuf { long mtype; /* message type, must be > 0 */ char mtext[1]; /* message data */ }; The mtext field is an array (or other structure) whose size is specified by msgsz, a nonnegative integer value. Messages of zero length (i.e., no mtext field) are permitted. The mtype field must have a strictly positive integer value. This value can be used by the receiving process for message selection (see the description of msgrcv() below). ``` According to it, the 3rd arg of both msgsnd and msgrcv must pass the size of mtext i.e. excluding the mtype. If we apply this logic into our context, we must pass the sizeof(QMessage->data), not sizeof(QMessage). We can also find the corruption using valgrind like below, ``` valgrind --trace-children=yes --tool=memcheck --leak-check=no pgcopydb copy table-data ``` It gives the following summary, ``` ==640986== Syscall param msgsnd(msgp->mtext) points to uninitialised byte(s) ==640986== at 0x4AC92AA: msgsnd (msgsnd.c:25) ==640986== by 0x14E178: queue_send (queue_utils.c:100) ==640986== by 0x162643: UnknownInlinedFun (table-data.c:403) ==640986== by 0x162643: UnknownInlinedFun (table-data.c:295) ==640986== by 0x162643: UnknownInlinedFun (table-data.c:232) ==640986== by 0x162643: UnknownInlinedFun (table-data.c:209) ==640986== by 0x162643: UnknownInlinedFun (table-data.c:117) ==640986== by 0x162643: UnknownInlinedFun (table-data.c:89) ==640986== by 0x162643: copydb_copy_all_table_data (table-data.c:57) ==640986== by 0x11406E: cli_copy_table_data (cli_copy.c:395) ==640986== by 0x1118B2: UnknownInlinedFun (commandline.c:71) ==640986== by 0x1118B2: main (main.c:142) ==640986== Address 0x1ffeeec5f0 is on thread 1's stack ==640986== in frame #2, created by copydb_copy_all_table_data (table-data.c:38) ==640986== ==640986== Syscall param msgsnd(msgp->mtext) points to uninitialised byte(s) ==640986== at 0x4AC92AA: msgsnd (msgsnd.c:25) ==640986== by 0x14E178: queue_send (queue_utils.c:100) ==640986== by 0x162AD8: UnknownInlinedFun (table-data.c:380) ==640986== by 0x162AD8: UnknownInlinedFun (table-data.c:307) ==640986== by 0x162AD8: UnknownInlinedFun (table-data.c:232) ==640986== by 0x162AD8: UnknownInlinedFun (table-data.c:209) ==640986== by 0x162AD8: UnknownInlinedFun (table-data.c:117) ==640986== by 0x162AD8: UnknownInlinedFun (table-data.c:89) ==640986== by 0x162AD8: copydb_copy_all_table_data (table-data.c:57) ==640986== by 0x11406E: cli_copy_table_data (cli_copy.c:395) ==640986== by 0x1118B2: UnknownInlinedFun (commandline.c:71) ==640986== by 0x1118B2: main (main.c:142) ==640986== Address 0x1ffeeec5f0 is on thread 1's stack ==640986== in frame #2, created by copydb_copy_all_table_data (table-data.c:38) ==640986== ... ``` The above error disappers with the fix. Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
Well I don't know if we should revert previous patches. It seems like we put too much pressure on the stack allocation everywhere in pgcopydb and should use more of the heap instead... |
@dimitri I see lots of noise created by free due to heap allocation for smaller objects. IMHO, stack is always better for smaller short lived objects. |
That's my opinion too. Can you create a PR that gets back to using the stack for these small objects? |
@arajkumar I'm really happy that the root cause for this issue is discovered. And thanks for the detailed description. |
Here is a snippet of man page for msgsnd & msgrcv,
According to the man page, the 3rd argument(i.e.msgsz) of both msgsnd and msgrcv must pass the size of mtext i.e. excluding the size of mtype.
If we apply this logic into our context, we must pass the sizeof(QMessage->data), not sizeof(QMessage).
We can also find the corruption using valgrind like below,
It gives the following summary,
The above error disappers with our fix.