Possible segfault #2
Comments
Thanks, this stack trace was very helpful. I'm wondering if it would be possible for you to determine which input created this? I have some guesses as to what might cause this, but without being able to replicate it will be difficult to fix. |
I'm working on that now. It's a file that's given me trouble before, probably something to do with control characters or something like that. I'll try and narrow it down to a specific line of code. I have a GUI that reads a delimited file into pandas, then runs various calculations on each column like min/max, frequency count, etc. I use natsort after I've determined that a column contains both numbers and characters to sort it naturally. |
I've been working on this for a while now and it's very frustrating. I've gotten the file that causes the crash down to 122Kb, but I can't get it any smaller. Here's a link: I've never used pastebin before, so hopefully that works, I don't see a way to add an attachment here. I also can't reproduce the crash on a smaller program than my full one, which is 500 lines of python, wx, etc. Hopefully looking at the file that causes the crash will help you. Otherwise, I'm stuck. Thanks again. P. S. The problem happens 100% of the time on a huge file (63MB), but happens intermittently on the pastebin file. |
Thanks, I'll take a look at this tonight. For reference, what system are you on? |
Linux lepore-desktop 3.19.0-16-generic #16-Ubuntu SMP Thu Apr 30 16:13:00 UTC 2015 i686 athlon i686 GNU/Linux Running KDE. I spun up a Windows 7 virtual machine and did not get the error. |
I realize that this isn't the question that you asked, but I am finding that the sorting is not working properly because there is a NaN in your data. This confuses Python's sort because 5 < NaN is False and 5 > NaN is also False. This creates a jump discontinuity in your sorted data (see below). I will update
|
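The NaN comparison behaviour described above can be checked in plain Python. The sketch below uses the built-in `sorted` only as an illustration; `nan_last_key` is a hypothetical helper name and is not how natsort handles NaN.

```python
import math

nan = float("nan")

# Every comparison against NaN is False, so a sort has no
# consistent place to put it.
print(5 < nan, 5 > nan, nan == nan)  # False False False

def nan_last_key(x):
    """Hypothetical key: order real numbers normally, push NaN to the end."""
    is_nan = isinstance(x, float) and math.isnan(x)
    return (is_nan, 0.0 if is_nan else x)

data = [3.0, nan, 1.0, 2.0]
result = sorted(data, key=nan_last_key)
print(result[:3])  # [1.0, 2.0, 3.0]
```

Because the key only affects ordering, the NaN itself stays in the output unchanged.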
A TypeError is now raised if a '\0' appears in the input. This is a possible solution to issue #2.
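A '\0' matters here because C string routines treat it as the end of the string, so anything after it would be silently ignored. As a rough aid for finding such values before they reach the sort, one could scan the data in Python first. This is only a sketch; `find_null_strings` and the sample values are made up for illustration and are not part of fastnumbers or natsort.

```python
def find_null_strings(values):
    """Return the indexes of strings that contain a NUL character."""
    return [i for i, v in enumerate(values)
            if isinstance(v, str) and "\0" in v]

# Illustrative data only: the middle entry has an embedded NUL.
sample = ["16062279", "O&79\x005577", "12138003"]
bad_indexes = find_null_strings(sample)
print(bad_indexes)  # [1]
```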
Can you try testing with the development version that I have just pushed? My suspicion is that there was some problem when converting one of your inputs to a |
Off on vacation for a week, will test next Thursday. Thanks! On 05/22/2015 12:22 AM, Seth Morton wrote:
|
No luck with the development version, here's the error: home/lepore/.local/lib/python2.7/site-packages/pkg_resources/init.py:1250: UserWarning: /home/lepore/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable). Skipping line 21991: expected 26 fields, saw 31 Skipping line 44022: expected 26 fields, saw 31 [New Thread 0xac0ffb40 (LWP 5978)] Program received signal SIGSEGV, Segmentation fault. |
Can you try using the following function as a key to
import sys
def printer(x):
    print(x)
    sys.stdout.flush()
    return x
b = natsorted(your_data, key=printer) |
Since you have the source code, you can also add the following before line 24 in fast_atoi.c, preferably in conjunction with the
fprintf(stdout, "fast_atoi string: %d\n", p);
while (white_space(*p)) { p += 1; }
This should print out the string right before the problem occurs. |
I think you're making progress. The file that crashes fastnumbers that I posted above no longer crashes it. However, the larger file that the excerpt came from still crashes it. Here are the last values before the segfault: O&795577 Thanks for working on this! |
Great, this helps narrow down the possible problem. I wish that I had given you the right code to add, though. In the C function, can you change it to the following?
fprintf(stdout, "fast_atoi string: ");
fprintf(stdout, "%s\n", p);
while (white_space(*p)) { p += 1; }
I had accidentally had you use the In the python Last, if you do this multiple times, does it always crash on the same input, or does it change from run to run? Sorry to ask you to modify the tests again. I think we are making headway. |
Happy to help! Here's the latest output. It always crashes on this file, but on the smaller version it only crashed most of the time. (u'16062279', "u'16062279'") Do you need the GDB output? |
I imagine the GDB output won't tell anything we haven't seen before. One thing I notice right away from the two runs is that it is not failing on the same input, but they both begin with Could you let me know if you get a crash doing either of the following? First, try modifying the
def printer(x):
    print(x)
    sys.stdout.flush()
    return '' if x.startswith('O&') else x
This will remove any string beginning with the "bad" characters from the pool. If you don't get any crashes with that, try the following:
def printer(x):
    print(x)
    sys.stdout.flush()
    return x.replace('O&')
To see if we can stop the problem just by removing the leading bad characters. |
Unfortunately removing the bad characters isn't acceptable for my purposes (ditto for the nans). The data that I'm reading and sorting must remain exactly as it's written in the source file. Otherwise the output will not match the inputs. It's a government thing! Trying either new printer function I get: Traceback (most recent call last): |
If you made I wasn't suggesting removing the bad stuff for real, just in our debugging. |
Ahh! I see. Would that also apply to the replace code? I think so (was getting TypeError: replace() takes at least 2 arguments (1 given)). I added it there as well and got: 12138003 |
Sorry, it should be |
I should have seen that, sorry. I fixed that line and the file processed successfully! So it's something about the O& that's causing the problem? |
That's what it looks like. As a temporary workaround, can you try the following?
a = natsorted(your_data, key=lambda x: x.replace("&", "$"))
This will replace all ampersands with dollar signs. These are close to each other on the ASCII table (0x26 and 0x24, with only '%' between them), so it shouldn't mess up the sort order, but it might prevent this seg fault. This might get you by while I figure out the seg fault. |
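Since the data has to stay exactly as written in the source file, it may help to see that a key function only changes what is compared, not what is returned. The sketch below uses the built-in `sorted` as a stand-in for `natsorted` so that it runs without natsort installed; `amp_to_dollar` and the sample strings are illustrative names and values only.

```python
def amp_to_dollar(x):
    """Key used only for comparison; the original string is not modified."""
    return x.replace("&", "$")

# Illustrative values only.
data = ["O&795577", "O&12", "A1", "O$3"]
result = sorted(data, key=amp_to_dollar)

# The sorted list still holds the original strings, ampersands included.
print(result)  # ['A1', 'O&12', 'O$3', 'O&795577']

# '$' and '&' are two code points apart, with '%' in between.
print(hex(ord("$")), hex(ord("%")), hex(ord("&")))  # 0x24 0x25 0x26
```

The substitution can still reorder items in the uncommon case where one string has '%' and another has '&' at the same position.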
Hmm.... fast_atoi string: O$ |
Ran the code in gdb again and got a different segfault: fast_atoi string: 11082136 Program received signal SIGSEGV, Segmentation fault. |
Ok, so it is related to having to split the string before sending to In the meantime, you can uninstall fastnumbers to avoid the segfault. |
No worries, I still have several weeks before initial deployment. Thanks for working so hard on this. |
I installed Kubuntu in a virtualbox (and wasn't that fun) but was unable to reproduce the problem, using the same code and data file as on my machine. The versions of Kubuntu were both 15.04. |
Huh... that doesn't give me much hope that I will be able to reproduce. It's not clear to me if the problem is originating from my C code, or if it is originating from something else. Internally, |
Understood. I'll try to re-install everything and see if I can get my system like the virtualbox I set up. I'll let you know what happens. Thanks for working on this. |
You didn't happen to be using any special arguments to |
Nothing but: result_list = natsorted(result_list) I'll fiddle around some more with this when I get a chance. |
I re-created the crash on a Kubuntu 15.04 virtualbox image. I've saved the box as a .ova file, which you should be able to download and open in virtualbox. Please email me at greg@rhobard.com and I will give you the download address of the .ova file and some brief instructions on reproducing the error. Thanks! |
When dealing with unicode input, the python object needs to be converted to a bytes object before being converted to a character array. Previously, fastnumbers was relying on the python object remaining in memory when dealing with character arrays because a strcpy was not performed. Because extracting the character array from unicode requires a temporary python object which is quickly de-referenced, this is not a safe technique; the segfault is rare because python garbage collects de-referenced objects only periodically, so the character array typically remains in memory. To solve this issue, all character arrays are now explicitly copied with strcpy. This required modification of the conversion functions to free the character array memory before returning from fastnumbers. This resolves issue #2, and most likely resolves issue #1.
I would like to award @glepore70 the "Best Bug Reporter" imaginary internet award for taking the time to create a virtual machine image of the system on which the segfault occurs and sending it to me to debug. I don't imagine many users would go through the hassle to fix the problem... they would just uninstall and move on. Thanks so much! |
The segfault was related to making a bad assumption when dealing with character arrays. The Python C-API to get a
if (PyBytes_Check(input)) {
    str = PyBytes_AS_STRING(input);
}
Note this is just a straight pointer assignment, no
if (PyUnicode_Check(input)) {
    temp_bytes = PyUnicode_AsEncodedString(input, "ascii", "strict");
    if (temp_bytes != NULL) {
        str = PyBytes_AS_STRING(temp_bytes);
        Py_DECREF(temp_bytes);  // <-- Uh-Oh!
    }
}
To extract the The interesting thing is that Python only periodically performs garbage collection, so most of the time the temporary bytes object remains in memory for the duration of the The solution to this problem is to force
if (PyBytes_Check(input)) {
    PyBytes_AsStringAndSize(input, &s, &s_len);
    str = malloc((size_t)s_len + 1);
    strcpy(str, s);
} else if (PyUnicode_Check(input)) {
    temp_bytes = PyUnicode_AsEncodedString(input, "ascii", "strict");
    if (temp_bytes != NULL) {
        PyBytes_AsStringAndSize(temp_bytes, &s, &s_len);
        str = malloc((size_t)s_len + 1);
        strcpy(str, s);  // <-- Now I own the contents of str
        Py_DECREF(temp_bytes);  // <-- Now not a problem
    }
}
The only caveat now is that I will merge this with *Calling |
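The difference between borrowing a buffer and owning a copy of it can be sketched in pure Python. This is only an analogy for the C fix above and does not use the fastnumbers code: `memoryview` plays the role of the borrowed pointer and `bytes(...)` plays the role of the `malloc` plus `strcpy`. Unlike the C case, Python keeps the shared memory valid, so the view shows changed data rather than crashing.

```python
# Illustrative value only.
buf = bytearray(b"O&795577")

borrowed = memoryview(buf)  # no copy: shares buf's memory
owned = bytes(buf)          # copy: independent of buf from here on

# Simulate the underlying storage being reused for something else.
buf[0] = ord("X")
buf[1] = ord("X")

print(bytes(borrowed))  # b'XX795577'
print(owned)            # b'O&795577'
```

The owned copy costs an allocation per call, which is the trade-off the fix accepts in exchange for not depending on the lifetime of the temporary object.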
I'm new to python and debugging things, but I seem to have come across a segfault in fastnumbers. I'm using natsort in my python code, and natsort recommended that I install fastnumbers, so I did. Now on rare occasions my code crashes and I can't figure out why. Unfortunately my code is very long and the file that crashes it is huge. However, here is the crash and backtrace from gdb:
Program received signal SIGSEGV, Segmentation fault.
fast_atoi (p=0xac980034 <error: Cannot access memory at address 0xac980034>, error=0xbfffcc26, overflow=0xbfffcc27) at src/fast_atoi.c:24
24 src/fast_atoi.c: No such file or directory.
(gdb) bt
#0 fast_atoi (p=0xac980034 <error: Cannot access memory at address 0xac980034>, error=0xbfffcc26, overflow=0xbfffcc27) at src/fast_atoi.c:24
#1 0xb01266f8 in fastnumbers_fast_int (self=0x0, args=0xadfc944c, kwargs=0x0) at src/fastnumbers.c:209
#2 0x0810a1bd in PyEval_EvalFrameEx ()
#3 0x08108dbd in PyEval_EvalCodeEx ()
#4 0x0810b975 in PyEval_EvalFrameEx ()
#5 0x0812299d in ?? ()
#6 0x08193e7c in ?? ()
#7 0x08100162 in PyObject_CallFunctionObjArgs ()
#8 0x080ef072 in ?? ()
#9 0x081743d2 in ?? ()
#10 0x0810f286 in PyEval_EvalFrameEx ()
#11 0x08108dbd in PyEval_EvalCodeEx ()
#12 0x0810a86e in PyEval_EvalFrameEx ()
#13 0x0812299d in ?? ()
#14 0x081430ec in ?? ()
#15 0x08111141 in PyEval_CallObjectWithKeywords ()
#16 0xafeff012 in wxPyCallback::EventThunker(wxEvent&) () from /usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode/wx/core.so
#17 0xaf92d033 in wxAppConsole::HandleEvent(wxEvtHandler_, void (wxEvtHandler::_)(wxEvent&), wxEvent&) const () from /usr/lib/i386-linux-gnu/libwx_baseu-2.8.so.0
#18 0xaf9c1028 in wxEvtHandler::ProcessEventIfMatches(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&) () from /usr/lib/i386-linux-gnu/libwx_baseu-2.8.so.0
#19 0xaf9c1404 in wxEvtHandler::SearchDynamicEventTable(wxEvent&) () from /usr/lib/i386-linux-gnu/libwx_baseu-2.8.so.0
#20 0xaf9c14de in wxEvtHandler::ProcessEvent(wxEvent&) () from /usr/lib/i386-linux-gnu/libwx_baseu-2.8.so.0
#21 0xafbe2c96 in ?? () from /usr/lib/i386-linux-gnu/libwx_gtk2u_core-2.8.so.0
#22 0xaf2bf557 in g_cclosure_marshal_VOID__VOIDv () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#23 0xaf2bdabf in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#24 0xaf2d77a5 in g_signal_emit_valist () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#25 0xaf2d8075 in g_signal_emit () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#26 0xaf46a261 in gtk_button_clicked () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#27 0xaf46b411 in ?? () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#28 0xaf2bf537 in g_cclosure_marshal_VOID__VOIDv () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#29 0xaf2bc332 in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#30 0xaf2bdabf in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#31 0xaf2d77a5 in g_signal_emit_valist () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#32 0xaf2d8075 in g_signal_emit () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#33 0xaf46a191 in gtk_button_released () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
---Type to continue, or q to quit---
#34 0xaf46a1d4 in ?? () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#35 0xaf51742c in ?? () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#36 0xaf2bc3e4 in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#37 0xaf2bd89b in g_closure_invoke () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#38 0xaf2cf791 in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#39 0xaf2d7a02 in g_signal_emit_valist () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#40 0xaf2d8075 in g_signal_emit () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#41 0xaf637aac in ?? () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#42 0xaf5157c9 in gtk_propagate_event () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#43 0xaf515cdd in gtk_main_do_event () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#44 0xaf3891c9 in ?? () from /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#45 0xaf1ced64 in g_main_context_dispatch () from /lib/i386-linux-gnu/libglib-2.0.so.0
#46 0xaf1cf089 in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#47 0xaf1cf439 in g_main_loop_run () from /lib/i386-linux-gnu/libglib-2.0.so.0
#48 0xaf5149a5 in gtk_main () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#49 0xafb97fc3 in wxEventLoop::Run() () from /usr/lib/i386-linux-gnu/libwx_gtk2u_core-2.8.so.0
#50 0xafc249c9 in wxAppBase::MainLoop() () from /usr/lib/i386-linux-gnu/libwx_gtk2u_core-2.8.so.0
#51 0xaff03451 in wxPyApp::MainLoop() () from /usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode/wx/core.so
#52 0xaff2bf61 in ?? () from /usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode/wx/core.so
#53 0x0810ddcd in PyEval_EvalFrameEx ()
#54 0x0812299d in ?? ()
#55 0x081430ec in ?? ()
#56 0x0810ab8f in PyEval_EvalFrameEx ()
#57 0x0810a6e3 in PyEval_EvalFrameEx ()
#58 0x0810a6e3 in PyEval_EvalFrameEx ()
#59 0x08108dbd in PyEval_EvalCodeEx ()
#60 0x0813dacc in ?? ()
#61 0x08135898 in PyRun_FileExFlags ()
#62 0x08134b05 in PyRun_SimpleFileExFlags ()
#63 0x080dd500 in Py_Main ()
#64 0x080dcf5b in main ()
I think it traces back to fastnumbers. Thanks for taking a look at this, and sorry if I haven't provided enough information.
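A gdb backtrace like the one above shows the C frames but not which Python line was running. On Python 3.3 and later, the standard-library `faulthandler` module can dump the Python traceback when a SIGSEGV arrives. The report above is on Python 2.7, where this module is not in the standard library, so treat the following as a sketch for newer interpreters; the log file is a temporary file chosen here for illustration.

```python
import faulthandler
import tempfile

# Keep the file object alive for as long as faulthandler may write to it.
crash_log = tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False)
faulthandler.enable(file=crash_log)

print(faulthandler.is_enabled())  # True
print(crash_log.name)             # where a traceback would be written on a crash
```

Enabling it at program start-up, before the GUI main loop, would let a crash inside a C extension leave behind the Python call stack that led to it.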