
Possible segfault #2

Closed
glepore70 opened this issue May 21, 2015 · 33 comments

@glepore70

I'm new to python and debugging things, but I seem to have come across a segfault in fastnumbers. I'm using natsort in my python code, and natsort recommended that I install fastnumbers, so I did. Now on rare occasions my code crashes and I can't figure out why. Unfortunately my code is very long and the file that crashes it is huge. However, here is the crash and backtrace from gdb:

Program received signal SIGSEGV, Segmentation fault.
fast_atoi (p=0xac980034 <error: Cannot access memory at address 0xac980034>, error=0xbfffcc26, overflow=0xbfffcc27) at src/fast_atoi.c:24
24 src/fast_atoi.c: No such file or directory.
(gdb) bt
#0 fast_atoi (p=0xac980034 <error: Cannot access memory at address 0xac980034>, error=0xbfffcc26, overflow=0xbfffcc27) at src/fast_atoi.c:24
#1 0xb01266f8 in fastnumbers_fast_int (self=0x0, args=0xadfc944c, kwargs=0x0) at src/fastnumbers.c:209
#2 0x0810a1bd in PyEval_EvalFrameEx ()
#3 0x08108dbd in PyEval_EvalCodeEx ()
#4 0x0810b975 in PyEval_EvalFrameEx ()
#5 0x0812299d in ?? ()
#6 0x08193e7c in ?? ()
#7 0x08100162 in PyObject_CallFunctionObjArgs ()
#8 0x080ef072 in ?? ()
#9 0x081743d2 in ?? ()
#10 0x0810f286 in PyEval_EvalFrameEx ()
#11 0x08108dbd in PyEval_EvalCodeEx ()
#12 0x0810a86e in PyEval_EvalFrameEx ()
#13 0x0812299d in ?? ()
#14 0x081430ec in ?? ()
#15 0x08111141 in PyEval_CallObjectWithKeywords ()
#16 0xafeff012 in wxPyCallback::EventThunker(wxEvent&) () from /usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode/wx/core.so
#17 0xaf92d033 in wxAppConsole::HandleEvent(wxEvtHandler*, void (wxEvtHandler::*)(wxEvent&), wxEvent&) const () from /usr/lib/i386-linux-gnu/libwx_baseu-2.8.so.0
#18 0xaf9c1028 in wxEvtHandler::ProcessEventIfMatches(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&) () from /usr/lib/i386-linux-gnu/libwx_baseu-2.8.so.0
#19 0xaf9c1404 in wxEvtHandler::SearchDynamicEventTable(wxEvent&) () from /usr/lib/i386-linux-gnu/libwx_baseu-2.8.so.0
#20 0xaf9c14de in wxEvtHandler::ProcessEvent(wxEvent&) () from /usr/lib/i386-linux-gnu/libwx_baseu-2.8.so.0
#21 0xafbe2c96 in ?? () from /usr/lib/i386-linux-gnu/libwx_gtk2u_core-2.8.so.0
#22 0xaf2bf557 in g_cclosure_marshal_VOID__VOIDv () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#23 0xaf2bdabf in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#24 0xaf2d77a5 in g_signal_emit_valist () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#25 0xaf2d8075 in g_signal_emit () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#26 0xaf46a261 in gtk_button_clicked () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#27 0xaf46b411 in ?? () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#28 0xaf2bf537 in g_cclosure_marshal_VOID__VOIDv () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#29 0xaf2bc332 in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#30 0xaf2bdabf in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#31 0xaf2d77a5 in g_signal_emit_valist () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#32 0xaf2d8075 in g_signal_emit () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#33 0xaf46a191 in gtk_button_released () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#34 0xaf46a1d4 in ?? () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#35 0xaf51742c in ?? () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#36 0xaf2bc3e4 in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#37 0xaf2bd89b in g_closure_invoke () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#38 0xaf2cf791 in ?? () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#39 0xaf2d7a02 in g_signal_emit_valist () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#40 0xaf2d8075 in g_signal_emit () from /usr/lib/i386-linux-gnu/libgobject-2.0.so.0
#41 0xaf637aac in ?? () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#42 0xaf5157c9 in gtk_propagate_event () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#43 0xaf515cdd in gtk_main_do_event () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#44 0xaf3891c9 in ?? () from /usr/lib/i386-linux-gnu/libgdk-x11-2.0.so.0
#45 0xaf1ced64 in g_main_context_dispatch () from /lib/i386-linux-gnu/libglib-2.0.so.0
#46 0xaf1cf089 in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#47 0xaf1cf439 in g_main_loop_run () from /lib/i386-linux-gnu/libglib-2.0.so.0
#48 0xaf5149a5 in gtk_main () from /usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0
#49 0xafb97fc3 in wxEventLoop::Run() () from /usr/lib/i386-linux-gnu/libwx_gtk2u_core-2.8.so.0
#50 0xafc249c9 in wxAppBase::MainLoop() () from /usr/lib/i386-linux-gnu/libwx_gtk2u_core-2.8.so.0
#51 0xaff03451 in wxPyApp::MainLoop() () from /usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode/wx/core.so
#52 0xaff2bf61 in ?? () from /usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode/wx/core.so
#53 0x0810ddcd in PyEval_EvalFrameEx ()
#54 0x0812299d in ?? ()
#55 0x081430ec in ?? ()
#56 0x0810ab8f in PyEval_EvalFrameEx ()
#57 0x0810a6e3 in PyEval_EvalFrameEx ()
#58 0x0810a6e3 in PyEval_EvalFrameEx ()
#59 0x08108dbd in PyEval_EvalCodeEx ()
#60 0x0813dacc in ?? ()
#61 0x08135898 in PyRun_FileExFlags ()
#62 0x08134b05 in PyRun_SimpleFileExFlags ()
#63 0x080dd500 in Py_Main ()
#64 0x080dcf5b in main ()

I think it traces back to fastnumbers. Thanks for taking a look at this, and sorry if I haven't provided enough information.

@SethMMorton
Owner

Thanks, this stack trace was very helpful.

I'm wondering if it would be possible for you to determine which input created this? I have some guesses as to what might cause this, but without being able to replicate it will be difficult to fix.

@glepore70
Author

I'm working on that now. It's a file that's given me trouble before, probably something to do with control characters or something like that. I'll try and narrow it down to a specific line of code.

I have a GUI that reads a delimited file into pandas, then runs various calculations on each column, like min/max, frequency count, etc. Once I've determined that a column contains both numbers and characters, I use natsort to sort it naturally.

@glepore70
Author

I've been working on this for a while now and it's very frustrating. I've gotten the file that causes the crash down to 122 KB, but I can't get it any smaller. Here's a link:

http://pastebin.com/sMMbGCe0

I've never used pastebin before, so hopefully that works, I don't see a way to add an attachment here.

I also can't reproduce the crash on a smaller program than my full one, which is 500 lines of python, wx, etc. Hopefully looking at the file that causes the crash will help you. Otherwise, I'm stuck.

Thanks again.

P. S. The problem happens 100% of the time on a huge file (63MB), but happens intermittently on the pastebin file.

@SethMMorton
Owner

Thanks, I'll take a look at this tonight. For reference, what system are you on?

@glepore70
Author

Linux lepore-desktop 3.19.0-16-generic #16-Ubuntu SMP Thu Apr 30 16:13:00 UTC 2015 i686 athlon i686 GNU/Linux

Running KDE.

I spun up a Windows 7 virtual machine and did not get the error.

@SethMMorton
Owner

I realize that this isn't the question that you asked, but I am finding that the sorting is not working properly because there is a NaN in your data. This confuses Python's sort because 5 < NaN is False and 5 > NaN is False. This creates a jump discontinuity in your sorted data (see below). I will update natsort to better handle this case after I solve this issue, but I don't think this is related to the segfault (which I haven't been able to replicate yet, but I'm on a Mac, so it may be machine dependent). I will dig more.

     SERIAL_NUMBER                  NAME
1927             6        APLIN -OR-&  -
3253      33053 06  BALDASANO BENJAMIN M
1412       2919302     ANDERSON ARVINE L
1323       6135134        AMORE ERNEST S
898        6145219          ALLARD LEO L
3873       6149528      BARNEY WILLIAM A
740        6149858       ALDRICH HENRY W
4813       6248805           BECK JOHN C
4889       6865158       BECKLUND EDWARD
4680       6909807    BEARDSLEY HAROLD F
4683       6953423       BEARLEY HARRY L
4686      11110897       BEARSE SELWYN F
4715      13046508     BEATTIE JOHN H JR
4689      15044122     BEASLEY CHARLES P
4708      16006589     BEASTER RICHARD H
4702      17068735      BEASLEY JOSEPH C
4681      20310601        BEARE GEORGE D
4703      20407637      BEASLEY MARVIN J
4682      31309985    BEARISTO WILLIAM E
4711      33393550     BEATTIE CHARLES D
4714      33404711      BEATTIE HERMAN H
4696      33646001       BEASLEY JAMES B
4695      34174220       BEASLEY HENRY L
4698      34426838       BEASLEY JAMES T
4705      34517074       BEASLEY PEARMAN
4699      34538587       BEASLEY JAMES W
4697      34801955       BEASLEY JAMES L
4701      35790825      BEASLEY JOSEPH B
4709      36531700       BEATON ROBBIE R
4693      36737603      BEASLEY FRANK JR
4687      37197229        BEARY MARTIN C
4691      37563286      BEASLEY DONALD L
4700      37611309       BEASLEY JESSE E
4688      37627746     BEASLEY CHARLES A
4690      38107155     BEASLEY CHESTER J
4685      38466544           BEARPAW TOM
4706      38564225  BEASLEY STEWART R SR
4718      39203811      BEATTIE ROBERT J
4717      39342618     BEATTIE KENNETH M
4710      42054165           BEATTIE C W
4712           NaN        BEATTIE EDWARD   # <=== COUNT RESETS STARTING HERE
4757       6262518        BEAULIEU LEO E
2105       6264303        ARMON THEODORE
4492       6269549     BAUMGARTEN OTIS K
674        6271743         ALBIN HENRY D
3766       6277281     BARNES CLARENCE B
4139       6285548      BARTLEY JESSIE B
250        6294035        ADAMS CLAUDE E
3087       6296739      BAKER CLARENCE F
3685       6379336       BARKER ERNEST P
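The NaN effect can be reproduced without pandas or natsort. A minimal Python 3 sketch using a few serial numbers from the table above; `nan_last_key` is a hypothetical helper (not part of natsort) showing one way to give NaN a fixed place in the order:

```python
import math

nan = float("nan")

# Every ordering comparison involving NaN is False, so sort() cannot place it consistently.
print(5 < nan, 5 > nan, 5 == nan)  # False False False


def nan_last_key(value):
    """Hypothetical key: real numbers in numeric order, then every NaN."""
    is_nan = isinstance(value, float) and math.isnan(value)
    return (is_nan, 0.0 if is_nan else value)


data = [6, 42054165, nan, 6262518, 250]
print(sorted(data, key=nan_last_key))  # [6, 250, 6262518, 42054165, nan]
```

Because the key is a tuple whose first element is a bool, every NaN compares greater than every real number, so the sort no longer depends on where the NaN happens to sit in the input.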

SethMMorton added a commit that referenced this issue May 22, 2015
A TypeError is now raised if a '\0' appears in the input. This is a
possible solution to issue #2.
@SethMMorton
Owner

Can you try testing with the development version that I have just pushed? My suspicion is that there was some problem when converting one of your inputs to a char*, and I have switched to the Python C function that does a bit more error checking when doing the string conversion.
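For context on the commit above: C code sees a char* only up to the first NUL byte, so an embedded '\0' silently truncates the input. A small illustration with Python's standard ctypes module (this is not fastnumbers code, and `reject_embedded_nul` is a hypothetical guard mirroring the TypeError described in the commit message):

```python
import ctypes

raw = b"123\x00456"
buf = ctypes.create_string_buffer(raw)  # a C char array: the 7 bytes of raw plus a trailing NUL

print(len(raw))   # 7: the Python object knows its full length
print(buf.value)  # b'123': reading it as a C string stops at the first NUL


def reject_embedded_nul(s):
    """Hypothetical guard for str input: refuse what a char*-based parser would silently truncate."""
    if "\0" in s:
        raise TypeError("embedded null character in input")
    return s
```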

@glepore70
Author

Off on vacation for a week, will test next Thursday. Thanks!


@glepore70
Author

No luck with the development version, here's the error:

home/lepore/.local/lib/python2.7/site-packages/pkg_resources/__init__.py:1250: UserWarning: /home/lepore/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
warnings.warn(msg, UserWarning)
Skipping line 15027: expected 26 fields, saw 27
Skipping line 18505: expected 26 fields, saw 27

Skipping line 21991: expected 26 fields, saw 31

Skipping line 44022: expected 26 fields, saw 31

[New Thread 0xac0ffb40 (LWP 5978)]
[New Thread 0xb5351b40 (LWP 5958)]
[New Thread 0xb3b50b40 (LWP 5957)]

Program received signal SIGSEGV, Segmentation fault.
fast_atoi (p=0xac940034 <error: Cannot access memory at address 0xac940034>, error=0xbfffcc26, overflow=0xbfffcc27) at src/fast_atoi.c:24
24 while (white_space(*p)) { p += 1; }

@SethMMorton
Owner

Can you try using the following function as a key to natsorted? It prints every input to natsorted, one at a time, before fast_int is run on it. The last one printed before the segfault should be the input causing the problem.

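import sys
from natsort import natsorted

def printer(x):
    print(x)
    sys.stdout.flush()  # make sure the line is written out before any crash
    return x

b = natsorted(your_data, key=printer)  # your_data: the list that triggers the crash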

@SethMMorton
Owner

Since you have the source code, you can also add the following before line 24 in fast_atoi.c, preferably in conjunction with the printer function suggested.

    fprintf(stdout, "fast_atoi string: %d\n", p);
    while (white_space(*p)) { p += 1; }

This should print out the string right before the problem occurs.

@glepore70
Author

I think you're making progress. The file that crashes fastnumbers that I posted above no longer crashes it. However, the larger file that the excerpt came from still crashes it. Here are the last values before the segfault:

O&795577
fast_atoi string: -1423753252
fast_atoi string: -1423641420
10305793
fast_atoi string: -1423641548
6132688
fast_atoi string: -1423642156
O&401818
fast_atoi string: -1423753228
fast_atoi string: -1423641292
10300351
fast_atoi string: -1423641420
O&366604
fast_atoi string: -1424162764
Segmentation fault (core dumped)

Thanks for working on this!

@SethMMorton
Owner

Great, this helps narrow down the possible problem. I wish that I had given you the right code to add, though. In the C function, can you change it to the following?

fprintf(stdout, "fast_atoi string: ");
fprintf(stdout, "%s\n", p);
while (white_space(*p)) { p += 1; }

I had accidentally had you use the %d format, which will print out an integer, but really I need %s which prints the string in the character array. I also think it will be helpful to know if it is the printing that causes the crash now, or if it is still searching for a space, so I separated the first part of the string from the second.

In the Python printer function, can you change print(x) to print(x, repr(x))? This should show any control characters in the string that we aren't thinking about.
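A Python 3 sketch of the modified printer (the sample strings are made up); repr() spells out tabs, NULs and other control characters that print() would show as whitespace or not at all:

```python
import sys


def printer(x):
    """Debug key for natsorted: echo each input, plus its repr() to expose hidden characters."""
    print(x, repr(x))
    sys.stdout.flush()  # get the line out before any crash
    return x


printer("O&699536")  # prints: O&699536 'O&699536'
printer("12\t34")    # prints the tab as whitespace, then '12\t34' with the tab spelled out
```

Under the Python 2.7 used in this thread, print(x, repr(x)) without `from __future__ import print_function` prints a tuple instead, which is why the output that follows has that form.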

Last, if you do this multiple times, does it always crash on the same input, or does it change from run to run?

Sorry to ask you to modify the tests again. I think we are making headway.

@glepore70
Author

Happy to help! Here's the latest output. It always crashes on this file, but on the smaller version it only crashed most of the time.

(u'16062279', "u'16062279'")
fast_atoi string: 16062279
(u'31129792', "u'31129792'")
fast_atoi string: 31129792
(u'39093001', "u'39093001'")
fast_atoi string: 39093001
(u'37693447', "u'37693447'")
fast_atoi string: 37693447
(u'O&699536', "u'O&699536'")
fast_atoi string: O&
Segmentation fault (core dumped)

Do you need the GDB output?

@SethMMorton
Owner

I imagine the GDB output won't tell anything we haven't seen before.

One thing I notice right away from the two runs is that it is not failing on the same input, but they both begin with O&. I wonder what would happen if you didn't let those strings go to fast_int...

Could you let me know if you get a crash doing either of the following?

First, try modifying the printer function to look like this:

def printer(x):
    print(x)
    sys.stdout.flush()
    return '' if x.startswith('O&') else x

This will remove any string beginning with the "bad" characters from the pool. If you don't get any crashes with that, try the following:

def printer(x):
    print(x)
    sys.stdout.flush()
    return x.replace('O&')

This will show whether we can stop the problem just by removing the leading bad characters.

@glepore70
Author

Unfortunately removing the bad characters isn't acceptable for my purposes (ditto for the nans). The data that I'm reading and sorting must remain exactly as it's written in the source file. Otherwise the output will not match the inputs. It's a government thing!

Trying either new printer function I get:

Traceback (most recent call last):
File "daeric2.py", line 375, in readCSV
result_list = natsorted(result_list, key=self.printer)# if the results are mixed text and numbers, use natural sort
File "/usr/local/lib/python2.7/dist-packages/natsort-4.0.0-py2.7.egg/natsort/natsort.py", line 234, in natsorted
return sorted(seq, reverse=reverse, key=natsort_keygen(key, alg=alg))
File "/usr/local/lib/python2.7/dist-packages/natsort-4.0.0-py2.7.egg/natsort/utils.py", line 294, in _natsort_key
val = key(val)
TypeError: printer() takes exactly 1 argument (2 given)

@SethMMorton
Owner

If you made printer part of a class, you will need to add self as part of the function definition, as in def printer(self, x):, or you should make it a @staticmethod to not need self. I think this is the origin of the new error you are seeing.

I wasn't suggesting removing the bad stuff for real, just in our debugging.
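A minimal Python 3 sketch of the @staticmethod version inside a hypothetical class; it uses sorted() so it runs without natsort installed:

```python
import sys


class Example:
    """Hypothetical stand-in for the GUI class that owns the sorting code."""

    @staticmethod
    def printer(x):
        # As a staticmethod, self.printer is a plain one-argument function.
        # Without the decorator, Python passes (self, x), which raises the
        # "takes exactly 1 argument (2 given)" TypeError (Python 2.7 wording).
        print(x)
        sys.stdout.flush()
        return x

    def sort(self, data):
        # natsorted(data, key=self.printer) accepts the key the same way.
        return sorted(data, key=self.printer)


print(Example().sort(["O&699536", "16062279"]))  # ['16062279', 'O&699536'] after echoing each input
```

Declaring it as def printer(self, x): works just as well; the staticmethod form simply makes it clear that the key does not use the instance.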

@glepore70
Author

Ahh! I see. Would that also apply to the replace code? I think so (was getting TypeError: replace() takes at least 2 arguments (1 given)). I added it there as well and got:

12138003
Traceback (most recent call last):
File "daeric2.py", line 375, in readCSV
result_list = natsorted(result_list, key=self.printer)# if the results are mixed text and numbers, use natural sort
File "/usr/local/lib/python2.7/dist-packages/natsort-4.0.0-py2.7.egg/natsort/natsort.py", line 234, in natsorted
return sorted(seq, reverse=reverse, key=natsort_keygen(key, alg=alg))
File "/usr/local/lib/python2.7/dist-packages/natsort-4.0.0-py2.7.egg/natsort/utils.py", line 294, in _natsort_key
val = key(val)
File "daeric2.py", line 539, in printer
return x.replace(self, 'O&')
TypeError: coercing to Unicode: need string or buffer, Example found

@SethMMorton
Owner

Sorry, it should be x.replace('O&', ''), since we need to replace the string with something.

@glepore70
Author

I should have seen that, sorry. I fixed that line and the file processed successfully! So it's something about the O& that's causing the problem?

@SethMMorton
Owner

That's what it looks like.

As a temporary workaround, can you try the following?

a = natsorted(your_data, key=lambda x: x.replace("&", "$"))

This will replace all ampersands with dollar signs. These are close together on the ASCII table ($ is 36 and & is 38, with only % between them), so it shouldn't change the sort order unless your data also contains % or $, but it might prevent this seg fault. This might get you by while I figure out the seg fault.
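'$' and '&' are code points 36 and 38 in ASCII, with '%' (37) between them, so the swap only changes the order relative to strings that have '%' or '$' at the same position. A quick check in Python 3 with made-up strings:

```python
print([(ch, ord(ch)) for ch in "$%&"])  # [('$', 36), ('%', 37), ('&', 38)]


def amp_to_dollar(x):
    # The same key as the lambda above, given a name for reuse.
    return x.replace("&", "$")


# No '%' or '$' in the data: the swap leaves the order untouched.
ids = ["O&699536", "O#1", "O'2", "16062279"]
print(sorted(ids) == sorted(ids, key=amp_to_dollar))  # True

# With a '%' in the same position, the order flips.
mixed = ["A&1", "A%1", "A'1"]
print(sorted(mixed))                     # ['A%1', 'A&1', "A'1"]
print(sorted(mixed, key=amp_to_dollar))  # ['A&1', 'A%1', "A'1"]
```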

@glepore70
Author

Hmm....

fast_atoi string: O$
fast_atoi string: 795367
fast_atoi string: O$
fast_atoi string: 718174
fast_atoi string: 37490261
fast_atoi string: 37529450
fast_atoi string: 35570246
fast_atoi string: O
fast_atoi string: 1062485
fast_atoi string: 35241067
fast_atoi string: O$
Segmentation fault (core dumped)

@glepore70
Author

Ran the code in gdb again and got a different segfault:

fast_atoi string: 11082136
fast_atoi string: 37593519
fast_atoi string: 12005032
fast_atoi string: T!
[New Thread 0xa97fab40 (LWP 26397)]
[New Thread 0xb4351b40 (LWP 26386)]
[New Thread 0xb3b50b40 (LWP 26385)]

Program received signal SIGSEGV, Segmentation fault.
0xb7e102f4 in _IO_vfprintf_internal (s=0xb7f85e80 <_IO_2_1_stdout_>, format=, ap=0xbfffcbbc "4\300۬\360\064") at vfprintf.c:2039
2039 vfprintf.c: No such file or directory.

@SethMMorton
Owner

Ok, so it is related to having to split the string before sending to fast_int. I will try to get a VM to replicate this. Thanks for your help.

In the meantime, you can uninstall fastnumbers to avoid the segfault.

@glepore70
Author

No worries, I still have several weeks before initial deployment. Thanks for working so hard on this.

@glepore70
Author

I installed Kubuntu in a virtualbox (and wasn't that fun) but was unable to reproduce the problem, using the same code and data file as on my machine. The versions of Kubuntu were both 15.04.

@SethMMorton
Owner

Huh... that doesn't give me much hope that I will be able to reproduce.

It's not clear to me whether the problem originates in my C code or somewhere else. Internally, natsort uses re.findall to split your input into numbers and non-numbers, and sends this split list to fast_int from fastnumbers to do the conversion. So it's not clear to me whether the failure is because I am not handling this input correctly, or because re.findall is handing over poorly formed strings to parse. It is also entirely possible that some third problem is causing this. Without being able to reproduce it, I am not sure how I will solve the problem.
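Roughly, a natural-sort key splits each string into runs of digits and non-digits and converts the digit runs to integers. A simplified Python 3 sketch of that idea (an assumption about the general technique, not natsort's actual implementation, which hands the chunks to fastnumbers' fast_int rather than int()):

```python
import re

# Runs of ASCII digits, or runs of anything else (re.ASCII restricts \d to 0-9).
_CHUNK = re.compile(r"\d+|\D+", re.ASCII)


def natural_key(s):
    """Simplified natural-sort key: alternating str and int chunks, always starting with a str."""
    key = [int(c) if c[0] in "0123456789" else c for c in _CHUNK.findall(s)]
    if key and isinstance(key[0], int):
        # Leading number: pad with "" so position 0 is always a str and
        # Python 3 never has to compare a str with an int.
        key.insert(0, "")
    return tuple(key)


print(natural_key("O&699536"))  # ('O&', 699536)
values = ["O&795577", "16062279", "6132688", "O&401818"]
print(sorted(values, key=natural_key))
# ['6132688', '16062279', 'O&401818', 'O&795577']
```

Because findall alternates digit and non-digit runs, every key has strings in even positions and integers in odd positions, so any two keys compare element by element without mixing types.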

@glepore70
Author

Understood. I'll try to re-install everything and see if I can get my system like the virtualbox I set up. I'll let you know what happens. Thanks for working on this.

@SethMMorton
Owner

You didn't happen to be using any special arguments to natsort like LOCALE, did you?

@glepore70
Author

Nothing but:

result_list = natsorted(result_list)

I'll fiddle around some more with this when I get a chance.

@glepore70
Author

I re-created the crash on a Kubuntu 15.04 virtualbox image. I've saved the box as a .ova file, which you should be able to download and open in virtualbox. Please email me at greg@rhobard.com and I will give you the download address of the .ova file and some brief instructions on reproducing the error. Thanks!

SethMMorton added a commit that referenced this issue Jun 4, 2015
When dealing with unicode input, the python object needs to be
converted to a bytes object before being converted to a character
array. Previously, fastnumbers was relying on the python object
remaining in memory when dealing with character arrays because a
strcpy was not performed.  Because extracting the character array from
unicode requires a temporary python object which is quickly
de-referenced, this is not a safe technique; the segfault is rare
because the memory of a de-referenced object is usually not reused
right away, so the character array typically still holds its old
contents.

To solve this issue, all character arrays are now explicitly copied with
strcpy. This required modification of the conversion functions to
free the character array memory before returning from fastnumbers.

This resolves issue #2, and most likely resolves issue #1.
@SethMMorton
Owner

I would like to award @glepore70 the "Best Bug Reporter" imaginary internet award for taking the time to create a virtual machine image of the system on which the segfault occurs and sending it to me to debug. I don't imagine many users would go through the hassle to fix the problem... they would just uninstall and move on. Thanks so much!

@SethMMorton
Owner

The segfault was related to making a bad assumption when dealing with character arrays.

The Python C-API to get a char* from a string/bytes object is varied, but the simplest version looks a bit like the following:

if (PyBytes_Check(input)) {
    str = PyBytes_AS_STRING(input);
}

Note this is just a straight pointer assignment, no strcpy call is done. As long as the input object is not deleted and str is being used as read-only, this is a fairly safe strategy. The problem arises when the input is not string/bytes, but unicode:

if (PyUnicode_Check(input)) {
    temp_bytes = PyUnicode_AsEncodedString(input, "ascii", "strict");
    if (temp_bytes != NULL) {
        str = PyBytes_AS_STRING(temp_bytes);
        Py_DECREF(temp_bytes);   // <-- Uh-Oh!
    }
}

To extract the char* the unicode object must first be converted to bytes. This bytes object is only temporary: usually nothing else holds a reference to it, so the Py_DECREF drops its reference count to zero and it is deallocated* immediately, after which the str pointer no longer points to anything meaningful. When one tries to access the dangling str, a segfault can happen.

The interesting thing is that deallocation does not wipe or unmap the memory right away, so most of the time the old characters are still sitting at the address str points to for the duration of the fastnumbers function call. Things only go wrong when that memory has already been reused or returned to the operating system by the time fast_atoi reads it. Apparently, this is a rare event, since I was unable to reproduce the segfault on my machine, and none of my Travis-CI runs had a segfault either.
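In CPython, an object is deallocated as soon as its reference count reaches zero, without waiting for the gc module's periodic collection. A small CPython-specific check (other implementations such as PyPy free objects on their own schedule); `Temp` is a hypothetical stand-in, because bytes objects cannot be weak-referenced:

```python
import gc
import weakref


class Temp:
    """Stand-in for the temporary bytes object."""


gc.disable()  # switch off the cyclic collector to show it plays no part here
try:
    obj = Temp()
    ref = weakref.ref(obj)
    assert ref() is obj
    del obj        # the only reference goes away: refcount hits zero
    print(ref())   # None in CPython: the object was freed immediately
finally:
    gc.enable()
```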

The solution to this problem is to force fastnumbers to take ownership of the contents of str (i.e. make a strcpy call), and not rely on Python keeping it alive for the duration of the function call:

if (PyBytes_Check(input)) {
    PyBytes_AsStringAndSize(input, &s, &s_len);
    str = malloc((size_t)s_len + 1);
    strcpy(str, s);
} else if (PyUnicode_Check(input)) {
    temp_bytes = PyUnicode_AsEncodedString(input, "ascii", "strict");
    if (temp_bytes != NULL) {
        PyBytes_AsStringAndSize(temp_bytes, &s, &s_len);
        str = malloc((size_t)s_len + 1);
        strcpy(str, s);  // <-- Now I own the contents of str
        Py_DECREF(temp_bytes);  // <-- Now not a problem
    }
}

The only caveat now is that str must be freed at some point, so I had to do a bit of rework of my other code to ensure a free(str) call was made before returning to Python.
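Borrowing a pointer versus owning a copy has a rough pure-Python analogy (an analogy only; Python itself never leaves a dangling pointer): a memoryview shares the memory of the object it came from, while bytes() makes an independent copy.

```python
buf = bytearray(b"O&699536")  # memory owned by another object (think: the temporary bytes)
borrowed = memoryview(buf)    # like the raw char*: a view onto that memory, no copy
owned = bytes(buf)            # like malloc + strcpy: an independent copy

buf[0:2] = b"XX"              # the owner reuses its memory (a same-length write is allowed)

print(bytes(borrowed))  # b'XX699536': the view shows whatever is there now
print(owned)            # b'O&699536': the copy is unaffected
```

The price of owning the copy is the same in both worlds: someone has to manage its lifetime, which in the C version is the free(str) call.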

I will merge this with master tonight and make an official release to PyPI.


*Calling Py_DECREF reduces the reference count of the object. In CPython, an object whose reference count reaches 0 is destroyed (i.e. deallocated) right away; the garbage collector in the gc module is only needed to clean up reference cycles.

SethMMorton added a commit that referenced this issue Feb 6, 2017