Crash when non-ascii characters are used in symbols or sonames #173

rhelmot · 2018-01-04T02:07:07Z

At the bottom of every read from memory is the conversion from bytes to string with .decode('ascii'). This fails extremely loudly when there's a byte above 0x7f. Strings like this can occur naturally, e.g. in ELF files found on Android systems. To reproduce, just copy libc.so.6 and replace the libc.so.6 soname text with libc\xffso.6 or whatever.

Two possible solutions:

1. return bytes instead of string
2. replace s.decode('ascii') with ''.join(chr(c) for c in s)
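A minimal sketch of the failure and of the two options above, using plain Python 3 and no pyelftools calls. The soname bytes are the hypothetical ones from the reproduction step, and the variable names are only for illustration.

```python
# Hypothetical soname bytes, as in the reproduction described above.
raw = b"libc\xffso.6"

# Strict ASCII decoding raises on the 0xff byte.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Option 1: hand the bytes back untouched and let the caller decide.
as_bytes = raw

# Option 2: map each byte to the code point with the same value.
# This relies on Python 3, where iterating over bytes yields ints,
# and gives the same result as raw.decode("latin-1").
as_text = "".join(chr(c) for c in raw)

print(ascii_failed, as_bytes, repr(as_text))
```

Option 2 never raises and is reversible with .encode('latin-1'), but multi-byte UTF-8 names would come out as several unrelated characters rather than the intended one.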


eliben · 2018-01-04T12:35:52Z

Using bytes makes sense to me, but there are a couple of gotchas to consider - one is Python 2 vs. 3 compatibility (pyelftools supports both from the same codebase), another is readelf compatibility (how does readelf show these when printed out).

Patches welcome :)

rhelmot · 2018-02-22T03:37:10Z

Here's a better, non-artificial testcase: clang will accept valid utf-8 files as input, and will accept unicode characters as part of symbols, encoding the symbol names in the elf as utf-8. Here's the source file, the compiled file, and a pyelftools script that will crash while trying to read the symbols. utf_elf.zip

readelf itself will not crash but it will be extremely unhappy about the situation. The version of it on one machine printed out <CE> (only half-correct, the full utf-8 is CE 94 IIRC), another version printed �, and another seems like it printed a line feed but not a carriage return. That might be due to terminal issues, though.

Probably the best thing to do is to just utf-8 decode, since it won't break anything that wasn't already broken and there's no better standard for how to interpret a stream of bytes without an encoding...
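A rough illustration of why switching to UTF-8 should not break names that already worked: every pure-ASCII byte string decodes to the same text under both codecs. The byte strings below are made up for the example. The last part shows that strict UTF-8 decoding still raises on a lone 0xff byte; errors='replace' is one standard way to avoid that, though the thread does not say whether the eventual fix uses it.

```python
# Hypothetical symbol name: U+0394 (GREEK CAPITAL LETTER DELTA) is
# encoded in UTF-8 as the two bytes CE 94, followed here by ASCII "x".
name = b"\xce\x94x"
decoded = name.decode("utf-8")

# Pure-ASCII names decode identically under both codecs.
ascii_only = b"printf"
same = ascii_only.decode("utf-8") == ascii_only.decode("ascii")

# A lone 0xff byte is not valid UTF-8, so strict decoding still raises.
bad = b"libc\xffso.6"
try:
    bad.decode("utf-8")
    raised = False
except UnicodeDecodeError:
    raised = True

# errors="replace" substitutes U+FFFD for the undecodable byte instead.
lossy = bad.decode("utf-8", errors="replace")

print(repr(decoded), same, raised, repr(lossy))
```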

rhelmot mentioned this issue Jan 4, 2018

loading android system libs angr/cle#103

Closed

eliben added the patches-welcome label Jan 4, 2018

rhelmot mentioned this issue Feb 22, 2018

Convert all ascii decoding to utf-8 decoding #182

Merged

eliben closed this as completed in #182 Feb 23, 2018

junsooo mentioned this issue Apr 21, 2018

Fixed several bugs in read_settings function in metadata.py gereeter/hsdecomp#2

Merged

rhelmot mentioned this issue Feb 14, 2020

Python 3: Some lables are bytes others are str. #188

Open
