UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 0-1: illegal UTF-16 surrogate #43

Closed
MNWPRO opened this issue Oct 23, 2017 · 14 comments
@MNWPRO

MNWPRO commented Oct 23, 2017

I don't know why this happens. I need your help.

@williballenthin
Owner

Hi, @MNWPRO

To triage this issue, I need more details about the data you are trying to parse and the method you are using to parse it. Please share the script you're using. If possible, please also share the source data, if it's not sensitive.

@MNWPRO
Author

MNWPRO commented Oct 24, 2017

@williballenthin
Oh, I'm sorry.
I used the script evtx_dump.py. The attached 123.zip contains the EVTX file, which was generated by sysmon.exe, a logging tool from Microsoft. The following is a link to the tool:
123.zip

@MNWPRO
Author

MNWPRO commented Oct 24, 2017

@williballenthin
I am Chinese and my English is not good, so please forgive me. I hope you can understand the content above.

@williballenthin
Owner

@MNWPRO thanks for the additional details. I've added a regression test to this project so that the problem is easy to reproduce. Next, I'll try to figure out the source of the bug.

@williballenthin
Owner

@MNWPRO can you use the Windows Event Viewer to display event number 508 from the Sysmon log? I can see that there is some encoded data, possibly in Chinese, but I'm not sure what it's supposed to be. If you can include a screenshot here, that would be a big help.

@MNWPRO
Author

MNWPRO commented Oct 31, 2017

image
In the picture, the Chinese text translates into English as:
"This event is incorrect because the format of the base XML is incorrect. The following is the original text of the event."
I'm sorry that I couldn't get back to you sooner.
@williballenthin

@MNWPRO
Author

MNWPRO commented Oct 31, 2017

Could this be a problem with Sysmon? If that's the case, it's Microsoft's own fault.
@williballenthin

@MNWPRO
Author

MNWPRO commented Oct 31, 2017

image
This is a screenshot of the same event under another ID. It does contain strange characters, which look meaningless, at least in my opinion.
@williballenthin

@williballenthin
Owner

Yes, this looks like an issue with Sysmon or Microsoft. It seems that invalid data was provided to the event log, or that the log has become corrupt in some other way. Unfortunately, I'm not sure that this Python tool can do anything to fix it. I'd recommend registering an exception handler when processing the logs so that you can continue working even if you encounter corrupt entries.
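
The exception-handler suggestion above could look roughly like the sketch below. It is not code from this project: `render_records` and `dump_evtx` are hypothetical helper names, and the only python-evtx calls assumed are the ones visible in `evtx_dump.py` (`Evtx(path)` as a context manager, `log.records()`, and `record.xml()`).

```python
import sys


def render_records(records, render):
    """Yield (index, text, error) per record; text is None if rendering failed.

    Hypothetical helper: `render` is any callable that turns a record into
    text and may raise UnicodeDecodeError on corrupt string data.
    """
    for index, record in enumerate(records):
        try:
            text = render(record)
        except UnicodeDecodeError as exc:
            # Report the corrupt entry and keep going with the next record.
            yield index, None, exc
            continue
        yield index, text, None


def dump_evtx(path):
    """Print every record that renders; report the ones that do not."""
    # Imported here so the helper above can be used without python-evtx.
    import Evtx.Evtx as evtx

    with evtx.Evtx(path) as log:
        for index, text, error in render_records(log.records(), lambda r: r.xml()):
            if error is None:
                print(text)
            else:
                print("skipping record at index %d: %s" % (index, error),
                      file=sys.stderr)
```

Calling `dump_evtx("Sysmon.evtx")` (the file name is only an example) would then print the readable records and log the corrupt ones to stderr instead of stopping at the first failure.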

williballenthin self-assigned this Oct 31, 2017
@williballenthin
Owner

Please feel free to continue the discussion, but I'll close this issue as there's nothing to be done by this project.

@YetteNiu

Hi all, I got a similar error. I used Spyder (Anaconda) to read some Excel files containing Chinese characters into a DataFrame and got the following error:
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 6-7: unexpected end of data
Has either of you solved this issue, and could you help me with this error? Thanks in advance.

@john-corcoran

Just chiming in that I've encountered the same issue. From checking output from Microsoft Log Parser, it looks like the events that cause the exception are legitimate but contain either corruption or just unexpected special characters.

Would it be possible to show as much of the failing event as possible, and just replace any corrupted or special characters?

Stack traces are as follows:

Python 2.7 on Ubuntu 18.04:

Traceback (most recent call last):
  File "evtx_dump.py", line 42, in <module>
    main()
  File "evtx_dump.py", line 37, in main
    print(record.xml())
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Evtx.py", line 481, in xml
    return e_views.evtx_record_xml_view(self)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 204, in evtx_record_xml_view
    return render_root_node(record.root())
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 191, in render_root_node
    return render_root_node_with_subs(root_node, subs)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 176, in render_root_node_with_subs
    rec(c, acc)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 166, in rec
    sub = render_root_node(sub.root())
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 191, in render_root_node
    return render_root_node_with_subs(root_node, subs)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 176, in render_root_node_with_subs
    rec(c, acc)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Views.py", line 159, in rec
    sub = escape_value(sub.string())
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/Nodes.py", line 1118, in string
    return self._string().rstrip("\x00")
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/BinaryParser.py", line 211, in explicit_length_handler
    return f(offset, length)
  File "/home/user/.local/lib/python2.7/site-packages/Evtx/BinaryParser.py", line 490, in unpack_wstring
    return bytes(self._buf[start:end]).decode("utf16")
  File "/usr/lib/python2.7/encodings/utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 0-1: illegal UTF-16 surrogate

Python 3.6 on Ubuntu 18.04:

Traceback (most recent call last):
  File "evtx_dump.py", line 42, in <module>
    main()
  File "evtx_dump.py", line 37, in main
    print(record.xml())
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Evtx.py", line 481, in xml
    return e_views.evtx_record_xml_view(self)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 204, in evtx_record_xml_view
    return render_root_node(record.root())
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 191, in render_root_node
    return render_root_node_with_subs(root_node, subs)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 176, in render_root_node_with_subs
    rec(c, acc)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 166, in rec
    sub = render_root_node(sub.root())
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 191, in render_root_node
    return render_root_node_with_subs(root_node, subs)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 176, in render_root_node_with_subs
    rec(c, acc)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 126, in rec
    rec(child, acc)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Views.py", line 159, in rec
    sub = escape_value(sub.string())
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/Nodes.py", line 1118, in string
    return self._string().rstrip("\x00")
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/BinaryParser.py", line 211, in explicit_length_handler
    return f(offset, length)
  File "/home/user/.local/lib/python3.6/site-packages/Evtx/BinaryParser.py", line 490, in unpack_wstring
    return bytes(self._buf[start:end]).decode("utf16")
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 0-1: illegal UTF-16 surrogate
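
Both traces end in a strict `.decode("utf16")` call. One way to "replace any corrupted / special characters", as asked above, is Python's standard `errors="replace"` handler, which substitutes U+FFFD for byte sequences that are not valid UTF-16. The sketch below is an illustration only, not a change this project has made: `decode_wstring` is a hypothetical name, and decoding as little-endian `utf-16-le` is an assumption based on the codec named in the Python 3 error message.

```python
def decode_wstring(raw, errors="replace"):
    """Decode UTF-16-LE bytes; invalid sequences become U+FFFD by default.

    Hypothetical helper, not part of python-evtx. Pass errors="strict" to
    get the original behaviour (raise UnicodeDecodeError on bad input).
    """
    return bytes(raw).decode("utf-16-le", errors=errors)


# A lone high surrogate (U+D800) followed by "A": invalid UTF-16.
sample = b"\x00\xd8A\x00"

try:
    decode_wstring(sample, errors="strict")
except UnicodeDecodeError as exc:
    print("strict decoding fails:", exc)

# Lenient decoding keeps the readable part of the string.
print(repr(decode_wstring(sample)))
```

The trade-off is that replacement hides the original bytes, so for forensic work it may be preferable to log the raw bytes of the affected field alongside the lenient output.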

@RedCode-X

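Hi all, I also ran into a similar error. I used Anaconda Spyder to read some Excel files containing Chinese characters into a DataFrame and received the following error:
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 6-7: unexpected end of data
I was wondering whether either of you has solved this problem and could help me with this error? Thanks in advance.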

I have the same problem with Excel.
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 166-167: unexpected end of data

@nannapanenir

Is there any solution for this?
