v0.52.0
- [TeamMsgExtractor #444] Fix typo in string that prevented HTML body from generating from the plain text body properly.
- Adjusted the behavior of
MSGFile.areStringsUnicode
to prioritize the property specified by the parent MSG files for MSG files that are embedded. Additionally, added a fallback to rely on whether or not there is a stream using the001F
type to determine the property value if it is entirely missing. - Adjusted
OleWriter.fromMsg()
andMSGFile.export()
to add the argumentallowBadEmbed
which helps to correct a few different issues that may appear in embedded MSG files. These corrections allow the embedded file to still be extracted and to open properly in Outlook. - In addition to the above, the errors that some of those corrections will suppress are now significantly more informative about what went wrong.
v0.51.1
- Add class type added in last version to known class types.
v0.51.0
- [TeamMsgExtractor #401] Add basic support for MSG class type
IPM.SkypeTeams.Message
.
v0.50.1
- [TeamMsgExtractor #434] Fix bug introduced in previous version.
v0.50.0
- [TeamMsgExtractor #432] Adjust html header code to replace non-ascii characters with escaped versions. Also adjusted plain text to html conversion to ensure non-ascii character from the body are encoded to escaped values to be safe.
- Made some corrections to
NullDate
.
v0.49.0
- [TeamMsgExtractor #427] Adjusted code for converting time stamps to create null dates for any time stamp beyond a certain point. The point was determined to be close to the existing null dates.
- [TeamMsgExtractor #425] Added basic support for custom attachments that are Windows Metafiles.
- Changed tolerance of bitmap custom attachment handler to allow for attachments with only a CONTENT stream. This change was made after seeing an example of a file that only had a CONTENT stream and no other streams for the custom data. The code now also tries to create default values for things previously determined from those other streams.
- Fixed an issue in
tryGetMimetype
were the code didn't properly check if the data type was bytes (it only checked if it had a type). - Corrected some exports.
- Added new
ErrorBehavior
valueCUSTOM_ATTACH_TOLERANT
to allow skipping checks for unused data that is normally validated.
v0.48.7
- [TeamMsgExtractor #420] Fixed typo introduced in last version.
v0.48.6
- [TeamMsgExtractor #417] Fixed issues with
openMsg
where some corrupted MSG files could end up throwing an uncaught exception and leaving the file handle open.
v0.48.5
- [TeamMsgExtractor #414] Fixed typo in
message_signed_base.py
.
v0.48.4
- [TeamMsgExtractor #411] Fix console script throwing error due to changed console args not defaulting.
v0.48.3
- [TeamMsgExtractor #409] Added missing private method to
SignedAttachment
. - Fixed some missing typing information.
v0.48.2
- Fixed bugs with
MessageBase.asEmailMessage()
. Numerous improvements to how it handles the data.
v0.48.1
- Added an option (
-s
,--stdin
) to the command line to take an MSG file from stdin. This allows the user to pipe the MSG data from another program directly instead of having to write a middleman that uses theextract-msg
library directly or having to write the file to the disk first. - Changed main function to allow for manual argument list to be passed to it.
- Added attributes to
AttachmentBase
for creation and modification time. These can be accessed throughcreatedAt
orcreationTime
andlastModificationTime
ormodifiedAt
. - Changed
OleWriter
tests to output the name of the test file being done if an error occurs. - Added tests for some command line stuff.
v0.48.0
- Adjusted error handling for named properties to handle critical streams being missing and to allow suppression of those errors.
- Adjusted error handling for named properties to allow silencing of errors caused by invalid references to the name stream. If
ErrorBehavior.NAMED_NAME_STREAM
is provided to theMSGFile
instance, a warning will be logged and that entry will simply be dropped. - Adjusted error handling for signed messages to better check for issues with the signed attachment. This should make errors from violating the standard much easier to understand. These errors can be ignored, but the attachment will not be parsed as a signed attachment.
- Minor docstring updates.
- Minor adjustments to
OleWriter
to prepare the code for being able to write version 4 files. Version 3 files are currently the only one's supported, but much of the code had hard-coded values that could be replaced with variables and small conditionals. This will have very little performance impact, and should not be noticeable. - Improved comments on
OleWriter
to make private sections more understandable. - Changed
MessageSignedBase._rawAttachments
toMessageSignedBase.rawAttachments
to provide non-private access in a reliable way.
v0.47.0
- Changed the public API for
PropertiesStore
to improve the quality of its code. The properties type is now mandatory, and the intelligence field (and the related enum) has been removed.- Additionally, the
toBytes
and__bytes__
methods will both generate based on the contents of this class, allowing for new properties to be created and for existing properties to be modified or removed if the class is set to writable on creation.
- Additionally, the
- Added new method
Named.getPropNameByStreamID
. This method takes the ID of a stream (or property stream entry) and returns the property name (as a tuple of the property name/ID and the property set) that is stored there. ReturnsNone
if the stream is not used to store a named property. This name can be directly used (if it is notNone
) to get theNamedPropertyBase
instance associated. This method is most useful for people looking at the raw data of a stream and trying to figure out what named property it refers to. - Fixed mistake in struct definitions that caused a float to require 8 bytes to unpack.
- Added tests for
extract_msg.properties.props
. - Added basic tests for
extract_msg.attachments
. - Added validation tests for the
enums
andconstants
submodule. - Removed unneeded structs.
- Fixed an issue where
PtypGuid
was being parsed by the wrong property type. Despite having a fixed size, it is still a variable length property. - Fixed all of the setters not working. I didn't know they needed to use the same name as the getter, and I swear they were working at some point with the current setup. Some sources online suggested the original form should work, which is even stranger.
- Unified all line endings to LF instead of a mix of CRLF and LF.
- Changed enums
BCTextFormat
andBCLabelFormat
toIntFlag
enums. The values that exist are for the individual flags and not the groups of flags. - Made
FieldInfo
writable, however it can no longer be directly converted to bytes since it requires additional information outside of itself to convert to bytes. It still retains atoBytes
method, however it requires an argument for the additional data. - Fixed
UnsupportedAttachment
inverting theskipNotImplemented
keyword argument. - Fixed
DMPaperSize
not being a subclass ofEnum
. - Extended values for
DMPaperSize
. - Removed unneeded structs.
- Fixed exports for
extract_msg.constants.st
. - Updated various parts of the documentation to improve it and make it more consistent.
- Fixed
Recipient.existsTypedProperty
andAttachmentBase.existsTypedProperty
having the wrong return type. - Removed "TODO" markers on
OleStreamStruct
and finalized it to only handle the OLEStream for embedded objects. - Fixed type annotations for
extract_msg.utils.fromTimeStamp
. - Added new function
extract_msg.properties.prop.createNewProp
. This function allows a new property to be created with default data based on the name. The name MUST be an 8 character hex string, the first 4 characters being the value for the property ID and the second 4 being the value for the type. - Fixed
VariableLengthProp.reservedFlags
returning the wrong type. - Adjusted many of the property structs to make it easier to use them for packing.
- Renamed some of the property structs to be more informative.
- Fixed
ServerID
using the wrong struct (would have thrown an exception for not having enough data). - Unified signing for integer properties. All integer properties are considered unsigned by default when read directly from the MSG file, however specific property accessors may convert to signed if the property is specifically intended to be so. This does not necessarily include properties that are stored as integers but have a value that is not an integer, like
PtypCurrency
. - Changed
extract_msg.constants.NULL_DATE
to be a subclass ofdatetime.datetime
instead of just a fixed value. Functions that return null dates may end up returning distinct versions ofNullDate
(the newly created subclass), however all instances ofNullDate
will register as being equal to each other. Existing equality comparisons to theNULL_DATE
constant will all function as intended, howeveris
checks may fail for some null dates. - Changed currency type to return a
Decimal
instance for higher precision. - Made
FixedLengthProp
andVariableLengthProp
writable. Only the propertyflags
fromPropBase
is writable. This also includes the ability to convert them to bytes based on their value. - Fixed many issues in
VariableLengthProp
regarding it's calculation of sizes. - Removed
hasLen
function. - Changed style to remove space between variable name and colon that separates the type.
- Corrected
InvaildPropertyIdError
toInvalidPropertyIdError
. - Updated to
olefile
version 0.47. - Updated
RTFDE
minimum version to 0.1.1. - Changed dependency named for
compressed-rtf
to remove minor typo (it should forward to the correct place regardless, but just to be safe). - Fixed issues with implementation of
OleWriter
when it comes to large sets of data. Most issues would fail silently and only be noticeable when trying to open the file. If your file is less than 2 GB, you would likely not have notices and issues at all. This includes adding a new exception that is throw from thewrite
method,TooManySectorsError
. - Fixed some issues with
OleWriter
that could cause entries to end up partially added if the data was an issue.
v0.46.2
- Adjusted typing information on regular expressions. They were using a subscript that was added in Python 3.9 (apparently that is something the type checker doesn't check for), which made the module incompatible with Python 3.8. If you are using Python 3.9 or higher a version check will switch to the more specific typing.
v0.46.1
- [TeamMsgExtractor #394] Fix typo in props that caused the wrong number of bytes to be given to a struct.
v0.46.0
- [TeamMsgExtractor #95] Adjusted the
overrideEncoding
property ofMSGFile
to allow automatic encoding detection. Simply set the property to the string"chardet"
and, assuming thechardet
module is installed, it will analyze a number of the strings to try and form a consensus about the encoding. This will ignore the specified encoding only if if successfully detects. Otherwise it will log a warning and fall back to the default behavior. - [TeamMsgExtractor #387] Changed
extract_msg.utils.decodeRfc2047
to not throw decoding errors if the content given is not ASCII. - [TeamMsgExtractor #387] Changed header parsing policy to
email.policy.compat32
to prevent partial parsing of quoted header fields. - [TeamMsgExtractor #388] Updated documentation of
MSGFile.export
to specify that updated fields on anMSGFile
instance (and it's subclasses) will not be reflected in the result of the function. Many of the functions do use the newest version of a cached_property, but this is not one of them. - Removed methods deprecated in
v0.45.0
. - Changed the base class of
EntryID
from no base class toabc.ABC
. - Added
position
property toEntryID
to tell how many bytes were used to create theEntryID
. - Added a number of properties to
MSGFile
from [MS-OXCMSG]. - Moved some properties down to
MessageBase
from it's subclasses. - Added support for Journal objects.
- Changed internal code of
PermanentEntryID
to correctly parse the data. Previously the distinguished name did not actually end at the null character, instead ending at the end of the bytes provided. If there was trailing data, it would be captured inadvertently. - Finished definition for
StoreObjectEntryID
. - Added new keyword arguments for MSG files:
dateFormat
anddatetimeFormat
. These allow the user to easily override the strings being used for format dates and dates that include a time component, respectively.- In unifying all the formats into 2 options, you may notice that some will look a bit different starting from this version, as there was an unfortunately large amount of variation.
- Fixed code for
MessageBase.parsedDate
which could have incorrect values. - Fixed issues with
MessageBase.date
and related things either being incorrectly documented or doing things that are not specified by the documentation. It was supposed to have been changed to usedatetime
objects, but it was still using strings. - Removed unused function
extract_msg.utils.isEmptyString
. - Removed unused function
extract_msg.utils.properHex
. - Added a
getStream
helper function toCustomAttachmentHandler
. - Added a
getStreamAs
helper function toCustomAttachmentHandler
. - Added new custom attachment handler for journal-associated attachments.
- Changed
EntryID.autoCreate
to returnNone
if givenNone
or empty bytes. - Changed
EntryID.autoCreate
to raise aFeatureNotImplemented
exception if no valid entry ID class is found. - Fix typing annotations for
CustomAttachmentHandler
. - Removed unneeded
imapclient
dependency. - Changed
getJson
to have values be null if they aren't found rather than an empty string. - Implemented the
getJson
method correctly for a number of classes. - Changed
Task.percentComplete
to always return a float. - Changed the
NotImplementedError
for custom attachment handler not being found toFeatureNotImplemented
. Additionally, changed the error message to specify the CLSID found on the attachment to better enable people to report issues. - Changed code for
Recipient
andMessageBase
that makes it rely onMessageBase.recipientTypeClass
to determine the class to use for therecipientType
property. Adjusted the typing ofRecipient
to have it reflect the type that will be used. - Correctly changed the returned value for
ResponseStatus.fromIter
to actually return a List instead of a set. - Filled out typing information for a significant portion of the module where variables or functions were missing it. This includes the entirety of the constants submodule.
- Corrected a number of minor issues.
- Extended values for
DVAspect
enum. - Added new enums to go with parsing for
OLEPresentationStream
. - Changed
NNTPNewsgroupFolderEntryID.newsgroupName
to bytes instead of string since it is ANSI. - Fixed an issue that would cause headers to fail to parse properly if the header text starts with "Microsoft Mail Internet Headers Version 2.0" which is common on some MSG files. This is fixed by stripping that from the beginning before actually parsing the text. This is to circumvent CPython issue #85329, confirmed to still exist in at least some of the supported Python versions.
- Added
listDir
andslistDir
as methods toAttachmentBase
,Recipient
, andNamed
. These always exclude the prefix, returning as if their directory is the root of the object. This allows the named to be directly used for accessing those files. - Numerous spelling fixes in docstrings, comments, and exceptions.
- Reduced the amount of initialization performed by
MessageBase
. Much of this initialization was there from before a lot of stuff changed tocached_property
and a number of internal variables were being used. Now all of the relevant variables will be initialized by the way they are accessed. - Added new exception
DependencyError
. - Changed the errors for missing optional dependencies from
ImportError
toDependencyError
. - Removed all instances of the
rawData
property in favor of thetoBytes
method. For now, many of these will simply return the raw data used, specifically those that are still unmodifiable. Any whose properties have the ability to be modified will have properly implemented versions. These classes also allowNone
to be passed as the value for their data, which will be the default if no arguments have been passed to the constructor. If no arguments orNone
is given as the data, it will create a new instance with default values. This is all in an effort to move towards the ability to create new MSG files and theMSGWriter
class. AlltoBytes
methods will either exclusively returnbytes
or will returnNone
to specify that the structure isn't valid to convert to bytes. Structures that may be invalid will be annotated asOptional[bytes]
for the return type.- Additionally, these objects will also support the
__bytes__
method. If the object returned is not bytes, the method will throw aTypeError
.
- Additionally, these objects will also support the
- Removed the individual
PropBase
flag properties and changed the mainflags
property to return an enum containing the flags. - Changed various data structs to allow modification and creation of new instances for writing to an MSG file.
- Changed
TZRule
to use unsigned values where applicable. - Changed
TZRule
to require the 14 null bytes (I commented it out completely on accident instead of swapping it to a plain read). It now logs a warning about the bytes not being null. - Removed unneeded function
windowsUnicode
. - Moved
FixedLengthProperty.parseType
to the private API. This was not intended for external use anyways, so leaving it as public API didn't make sense. - Fixed check for type in
ContactAddressEntryID
being the wrong value. - Modified
inputToBytes
to support objects with the__bytes__
method. If the method exists and works then it will be used as a last resort. - Modified
OleWriter
to accept objects with a__bytes__
method for the data to use for an entry. - Added
__bytes__
method toMSGFile
. This is equivalent to callingMSGFile.exportBytes
.
v0.45.0
- BREAKING: Changed parsing of string multiple properties to remove the trailing null byte. This will cause the output of parsing them to differ.
- Updated typing information for some functions and classes.
- Fixed a bug with
MessageSignedBase.attachments
that would cause it to return None instead of an empty list if the number of normal attachments was 0 was the error behavior was set to ignore violations of the standard. - Updated
MessageSignedBase.attachments
to usefunctools.cached_property
instead ofproperty
. - Fixed spelling errors in some exception strings.
- Made
NamedPropertyBase
a subclass ofabc.ABC
. - Cleaned up some of the code for named properties to remove unused variables and remove inefficient code.
- Changed
PropBase
to be a subclass ofabc.ABC
. - Added detailed versioning info to the README.
- Deprecated many private functions, including methods on many of the classes. Of primary note are
_getStream
and_getStringStream
, which have been moved to the public API asgetStream
andgetStringStream
. Any deprecated functions still exist and will forward to a public API function if they are not being removed. Additionally, all internal usage of them has been removed. This change is one of the big preparations that is needed for the1.0.0
release.- As mentioned, a number of these deprecated functions have been moved to the public API. It is recommended that you run tests with your code after enabling deprecation warnings to see what should be changed.
- Removed items deprecated in or before
0.42.0
. - Changed the API for the private method
_genRecipient
. This is not intended for use outside of the module except for subclasses. The change removed the allowance of ints for the second argument, requiring that it be a valid enum type. - Convert many enum types to
IntEnum
. - Extended functionality of
PropertiesStore
to allow for integer property names and getting a property based on just the ID. You can also get a list of all properties that use a given ID. - Added new function
PropertiesStore.getProperties
which gets a list of all properties matching the property ID. Return type is a list ofPropBase
instances. - Added new function
PropertiesStore.getValue
which looks for the first matchingFixedLengthProp
and returns the value from it. - Improved internal code related to getting a property with a potentially unknown type.
- Added a number of entirely new functions to the public API on
MSGFile
,AttachmentBase
,PropertiesStore
, andRecipient
objects:getMultipleBinary
: Gets a multiple binary property as a list ofbytes
objects.getSingleOrMultipleBinary
: A combination ofgetStream
andgetMultipleBinary
which prefers a single binary stream. Returns a singlebytes
object or a list ofbytes
objects.getMultipleString
: Gets a multiple string property as a list ofstr
objects.getSingleOrMultipleString
: A combination ofgetStringStream
andgetMultipleString
which prefers a single string stream. Returns a single bytes object or a list of bytes objects.getPropertyVal
: Shortcut forinstance.props.getValue
that allows new behavior to be added by overriding it.getNamedProp
: Shortcut forinstance.namedProperties.get((propertyName, guid), default)
that allows new behavior to be added by overriding it.
- Removed
Named._getStringStream
andNamed.sExists
. The named properties storage will always use regular streams and not string streams. - Changed all
Named
methods to no longer have a prefix argument. The prefix should always be false sense the named property mapping will only exist in the top level directory. - Adjusted
tryGetMimeType
to allows any attachments whosedata
property would return abytes
instance. - Changed internal code to use public API functions wherever possible. This includes making many private API functions use calls to the public API for getting bits of data.
- Fixed potential issue with
AttachmentBase.clsid
which had the potential to cause some attachments to fail to generate a CLSID. - Outright removed or changed a significant portion of the private API. I have rarely, if ever, seen references to these parts, so this should cause you no issues. Some of these have also been moved to the public API, either identically or with changes, and the mapping is as such:
_getNamedAs
->getNamedAs
: Changed to always require a conversion argument. If you were previously using it to plainly get a named property or to handle the properly being None or a real value, you should use the return value ofgetNamedProp
instead._getPropertyAs
->getPropertyAs
: Same as above, usegetPropertyVal
instead for None or plain access._getStreamAs
->getStreamAs
,getStringStreamAs
: Once again, see above. UsegetStream
andgetStringStream
, respectively.
v0.44.0
- Fixed a bug that caused
MessageBase.headerInit
to always returnFalse
after the 0.42.0 update. - Changed
MessageBase.headerInit
to a property. - Fixed
extract_msg.utils.__all__
. - Minor reorganization within
extract_msg/utils.py
. - Minor changes to docstrings.
- Minor README updates.
- Fix issue with folded header fields decoding incorrectly when given to
extract_msg.utils.decodeRfc2047
.
v0.43.0
- [TeamMsgExtractor #56] [TeamMsgExtractor #248] Added new function
MessageBase.asEmailMessage
which will convert theMessageBase
instance, if possible, to anemail.message.EmailMessage
object. If an embedded MSG file on aMessageBase
object is of a class that does not have this function, it will simply be attached to the instance as bytes. - Changed imports in
message_base.py
to help with type checkers. - Changed from using
email.parser.EmailParser
toemail.parser.HeaderParser
inMessageBase.header
. - Changed some of the internal code for
MessageBase.header
. This should improve usage of it, and should not have any noticeable negative changes. You man notice some of the values parse slightly differently, but this effect should be mostly suppressed.
v0.42.2
- Fix bug in
AttachmentBase.mimetype
that would cause it to throw an error when accessed. This bug was introduced inv0.42.0
.
v0.42.1
- Fixed some constants being accessed with the wrong name (names were changed in reorganization).
- Removed unused regular expression.
v0.42.0
- [TeamMsgExtractor #372] Changed the way that the save functions return a value. This makes the return value from all save functions much more informative, allowing a user to separate if a file or folder (or if more than one) was saved from the function. It also guarantees that all classes from this module will return the relevant path(s) if data is actually saved.
- [TeamMsgExtractor #288] Added feature to allow attachment save functions to simply overwrite existing files of the same name. This can be done with the
overwriteExisting
keyword argument from code or the--overwrite-existing
option from the command line. - [TeamMsgExtractor #40] Added new submodule
custom_attachments
. This submodule provides an extendable way to handle custom attachment types, attachment types whose structure and formatting are not defined in the Microsoft documentation for MSG files. This includes a handler to at least partially cover support for Outlook images. - [TeamMsgExtractor #373] Added the
encoding
submodule for encoding tasks, including proper support for Microsoft's implementation of CP950. This gets added to the codecs list as "windows-950".- Added infrastructure to make it easy to add variable-byte (up to two bytes) encodings and single-byte encodings.
- Added the following encodings:
- windows-874
- x-mac-ce
- x-mac-cyrillic
- x-mac-greek
- x-mac-icelandic
- x-mac-turkish
- Fixed an issue in the save functions that left the possibility for the zip files to not end up closing if the save function created it and then had an exception.
- Added new property
AttachmentBase.clsid
which returns the listed CLSID value of the data stream/storage of the attachment. - Changed internal behavior of
MSGFile.attachments
. This should not cause any noticeable changes to the output. - Refactored code significantly to make it more organized.
- Changed the exports from the main module to only include an important subset of the module. For other items, you'll have to import the submodule that it falls under to access it. Submodules export all important pieces, so it will be easier to find.
- This includes having many modules be under entirely new paths. Some of these changes have been done with no deprecation, something I generally try to avoid. This is happening at the same time as the public API is significantly changing, which makes it more acceptable.
- Fixed
__main__
using the wrong enum for error behavior. - Fixed
Named.get
being severely out of date (it's not used anywhere by the module which is why it wasn't noticed). - Fixed
Named.__getitem__
being entirely case-sensitive. - Switched much of the internal code (and the
treePath
property of all classes that have it) to usingweakref.ReferenceType
to avoid hard cyclic references. - Fixed
Recipient._getTypedStream
never returning a value. - Added additional type hints in various places.
- Modified tests.py to only run if it is run as a file instead of imported.
- Changed
knownMsgClass
to a private function since it is explicitly not being exported by any part of the module. - Removed unused function
getFullClassName
. - Fixes to the HTML body when saving as HTML will no longer require the
preparedHtml
/--prepared-html
option. - Removed unused exceptions.
- Entirely reorganized the way attachments are initialized, including the class that will be used in various circumstances. Embedded MSG files, custom attachments, and web attachments will all use dedicated classes that are subclasses of
AttachmentBase
.- With this change, the way to specify a new
Attachment
class is to override the function used when creating attachments. This can be done by passingattachmentInit = myFunction
as an option toopenMsg
. This function MUST return an instance ofAttachmentBase
.
- With this change, the way to specify a new
- Added first implementation of web attachments. Saving is not currently possible, but basic relevant property access is now possible. Saving will not be stopped by this attachment if
skipNotImplemented = True
is passed to the save function. - Changed the option to suppress
RTFDE
errors to fall under theErrorBehavior
enum. Usage of the original option will be allowable, but is being marked as deprecated. However, it is still a dedicated option from the command line.- Also fixed the option not properly ignoring some
RTFDE
errors, specifically the ones that it is normal for the module to throw.
- Also fixed the option not properly ignoring some
- Removed some constants that are not used by the module.
- Updated to support
RTFDE
version0.1.0
. Users encountering random errors from that module should find that those errors have disappeared. If you get errors from it still, bring up the issue on their GitHub. - Fixed bug that would cause weird behavior if you gave an empty string as the path for an MSG file.
- Added support for
IPM.StickyNote
. - Fixed an issue that would cause MSG file to never close if an error happened during any of the
__init__
functions for MSG classes. - Removed unneeded
chardet
dependency. - Removed
Contact.__init__
as it didn't provide any unique behavior. - Changed the documentation of
openMsg
to specify that it accepts all options recognized byMSGFile
subclasses, allowing the doc string to not be modified every time one of them is changed.- Changed the documentation of various
__init__
methods to do the same thing.
- Changed the documentation of various
- Added
dataType
property toAttachmentBase
andSignedAttachment
for checking the class that the data will be, if accessible. ReturnsNone
if the data is inaccessible, including because accessing it would throw an exception. - Added new enum
InsecureFeatures
and optioninsecureFeatures
. This option will allow certain features with security implications to be used for files that you trust. Currently the only feature it supports is the usage ofPIL
/Pillow
to open and modify images. All features like this will be opt-in to reduce possible vulnerabilities. - Modified all custom exceptions the module uses to derive from a single base class for better organization.
- Added new exceptions to handle some of the situations previously handled by base Python exceptions.
- Changed internal handling of the
prefix
option forMSGFile.__init__
(and thereforeopenMsg
). If you are not setting this manually, you should notice little difference. - Made enums less strict and converted all using
fromBits
to beIntFlag
enums. - Fixed
CalendarBase.keywords
being blatantly incorrect (it was so bad I don't know how it slipped through). - Fixed
Contact.gender
being blatantly incorrect. - Fixed sender not being properly decoded in some circumstances.
- Changed behavior of
MSGFile
to haveolefile
raise defects of typeDEFECT_INCORRECT
and above instead of justDEFECT_FATAL
. Uncaught issues ofDEFECT_INCORRECT
can often cause the module to have parsing issues that may be misleading, this just ensures the issue is clarified. This behavior can be reverted back to the previous withErrorBehavior.OLE_DEFECT_INCORRECT
. - Fixed potential issues that may have made is possible for certain attachments to ignore filename conflict resolution code.
v0.41.5
- Fixed an issue from version
0.41.3
where the header being present but missing theFrom
field would cause an exception.
v0.41.4
- Fixed an issue in the last version that would break the decoding function if the contents were not encoded.
- Updated
tzlocal
and allow future updates forcompressed_rtf
andebcdic
.
v0.41.3
- [TeamMsgExtractor #365] Fixed an issue that would cause certain values retrieved from the header to not be decoded properly. It does this when retrieving the values, so nothing about the header has been changed.
- Added new property
MessageBase.headerText
which is the text content of the header stream. Adjusted other things to use this instead of trying to retrieve the stream directly in multiple places. - Added typing to
MessageBase.header
.
v0.41.2
- Updated annotations on
MessageBase.save
. - Added new enum
BodyTypes
. - Added property
MessageBase.detectedBodies
for detecting what bodies have been stored (not generated by the module) in the .msg file.
v0.41.1
- [TeamMsgExtractor #362] Fixed an issue with the removal of the
--dev
option missing one of the checks (I swear I actually tested it).
v0.41.0
- [TeamMsgExtractor #357] Fixed an issue where the properties stream being absent would raise an error message that was not clear.
- [TeamMsgExtractor #357] Added a way to suppress
StandardViolationError
for poorly created files. This may cause issues you might not expect since this exception is meant to stop the processing for a reason. - [TeamMsgExtractor #223] Finally got around to dealing with signed attachments that are embedded MSG files.
SignedAttachment.data
now returns eitherbytes
orMSGFile
.SignedAttachment
also now has aasBytes
property that will return the bytes that created the signed attachment, regardless of if it is an MSG or not, making it unnecessary to callMSGFile.exportBytes
to get the bytes of the embedded MSG file, which can add a small delay to your code. AsAttachment
is a much more complex class, it does not (at least yet) have this property. Also unlikeAttachment
,SignedAttachment
will not throw an exception if the data is an MSG file but is not supported. Instead, it will simply be logged as a exception, but the code will continue. If the data is successfully read as an embedded MSG file, theAttachmentType
will beAttachmentType.SIGNED_EMBEDDED
. - Deprecated
AttachErrorBehavior
in favor of the newErrorBehavior
enum which controls the error behavior for the various parts of the MSG file. Uses of the former will work until the next major version. - Deprecated the
attachmentErrorBehavior
parameter ofMSGFile
in favor oferrorBehavior
. Uses of the previous will work until the next major version. - Added
treePath
property toSignedAttachment
to bring it more in line withAttachmentBase
. - Fixed bug in rare logging message caused by incorrect type name.
- Removed the
dev
anddev_classes
submodules, as most of their features are possible using the base code. Additionally, the classes involved because significantly outdated over time. - Removed the
--dev
argument from the command line. - Changed the message for the standards violation error when an attachment has no specified attachment type and it could not be determined automatically. The error now specifies the path of the attachment that had an issue for easier debugging. Additionally, the log message for it simply not being present has also been changed to show this information.
- Added a dev level log that will output the entire properties mapping if the attachment type is not set. Dev level is 5.
- Fixed a critical error in
Properties
that caused the__contains__
method to always beFalse
. This occurred because it was missing areturn
statement. Fortunately, it looks like only one part of the module was affected due to other parts using a properly written function. - Removed some debug prints that slipped through.
- Changed some parts to use
in
for checking that a property exists as opposed tohas_key
. The function was there to act more like Python 2. - Deprecated
Properties.has_key
. - Removed the
validation
submodule and all related references. It was pretty outdated and has minimal usage at this point in time. It may come back at some later point. - Changed behavior of
MessageBase.save
so it doesn't save raw when an exception occurs. This behavior may have ended up creating unexpected output which is why it was removed. It was mainly there for debugging in the first place, but is no longer necessary. - Added
__contains__
function toNamed
class. - Fixed missing import in
message_base.py
that would only cause problems if something was wrong with the HTML. - Fixed a bad error message appearing when trying to add or use an entry in
OleWriter
using an empty path. - Changed the
InvalidFileFormatError
for a missing property stream to aStandardViolationError
. - Changed the exception messages for a few exceptions to fix typos and clarify.
- Fixed issues in
_rtf.tokenize_rtf
which would cause an exception to handle incorrectly and throw an unclear error. - Fixed various small bugs caused by typos.
- Clean up unneeded imports.
- Improved existing
__all__
entries and added some where they should be. - Removed default export of the
UnrecognizedMSGTypeError
exception in favor of exporting the exceptions module. - Removed default export of
properHex
. - Improved type checking in many places.
- Fixed issues in
RecurrencePattern
.
v0.40.0
- [TeamMsgExtractor #338] Added new code to handle injection of text into the RTF body. For many cases, this will be much more effective as it relies on ensuring that it is in the main group and past the header before injection. It is not currently the first choice as it doesn't have proper respect for encapsulated HTML, however it will replace some of the old methods entirely. Solving this issue was done through the use of a few functions and the internal
_rtf
module. This module in it's entirety is considered to be implementation details, and I give no guarantee that it will remain in it's current state even across patch versions. As such, it is not recommended to use it outside of the module. - Changed
MessageBase.rtfEncapInjectableHeader
andMessageBase.rtfPlainInjectableHeader
fromstr
tobytes
. They always get encoded anyways, so I don't know why I had them returning asstr
. - Updated minimum Python version to 3.8 as 3.6 has reached end of support and 3.7 will reach end of support within the year.
- Updated information in
README
.
v0.39.2
- Fixed issues with
AttachmentBase.name
that could cause it to generate wrong. - Added convenience function
MSGFile.exportBytes
which returns the exported version fromMSGFile.export
as bytes instead of writing it to a file or file-like object.
v0.39.1
- [TeamMsgExtractor #333] Fixed typo in a warning.
- [TeamMsgExtractor #334] Removed
__del__
method fromMSGFile
. It was there for cleanup, but wasn't planned well enough to stop it from causing issues. It may be reintroduced in the future if I can manage to remove the issues. - [TeamMsgExtractor #335] Fixed some parts of
extract_msg.utils.getCommandArgs
having invalid logic after a previous (rather old) update that caused exceptions when using certain options. - Added new property
treePath
toAttachmentBase
andMSGFile
(which adds it to nearly every class). This property is the path to the current instance, represented as a tuple of instances that would be used to get to the current instance. - Added sphinx documentation.
- Fixed an issue in
OleWriter
that would produce corrupted OLE files if the DIFAT needed more than the header.
v0.39.0
- [TeamMsgExtractor #318] Added code to handle a standards violation (from what I can tell, anyways) caused by the attachment not having an
AttachMethod
property. The code will log a warning, attempt to detect the method, and throw aStandardViolationError
if it fails. - [TeamMsgExtractor #320] Changed the way string named properties are handled to allow for the string stream to have some errors and still be parsed. Warnings about these errors will be logged.
- [TeamMsgExtractor #324] Fixed an issues with contact saving when a list property returns
None
. - [TeamMsgExtractor #326] Fixed a bug that could cause some files to error when exporting.
- Fixed an issue where creation and modification times were not being copied to the new OLE file created by
OleWriter
. - Fixed up a few docstrings.
- Fixed a few issues in
MSGFile
regarding thefilename
keyword argument. - Added new argument
rootPath
toOleWriter.fromOleFile
for saving a specific directory from an OLE file instead of just copying the entire file. That directory will become the root of the new one. - Adjusted code for
OleWriter
to generate certain values only at save time to make them more dynamic. This allows for existing streams to be properly edited (although has issues with allowing storages to be edited). - Added new function
OleWriter.deleteEntry
to remove an entry that was already added. If the entry is a storage, all children will be removed too. - Added new function
OleWriter.editEntry
to edit an entry that was already added. - Added new function
OleWriter.addEntry
to add a new entry to the writer without anOleFileIO
instance. Properties of the entry are instead set using the same keyword arguments as described inOleWriter.editEntry
. - Changed
_DirectoryEntry
toDirectoryEntry
to make the more finalized version public. Access to the originals that theOleWriter
class creates should never happen, instead copies should be returned to ensure the behavior is as expected. - Added new function
OleWriter.getEntry
which returns a copy of theDirectoryEntry
instance for that stream or storage in the writer. Use this function to see the current internal state of an entry. - Added new function
OleWriter.renameEntry
which allows the user to rename a stream or storage (in place). This only changes it's direct name and not it's location in the new OLE file. - Added new function
OleWriter.walk
which is similar toos.walk
but for walking the structure of the new OLE file. - Added new function
OleWriter.listItems
which is functionally equivalent toolefile.OleFileIO.listdir
which returns a list of paths to every item. Optionally a user can get the paths just for streams, just for storages, or both. Requesting neither will simply return an empty list. Default is to just return streams. - Added a small amount of path validation to
inputToMsgPath
which is used in a lot of places where user input for a path is accepted. It ensures illegal characters don't exist and that the path segments (each name for a storage or stream) are less than 32 characters. This will be most helpful forOleWriter
. - Added many internal helper functions to
OleWriter
to make extensions easier and consolidate common code. Many of these involve direct access to internal data which is why they are private.
v0.38.4
- Fix line in
OleWriter
that was causing exporting to fail. - Fixed some issues with the
README
.
v0.38.3
- Fixed issues in HTML generation that caused line breaks to be omitted from large sections of the text.
- Fixed issues with requirements file.
v0.38.2
- Fixed new
NameError
accidentally introduced in the previous version.
v0.38.1
- Added a
__del__
method toMSGFile
to ensure a bit of proper cleanup should all references to anMSGFile
instance be removed before the file is closed.OleFileIO
doesn't appear to have one, so it's up to us to ensure it is properly closed. Note that thedel
keyword does not guarantee the immediate deletion of the object, and you should take care to close the file yourself. If this is not possible, importing thegc
module and using it'scollect
method will free the files if your code has no references to them. - Fixed an import issue with signed messages.
v0.38.0
- [TeamMsgExtractor #117] Added class
OleWriter
to allow the writing of OLE files, which allows for embedded MSG files to be extracted. - Added function
MSGFile.export
which copies all streams and storages from an MSG file into a new file. This can "clone" an MSG file or be used for extracting an MSG file that is embedded inside of another. - Added hidden function to
MSGFile
for getting theOleDirectoryEntry
for a storage or stream. This is mainly for use by theOleWriter
class. - Added option
extractEmbedded
toAttachment.save
(--extract-embedded
on the command line) which causes embedded MSG files to be extracted instead of running their save methods. - Fixed minor issues with
utils.inputToMsgPath
(renamed fromutils.inputToMsgPath
). - Renamed
utils.msgpathToString
toutils.msgPathToString
. - Made some of the module requirements a little more strict to better version control. I'll be trying to make periodic checks for updates to the dependency packages and make sure that new versions are compatible before changing the allowed versions, while also trying to keep the requirements a bit flexible.
v0.37.1
- Added option to save function (including
MSGFile.saveAttachments
) to skip any attachments marked as hidden. This can also be done from the command line with--skip-hidden
. - Fixed an issue that could cause an
MSGFile
to be included in the attachment names instead of just strings. - Fixed up several comments.
- Fixed bug that would cause spaces at the end of the subject or filename to break the module.
v0.37.0
- [TeamMsgExtractor #303] Renamed internal variable that wasn't changed when the other instances or it were renamed.
- [TeamMsgExtractor #302] Fixed properties missing (a fatal MSG error) raising an unclear exception. It now uses
InvalidFileFormatError
. - Updated
README
to contain documentation on command line option added in previous version. - Removed
MSGFile.mainProperties
after deprecating it in v0.36.0. - Added new
Properties
Intelligence
type:ERROR
. This type is used when a properties instance is created but has something wrong with it that is not necessarily fatal. Currently the only thing that will cause it is the properties stream being 0 bytes.
v0.36.5
- Documented and exposed to the command line the ability to save the headers to it's own file when saving the msg file. Thanks to martin-mueller-cemas on GitHub for fixing this (and telling me I can switch the branch a pull request points to). I don't know when I added the code to allow the user to do it, but somehow it was never documented and never exposed to the command line.
v0.36.4
- [TeamMsgExtractor #291] Fixed typo in
MSGFile.saveRaw
that may have existed for a significant amount of time. It was using the wrong function (same name, but with different capitalization) but was hidden untilMSGFile
stopped being derived fromOleFileIO
. - Added logging code to
MessageBase.getSavePdfBody
to log the list that is going to be used to runwkhtmltopdf
. This is mainly for debugging purposes, to allow users to potentially see why their arguments may be failing. - Updating funding information on GitHub and the
README
with more ways to support the module's development. - Fixed one of the exceptions in
MessageBase.getSavePdfBody
not using an fstring which caused it to omit information. - Changed the way
wkhtmltopdf
is called to patch a possible security vulnerability. This also seems to have fixed [TeamMsgExtractor #291].
v0.36.3
- Added an option to skip the body if it could not be found, rather than throwing an error. This will cause no file to be made for it in the event no valid body exists. For the save functions, this option is
skipBodyNotFound
and from the command line the option is--skip-body-not-found
. - Fixed a bug that caused contacts to save the business phone with two colons instead of 1.
v0.36.2
- [TeamMsgExtractor #286] Fix missing import.
v0.36.1
- [TeamMsgExtractor #283] Added file for typing recognition.
- Added new option to allow messages with
NotImplementedError
attachments to at least save everything not related to them. From the command line this would be--skip-not-implemented
or--skip-ni
. From the save function, this would be theskipNotImplemented
option.
v0.36.0
- Improved type hints to tell when a function may not return anything.
- Added support for the
reportTag
property toMessageBase
. I noticed this was one of the properties forIPM.Outlook.Recall
so I decided to implement it. I'll work on ensuring all of[MS-OXOMSG]
and[MS-OXCMSG]
are implemented at a later date, including splitting offREPORT
into it's own class, because it is it's own class. - Fixed a few minor issues in some properties. Mostly some
bool
returning properties should have been returning False when they were not found instead of None. Ones that may return None are specifically typed as optional in the source code. Unfortunately using thehelp
command doesn't seem to show the return type for properties for me at least on Python 3.9 and below. - Fixed
Attachment.save
returning apathlib.Path
object instead of astr
after the conversion topathlib
. - Added code to allow
pathlib
objects inutils.openMsgBulk
. It usesglob.glob
which cannot take apathlib
object. - Removed
PermanentEntryID
fromEntryID.autoCreate
. It shares a provider ID withAddressBookEntryID
, and as such would never generate from it anyways. If you specifically need aPermanentEntryID
you will have to instantiate it manually. Additionally, the entry has been removed fromenums.EntryIDType
. - Fixed the GUIDs for some of the EntryID structures being returned as bytes instead of a standard GUID string. This is considered a breaking change and is the reason for the full version increase.
- Improved consistency of properties (the Python kind, not the MSG kind) so that properties for the raw data used to create an instance will all use the same name. Not all classes have this, but the ones that do will now all use
rawData
as the property name. - Fixed the properties header being read in a way that actually swapped the values in it.
- Modified
MSGFile
to use the same name for Properties instances as all other classes.MSGFile.mainProperties
will currently raise aDeprecationWarning
instead of outright failing to help ease the transition. UseMSGFile.props
instead.
v0.35.3
- [TeamMsgExtractor #280] Fix typing issue in
message_base.py
.
v0.35.2
- Made a change to the argument handling for
--no-folders
. Since it requires--attachmentsOnly
to work, I simply made it error when it's not given to avoid confusion. - Updated README.
- Added
-v
to be used instead of--verbose
. This allows you to do-vvv
for verbosity level 3.
v0.35.1
- Fixed a few property conflicts that I missed in the last release (forgot to run the helper script before releasing).
v0.35.0
- [TeamMsgExtractor #206] Implemented full support for Post objects, including the ability to save them.
- [TeamMsgExtractor #212] Implemented full support for Task objects, including the ability to save them. This also includes TaskRequest objects.
- [TeamMsgExtractor #110] Implemented full support for Contact objects, including the ability to save them.
- [TeamMsgExtractor #143] Rewrote the system used for
Appointment
objects to include all of the objects specified in [MS-OXOCAL]. Name changed toAppointmentMeeting
. Completed support for Appointment objects, including the ability to save them. - [TeamMsgExtractor #243] Added optional dependency
mimetype-magic
(installable using themime
extra) which helps to identify attachments that do not give a mime-type. - [TeamMsgExtractor #274] Apparently the properties stream can have random garbage at the end of it (outlook generate the file that showed this) so code was added to ensure it wouldn't break everything.
- [TeamMsgExtractor #274] Made sure that the save function would report if it failed to find or generate a valid body. Specifically, if you were just trying to use the plain text body but it didn't exist (the stream didn't exist, not that the stream was empty) it would silently pass, which was bad behavior. Additionally,
allowFallback
will change the message to specify that current options were not usable for getting a valid body. - [TeamMsgExtractor #207] Changed behavior for max date. Apparently it looks like it is supposed to be August 31, 4500 at 11:59 PM. However, in case this needs to change, we have created a constant called
extract_msg.constants.NULL_DATE
to represent this that you can use in your code to not have to worry about changing your code if we check it. - Moved a few more minor constants to
enums
. - Added support for many internal data structures, specifically Entry ID structures.
- Refactored classes from
extract_msg.data
to submoduleextract_msg.structures
. - Added
python_requires
to setup.py as I noticed that it was missing. - Due to new saving requirements, adjusted the way header injection worked all around. Functions are now built-in to
MessageBase
.getSaveXBody
functions have also been moved down to be defined inMessageBase
. If the extension class needs to specify custom behaviors for creating the save bodies, these functions will need to be overridden.- For saving,
MessageBase
(being the lowest one to currently contain bodies) has a few new properties. These properties represent the injection strings that will be injected into the bodies for the header, with an additional property to specify what properties map to what part of the format string. SeeMessageBase.headerFormatProperties
for more information and an example of how to implement this in your own class. - Injection strings in constants have been removed in favor of dynamic generation, which only creates what is needed. No you will no longer see an empty Bcc field in your messages when you save them.
- Plain text bodies now also use this injection, making it easy to change the header in all bodies by overwriting a single property that tells the program what data to put where.
- For saving,
- Fixed issue in encapsulated RTF header that caused the "To" field to not be present. I had to write them by hand, so it was bound to happen. They are now dynamically generated by each instance, so these fields should always appear.
- All save code has been moved down from
Message
intoMessageBase
for convenience.Message
exists now for specific checking and for future specializations. This also means that anything that is aMessageBase
now has the entire framework for saving built-in, with easy way to change details. - Fixed bad property in
Contact
. - Created save function for
Contact
. Saving, though it exists, is rather minimal and is limited to plain text and HTML. - Significantly extended the
Contact
class's properties. - Adjusted the naming of a few
Contact
properties to better match the microsoft names.firstName
->givenName
.lastName
->surname
.businessPhone
->businessTelephoneNumber
- Etc.
- Changed existing fax properties to give a dictionary of the properties they actually contain rather than just the number. This makes them behave like the newly added email properties.
- Fixed issues with
Task
properties being incorrect. - Added implementation for PtypErrorCode.
- Changed behavior of
Properties.date
to only return the submit time. This is to ensure messages that were never sent do not have a sent date. - Changed behavior of
MessageBase.date
to only return a send date if the message has been sent. For messages with no flags, it assumesTrue
. - Generally brought saving behavior closer to the way outlook handles it.
- Made
SignedAttachment
andBaseAttachment
more similar by adding properties to each that are shared.BaseAttachment
now have aname
property andSignedAttachment
now havelongFilename
andshortFilename
. - Fixed issue in HTML saving that would cause some characters to be dropped when rendering them due to how the header injection worked.
- Removed
__init__
methods from MSG classes that don't change it. This ensures notes are easily passed down. - Correction to last comment, one max date was supposed to be at that date, but another max date is at a different date of the same year.
- Changed the way that
PtypTime
is handled, making it a single function inutils
. - Upgraded dependencies to newer versions (some really need to be newer, like
tzlocal
, for best results). Included dependencies arebeautifulsoup4
andtzlocal
. - Fixed an issue where zip file naming conflicts always failed in zip files for attachments. Both the embedded msg and the plain attachments would fail.
- Actually fixed the issue that would break the main loop.
- MSGFile no longer inherits directly from
OleFileIO
. While I would prefer to do that, the__init__
method for it is rather expensive, and allowing embedded msg files to directly share each other's instances ofOleFileIO
would improve speed immensely. - Fixed attachments not being preemptively loaded when
delayAttachments
wasFalse
. utils.openMsg
now delays attachments while loading the file to get the class type. This means all time for attachments is cut in half as they are only ever loaded once. It also means that files that won't open due to attachments will error a little later, but this shouldn't be a problem.- Changed named properties
Guid
back to constants. This has to do with the next entry. - Fixed a major issue in named properties. Apparently the ID is not enough and you must have the GUID as well, as multiple properties can share the exact same ID.
- Added option
--no-folders
to the command line allowing you to save all attachments from a set of MSG files into a single folder. - Added option
--skip-embedded
to the command line to skip saving embedded MSG files. - Added option
skipEmbedded
toAttachment.save
(and all other related save methods that call it) to skip saving an embedded MSG file. - Changed
__main__
so that it opens the zip file there instead of relying on everything it calls to do it again and again. - Changed the behavior of
--verbose
to allow it to be stacked for more verbosity. Specifying it once turns on warnings, twice for info, and three times for debug. Not specifying it only turns on error logging.
v0.34.3
- Fixed issue that may have caused other olefile types to raise the wrong type of error when passed to
openMsg
. - Fixed issues with changelog format.
- Fixed issue that caused progress to sometimes break the main loop when a file had Unicode characters if the console it was writing to didn't support them.
- Added option to
MessageBase
(and subsequentlyopenMsg
) that allows you to override the code being used for deencapsulation. SeeMessageBase.__init__
for details on how to create an override function. - Added additional msg class types that I found to the list of known class types.
v0.34.2
- [TeamMsgExtractor #267] Fixed issue that caused signed messages that were .eml files to have their data field not be a bytes instance. This field will now always be bytes. If a problem making it bytes occurs, an exception will be raised to give you brief details.
- Added function
utils.unwrapMultipart
that takes a multipart message and acquires a plain text body, an HTML body, and a list of attachments from it. These attachments are returned asdict
s that can easily be converted toSignedAttachment
s. It replaces the logicmailbits
was being used for, and as suchmailbits
is no longer required. The module may be reintroduced in the future. - Added new property
emailMessage
toSignedAttachment
, which returns the emailMessage
instance used to get data for the attachment.
v0.34.1
- Added convenience function
utils.openMsgBulk
(imported toextract_msg
namespace) for opening message paths with wildcards. Allows you to open several messages with one function, returning a list of the opened messages. - Added convenience function
utils.unwrapMsg
which recurses through anMSGFile
and it's attachments, creating a series of linear structures stored in a dictionary. Useful for analyzing, saving, etc. all messages and attachments down through the structure without having to recurse yourself. - Fixed an issue that would cause signed attachments to not properly generate.
v0.34.0
- [TeamMsgExtractor #102] Added the option to directly save body to pdf. This requires that you either have
wkhtmltopdf
on your path or that you provide a path directly to it in order to work. Simply passpdf = True
to save to turn it on. More details in the doc for the save function. You can also do this from the command line using the--pdf
option, incompatible with other body types. - Added
--glob
option for allowing you to provide an msg path that will evaluate wildcards. - Removed per-file output names as they weren't actually functional and currently add too much complexity. If anyone knows a way to handle it directly with
argparse
let me know. - Added
chardet
as a requirement to help work around an error inRTFDE
. - Looking into some of the unsupported encodings revealed at least one to be supported, but was missed due to the name not being registered as an alias.
- Fixed a bug that prevented an HTML body from the plain text body when it couldn't be generated any other way. This happened when the RTF body existed and the plain text body existed, but the RTF body was not encapsulated HTML.
- Due to several of the issues with RTFDE preventing saving due to exceptions, the option
ignoreRtfDeErrors
has been added to theMessageBase
class andopenMsg
. It can also be enabled from the command line with--ignore-rtfde
. This option will make it so all exceptions will be silenced from the module.
v0.33.0
- [TeamMsgExtractor #212] Added support for Task objects.
- Added additional parameter
overrideClass
to all_ensureSetX
type functions. This parameter is a class to initialize using the data, if provided. The data will be provided as the first argument to the__init__
function of the class if the data is notNone
. If the data is done, the value set and returned from_ensureSetX
will beNone
. This can be overridden using the second added parameter, which defaults to this behavior. Simply setpreserveNone
toFalse
to force the data to be passed to the class. Keep in mind that this means the class will receiveNone
as it's first parameter. - Updated
Named
class to make it more like the dictionary it is build on top of. Specifically, addedkeys
,values
, anditems
as functions. - Fixed issue in
Named
where old code wasn't deleted, causing some named properties to be completely omitted. - Improved efficiency of
Named
by making it soget
no longer copied the entire dictionary repeatedly while looking for the property. - Fixed typo in
Attachment
that caused embedded msg files to still regenerate theNamed
instance even though they should have been using the parent's. - Updated fields in
MSGFile
to use enums. - Added additional properties to
MSGFile
. - Significant cleanup.
v0.32.0
- [TeamMsgExtractor #217] Redesigned named properties to be much more efficient. All in all, load times for MSG files are significantly improved all around.
- Named properties load dynamically.
- Named properties use a single shared instance for embedded msg files to contain all of the entries, and tiny individual instances for actually accessing based on what is accessing.
Named
has been updated so it's methods are more standardized.Named
is no longer usable to directly access property values. Instead, access to them is done throughNamedProperties
._registerNamedProperty
has been removed due to no longer being part of the named properties framework.
- Actually removed the exception I meant to remove in 0.31.0.
- Changed
Attachment.type
to an enum instead of a string. This makes it easier to see all of the possible values. - Added option to allow attachment saving to be skipped when calling
Message.save
. - Added
type
property to all attachment types.AttachmentBase
usesAttachmentType.UNKNOWN
,BrokenAttachment
usesAttachmentType.BROKEN
, andUnsupportedAttachment
usesAttachmentType.UNSUPPORTED
.
v0.31.1
- Updated signed attachment mimetype property from
mime
tomimetype
to match with the regular attachment property.
v0.31.0
- [TeamMsgExtractor #223] First implementation of support for signed messages.
- [TeamMsgExtractor #256] Extended the properties of the attachment class to allow for access to more data.
- Fixed some issues with the contact class (some of the code was left unfinished).
- Converted many arguments for msg files to keyword arguments (
**kwargs
). This is a breaking change if your code provides anything except the path to the msg file. This will, however, make it easier to add arguments without the function signature being polluted with numerous arguments. - Updated exception blocks to ensure they would only catch exceptions that are instances of Exception or a child of Exception. A few were missing this check and so might have caught unwanted exceptions.
- Shifted some minor parts down towards the
AttachmentBase
class where possible to allow more usability with broken and unsupported attachments. Nothing about these properties required the attachment to be fully processed, which is why they were moved down. From the perspective of using theAttachment
class nothing has changed. The following properties have been moved:attachmentEncoding
,additionalInformation
,cid
/contentId
,longFilename
,renderingPosition
, andshortFilename
. - Added link to readme for supporting development.
- Finally found the behavior to use for missing encoding. As such,
MissingEncodingError
is being removed from the module entirely, as it is no longer a thing. - Moved many constant sets to enums. This makes things a lot more organized.
v0.30.14
- Fixed major bug in
MSGFile.listDir
that would cause it to give the wrong data due to caching. I forgot to have it cache a different result for each set of options, so it would always give the same result regardless of options after the first access. - Minor updates to
setup.py
. - Fixed URLs in
README
.
v0.30.13
- [TeamMsgExtractor #257] Fixed missing documentation for
customPath
keyword argument toAttachment.save
.
v0.30.12
- [TeamMsgExtractor #253] Fixed various docstring issues.
v0.30.11
- [TeamMsgExtractor #249] Fixed an error with opening the body causing the file to throw an uncaught exception in the
__init__
function ofMessageBase
. The error may still come up later, but you will still have general access to the instance. - Fixed an issue with part of the command line documentation.
v0.30.10
- [TeamMsgExtractor #249] Fixed exception catching not properly accessing the exception (forgot to go through one of the submodules to access exception).
- Updated docstring for
MessageBase.deencapsulatedRtf
.
v0.30.9
- Fixed the behavior of
Properties.get
so it actually behaves like adict
(that was the intent of it, but I did it the wrong way for some reason). - Fixed a type that caused an exception when no HTML body could be found nor generated.
v0.30.8
- Update
imapclient
requirement to>=2.1.0
instead of==2.1.0
. Currently there are no changes that would prevent current future versions from working.
v0.30.7
- [TeamMsgExtractor #239] Fixed msg.py not having
import pathlib
. - After going through the details of the example MSG files provided with the module, specifically unicode.msg, I now am glad I decided to put in some fail-safes in the HTML body processing. One of them does not have an
<html>
,<head>
, nor<body>
tag, and so would have had an error. This will actually prevent the header from injecting properly as well, so a bit of validation before was made necessary to ensure the HTML saving would still work. - Added new exception
BadHtmlError
. - Added new function
utils.validateHtml
. - Updated README credits.
- Changed header logic to generate manually if the header data has been stripped (filled with null bytes) and not just if the stream does not exist.
v0.30.6
- Small adjustments to internal code to make it a bit better.
- Added
Message.getSaveBody
,Message.getSaveHtmlBody
, andMessage.getSaveRtfBody
. These three functions generate their respective bodies that will be used when saving the file, allowing you to retrieve the final products without having to write them to the disk first. All arguments that are passed toMessage.save
that would influence the respective bodies are passed to their respective functions. - I thought I added the documentation for
attachmentsOnly
toMessage.save
but apparently it was missing. Not sure what happened there but I made sure it was added this time. - Added new option to
Message.save
calledcharset
. This is used in the preparation of the HTML body when usingpreparedHtml
. This is also usable with--charset CHARSET
from the command line.
v0.30.5
- [TeamMsgExtractor #225] Added the ability to generate the HTML body from the RTF body if it is encapsulated HTML. If there is no RTF body, then it will do a very basic generation from the plain text body. This process is automatically performed if the HTML body is missing.
- Added the ability for the plain text body to sometimes generate from the RTF body if the plain text body does not exist.
- Added documentation to
Message.save
for theattachmentsOnly
option.
v0.30.4
- [TeamMsgExtractor #233] Added option to
Message.save
to only save the attachments (attachmentsOnly
). This can also be used from the command line with the option--attachments-only
. - Corrected error string for incompatible options in
Message.save
. - Fixed if blocks being nested weirdly in
Message.save
instead of being chained withelif
. Probably an artifact left from adding support for HTML and RTF.
v0.30.3
- [TeamMsgExtractor #232] Updated list of known MSG class types so the module would correctly give
UnsupportedMSGTypeError
instead ofUnrecognizedMSGTypeError
.
v0.30.2
- Fixed typo in
utils.knownMsgClass
. - Updated contributing guidelines and pull request template. Pull requests were updated to match the new structure of the module while the guidelines were updated for clarity.
v0.30.1
- [TeamMsgExtractor #102] Added property
MessageBase.htmlBodyPrepared
which provides the HTML body that has been prepared for actual use or conversion to PDF. All attachments that can be injected into the body directly will be injected. - Corrected a mistake in the documentation of
Message.save
. - Fixed issue where the command line parser was not checking for raw in conjunction with other saving options.
- Changed
utils.setupLogging
to usepathlib
. - Improved reliability of the logic in
utils.setupLogging
. - Removed function
utils.getContFileDir
. - Cleaned up plain
Exception
being raised at the end ofutils.injectRtfHeader
if the injection failed. This now raisesRuntimeError
, an error that should be reported so the injection system can be improved. - Added parsing for PtypServerId to
utils.parseType
. - Added
FolderID
,MessageID
, andServerID
todata
. - Added zip output to the command line.
- Added support for PtypMultipleFloatingTime to
utils.parseType
. - Improved documentation of many functions with the exceptions they may raise.
- Changed the way zip files are handled so that files written to them actually have a modification date now.
v0.30.0
- Removed all support for Python 2. This caused a lot of things to be moved around and changed from indirect references to direct references, so it's possible something fell through the cracks. I'm doing my best to test it, but let me know if you have an issue.
- Changed classes to now prefer
super()
over direct superclass initialization. - Removed explicit object subclassing (it's implicit in Python 3 so we don't need it anymore).
- Converted most
.format
s into f strings. - Improved consistency of docstrings. It's not perfect, but it should at least be better.
- Started the addition of type hints to functions and methods.
- Updated
utils.bytesToGuid
to make it faster and more efficient. - Renamed
utils.msgEpoch
toutils.filetimeToUtc
to be more descriptive. - Updated internal variable names to be more consistent.
- Improvements to the way
__main__
works. This does not affect the output it will generate, only the efficiency and readability.
v0.29.3
- [TeamMsgExtractor #226] Fix typo in command parsing that prevented the usage of
allowFallback
. - Fixed main still manually navigating to a new directory with
os.chdir
instead of usingcustomPath
. - Fixed issue in main where the
--html
option was being using for both html and rtf. This meant if you wanted rtf it would not have used it, and if you wanted html it would have thrown an error. - Fixed
--out-name
having no effect. - Fixed
--out
having no effect.
v0.29.2
- Fixed issue where the RTF injection was accidentally doing HTML escapes for non-encapsulated streams and not doing escapes for encapsulated streams.
- Fixed name error in
Message.save
causing bad logic. For context, the internal variablezip
was renamed to_zip
to avoid a name conflict with the built-in function. Some instances of it were missed.
v0.29.1
- [TeamMsgExtractor #198] Added a feature to save the header in it's own file (prefers the full raw header if it can find it, otherwise puts in a generated one) that was actually supposed to be in v0.29.0 but I forgot, lol.
v0.29.0
- [TeamMsgExtractor #207] Made it so that unspecified dates are handled properly. For clarification, an unspecified date is a custom value in MSG files for dates that means that the date is unspecified. It is distinctly different from a property not existing, which will still return None. For unspecified dates,
datetime.datetime.max
is returned. While perhaps not the best solution, it will have to do for now. - Fixed an issue where
utils.parseType
was returning a string for the date when it makes more sense to return an actual datetime instance. - [TeamMsgExtractor #165] [TeamMsgExtractor #191] Completely redesigned all existing save functions. You can now properly save to custom locations under custom file names. This change may break existing code for several reasons. First, all arguments have been changed to keyword arguments. Second, a few keyword arguments have been renamed to better fit the naming conventions.
- [TeamMsgExtractor #200] Changed imports to use relative imports instead of hard imports where applicable.
- Updated the save functions to no longer rely on the current working directory to save things. The module now does what it can to use hard pathing so that if you spontaneously change working directory it will not cause problems. This should also allow for saving to be threaded, if I am correct.
- [TeamMsgExtractor #197] Added new property
Message.defaultFolderName
. This property returns the default name to be used for a Message if none of the options change the name. - [TeamMsgExtractor #201] Fixed an issue where if the class type was all caps it would not be recognized. According to the documentation the comparisons should have been case insensitive, but I must have misread it at some point.
- [TeamMsgExtractor #202] Module will now handle path lengths in a semi-intelligent way to determine how best to save the MSG files. Default path length max is 255.
- [TeamMsgExtractor #203] Fixed an issue where having multiple "." characters in your file name would cause the directories to be incorrectly named when using the
useFileName
(nowuseMsgFilename
) argument in the save function. - [TeamMsgExtractor #204] Fixed an issue where the failsafe name used by attachments wasn't being encoded before hand causing encoding errors.
- MSG files with a type of simply
IPM
will now be returned asMSGFile
byopenMsg
, as this specifies that no format has been specified. - [TeamMsgExtractor #214] Attachments that error because the MSG class type wasn't recognized or isn't supported will now correctly be
UnsupportedAttachment
instead ofBrokenAttachment
. - Improved internal code in many functions to make them faster and more efficient.
openMsg
will now tell you if a class type is simply unsupported rather than unrecognized. If it is found in the list, the function will raiseUnsupportedMSGTypeError
.- Added caching to
MSGFile.listDir
. I found that if you have larger files this single function might be taking up over half of the processing time because of how many times it is used in the module. - Fully implemented raw saving.
- Extended the
Contact
class to have more properties. - Added new function
MSGFile._ensureSetTyped
which acts like the other ensure set functions but doesn't require you to know the type. Prefer to use other ensure set function when you know exactly what type it will be. - Changed
Message.saveRaw
toMSGFile.saveRaw
. - Changed
MSGFile.saveRaw
to take a path and save the contents to a zip file. - Corrected the help doc to reflect the current repository (was still on mattgwwalker).
- Fixed a bug that would cause an exception on trying to access the RTF body on a file that didn't have one. This is now correctly returning
None
. - The
raw
keyword ofMessage.save
now actually works. - Added property
Attachment.randomFilename
which allows you to get the randomly generated name for attachments that don't have a usable one otherwise. - Added function
Attachment.regenerateRandomName
for creating a new random name if necessary. - Added function
Attachment.getFilename
. This function is used to get the name an attachment will be saved with given the specified arguments. Arguments are identical toAttachment.save
. - Changed pull requests to reflect new style.
- Added additional properties for defined MSG file fields.
- Added zip file support for the
Attachment.save
andMessage.save
. Simply pass a path for thezip
keyword argument and it will create a newZipFile
instance and save all of it's data inside there. Alternatively, you can pass an instance of a class that is either aZipFile
orZipFile
-like and it will simply use that. When this argument is defined, thecustomPath
argument refers to the path inside the zip file. - Added the
html
andrtf
keywords toMessage.save
. These will attempt to save the body in the html or rtf format, respectively. If the program cannot save in those formats, it will raise an exception unless theallowFallback
keyword argument isTrue
. - Changed
utils.hasLen
to usehasattr
instead of the try-except method it was using. - Added new option
recipientSeparator
toMessageBase
allowing you to specify a custom recipient separator (default is ";" to match Microsoft Outlook). - Changed the
openMsg
function inAttachment
to not be strict. This allows you to actually open the MSG file even if we don't recognize the type of embedded MSG that is being used. - Attempted to normalize encoding names throughout the module so that a certain encoding will only show up using one name and not multiple.
- Finally figured out what CRC32 algorithm is used in named properties after directly asking in a Microsoft forum (see the thread here). Fortunately the is already defined in the
compressed-rtf
module so we can take advantage of that. - Reworked
MessageBase._genRecipient
to improve it (because what on earth was that code it was using before?). Variables in the function are now more descriptive. Added comments in several places. - Many renames to better fit naming convention:
dev.setup_dev_logger
todev.setupDevLogger
.MSGFile.fix_path
toMSGFile.fixPath
.MessageBase.save_attachments
toMessageBase.saveAttachments
.*.Exists
toexists
.*.ExistsTypedProperty
to*.existsTypedProperty
.prop.create_prop
toprop.createProp
.Properties.attachment_count
toProperties.attachmentCount
.Properties.next_attachment_id
toProperties.nextAttachmentId
.Properties.next_recipient_id
toProperties.nextRecipientId
.Properties.recipient_count
toProperties.recipientCount
.utils.get_command_args
toutils.getCommandArgs
.utils.get_full_class_name
toutils.getFullClassName
.utils.get_input
toutils.getInput
.utils.has_len
toutils.hasLen
.utils.setup_logging
toutils.setupLogging
.constants.int_to_data_type
toconstants.intToDataType
.constants.int_to_intelligence
toconstants.intToIntelligence
.constants.int_to_recipient_type
toconstants.intToRecipientType
.- Misc internal function variables.
v0.28.7
- Added hex versions of the
MULTIPLE_X_BYTES
constants. - Added
1048
toconstants.MULTIPLE_16_BYTES
- [TeamMsgExtractor #173] rewrote the parsing of the length in
VariableLengthProp.__init__
to use constants rather than values coded directly into the function. This should fix this issue.
v0.28.6
- [TeamMsgExtractor #191] This feature was never properly implemented, so it's not officially supported. However, this specific issue should be fixed. This is a temporary patch until I can get around to rewriting the way the module saves files in general.
- Added
venv
to the.gitignore
list. - Added information to the readme.
v0.28.5
- [TeamMsgExtractor #189] Forgot to import
prepareFilename
inattachment.py
. - Fixed bad link in the changelog.
v0.28.4
- [TeamMsgExtractor #184] Added code to
Message
to ensure subjects with null characters get stripped of them. - Moved code for stripping subjects of bad characters to
prepareFilename
inutils
.
v0.28.3
- Fixed minor typo in an exception description.
- Updated the README this time. Forgot to do it for at least 1 update.
v0.28.2
- Started preparing more of the code for when HTML and RTF saving are fully implemented. Please note that they do not work at all right now. Commented out the code for this because it wasn't meant to be uncommented.
- [TeamMsgExtractor #184] Added code to ensure file names don't have null characters when saving an attachment.
- Minor improvement to the section of the save code that checks if you have provided incompatible options.
- [TeamMsgExtractor #185] Added the
IncompatibleOptionsError
. It was supposed to be added a few updates ago, but was accidentally left out. - Modified
Message.save
to return the currentMessage
instance to allow for chained commands. This allows you to do something likeextract_msg.openMsg("path/to/message.msg").save().close()
where you could not before.
v0.28.1
- [TeamMsgExtractor #181] Fixed issue in
Attachment
that arose when moving some of the code to a base class. - Fixed small error in
utils.parse_type
that caused it to incorrectly compare expected and actual length. Fortunately, this had no actual effect aside from a warning. - Added the
ebcdic
module to the requirements to add more supported encodings.
v0.28.0
- [TeamMsgExtractor #87] Added a new system to handle
NotImplementedError
and other exceptions. All msg classes now have an option calledattachmentErrorBehavior
that tells the class what to do if it has an error. The value should be one of three constants:ATTACHMENT_ERROR_THROW
,ATTACHMENT_ERROR_NOT_IMPLEMENTED
, orATTACHMENT_ERROR_BROKEN
.ATTACHMENT_ERROR_THROW
tells the class to not catch and exceptions and just let the user handle them.ATTACHMENT_ERROR_NOT_IMPLEMENTED
tells the class to catchNotImplementedError
exceptions and put an instance ofUnsupportedAttachment
in place of a regular attachment.ATTACHMENT_ERROR_BROKEN
tells the class to catch all exceptions and either replace the attachment withUnsupportedAttachment
if it is aNotImplementedError
orBrokenAttachment
for all other exceptions. With both of those options, caught exceptions will be logged. - In making the previous point work, much code from
Attachment
has been moved to a new class calledAttachmentBase
. BothBrokenAttachment
andUnsupportedAttachment
are subclasses ofAttachmentBase
meaning data can be extracted from their streams in the same way as a functioning attachment. - [TeamMsgExtractor #162] Pretty sure I actually got it this time. The execution flag should be applied by pip now.
- Fixed typos in some exceptions.
v0.27.16
- [TeamMsgExtractor #177] Fixed incorrect struct being used. It should be the correct one now, but further testing will be required to confirm this.
- Fixed log error message in
extract_msg.prop
to actually format a value into the message.
v0.27.15
- [TeamMsgExtractor #177] Fixed missing import.
v0.27.14
- [TeamMsgExtractor #173] Fixed typo that I made in the last version that broke things. I didn't have the resources to test this one myself, unfortunately.
- Fixed a typo in an exception message.
v0.27.13
- [TeamMsgExtractor #173] Moved some data used in checks into constants so that I can make sure they get changed every where that they are used. Hopefully I can close this issue.
v0.27.12
- [TeamMsgExtractor #173] Made an assumption about where an exception was thrown from and was wrong. While that location would have throw an exception, the function that called that code was the one to actually throw the exception in question. This issue should be fixed...
- [TeamMsgExtractor #162] Made another attempt to fix the execution flag on the wrapper script.
v0.27.11
- [TeamMsgExtractor #173] Tentatively implemented type 0x1014 (PtypMultipleInteger64). Apparently I forgot to do it earlier.
v0.27.10
- [TeamMsgExtractor #162] Fixed line endings in the wrapper script to be UNIX line endings rather than Windows line endings. Attempted to add the execution flag to the runnable script.
v0.27.9
- [TeamMsgExtractor #161] Added commands to the command line that will allow the user to specify that they want the message data to be output to
stdout
rather than to a file. - [TeamMsgExtractor #162] Added a wrapper for extract_msg that will be installed.
- Fixed some of the encoding names to allow them to actually be used in Python. The names they previously held were not aliases that currently exist.
- Added more documentation to
constants.CODE_PAGES
to give more information about what it is. As it is a list of the possible encodings an msg file can use, I also specified which ones were supported by Python 3. - Moved the main code into a function so it is now callable from outside of the file.
v0.27.8
- [TeamMsgExtractor #158] Fixed a spelling error in a function name that was causing it to not be seen. The function was called
ceilDiv
but was accidentally called ascielDiv
.
v0.27.7
- Fixed an issue in the new bitwise adjustment functions. One of the variable names was incorrect.
v0.27.6
- Fixed a few lines in
data.py
.
v0.27.5
- Fixed an error in
utils.divide
that would cause it to drop the extra data if there was not enough to create a full division. For example, if you had a string that was 10 characters, and divided by 3, you would only receive a total of 9 characters back. - Added some useful functions that will be used in the future.
- [TeamMsgExtractor #155] Updated to use new version of
tzlocal
. - Updated changelog to fit new repository.
v0.27.4
- [TeamMsgExtractor #152] Fixed an issue where the name of an exception was put as the wrong thing.
v0.27.3
- [TeamMsgExtractor #105] Added code to fix an internal msg issue that had recipient lists being split up in the header after a certain amount of characters. This was now a bug on our part, but an issue with the generation of the msg file itself.
- Exposed the
MessageBase
class directly fromextract_msg
. I forgot to do this when I created it. - Added
MessageBase.bcc
. I think it used to exist but got erased somehow on accident. Either way, it exists now.
v0.27.2
- After much debate, I have finally decided to allow an option to override the string encoding in message files. This Was something I reserved solely for
dev_classes.Message
because it felt like it didn't fit with how msg files were supposed to work. I also didn't want messages from people about them running into errors after they overrode the encoding. You can now do this by providing theoverrideEncoding
option on anyMSGFile
class as well as theopenMsg
function. - [TeamMsgExtractor #103] Implemented correct detection of encodings. If you have any more issues with "'X' codec can't decode bytes" it is likely because the encoding specified inside the msg file is wrong.
v0.27.1
- [TeamMsgExtractor #147] Fixed an issue in
Message.save
caused by it attempting to directly access a private variable that was moved to the base classMessageBase
.
v0.27.0
- [TeamMsgExtractor #143] Added new class
Appointment
that can handle outlook appointments or meetings. - [TeamMsgExtractor #143] Added new class
MessageBase
for classes that end up being mostly like theMessage
class. Currently subclassed byMessage
andAppointment
. This should not have a direct effect on any code that uses this module. - Added support for
PtypFloatingTime
inutils.parseType
. - Added proper support for
PtypTime
inFixedLengthProp.parseType
- Added pretty print functions to the
Properties
class and theNamed
class. - Added new function
MSGFile._ensureSetProperty
that acts likeMSGFile._ensureSet
except that it works with properties from theProperties
instance. - Added equivalent of the previously mentioned function to the
Attachment
class and theRecipient
class.
v0.26.4
- Added new function
MSGFile._ensureSetNamed
which acts likeMSGFile._ensureSet
except that it works with named properties. - Added a version of the previous function to the
Attachment
andRecipient
classes. - Added new file
data.py
that contains various data structures that are used for specific properties. - Expanded the functionality of the
Recipient
class by adding more properties. - Added new functions
Named.getNamed
andNamed.getNamedValue
which retrieves a named property or the value of a named property, respectively, based on its name.
v0.26.3
- Added new function
MSGFile.save
that causes it and subclasses to raise aNotImplementedError
if they do not override it. - Fixed some issues in the changelog.
- Added some additional constants for future use.
v0.26.2
- Fixed error in
Message._registerNamedProperty
where I put the exceptionKeyError
instead ofAttributeError
.
v0.26.1
- Fixed an issue in
openMsg
that would leave the basicMSGFile
instance open with the function returned, even if the function was returning a specific msg instance.
v0.26.0
- [TheElementalOfDestruction #3] Implementation of Named properties has finally been added. This allows us to access certain data that was not available to us before through regular methods.
- Added new function
MSGFile.slistDir
that acts likeMSGFile.listDir
, except that it returns a list of strings rather than a list of lists. - [TheElementalOfDestruction #6] Added new function
MSGFile._getTypedStream
which, based on a path formatted in the same way as you would give toMSGFile._getStringStream
, will return the data in the specified stream without the user needing to know the type before hand. However, if you DO know the type before hand, you can provide this function with one of the values inconstants.FIXED_LENGTH_PROPS_STRING
orconstants.VARIABLE_LENGTH_PROPS_STRING
. - [TheElementalOfDestruction #6] Added new function
MSGFile._getTypedProperty
which, based on a 4 digit hexadecimal string, will return the property in the properties file that matches that string without the type needing to be specified. However, if you DO know the type before hand, you can provide this function with one of the values inconstants.FIXED_LENGTH_PROPS_STRING
orconstants.VARIABLE_LENGTH_PROPS_STRING
. - [TheElementalOfDestruction #6] Added new function
MSGFile._getTypedData
which is a combination of the two previously stated functions. - Added new function
MSGFile.ExistsTypedProperty
which determines if a property with the specified id exists in the specified location. If you are looking for a property that may be in the properties file of an attachment or a recipient, please use the corresponding function from that class. - Added an equivalent of the previous 4 functions for the
Recipient
andAttachment
classes. - [TheElementalOfDestruction #2] Finished partial implementation of
utils.parseType
which was necessary for the proper implementation of named properties. This function is not fully implemented because there are some types we do not fully understand.
v0.25.3
- [TeamMsgExtractor #138] Fixed missing import in
extract_msg/utils.py
.
v0.25.2
- [TeamMsgExtractor #134] Fixed a typo that caused
Message.headerDict
to raise an exception. - Upgraded code for
Message.headerDict
to avoid accidentally raising a key error if the header is ever missing the "Received" property. - Fixed an error in the changelog that caused some issue links to link to the wrong place.
v0.25.1
- [TeamMsgExtractor #132] Fixed an issue caused by unfinished code being left in the __main__ file.
- Cleaned up the imports to only be what is needed.
v0.25.0
- Added new class
MSGFile
. TheMessage
class now inherits from this. This class is the base for all MSG files, not justMessage
s. It somewhat recently came to our attention that MSG files are used for a variety of things, including the storage of contacts, leading us to the next part of the changelog. - [TeamMsgExtractor #110] Added new class
Contact
for extracting the data from MSG files storing contacts. - Added new function
openMsg
to the module to be used to open MSG files in which it is not certain what type of MSG is being opened. - Modified the
Attachment
class to use theopenMsg
function to open embedded MSG files. - Added option
delayAttachments
to theMessage
class that will stop it from initializing attachments until the user is ready. This allows users to openMessage
s that have unimplemented attachment types without having to worry about the exception stopping them. This is also an option in the newopenMsg
function.
v0.24.4
- Added new property
Message.isRead
to show whether the email has been marked as read. - Renamed
Message.header_dict
toMessage.headerDict
to better match naming conventions. - Renamed
Message.message_id
toMessage.messageId
to better match naming conventions.
v0.24.3
- Added new close function to the
Message
class to ensure that all embeddedMessage
instances get closed as well. Not having this was causing issues with trying to modify the msg file after the user thought that it had been closed.
v0.24.2
- Fixed bug that somehow escaped detection that caused certain properties to not work.
- Fixed bug with embedded msg files introduced in v0.24.0
v0.24.0
- [TeamMsgExtractor #107] Rewrote the
Messsage.save
function to fix many errors arising from it and to extend its functionality. - Added new function
isEmptyString
to check if a string passed to it isNone
or is empty.
v0.23.4
- [TeamMsgExtractor #112] Changed method used to get the message from an exception to make it compatible with Python 2 and 3.
- [TheElementalOfDestruction #23] General cleanup and all around improvements of the code.
v0.23.3
- Fixed issues in readme.
- [TheElementalOfDestruction #22] Updated
dev_classes.Message
to better match the currentMessage
class. - Fixed bad links in changelog.
- [TeamMsgExtractor #95] Added fallback encoding as well as manual encoding change to
dev_classes.Message
.
v0.23.1
- Fixed issue with embedded msg files caused by the changes in v0.23.0.
v0.23.0
- [TeamMsgExtractor #75] & [TheElementalOfDestruction #19] Completely rewrote the function
Message._getStringStream
. This was done for two reasons. The first was to make it actually work with msg files that have their strings encoded in a non-Unicode encoding. The second reason was to make it so that it better reflected msg specification which says that ALL strings in a file will be either Unicode or non-Unicode, but not both. Because of the second part, theprefer
option has been removed. - As part of fixing the two issues in the previous change, we have added two new properties:
- a boolean
Message.areStringsUnicode
which tells if the strings are Unicode encoded. - A string
Message.stringEncoding
which tells what the encoding is. This is used by theMessage._getStringStream
to determine how to decode the data into a string.
- a boolean
v0.22.1
- [TeamMsgExtractor #69] Fixed date format not being up to standard.
- Fixed a minor spelling error in the code.
v0.22.0
- [TheElementalOfDestruction #18] Added
--validate
option. - [TheElementalOfDestruction #16] Moved all dev code into its own scripts. Use
--dev
to use from the command line. - [TeamMsgExtractor #67] Added compatibility module to enforce Unicode
os
functions. - Added new function to
Message
class:Message.sExists
. This function checks if a string stream exists. It's input should be formatted identically to that ofMessage._getStringStream
. - Added new function to
Message
class:Message.fix_path
. This function will add the proper prefix to the path (if theprefix
parameter is true) and adjust the path to be a string rather than a list or tuple. - Added new function to
utils.py
:get_full_class_name
. This function returns a string containing the module name and the class name of any instance of any class. It is returned in the format of{module}.{class}
. - Added a sort of alias of
Message._getStream
,Message._getStringStream
,Message.Exists
, andMessage.sExists
toAttachment
andRecipient
. These functions run inside the associated attachment directory or recipient directory, respectively. - Added a fix to an issue introduced in an earlier version caused by accidentally deleting a letter in the code.
v0.21.0
- [TheElementalOfDestruction #12] Changed debug code to use logging module.
- [TheElementalOfDestruction #17] Fixed Attachment class using wrong properties file location in embedded msg files.
- [TheElementalOfDestruction #11] Improved handling of command line arguments using argparse module.
- [TheElementalOfDestruction #16] Started work on moving developer code into its own script.
- [TeamMsgExtractor #63] Fixed JSON saving not applying to embedded msg files.
- [TeamMsgExtractor #55] Added fix for recipient sometimes missing email address.
- [TeamMsgExtractor #65] Added fix for special characters in recipient names.
- Module now raises a custom exception (instead of just
IOError
) if the input is not a valid OLE file. - Added
header_dict
property to theMessage
class. - General minor bug fixes.
- Fixed a section in the
Recipient
class that I have no idea why I did it that way. If errors start randomly occurring with it, this fix is why.
v0.20.8
- Fixed a tab issue and parameter type in
message.py
.
v0.20.7
- Separated classes into their own files to make things more manageable.
- Placed
__doc__
back inside of__init__.py
. - Rewrote the
Prop
class to be two different classes that extend from a base class. - Made decent progress on completing the
parse_type
function of theFixedLengthProp
class (formerly a function of theProp
class). - Improved exception handling code throughout most of the module.
- Updated the
.gitignore
. - Updated README.
- Added
# DEBUG
comments before debugging lines to make them easier to find in the future. - Added function
create_prop
inprop.py
which should be used for creating what used to be an instance of theProp
class. - Added more constants to reflect some of the changes made.
- Fixed a major bug that was causing the header to generate after things like "to" and "cc" which would force those fields to not use the header.
- Fixed the debug variable.
- Fixed many small bugs in many of the classes.
- [TheElementalOfDestruction #13] Various loose ends to enhance the workflow in the repo.