Releases
0.4.0
Added generic partition brick that detects the file type and routes a file to the appropriate partitioning brick.
Added a file type detection module.
Updated partition_html and partition_eml to support file-like objects in 'rb' mode.
Added cleaning brick for removing ordered bullets, clean_ordered_bullets.
Added extract brick method for ordered bullets, extract_ordered_bullets.
Added test for clean_ordered_bullets.
Added test for extract_ordered_bullets.
Added partition_docx for pre-processing Word Documents.
Added new regex patterns to extract email header information.
Added new functions to extract header information, parse_received_data and partition_header.
Added new function to parse plain text files, partition_text.
Added new cleaner functions extract_ip_address, extract_ip_address_name, extract_mapi_id, and extract_datetimetz.
Added new Image element and a function to find embedded images, find_embedded_images.
Added get_directory_file_info for summarizing information about source documents.
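
To illustrate what the ordered-bullet bricks are for, here is a minimal standalone sketch of splitting a leading ordered bullet (such as "1.1" or "a.") from a line of text. It is not the library's implementation: the pattern ORDERED_BULLET_RE and the helper split_ordered_bullet are hypothetical names made up for this example, and the regex is deliberately naive.

```python
import re

# Hypothetical pattern (an assumption for this sketch, not taken from the library):
# one or more short alphanumeric groups each followed by a dot, an optional
# trailing group, then whitespace. Matches "1.", "1.1", "a.b.", "10.2.3".
ORDERED_BULLET_RE = re.compile(
    r"^\s*((?:[0-9A-Za-z]{1,2}\.)+[0-9A-Za-z]{0,2})\s+"
)


def split_ordered_bullet(text):
    """Return (bullet, remainder); bullet is None when no ordered bullet leads the text."""
    match = ORDERED_BULLET_RE.match(text)
    if match is None:
        return None, text
    return match.group(1), text[match.end():]


if __name__ == "__main__":
    for line in ["1.1 This is a section", "a. First item", "No bullet here"]:
        print(split_ordered_bullet(line))
```

A cleaning step would keep only the remainder, while an extraction step would keep the bullet. Because the pattern is naive, short abbreviations such as "e.g." at the start of a line would also be treated as bullets.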