Both the quick=TRUE and quick=FALSE methods now handle text processing (excluding comment text from cell content, and handling repeated whitespace and multi-line cell content) without a significant performance cost.
For the standard example file, execution time is broadly comparable to that of {readODS}, but {tidyods} uses more memory, perhaps understandably given what it is extracting.
For large files, quick=TRUE is faster than {readODS} and quick=FALSE is only slightly slower. Interestingly, for these files all {tidyods} extraction approaches use notably less memory than {readODS}.
Performance bottlenecks are now largely due to {xml2} (and underlying libxml2) limitations that cannot be overcome without writing independent C/C++ code to handle XML extraction.
A critical limitation of libxml2 is its requirement for available memory 4 times the file size.
"In general for a balanced textual document the internal memory requirement is about 4 times the size of the UTF8 serialization of this document (example the XML-1.0 recommendation is a bit more of 150KBytes and takes 650KBytes of main memory when parsed)" (GNOME libxml2 documentation)
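As a rough illustration of the rule of thumb quoted above, the estimate is a single multiplication. The function name `estimate_xml_memory()` and its `multiplier` argument are hypothetical and are not part of {tidyods}; the input sizes are the ones reported in this issue, read as decimal units.

```r
# Hypothetical helper (not part of {tidyods}): apply the "about 4 times"
# rule of thumb from the libxml2 documentation to an XML size in bytes.
estimate_xml_memory <- function(xml_bytes, multiplier = 4) {
  stopifnot(is.numeric(xml_bytes), all(xml_bytes >= 0))
  xml_bytes * multiplier
}

# content.xml sizes reported in this issue, converted to bytes
small_xml <- 57.19e3  # 57.19 kB
large_xml <- 1.93e9   # 1.93 GB

estimate_xml_memory(small_xml) / 1e3  # estimated need in kB
estimate_xml_memory(large_xml) / 1e9  # estimated need in GB
```

The small file comes out at about 228.76 kB, matching the figure reported in this issue. The large file comes out at about 7.72 GB rather than the reported 7.74 GB, presumably because the 1.93 GB input is itself a rounded value.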
As a precaution, {tidyods} checks the size of the content.xml file inside the ODS zip container and compares this to the available memory reported by ps::ps_system_memory() to determine whether the XML can be safely processed.
This check is an internal function that throws an error when the XML is too large and invisibly returns TRUE when the XML is an acceptable size. It has a verbose argument that, when set, reports the file size, the estimated processing requirement and the available memory.
tidyods:::check_xml_memory("path/to/small_file.ods")
#> Error in `check_xml_memory()`:
#> ! ODS file is too large to process
#> ℹ ODS XML is estimated to need 7.74 GB of memory, uncompressed content.xml
#> file within path/to/small_file.ods is 1.93 GB in size.
#> ✖ Available system memory is estimated at 1.50 GB

tidyods:::check_xml_memory("path/to/small_file.ods", verbose=TRUE)
#> ℹ ODS XML is estimated to need 228.76 kB of memory, uncompressed content.xml
#> file within path/to/small_file.ods is 57.19 kB in size.
#> ✔ Available system memory is estimated at 1.55 GB
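The check described above could be sketched roughly as follows. This is a minimal illustration, not the actual {tidyods} implementation: the helper names `content_xml_size()` and `check_fits()`, the message wording and the default multiplier of 4 are all assumptions. It reads the uncompressed size of content.xml from the zip listing with `utils::unzip(list = TRUE)` and compares the estimated requirement against an available-memory figure passed in by the caller.

```r
# Hypothetical helper: uncompressed size (in bytes) of content.xml inside an
# ODS file, taken from the zip listing without extracting anything.
content_xml_size <- function(path) {
  listing <- utils::unzip(path, list = TRUE)
  size <- listing$Length[listing$Name == "content.xml"]
  if (length(size) != 1) {
    stop("Could not find a single content.xml in ", path, call. = FALSE)
  }
  size
}

# Hypothetical check: error if the estimated requirement exceeds the
# available memory, otherwise return TRUE invisibly.
check_fits <- function(xml_bytes, avail_bytes, multiplier = 4,
                       verbose = FALSE) {
  needed <- xml_bytes * multiplier
  if (verbose) {
    message(
      "XML size: ", xml_bytes, " bytes; estimated need: ", needed,
      " bytes; available: ", avail_bytes, " bytes"
    )
  }
  if (needed > avail_bytes) {
    stop("ODS file is too large to process", call. = FALSE)
  }
  invisible(TRUE)
}

# Usage, assuming the {ps} package is installed and the path exists:
# check_fits(
#   content_xml_size("path/to/file.ods"),
#   ps::ps_system_memory()$avail
# )
```

Passing the available memory in as an argument, rather than calling ps::ps_system_memory() inside the function, keeps the comparison easy to exercise with fixed numbers.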
A successor to previous performance issue (#3).