Skip to content

Releases: ropensci/epubr

epubr 0.6.4 release

23 Sep 15:42
Compare
Choose a tag to compare
  • Added required package alias in documentation.
  • Allow multiple values for a metadata field, pipe-separated.
  • Update documentation and tests.

epubr 0.6.2 release

24 Feb 17:04
Compare
Choose a tag to compare
  • Documentation updates.

epubr 0.6.1 release

08 Dec 17:05
Compare
Choose a tag to compare
  • Fix warnings caused by updates to other packages like tidyr.

epubr 0.6.0 release

11 Jan 18:13
Compare
Choose a tag to compare
  • Added count_words helper function.
  • Improved word count accuracy in epub. Now also splitting words on new line characters rather than only on spaces. Now also ignoring vector elements in the split result that are most likely to not be words, such as stranded pieces of punctuation.
  • Added epub_recombine for breaking apart and recombining text sections into new data frame rows using alternative breaks based on a regular expression pattern.
  • Added epub_sift function for filtering out small text sections based on low word or character count. This function can also be used directly inside calls to epub_recombine through an argument list.
  • Added epub_reorder for reordering a specified (by index) subset of text section data frame rows according to a text parsing function.
  • Refactored code to remove purrr dependency.
  • Added unit tests.
  • Updated function documentation, readme and vignette.

epubr 0.5.0 release

24 Oct 02:43
Compare
Choose a tag to compare
  • Added epub_cat function for pretty printing to console as a helpful way to quickly inspect the parsed text in a more easily readable format than looking at the quoted strings in the table entries. epub_cat can take an EPUB filename string (may be a vector) as its first argument or a data frame already returned by epub.
  • Like epub_cat, epub_head accepts EPUB character filenames or now also a data frame already returned by epub based on those files. Because of this change, the first argument has been renamed from file to x.
  • Added encoding argument to epub function, defaulting to UTF-8.
    • This helps significantly with reading EPUB archive files properly, e.g., providing ability to parse and substitute all the curly single and double quotes, apostrophes, various forms of hyphens and ellipses.
    • Previously, these were not substituted (e.g., replacing curly quotes with straight quotes), but attempting to do so would have failed anyway because they were not initially read correctly due to the lack of encoding specification.
    • Now non-standard characters are more likely to be read correctly, and those mentioned above are substituted with standard versions. If necessary, the encoding can be changed from UTF-8 via the new argument.
    • It appears that the EPUB format requires UTF encoding. Currently the only permissible option other than UTF-8 is UTF-16. This keeps things very simple and straightforward. Users should not encounter EPUB files in other encodings.
  • Added unit tests and updated documentation.

epubr 0.4.1 release

09 Aug 14:09
Compare
Choose a tag to compare
  • Improved handling of errors and better messages.
  • More robust handling of title field when missing, redundant or requiring remapping/renaming. All outputs of epub now include a title as well as data field, even if the e-book does not have a metadata field named title.
  • Minor improvements to e-book section handling.
  • Added epub_head function for previewing the opening text of each e-book section.
  • Removed R version from Depends field of DESCRIPTION. Package Imports that necessitated a higher R version were previously removed.
  • Minor fixes.
  • Updated documentation, vignette, unit tests.

epubr 0.4.0 release

30 May 13:20
Compare
Choose a tag to compare
  • Enhanced function documentation details.
  • Added epub_meta for strictly parsing EPUB metadata without reading the full file contents.
  • When working with a vector of EPUB files, functions now clean up each unzipped archive temp directory with unlink immediately after use, rather than after all files are read into memory or by overwriting files in a single temp directory.
  • Added initial introduction vignette content.
  • Minor function refactors.
  • Minor bug fixes.
  • Added unit tests.