Releases: Living-with-machines/alto2txt

v0.3.4

29 Nov 18:26
3277d44

alto2txt: Extract plain text from newspapers

Converts XML (in METS 1.8/ALTO 1.4, METS 1.3/ALTO 1.4, BLN or UKP format) publications to plaintext articles and generates minimal metadata.

See the full documentation and demo instructions.

Added

  • Added PyPI version and MIT license badges to README.md
  • Added pytest-cov with default options to assess documentation
  • Added isort to .pre-commit-config.yaml to sort imports consistently
  • Added pycln to .pre-commit-config.yaml to check unused imports
  • Added pycln configuration to pyproject.toml
  • Added alto2txt as a command line script in pyproject.toml

Changed

  • Switched from Apache v2.0 license to MIT license, in line with project recommendations
  • Updated mypy in .pre-commit-config.yaml

Deprecated

  • Deprecated extract_publications_text.py in favour of the alto2txt command line interface script specified in pyproject.toml

Removed

  • setup.py
  • requirements.txt

Fixed

  • Fixed the Python version constraint in pyproject.toml to python = ">3.6.0" (previously >3.7) for consistency with documentation
  • Fixed licensing ambiguity (now all should be MIT)
  • Fixed typos in README.md
  • Fixed superfluous imports via pycln in pre-commit