v1.0.0

@vmenger released this 20 Dec 10:04
f9fb9fb

1.0.0 (2023-12-20)

Added

  • some internal speedups for SingleTokenLookupAnnotator, MultiTokenLookupAnnotator and LookupTrie
  • caching for sorting annotations, which helps with speed
  • the pre_match_words attribute for RegexpAnnotator
  • the option to provide a LookupTrie to a MultiTokenLookupAnnotator directly
  • a method for getting all words in a TokenList, or for looking up tokens with specific text values, with options for matching_pipeline
  • automated build/publish on merge to main
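
The caching of sorted annotations mentioned above can be illustrated with a minimal sketch. This is not docdeid's implementation: the tuple layout, the FIELDS mapping and the sorted_by function are hypothetical names chosen for illustration. It only shows the general technique of memoizing a sort with functools.lru_cache, which requires every argument to be hashable (a tuple works, a list does not).

```python
from functools import lru_cache

# Hypothetical stand-in for annotations: (text, start_char, end_char).
# A tuple of tuples is hashable, so it can serve as a cache key.
ANNOTATIONS = (
    ("Jan", 10, 13),
    ("Amsterdam", 0, 9),
    ("Jansen", 10, 16),
)

# Hypothetical mapping from field name to position in the tuple.
FIELDS = {"text": 0, "start_char": 1, "end_char": 2}


@lru_cache(maxsize=None)
def sorted_by(annotations: tuple, by: tuple) -> tuple:
    """Sort annotations by the given field names and memoize the result."""
    return tuple(
        sorted(annotations, key=lambda a: tuple(a[FIELDS[f]] for f in by))
    )


result = sorted_by(ANNOTATIONS, ("start_char", "end_char"))
again = sorted_by(ANNOTATIONS, ("start_char", "end_char"))  # cache hit
```

Calling sorted_by with a list for either argument raises a TypeError, because lru_cache hashes its arguments to build the cache key.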

Changed

  • sorting Annotation and AnnotationSet now requires the sort key to be provided as a tuple, and callbacks as a frozendict
  • renamed docdeid.tokenize to docdeid.tokenizer
  • renamed docdeid.process.doc to docdeid.process.doc_processor
  • renamed docdeid.process.annotation_set to docdeid.process.annotation_processor
  • Annotation and Token now only include int/str fields when serializing
  • formatting and linting settings
  • moved the logic for linking tokens to TokenList rather than Tokenizer
  • use casefold() instead of lower() for lowercasing
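
The difference between casefold() and lower() in the last item can be seen with plain Python strings. The snippet below uses only built-in str methods and does not call docdeid; the example word is chosen for illustration.

```python
word = "Straße"

lowered = word.lower()    # lower() leaves the "ß" as is
folded = word.casefold()  # casefold() expands "ß" to "ss"

# Comparing against an upper-case spelling of the same word:
matches_lower = lowered == "STRASSE".lower()
matches_fold = folded == "STRASSE".casefold()
```

Here matches_lower is False and matches_fold is True, which is why casefold() is the more robust choice for case-insensitive matching.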

Fixed

  • a bug with overlapping annotations in MultiTokenLookupAnnotator

Removed

  • automated coverage reporting