v1.0.0

Latest

vmenger released this 20 Dec 10:04

f9fb9fb

1.0.0 (2023-12-20)

Added

some internal speedups for SingleTokenLookupAnnotator, MultiTokenLookupAnnotator and LookupTrie
caching for sorting annotations, which helps with speed
the pre_match_words attribute for RegexpAnnotator
the option to provide a LookupTrie to a MultiTokenLookupAnnotator directly
methods on TokenList for getting all words, or for looking up tokens with specific text values, with options for matching_pipeline
automated build/publish on merge to main
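
The sorting cache mentioned above can be illustrated with a minimal sketch. This is not docdeid's implementation: it assumes a cache in the style of functools.lru_cache, and the names ANNOTATIONS and sorted_by are hypothetical. It shows why a cached sort needs hashable arguments such as tuples.

```python
from functools import lru_cache

# Hypothetical stand-in for annotations: (start_char, end_char, tag) tuples.
ANNOTATIONS = frozenset({(10, 14, "name"), (0, 4, "date"), (5, 9, "name")})


@lru_cache(maxsize=None)
def sorted_by(annotations: frozenset, by: tuple) -> tuple:
    """Sort annotations on the given field indices and cache the result."""
    return tuple(sorted(annotations, key=lambda a: tuple(a[i] for i in by)))


first = sorted_by(ANNOTATIONS, (0,))
second = sorted_by(ANNOTATIONS, (0,))  # same arguments, served from the cache
```

Passing a list such as [0] instead of the tuple (0,) would raise a TypeError here, because lru_cache has to hash its arguments to look up earlier results.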

Changed

sorting Annotation and AnnotationSet now requires the sort key to be provided as a tuple, and callbacks as a frozendict
renamed docdeid.tokenize to docdeid.tokenizer
renamed docdeid.process.doc to docdeid.process.doc_processor
renamed docdeid.process.annotation_set to docdeid.process.annotation_processor
Annotation and Token now only include int/str fields when serializing
formatting and linting settings
moved the logic for linking tokens to TokenList rather than Tokenizer
use casefold() instead of lower() for lowercasing
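
The switch from lower() to casefold() matters for caseless matching of some non-ASCII text. The sketch below uses only the Python standard library, with example words chosen for illustration rather than taken from docdeid.

```python
# Three spellings that should be treated as the same word when ignoring case.
words = ["Straße", "STRASSE", "strasse"]

# lower() keeps "ß" as is, so the spellings do not all collapse together.
lowered = {w.lower() for w in words}

# casefold() maps "ß" to "ss", so all three spellings become identical.
casefolded = {w.casefold() for w in words}
```

Here lowered contains two distinct strings while casefolded contains one, which is the behaviour wanted for case-insensitive lookups.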

Fixed

a bug with overlapping annotations in MultiTokenLookupAnnotator

Removed

automated coverage reporting
