All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- some internal speedups for SingleTokenLookupAnnotator, MultiTokenLookupAnnotator and LookupTrie
- caching for sorting annotations, which helps with speed
- the pre_match_words attribute for RegexpAnnotator
- the option to provide a LookupTrie to a MultiTokenAnnotator directly
- a method for getting all words, or for looking up tokens with specific text values, in a TokenList, with options for matching_pipeline
- automated build/publish on merge to main
- sorting Annotation and AnnotationSet now requires the sort key to be provided as a tuple, and callbacks as a frozendict
- renamed docdeid.tokenize to docdeid.tokenizer
- renamed docdeid.process.doc to docdeid.process.doc_processor
- renamed docdeid.process.annotation_set to docdeid.process.annotation_processor
- Annotation and Token now only include int/str fields when serializing
- formatting and linting settings
- moved the logic for linking tokens to TokenList rather than Tokenizer
- use casefold() instead of lower() for lowercasing
- a bug with overlapping annotations in MultiTokenLookupAnnotator
- automated coverage reporting
- RegexpAnnotator accepts regexp strings in addition to compiled regexp patterns
- consistent use of args and kwargs in Annotator class tree
- RegexpAnnotator now offers a function to validate matches, implementable by subclassing
- made the priority attribute of an Annotation non-Optional
- multi token lookup now sets the start_token and end_token fields of an Annotation
- a bug with deterministic sort, when Optional fields were set
- an additional priority attribute for Annotation, giving an extra option for sorting
- upgraded dependencies
- upgraded dependencies, including markdown-it-py, which had a vulnerability
- upgraded dependencies, including certifi, which had a vulnerability
- renamed processors_enabled and processors_disabled to enabled and disabled, respectively
- Include py.typed in packaging
- a py.typed file, indicating PEP 561 compliance
- minor type hint updates
- minor doc updates
- Support for disabling specific processors with the processors_disabled keyword.
- Initial version