v1.14.0 release PR #506

mart-r · 2024-11-19T14:08:38Z

In preparation for minor release (mainly to disable python 3.8).

* Pushing bug fix for metacat. 2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using the data_undersampled instead of the full_data
* Pushing change for lazy logging
* Pushing update for lazy logging
* Pushing lint fix
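For readers unfamiliar with the term, "lazy logging" usually means passing arguments to the logger instead of pre-formatting the message. The following is a minimal sketch of that idea, not the MedCAT code itself; `ExpensiveRepr`, `log_eagerly` and `log_lazily` are hypothetical names introduced only for illustration.

```python
import logging

logger = logging.getLogger("lazy_logging_sketch")


class ExpensiveRepr:
    """Hypothetical helper that counts how often it is rendered to a string."""

    def __init__(self):
        self.render_count = 0

    def __str__(self):
        self.render_count += 1
        return "expensive summary"


def log_eagerly(value):
    # The f-string is built before logging checks whether DEBUG is enabled.
    logger.debug(f"Evaluation data: {value}")


def log_lazily(value):
    # The argument is only formatted if the record is actually emitted.
    logger.debug("Evaluation data: %s", value)


logger.setLevel(logging.INFO)  # DEBUG messages are dropped at this level
eager, lazy = ExpensiveRepr(), ExpensiveRepr()
log_eagerly(eager)
log_lazily(lazy)
```

With the logger set to INFO, the eager variant still pays for string formatting, while the lazy variant skips it entirely.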

* CU-8695uhe5n: Update docs dependency pins
* CU-8695uhe5n: Fix typo in fsspec version pin

* CU-8695pvhfe: Rename a test class
* CU-8695pvhfe: Add tests for multiprocessing usage monitoring
* CU-8695pvhfe: Fix usage monitor for multiprocessing. When using CAT.multiprocessing_batch_char_size (CAT._multiprocessing_batch and CAT._mp_cons internally), flush the usage monitor at the end of the multiprocessing method. When using CAT.get_entities_multi_texts or CAT.multiprocessing_batch_docs_size (uses the former internally), add logging of usage to output
* CU-8695pvhfe: Fix remaining issues with usage monitor for multiprocessing. Avoid checking length of (potentially) non-existent strings. Avoid early iteration of generator.

* CU-8695knfbg: Decouple the edit finder methods from the spell checker
* CU-8695knfbg: Add methods for random edit picking and variant estimation to utils; Plus a few tests
* CU-8695knfbg: Add edit distance option and use to CLI
* CU-8695knfbg: Allow retaining order of elements in generator when getting edits for run-to-run consistency
* CU-8695knfbg: Add safeguard for name order to be consistent across runs
* CU-8695knfbg: Sort names when getting from CDB to avoid run to run variance
* CU-8695knfbg: Move edit finding methods back to BasicSpellChecker class, but make the 1-distance method a class method
* CU-8695knfbg: Move validation earlier in edit finder
* CU-8695knfbg: Simplify edit finder somewhat
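To illustrate what a 1-distance edit finder with stable ordering might look like, here is a rough sketch in the style of the classic delete/transpose/replace/insert approach. It is an assumption-laden illustration: `EditFinderSketch`, `edits1` and the `keep_order` flag are hypothetical names and do not reflect the actual `BasicSpellChecker` signatures.

```python
import string
from typing import Iterator


class EditFinderSketch:
    """Hypothetical stand-in; not the MedCAT BasicSpellChecker API."""

    LETTERS = string.ascii_lowercase

    @classmethod
    def edits1(cls, word: str, keep_order: bool = True) -> Iterator[str]:
        """Yield every string that is one edit away from `word`."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [left + right[1:] for left, right in splits if right]
        transposes = [left + right[1] + right[0] + right[2:]
                      for left, right in splits if len(right) > 1]
        replaces = [left + c + right[1:]
                    for left, right in splits if right
                    for c in cls.LETTERS]
        inserts = [left + c + right
                   for left, right in splits
                   for c in cls.LETTERS]
        candidates = deletes + transposes + replaces + inserts
        if keep_order:
            # dict.fromkeys drops duplicates but keeps first-seen order,
            # so repeated runs yield the edits in the same sequence.
            yield from dict.fromkeys(candidates)
        else:
            # A set is cheaper to reason about but its iteration order
            # for strings can differ between interpreter runs.
            yield from set(candidates)


ordered_edits = list(EditFinderSketch.edits1("ab"))
```

Because the method is a class method, it can be called without building a spell checker instance, which is convenient when only the edits are needed.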

* CU-869574kvp: Add pattern based release version identifying for Snomed preprocessing
* CU-869574kvp: Add tests for pattern-based snomed release identification
* CU-869574kvp: Update Snomed preprocessing: Separate extensions into an Enum. Do the release/paths check at init to allow for early failures in case of issues
* CU-869574kvp: Simplify mappings somewhat. Move common avoids to a common location. Fix UK Drug relationship name
* CU-869574kvp: Simplify mappings somewhat more. Remove some clutter by separating common prefixes for release types and file names.
* CU-869574kvp: Simplify mappings somewhat more, again. Remove some clutter by separating common suffixes for release types.
* CU-869574kvp: Update preprocessing. New abstraction. Use supported extensions which describe their file formats along with bundles which give some further insight and control.
* CU-869574kvp: Fix data class init
* CU-869574kvp: Fix issue with file paths
* CU-869574kvp: Fix a UK Clinical description file path
* CU-869574kvp: Add (optional) 2nd part of folder name to extension. For AU models, the folder name seems to be 'SnomedCT_Release_AU1000036_20240630T120000Z', so the 1st part is just 'Release' and the 2nd part is indicative of AU. Add usage of this where relevant.
* CU-869574kvp: Fix preprocessing tests. Add patch for files/folders where applicable. Change the paths of attributes where applicable.
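As a rough illustration of pattern-based release identification, the sketch below splits a folder name such as the AU example above into its first part, second part and release date. The regular expression, `ReleaseInfo` and `identify_release` are assumptions made for this example and are not taken from the MedCAT preprocessing code.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: SnomedCT_<first>_<second>_<YYYYMMDD>T<HHMMSS>Z
FOLDER_PATTERN = re.compile(
    r"^SnomedCT_(?P<first>[^_]+)_(?P<second>[^_]+)_(?P<date>\d{8})T\d{6}Z$"
)


class ReleaseInfo(NamedTuple):
    first_part: str    # e.g. 'Release' for the AU folder name
    second_part: str   # e.g. 'AU1000036', which hints at the extension
    release_date: str  # release date as YYYYMMDD


def identify_release(folder_name: str) -> Optional[ReleaseInfo]:
    """Return the parsed parts, or None if the name does not match."""
    match = FOLDER_PATTERN.match(folder_name)
    if match is None:
        return None
    return ReleaseInfo(match["first"], match["second"], match["date"])


au_release = identify_release("SnomedCT_Release_AU1000036_20240630T120000Z")
```

Running a check like this at init time is one way to fail early when the folder does not look like a known release.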

* CU-8695ucw9b: Fix older DeID models due to changes in transformers. Since transformers 4.42.0, the tokenizer is expected to have the 'split_special_tokens' attribute. But the version we've saved does not. So when it's loaded, this causes an exception to be raised (which is currently caught and logged by medcat).
* CU-8695ucw9b: Add functionality for transformers NER to spectacularly fail upon consistent consecutive exceptions. The idea is that this way, if something in the underlying models is consistently failing, the exception is raised rather than simply logged
* CU-8695ucw9b: Add tests for exception raising after a pre-defined number of failed document processes
* CU-8695ucw9b: Change conditions for raising exception on consecutive failure. Now only raise the exception if the consecutive failure is identical (or similar). We determine that from the type and string-representation of the exception being raised.
* CU-8695ucw9b: Small additional cleanup on successful TNER processing
* CU-8695ucw9b: Use custom exception when failing due to consecutive exceptions
* CU-8695ucw9b: Remove try-except when processing transformers NER to force immediate raising of exception
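The consecutive-failure idea described in these commits can be sketched roughly as follows: log-and-continue on isolated errors, but raise once the same failure (same type and string representation) repeats a set number of times in a row. `ConsecutiveFailureError`, `process_all` and `max_consecutive` are hypothetical names chosen for illustration; the real implementation may differ.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


class ConsecutiveFailureError(Exception):
    """Hypothetical custom exception for repeated, similar failures."""


def process_all(docs: Iterable[Any],
                process_one: Callable[[Any], Any],
                max_consecutive: int = 3) -> List[Optional[Any]]:
    """Process documents, tolerating isolated failures only."""
    results: List[Optional[Any]] = []
    last_signature: Optional[Tuple[type, str]] = None
    streak = 0
    for doc in docs:
        try:
            results.append(process_one(doc))
        except Exception as exc:
            # Two failures count as "the same" when both the exception
            # type and its string representation match.
            signature = (type(exc), str(exc))
            streak = streak + 1 if signature == last_signature else 1
            last_signature = signature
            if streak >= max_consecutive:
                raise ConsecutiveFailureError(
                    f"{streak} identical consecutive failures: {exc!r}"
                ) from exc
            results.append(None)  # placeholder for the failed document
        else:
            # A successful document resets the failure tracking.
            last_signature, streak = None, 0
    return results
```

The design choice here is that a single bad document should not stop a batch, whereas a systematic problem in the underlying model should surface as an error instead of being buried in logs.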

* MetaCAT fixes and upgrades. Pushing for 3 updates:
  1) Removed the check and update for labels with zero data, as this was causing issues during evaluation
  2) Resolved an issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3)
  3) Updated the attention mask creation to dynamically use the actual pad_idx value instead of assuming it to be 0
* Pushing type fix
* Pushing for type fix
* Fixing type issues
* Pushing change
* Pushing update w/o try except block. For the issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3), pushing an optimized version w/o the try except block
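The attention mask change in update 3 can be pictured with a small, framework-free sketch. It assumes token IDs are plain integers in nested lists; `build_attention_mask` is a hypothetical helper and not the MetaCAT function.

```python
from typing import List


def build_attention_mask(batch: List[List[int]], pad_idx: int) -> List[List[int]]:
    """Return 1 for real tokens and 0 for padding, based on pad_idx."""
    return [[0 if token_id == pad_idx else 1 for token_id in sequence]
            for sequence in batch]


# Hypothetical batch where the padding ID is 3 and token ID 0 is a real token.
batch = [[5, 0, 7, 3, 3],
         [0, 9, 3, 3, 3]]
mask_with_actual_pad = build_attention_mask(batch, pad_idx=3)
mask_assuming_zero = build_attention_mask(batch, pad_idx=0)
```

In this example, assuming a padding ID of 0 would hide the real token with ID 0 and leave the actual padding visible, which is the kind of mismatch the update avoids.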

* CU-869671bn4: Update requirements (GHA should fail due to mypy)
* CU-869671bn4: Update mypy dev requirement to be less than 1.12

* CU-86967nnra: Remove python 3.8 from GHA
* CU-86967nnra: Remove python 3.8 from classifiers
* CU-86967nnra: Add python version requirements to setup.py (allowing from 3.9 to 3.11)
* CU-86967nnra: Remove upper bound from python requirements. Upper bound could be lifted as soon as `spacy` releases a compatible version. And it _shouldn't_ require any changes from our side. And it isn't possible to install it on higher versions (currently) due to no `spacy` being available for those versions

* CU-86964zm4d: Use ignore tag correctly to ignore certain parts of UK release
* CU-86964zm4d: Use OPCS4 later refset ID by default (and switch to older if needed)
* CU-86964zm4d: Fix OPCS4 refset ID tests. Fix the default value being tested for (i.e. in case of international release that'll be shown). Add a test for old UK extension.
* CU-86964zm4d: Add note regarding OPCS refset ID relevance only for UK extensions.
* CU-86964zm4d: Fix checking of extension outside loops. I.e. determine if a UK release/bundle is used for OPCS4/ICD10 mappings splitting. Always returning separate refsets for ICD10 and OSC internally, even if the latter is None.

* CU-8695hghww: Add bash script to run backwards compatibility
* CU-8695hghww: Rename backwards compatibility running bash script
* CU-8695hghww: Add new step to workflow to run model backwards compatibility
* CU-8695hghww: Fix model compatibility regression suite path
* CU-8695hghww: Simplify creation and removal of fake model folder

* CU-8696m1mch: Remove versioning utility since all its parts were deprecated
* CU-8696m1mch: Remove tests for versioning utility
* CU-8696m1mch: Remove unused test-specific binary (CDB)

mart-r and others added 19 commits August 30, 2024 10:07

Production/master sync (#483)

b8bb4e3

CU-8695jwnjk: Fix description of an argument for --help to work in CLI (#484)

b28fa05

CU-8695q21f6: Replace rosalind links with S3 ones in docs (#489)

2588670

CU-8695uhe5n: Update docs dependency pins (#491)

56a2856

* CU-8695uhe5n: Update docs dependency pins
* CU-8695uhe5n: Fix typo in fsspec version pin

CU-8695m5q4x: Fix issues detecting 1-token concepts (#485)

394e17b

CU-8695vu71q: Make report identical run to run in identical cases (#492)

b433195

CU-869637yfx: Pin spacy dependency to lower than 3.8 (#494)

909cfad

CU-869671bn4: Update requirements to fix workflow issue due to mypy (#497)

976adc2

* CU-869671bn4: Update requirements (GHA should fail due to mypy)
* CU-869671bn4: Update mypy dev requirement to be less than 1.12

CU-8696n7w95: Remove commented code to fix DeID (oversight in PR 490) (#502)

df3df66

CU-8696m1mch: Remove versioning utility since all its parts were deprecated (#500)

37a8a63

* CU-8696m1mch: Remove versioning utility since all its parts were deprecated
* CU-8696m1mch: Remove tests for versioning utility
* CU-8696m1mch: Remove unused test-specific binary (CDB)

mart-r merged commit ceb74b1 into production Nov 19, 2024
7 checks passed
