v1.14.0 release PR #506
Merged
* Pushing bug fix for MetaCAT. 2-phase learning for MetaCAT utilises data_undersampled. Fixed a bug in the eval function, which was incorrectly using data_undersampled instead of full_data
* Pushing change for lazy logging
* Pushing update for lazy logging
* Pushing lint fix
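The lazy logging change mentioned above can be illustrated with a minimal sketch. It uses only the standard `logging` module; the `Expensive` class and the logger name are hypothetical and are not taken from MedCAT. The point is that `%s`-style arguments are only formatted if the record is actually emitted, whereas an f-string is always formatted before the call.

```python
import logging

logger = logging.getLogger("lazy_logging_sketch")  # hypothetical logger name
logger.setLevel(logging.WARNING)  # DEBUG records are dropped


class Expensive:
    """Hypothetical object that counts how often it is rendered to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "expensive-value"


eager = Expensive()
lazy = Expensive()

# Eager: the f-string is built before logger.debug is even called.
logger.debug(f"value: {eager}")

# Lazy: the argument is only formatted if the record is emitted,
# which does not happen here because DEBUG is below the logger's level.
logger.debug("value: %s", lazy)
```

After running this, `eager.renders` is 1 and `lazy.renders` is 0.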
* CU-8695uhe5n: Update docs dependency pins
* CU-8695uhe5n: Fix typo in fsspec version pin
* CU-8695pvhfe: Rename a test class
* CU-8695pvhfe: Add tests for multiprocessing usage monitoring
* CU-8695pvhfe: Fix usage monitor for multiprocessing. When using CAT.multiprocessing_batch_char_size (CAT._multiprocessing_batch and CAT._mp_cons internally), flush the usage monitor at the end of the multiprocessing method. When using CAT.get_entities_multi_texts or CAT.multiprocessing_batch_docs_size (uses the former internally), add logging of usage to output
* CU-8695pvhfe: Fix remaining issues with usage monitor for multiprocessing. Avoid checking length of (potentially) non-existent strings. Avoid early iteration of generator.
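The two pitfalls named in the last commit above (taking the length of a text that may be missing, and consuming a generator before the caller does) can be sketched as follows. This is an illustration only: `with_usage_logging` and `usage_log` are hypothetical names and do not reflect MedCAT's actual usage monitor.

```python
from typing import Iterable, Iterator, List, Optional


def with_usage_logging(
    texts: Iterable[Optional[str]], usage_log: List[int]
) -> Iterator[Optional[str]]:
    """Yield texts unchanged while recording their lengths as they pass through.

    Because this is itself a generator, nothing is pulled from `texts`
    until the caller starts iterating, so the input is never consumed early.
    """
    for text in texts:
        # Guard against missing texts instead of calling len(None).
        usage_log.append(len(text) if text is not None else 0)
        yield text
```

Wrapping the input this way keeps a single pass over the data, which matters when the input is a one-shot generator.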
* CU-8695knfbg: Decouple the edit finder methods from the spell checker
* CU-8695knfbg: Add methods for random edit picking and variant estimation to utils; Plus a few tests
* CU-8695knfbg: Add edit distance option and use to CLI
* CU-8695knfbg: Allow retaining order of elements in generator when getting edits for run-to-run consistency
* CU-8695knfbg: Add safeguard for name order to be consistent across runs
* CU-8695knfbg: Sort names when getting from CDB to avoid run to run variance
* CU-8695knfbg: Move edit finding methods back to BasicSpellChecker class, but make the 1-distance method a class method
* CU-8695knfbg: Move validation earlier in edit finder
* CU-8695knfbg: Simplify edit finder somewhat
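As a rough illustration of the ideas in these commits (distance-1 edits, deterministic ordering, and seeded random picking), here is a minimal sketch in the style of the well-known Norvig spelling corrector. The function names `edits1` and `pick_random_edits` are assumptions for this example and are not claimed to match the methods on `BasicSpellChecker`.

```python
import random
import string
from typing import List


def edits1(word: str, letters: str = string.ascii_lowercase) -> List[str]:
    """Return all strings one edit away from `word`, in sorted order."""
    # Validate up front, before any work is done.
    if not word:
        raise ValueError("word must be a non-empty string")
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    # A bare set has no guaranteed cross-run order for strings (hash
    # randomisation), so sort to keep results stable from run to run.
    return sorted(set(deletes + transposes + replaces + inserts))


def pick_random_edits(word: str, num: int, seed: int) -> List[str]:
    """Pick up to `num` distance-1 edits reproducibly using a seeded RNG."""
    candidates = edits1(word)
    rng = random.Random(seed)
    return rng.sample(candidates, min(num, len(candidates)))
```

Sorting before sampling is what makes the seeded pick reproducible: the same seed applied to a differently ordered candidate list would select different edits.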
* CU-869574kvp: Add pattern based release version identifying for Snomed preprocessing
* CU-869574kvp: Add tests for pattern-based snomed release identification
* CU-869574kvp: Update Snomed preprocessing: Separate extensions into an Enum. Do the release/paths check at init to allow for early failures in case of issues
* CU-869574kvp: Simplify mappings somewhat. Move common avoids to a common location. Fix UK Drug relationship name
* CU-869574kvp: Simplify mappings somewhat more. Remove some clutter by separating common prefixes for release types and file names.
* CU-869574kvp: Simplify mappings somewhat more, again. Remove some clutter by separating common suffixes for release types.
* CU-869574kvp: Update preprocessing. New abstraction. Use supported extensions which describe their file formats along with bundles which give some further insight and control.
* CU-869574kvp: Fix data class init
* CU-869574kvp: Fix issue with file paths
* CU-869574kvp: Fix a UK Clinical description file path
* CU-869574kvp: Add (optional) 2nd part of folder name to extension. For AU models, the folder name seems to be 'SnomedCT_Release_AU1000036_20240630T120000Z', so the 1st part is just 'Release' and the 2nd part is indicative of AU. Add usage of this where relevant.
* CU-869574kvp: Fix preprocessing tests. Add patch for files/folders where applicable. Change the paths of attributes where applicable.
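A minimal sketch of pattern-based release identification, built around the AU folder name quoted above. The regular expression, the `ReleaseInfo` type, and `identify_release` are assumptions made for illustration; the actual pattern and names used in the Snomed preprocessing code may differ. Failing early on an unrecognised folder name mirrors the "check at init" idea from the commits.

```python
import re
from typing import NamedTuple

# Assumed folder layout: SnomedCT_<part1>_<part2>_<YYYYMMDD>T<HHMMSS>Z
FOLDER_PATTERN = re.compile(
    r"^SnomedCT_(?P<part1>[^_]+)_(?P<part2>[^_]+)_(?P<date>\d{8})T\d{6}Z$"
)


class ReleaseInfo(NamedTuple):
    """Hypothetical container for the parts extracted from a folder name."""

    part1: str
    part2: str
    release: str


def identify_release(folder_name: str) -> ReleaseInfo:
    """Extract the two name parts and the release date from a folder name."""
    match = FOLDER_PATTERN.match(folder_name)
    if match is None:
        # Fail early so that a bad path is reported before any processing.
        raise ValueError(f"Unrecognised release folder name: {folder_name!r}")
    return ReleaseInfo(match["part1"], match["part2"], match["date"])


info = identify_release("SnomedCT_Release_AU1000036_20240630T120000Z")
```

For the AU folder name above, `info.part1` is `'Release'`, `info.part2` is `'AU1000036'`, and `info.release` is `'20240630'`.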
* CU-8695ucw9b: Fix older DeID models due to changes in transformers. Since transformers 4.42.0, the tokenizer is expected to have the 'split_special_tokens' attribute. But the version we've saved does not. So when it's loaded, this causes an exception to be raised (which is currently caught and logged by medcat).
* CU-8695ucw9b: Add functionality for transformers NER to spectacularly fail upon consistent consecutive exceptions. The idea is that this way, if something in the underlying models is consistently failing, the exception is raised rather than simply logged
* CU-8695ucw9b: Add tests for exception raising after a pre-defined number of failed document processes
* CU-8695ucw9b: Change conditions for raising exception on consecutive failure. Now only raise the exception if the consecutive failure is identical (or similar). We determine that from the type and string-representation of the exception being raised.
* CU-8695ucw9b: Small additional cleanup on successful TNER processing
* CU-8695ucw9b: Use custom exception when failing due to consecutive exceptions
* CU-8695ucw9b: Remove try-except when processing transformers NER to force immediate raising of exception
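The "raise after identical consecutive failures" idea described in the middle commits above can be sketched like this. Note that the final commit in the list removes the try-except altogether, so this illustrates the intermediate approach rather than the end state. `FailureTracker` and `ConsecutiveFailureError` are hypothetical names, not the classes used in MedCAT.

```python
from typing import Optional, Tuple


class ConsecutiveFailureError(Exception):
    """Raised when the same failure repeats too many times in a row."""


class FailureTracker:
    """Track consecutive failures, comparing them by type and message."""

    def __init__(self, max_consecutive: int) -> None:
        if max_consecutive < 1:
            raise ValueError("max_consecutive must be at least 1")
        self.max_consecutive = max_consecutive
        self._last_key: Optional[Tuple[type, str]] = None
        self._count = 0

    def record_success(self) -> None:
        # Any successful document resets the streak.
        self._last_key = None
        self._count = 0

    def record_failure(self, exc: Exception) -> None:
        # Two failures count as "identical" if type and string form match.
        key = (type(exc), str(exc))
        if key == self._last_key:
            self._count += 1
        else:
            self._last_key, self._count = key, 1
        if self._count >= self.max_consecutive:
            raise ConsecutiveFailureError(
                f"{self._count} identical consecutive failures: {exc!r}"
            ) from exc
```

A caller would invoke `record_failure` inside its `except` block and `record_success` after each document that processes cleanly, so isolated or varied errors are still only logged.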
* MetaCAT fixes and upgrades. Pushing for 3 updates:
  1) Removed the check and update for labels with zero data, as this was causing issues during evaluation
  2) Resolved an issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3)
  3) Updated the attention mask creation to dynamically use the actual pad_idx value instead of assuming it to be 0
* Pushing type fix
* Pushing for type fix
* Fixing type issues
* Pushing change
* Pushing update w/o try except block. For the issue where the confusion matrix couldn't be calculated when testing on a single class with an F1 score of 1, as it expected the original number of training classes (3), pushing an optimized version w/o the try except block
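Updates 2 and 3 above can be illustrated with a small, dependency-free sketch. It is not the MetaCAT implementation: the function names are hypothetical, plain lists stand in for tensors, and sizing the confusion matrix from an explicit `labels` argument is just one possible way to avoid assuming a fixed class count.

```python
from typing import List, Sequence


def attention_mask(batch: Sequence[Sequence[int]], pad_idx: int) -> List[List[int]]:
    """Mark real tokens with 1 and padding with 0, using the given pad_idx."""
    # Comparing against pad_idx (rather than a hard-coded 0) keeps a real
    # token with ID 0 from being masked out when padding uses another ID.
    return [[0 if token == pad_idx else 1 for token in row] for row in batch]


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]
) -> List[List[int]]:
    """Build a confusion matrix sized by `labels`, not by a fixed class count."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


# Padding ID is 3 here, so token ID 0 is a real token.
mask = attention_mask([[5, 0, 7, 3], [0, 3, 3, 3]], pad_idx=3)

# Single-class evaluation with perfect predictions gives a 1x1 matrix.
y_true = [1, 1]
y_pred = [1, 1]
single = confusion_matrix(y_true, y_pred, labels=sorted(set(y_true) | set(y_pred)))
```

Here `mask` is `[[1, 1, 1, 0], [1, 0, 0, 0]]` and `single` is `[[2]]`.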
…497)
* CU-869671bn4: Update requirements (GHA should fail due to mypy)
* CU-869671bn4: Update mypy dev requirement to be less than 1.12
* CU-86967nnra: Remove python 3.8 from GHA
* CU-86967nnra: Remove python 3.8 from classifiers
* CU-86967nnra: Add python version requirements to setup.py (allowing from 3.9 to 3.11)
* CU-86967nnra: Remove upper bound from python requirements. Upper bound could be lifted as soon as `spacy` releases a compatible version. And it _shouldn't_ require any changes from our side. And it isn't possible to install it on higher versions (currently) due to no `spacy` being available for those versions
* CU-86964zm4d: Use ignore tag correctly to ignore certain parts of UK release
* CU-86964zm4d: Use OPCS4 later refset ID by default (and switch to older if needed)
* CU-86964zm4d: Fix OPCS4 refset ID tests. Fix the default value being tested for (i.e. in case of international release that'll be shown). Add a test for old UK extension.
* CU-86964zm4d: Add note regarding OPCS refset ID relevance only for UK extensions.
* CU-86964zm4d: Fix checking of extension outside loops. I.e. determine if a UK release/bundle is used for OPCS4/ICD10 mappings splitting. Always returning separate refsets for ICD10 and OSC internally, even if the latter is None.
* CU-8695hghww: Add bash script to run backwards compatibility
* CU-8695hghww: Rename backwards compatibility running bash script
* CU-8695hghww: Add new step to workflow to run model backwards compatibility
* CU-8695hghww: Fix model compatibility regression suite path
* CU-8695hghww: Simplify creation and removal of fake model folder
…ecated (#500)
* CU-8696m1mch: Remove versioning utility since all its parts were deprecated
* CU-8696m1mch: Remove tests for versioning utility
* CU-8696m1mch: Remove unused test-specific binary (CDB)
In preparation for a minor release (mainly to disable Python 3.8).