Conceptualization of datasets and outputs #103
Merged
…vailable for download
…t cyclic dependencies, unify the usage of pathlib
…data-link and outputs-link
…enerations, provide backward compatibility methods
kasnerz changed the title from "WIP: External and local datasets" to "WIP: Conceptualization of datasets and outputs" on Oct 2, 2024
kasnerz changed the title from "WIP: Conceptualization of datasets and outputs" to "Conceptualization of datasets and outputs" on Oct 6, 2024
This pull request brings another package of major changes to factgenie workflows.
👉️ You should be able to safely integrate the changes into your existing installation. Factgenie will detect that it has been updated and will migrate your custom configuration.
❗️ Careful: You should still back up your files before performing this update.
Changes:
… factgenie/config directory. That includes: config.yml, datasets.yml, resources.yml (see below), …
… factgenie/config/resources.yml.
… factgenie/loaders. What is new is that the loaders for the external datasets provide an additional download() method. This method can download the example dataset along with all the related resources, i.e. model outputs and annotations.
(… the download() method need not be implemented.)
… example_idx, which relates it to the example in the dataset. Therefore, we no longer require a full set of model outputs for a particular split.
… ANNOTATOR_ID.
… PROLIFIC_PID, SESSION_ID, STUDY_ID …
… workerId, assignmentId, hitId. (However, this is completely untested.)
… ast.literal_eval for parsing model arguments from the YAML file, hopefully covering the majority of cases even for arguments we do not know about.

We provide a utils.migrate() function that tries to convert old files to the new format, move configuration files to the new directories, etc. The function is invoked only if the main configuration file is detected in the old location.

On top of these major changes:
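
To illustrate the ast.literal_eval point from the list of changes: a minimal sketch of how string values read from a YAML file could be turned into typed Python values without knowing the argument names in advance. The helper name parse_model_arg and the sample argument names are hypothetical and are not taken from the factgenie code base; the fallback to the raw string is an assumption about the desired behavior.

```python
import ast


def parse_model_arg(value):
    """Best-effort conversion of a YAML value into a Python literal.

    Hypothetical helper, not part of factgenie.
    """
    if not isinstance(value, str):
        # The YAML parser already produced a typed value (int, list, ...).
        return value
    try:
        # Safely evaluates literals only: numbers, strings, lists,
        # dicts, tuples, sets, booleans and None.
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a Python literal (e.g. a bare model name): keep the string.
        return value


# Sample arguments as they might appear after loading a YAML file
# in which every value was written as a string (assumed input).
raw_args = {
    "temperature": "0.7",
    "max_tokens": "256",
    "stop": "['###']",
    "seed": "None",
    "model": "llama3",
}

parsed_args = {name: parse_model_arg(value) for name, value in raw_args.items()}
```

Because ast.literal_eval only accepts literals, a value such as a bare model name raises an error and is kept as a plain string here, which is one way to handle arguments that are not known in advance.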
@oplatek I guess you don't have time for a proper review, so I am merging this update myself. It would be great if you could test the update on any of your instances that you are still using (but make sure to back up everything first).
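
For anyone testing the update on an existing instance: the description says the migration runs only when the main configuration file is found in the old location. Below is a rough sketch of such a guard, with a backup copy made first. The function migrate_if_needed and all paths passed to it are hypothetical; this is not the actual utils.migrate() implementation, only an illustration of the "detect, back up, move" idea.

```python
import shutil
from pathlib import Path


def migrate_if_needed(old_config: Path, new_config_dir: Path, backup_dir: Path) -> bool:
    """Move a config file to its new directory if it still sits in the old place.

    Hypothetical sketch; returns True if a migration was performed.
    """
    if not old_config.is_file():
        # Nothing in the old location: assume the installation is up to date.
        return False

    # Keep a copy of the original file before touching anything.
    backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(old_config, backup_dir / old_config.name)

    # Move the file into the new configuration directory.
    new_config_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_config), str(new_config_dir / old_config.name))
    return True
```

A call could look like migrate_if_needed(Path("old/config.yml"), Path("factgenie/config"), Path("backup")), where the old path and the backup directory are placeholders to be replaced with the real locations of your installation.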