-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GENFI] Enable use of tsv to extract more clinical data #1005
Conversation
This reverts commit c79fb80.
Hello @MatthieuJoulot! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2023-10-25 12:12:06 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MatthieuJoulot !
I made a first pass with some minor suggestions along the way.
Are you planning to add some clinical data to the CI dataset as well ?
clinica/iotools/converters/genfi_to_bids/genfi_to_bids_utils.py
Outdated
Show resolved
Hide resolved
clinica/iotools/converters/genfi_to_bids/genfi_to_bids_utils.py
Outdated
Show resolved
Hide resolved
clinica/iotools/converters/genfi_to_bids/genfi_to_bids_utils.py
Outdated
Show resolved
Hide resolved
clinica/iotools/converters/genfi_to_bids/genfi_to_bids_utils.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Gensollen <nicolas.gensollen@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MatthieuJoulot !
The PR LGTM. I have a few additional suggestions but should be good to go once implemented.
if path_to_clinical_tsv: | ||
additional_data_df = pd.read_csv(path_to_clinical_tsv, sep="\t") | ||
|
||
path_to_mapping_tsv = os.path.join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use Path objects consistently across the code base rather than strings. Especially when it makes the code much easier to read. I would modify both path_to_mapping_tsv
and path_to_ref_csv
.
To be verified, but this would become:
from clinica.utils.filemanip import get_parent
genfi_data_folder = get_parent(__file__, 3) / "data"
path_to_ref_csv = genfi_data_folder / "genfi_ref.csv"
...
path_to_mapping_tsv = genfi_data_folder / "genfi_data_mapping.tsv"
@@ -45,9 +50,19 @@ def convert_images( | |||
write_bids, | |||
) | |||
|
|||
# check that if a clinical tsv is given, a path to the clinical data is given as well | |||
if path_to_clinical_tsv and not path_to_clinical: | |||
raise ValueError("Missing a clinical_data_path.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can imagine users getting stuck here when providing only path_to_clinical_tsv
but not path_to_clinical
since these concepts are quite close for the distracted reader of the documentation.
I would be very explicit to ease debugging. Maybe something like:
raise ValueError(
"The Genfi2BIDS converter is unable to convert the clinical data because "
"the path to these data was not provided while a TSV file with additional "
f"data was given ({path_to_clinical_tsv}). You can either use the appropriate "
"option from the clinica command line interface to provide the missing path, "
"or chose to not convert clinical data at all."
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
@@ -9,6 +9,7 @@ def convert_images( | |||
bids_dir: PathLike, | |||
path_to_clinical: Optional[PathLike], | |||
gif: bool, | |||
path_to_clinical_tsv: Optional[PathLike] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add the = None
to the path_to_clinical
argument as well ?
Also, is gif
optional ? If not, it should be before the optional arguments, if yes, it should be declared as optional with a default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @MatthieuJoulot !
In its current state, this converter extracts only the most essential data from the spreadsheets.
This PR proposes to use a tsv given by the user specifying the name of the fields he wants, regardless of which spreadsheet the data is in.
The data must be given in the with underscores instead of spaces, and no capital letters
The data will then be written in the right place (participants.tsv/sessions.tsv/scans.tsv) using a mapping of the data.