[ADNI] Handle reading new format of clinical csv #1016
Conversation
adni_merge_path = path.join(clinical_data_dir, "ADNIMERGE.csv")
participants = set(
    read_csv(adni_merge_path, sep=",", usecols=["PTID"], squeeze=True).unique()
)
Not sure about this one
Thanks for working on this @MatthieuJoulot !
I think this is looking good. I made some suggestions to have clearer error messages.
I think it would be great to add one or two subjects/sessions with the new format to the CI data, such that we can test that we are able to handle both formats. WDYT?
import re
from pathlib import Path

pattern = filename + r"(_\d{1,2}[A-Za-z]{3}\d{4})?\.csv"
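For reference, this pattern accepts both naming schemes; a quick illustration (using "ADNIMERGE" as a stand-in for filename):

import re

pattern = "ADNIMERGE" + r"(_\d{1,2}[A-Za-z]{3}\d{4})?\.csv"
assert re.search(pattern, "ADNIMERGE.csv")            # old, undated format
assert re.search(pattern, "ADNIMERGE_12Sep2023.csv")  # new dated format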
Are we expecting to have exactly one file matching this pattern?
If yes, I think we should verify this and raise if more than one file was found. Currently the function returns the last one found. Maybe something like this:

files_matching_pattern = [
    f for f in Path(clinical_dir).rglob("*.csv") if re.search(pattern, f.name)
]
if len(files_matching_pattern) != 1:
    found = "\n- ".join(str(f) for f in files_matching_pattern)
    raise IOError(
        f"Expecting to find exactly one file in folder {clinical_dir} "
        f"matching pattern {pattern}. {len(files_matching_pattern)} "
        f"files were found instead:\n- {found}"
    )
try:
    return ...
Unless you download the data twice, yes, we expect only one file.
Done !
try:
    return pd.read_csv(adni_merge_path, sep=",", low_memory=False)
except:
    raise ValueError(f"{filename}.csv was not found. Please check your data.")
I would be a bit more specific here. If you get to this line, the file exists but reading it as a CSV file failed:

raise ValueError(
    f"File {adni_merge_path} was found but could not "
    "be loaded as a DataFrame. Please check your data."
)
That is not what happens. The file is actually not found (because of the name change), and it errors because adni_merge_path is therefore not defined.
I meant: if you implement the suggestion above, the file has to exist, otherwise an error would have been raised before. So you would get to this line only if the CSV file wasn't formatted as expected. Right?
My bad, I have been reading from bottom to top, I'll come back afterwards
Done as well, works better reading from the top
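Putting the two suggestions together, a minimal sketch of what the resulting helper could look like (the function name, pattern, and error messages come from this thread; the exact signature and the broad except Exception are assumptions, not necessarily the code that was merged):

import re
from pathlib import Path

import pandas as pd


def load_clinical_csv(clinical_dir, filename):
    # ADNI now suffixes its clinical CSVs with a date (e.g. "_12Sep2023"),
    # so accept both "<filename>.csv" and "<filename>_<date>.csv".
    pattern = filename + r"(_\d{1,2}[A-Za-z]{3}\d{4})?\.csv"
    files_matching_pattern = [
        f for f in Path(clinical_dir).rglob("*.csv") if re.search(pattern, f.name)
    ]
    if len(files_matching_pattern) != 1:
        found = "\n- ".join(str(f) for f in files_matching_pattern)
        raise IOError(
            f"Expecting to find exactly one file in folder {clinical_dir} "
            f"matching pattern {pattern}. {len(files_matching_pattern)} "
            f"files were found instead:\n- {found}"
        )
    try:
        return pd.read_csv(files_matching_pattern[0], sep=",", low_memory=False)
    except Exception:
        raise ValueError(
            f"File {files_matching_pattern[0]} was found but could not "
            "be loaded as a DataFrame. Please check your data."
        )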
file_to_read_path = path.join(clinical_data_dir, location)
cprint(f"\tReading clinical data file: {location}")
pattern = location.split(".")[0] + r"(_\d{1,2}[A-Za-z]{3}\d{4})?\.csv"
Any reason not to use your load_clinical_csv() function here?
I must have missed that one
Ha yes, actually there was. The way this was built before I changed things was different, and I did not see how to use load_clinical_csv.
I'm probably missing something, but I don't understand why you couldn't do:

for location in files:
    location = location.split("/")[0]
    df_file = load_clinical_csv(clinical_data_dir, location.split(".")[0])
    df_filtered = filter_subj_bids(df_file, location, bids_ids).copy()
    ...
You are right, it looks like this should work. I did not use it because this part of the code needs to be able to not find a file, and this suggestion would make the code crash. Maybe we can work something out though.
I nested this part in a try. I'm not fond of it, but it works. WDYT?
cprint(f"\tReading clinical data file: {location}") | ||
|
||
df_file = pd.read_csv(file_to_read_path, dtype=str) | ||
try: |
I would put as few lines as possible in the try-except and catch explicit errors. In this case, we need to catch the IOError which is raised when the number of found files isn't exactly one. It could be a good idea to give a warning and continue the loop. WDYT?

try:
    df_file = load_clinical_csv(clinical_data_dir, location.split(".")[0])
except IOError as e:
    warnings.warn(str(e))
    continue
df_filtered = ...
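For context, the resulting loop could look roughly like this (files, bids_ids, clinical_data_dir, and filter_subj_bids are names taken from the excerpts above; the surrounding structure is an assumption):

import warnings

for location in files:
    location = location.split("/")[0]
    try:
        df_file = load_clinical_csv(clinical_data_dir, location.split(".")[0])
    except IOError as e:
        # Some clinical files may legitimately be absent: warn and move on
        # instead of aborting the whole conversion.
        warnings.warn(str(e))
        continue
    df_filtered = filter_subj_bids(df_file, location, bids_ids).copy()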
LGTM, thanks @MatthieuJoulot !
Thanks @MatthieuJoulot !
Just a small suggestion on removing parametrization for tests using a single value.
LGTM otherwise !
assert_frame_equal(load_clinical_csv(tmp_path, csv_to_look_for), input_df)

@pytest.mark.parametrize(
I would remove the parametrization if you have only one value: remove csv_to_look_for from the arguments of the test function and simply use "adnimerge" in the test function's body.
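A minimal sketch of what the de-parametrized test could look like (the DataFrame content is an assumption; only the names come from this thread):

import pandas as pd
from pandas.testing import assert_frame_equal

# load_clinical_csv is the helper added in this PR; import it from
# wherever it lives in the project.


def test_load_clinical_csv(tmp_path):
    input_df = pd.DataFrame({"col1": [1], "col2": [2], "col3": [3]})
    input_df.to_csv(tmp_path / "adnimerge.csv", index=False)
    assert_frame_equal(load_clinical_csv(tmp_path, "adnimerge"), input_df)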
load_clinical_csv(tmp_path, csv_to_look_for)

@pytest.mark.parametrize(
Same comment here
with open(tmp_path / "adnimerge.csv", "w") as fp:
    fp.write("col1,col2,col3\n1,2,3\n1,2,3,4")

# input_df.to_csv(tmp_path / csv_name, sep="\t", index=False)
Looks like the linter is unhappy... could you format?
input_df.to_csv(tmp_path / csv_name, index=False)
assert_frame_equal(load_clinical_csv(tmp_path, csv_to_look_for), input_df)

def test_load_clinical_csv_error(tmp_path, ):
Suggested change:
- def test_load_clinical_csv_error(tmp_path, ):
+ def test_load_clinical_csv_error(tmp_path):
Strange, I thought I did. I will again, then.
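Putting the pieces above together, the error-path test could end up looking something like this (a sketch; the exact exception type and message depend on the final implementation of load_clinical_csv):

import pytest

# load_clinical_csv is the helper added in this PR; import it from
# wherever it lives in the project.


def test_load_clinical_csv_error(tmp_path):
    # The last row has one field too many, so pandas fails to parse the
    # file and load_clinical_csv should raise.
    with open(tmp_path / "adnimerge.csv", "w") as fp:
        fp.write("col1,col2,col3\n1,2,3\n1,2,3,4")
    with pytest.raises(ValueError):
        load_clinical_csv(tmp_path, "adnimerge")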
Closes #1002
ADNI has changed the format of its clinical CSV files, breaking the converter. This PR aims to fix this problem.