[ENH] Add function to enable smart path search #1098 #1103

moritz-gerster · 2022-11-21T08:27:26Z

PR Description

Fixes #1098. The added function find_matching_paths returns all found bids_paths similar to BIDSPath().match(). However, in this function, it is possible to filter for several entities which can be passed as lists. For example

bids_paths = find_matching_paths(root, subjects=["01", "02", "03"], sessions="MedOn", tasks=["Rest", "Move"])

Since this function is very similar to BIDSPath().match() I added the private functions _fnames_to_bidspaths and _return_root_paths and refactored the match() method to avoid code repetition. Also note, that the private function _filter_fnames was extended to enable passing lists of strings as input arguments.

Merge checklist

Maintainer, please confirm the following before merging.
If applicable:

All comments are resolved
This is not your own PR
All CIs are happy
PR title starts with [MRG]
whats_new.rst is updated
New contributors have been added to CITATION.cff
PR description includes phrase "closes <#issue-number>"

mne-tools#1098 (comment)

welcome · 2022-11-21T08:27:29Z

Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴🏽‍♂️

moritz-gerster · 2022-11-21T08:38:10Z

mne_bids/path.py


    regexp = (
        leading_path_str +
        sub_str + ses_str + task_str + acq_str + run_str + proc_str +
-        rec_str + space_str + split_str + desc_str + suffix_str + ext_str


Here I fixed a small bug. "space" must be searched before "recording", according to their order in the basename.

hoechenberger · 2022-11-21T08:40:20Z

Thanks for the contribution! I merged main, let's see how CI like this 🤓

codecov · 2022-11-21T08:48:38Z

Codecov Report

Merging #1103 (2e000ba) into main (b97ce0b) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1103      +/-   ##
==========================================
+ Coverage   95.24%   95.28%   +0.03%     
==========================================
  Files          24       24              
  Lines        3873     3900      +27     
==========================================
+ Hits         3689     3716      +27     
  Misses        184      184

Impacted Files	Coverage Δ
mne_bids/path.py	`97.70% <100.00%> (+0.08%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

hoechenberger · 2022-11-21T09:09:06Z

All green! I don't have time to review, but maybe @sappelhoff, @agramfort, or @adam2392 can find a couple minutes :)

hoechenberger · 2022-11-21T09:09:38Z

@moritz-gerster We'll definitely need a changelog entry though :) in whats_new.rst

sappelhoff

We also need to add the new function to the API docs, and I'd be very happy to see it in action in any one of our existing examples.

Thanks for your contribution, Moritz!

…o get_paths

moritz-gerster · 2022-11-21T10:00:48Z

doc/whats_new.rst

@@ -44,6 +44,7 @@ Detailed list of changes
 - Add :meth:`mne_bids.BIDSPath.find_matching_sidecar` to find the sidecar file associated with a given file path by `Eric Larson`_ (:gh:`1093`)
 - When writing data via :func:`~mne_bids.write_raw_bids`, it is now possible to specify a custom mapping of :class:`mne.Annotations` descriptions to event codes via the ``event_id`` parameter. Previously, passing this parameter would always require to also pass ``events``, and using a custom event code mapping for annotations was impossible, by `Richard Höchenberger`_ (:gh:`1084`)
 - Improve error message when :obj:`~mne_bids.BIDSPath.fpath` cannot be uniquely resolved by `Eric Larson`_ (:gh:`1097`)
+- Add :func:`mne_bids.find_matching_paths` to get all bids paths that match the requirements by :newcontrib:`Moritz Gerster` (:gh:`1103`)


Is this correct @hoechenberger? In the mne_bids contribution guide it is not described how to add lines so I followed the mne contributing guide.

Here I didn't add my name because I guess this works automatically?

The following authors contributed for the first time. Thank you so much! 🤩 * ...

I guess this works automatically

unfortunately not :-)

I followed the mne contributing guide.

it's a bit different here. You just add your name to the list you quoted above, and we don't have the sphinx role ":new_contrib:", so you can just follow the style of previous changelog entries.

Nevermind, I checked the old versions of whats_new.rst and understood that you don't have :newcontrib: as in mne.

if you want the function to be available as mne_bids.find_matching_paths, for example, you'd need to make additions here:

mne-bids/mne_bids/__init__.py

Lines 4 to 18 in b97ce0b

from mne_bids import commands

from mne_bids.report import make_report

from mne_bids.path import (BIDSPath, get_datatypes, get_entity_vals,

print_dir_tree, get_entities_from_fname,

search_folder_for_text, get_bids_path_from_fname)

from mne_bids.read import get_head_mri_trans, read_raw_bids

from mne_bids.utils import get_anonymization_daysback

from mne_bids.write import (make_dataset_description, write_anat,

write_raw_bids, mark_channels,

write_meg_calibration, write_meg_crosstalk,

get_anat_landmarks, anonymize_dataset)

from mne_bids.sidecar_updates import update_sidecar_json, update_anat_landmarks

from mne_bids.inspect import inspect_dataset

from mne_bids.dig import (template_to_head, convert_montage_to_ras,

convert_montage_to_mri)

for example look at BIDSPath in that file --> if it weren't where it is, you would always need to do mne_bids.path.BIDSPath.

And for the API docs, you need to add it here (provided you followed the above):

mne-bids/doc/api.rst

Lines 19 to 22 in b97ce0b

.. autosummary::

:toctree: generated/

write_raw_bids

Thanks @sappelhoff! Added with 1bc097e.

moritz-gerster · 2022-11-21T10:09:40Z

We also need to add the new function to the API docs, and I'd be very happy to see it in action in any one of our existing examples.

@sappelhoff, how do I add the function to the API docs? I have never done that before. I thought this happens automatically via the docstring.

sappelhoff · 2022-11-21T10:20:27Z

if you want the function to be available as mne_bids.find_matching_paths, for example, you'd need to make additions here:

mne-bids/mne_bids/__init__.py

Lines 4 to 18 in b97ce0b

    
           from mne_bids import commands 
        
           from mne_bids.report import make_report 
        
           from mne_bids.path import (BIDSPath, get_datatypes, get_entity_vals, 
        
                                      print_dir_tree, get_entities_from_fname, 
        
                                      search_folder_for_text, get_bids_path_from_fname) 
        
           from mne_bids.read import get_head_mri_trans, read_raw_bids 
        
           from mne_bids.utils import get_anonymization_daysback 
        
           from mne_bids.write import (make_dataset_description, write_anat, 
        
                                       write_raw_bids, mark_channels, 
        
                                       write_meg_calibration, write_meg_crosstalk, 
        
                                       get_anat_landmarks, anonymize_dataset) 
        
           from mne_bids.sidecar_updates import update_sidecar_json, update_anat_landmarks 
        
           from mne_bids.inspect import inspect_dataset 
        
           from mne_bids.dig import (template_to_head, convert_montage_to_ras, 
        
                                     convert_montage_to_mri)

for example look at BIDSPath in that file --> if it weren't where it is, you would always need to do mne_bids.path.BIDSPath.

And for the API docs, you need to add it here (provided you followed the above):

mne-bids/doc/api.rst

Lines 19 to 22 in b97ce0b

    
           .. autosummary:: 
        
              :toctree: generated/ 
        
              write_raw_bids

doc/whats_new.rst

moritz-gerster · 2022-11-22T07:33:37Z

I have a question. What do you think about extending this proposed function to enable ignoring certain entity values?

Suggestion:

def find_matching_paths(root, subjects=None, sessions=None, tasks=None,
                        acquisitions=None, runs=None, processings=None,
                        recordings=None, spaces=None, splits=None,
                        descriptions=None, suffixes=None, extensions=None,
                        datatypes=None, ignore_subjects=None,
                        ignore_sessions=None, ignore_tasks=None,
                        ignore_runs=None, ignore_processings=None,
                        ignore_spaces=None, ignore_acquisitions=None,
                        ignore_splits=None, ignore_descriptions=None,  # ignore_modalities=None, # <- what's this?
                        ignore_datatypes=None, check=True):
    """Get list of all matching paths for all matching entity values.

...

    Returns
    -------
    bids_paths : list of mne_bids.BIDSPath
        The matching paths.

    """
    if datatypes is None and ignore_datatypes is not None:
        datatypes = get_entity_vals(root, 'datatype',
                                    ignore_datatypes=ignore_datatypes)
    fpaths = _return_root_paths(root, datatype=datatypes, ignore_json=False)

    if subjects is None and ignore_subjects is not None:
        subjects = get_entity_vals(root, 'subject',
                                   ignore_subjects=ignore_subjects)
    if sessions is None and ignore_sessions is not None:
        sessions = get_entity_vals(root, 'session',
                                   ignore_sessions=ignore_sessions)
    if tasks is None and ignore_tasks is not None:
        tasks = get_entity_vals(root, 'task', ignore_tasks=ignore_tasks)
    if runs is None and ignore_runs is not None:
        runs = get_entity_vals(root, 'run', ignore_runs=ignore_runs)
    if processings is None and ignore_processings is not None:
        processings = get_entity_vals(root, 'processing',
                                      ignore_processings=ignore_processings)
    if spaces is None and ignore_spaces is not None:
        spaces = get_entity_vals(root, 'space', ignore_spaces=ignore_spaces)
    if acquisitions is None and ignore_acquisitions is not None:
        acquisitions = get_entity_vals(root, 'acquisition',
                                       ignore_acquisitions=ignore_acquisitions)
    if splits is None and ignore_splits is not None:
        splits = get_entity_vals(root, 'split', ignore_splits=ignore_splits)
    if descriptions is None and ignore_descriptions is not None:
        descriptions = get_entity_vals(root, 'description',
                                       ignore_descriptions=ignore_descriptions)

    fpaths_filtered = _filter_fnames(fpaths,
                                     subject=subjects,
                                     session=sessions,
                                     task=tasks,
                                     acquisition=acquisitions,
                                     run=runs,
                                     processing=processings,
                                     recording=recordings,
                                     space=spaces,
                                     split=splits,
                                     description=descriptions,
                                     suffix=suffixes,
                                     extension=extensions)

Reason:

In the current implementation it is somewhat tedious to avoid certain entities. For example, if I want to avoid the subjects "emptyroom" and "03" and I want all tasks except for "Rest" and I don't need the ".tsv" files, I have to write

subjects = get_entity_vals(root, "subject", ignore_subjects=['03', 'emptyroom'])
tasks = get_entity_vals(root, "task", ignore_tasks='Rest')
bids_paths = find_matching_paths(root, subjects=subjects, tasks=tasks)

This becomes really tedious for users who don't know this function (I don't think everyone does), they might have to write

bids_paths = find_matching_paths(root, subjects=["01", "02", "04", "05", "06", "07, "08", "09", "10", "11", "emptyroom"],
                                                      tasks=["Hold", "Move", "Dance", "Read"])

If the study is ongoing and they collect more subjects they would have to manually edit "subjects" many times.

With the new function this would become

bids_paths = find_matching_paths(root, ignore_subjects=['03', 'emptyroom'], ignore_tasks='Rest')

What do you think? I haven't tested the code but I think it would work. If both "subjects" and "ignore_subjects" is provided, "ignore_subjects" would be ignored.

moritz-gerster · 2022-11-22T08:20:07Z

mne_bids/path.py

@@ -1951,8 +1951,8 @@ def _filter_fnames(fnames, *, subject=None, session=None, task=None,
                 if split else r'(|_split-([^_]+))')
    desc_str = (r'_desc-(' + '|'.join(description) + ')'
                if description else r'(|_desc-([^_]+))')
-    suffix_str = (r'_(' + '|'.join(suffixe) + ')' if suffix
-                  else (r'_(' + '|'.join(ALLOWED_FILENAME_SUFFIX) + ')'))


I noticed that the function _filter_fnames() only returns paths with valid BIDS suffixes even if the two mother functions find_matching_paths() and BIDSPath.match() have the keyword argument check=False.

In theory, one could add the argument check=False to _filter_fnames(check=False) and then code

if suffix: suffix_str = r'_(' + '|'.join(suffix) + ')' elif suffix is None and check: # only return files with suffixes that are valid in BIDS suffix_str = (r'_(' + '|'.join(ALLOWED_FILENAME_SUFFIX) + ')')) elif suffix is None and not check: suffix_str = r'_([^_]+)'

However, this seems overkill to me since the BIDS conformity is checked afterwards using _fnames_to_bidspaths(check=check).

…heck=False

sappelhoff · 2022-11-22T10:16:30Z

This becomes really tedious for users who don't know this function (I don't think everyone does), they might have to write

A good reason to advertise that function in an example with the new find_matching_paths 😉 (are you going to expand one of the existing examples to somehow use your new function? --> this would also create a link from the API docs to that example, which is very useful for people to see your feature in action)

Personally, I find your example

subjects = get_entity_vals(root, "subject", ignore_subjects=['03', 'emptyroom'])
tasks = get_entity_vals(root, "task", ignore_tasks='Rest')
bids_paths = find_matching_paths(root, subjects=subjects, tasks=tasks)

straight forward. On the other hand having "ignore" keywords in the function signature makes it veeeery long and messy 🤔

moritz-gerster · 2022-11-22T10:49:04Z

Personally, I find your example
subjects = get_entity_vals(root, "subject", ignore_subjects=['03', 'emptyroom'])
tasks = get_entity_vals(root, "task", ignore_tasks='Rest')
bids_paths = find_matching_paths(root, subjects=subjects, tasks=tasks)
straight forward. On the other hand having "ignore" keywords in the function signature makes it veeeery long and messy 🤔

Ok, I agree :) Let's keep it as it is

moritz-gerster · 2022-11-22T10:52:20Z

This becomes really tedious for users who don't know this function (I don't think everyone does), they might have to write

A good reason to advertise that function in an example with the new find_matching_paths 😉 (are you going to expand one of the existing examples to somehow use your new function? --> this would also create a link from the API docs to that example, which is very useful for people to see your feature in action)

I have looked through the examples and I didn't find a good place to insert it. Because if I exchange a BIDSPath().match() example with this function, then .match() is no longer documented. However, if I add the function and show that it yields the same, the doc becomes longer and contains more than just the most important code snippets. But I guess this would be ok?

I'll have a look again. Let me know if you find a good place to use it.

moritz-gerster · 2022-11-22T11:01:18Z

Ok, I think I found a good example. I'll push it soon.

moritz-gerster · 2022-11-22T12:10:19Z

examples/read_bids_datasets.py

+# Note that this is the same as running:
+session = 'off'
+bids_path = BIDSPath(root=bids_root, session=session, datatype=datatype)
+print(bids_path.match(ignore_json=True))


Should we print the same list twice to show that the result is the same or should we comment out

# print(bids_path.match(ignore_json=True))

and just say that this would yield the same?

I went for printing it twice to make it as obvious as possible.

moritz-gerster · 2022-11-22T12:11:45Z

I'm done working on it @sappelhoff :)

I added two examples.

Co-authored-by: Richard Höchenberger <richard.hoechenberger@gmail.com>

sappelhoff

can you add yourself to this file, please? Just below Denis

mne-bids/CITATION.cff

Line 150 in b97ce0b

- given-names: Denis

thanks!

sappelhoff

I am happy with this, great work @moritz-gerster.

+1 to merge once CIs are green

welcome · 2022-11-22T16:40:14Z

🎉 Congrats on merging your first pull request! 🥳 Looking forward to seeing more from you in the future! 💪

hoechenberger · 2022-11-22T16:40:26Z

Thanks @moritz-gerster!

moritz-gerster · 2022-11-22T16:55:51Z

Thank you two for all your effort!!

moritz-gerster added 8 commits November 21, 2022 06:47

Enable matching json files

22a13b8

mne-tools#1098 (comment)

correct entity order

16d06a4

Enable list[str] input for _filter_fnames

3eafc58

Add find_matching_paths

14109a4

refactor to make DRY

29ee7df

misc

b31d5d9

add tests

16e2df1

pep8

6af83c5

moritz-gerster added 2 commits November 21, 2022 09:32

docstring flake8

240dba1

delete coverage file

8f4862b

moritz-gerster commented Nov 21, 2022

View reviewed changes

Merge branch 'main' into main

304c313

sappelhoff reviewed Nov 21, 2022

View reviewed changes

moritz-gerster added 2 commits November 21, 2022 10:56

Add name + what's new

6cddad7

Merge branch 'main' of https://github.com/moritz-gerster/mne-bids int…

cf9773a

…o get_paths

moritz-gerster commented Nov 21, 2022

View reviewed changes

fix doc?

64ec45e

add the function to the API docs

1bc097e

hoechenberger reviewed Nov 21, 2022

View reviewed changes

doc/whats_new.rst Outdated Show resolved Hide resolved

Don't check BIDS conformity in _filter_fnames

e9eb927

moritz-gerster commented Nov 22, 2022

View reviewed changes

make sure .match() and find_matching_paths() use same default kwarg c…

5ecf658

…heck=False

moritz-gerster added 3 commits November 22, 2022 12:23

add example in doc

f88128b

pep8

d472d02

add another example applying 'ignore'

7b8064a

moritz-gerster commented Nov 22, 2022

View reviewed changes

Update doc/whats_new.rst

2bbf6a4

Co-authored-by: Richard Höchenberger <richard.hoechenberger@gmail.com>

sappelhoff reviewed Nov 22, 2022

View reviewed changes

add name

2e000ba

sappelhoff approved these changes Nov 22, 2022

View reviewed changes

hoechenberger approved these changes Nov 22, 2022

View reviewed changes

hoechenberger merged commit 5a3e3b1 into mne-tools:main Nov 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH] Add function to enable smart path search #1098 #1103

[ENH] Add function to enable smart path search #1098 #1103

moritz-gerster commented Nov 21, 2022 •

edited by sappelhoff

Loading

welcome bot commented Nov 21, 2022

moritz-gerster Nov 21, 2022 •

edited

Loading

hoechenberger commented Nov 21, 2022

codecov bot commented Nov 21, 2022 •

edited

Loading

hoechenberger commented Nov 21, 2022

hoechenberger commented Nov 21, 2022

sappelhoff left a comment

moritz-gerster Nov 21, 2022 •

edited

Loading

sappelhoff Nov 21, 2022

moritz-gerster Nov 21, 2022

moritz-gerster Nov 21, 2022

moritz-gerster commented Nov 21, 2022

sappelhoff commented Nov 21, 2022

moritz-gerster commented Nov 22, 2022 •

edited

Loading

moritz-gerster Nov 22, 2022

sappelhoff commented Nov 22, 2022

moritz-gerster commented Nov 22, 2022

moritz-gerster commented Nov 22, 2022

moritz-gerster commented Nov 22, 2022

moritz-gerster Nov 22, 2022

moritz-gerster commented Nov 22, 2022 •

edited

Loading

sappelhoff left a comment

sappelhoff left a comment

welcome bot commented Nov 22, 2022

hoechenberger commented Nov 22, 2022

moritz-gerster commented Nov 22, 2022

	from mne_bids import commands
	from mne_bids.report import make_report
	from mne_bids.path import (BIDSPath, get_datatypes, get_entity_vals,
	print_dir_tree, get_entities_from_fname,
	search_folder_for_text, get_bids_path_from_fname)
	from mne_bids.read import get_head_mri_trans, read_raw_bids
	from mne_bids.utils import get_anonymization_daysback
	from mne_bids.write import (make_dataset_description, write_anat,
	write_raw_bids, mark_channels,
	write_meg_calibration, write_meg_crosstalk,
	get_anat_landmarks, anonymize_dataset)
	from mne_bids.sidecar_updates import update_sidecar_json, update_anat_landmarks
	from mne_bids.inspect import inspect_dataset
	from mne_bids.dig import (template_to_head, convert_montage_to_ras,
	convert_montage_to_mri)

[ENH] Add function to enable smart path search #1098 #1103

[ENH] Add function to enable smart path search #1098 #1103

Conversation

moritz-gerster commented Nov 21, 2022 • edited by sappelhoff Loading

PR Description

Merge checklist

welcome bot commented Nov 21, 2022

moritz-gerster Nov 21, 2022 • edited Loading

Choose a reason for hiding this comment

hoechenberger commented Nov 21, 2022

codecov bot commented Nov 21, 2022 • edited Loading

Codecov Report

hoechenberger commented Nov 21, 2022

hoechenberger commented Nov 21, 2022

sappelhoff left a comment

Choose a reason for hiding this comment

moritz-gerster Nov 21, 2022 • edited Loading

Choose a reason for hiding this comment

sappelhoff Nov 21, 2022

Choose a reason for hiding this comment

moritz-gerster Nov 21, 2022

Choose a reason for hiding this comment

moritz-gerster Nov 21, 2022

Choose a reason for hiding this comment

moritz-gerster commented Nov 21, 2022

sappelhoff commented Nov 21, 2022

moritz-gerster commented Nov 22, 2022 • edited Loading

moritz-gerster Nov 22, 2022

Choose a reason for hiding this comment

sappelhoff commented Nov 22, 2022

moritz-gerster commented Nov 22, 2022

moritz-gerster commented Nov 22, 2022

moritz-gerster commented Nov 22, 2022

moritz-gerster Nov 22, 2022

Choose a reason for hiding this comment

moritz-gerster commented Nov 22, 2022 • edited Loading

sappelhoff left a comment

Choose a reason for hiding this comment

sappelhoff left a comment

Choose a reason for hiding this comment

welcome bot commented Nov 22, 2022

hoechenberger commented Nov 22, 2022

moritz-gerster commented Nov 22, 2022

moritz-gerster commented Nov 21, 2022 •

edited by sappelhoff

Loading

moritz-gerster Nov 21, 2022 •

edited

Loading

codecov bot commented Nov 21, 2022 •

edited

Loading

moritz-gerster Nov 21, 2022 •

edited

Loading

moritz-gerster commented Nov 22, 2022 •

edited

Loading

moritz-gerster commented Nov 22, 2022 •

edited

Loading