REF: Refactor directory structure #127

adelavega · 2021-04-16T21:33:50Z

Closes #124

Previously the output directory structure was:

- <outdir>/<bundle_id>/fitlins: output files
- <install-dir> (optional; defaults to .): where datasets are downloaded.
  Example contents: ./Budapest/preproc, ./Budapest/neuroscout-bundles/<bundle_id>

Two problems:

- The two types of data (preprocessed datasets and bundles) are intermixed and grouped by dataset name, not by bundle ID.
- Folders are unexpectedly created in the current directory when running in Singularity.

Instead I'm proposing the following:

<outdir>/neuroscout-<bundle_id> (both arguments are mandatory for the install and run commands).

Subdirectories:

- inputs/bundle: contents of the untarred bundle
- inputs/<dataset_name>: a DataLad dataset; dataset_name is obtained from the bundle metadata
- out/fitlins: FitLins outputs

Here is an example:

    /home/user/out/neuroscout-5xH93  
    └───inputs
    │   │
    │   └───Budapest
    │       └───fmriprep
    │   └───bundle
    │       └───events
    │       │   model.json
    │       │   ...
    └───out
    │   └───fitlins
    │       └───sub-01
    │       └───reports
    │       │   task-movie_space-MNI152NLin2009cAsym_contrast-{name}_stat-effect_statmap.nii.gz
    │       │   ...

This guarantees a YODA-like structure.
However, if install-dir is defined, the DataLad dataset is instead installed in <install-dir>/<dataset_name>.
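A minimal sketch of how the proposed paths resolve, assuming a hypothetical helper resolve_layout (not the actual neuroscout-cli code): everything defaults to <outdir>/neuroscout-<bundle_id>, and only the DataLad dataset moves when install-dir is given.

```python
from pathlib import Path
from typing import Dict, Optional


def resolve_layout(outdir: str, bundle_id: str, dataset_name: str,
                   install_dir: Optional[str] = None) -> Dict[str, Path]:
    """Hypothetical helper mapping the proposed layout to concrete paths."""
    root = Path(outdir) / f"neuroscout-{bundle_id}"
    inputs = root / "inputs"
    # Default: the DataLad dataset lives next to the bundle (YODA-like).
    # With install_dir: it is installed once in a shared location instead.
    if install_dir is None:
        dataset = inputs / dataset_name
    else:
        dataset = Path(install_dir) / dataset_name
    return {
        "root": root,
        "bundle": inputs / "bundle",
        "dataset": dataset,
        "fitlins": root / "out" / "fitlins",
    }


layout = resolve_layout("/home/user/out", "5xH93", "Budapest")
print(layout["dataset"])  # /home/user/out/neuroscout-5xH93/inputs/Budapest (on POSIX)
```

Note that the bundle and FitLins outputs stay under the bundle-specific root in both cases; only the heavy dataset is shared.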

Here is an example of that folder after caching a few datasets:

    /install-dir
    └───Budapest
    │   └───fmriprep
    └───studyforrest
    │   └───fmriprep

Questions:

- Should I symlink <cache-dir>/<dataset_name> --> <outdir>/neuroscout-<bundle_id>/inputs/<dataset_name> if cache-dir is defined?
- Is out/fitlins too nested? FitLins creates a fitlins output directory by default, hence the structure.
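For the first question, a symlink-based version might look like the sketch below. link_cached_dataset is a hypothetical name and the target path follows the layout proposed above; a plain symlink is not registered as a DataLad subdataset, so this only exposes the cached data at the expected location.

```python
from pathlib import Path


def link_cached_dataset(cache_dir, outdir, bundle_id, dataset_name):
    """Hypothetical helper: symlink <cache-dir>/<dataset_name> into
    <outdir>/neuroscout-<bundle_id>/inputs/<dataset_name>."""
    # Resolve to an absolute path so the link does not depend on the cwd.
    source = (Path(cache_dir) / dataset_name).resolve()
    target = Path(outdir) / f"neuroscout-{bundle_id}" / "inputs" / dataset_name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Skip if something (including an existing link) is already there.
    if not (target.exists() or target.is_symlink()):
        # target_is_directory only matters on Windows.
        target.symlink_to(source, target_is_directory=True)
    return target
```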


jdkent · 2021-04-21T21:28:02Z

Let me try to imagine a DataLad workflow:

1. Create a dataset with the bundle ID in the directory name, like neuroscout-<bundle_id>.
2. Create a subdataset named inputs.
3. Install the dataset into inputs with neuroscout-cli install.
4. neuroscout-cli run would generate output in neuroscout-<bundle_id> (referencing the cached dataset).

Thus the output would end up in neuroscout-<bundle_id>/neuroscout-<bundle_id>.
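The double nesting comes from the tool always appending neuroscout-<bundle_id> to whatever output directory it is given. A minimal illustration, assuming a hypothetical output_root helper that mirrors the proposed naming:

```python
from pathlib import Path


def output_root(outdir, bundle_id):
    # Hypothetical stand-in for the CLI's naming rule: it always appends
    # neuroscout-<bundle_id> to the output directory it receives.
    return Path(outdir) / f"neuroscout-{bundle_id}"


# Step 1 of the workflow: the user already created a dataset named
# neuroscout-<bundle_id> and passes it as the output directory.
user_dataset = Path("neuroscout-5xH93")
print(output_root(user_dataset, "5xH93"))  # neuroscout-5xH93/neuroscout-5xH93 (on POSIX)
```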

Let's say someone went for the least amount of setup and just ran neuroscout-cli (the most common case).

Then, if they wanted to add DataLad later, they would have to initialize datasets with --force, since by default DataLad refuses to create a dataset in a directory that already has content.

From these two cases, it's a little strange for the inputs to be explicitly linked in the output of neuroscout-cli.
I don't know whether it should be the tool's responsibility to create a YODA-like structure, or whether the focus should merely be on being as modular as possible to aid incorporation into a YODA-like structure.
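For the second case, the --force requirement can be sketched as a small command builder. datalad create and its --force flag are real; datalad_create_cmd is a hypothetical helper name.

```python
from pathlib import Path


def datalad_create_cmd(path):
    """Hypothetical helper: build a `datalad create` command for an output
    directory that may already contain neuroscout-cli results."""
    path = Path(path)
    cmd = ["datalad", "create"]
    # datalad create refuses non-empty directories unless --force is given.
    if path.exists() and any(path.iterdir()):
        cmd.append("--force")
    cmd.append(str(path))
    return cmd


# Execute with, e.g.: subprocess.run(datalad_create_cmd("neuroscout-5xH93"), check=True)
```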

adelavega · 2021-04-21T22:40:22Z

I'm not sure we're on the same page. I think you have the order of operations correct, except that neuroscout-cli will handle all the DataLad steps for you.

That is, neuroscout-cli install will create neuroscout-<bundle_id> with an inputs/ subdirectory containing bundle/ and <dataset_name>/.

I use the DataLad Python API, so a proper DataLad dataset is already installed in inputs/<dataset_name>.

neuroscout run will also run neuroscout install if necessary (maybe this should be more explicit).

The output would be in neuroscout-<bundle_id>/fitlins
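The install-on-demand behaviour could be structured roughly like this. install and run are hypothetical stand-ins (the real download, untar, and FitLins steps are left as comments), and checking for inputs/bundle as the "already installed" signal is an assumption.

```python
from pathlib import Path


def install(root):
    """Hypothetical install step: create the inputs/ layout under root."""
    (root / "inputs" / "bundle").mkdir(parents=True, exist_ok=True)
    # ...untar the bundle and install the DataLad dataset here...


def run(outdir, bundle_id):
    """Hypothetical run step: install first only if the bundle is missing."""
    root = Path(outdir) / f"neuroscout-{bundle_id}"
    installed_now = not (root / "inputs" / "bundle").is_dir()
    if installed_now:
        install(root)
    fitlins_dir = root / "fitlins"
    fitlins_dir.mkdir(parents=True, exist_ok=True)
    # ...invoke FitLins with fitlins_dir as its output directory...
    return fitlins_dir, installed_now
```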

I agree that a YODA-like structure is a bit counterintuitive, which is why I often run with -i <data_dir> to keep all the datasets in one place. Maybe that could be the default option?

I do like the idea of keeping the bundle with the output though, because then you can see exactly what the inputs from neuroscout were to produce the fitlins output, without having to go look for them somewhere else. Given that the bundle is lightweight, this doesn't bother me. The dataset is a bit trickier since that is heavy, so I could buy an argument that this should live somewhere else.

adelavega · 2021-04-22T16:29:45Z

Idea: drop the input dataset by default if it is not cached.
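A sketch of that idea, assuming "cached" means install_dir was set. It shells out to datalad drop from inside the dataset; cleanup_dataset and the injectable runner argument (used so the logic can be checked without DataLad installed) are assumptions, not the merged code.

```python
import subprocess


def cleanup_dataset(dataset_dir, install_dir=None, runner=subprocess.run):
    """Hypothetical helper: drop annexed file content after a run unless the
    dataset lives in a user-specified cache. Returns the command run, or None."""
    if install_dir is not None:
        return None  # cached: keep content for future bundles
    cmd = ["datalad", "drop", "."]
    # Run from the dataset root so "." covers all of its files.
    runner(cmd, cwd=str(dataset_dir), check=True)
    return cmd
```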


adelavega · 2021-04-22T21:05:45Z

@effigies can I request your review on this?

adelavega · 2021-04-22T21:34:25Z

If you could review this PR, which updates the docs, and let me know whether it's intuitive, that would be great:

https://github.com/neuroscout/neuroscout/blob/dfdd25a21f6b16fdc3d4821f653109fdee929aaf/docs/cli/usage.md

effigies · 2021-04-22T22:05:40Z

Not tonight. I'll try to remember, but please re-ping in the morning.

adelavega · 2021-04-22T22:06:20Z

Oh, no rush at all, thanks!

effigies · 2021-04-23T13:35:37Z

Overall this seems sensible.

> Questions:
>
> Should I symlink <cache-dir>/<dataset_name> --> <outdir>/neuroscout-<bundle_id>/inputs/<dataset_name>
> if cache-dir is defined?

Could you do a datalad reckless clone? That would be the cheapest way to pull the inputs from a local cache directory.
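As a sketch of the reckless-clone suggestion, assuming the ephemeral mode (which, for a local source, links the clone's annex to the cache's annex so file content is not copied). reckless_clone_cmd is a hypothetical helper; datalad clone --reckless is the real option.

```python
from pathlib import Path


def reckless_clone_cmd(cache_dir, outdir, bundle_id, dataset_name,
                       mode="ephemeral"):
    """Hypothetical helper: build a `datalad clone --reckless` command that
    pulls a cached dataset into the bundle's inputs/ directory."""
    source = Path(cache_dir) / dataset_name
    target = Path(outdir) / f"neuroscout-{bundle_id}" / "inputs" / dataset_name
    return ["datalad", "clone", "--reckless", mode, str(source), str(target)]


# Execute with, e.g.: subprocess.run(reckless_clone_cmd(...), check=True)
```

From Python, the same option should be available as datalad.api.clone(source=..., path=..., reckless="ephemeral"); the helper name and default mode here are assumptions.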

> is out/fitlins too nested? Fitlins by default creates a fitlins output directory hence the structure

I think we probably want to pull a nipreps/fmriprep#2303 on FitLins as well, as the outputs are at least ostensibly supposed to be a derivative directory. Figure out what you want your outputs to look like and we'll adjust FitLins to permit it.


adelavega · 2021-04-23T16:29:04Z

Thanks. I think for now I'll merge this as is, and in a separate PR I can make sure I'm fully compliant with YODA/BIDS. And yes, perhaps it's better if FitLins handles this rather than neuroscout-cli.

adelavega added 10 commits April 16, 2021 16:19

Refactor download dirs to default to YODA structure, with option to cache data dir

f3a3257

Nest output dir

a1afe19

Dont Path cache-dir

8a962e5

Path outdir

563efdd

Reverse order

18c1f7a

fix dir

6e5904d

Add target path

b6c0336

Fix syntax

7e968cd

Mkdir not dirs

bb35ff4

fix self.preproc_path

a11c8f6

adelavega requested a review from jdkent April 20, 2021 05:00

adelavega mentioned this pull request Apr 21, 2021

REL: 0.5.1 #129

Merged

adelavega added 3 commits April 22, 2021 15:10

Merge branch 'master' into ref/download_dirs

e7eef7f

Update README

f39fba8

Merge branch 'ref/download_dirs' of github.com:neuroscout/neuroscout-cli into ref/download_dirs

083736a

adelavega added 2 commits April 22, 2021 16:13

Update cache-dir to install-dir

426006f

Update CLI options

ebf89a0

adelavega mentioned this pull request Apr 22, 2021

Update docs to reflect new CLI directory structure neuroscout/neuroscout#912

Merged

adelavega added 2 commits April 22, 2021 16:43

minor comment fix

093684c

Default install dir to None

756ff2e

effigies reviewed Apr 23, 2021


neuroscout_cli/cli.py (outdated)

Update neuroscout_cli/cli.py

506cfdc

Co-authored-by: Chris Markiewicz <effigies@gmail.com>

adelavega added 3 commits April 23, 2021 11:46

Add datalad drop

266fca7

path should be str

a16a708

"sourcedata" not "inputs"

3ef9db8

adelavega merged commit edf20b3 into master Apr 23, 2021

adelavega deleted the ref/download_dirs branch April 23, 2021 19:36
