Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: Refactor directory structure #127

Merged
merged 21 commits into from
Apr 23, 2021
Merged

REF: Refactor directory structure #127

merged 21 commits into from
Apr 23, 2021

Conversation

adelavega
Copy link
Contributor

@adelavega adelavega commented Apr 16, 2021

Closes #124

Previously the output directory structure was:

  • <outdir>/<bundle_id>/fitlins - output files
  • <install-dir> (optional, but defaults to .) - Specifies where to download datasets
    Example contents: ./Budapest/preproc ./Budapest/neuroscout-bundles/<bundle_id>

Two problems:

  1. The two types of data are intermixed and grouped by Dataset name, not bundle id
  2. surprise folder creation at current directory in Singularity

Instead I'm proposing the following:

  • <outdir>/neuroscout-<bundle_id> (both arguments are mandatory for both install and run commands).

subdirectories:

  • inputs/bundle -- contents of untarred bundle
  • inputs/<dataset_name> -- dataset_name is obtained from bundle meta-data. This is a DataLad dataset.
  • out/fitlins -- fitlins outputs are here

Here is an example:

    /home/user/out/neuroscout-5xH93  
    └───inputs
    │   │
    │   └───Budapest
    │       └───fmriprep
    │   └───bundle
    │       └───events
    │       │   model.json
    │       │   ...
    └───out
    │   └───fitlins
    │       └───sub-01
    │       └───reports
    │       │   task-movie_space-MNI152NLin2009cAsym_contrast-{name}_stat-effect_statmap.nii.gz
    |       |   ...

This guarantees a "YODA" like structure.
However, if install-dir is defined then the DataLad dataset is installed ininstall-dir/<dataset_name>.

Here is an example of that folder, after caching a few datsets:

    /install-dir
    └───Budapest
    │   └───fmriprep
    └───studyforrest
    │   └───fmriprep

Questions:

  • Should I symlink <cache-dir>/<dataset_name> --><outdir>/neuroscout-<bundle_id>/inputs/<dataset_name>
    if cache-dir is defined?

  • is out/fitlins too nested? Fitlins by default creates a fitlins output directory hence the structure

@adelavega adelavega requested a review from jdkent April 20, 2021 05:00
@jdkent
Copy link

jdkent commented Apr 21, 2021

Let me try to imagine a datalad workflow:

  • create a dataset with the bundle-id in the directory name, like neuroscout-<bundle_id>
  • create a subdataset with the name inputs
  • install the dataset with neuroscout-cli install into inputs
  • neuroscout-cli run would generate output in neuroscout-<bundle_id> (with reference to the cached dataset)
  • thus the output would be in neuroscout-<bundle_id>/neuroscout-<bundle_id>

Let's say someone was going for the least amount setup and just ran neuroscout-cli (most common):

  • Then if they wanted to add datalad, they would have to (--force) initialize datasets since datalad does initialize dataset with content in them by default.

From these two cases it's a little strange for the inputs to be explicitly linked in the output of neuroscout-cli.
I don't know if it should be the responsibility of the tool to create a yoda like structure, or if the focus should merely be trying
to be as modular as possible to aid incorporation into a yoda-like structure.

@adelavega
Copy link
Contributor Author

I'm not sure we're in the same page. I think you have the order of operations correct, except that neurosout-cli will handle all the datalad stuff for you.

that is, neuroscout-cli install will create neuroscout-<bundle_id> with the subdirectories inputs/ and within bundle/ {dataset_name}

I use the DataLad python API so a proper DataLad dataset is already installed in bundle/ {dataset_name}

neuroscout run will also run neuroscout install if necessary (maybe this should be more explicit)

The output would be in neuroscout-<bundle_id>/fitlins

I agree that YODA like structure is a bit counter-intuitive, hence why I would often run with -i <data_dir> which would keep all the datasets in one place... Maybe that could be the default option?

I do like the idea of keeping the bundle with the output though, because then you can see exactly what the inputs from neuroscout were to produce the fitlins output, without having to go look for them somewhere else. Given that the bundle is lightweight, this doesn't bother me. The dataset is a bit trickier since that is heavy, so I could buy an argument that this should live somewhere else.

@adelavega adelavega mentioned this pull request Apr 21, 2021
@adelavega
Copy link
Contributor Author

Idea: drop input dataset by defualt if not cached.

@adelavega
Copy link
Contributor Author

@effigies can I request your review on this?

@adelavega
Copy link
Contributor Author

adelavega commented Apr 22, 2021

If you could please review this PR which updates the docs and let me know if its intuitive, that would be great:

https://github.com/neuroscout/neuroscout/blob/dfdd25a21f6b16fdc3d4821f653109fdee929aaf/docs/cli/usage.md

@effigies
Copy link
Contributor

Not tonight. I'll try to remember, but please re-ping in the morning.

@adelavega
Copy link
Contributor Author

Oh, no rush at all, thanks!

@effigies
Copy link
Contributor

Overall this seems sensible.

Questions:

  • Should I symlink <cache-dir>/<dataset_name> --><outdir>/neuroscout-<bundle_id>/inputs/<dataset_name>
    if cache-dir is defined?

Could you do a datalad reckless clone? That would be the cheapest way to pull the inputs from a local cache directory.

  • is out/fitlins too nested? Fitlins by default creates a fitlins output directory hence the structure

I think we probably want to pull a nipreps/fmriprep#2303 on FitLins as well, as the outputs are at least ostensibly supposed to be a derivative directory. Figure out what you want your outputs to look like and we'll adjust FitLins to permit it.

neuroscout_cli/cli.py Outdated Show resolved Hide resolved
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
@adelavega
Copy link
Contributor Author

Thanks. I think for now I'll merge this as is, and in a separate PR I can make sure I'm fully compliant with YODA/BIDS. And yes perhaps its better if neuroscout-cli doesn't do it, but fitlins does it.

@adelavega adelavega merged commit edf20b3 into master Apr 23, 2021
@adelavega adelavega deleted the ref/download_dirs branch April 23, 2021 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Re-configure directory defaults
3 participants