Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement BidsPartialComponent #284

Merged
merged 18 commits into from
Jun 22, 2023

Conversation

pvandyken
Copy link
Contributor

@pvandyken pvandyken commented Apr 28, 2023

Resolves #243.

For a play-by-play of what I did, the commits are quite detailed and fairly well organized.

For a high level overview:

The new classes

BidsComponent is conceived of as a multi-selectable dict (sort of... actually right now only __getitem__ is implemented). If a tuple of entities is selected, you get back a BidsPartialComponent which has a ZipList but no path or name. BidsComponent is actually just a subclass of BidsPartialComponent.

If a single entity is selected from BidsComponent (or BidsPartialComponent), you get a BidsComponentRow. This represents a single entity and its values, and it's implemented as a ImmutableList (sort of a UserTuple). It has the wildcards and entities properties, but instead of returning dicts, they directly return the {str} or ['list', 'of', 'values'] associated with the entity. This is to ensure symmetry between the following access modes:

inputs['bold']['subject'].wildcards == inputs['bold'].wildcards['subject']

Note that the preferred style would still be inputs['bold'].wildcards['subject'], so this aspect of the new implementation is largely for consistency and to make everything more fail-proof.

On the other hand, BidsComponentRow.zip_lists does still return a dict mapping the one entity to its ordered entity list. This break in symmetry is motivated by the new Expandable protocol I established:

Expandable classes have an expand method, a filter method, and a zip_lists property or attribute. The signatures of these items must be as found in BidsPartialComponent. Thus, zip_lists in BidsComponentRow needs to return a dict in order to meet the protocol. In return, we're able to cleanly test all of three of the current classes using the exact same test code.

Furthermore, because BidsComponentRow is already implemented as a list with immediate access to the values, any call from the user to a symmetrically defined zip_lists would be completely redundant. It would give only a less powerful version of the list-like object the user already has.

The documentation

I broke off a bit from our completely autogenerated documentation in the api section in order to prioritize the overview of BidsComponent. BidsPartialComponent is really mostly a repeat of BidsComponent, so none of it's members are shown, just an extended docstring overview. BidsComponentRow is rather different, so it's displayed in full.

I also moved the input_* legacy properties into a dropdown box. I suppose we could technically deprecate them, but they don't really do any harm and dropping them would completely prevent older workflows from upgrading (unlike BidsDataset.wildcards etc, which probably never had much buy-in). This way we can document them and keep them, but out of the way.

Deduplication

I decided that zip_list deduplication should not be performed immediately when getting subcomponents and rows. This is both for simplicity, and to ensure that indexing remains consistent and clear as capabilities become more complex. So the only deduplication to be performed will be done in the expand method (since it unambiguously doesn't make sense for expand to return multiple identical paths)

Design Question

  • I stored the name of the entity in BidsComponentRow as .entity. The name is rather close to the .entities property, which returns something completely different (the .entities name doesn't make sense in isolation, but does if you consider the whole picture). So should we store .entity as something else?

TBD

  • Add zip_list deduplication to the expand methods

@pvandyken pvandyken marked this pull request as ready for review May 1, 2023 20:22
@github-actions github-actions bot requested review from akhanf, kaitj and tkkuehn May 3, 2023 14:42
@pvandyken pvandyken force-pushed the feat/bidscomponent/partial branch from 8756d18 to f9f74ad Compare May 4, 2023 17:33
@pvandyken pvandyken added the enhancement New feature or request label May 4, 2023
@pvandyken pvandyken force-pushed the feat/bidscomponent/partial branch from 581f837 to f908a54 Compare May 4, 2023 19:49
@pvandyken
Copy link
Contributor Author

Okay, I think that finally fixes all the test failures.

Just note for posterity, when combining MockerFixture with a hypothesis test, you need to call mocker.stopall() at the very beginning of the test so that all the mocks are removed, otherwise you get very strange errors. This is even if you only run one example, because if the example fails, hypothesis still does multiple runs of the test to try to shrink the example.

Copy link
Contributor

@tkkuehn tkkuehn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code basically looks good to me, just a few changes for the docstrings (some of which are a matter of opinion and optional)

snakebids/utils/utils.py Outdated Show resolved Hide resolved
snakebids/utils/utils.py Show resolved Hide resolved
snakebids/utils/utils.py Show resolved Hide resolved
snakebids/core/datasets.py Outdated Show resolved Hide resolved
snakebids/core/datasets.py Outdated Show resolved Hide resolved
snakebids/core/datasets.py Show resolved Hide resolved
snakebids/core/datasets.py Outdated Show resolved Hide resolved
snakebids/core/datasets.py Outdated Show resolved Hide resolved
snakebids/core/datasets.py Show resolved Hide resolved
snakebids/core/datasets.py Outdated Show resolved Hide resolved
Copy link
Contributor

@kaitj kaitj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of spelling things found, but pending resolution of Tristan's review, think this one may be good to go.

Re: your design question - what would be the alternative?

snakebids/core/datasets.py Outdated Show resolved Hide resolved
snakebids/core/datasets.py Outdated Show resolved Hide resolved
@pvandyken
Copy link
Contributor Author

Re: your design question - what would be the alternative?

Not sure, guess I wanted to see if either of you would object to the status quo

@tkkuehn
Copy link
Contributor

tkkuehn commented May 10, 2023

Should we store .entity as something else?

Sorry, forgot to specifically address this as part of my first review. I'd also be happy with entity_name, which I guess would slightly more precisely describe what part of the entity this attribute represents (i.e. the long-form name as opposed to the entity key or value). I think I'm fine with just calling it entity though.

@pvandyken
Copy link
Contributor Author

Sorry, forgot to specifically address this as part of my first review. I'd also be happy with entity_name, which I guess would slightly more precisely describe what part of the entity this attribute represents (i.e. the long-form name as opposed to the entity key or value). I think I'm fine with just calling it entity though.

Okay, I'm just going to keep it then.

@tkkuehn tkkuehn mentioned this pull request Jun 2, 2023
@pvandyken pvandyken requested review from tkkuehn and kaitj June 7, 2023 02:39
@pvandyken
Copy link
Contributor Author

I implemented the requested changes and the __spec argument on filter. In addition:

  • I changed the filter api to Iterable. This makes it more flexible for use (not a breaking change), and in principle, safer in a snakemake environment where we don't have static type checking (the previous version ate iterables without filtering correctly). All the tests now use consumable iterators instead of lists.
  • I went ahead and made paths in all the expand methods a position-only argument __paths. This is technically a breaking change, but I highly doubt anyone is referring to it via the keyword currently. My rationale is
    1. Accessing it via keyword makes it semantically confusing with **wildcards
    2. paths is technically a valid wildcard name, and we're taking up that namespace for no good reason. Of course, snakemake's expand does the same thing with filepatterns and func, and this is something we can address later, but in the meantime we should keep our own functions clean.

Copy link
Contributor

@kaitj kaitj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm pending merge conflict fix

Note that we don't use importlib_resources, as it caused problems on
github actions
Copies all the methods from tuple, including hashable (which only works
if all of the contents are hashable). Does not support mutation
Represents the expand(), filter(), and zip_lists attributes
A generic interface supporting a key type, a value type for single keys,
and a seperate value type for tuple keys
The new BidsComponent only contains code relevant to the name and path

Change constructor arguments of both classes into keyword-only, as the
order of the arguments changed anyway, and kw_only increases future
clarity and robustness
Represents a single selected entity from a BidsComponent. Implemented as
an Immutable list, so supports all the methods and abilities of a tuple.

.wildcards and .entities both directly return the `{wildcard}` string or
list of values associated with the entity, but zip_lists still returns a
dict of entity->values to be consistent with the `Expandable` protocol
- BidsPartialComponent
- BidsComponentRow
- Expandable
Allow the setting of uniqueness and the culling behaviour (removing some
of the generated zip_list combinations) via strategy parameters

Continue to Enforce dataset uniqueness in boolean filter tests

Add detailed warning to the
`test_when_all_custom_paths_no_layout_indexed` test, explaining why it
might go beserk on failure because of the MockerFixture being used
Ensures 'datatype', 'suffix', and 'extension' are not used as the entity
in `BidsComponentRow` so that paths can be safely derived
Add sphinx_design as docs dependency to allow the hiding of deprecated
syntaxes

Update docstrings
Updates all the filter tests to exclusively pass consumable iterators as
arguments to filter methods
__spec is an optional, kw-only argument mutually exclusive with
**filters that takes str or iterable of str to apply directly to the
row's entity
Ensure only unique entries used in expand tests
- Now that expand deduplicates paths, we need to test with deduplicated
  zip_lists
This will allow the addition of a second, optional position-only
argument following it: __spec. It also frees the name paths for
potential wilcard names, since paths is a valid wildcard name and right
now there's a conflict
@pvandyken pvandyken merged commit 89c9fd9 into khanlab:main Jun 22, 2023
@pvandyken pvandyken deleted the feat/bidscomponent/partial branch June 22, 2023 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement BidsPartialComponent
3 participants