Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] clarify that filters should be specified as object of objects #348

Merged
merged 3 commits into from
Oct 30, 2019

Conversation

sappelhoff
Copy link
Member

closes #339

In this PR, that for SoftwareFilters and HardwareFilters we expect an object of objects.

Previously, the text said we expect "a list" ... although the text thereafter and the included example showed an object of objects.

The alternative was to allow "lists of objects", which might have been a more elegant solution that would have allowed to naturally specify an order of applied temporal filters. However, as this would break backward compatibility, I vote against that change.

With the proposed object of objects, an order of filters (if required) can be indicated with additional, custom, key value pairs.

@sappelhoff sappelhoff changed the title clarify that filters should be specified as object of objects FIX: clarify that filters should be specified as object of objects Oct 16, 2019
Copy link
Collaborator

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. As I've stated elsewhere, the only condition under which I would strenuously object is if order can ever matter, for example, if nonlinear filters can be used. If order can matter, I think the cost in fixing examples and potentially invalidating datasets is worth it.

Also worth considering whether people previously read "list" and interpreted it literally; you may be invalidating datasets that were valid under that interpretation. Checking ///openneuro via datalad (which doesn't include all current datasets...), the only examples I see are "n/a", {"SpatialCompensation":{"GradientOrder":"3rd"}} and {"SpatialCompensation":{"GradientOrder": 3}}. So this isn't a definite concern...

I'll leave approving reviews to ephys people, who know more about these filters than I do.

@jasmainak
Copy link
Collaborator

Thinking about this issue once more, I am not sure if it's really a big difference to have a list or not. It's true that the order could matter but in practice, since this is not a derivatives specification yet, the software filter field is for storing the bare minimum preprocessing that has been done to make the files shareable. This could be, for instance, Maxfilter for Elekta systems and gradient compensation for CTF systems. We don't want to encourage users to hack this to share what could be considered derivative files.

@sappelhoff
Copy link
Member Author

Also worth considering whether people previously read "list" and interpreted it literally; you may be invalidating datasets that were valid under that interpretation.

That is true, but these datasets would not have passed the validator, so these cases are confined to those were people only read the spec, and never used the validator.

I interpret @jasmainak's comment above as another pro for dealing with the issue in the way proposed in this PR. Or did you mean it in another way Mainak?

@jasmainak
Copy link
Collaborator

jasmainak commented Oct 17, 2019

yes it is pro. Do you think though that the current phrasing introduces more flexibility in the specification? I'd prefer something like dictionary as opposed to object

@sappelhoff
Copy link
Member Author

yes it is pro. Do you think though that the current phrasing introduces more flexibility in the specification? I'd prefer something like dictionary as opposed to object

That's why I put a link there to "JSON OBJECT", where it is unambiguously defined what we expect: https://www.w3schools.com/js/js_json_objects.asp

@jasmainak
Copy link
Collaborator

right but a JSON object can contain almost anything unless I'm mistaken?

@jasmainak
Copy link
Collaborator

If you change to something like dictionary of dictionary, where the values of key-value pair are restricted to be strings, then you reduce the risk of breaking backwards compatibility. However, this would need a tweak to the JSON schemas in the validator.

@effigies
Copy link
Collaborator

We seem to use "object" and "dictionary" interchangeably in Common principles.

a simple data dictionary in a JSON format

a description of the corresponding column, using an object containing the following fields:

a dictionary of possible values (keys) and their descriptions (values).

I don't see either used elsewhere, so I don't think there's a strong precedent.

I also don't think it's correct to say that "object" is more general than "dictionary", and it corresponds to a JSON notion, while "dictionary" is a Python name (although the JSON docs associate the two, presumably to help orient Python programmers), so this is a question of style, not of semantics.

If you change to something like dictionary of dictionary, where the values of key-value pair are restricted to be strings, then you reduce the risk of breaking backwards compatibility.

Some values are numeric, so this would definitely break backwards compatibility. This level of specification feels like it's headed in an unnecessarily restrictive direction. "Object of objects"/"dictionary of dictionaries" makes the depth clear; a programmer knows they can make a two-level nested loop over entries, and get some value. It is not our job to say what all those values can or must be.

If we want to be more specific about what's found in the value of certain keys, then we can specify fields using dot notation, as in Derived dataset and pipeline description:

| Field                | Description |
|----------------------|-------------|
| TopField.SecondLevel | ...         |

@jasmainak
Copy link
Collaborator

I just wanted to prevent people from hacking into this and providing the filter parameters through arbitrary nesting. The JSON schema gives straightforward ways to do this without adding any complexity to the validator code. But I'm not sure how this would work with the dot notation.

Anyway, I recognize @sappelhoff just intended this PR as a clarification. So, it's fine by me as it is.

@sappelhoff
Copy link
Member Author

I just wanted to prevent people from hacking into this and providing the filter parameters through arbitrary nesting. The JSON schema gives straightforward ways to do this without adding any complexity to the validator code.

I see your point! Perhaps we can make this as a separate improvement:

  • in the spec, provide some more guidelines (e.g., nested objects are only allowed up to X levels depth)
  • in the validator, use the schema that allows only a max depth
  • provide some RECOMMENDED guidelines for parameter names and their expected value types (e.g. Cutoff, float)

However, as said above - I would see that as separate from this PR.

Can we have some more approving or challenging voices?

@effigies
Copy link
Collaborator

Looks like everybody's happy with this. Merging.

@effigies effigies merged commit b1f05f5 into bids-standard:master Oct 30, 2019
@sappelhoff sappelhoff deleted the filter branch March 31, 2020 18:18
@effigies effigies mentioned this pull request May 26, 2020
5 tasks
@sappelhoff sappelhoff changed the title FIX: clarify that filters should be specified as object of objects [FIX] clarify that filters should be specified as object of objects Aug 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confusing filter specification in EEG, MEG, iEEG
4 participants