Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

requests/filteringTerms.yaml too forgiving for request validation #56

Open
gsfk opened this issue Feb 3, 2023 · 6 comments
Open

requests/filteringTerms.yaml too forgiving for request validation #56

gsfk opened this issue Feb 3, 2023 · 6 comments

Comments

@gsfk
Copy link
Contributor

gsfk commented Feb 3, 2023

I've been using BeaconRequestBody.yaml (and its json cousin) for request validation, but the spec for filtering terms is too forgiving, and lets even the most obviously mangled filters through. The issue is with the use of anyOf, the overlaps between the different filter types, and a forgiving attitude toward additional properties.

For example, this filter passes, even though similarity is required to be a string that's one of "exact", "high", "medium", or "low":

  {
    "id": "some_id",
    "similarity": {"I snuck": "an object in here"}
  }

Some other examples are here: https://www.jsonschemavalidator.net/s/pFQY4Xno

This happens because:

  • FilteringTerm is defined to be any of OntologyFilter, AlphanumericFilter, or CustomFilter,
  • a filter only has to match one of the three types to pass validation
  • malformed AlphanumericFilter can often pass as OntologyFilter
  • almost anything will pass as a CustomFilter, any malformed values other than "id" or "scope" are considered extra properties with no restrictions on them.

The obvious sledgehammer solution is to forbid extra properties in filters, I don't know enough of the history here to know if that's too harsh of a solution. Is CustomFilter meant to be more open?

Here is a solution that forbids all extra properties in filters.

Here, if needed, is a more conservative solution that allows extra properties in CustomFilter but requires them to be objects, which should cut down on the amount of confusion between a CustomFilter extra property and a malformed field in another filter type, all of which are string or boolean.

@mbaudis
Copy link
Member

mbaudis commented Feb 7, 2023

The general problem with filters is that they are over-designed & under-engineered; but I'm not sure if it is necessary to solve by engineering vs. by documentation? IMO we can do some improvements (e.g. format check for ontology id's, your change of the additionalProperties use ...) but with "custom filters" opening the door wide it will always be ... complicated.

One possibility would be to restrict public filter use to some well-defined designs and verify against those... But overall I think there are many cases in Beacon where the solution won't come from "verify against JSON Schema".

@jrambla
Copy link
Contributor

jrambla commented Feb 7, 2023

The approach of adding "additionalProperties: false" to the definitions seems fine to me.
So I'll be in favor of adding that to them.

@jrambla
Copy link
Contributor

jrambla commented Feb 7, 2023

However, I'm in doubt with the CustomFilter stuff.
My guess is that it is there for historical reasons: we started just with ontology terms, nothing else.
It was too restrictive as you are limited to terms defined in ontologies, which would be not applicable to custom lists of items... therefore we add the custom filter.
Alphanumeric filters were added somewhat later... and they make sense, but I also believe that makes CustomFilters no longer necessary as we can express the example:

"demographic.ethnicity:asian"

as

  {
    "id": "demographic.ethnicity",
    "operator": "=",
    "value": "asian"
  }

Are we aligned?

@mbaudis
Copy link
Member

mbaudis commented Feb 7, 2023

I'm for the removal of custom filters from the public API, or just do them as placeholder object (i.e. only definition as object since they should not been though of something anybody needs to understand, and make a note about "internal use".

@mbaudis
Copy link
Member

mbaudis commented Feb 7, 2023

A solution for stricter typing is the use of filterType, e.g. filterType: alphanumeric as a required parameter. VRS is doing something similar then with if conditions (though it gets ... complex). @gsfk you want to look into that (or @tb143)?

@gsfk
Copy link
Contributor Author

gsfk commented Feb 10, 2023

I had a quick peek at the VRS schema and didn't see any conditionals, although perhaps I'm looking in the wrong place.

Eliminating custom queries (or forcing them to be an object) both sound fine, although it only solves part of the problem, malformed OntologyFilters are no longer accepted, but bad AlphanumericFilters still get through.

without significantly complicating the schema, the easiest solutions I see are:

  • adding "additionalProperties": "false" to all definitions (as in the code linked above)
  • addding "additionalProperties": {"type": "object"} to all definitions, so they're easier to distinguish.
  • more strict typing as Michael suggests above (either a required filterType field, or simply vary the name of the id property across definitions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants