API: Consider Support and Documentation for more complex AND/OR queries #506

Open
mrshll1001 opened this issue May 15, 2024 · 0 comments

This stems from @apricot13's post on the OR Forum:

At the moment the API supports filters on several endpoints and for several parameters. These filters behave in a standardised way: they are cumulative, and a result is only let through if it matches the criteria set out in the filters.

This behaviour has the following effects:

  • Multiple filters on different parameters effectively create a boolean AND query. The query /services?taxonomy_id=XXX&organization_id=YYY will only return results that have a taxonomy_id of XXX and an organization_id of YYY
  • Multiple filters on the same parameter create a boolean OR query on that parameter. The query /services?taxonomy_id=XXX&taxonomy_id=YYY will return results that have a taxonomy_id of XXX, of YYY, or of both.
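
To make the two behaviours above concrete, here is a minimal Python sketch of how an implementation might evaluate such a query string against in-memory records. The record shape (a `taxonomy_ids` list on each service) and the helpers `field_values`, `matches_filters` and `query` are assumptions made for illustration; none of them come from the HSDS spec.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory records; the field names are illustrative only.
SERVICES = [
    {"id": "s1", "organization_id": "YYY", "taxonomy_ids": ["XXX"]},
    {"id": "s2", "organization_id": "YYY", "taxonomy_ids": ["ZZZ"]},
    {"id": "s3", "organization_id": "QQQ", "taxonomy_ids": ["XXX", "YYY"]},
]


def field_values(record, param):
    """Return the set of values a record holds for a filter parameter."""
    if param == "taxonomy_id":
        # Assumption: a service can carry several taxonomy ids.
        return set(record["taxonomy_ids"])
    return {record.get(param)}


def matches_filters(record, filters):
    """AND across parameters, OR across repeated values of one parameter."""
    return all(
        field_values(record, param) & set(values)  # non-empty overlap = OR hit
        for param, values in filters.items()
    )


def query(url):
    """Apply the filters in a URL's query string to SERVICES."""
    filters = parse_qs(urlsplit(url).query)  # repeated params become lists
    return [r["id"] for r in SERVICES if matches_filters(r, filters)]
```

With this sample data, `query("/services?taxonomy_id=XXX&organization_id=YYY")` keeps only `s1`, while `query("/services?taxonomy_id=XXX&taxonomy_id=YYY")` keeps both `s1` and `s3`.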

This is all well and good, but it does not cover (at least) two important use-cases:

  1. What if we only want results that contain both of the taxonomy ids stated, and aren't interested in the others?
  2. What if we want results that match at least one of several sets of desirable properties, which may be mutually exclusive (i.e. (A AND B AND C) OR (D AND E AND F))?

I think that the former is more immediate.
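
As a rough sketch of what those two use-cases ask for, the Python below expresses them as set operations. The function names `has_all` and `matches_any_group`, and the idea of passing the second query as a list of groups, are hypothetical; the current API has no parameter that maps onto either.

```python
def has_all(record_values, required):
    """Use-case 1: the record carries every one of the required values."""
    return set(required) <= set(record_values)


def matches_any_group(record_values, groups):
    """Use-case 2: (A AND B AND C) OR (D AND E AND F).

    `groups` is a list of value lists; a record matches if it satisfies
    every value in at least one group.
    """
    return any(has_all(record_values, group) for group in groups)
```

For example, `has_all(["XXX", "YYY", "ZZZ"], ["XXX", "YYY"])` is true but `has_all(["XXX"], ["XXX", "YYY"])` is not, which is exactly the distinction the current OR behaviour cannot make.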

There is also the question of whose problem this is to address. From an HTTP and REST perspective, I think it can be argued that the HSDS API spec is already doing its job by providing the filters. That isn't a great attitude to have wrt supporting people implementing the spec or finding services, though.

The first problem, where we ONLY want results that contain BOTH of the stated taxonomy_ids, may well be a sorting problem rather than a querying or filtering problem. If we've got a set of results which we know contains:

  • items with a taxonomy_id of XXX (and perhaps other taxonomy_ids not explicitly filtered for)
  • items with a taxonomy_id of YYY (and perhaps other taxonomy_ids not explicitly filtered for)
  • items with a taxonomy_id of XXX AND YYY (and perhaps other taxonomy_ids not explicitly filtered for)

Then it may be suitable to just sort the results according to how many of the filter parameters each item meets? I think there could be a productive discussion around whether it's the API implementation or the consuming application that does the sorting.
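
A minimal sketch of that sorting idea in Python, which could run either in the API implementation or in the consuming application. It assumes each result exposes its taxonomy ids as a `taxonomy_ids` list; that field name and the function `rank_by_matches` are illustrative, not taken from the spec.

```python
def rank_by_matches(items, requested_ids):
    """Order items so those matching the most requested ids come first."""
    requested = set(requested_ids)
    return sorted(
        items,
        key=lambda item: len(requested & set(item["taxonomy_ids"])),
        reverse=True,  # sorted() is stable, so ties keep their original order
    )


# Hypothetical results from /services?taxonomy_id=XXX&taxonomy_id=YYY
results = [
    {"id": "s1", "taxonomy_ids": ["XXX"]},
    {"id": "s2", "taxonomy_ids": ["YYY", "ZZZ"]},
    {"id": "s3", "taxonomy_ids": ["XXX", "YYY"]},
]
ranked_ids = [item["id"] for item in rank_by_matches(results, ["XXX", "YYY"])]
```

Here `s3` moves to the front because it matches both requested ids; a consumer that only wants the "both" case could then stop reading at the first item whose match count drops below the number of ids requested.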

We could address the issue of more complex querying more explicitly by providing a query endpoint (or a query parameter on other endpoints) which takes a query language for the data, for example "Lucene" syntax or Elasticsearch query strings. That said, it may be difficult for people using other technologies to parse such queries and translate them into queries for their own database systems.

Projects
Status: Backlog
2 participants