Handle boolean expressions (and, or, not) for filters #33

Closed
maheshrajamani opened this issue Jan 17, 2023 · 9 comments

@maheshrajamani
Contributor

Change the filter clause to handle boolean expressions.

@tatu-at-datastax
Contributor

I think this is now completed?

@maheshrajamani
Contributor Author

This is for handling and, or, and not expressions in filters.

@tatu-at-datastax
Contributor

tatu-at-datastax commented Jan 26, 2023

Ah. Boolean expression, not Boolean value types. Gotcha.
(edited title slightly)

@tatu-at-datastax tatu-at-datastax changed the title from "Handle bool expression for filters" to "Handle boolean expressions (and, or, not) for filters" Jan 26, 2023
@ivansenic
Contributor

ivansenic commented Feb 6, 2023

@maheshrajamani @amorton Should I take this and implement the bool logic? I already did this for v2 and could adapt it to the new DbFilters model.

@maheshrajamani
Contributor Author

I have not done anything for bool logic support, so you can pick it up.

@ivansenic ivansenic self-assigned this Feb 15, 2023
@ivansenic
Contributor

ivansenic commented Feb 21, 2023

I wanted to quickly outline a design approach for the boolean logic and how to apply it to the current code base.

We should continue using the https://github.com/bpodgursky/jbool_expressions library. It is already used in the docsapi v2 and in DSE. The team already has considerable experience with the library and we should build on that.

The implementation should create a new Expression class, where we would keep our own logic, similar to the https://github.com/stargate/stargate/blob/main/restapi/src/main/java/io/stargate/web/docsapi/service/query/FilterExpression.java class. However, the expression extension can be simpler here.

Reading the JSON filter clause would already create an expression tree that holds our data structure, with the ComparisonExpression as the only data. This means we would be able to represent the boolean logic at the command level. The FilterableResolver class and the rules that we have defined would be transitioned to expression rules that serve to enrich the entry model with the database-related filtering. We can have enrichment rules (the incoming expression and the DbFilter sit next to each other) or transformation rules (transform from ComparisonExpression to DbFilter). My choice for now would be enrichment, purely for possible optimizations later on.

Once the rules are applied, we pass the enriched expression tree to the Read operation, where it can be utilized. For now we will support only And, and we will keep the existing implementations in the operations. An And expression simply has a list of children, so we can use this directly.
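A minimal sketch of the enrichment idea, to make the shape of the tree concrete. Only the names ComparisonExpression and DbFilter come from the discussion; their fields, the `Leaf` and `And` node types, the `$eq` operator, and the `enrich` rule are assumptions made for illustration and do not reflect the actual code base or the jbool_expressions API.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class EnrichmentSketch {
  /** Node of the filter expression tree. */
  interface Expression {}

  /** Assumed shape of a parsed comparison from the JSON filter clause. */
  record ComparisonExpression(String path, String operator, Object value) {}

  /** Assumed stand-in for the database-level filter. */
  record DbFilter(String column, String cqlOperator, Object value) {}

  /** Leaf: the comparison plus, once enriched, the DbFilter next to it. */
  record Leaf(ComparisonExpression comparison, Optional<DbFilter> dbFilter)
      implements Expression {}

  /** AND node: simply a list of children. */
  record And(List<Expression> children) implements Expression {}

  /** Enrichment rule: attach a DbFilter to every leaf this rule can handle. */
  static Expression enrich(Expression expression) {
    if (expression instanceof And and) {
      return new And(
          and.children().stream()
              .map(EnrichmentSketch::enrich)
              .collect(Collectors.toList()));
    }
    if (expression instanceof Leaf leaf && "$eq".equals(leaf.comparison().operator())) {
      ComparisonExpression c = leaf.comparison();
      return new Leaf(c, Optional.of(new DbFilter(c.path(), "=", c.value())));
    }
    return expression; // no rule matched, so the node is left untouched
  }
}
```

Because the original comparison stays on the leaf, a later rule can still inspect it, which is the room for optimization that enrichment keeps open compared with a one-way transformation.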

Right away we can use all the simplification rules that exist in the library out of the box; for example, AND(A,A) is resolved to A. In the future we can add:

  • rules for dealing with not
  • or support
  • extra rules for some specific situations
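To illustrate what a simplification such as AND(A,A) to A amounts to, here is a hand-rolled sketch. It is not the jbool_expressions implementation or API; the `Variable`, `And`, and `simplify` names are hypothetical, and the rule shown (flatten nested ANDs, drop duplicate children, unwrap a single remaining child) is only one of the rules a library would apply.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class SimplifySketch {
  interface Expression {}

  /** A named leaf, standing in for a single filter comparison. */
  record Variable(String name) implements Expression {}

  /** AND node over a list of children. */
  record And(List<Expression> children) implements Expression {}

  /** Flattens nested ANDs, removes duplicate children, and unwraps AND(x) to x. */
  static Expression simplify(Expression expression) {
    if (!(expression instanceof And and)) {
      return expression;
    }
    // LinkedHashSet drops duplicates while keeping first-seen order.
    LinkedHashSet<Expression> unique = new LinkedHashSet<>();
    for (Expression child : and.children()) {
      Expression simplified = simplify(child);
      if (simplified instanceof And nested) {
        unique.addAll(nested.children()); // AND(AND(a, b), c) becomes AND(a, b, c)
      } else {
        unique.add(simplified);
      }
    }
    return unique.size() == 1
        ? unique.iterator().next()
        : new And(new ArrayList<>(unique));
  }
}
```

Duplicate detection relies on record equality, so two leaves count as the same only when all their components are equal.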

I strongly believe that in the future the expression tree can give us the complete CQL WHERE clause as a string. This would give us many benefits:

  • very easy extension for adding new operators (!=, >, <)
  • very easy support for OR without dealing with the query builder
  • etc
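A rough sketch of how a tree could be rendered to a WHERE clause string follows. The node types, the column names, and the use of `?` bind markers are assumptions for illustration; whether the backend accepts the resulting clause (OR in particular) is a separate question that this sketch does not address.

```java
import java.util.List;
import java.util.stream.Collectors;

final class WhereClauseSketch {
  interface Expression {}

  /** Leaf comparison; the value is assumed to be bound separately via "?". */
  record Comparison(String column, String operator) implements Expression {}

  record And(List<Expression> children) implements Expression {}

  record Or(List<Expression> children) implements Expression {}

  /** Renders the tree recursively; a new operator needs no change here. */
  static String render(Expression expression) {
    if (expression instanceof Comparison c) {
      return c.column() + " " + c.operator() + " ?";
    }
    if (expression instanceof And and) {
      return join(and.children(), " AND ");
    }
    if (expression instanceof Or or) {
      return join(or.children(), " OR ");
    }
    throw new IllegalArgumentException("Unsupported expression: " + expression);
  }

  /** Joins rendered children with the separator, wrapped in parentheses. */
  private static String join(List<Expression> children, String separator) {
    return children.stream()
        .map(WhereClauseSketch::render)
        .collect(Collectors.joining(separator, "(", ")"));
  }
}
```

For example, rendering AND(a = ?, OR(b > ?, c != ?)) with this sketch yields `(a = ? AND (b > ? OR c != ?))`.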

In addition, if the expression tree contains both database and in-memory filters, it can be manipulated so that the CQL WHERE clause is constructed from the database filters, and the in-memory ones are used to check documents in memory. We already use some of this in docs v2 and it has proven itself to be a very powerful technique.
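The split between database and in-memory filters can be sketched for the simple case of a top-level AND. Everything here is hypothetical: the `Filter` record, its `pushableToDb` flag, and the equality-only matching are illustrative assumptions, not the docs v2 implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class SplitFilterSketch {
  /** One child of a top-level AND; the flag marks whether the database can evaluate it. */
  record Filter(String field, Object expected, boolean pushableToDb) {
    boolean matches(Map<String, Object> document) {
      return expected.equals(document.get(field));
    }
  }

  /** Children that would go into the CQL WHERE clause. */
  static List<Filter> databaseFilters(List<Filter> andChildren) {
    return andChildren.stream()
        .filter(Filter::pushableToDb)
        .collect(Collectors.toList());
  }

  /** Remaining children, combined into a check applied to each fetched document. */
  static Predicate<Map<String, Object>> inMemoryPredicate(List<Filter> andChildren) {
    List<Filter> inMemory = andChildren.stream()
        .filter(f -> !f.pushableToDb())
        .collect(Collectors.toList());
    return document -> inMemory.stream().allMatch(f -> f.matches(document));
  }
}
```

Splitting is only this simple for AND: every child must hold, so the database can narrow the candidates and the in-memory check can reject the rest. With OR in the tree, the split needs more care.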

I will try to provide a small POC tomorrow as a PR, which would depict the first step in the transformation.

@amorton @tatu-at-datastax @maheshrajamani @jeffreyscarpenter fyi

@amorton
Contributor

amorton commented Feb 22, 2023

@ivansenic I want to keep support for OR and NOT (and an explicit AND) in the filter syntax out of scope for now. We have checked with val and do not feel it is needed initially, and once it is needed we will want to understand the use cases.

More generally, if there are improvements in logic, we need to understand the use cases well enough to know what we can push down into Cassandra. We want to push as much logic into Cassandra as possible, almost to the point of not implementing filtering if we would need to implement the logic in the API.

I am a strong -1 on adding the https://github.com/bpodgursky/jbool_expressions library, as I do not feel it solves a problem we have, and we are not looking to add OR or NOT or an explicit AND to the filters for now.

I am closing this ticket as "Won't Do"; it is not in scope.

@amorton amorton closed this as not planned Feb 22, 2023
@ivansenic
Contributor

@amorton Fine with me, but just for the record:

  • The work I proposed above would set a basis for all the work that needs to be done later on, if we are to handle boolean logic.
  • Adding this later will of course be way more work.
  • The proposed work would also clean up the quite messy implementation of the current filtering rules & matching, which keeps growing. We also have a ticket for that, "Revisit FilterMatcher and FilterMatchRules logic" #34; not sure what the status of that issue is?
  • I am not sure what use cases you are talking about? This is simply there so that you can understand a filter as an expression, nothing more.
  • Not sure why you are -1 on adding the https://github.com/bpodgursky/jbool_expressions library? Whenever we come to the issue of the bool logic, we will have to use either this or some other lib. Hopefully we will not create our own logic here.
  • I understand OR & NOT are out of scope for now (not sure when they would come into scope?), but again, I was proposing foundation work to be done now.

@amorton
Contributor

amorton commented Feb 22, 2023

In general we want to push logic like this down into query handling in C* to make it better, and we want to push back this work until we better understand the problems.

If we do need to handle this in the API tier, then we need concrete examples of filters the users are likely to submit, and of how the library would help and where it may not. That may involve a lot of detailed experiments, so we would want to make sure we know the problems we are trying to solve. It represents a level of performance tuning that we need more experience to understand, i.e. we may be able to logically simplify a query, but that may not represent a faster way to access the data (e.g. we prefer set lookups to maps). We need to understand these sorts of use cases and how any logic library will handle them before adopting it, so we can wait until we have more of the functionality in place and understand performance.

While the software has been around for a while, the author still says it is in development, and it is not used much on Maven.

There are other logic or expression libraries (here), and there are query planner projects that have wider adoption and include logic components that may be usable. See the RexSimplify example from Apache Calcite, and Substrait. These projects have more usage and are managed by organisations, not individuals, which is an important difference.

In summary: we will have to support $and, $or etc. logic at some point. We need to decide what to support as a product and why, then whether the API or C* will do any logical simplification (and with what libraries), and whether, other than for expressions that always evaluate to FALSE or TRUE, the re-formatted query is better than the one the user sent.
