Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API changes: extend visitor with return values #901

Closed
wumpz opened this issue Nov 24, 2019 · 25 comments
Closed

API changes: extend visitor with return values #901

wumpz opened this issue Nov 24, 2019 · 25 comments

Comments

@wumpz
Copy link
Member

wumpz commented Nov 24, 2019

If you try to build a CopyVisitor of some kind of object hierarchy, one has to build some kind of Stack structure, to give the actual visitor a hint, where it should write its copied data.

I suggest to extend all visitor methods with a return value:

MyType visitor(MyType param);

The standard implementation should return the actual object inserted.

public interface StatementVisitor {
    Comment visit(Comment comment);
    Commit visit(Commit commit);
    Delete visit(Delete delete);
    Update visit(Update update);
@wumpz
Copy link
Member Author

wumpz commented Dec 2, 2019

Using this we would be able to create some kind of functional visitor implementation.

@wumpz
Copy link
Member Author

wumpz commented Dec 10, 2019

There is a thumbs up above, but I am more interested in textual comments :).

@AnEmortalKid
Copy link
Contributor

I'm having a bit of a hard time sort of visualizing this, so correct me if I wrong.

Let's assume we just had a SelectVisitor to copy the Select.

Old Way:

class OldSelect implements SelectVisitor {

private Select copy = new Select();

// super naive and bad just illustrating examples
Select copy(Select original) {

  visit(original);

return copy;
}

  @Override
    public void visit(Select select) {
      copy.setBody(deepClone(select.getSelectBody());

        // potentially the with items would be
      copy.setWithItems(deepClone(select.getWithItems()));
    }
}

With new pattern:

class NewCopySelect implements SelectVisitor {


// super naive and bad just illustrating examples
Select copy(Select original) {
  return visit(original);
}

  @Override
    public Select visit(Select select) {
   // and other copying stuff
     Select copy = new Select();
      copy.setBody(deepClone(select.getSelectBody());

    // potentially the with items would be
    copy.setWithItems(visit(select.getWithItems());
      return copy;
    }
}

Or am I Missing something.

@wumpz
Copy link
Member Author

wumpz commented Jan 12, 2020

Correct. With this new style the visitors itself return the copy and the caller could immediately put it in the right place. So you don't need to organise your copied items in some sort of stack to find the last generated. Imho a global variable here should be avoided but cannot within the actual implementation.

@AnEmortalKid
Copy link
Contributor

extend all visitor methods with a return value

Would it be valuable to have CopyStatementVisitor and the current StatementVisitor ? If not, then changing StatementVisitor would be fine, probably just a big warning to all consumers that things no longer return void.

@wumpz
Copy link
Member Author

wumpz commented Jan 13, 2020

Indeed this would be a massive api change. That is why we would go for version 4.

If there are two visitor interfaces then we would need to visit methods per class. So that is IMHO a no go.

@AnEmortalKid
Copy link
Contributor

AnEmortalKid commented Jan 13, 2020 via email

@sivaraam
Copy link
Contributor

sivaraam commented Feb 3, 2020

IIUC, the suggested change is that all the visit methods of the visitors would be extended to return a copy of the object they receive as the argument to help with implementing the copy visitor. Is that correct?

If it is then, is my understanding that this is useful only to those who want a deep copy of an object, correct? I'm just wondering whether this huge API change is worth the migration burden on the users who want to move to version 4 of the library? That's because they would have to change every extension of every visitor they would have implemented as a consequence of this change. That sounds like a lot of work and might not be a good thing to bring upon the users. Just my opinion.

@wumpz
Copy link
Member Author

wumpz commented Feb 3, 2020

@sivaraam: that is correct. As well this change does not support replacement by different types, e.g. Column to literal or CaseExpression or you name it. That is why I am getting away from this. The problem here to solve IMHO the backreference to the parent object. How to replace an expression, while there is no reference to the parent? Now you have to adapt every possible parent. To make things worse you not only need the reference to the parent but to the attribute as well.

@sivaraam
Copy link
Contributor

sivaraam commented Feb 3, 2020

As well this change does not support replacement by different types, e.g. Column to literal or CaseExpression or you name it. That is why I am getting away from this.

Just for the sake of discussion, is it not possible to overcome this by making the visit methods return the appropriate super-classes? For example, in the case you mentioned we could just make the visit method corresponding to the Column of the appropriate visitor to return a generic Expression object and the implementation of the visit method could think about what type of Expression should be returned. I think this would work for any given visitor, and the objects it is visiting. Of course, this would possibly make it difficult to comprehend why the visit methods return a different type. But otherwise, does this sound like plausible solution?

That said, I still believe that modifying the existing visitor methods would place a huge migration burden on the users. To overcome this I think we can re-consider the following which was suggested before:

Would it be valuable to have CopyStatementVisitor and the current StatementVisitor ?

That would of course double the amount of visitor classes but at least it would not introduce any migration burden on the existing users. Also, having separate visitor classes would ensure that we don't add any additional (unrelated) responsibility to the visit methods. IOW, it would ensure SRP.

@wumpz
Copy link
Member Author

wumpz commented Feb 6, 2020

So how to overcome this problems of the visitor pattern? Any ideas?

@SerialVelocity
Copy link
Contributor

I agree with @sivaraam but would go one step further and make the extra visitor generic:

public <T> T accept(Generic*Visitor<T> visitor);

This would:

  • Allow CopyVisitor to be written. Each visitor would be allowed to have its own type so you can have implements GenericStatementVisitor<Statement>, GenericExpressionVisitor<Expression>, etc.
  • Allow custom types to be returned too. e.g. the TableVisitor could have T set to List<TableName> (this has the added benefit of easily making the class thread-safe because you no longer have a field lying around).

@AnEmortalKid
Copy link
Contributor

So how to overcome this problems of the visitor pattern? Any ideas?

Would it be possible to invert things for consumption sake. Right now we have

interface SimpleVIsitor {

 void  visitFoo(Foo foo);
}

And the proposed change is:

interface SimpleVisitor {
Foo visitFoo(Foo foo);
}

What if we introduced the CopyVisitor as a lower level stack, so we could have:

interface CopyableVisitor {

 Comment copy(Comment comment);
}

interface SelectVisitor  extends CopyableVisitor {
default void visit(Comment comment) { copy(comment); }
}

Since this is java 8, we could use the default feature to make this at least easy for consumers?

@wumpz
Copy link
Member Author

wumpz commented Mar 5, 2020

@AnEmortalKid That would be a possibility. However, the difficulties are still there. Let us stick with our example of copying the hierarchy, then die proposed changes would do their job via something like

Expression visit(MyExpression expr) {
     MyExpression my = new MyExpression();
     my.setLeftExpression(visit(expr.getLeftExpression()));
     return my;
}

Unfortunately, this misses one level of indirection from the visitor pattern, it should be

    my.setLeftExpression( expr.getLeftExpression().accept(this) );   

Not only the visit methods but the accept methods have to be changed as well.

If we want to create a different type of visitor, like replace all columns with numbers. A really stupid one, but you get the idea of replacing stuff. Then you have to make the return type "nearly everywhere" an Expression.

Additionally, a big problem with this actual visitor stuff is the missing context. If the visit Column method would know, where it came from (BinaryExpression, left) we could take advantage of this as well and so on and so far.

At the moment I think every visitor - API change would go in the wrong direction, because every task needs other specialities. So my thoughts are going in a different direction right now, by building on an actual visitor some kind of context via reflection or so, to overcome most of possible infrastructure problems to solve this issues.

@AnEmortalKid
Copy link
Contributor

So my thoughts are going in a different direction right now, by building on an actual visitor some kind of context via reflection or so, to overcome most of possible infrastructure problems to solve this issues.

Ah, so something we could push down information on like:

this.setContext(TABLE.X)
column.accept(this);

What sort of things would we need in the context?

@wumpz
Copy link
Member Author

wumpz commented Apr 17, 2020

Right. One could build some kind of proxy for the visitor, that could keep track of the previous or parent object. So our replacement would be simple.

@lalithsuresh
Copy link

lalithsuresh commented Nov 25, 2020

To add to what @SerialVelocity suggested:

I really like the Presto parser API. For example, I have some code where I am converting from SQL to a datalog dialect, and this is one example of how I use the Presto API to do so.


    /*
     * Translates literals into corresponding DDlogRecord instances
     */
    private static class ParseLiterals extends AstVisitor<DDlogRecord, Boolean> {

        @Override
        protected DDlogRecord visitStringLiteral(final StringLiteral node, final Boolean isNullable) {
            try {
                return maybeOption(isNullable, new DDlogRecord(node.getValue()));
            } catch (final DDlogException e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        protected DDlogRecord visitLongLiteral(final LongLiteral node, final Boolean isNullable) {
            return maybeOption(isNullable, new DDlogRecord(node.getValue()));
        }

        @Override
        protected DDlogRecord visitBooleanLiteral(final BooleanLiteral node, final Boolean isNullable) {
            return maybeOption(isNullable, new DDlogRecord(node.getValue()));
        }
    }

And elsewhere, I can do this:

// later, in some method
DDLogRecord record = parseLiterals.process(literal, isNullable);

Note that the visit methods can receive a context and a return value, both of which are generic types parameterized by AstVisitor<X, Y>. And in the above case, the ParseLiterals class is purely stateless (so I can re-use a single instance for the lifetime of my program).

I've been trying out the JSQLParser API simply because Presto is an odd SQL dialect and doesn't support some statements I need. But writing something like the above becomes cumbersome. The return values and the context I pass through the visitors (generic parameters X and Y) now become fields.

Here's the AstVisitor implementation in Presto: https://github.com/prestodb/presto/blob/master/presto-parser/src/main/java/com/facebook/presto/sql/tree/AstVisitor.java

@wumpz
Copy link
Member Author

wumpz commented Nov 26, 2020

The problem is described in this discussion. What type to return? Or do you want to multiply one visitor by adding different context types?

@lalithsuresh
Copy link

lalithsuresh commented Nov 26, 2020

If I get your question correctly (correct me if I'm wrong please :) ), the signature I'm thinking of is:

public abstract class BaseVisitor<R, C>
{
    public R process(Node node)
    {
        return process(node, null);
    }

    public R process(Node node, @Nullable C context)
    {
        return node.accept(this, context);
    }

   // visit methods for each SQL node...
   public R visit(<SomeSqlType> node, C Context) {
   ....
   }
}

And yes, that will result in each use case specifying return and context types as they need. Visitors that don't need to return anything or need any context can use Void types for R or C.

@manticore-projects
Copy link
Contributor

manticore-projects commented Jun 22, 2024

Greetings All, apologies for showing up late.

In my opinion The current implementation has 3 major shortcomings:

  1. When traversing the AST from the root down to the leaves the Result object (e. g. the StringBuilder) must be global and so we can only traverse serially. No parallelism was possible although it would make sense for large UNION statements or SubSelect columns etc.

  2. When the accept methods are called, there is no way to know what called it and so there is no (elegant) way to get the Child-Parent relationship.

  3. And obviously, both accept and visit methods don't return anything but can only write into global objects.

These were major obstacles when writing JSQLFormatter but became showstopper for my latest project where I want to resolve Columns for AllColumns and AllTablesColumns Expressions (e. g. SELECT * FROM ...).

Long story short, I have refactored all the Visitors making them returning a Generic and also accepting a Generic as additional argument:

public interface SelectItemVisitor<T> {
    <S> T visit(SelectItem<? extends Expression> selectItem, S parameters);
}

public class SelectItemVisitorAdapter<T> implements SelectItemVisitor<T> {
    @Override
    public <S> T visit(SelectItem<? extends Expression> item, S parameters) {
        return null;
    }
}

public class SelectDeParser extends AbstractDeParser<PlainSelect>
        implements SelectVisitor<StringBuilder>,
        SelectItemVisitor<StringBuilder>, FromItemVisitor<StringBuilder>,
        PivotVisitor<StringBuilder> {

    @Override
    public <S> StringBuilder visit(SelectItem<?> selectExpressionItem, S parameters) {
        selectExpressionItem.getExpression().accept(expressionVisitor, parameters);
        if (selectExpressionItem.getAlias() != null) {
            buffer.append(selectExpressionItem.getAlias().toString());
        }
        return buffer;
    }
}

Standard Types would be VOID of course but you can write now your own Visitor returning a Generic (Leaf to Root) and also pushing a Generic down the AST (Root to Leaf) providing it as method parameter.

Please have a look at the branched implementation: https://github.com/JSQLParser/JSqlParser/tree/visitors_return_objects
Do let me know any questions, concerns or suggestions for improvements, thank you.

@manticore-projects
Copy link
Contributor

manticore-projects commented Jun 22, 2024

PS: Although this new implementation gave us the option to improve the DeParsers by handing down the StringBuilder and returning the StringBuilder from each node, I did not touch this.

But in theory we could improve it this way and deparse in parallel (which may not be relevant for plain deparsing, but formatting, query transpilation/rewriting and SQL LSP may be interested in this option).

@manticore-projects
Copy link
Contributor

manticore-projects commented Jun 26, 2024

Any opinion please?

I have successfully used those new Visitors for my Column Resolution and have been effectively able to push scopes down from the root to the leafs, but also to pull the final column information from the leaves up to the root.
Although I had to amend all visit() and accept() methods in all my JSQLParser depending code.

If there were no objections I would merge those changes soon in order to get an "Api breaking" JSQLParser 5 ready?!

@manticore-projects manticore-projects pinned this issue Jun 26, 2024
@manticore-projects
Copy link
Contributor

Final call for opinions please.
I have provided default implementations for mimicking the simplified former calls, e. g.

public interface SelectItemVisitor<T> {
    <S> T visit(SelectItem<? extends Expression> selectItem, S parameters);
   
    default void visit(SelectItem<? extends Expression> selectItem) {
        this.visit(selectItem, null);
    }
}

When there were no objections, I would merge this code on weekend and aim for an "API breaking" release 5.0 next week.

@AnEmortalKid
Copy link
Contributor

AnEmortalKid commented Jun 27, 2024 via email

@manticore-projects
Copy link
Contributor

Solved via 0e1ff56.
Would be released as API Breaking 5.0 with dependency on Java 11.

Documentation with examples is here: https://manticore-projects.com/JSQLParser/migration50.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants