Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes an issue with mixed types in a subsequence #2196

Merged

Conversation

adamretter
Copy link
Member

Closes #2195

@line-o
Copy link
Member

line-o commented Oct 9, 2018

@adamretter Does this also fix #1960 ?

@@ -297,7 +297,7 @@ public Sequence eval(Sequence contextSequence, final Item contextItem) throws XP

final boolean allowMixedNodesInReturn;
if(prev != null) {
allowMixedNodesInReturn = prev.allowMixedNodesInReturn() | expr.allowMixedNodesInReturn();
allowMixedNodesInReturn = prev.allowMixedNodesInReturn() /*| expr.allowMixedNodesInReturn()*/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wolfgangmm it is unclear why this line needed to be changed for the test to succeed.

Copy link
Member

@wolfgangmm wolfgangmm Oct 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamretter running the test on my install, above comment was not needed. The test passed without it.

I did not contribute to the code calling allowMixedNodesInReturn, but I can see some sense in it. It's not vital though. I would thus say, if your test passes without commenting out the test above, we should leave PathExpr untouched. If it gets in your way though, we need to reconsider.

Copy link
Member Author

@adamretter adamretter Oct 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wolfgangmm Sure the contributed test (FunSubSequenceTest#persistentSupsequence_toInMemory()) is fine either way.

But... due to my override of Function#allowMixedNodesInReturn(), the existing test XQueryTest#pathOperatorContainingNodesAndNonNodes() fails without the rhs of the above boolean test commented out.

Does that make sense now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the error expected by XQueryTest#pathOperatorContainingNodesAndNonNodes() is in line with what saxon reports. I wonder why you had to overwrite Function#allowMixedNodesInReturn()? It does not seem related to the original issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discard the question, I see it. This is quite a puzzle ...

Copy link
Member

@wolfgangmm wolfgangmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the one line I commented on, the PR looks good to me and I would merge it in.

@@ -297,7 +297,7 @@ public Sequence eval(Sequence contextSequence, final Item contextItem) throws XP

final boolean allowMixedNodesInReturn;
if(prev != null) {
allowMixedNodesInReturn = prev.allowMixedNodesInReturn() | expr.allowMixedNodesInReturn();
allowMixedNodesInReturn = prev.allowMixedNodesInReturn() /*| expr.allowMixedNodesInReturn()*/;
Copy link
Member

@wolfgangmm wolfgangmm Oct 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamretter running the test on my install, above comment was not needed. The test passed without it.

I did not contribute to the code calling allowMixedNodesInReturn, but I can see some sense in it. It's not vital though. I would thus say, if your test passes without commenting out the test above, we should leave PathExpr untouched. If it gets in your way though, we need to reconsider.

@adamretter
Copy link
Member Author

@juri not intentionally!

@juri
Copy link

juri commented Oct 9, 2018

@adamretter wrong person in the mention, I think…

@line-o
Copy link
Member

line-o commented Oct 9, 2018

moi @juri 👋

@juri
Copy link

juri commented Oct 9, 2018

@line-o morjens!

@dizzzz
Copy link
Member

dizzzz commented Oct 10, 2018

Are there doubts on the PR ?

@wolfgangmm
Copy link
Member

@dizzzz see the comments. I'm still banging my head against this.

@wolfgangmm
Copy link
Member

@adamretter ok, after some hours I think I found the correct fix. Please try to change SubSequence.getItemType() to return the correct type of the underlying sequence:

@Override
    public int getItemType() {
        return sequence.getItemType();
    }

With this in place you do not need to overwrite Function.allowMixedNodesInReturn() or PathExpr. Tests should pass. If this fails lazy evaluation, we will need another way to determine the expected type.

@adamretter
Copy link
Member Author

adamretter commented Oct 10, 2018

@wolfgangmm So I don't think that will work for the very common ValueSequence which will still return Type.ITEM.

Is there anything wrong with my proposed changes? No tests fail!
Are there things that you know that this breaks? If so, we should start by adding tests for those things...

@wolfgangmm
Copy link
Member

ValueSequence does compute its type, so should be fine.

You wanted me to look into this because you had to comment out the check in PathExpr. This was originally added by yourself, so I assume you had a reason to do so. I have thus tried to find a solution which does not require the changes to Function and PathExpr.

All in all returning the correct type seems a more correct approach.

@adamretter
Copy link
Member Author

@wolfgangmm Okay, I had no idea I added that check. I guess I did so to fix some broken test.

RegardsValueSequence it uses a common super-type. So if you have a node() and an xs:string in the value sequence, the common super type is item()... no?

@wolfgangmm
Copy link
Member

See #1133, which introduced the changes in question to PathExpr. I'm fine to just drop code nobody can explain, but you asked about that particular line, so I thought it is not ok and we need a solution.

@wolfgangmm
Copy link
Member

If the sequence is indeed mixed, it is correct to return Type.ITEM and in this case the check in PathExpr should fail as it would.

@adamretter
Copy link
Member Author

adamretter commented Oct 12, 2018

@wolfgangmm I just spotted a problem with your suggestion to change SequenceType#getItemType() to return sequence.getItemType();.

When given code like:

let $seq := (<a/>, <b/>, 3, 4)
return
  subsequece($seq, 1, 2)

Your change would mean that we return Type.ITEM from the underlying value sequence, but the subsequence should actually return Type.NODE (or even Type.ELEMENT).

We cannot infer the type from the subequence without actually iterating all of the items in the subsequence and checking their common super type. We really want to avoid that, as a lazy approach is much more performant and memory kind.

Thoughts?

@wolfgangmm
Copy link
Member

@adamretter ok, I assumed that due to the lazy approach, the underlying value sequence would only contain elements one and two, which are actually returned. We're only interested in the type of the result of the subsequence call, which would be Type.ELEMENT in this case.

So even if you have to iterate, you do not need to continue beyond item 2, which means the performance impact should be small?

@adamretter
Copy link
Member Author

adamretter commented Oct 12, 2018

So even if you have to iterate, you do not need to continue beyond item 2, which means the performance impact should be small?

heh. Sure that works for a small subsequence, but what about a sequence of 20,000 items I , where you want the first 999 ?

I guess I am trying to figure out if we have to take determine the type as you are suggesting, or if you think that the fix I proposed previously has a problem?

@wolfgangmm
Copy link
Member

heh. Sure that works for a small subsequence, but what about a sequence of 20,000 items I , where you want the first 999 ?

In this case you only need to determine the type of the first 999, so there's no performance loss, right?

I guess I am trying to figure out if we have to take determine the type as you are suggesting, or if you think that the fix I proposed previously has a problem?

The whole issue was that using subsequence as a path step should result in an error if atomic values are returned. At least this is why PathExpr tries to determine the type. So subsequence((<a>a</a>, "a"), 1, 2)/text() would be an error. We could argue that those checks should actually be done elsewhere and it's not the job of subsequence to determine the type.

However, I think having correct type information returned by expressions where possible is always preferable to not having the information.

@wolfgangmm
Copy link
Member

wolfgangmm commented Oct 12, 2018

I guess what I'm thinking is: if there is an easy and fast way to determine the type of the result of subsequence, we should rather go for it instead of having to modify the superclass Function and PathExpr.

But if it means a performance loss, I'm rather for speed. So if you tell me it's not possible without loss, I will merge the current PR.

adamretter added a commit to adamretter/exist that referenced this pull request Oct 14, 2018
adamretter added a commit to adamretter/exist that referenced this pull request Oct 14, 2018
@adamretter
Copy link
Member Author

adamretter commented Oct 14, 2018

@wolfgangmm I have committed the change you suggested. It looks good. Thank you :-)

@wolfgangmm wolfgangmm merged commit dca4d9a into eXist-db:develop Oct 14, 2018
@adamretter adamretter deleted the hotfix/subsequence-mixed-types branch October 15, 2018 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug issue confirmed as bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants