Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement reading from null safe dereferences #21239

Merged
merged 1 commit into from
Nov 9, 2016

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Nov 1, 2016

Null safe dereferences make handling null or missing values shorter.
Compare without:

if (ctx._source.missing != null && ctx._source.missing.foo != null) {
  ctx._source.foo_length = ctx.source.missing.foo.length()
}

To with:

Integer length = ctx._source.missing?.foo?.length();
if (length != null) {
  ctx._source.foo_length = length
}

Combining this with the as of yet unimplemented elvis operator allows
for very concise defaults for nulls:

ctx._source.foo_length = ctx._source.missing?.foo?.length() ?: 0;

Since you have to start somewhere, we started with null safe dereferenes.

Anyway, this is a feature borrowed from groovy. Groovy allows writing to
null values like:

def v = null
v?.field = 'cat'

And the writes are simply ignored. Painless doesn't support this at this
point because it'd be complex to implement and maybe not all that useful.

There is no runtime cost for this feature if it is not used. When it is
used we implement it fairly efficiently, adding a jump rather than a
temporary variable.

This should also work fairly well with doc values.

+ "return a.missing_length", true));

// Writes, all unsupported at this point
// assertEquals(null, exec("org.elasticsearch.painless.FeatureTest a = null; return a?.x")); // Read field
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I toyed with this but it is fairly complicated to implement and I don't think it is worth doing in this go. I worked fairly hard on writing all these test cases so I kind of wanted to keep them around for a bit. If we decide later on there is no chance we'll implement writing to null safe dereferences then we can just pitch these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely nice to add this at a later time since even with the docs I bet people will run into this and be confused. But, yes, for another PR :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The types make it fairly confusing!

@@ -30,6 +30,9 @@
public final class AnalyzerCaster {

public static Cast getLegalCast(Location location, Type actual, Type expected, boolean explicit, boolean internal) {
if (actual == null || expected == null) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this because I hit an NPE deep in here and I figured I'd explicitly check it so we could give a nicer warning. Seeing this bug while compiling is always a bug on our part but at least we'll have more information than "NPE".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a test case that triggers this? because it's definitely worth investigating. Since it's an unexpected case maybe Objects.requireNonNull makes more sense to keep the code a bit cleaner? I could go either way since you're trying to add a better error message, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure if I end up scratching my head for a while on an NPE it is worth being specific about the message.

The tests that I added about writing to a map triggered this. It is why I have a guard on the cast call here. I'm spent a while digging but it looks like it is intentional that expected can be null in some cases and I have to not try and cast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense to me.

@@ -156,11 +156,11 @@ postdot
;

callinvoke
: DOT DOTID arguments
: nullSafe=COND? DOT DOTID arguments
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about doing this in the lexer instead but went this way because I figured they were about equal. If you prefer lexer I'm certainly fine moving it.

@nik9000
Copy link
Member Author

nik9000 commented Nov 1, 2016

@jdconrad I've made a gift for you!

Copy link
Contributor

@jdconrad jdconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 Really like this change! Thanks for working on this. Left a few comments, I think all of which are fairly minor.

+ "return a.missing_length", true));

// Writes, all unsupported at this point
// assertEquals(null, exec("org.elasticsearch.painless.FeatureTest a = null; return a?.x")); // Read field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely nice to add this at a later time since even with the docs I bet people will run into this and be confused. But, yes, for another PR :)

@@ -156,11 +156,11 @@ postdot
;

callinvoke
: DOT DOTID arguments
: nullSafe=COND? DOT DOTID arguments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we've been doing this is simply to check to see if the Token is null, so the nullSafe is necessary. You can just do ctx.COND() != null in the Walker. This keeps the grammar a bit cleaner IMO (unless I missed something?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way to make it clear that it was a thing I was going to check and because it looks like COND() would invoke getToken which looks surprisingly non-trivial. I figured this way antlr saves a copy for me and I can just check it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly prefer a comment over a new variable. This keeps consistency with everything else in the Walker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch it to ctx.COND() != null then

;

fieldaccess
: DOT ( DOTID | DOTINTEGER )
: nullSafe=COND? DOT ( DOTID | DOTINTEGER )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

@@ -30,6 +30,9 @@
public final class AnalyzerCaster {

public static Cast getLegalCast(Location location, Type actual, Type expected, boolean explicit, boolean internal) {
if (actual == null || expected == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a test case that triggers this? because it's definitely worth investigating. Since it's an unexpected case maybe Objects.requireNonNull makes more sense to keep the code a bit cleaner? I could go either way since you're trying to add a better error message, though.

@@ -56,7 +56,11 @@ void analyze(Locals locals) {
throw createError(new IllegalArgumentException("Cannot write to read-only field [length] for an array."));
}

actual = Definition.INT_TYPE;
if (internal) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... when does this happen? Did I miss a test when I browsed through them quickly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind. I see this supports specifically the elvis operator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I haven't got elvis yet, I'll work on that after this is in the next time I get a chance. I use this to support array?.length which can be null so the types have to agree.

It does lead me to wonder if, after we have the elvis operator it'd be worth optimizing that "chain of ?. into ?:" case so we don't need the boxing and casting. But I think that is a thing for another time.


@Override
void analyze(Locals locals) {
guarded.expected = expected;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about this a bit more in depth, I'm not 100% sure this is going to work as expected. Let me get back to you on casting issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is certainly a bit fidgety and you know it better than I do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to think about this some more, but let me try to explain a bit better the implications of each variable being used here...

expected -- This can definitely be null. I tried to write an explanation in the package-info. The gist of it is that it will be null in cases where promotion is required since we can't determine what expected is until after knowing both the right-hand and left-hand types (I think we want this to always be null here actually, explained later)

explicit -- This is set by an explicit cast and indicates that certain def cases cannot be optimized if it's true. (Since expected will always be null this doesn't need to be set)

internal -- This should not be set here as it's really only used for arguments being passed into a function call where arguments may need to be boxed beyond the user's ability to do so. Take a look at PSubCall or PSubDefCall for an example of this. (Auto boxing will only work if expected and actual are set to the appropriate types which we cannot necessarily know.)

actual -- The type returned from the guarded node. (Set correctly after analysis)

I don't think you ever want to cast here because of the way this node is a wrapper or middle man, so you are sort of double casting since the parent node would do the same cast in theory. I hope that kind of makes sense. For the case of boxing, I think this needs to be specialized for this node, so in analysis if you see the actual type is a primitive you want to make actual the boxed type, but track that you did that with a boolean in the node. Then in writing make sure if you did box, you have to do it there too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I should:

  1. Stop using internal entirely.
  2. Manually check and promote the type during analysis.
  3. Emit the appropriate boxing code manually, probably just a call to writer.box in the right spot.

Is that right?

Copy link
Contributor

@jdconrad jdconrad Nov 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure you got it. Happy to zoom if necessary. Sorry for the confusion here. I myself am trying to ensure I have this correct by doing some examples on good old paper.

I will try to beef up the documentation as I see the variable explanations just aren't thorough enough when I'm back working on improvements to Painless again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try and put together a fix in a bit and we can go from there. I've got a few things to work through before I get to it so it'll probably be tomorrow before I look at it again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sounds good.

assertNull( exec("String a = null; return a?.toString()")); // Call
assertNull( exec("String a = null; return a?.length()")); // Call and box
assertEquals("foo", exec("String a = 'foo'; return a?.toString()")); // Call
assertEquals(Integer.valueOf(3), exec("String a = 'foo'; return a?.length()")); // Call and box
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swear I'm missing something. I thought that top-level variables wouldn't be working with this based on the grammar changes I see...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That internal flag?

Copy link
Contributor

@jdconrad jdconrad Nov 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh never mind. I misread the grammar. The elvis operator applies to the call, not the var. Sorry, trying to think about too many casts at once :)

@Override
void setup(MethodWriter writer, Globals globals) {
guarded.setup(writer, globals);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since setup is only used during writing and writing is illegal, this can also be the same create error as store.

}

@Override
void load(MethodWriter writer, Globals globals) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for setup.

}

@Override
void analyze(Locals locals) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot closer to what I would think this node does.

'write' doesn't need to be set here since the default is false and you have the check for it

'read' is correct

'internal' should be false here for the same reason as explained above, but it's false by default so need to set it

'expected' should be null here always as set by default

'explicit' should be false here always as set by default

'actual' is correct again

I think the same strategy as explained in the previous comment is required here. You do the box in write, but analysis won't be correct here because it needs to know that type has been boxed for the rest of the chain to be correct.



@Override
int accessElementCount() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as setup.

}

@Override
boolean isDefOptimized() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as setup.

}

@Override
void updateActual(Type actual) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as setup.

@jdconrad
Copy link
Contributor

jdconrad commented Nov 1, 2016

@nik9000 Sorry for the confusion related to the AExpression variables once again. I know the casting for variable/field/methods is kind of wonky, but it was difficult to come up with a model that was both compatible for natural expressions like binary operators and variables. It's especially difficult to think through all the uses of a wrapper node in a case like this.

@nik9000
Copy link
Member Author

nik9000 commented Nov 8, 2016

@jdconrad I pushed some commits that I think address all of your comments.

@jdconrad
Copy link
Contributor

jdconrad commented Nov 8, 2016

@nik9000 I might be missing something, but it looks like the analysis methods in PSubNullSafeField and PSubNullSafeCallInvoke didn't change?

@nik9000
Copy link
Member Author

nik9000 commented Nov 8, 2016

but it looks like the analysis methods in PSubNullSafeField and PSubNullSafeCallInvoke didn't change?

I reread this comment and made more of the changes you asked for. I'd internalized the "setting internal is wrong" part. but not the actual and explicit part.

Copy link
Contributor

@jdconrad jdconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 This is good. I'll continue to think about it because I'm not sure I've thought through all the scenarios, but if there're bugs we can fix them as they come up. Thanks!

Null safe dereferences make handling null or missing values shorter.
Compare without:
```
if (ctx._source.missing != null && ctx._source.missing.foo != null) {
  ctx._source.foo_length = ctx.source.missing.foo.length()
}
```

To with:
```
Integer length = ctx._source.missing?.foo?.length();
if (length != null) {
  ctx._source.foo_length = length
}
```

Combining this with the as of yet unimplemented elvis operator allows
for very concise defaults for nulls:
```
ctx._source.foo_length = ctx._source.missing?.foo?.length() ?: 0;
```

Since you have to start somewhere, we started with null safe dereferenes.

Anyway, this is a feature borrowed from groovy. Groovy allows writing to
null values like:
```
def v = null
v?.field = 'cat'
```
And the writes are simply ignored. Painless doesn't support this at this
point because it'd be complex to implement and maybe not all that useful.

There is no runtime cost for this feature if it is not used. When it is
used we implement it fairly efficiently, adding a jump rather than a
temporary variable.

This should also work fairly well with doc values.
@nik9000 nik9000 merged commit d03b8e4 into elastic:master Nov 9, 2016
@nik9000
Copy link
Member Author

nik9000 commented Nov 9, 2016

Thanks for working me through all this @jdconrad! I'll jump on anything that comes up as quickly as I can!

Hashes in case someone wants to track this down:
master: d03b8e4
5.x: e3499c2

@clintongormley clintongormley added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement v5.1.1 v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants