Implement reading from null safe dereferences #21239

nik9000 · 2016-11-01T14:02:36Z

Null safe dereferences make handling null or missing values shorter.
Compare the version without them:

if (ctx._source.missing != null && ctx._source.missing.foo != null) {
  ctx._source.foo_length = ctx._source.missing.foo.length()
}

to the version with them:

Integer length = ctx._source.missing?.foo?.length();
if (length != null) {
  ctx._source.foo_length = length
}

Combining this with the as-yet-unimplemented elvis operator allows
for very concise defaults for nulls:

ctx._source.foo_length = ctx._source.missing?.foo?.length() ?: 0;

Since you have to start somewhere, we started with null safe dereferences.
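
For readers less familiar with the operator, here is a rough Java sketch of what the null safe chain and the elvis default above stand for when written longhand. The class `NullSafeReadSketch`, its method names, and the nested `Map`s standing in for `ctx._source` are illustrative assumptions, not code from this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Painless.
final class NullSafeReadSketch {

    /** Longhand equivalent of: source.missing?.foo?.length() */
    static Integer fooLength(Map<String, Object> source) {
        Map<?, ?> missing = (Map<?, ?>) source.get("missing");
        if (missing == null) {
            return null; // first ?. short-circuits the rest of the chain
        }
        String foo = (String) missing.get("foo");
        if (foo == null) {
            return null; // second ?. short-circuits the call
        }
        return foo.length(); // int result is boxed so the chain can also yield null
    }

    /** Longhand equivalent of: source.missing?.foo?.length() ?: 0 */
    static int fooLengthOrZero(Map<String, Object> source) {
        Integer length = fooLength(source);
        return length != null ? length : 0;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        System.out.println(fooLength(source));
        System.out.println(fooLengthOrZero(source));
    }
}
```

Note that the chain has to return `Integer` rather than `int`, because a primitive cannot represent the "something was null" outcome.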

Anyway, this is a feature borrowed from Groovy. Groovy allows writing to
null values like:

def v = null
v?.field = 'cat'

And the writes are simply ignored. Painless doesn't support this at this
point because it'd be complex to implement and maybe not all that useful.

There is no runtime cost for this feature if it is not used. When it is
used we implement it fairly efficiently, adding a jump rather than a
temporary variable.

This should also work fairly well with doc values.

nik9000 · 2016-11-01T14:04:02Z

modules/lang-painless/src/test/java/org/elasticsearch/painless/BasicExpressionTests.java

+              + "return a.missing_length", true));
+
+        // Writes, all unsupported at this point
+//        assertEquals(null, exec("org.elasticsearch.painless.FeatureTest a = null; return a?.x"));            // Read field


I toyed with this but it is fairly complicated to implement and I don't think it is worth doing in this go. I worked fairly hard on writing all these test cases so I kind of wanted to keep them around for a bit. If we decide later on there is no chance we'll implement writing to null safe dereferences then we can just pitch these.

Definitely nice to add this at a later time since even with the docs I bet people will run into this and be confused. But, yes, for another PR :)

The types make it fairly confusing!

nik9000 · 2016-11-01T14:05:12Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/AnalyzerCaster.java

@@ -30,6 +30,9 @@
 public final class AnalyzerCaster {

    public static Cast getLegalCast(Location location, Type actual, Type expected, boolean explicit, boolean internal) {
+        if (actual == null || expected == null) {


I added this because I hit an NPE deep in here and I figured I'd explicitly check it so we could give a nicer warning. Hitting this while compiling is always a bug on our part, but at least we'll have more information than "NPE".

Do you have a test case that triggers this? because it's definitely worth investigating. Since it's an unexpected case maybe Objects.requireNonNull makes more sense to keep the code a bit cleaner? I could go either way since you're trying to add a better error message, though.

I figure if I end up scratching my head for a while on an NPE it is worth being specific about the message.

The tests that I added about writing to a map triggered this. It is why I have a guard on the cast call here. I spent a while digging, but it looks like it is intentional that expected can be null in some cases and I have to not try and cast.

Yeah, makes sense to me.
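
To make the two options in this thread concrete, here is a minimal sketch of an explicit null check with a descriptive message next to the `Objects.requireNonNull` variant. The class `CastGuardSketch`, both method names, the exception type, the message wording, and the use of `Class<?>` and `String` in place of the Painless `Type` and `Location` are all assumptions made for illustration; this is not the code in `AnalyzerCaster`.

```java
import java.util.Objects;

// Hypothetical illustration only; not the AnalyzerCaster implementation.
final class CastGuardSketch {

    /** Explicit check: one message that reports both values and the location. */
    static String checkExplicit(String location, Class<?> actual, Class<?> expected) {
        if (actual == null || expected == null) {
            throw new IllegalStateException(
                "Neither actual [" + actual + "] nor expected [" + expected
                    + "] may be null at [" + location + "]");
        }
        return actual.getSimpleName() + " -> " + expected.getSimpleName();
    }

    /** Shorter variant: still throws NullPointerException, but with a message. */
    static String checkRequireNonNull(String location, Class<?> actual, Class<?> expected) {
        Objects.requireNonNull(actual, "actual type is null at [" + location + "]");
        Objects.requireNonNull(expected, "expected type is null at [" + location + "]");
        return actual.getSimpleName() + " -> " + expected.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(checkExplicit("line 1", int.class, Integer.class));
    }
}
```

The explicit form costs a few more lines but can report both values at once, which is the extra information asked for above.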

nik9000 · 2016-11-01T14:06:04Z

modules/lang-painless/src/main/antlr/PainlessParser.g4

@@ -156,11 +156,11 @@ postdot
    ;

 callinvoke
-    : DOT DOTID arguments
+    : nullSafe=COND? DOT DOTID arguments


I thought about doing this in the lexer instead but went this way because I figured they were about equal. If you prefer lexer I'm certainly fine moving it.

nik9000 · 2016-11-01T14:08:23Z

@jdconrad I've made a gift for you!

jdconrad

@nik9000 Really like this change! Thanks for working on this. Left a few comments, I think all of which are fairly minor.

jdconrad · 2016-11-01T15:36:36Z

modules/lang-painless/src/test/java/org/elasticsearch/painless/BasicExpressionTests.java

+              + "return a.missing_length", true));
+
+        // Writes, all unsupported at this point
+//        assertEquals(null, exec("org.elasticsearch.painless.FeatureTest a = null; return a?.x"));            // Read field


Definitely nice to add this at a later time since even with the docs I bet people will run into this and be confused. But, yes, for another PR :)

jdconrad · 2016-11-01T15:37:56Z

modules/lang-painless/src/main/antlr/PainlessParser.g4

@@ -156,11 +156,11 @@ postdot
    ;

 callinvoke
-    : DOT DOTID arguments
+    : nullSafe=COND? DOT DOTID arguments


The way we've been doing this is simply to check to see if the Token is null, so the nullSafe label isn't necessary. You can just do ctx.COND() != null in the Walker. This keeps the grammar a bit cleaner IMO (unless I missed something?)

I did it this way to make it clear that it was a thing I was going to check and because it looks like COND() would invoke getToken which looks surprisingly non-trivial. I figured this way antlr saves a copy for me and I can just check it.

I strongly prefer a comment over a new variable. This keeps consistency with everything else in the Walker.

I'll switch it to ctx.COND() != null then
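
For anyone following along, this is roughly the shape of the check being agreed on. The sketch below is a hand-written stand-in and does not depend on ANTLR: `OptionalTokenSketch`, its `Token` and `CallinvokeContext` classes, and `isNullSafe` are hypothetical names that only imitate how a generated accessor for an optional token returns null when the token is absent.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for generated parser classes; illustration only.
final class OptionalTokenSketch {
    static final int COND = 1;
    static final int DOT = 2;
    static final int DOTID = 3;

    static final class Token {
        final int type;
        final String text;

        Token(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    static final class CallinvokeContext {
        private final List<Token> children;

        CallinvokeContext(Token... children) {
            this.children = Arrays.asList(children);
        }

        /** Accessor for the optional '?' token; null when it was not matched. */
        Token COND() {
            return getToken(COND, 0);
        }

        /** Walks the children and returns the index-th token of the given type, or null. */
        Token getToken(int type, int index) {
            int seen = 0;
            for (Token child : children) {
                if (child.type == type && seen++ == index) {
                    return child;
                }
            }
            return null;
        }
    }

    /** The check as it would appear in the Walker, without a grammar label. */
    static boolean isNullSafe(CallinvokeContext ctx) {
        return ctx.COND() != null;
    }

    public static void main(String[] args) {
        CallinvokeContext plain = new CallinvokeContext(new Token(DOT, "."), new Token(DOTID, "length"));
        System.out.println(isNullSafe(plain));
    }
}
```

The linear scan in `getToken` is the cost mentioned above; for a rule with only a handful of children it should be negligible.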

jdconrad · 2016-11-01T15:38:21Z

modules/lang-painless/src/main/antlr/PainlessParser.g4

    ;

 fieldaccess
-    : DOT ( DOTID | DOTINTEGER )
+    : nullSafe=COND? DOT ( DOTID | DOTINTEGER )


Same comment as above.

jdconrad · 2016-11-01T15:40:39Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/AnalyzerCaster.java

@@ -30,6 +30,9 @@
 public final class AnalyzerCaster {

    public static Cast getLegalCast(Location location, Type actual, Type expected, boolean explicit, boolean internal) {
+        if (actual == null || expected == null) {


Do you have a test case that triggers this? because it's definitely worth investigating. Since it's an unexpected case maybe Objects.requireNonNull makes more sense to keep the code a bit cleaner? I could go either way since you're trying to add a better error message, though.

jdconrad · 2016-11-01T15:43:16Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/node/PSubArrayLength.java

@@ -56,7 +56,11 @@ void analyze(Locals locals) {
                throw createError(new IllegalArgumentException("Cannot write to read-only field [length] for an array."));
            }

-            actual = Definition.INT_TYPE;
+            if (internal) {


Hmm... when does this happen? Did I miss a test when I browsed through them quickly?

Never mind. I see this supports specifically the elvis operator.

Well, I haven't got elvis yet; I'll work on that once this is in, the next time I get a chance. I use this to support array?.length, which can be null, so the types have to agree.

It does lead me to wonder if, after we have the elvis operator it'd be worth optimizing that "chain of ?. into ?:" case so we don't need the boxing and casting. But I think that is a thing for another time.

jdconrad · 2016-11-01T18:08:59Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/node/ESubNullSafeCallInvoke.java

+
+    @Override
+    void analyze(Locals locals) {
+        guarded.expected = expected;


After thinking about this a bit more in depth, I'm not 100% sure this is going to work as expected. Let me get back to you on the casting issue.

It is certainly a bit fidgety and you know it better than I do.

I still need to think about this some more, but let me try to explain a bit better the implications of each variable being used here...

expected -- This can definitely be null. I tried to write an explanation in the package-info. The gist of it is that it will be null in cases where promotion is required since we can't determine what expected is until after knowing both the right-hand and left-hand types (I think we want this to always be null here actually, explained later)

explicit -- This is set by an explicit cast and indicates that certain def cases cannot be optimized if it's true. (Since expected will always be null this doesn't need to be set)

internal -- This should not be set here as it's really only used for arguments being passed into a function call where arguments may need to be boxed beyond the user's ability to do so. Take a look at PSubCall or PSubDefCall for an example of this. (Auto boxing will only work if expected and actual are set to the appropriate types which we cannot necessarily know.)

actual -- The type returned from the guarded node. (Set correctly after analysis)

I don't think you ever want to cast here because of the way this node is a wrapper or middle man, so you are sort of double casting since the parent node would do the same cast in theory. I hope that kind of makes sense. For the case of boxing, I think this needs to be specialized for this node, so in analysis if you see the actual type is a primitive you want to make actual the boxed type, but track that you did that with a boolean in the node. Then in writing make sure if you did box, you have to do it there too.

So I should:

Stop using internal entirely.

Manually check and promote the type during analysis.

Emit the appropriate boxing code manually, probably just a call to writer.box in the right spot.

Is that right?

I'm pretty sure you got it. Happy to zoom if necessary. Sorry for the confusion here. I myself am trying to ensure I have this correct by doing some examples on good old paper.

When I'm back working on improvements to Painless, I will try to beef up the documentation, as I see the variable explanations just aren't thorough enough.

I'll try and put together a fix in a bit and we can go from there. I've got a few things to work through before I get to it so it'll probably be tomorrow before I look at it again.

Okay, sounds good.
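
As a summary of the plan in this thread, here is a minimal sketch of a wrapper node that promotes a primitive result to its boxed type during analysis, remembers that it did so, and emits the matching box step when writing. Everything in it is an assumption for illustration: the class `NullSafeBoxingSketch`, the use of `Class<?>` instead of the Painless `Type`, and the string "instructions" standing in for real bytecode emission. The actual node in this PR may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the ESubNullSafeCallInvoke implementation.
final class NullSafeBoxingSketch {
    private final Class<?> guardedType; // type produced by the guarded call
    Class<?> actual;                    // type this node reports to its parent
    private boolean boxed;              // set in analyze(), consumed in write()

    NullSafeBoxingSketch(Class<?> guardedType) {
        this.guardedType = guardedType;
    }

    static Class<?> boxedTypeOf(Class<?> type) {
        if (type == boolean.class) return Boolean.class;
        if (type == byte.class) return Byte.class;
        if (type == short.class) return Short.class;
        if (type == char.class) return Character.class;
        if (type == int.class) return Integer.class;
        if (type == long.class) return Long.class;
        if (type == float.class) return Float.class;
        if (type == double.class) return Double.class;
        return type; // already a reference type
    }

    /** Analysis: no cast here; only promote primitives so the result can be null. */
    void analyze() {
        actual = boxedTypeOf(guardedType);
        boxed = actual != guardedType;
    }

    /** Writing: guard with a jump, and box only if analysis promoted the type. */
    List<String> write() {
        List<String> ops = new ArrayList<>();
        ops.add("DUP");            // copy the receiver for the null test
        ops.add("IFNULL end");     // null receiver: skip the call, null stays on the stack
        ops.add("INVOKE guarded"); // non-null receiver: perform the guarded call
        if (boxed) {
            ops.add("BOX " + actual.getSimpleName());
        }
        ops.add("LABEL end");
        return ops;
    }

    public static void main(String[] args) {
        NullSafeBoxingSketch node = new NullSafeBoxingSketch(int.class);
        node.analyze();
        System.out.println(node.write());
    }
}
```

Keeping the flag on the node means analysis and writing cannot disagree about whether a box is needed, and the parent node remains the only place where a cast is applied.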

jdconrad · 2016-11-01T18:35:21Z

modules/lang-painless/src/test/java/org/elasticsearch/painless/BasicExpressionTests.java

+        assertNull(                      exec("String a = null; return a?.toString()"));   // Call
+        assertNull(                      exec("String a = null; return a?.length()"));     // Call and box
+        assertEquals("foo",              exec("String a = 'foo'; return a?.toString()"));  // Call
+        assertEquals(Integer.valueOf(3), exec("String a = 'foo'; return a?.length()"));    // Call and box


I swear I'm missing something. I thought that top-level variables wouldn't be working with this based on the grammar changes I see...

That internal flag?

Oh never mind. I misread the grammar. The elvis operator applies to the call, not the var. Sorry, trying to think about too many casts at once :)

jdconrad · 2016-11-01T19:11:37Z