ESQL: Add option to drop null fields #102428

nik9000 · 2023-11-21T18:15:20Z

This adds an option to drop columns that are entirely null from the results. Is this something we want?
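To make "columns that are entirely null" concrete, here is a minimal Python sketch of the filtering applied to an already-decoded `columns`/`values` response. The function name `drop_null_columns` and its return shape are hypothetical illustrations, not the code in this PR, which does the work server-side in Java.

```python
def drop_null_columns(columns, values):
    """Return (kept_columns, null_columns, trimmed_values).

    A column counts as "entirely null" when no row holds a non-None value
    for it, so with zero rows every column is treated as null.
    """
    # Indices of columns that have at least one non-null value.
    keep = [
        i for i in range(len(columns))
        if any(row[i] is not None for row in values)
    ]
    keep_set = set(keep)
    kept_columns = [columns[i] for i in keep]
    null_columns = [c for i, c in enumerate(columns) if i not in keep_set]
    # Trim every row so it still lines up one for one with kept_columns.
    trimmed_values = [[row[i] for i in keep] for row in values]
    return kept_columns, null_columns, trimmed_values


columns = [
    {"name": "all_null", "type": "integer"},
    {"name": "foo", "type": "integer"},
]
values = [[None, 40], [None, 80]]
kept, dropped, rows = drop_null_columns(columns, values)
print(kept, dropped, rows)
```

The check needs a full pass over every value of every column, which is one plausible reason the option is not free.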

nik9000

This option slows things down, which is a bit wild to me. I suppose it's because we have so many pages. I wonder if we can grow the pages....

costin · 2023-11-28T02:28:35Z

I'd rather look into adding a dedicated command for this type of scenario: e.g. drop column_x if null

nik9000 · 2023-11-28T16:12:21Z

Like DROP * IF NULL?

nik9000 · 2023-11-29T16:28:02Z

I ran some performance tests and the service time went up slightly (bad), but request rate went way up (good). I think having some way for us to remove null columns is probably good.

elasticsearchmachine · 2024-01-09T16:06:38Z

Hi @nik9000, I've created a changelog YAML for you.

nik9000 · 2024-01-09T16:08:15Z

@costin and I have talked a bit more about this and it seems like we do indeed just want this as a URL parameter.

@stratoula, this is the option I talked to you about earlier. I'm adding a few more tests and will remove draft soon.

elasticsearchmachine · 2024-01-09T16:17:44Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

nik9000 · 2024-01-09T16:18:00Z

This doesn't work with async. I'll have a look at that too.

nik9000 · 2024-01-09T16:17:07Z

...ugin/esql/compute/src/main/java/org/elasticsearch/compute/data/SingletonOrdinalsBuilder.java

@@ -43,10 +43,6 @@ public SingletonOrdinalsBuilder appendOrd(int value) {
        return this;
    }

-    int[] ords() {
-        return ords;
-    }


This wasn't used and I bumped into it while adding the builder tests.

nik9000 · 2024-01-09T16:29:04Z

I'll have a look at supporting this parameter in the async api after this PR.

nik9000 · 2024-01-10T00:11:42Z

@stratoula, could you give this one a shot on your side and see if it's what you are looking for?

costin · 2024-01-10T00:32:16Z

...ck/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/action/EsqlQueryResponseTests.java

+                    "null_columns":[{"name":"all_null","type":"integer"}],""" + """
+                    "columns":[{"name":"foo","type":"integer"}],""" + """
+                    "values":[[40],[80]]}""")
+            );


Small nit - we might want to preserve the order of the columns even when removing null columns.
One way to do that is to list the columns as in the regular response but mark the ones that are null. We can then either skip the value or add a null in its place:

"columns":[{"name":"foo","type":"integer"}, {"name":"all_null", "type":"integer", "null" : true}], "values":[[40],[80]]

or

"values":[[40, null],[80, null]]

I'm not sure that we wanna go there, since the task at hand is clear - "drop null columns", but a more generalized view of this is "constant_columns" - a column with a constant value, which in this case would be null.

> I'm not sure that we wanna go there, since the task at hand is clear - "drop null columns", but a more generalized view of this is "constant_columns" - a column with a constant value, which in this case would be null.

Oh that's kind of neat! It's quite easy for us to detect constant columns too.
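As a rough illustration of why detecting constant columns is easy, here is a minimal Python sketch over a decoded `columns`/`values` pair. The helper name `constant_columns` is hypothetical, and the real detection would happen on the server's block data structures rather than on JSON rows.

```python
def constant_columns(columns, values):
    """Map column name -> constant value for columns that never change.

    With zero rows nothing is reported, because there is no value to call
    the constant. An all-null column shows up with the constant None.
    """
    result = {}
    if not values:
        return result
    for i, col in enumerate(columns):
        first = values[0][i]
        # The column is constant when every row repeats the first row's value.
        if all(row[i] == first for row in values):
            result[col["name"]] = first
    return result


columns = [{"name": "n1", "type": "long"}, {"name": "r1", "type": "long"}]
print(constant_columns(columns, [[None, 1], [None, 3]]))
```

Under this view "drop null columns" is just the special case where the constant is null.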

> Small nit - we might want to preserve the order of the columns even when removing null columns.

I like the idea. Then I don't like the idea. Then I do. I dunno. I kind of like that, no matter what, the way to parse the columns is by looking at the columns header. You just match them one for one with the values. No need for checking. As it stands now, if you send this parameter the null columns just "disappear" from the parsing logic. They only sit off to the side as metadata, so there's no need for different parsing code to handle this case.

OTOH, Chris' quite neat proposal around constant columns would be much more compatible with keeping one list of columns. Just sometimes having a constant. I suppose @stratoula could do either one. I wonder what our friends who work on the clients think. I'll ask!

One interesting thing about extracting constant values is that it's something that would likely also come as part of supporting some different output format like parquet or arrow flight or whatever. I presume those formats have specifications for, well, the entire response shape. And we'd do whatever makes sense for them.

Just to make it clear, we're debating between:

    {
      "null_columns": [
        {"name": "n1", "type": "long"},
        {"name": "n2", "type": "keyword"}
      ],
      "columns": [
        {"name": "r1", "type": "long"},
        {"name": "r2", "type": "long"}
      ],
      "values": [
        [1, 2],
        [3, 4]
      ]
    }

And

    {
      "columns": [
        {"name": "n1", "type": "long", "constant_value": null},
        {"name": "r1", "type": "long"},
        {"name": "n2", "type": "keyword", "constant_value": null},
        {"name": "r2", "type": "long"}
      ],
      "values": [
        [1, 2],
        [3, 4]
      ]
    }

The first one is the "when you enable this option null columns just vanish if you don't update your parser". The second one is "you have to update your parser, but it's nice and compatible with the (later) proposal for constant columns".
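The parsing difference between the two shapes can be sketched in a few lines of Python. Both function names are hypothetical and the inputs are assumed to be already-decoded JSON; this is only meant to show why the first shape needs no parser change while the second does.

```python
def rows_from_null_columns_format(response):
    """First shape: columns and values line up one for one.

    The "null_columns" section is never consulted, so a parser written
    before the option existed keeps working unchanged.
    """
    names = [c["name"] for c in response["columns"]]
    return [dict(zip(names, row)) for row in response["values"]]


def rows_from_constant_value_format(response):
    """Second shape: constant columns carry no entry in each row.

    The parser has to skip them while walking a row and fill in the
    constant itself, otherwise names and values fall out of step.
    """
    rows = []
    for row in response["values"]:
        remaining = iter(row)
        record = {}
        for col in response["columns"]:
            if "constant_value" in col:
                record[col["name"]] = col["constant_value"]
            else:
                record[col["name"]] = next(remaining)
        rows.append(record)
    return rows


first = {
    "null_columns": [{"name": "n1", "type": "long"}],
    "columns": [{"name": "r1", "type": "long"}, {"name": "r2", "type": "long"}],
    "values": [[1, 2], [3, 4]],
}
print(rows_from_null_columns_format(first))
```

Running the first function against the second shape would pair `n1` with `1` and `r1` with `2`, which is the silent misalignment the "you have to update your parser" remark refers to.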

FTR, I've discussed this with Nik and we came to the following conclusion:
- columns and values should be kept in sync, to avoid breaking existing parsing for consumers that set the option and don't care about the null values
- for those that do care, and to maintain order, the list of all columns (null and not null) is returned as a separate section

So something like:

    {
      "all_columns": [
        {"name": "n1", "type": "long"},
        {"name": "r1", "type": "long"},
        {"name": "n2", "type": "keyword"},
        {"name": "r2", "type": "long"}
      ],
      "columns": [
        {"name": "r1", "type": "long"},
        {"name": "r2", "type": "long"}
      ],
      "values": [
        [1, 2],
        [3, 4]
      ]
    }

This way clients that do care about the null columns or the available info have the data available.

So in order to find the columns with nulls you need to find the difference between all_columns and columns? Do I get this correctly?

Yeah. The advantage of this way is you get everything back in the right order. I figured someone would want that. We talked about putting an always_null flag on each column in all_columns so you could do it without the set difference operator. I decided to add it if someone asks for it, but not at first.
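For reference, the set difference described here is short on the client side. A minimal Python sketch, assuming a decoded response with `all_columns` and `columns` sections and assuming column names are unique within a response; the helper name `null_columns` is hypothetical.

```python
def null_columns(response):
    """Columns listed in all_columns but absent from columns, in order."""
    # Assumption: names are unique, so a name is enough to identify a column.
    present = {c["name"] for c in response["columns"]}
    return [c for c in response["all_columns"] if c["name"] not in present]


response = {
    "all_columns": [
        {"name": "n1", "type": "long"},
        {"name": "r1", "type": "long"},
        {"name": "n2", "type": "keyword"},
        {"name": "r2", "type": "long"},
    ],
    "columns": [{"name": "r1", "type": "long"}, {"name": "r2", "type": "long"}],
    "values": [[1, 2], [3, 4]],
}
print([c["name"] for c in null_columns(response)])
```

Iterating over `all_columns` rather than over a set keeps the original column order, which is the point of returning that section.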

I am fine with that 👍

stratoula · 2024-01-10T06:52:58Z

I will check locally today, I will let you know Nik!

stratoula · 2024-01-10T11:58:43Z

@nik9000 yes this will work just fine! A PoC from me based on your PR elastic/kibana#174585

nik9000 · 2024-01-10T13:48:21Z

Once we've settled on an output format I'll write some docs for this.

nik9000 · 2024-01-12T14:17:59Z

Ok friends! I've pushed code changes to get the new format. No docs yet. I have some test failures to stomp this morning. Then I'll try and get back to the docs.

costin

LGTM

rest-api-spec/src/main/resources/rest-api-spec/api/esql.query.json

stratoula · 2024-01-15T11:06:41Z

I also updated my PoC with the latest format. All good 👍

nik9000 · 2024-01-17T13:46:42Z

run elasticsearch-ci/docs

nik9000 added 2 commits November 21, 2023 10:11

WIP

7cd9a7d

Merge branch 'main' into esql_skip_null

53aab40

nik9000 added the :Analytics/ES|QL AKA ESQL label Nov 21, 2023

elasticsearchmachine added the v8.12.0 label Nov 21, 2023

nik9000 added 2 commits November 21, 2023 13:28

Merge branch 'main' into esql_skip_null

76e9bef

Merge branch 'main' into esql_skip_null

fda7eed

nik9000 added 2 commits November 28, 2023 13:45

Merge branch 'main' into esql_skip_null

967897c

WIP

6782030

brianseeders added v8.13.0 and removed v8.12.0 labels Dec 6, 2023

wchaparro assigned nik9000 Dec 7, 2023

nik9000 added 3 commits December 11, 2023 15:18

Merge branch 'main' into esql_skip_null

9a21fa8

Merge branch 'main' into esql_skip_null

02eabc8

Tests

e371ce7

nik9000 added >enhancement ES|QL-ui Impacts ES|QL UI labels Jan 9, 2024

Update docs/changelog/102428.yaml

93152d2

nik9000 added 2 commits January 9, 2024 11:15

More testing

7945167

Merge remote-tracking branch 'refs/remotes/nik9000/esql_skip_null' in…

ded99e7

…to esql_skip_null

nik9000 marked this pull request as ready for review January 9, 2024 16:17

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jan 9, 2024

spotless

841f80d

nik9000 requested review from ChrisHegarty and stratoula January 10, 2024 00:11

costin reviewed Jan 10, 2024

stratoula mentioned this pull request Jan 10, 2024

[ES|QL] [Discover] Distinction betweem Available and empty fields elastic/kibana#174587

Closed

nik9000 added 2 commits January 12, 2024 08:09

Merge branch 'main' into esql_skip_null

df534c5

Rework layout and add async

52c3c69

costin approved these changes Jan 12, 2024

nik9000 added 2 commits January 12, 2024 16:05

Merge branch 'main' into esql_skip_null

332ab07

Update

f641dcc

nik9000 added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jan 16, 2024

elasticsearchmachine merged commit 919d282 into elastic:main Jan 17, 2024
15 checks passed

nik9000 deleted the esql_skip_null branch January 17, 2024 14:06

nik9000 mentioned this pull request Jan 17, 2024

ESQL: data returned by default #102949

Closed
