-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Add option to drop null fields #102428
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option slows things down, which is a bit wild to me. I suppose it's because we have so many pages. I wonder if we can grow the pages....
I'd rather look into adding a dedicated command for this type of scenario: e.g. |
Like |
I ran some performance tests and the service time went up slightly (bad), but request rate went way up (good). I think having some way for us to remove null columns is probably good. |
Hi @nik9000, I've created a changelog YAML for you. |
@costin and I've talked a bit more about this and it seems like we do indeed just want this as a url parameter. @stratoula, this is the option I talked to you about earlier. I'm adding a few more tests and will remove |
Pinging @elastic/es-analytics-geo (Team:Analytics) |
This doesn't work with async. I'll have a look at that too. |
@@ -43,10 +43,6 @@ public SingletonOrdinalsBuilder appendOrd(int value) { | |||
return this; | |||
} | |||
|
|||
int[] ords() { | |||
return ords; | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't used and I bumped into it while adding the builder tests.
I'll have a look at supporting this parameter in the async api after this PR. |
@stratoula, could you give this one a shot on your side and see if it's what you are looking for? |
"null_columns":[{"name":"all_null","type":"integer"}],""" + """ | ||
"columns":[{"name":"foo","type":"integer"}],""" + """ | ||
"values":[[40],[80]]}""") | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit - we might want to preserve the order of the columns even when removing null columns.
One way to do that is to list the columns as in the regular response but indicate it is null. we can then either skip or add a null in place:
"columns":[{"name":"foo","type":"integer"}, {"name":"all_null", "type":"integer", "null" : true}]
"values":[[40],[80]]
or
"values":[[40, null],[80, null]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that we wanna go there, since the task at hand is clear - "drop null columns", but a more generalise view of this is "constant_columns" - a column with a constant value, which in this case would be null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that we wanna go there, since the task at hand is clear - "drop null columns", but a more generalise view of this is "constant_columns" - a column with a constant value, which in this case would be null.
Oh that's kind of neat! It's quite easy for us to detect constant columns too.
Small nit - we might want to preserve the order of the columns even when removing null columns.
I like the idea. Then I don't like the idea. Then I do. I dunno. I kind of like that, no matter what, the way to parse the columns is by looking at the columns
header. You just one for one them. No need for checking. As it stands now if you send this parameter the null columns just "disappear" from the parsing logic. They are only sitting off to the side as metadata. Like, no need to have different parsing code to handle this case.
OTOH, Chris' quite neat proposal around constant columns would be much more compatible with keeping one list of columns. Just sometimes having a constant. I suppose @stratoula could do either one. I wonder what our friends who work on the clients think. I'll ask!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One interesting thing about extracting constant values is that it's something that would likely also come as part of supporting some different output format like parquet or arrow flight or whatever. I presume those formats have specifications for, well, the entire response shape. And we'd do whatever makes sense for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make it clear, we're debating between:
{
"null_columns": [
{"name": "n1", "type": "long"},
{"name": "n2", "type": "keyword"}
],
"columns": [
{"name": "r1", "type": "long"},
{"name": "r2", "type": "long"},
},
"values": [
[1, 2],
[3, 4]
]
}
And
{
"columns": [
{"name": "n1", "type": "long", "constant_value": null},
{"name": "r1", "type": "long"},
{"name": "n2", "type": "keyword", "constant_value": null},
{"name": "r2", "type": "long"},
},
"values": [
[1, 2],
[3, 4]
]
}
The first one is the "when you enable this option null columns just vanish if you don't update your parser". The second one is "you have to update your parser, but it's nice and compatible with the (later) proposal for constant columns".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR, I've discussed this with Nik and come with the following conclusion:
columns and values should be kept in sync to avoid breaking existing parsing for consumers that set the option and don't care about the null values
for those that do, to maintain order, the list of all columns (null and not null) is returned as a separate section.
So something like:
{
"all_columns": [
{"name": "n1", "type": "long"},
{"name": "r1", "type": "long"},
{"name": "n2", "type": "keyword"}
{"name": "r2", "type": "long"},
],
"columns": [
{"name": "r1", "type": "long"},
{"name": "r2", "type": "long"}
],
"values": [
[1, 2],
[3, 4]
]
}
This way clients that do care about the null columns or the available info have the data available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in order to find the columns with nulls you need to find the difference between all_columns and columns? Do I get this correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. The advantage of this way is you get everything back in the right order. I figured someone would want that. We talked about putting a always_null
on the column in all_columns
so you could do it without the set difference operator. I decided to do it if someone asked for it but not at first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with that 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I will check locally today, I will let you know Nik! |
@nik9000 yes this will work just fine! A PoC from me based on your PR elastic/kibana#174585 |
Once we've settled on an output format I'll write some docs for this. |
Ok friends! I've pushed code changes to get the new format. No docs yet. I have some test failures to stomp this morning. Then I'll try and get back to the docs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
rest-api-spec/src/main/resources/rest-api-spec/api/esql.query.json
Outdated
Show resolved
Hide resolved
I also updated my PoC with the latest format. All good 👍 |
run elasticsearch-ci/docs |
This adds an option to drop columns that are entirely null from the results. Is this something we want?