[Ingest] Expose domainSplit() in ingest script processor and possibly aggregations #36359

LucaWintergerst · 2018-12-07T11:41:18Z

Describe the feature:
The domainSplit() Painless method splits a domain into its parts (subdomain, TLD, ...). It was first introduced when Machine Learning was integrated into Elasticsearch, and it was exposed to scripted fields so that ML jobs that need this information can work.

However, this functionality would also be very useful as part of ingest. No other part of our stack offers a substitute for it (apart from Packetbeat, which does something similar by default).
There is also no good workaround: the public suffix list is required to do "good" domain splitting, and scripted fields alone do not allow the result to be used in many parts of Kibana. Furthermore, there is likely a small performance hit as well.
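To illustrate why the public suffix list matters, here is a minimal Python sketch of a naive splitter that keeps only the last two labels of a host name. The function name is hypothetical and this is not how domainSplit() is implemented; it only shows where a list-free approach breaks down.

```python
def naive_registered_domain(host: str) -> str:
    """Keep only the last two labels; no suffix list is consulted."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


# Works when the public suffix is a single label.
print(naive_registered_domain("www.elastic.co"))     # elastic.co

# Fails for multi-label suffixes: "co.uk" is a public suffix,
# so the registered domain should have been "example.co.uk".
print(naive_registered_domain("www.example.co.uk"))  # co.uk
```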

@rjernst and @polyfractal discussed this briefly and agreed that it makes sense to have.

One remaining question to work out is whether it also makes sense to make this available in scripted aggregations.


elasticmachine · 2018-12-07T23:15:04Z

Pinging @elastic/es-core-infra

jakelandis · 2018-12-14T14:50:49Z

We discussed this today in Fixit Friday and agreed that this would be useful in other parts of Elasticsearch, and that it is something we want to pursue.

We still need to discuss with the @elastic/machine-learning team whether they are agreeable to moving this code from ml to a more common place in the source tree (which may require a re-license). We also need to discuss how to maintain the list of top level domains.

droberts195 · 2018-12-14T15:17:31Z

> We also need to discuss how to maintain the list of top level domains.

One option would be to work off the public suffix data file instead of the compressed version embedded in the code. We could ship public_suffix_list.dat as a resource file and parse it at startup. Updating the list would then simply be a case of updating that file in the source tree. (Or we could ship it as a config file and parse it from the config directory if we wanted end users to be able to update it independently of a new release.)

We actually had some C++ code to do this in a previous product - I'll dig it out for you.
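The resource-file approach described above could look roughly like the following Python sketch. It is an illustration only, not the existing ML code or the C++ code mentioned: `parse_rules`, `split_domain`, `SplitResult`, and the sample rules are hypothetical, and the sketch assumes the published public suffix list format (`//` comments, `*.` wildcard rules, `!` exception rules). It does not handle internationalized domain names.

```python
from typing import Iterable, NamedTuple, Optional, Set, Tuple


class SplitResult(NamedTuple):
    subdomain: str
    registered_domain: str
    public_suffix: str


def parse_rules(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (rules, exceptions) parsed from suffix-list formatted lines."""
    rules: Set[str] = set()
    exceptions: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        rule = line.split()[0].lower()  # a rule ends at the first whitespace
        if rule.startswith("!"):
            exceptions.add(rule[1:])
        else:
            rules.add(rule)
    return rules, exceptions


def split_domain(host: str, rules: Set[str],
                 exceptions: Set[str]) -> Optional[SplitResult]:
    """Split host into subdomain, registered domain, and public suffix.

    Returns None when the host is itself a public suffix.
    """
    labels = host.lower().rstrip(".").split(".")
    suffix_len = 1  # default rule "*": the last label is the suffix
    for i in range(len(labels)):
        # Exception rules take priority; the suffix drops their first label.
        if ".".join(labels[i:]) in exceptions:
            suffix_len = len(labels) - i - 1
            break
    else:
        for i in range(len(labels)):
            candidate = ".".join(labels[i:])
            wildcard = ".".join(["*"] + labels[i + 1:])
            if candidate in rules or wildcard in rules:
                # Scanning from the longest candidate, so this is the longest match.
                suffix_len = len(labels) - i
                break
    cut = len(labels) - suffix_len
    if cut < 1:
        return None  # nothing is left in front of the public suffix
    return SplitResult(
        subdomain=".".join(labels[:cut - 1]),
        registered_domain=".".join(labels[cut - 1:]),
        public_suffix=".".join(labels[cut:]),
    )


# A tiny hand-written sample in the suffix list format (not the real file).
SAMPLE = """
// sample rules
com
uk
co.uk
*.ck
!www.ck
""".splitlines()

RULES, EXCEPTIONS = parse_rules(SAMPLE)
print(split_domain("www.example.co.uk", RULES, EXCEPTIONS))
```

With this layout, refreshing the list only means replacing the data file; the parsing and matching code stays unchanged.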

elasticmachine · 2019-03-13T14:17:11Z

Pinging @elastic/es-core-features

mbudge · 2020-10-13T10:26:08Z

The public suffix file is the best way to get the top level domain, subdomain, registered domain, root domain and last but not least the domain.

rjernst added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Dec 7, 2018

rjernst added the discuss label Dec 7, 2018

jakelandis removed the discuss label Dec 14, 2018

jakelandis added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Mar 13, 2019

martijnvg mentioned this issue Nov 12, 2019

New processors to expand ingest node's capabilities #48986

Open

9 tasks

rjernst added Team:Data Management Meta label for data/management team Team:Core/Infra Meta label for core/infra team labels May 4, 2020

rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020

jimczi removed the needs:triage Requires assignment of a team area label label Jan 12, 2021

droberts195 mentioned this issue Apr 20, 2021

[ML] domainSplit painless extension not available in runtime fields #71946

Open
