[Ingest] Expose domainSplit() in ingest script processor and possibly aggregations #36359
Comments
Pinging @elastic/es-core-infra
We discussed this today in Fixit Friday and agreed that this would be useful in other parts of Elasticsearch, and is something that we want to pursue. We still need to discuss with the @elastic/machine-learning team whether they are agreeable to moving this code from ml to a more common place in the source tree (and possibly re-licensing it). We also need to discuss how to maintain the list of top-level domains.
One option would be to work off the public suffix data file instead of the compressed version embedded in the code. We could ship We actually had some C++ code to do this in a previous product - I'll dig it out for you.
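For anyone unfamiliar with the public suffix data file, here is a rough sketch of what reading it could look like. It assumes the plain-text format published by publicsuffix.org (one rule per line, `//` comments, `*.` wildcard rules, `!` exception rules). The function name `parse_suffix_rules` and the sample excerpt are made up for illustration and are not taken from the ML code or the C++ code mentioned above.

```python
def parse_suffix_rules(text):
    """Parse public-suffix-style text into (normal, wildcard, exception) rule sets.

    Hypothetical helper for illustration only.
    """
    normal, wildcard, exception = set(), set(), set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("//"):
            continue
        # A rule ends at the first whitespace character.
        rule = line.split()[0].lower()
        if rule.startswith("!"):
            exception.add(rule[1:])   # exception to a wildcard rule
        elif rule.startswith("*."):
            wildcard.add(rule[2:])    # any single label under this suffix
        else:
            normal.add(rule)
    return normal, wildcard, exception


# Illustrative excerpt only, not the real list.
SAMPLE = """\
// illustrative excerpt
com
uk
co.uk

*.ck
!www.ck
"""

normal, wildcard, exception = parse_suffix_rules(SAMPLE)
```

Working from the text file this way would let the list be refreshed without a code change, at the cost of parsing it at startup.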
Pinging @elastic/es-core-features
The public suffix file is the best way to get the top-level domain, subdomain, registered domain, root domain and, last but not least, the domain.
Describe the feature:
The domainSplit() Painless method splits domains into their parts (subdomain, TLD, ...). It was first introduced when Machine Learning was integrated into Elasticsearch, and was exposed as part of scripted fields so that ML jobs that need that information can work.
However, this functionality is also incredibly useful as part of ingest. No other part of our stack has a substitute for this (apart from Packetbeat, which does something similar by default).
There's also no good workaround: the public suffix list is required to do "good" domain splitting, and scripted fields alone cannot be used in many parts of Kibana. Furthermore, there's likely also a small performance hit.
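To illustrate why the suffix list matters, here is a minimal sketch comparing a naive "last two labels" split with a suffix-aware one. The names `SUFFIXES`, `naive_split` and `split_domain` are hypothetical, the suffix set is a tiny hand-picked sample, and the return shape is an assumption that is not meant to mirror what domainSplit() actually returns.

```python
# Hypothetical, hand-picked suffixes; a real implementation would load the full list.
SUFFIXES = {"com", "org", "uk", "co.uk"}


def naive_split(host):
    """Treat the last two labels as the registered domain (often wrong)."""
    labels = host.lower().split(".")
    return (".".join(labels[:-2]), ".".join(labels[-2:]))


def split_domain(host):
    """Return (subdomain, registered_domain, suffix) using the longest known suffix."""
    labels = host.lower().rstrip(".").split(".")
    # Default: if no known suffix matches, treat the last label as the suffix.
    start = len(labels) - 1
    # Candidates are tried longest first, so the first match is the longest one.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            start = i
            break
    suffix = ".".join(labels[start:])
    if start == 0:
        # The host is itself a suffix, so there is no registered domain.
        return ("", "", suffix)
    registered = ".".join(labels[start - 1:])
    subdomain = ".".join(labels[:start - 1])
    return (subdomain, registered, suffix)


print(naive_split("www.example.co.uk"))   # ('www.example', 'co.uk')
print(split_domain("www.example.co.uk"))  # ('www', 'example.co.uk', 'co.uk')
```

The naive version reports co.uk as the registered domain, which is the kind of mistake a script-only workaround would make without access to the list.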
@rjernst and @polyfractal discussed this briefly and agreed that it makes sense to have.
One remaining question to work out is if it also makes sense to have this available in scripted aggregations.