Support to exclude (big) query results from .watcher-history #36719
@ypid-geberit I think this fits better in the Kibana realm. I am thinking of a way of interpreting the contents of the watch metadata, where you would store the path filtering. This sounds kludgy, but maybe they have a better idea. Can you please open a feature request there: https://github.com/elastic/kibana ? I am going to close this.
Pinging @elastic/es-core-features
@albertzaharovits I still think this should be implemented in Elasticsearch. I also had this idea of using a metadata field to specify path filtering, example:

```yaml
---
# yamllint disable rule:line-length rule:comments-indentation

metadata:
  comment: 'Test watch'
  watcher_history_exclude_filter_path: 'result.input.payload,result.transform.payload'

throttle_period: '0s'

trigger:
  schedule:

input:
  search:

condition:

transform:

actions:
```

(Yes, YAML is awesome, also for watch definitions. Ref: elastic/examples#239)

But this has the issue that a metadata field is now evaluated as a setting, so I would instead propose:

```yaml
---
# yamllint disable rule:line-length rule:comments-indentation

metadata:
  comment: 'Test watch'

watcher_history_exclude_filter_path: 'result.input.payload,result.transform.payload'

trigger:
  schedule:

input:
  search:

actions:
```

What do you have in mind to implement this in Kibana? The only idea I have to mitigate this in Kibana is "Index Patterns" -> "Source Filters".

The reason I suggested implementing this in the watch definition is that it is specific to the watch in my case. For some watches, I find it useful to have everything in the history (development and staging watches); for other watches (production) which write their output to other indices in ES anyway, I don’t want to have it duplicated in `.watcher-history-*`.
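The proposed `watcher_history_exclude_filter_path` value is a comma-separated list of dotted paths, in the spirit of Elasticsearch's response `filter_path` parameter. A minimal Python sketch of the intended exclusion semantics (the function name and the simplified no-wildcard behaviour are assumptions for illustration, not Watcher code):

```python
def exclude_paths(doc, filter_path):
    """Remove each comma-separated dotted path from a nested dict.

    Simplified sketch: no wildcard support, missing paths are ignored.
    """
    for path in filter_path.split(","):
        keys = path.strip().split(".")
        node = doc
        # Walk down to the parent of the last key in the path.
        for key in keys[:-1]:
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        # Drop the leaf key if its parent exists.
        if isinstance(node, dict):
            node.pop(keys[-1], None)
    return doc

history_doc = {
    "result": {
        "input": {"type": "search", "payload": {"hits": ["..."]}},
        "transform": {"type": "script", "payload": {"big": "blob"}},
    }
}
exclude_paths(history_doc, "result.input.payload,result.transform.payload")
# The payload fields are gone; the small sibling fields survive.
```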
I understand now, you mean not recording specific watch history fields for specific watches. This indeed sounds like a Watcher feature. I got led astray when you mentioned the visibility problem in Kibana as a motivator. If the problem is the size of the `.watcher-history-*` documents, you could also trim the large fields after the fact, for example with an update-by-query.

I understand that it's easier to not record specific fields in the first place, rather than having to maintain a "curation" mechanism, but from your experience, do you see the added value of not recording some watch history "in production", given the many ways a watch can fail?
I tried it with update_by_query now, but for some reason it crashed an Elasticsearch 6.2.4 cluster three times. The second run was limited to one day's worth of watcher history. The cluster is otherwise stable.

Script, because Curator does not support this yet:

```bash
#!/bin/bash

PATH="/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/lib/mit/bin"

(
    date --rfc-3339=seconds
    for day_offset in $(seq 32); do
        date_timestamp="$(date --date "now - ${day_offset} day" '+%Y.%m.%d')"
        echo "$(date --rfc-3339=seconds): Running for $date_timestamp"
        curl --silent --cacert "$(get_es_cacert)" -u "$(get_es_creds)" \
            "$(get_es_url)/.watcher-history-7-${date_timestamp}/_update_by_query" \
            -H 'Content-Type: application/yaml' \
            --data-binary @/etc/curator/watcher-history_update_by_query.json
    done
) >> /var/log/curator/script.log
```

`/etc/curator/watcher-history_update_by_query.json`:

```json
{
  "script": {
    "source": "ctx._source.result.input.remove('payload'); if (ctx._source.result.containsKey('transform')) { ctx._source.result.transform.remove('payload') } ctx._source.metadata.watch_history_cleaned = true;",
    "lang": "painless"
  },
  "query": {
    "query_string": {
      "query": "_exists_:(result.input.payload OR result.transform.payload) -_exists_:(metadata.watch_history_cleaned)"
    }
  }
}
```

If you don’t find an issue with this update_by_query, then it is probably because of the old ES version that I tested on, and I will retest on a newer one when possible. Looks good to you?

About the crash:
The crash is not 100 % reproducible, so a few days of history are fully updated (payload removed), and this fixes the issue with Kibana taking forever to load the watch history. If someone can confirm that this update_by_query is reliable and the issue is only in our environment, then I guess that is a solution. The nice aspect of this solution is of course not having to extend Elasticsearch and using an API/curation mechanism that already exists.
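The painless cleanup script above can be mirrored in plain Python, which makes it easy to test the field-removal logic offline against an exported history document (the function name and sample document are illustrative, not part of the issue):

```python
def clean_history_doc(doc):
    """Mirror of the painless cleanup script: drop the big payload
    fields and mark the document so the query skips it next run."""
    result = doc["_source"]["result"]
    # result.input always exists for executed watches; the script
    # removes its payload unconditionally.
    result["input"].pop("payload", None)
    # transform is optional, so guard like containsKey('transform').
    if "transform" in result:
        result["transform"].pop("payload", None)
    doc["_source"].setdefault("metadata", {})["watch_history_cleaned"] = True
    return doc

doc = {
    "_source": {
        "result": {
            "input": {"type": "search", "payload": {"hits": ["huge", "blob"]}},
            "transform": {"type": "script", "payload": {"also": "huge"}},
        },
        "metadata": {"comment": "Test watch"},
    }
}
clean_history_doc(doc)
```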
Describe the feature:

When running my aggregated_issues_in_logs watch over a large number of documents, the watch finds and aggregates a large number of documents that are then indexed into a new index in ES. The issue is that the whole watch run with all results is written to `.watcher-history-*` as one document. This results in very big documents (`result.input.payload` and `result.transform.payload`). The practical issue with this is that the watch history list in Kibana (Management -> Elasticsearch -> Watcher -> Watches -> log_issues) is unable to show more than one watch execution.

It would be helpful if some "filter_path"/"exclude_filter_path" could be specified in the watch definition, controlling what ends up in `.watcher-history-*`.
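For illustration, a trimmed-down, hypothetical `.watcher-history-*` document; the two `payload` fields named in the issue are where the duplicated query results accumulate, while the surrounding fields stay small (the concrete values here are invented):

```json
{
  "watch_id": "aggregated_issues_in_logs",
  "state": "executed",
  "result": {
    "input": {
      "type": "search",
      "payload": { "hits": { "total": 120000, "hits": ["<thousands of source documents>"] } }
    },
    "transform": {
      "type": "script",
      "payload": { "aggregated": ["<thousands of aggregated rows>"] }
    }
  }
}
```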