Support to exclude (big) query results from .watcher-history #36719

Closed
ypid-geberit opened this issue Dec 17, 2018 · 5 comments

Comments

@ypid-geberit

Describe the feature:

When running my aggregated_issues_in_logs watch over a large number of documents, the watch finds and aggregates many documents, which are then indexed into a new index in ES. The issue is that the whole watch run, with all results, is written to .watcher-history-* as one document. This results in very big documents (result.input.payload and result.transform.payload). The practical problem is that the watch history list in Kibana (Management -> Elasticsearch -> Watcher -> Watches -> log_issues) is unable to show more than one watch execution.

It would be helpful if some "filter_path"/"exclude_filter_path" could be specified in the watch definition to control what ends up in .watcher-history-*.
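For the syntax, I am thinking of something along the lines of the response filtering that already exists on the REST layer. To be clear, the request below only filters an API response and is just meant to illustrate the kind of path expressions I have in mind; a watch-level setting like this does not exist yet:

# filter_path (with a leading "-" for exclusion) already works for REST responses,
# but it has no influence on what Watcher writes to .watcher-history-*:
curl 'localhost:9200/.watcher-history-*/_search?filter_path=-hits.hits._source.result.input.payload,-hits.hits._source.result.transform.payload'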

@ypid-geberit ypid-geberit changed the title Support to exclude query (big) results from .watcher-history Support to exclude (big) query results from .watcher-history Dec 17, 2018
@albertzaharovits
Contributor

@ypid-geberit I think this fits better in the Kibana realm. I am thinking of a way of interpreting the contents of the watch metadata, where you would store the path filtering. This sounds kludgy, but maybe they have a better idea. Can you please open a feature request there: https://github.com/elastic/kibana? I am going to close this.

@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@ypid-geberit
Author

@albertzaharovits I still think this should be implemented in Elasticsearch. I also had the idea of using a metadata field to specify the path filtering, for example:

---

# yamllint disable rule:line-length rule:comments-indentation

metadata:
  comment: 'Test watch'
  watcher_history_exclude_filter_path: 'result.input.payload,result.transform.payload'


throttle_period: '0s'

trigger:
  schedule:

input:
  search:

condition:

transform:

actions:

(Yes, YAML is awesome, also for watch definitions. Ref: elastic/examples#239)

But this has the issue that metadata would then be evaluated as a setting, so I would instead propose:

---

# yamllint disable rule:line-length rule:comments-indentation

metadata:
  comment: 'Test watch'

watcher_history_exclude_filter_path: 'result.input.payload,result.transform.payload'

trigger:
  schedule:

input:
  search:

actions:

What do you have in mind to implement this in Kibana? I only have one idea to mitigate this in Kibana, which is "Index Patterns" -> "Source Filters" (result.*.payload,result.*.body,result.actions,input.search.request). Is that what you mean?

The reason I suggested implementing this in the watch definition is that it is specific to the watch in my case. For some watches (development and staging) I find it useful to have everything in the history; for other watches (production), which write their output to other indices in ES anyway, I don’t want the output duplicated in .watcher-history-* (we are talking about hundreds of MiB of .watcher-history-* per day in our environment).

@albertzaharovits
Contributor

albertzaharovits commented Dec 20, 2018

Hi @ypid-geberit

I understand now, you mean not recording specific watch history fields for specific watches. This indeed sounds like a Watcher feature. I was led astray when you mentioned the visibility problem in Kibana as a motivator.

If the problem is the size of the .watcher-history-* indices, may I recommend using update by query?
This way the fields can be recorded even in "production" mode, and when they are deemed more of a burden than useful they can be removed.

I understand that it's easier to not record specific fields in the first place, rather than having to maintain a "curation" mechanism, but from your experience, do you see the added value of not recording some watch history "in production", given the many ways a watch can fail?

@ypid-geberit
Author

ypid-geberit commented Jul 25, 2019

I tried update by query now, but for some reason it crashed an Elasticsearch 6.2.4 cluster three times. The second run was limited to one day's worth of watcher history. The cluster is otherwise stable.

Script because Curator does not support this yet:

#!/bin/bash

# get_es_cacert, get_es_creds and get_es_url are small helpers from our environment
# that print the CA certificate path, the "user:password" pair and the cluster URL.
PATH="/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/lib/mit/bin"

(
    date --rfc-3339=seconds
    for day_offset in $(seq 32); do
        date_timestamp="$(date --date "now - ${day_offset} day" '+%Y.%m.%d')"
        echo "$(date --rfc-3339=seconds): Running for $date_timestamp"
        curl --silent --cacert "$(get_es_cacert)" -u "$(get_es_creds)" "$(get_es_url)/.watcher-history-7-${date_timestamp}/_update_by_query" -H 'Content-Type: application/yaml' --data-binary @/etc/curator/watcher-history_update_by_query.json
    done
) >> /var/log/curator/script.log

The referenced /etc/curator/watcher-history_update_by_query.json:
{
  "script": {
    "source": "ctx._source.result.input.remove('payload'); if (ctx._source.result.containsKey('transform')) { ctx._source.result.transform.remove('payload') } ctx._source.metadata.watch_history_cleaned = true;",
    "lang": "painless"
  },
  "query": {
    "query_string": {
      "query": "_exists_:(result.input.payload OR result.transform.payload) -_exists_:(metadata.watch_history_cleaned)"
    }
  }
}

If you don't see an issue with this update_by_query itself, then the crash is probably due to the old ES version I tested on, and I will retest on a newer one when possible. Does it look good to you?

An _exists_ query for result.input.payload did not work for some reason. I left it in the query even though it has no effect (boolean OR).
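For reference, the same condition expressed with explicit exists clauses in a bool query; this is only a sketch I have not verified against this cluster, and if the payload fields are simply not indexed in the .watcher-history-* mapping (which may be why _exists_ had no effect), this variant will not match either:

{
  "query": {
    "bool": {
      "should": [
        { "exists": { "field": "result.input.payload" } },
        { "exists": { "field": "result.transform.payload" } }
      ],
      "minimum_should_match": 1,
      "must_not": [
        { "exists": { "field": "metadata.watch_history_cleaned" } }
      ]
    }
  }
}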

About the crash:

[2019-07-22T15:54:46,922][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [gxmneh61] fatal error in thread [elasticsearch[gxmneh61][search][T#1]], exiting
java.lang.OutOfMemoryError: Java heap space
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:209) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:590) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
	at org.apache.lucene.index.CodecReader.document(CodecReader.java:83) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
	at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:341) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
	at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:388) ~[elasticsearch-6.2.4.jar:6.2.4]
	at org.elasticsearch.search.fetch.FetchPhase.createSearchHit(FetchPhase.java:199) ~[elasticsearch-6.2.4.jar:6.2.4]
	at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:156) ~[elasticsearch-6.2.4.jar:6.2.4]
	at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:499) ~[elasticsearch-6.2.4.jar:6.2.4]
	at org.elasticsearch.action.search.SearchTransportService$11.messageReceived(SearchTransportService.java:440) ~[elasticsearch-6.2.4.jar:6.2.4]
	at org.elasticsearch.action.search.SearchTransportService$11.messageReceived(SearchTransportService.java:437) ~[elasticsearch-6.2.4.jar:6.2.4]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:258) ~[?:?]
[..., Java heap dump filled up the /var partition so no further logs could be written]

The crash is not 100 % reproducible, so a few days of history have been fully updated (payload removed), and this fixes the issue with Kibana taking forever to load the watch history. If someone can confirm that this update_by_query is reliable and that the issue is only in our environment, then I guess that is a solution. The nice aspect of this solution is of course not having to extend Elasticsearch, and using an API/curation mechanism that already exists.
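One thing that might reduce the heap pressure during the fetch phase (where the OutOfMemoryError above happened) is running the update by query in smaller, throttled batches. scroll_size and requests_per_second are standard _update_by_query parameters; the values below are guesses and I have not verified that they avoid this particular crash:

curl --silent --cacert "$(get_es_cacert)" -u "$(get_es_creds)" \
    "$(get_es_url)/.watcher-history-7-${date_timestamp}/_update_by_query?scroll_size=100&requests_per_second=50" \
    -H 'Content-Type: application/yaml' \
    --data-binary @/etc/curator/watcher-history_update_by_query.json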
