
Duplicate @timestamp fields in elasticsearch output #628

Closed
smelchior opened this issue Jun 11, 2018 · 24 comments

smelchior commented Jun 11, 2018

I am trying to replace my fluentd installation in Kubernetes with fluent-bit 0.13.3 but ran into an issue. We currently have the standard setup:

[INPUT]
        Name             tail
        Path             /var/log/containers/*.log
        Parser           docker
        Tag              kube.*
        Refresh_Interval 5
        Mem_Buf_Limit    5MB
        Skip_Long_Lines  On
[FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL           https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_JSON_Log      On
[OUTPUT]
        Name  es
        Match *
        Host  HOST
        Port  9200
        Logstash_Format On
        Retry_Limit False
        Type flb_type

The problem is that some of the log messages from services are JSON-encoded and also include an @timestamp field. This then causes errors:

[2018/06/11 15:22:49] [ warn] [out_es] Elasticsearch error
{"took":78,"errors":true,"items":[{"index":{"_index":"logstash_test-2018.06.11","_type":"flb_type","_id":"ZPhx72MBChql05IASc5e","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":142,"_primary_term":1,"status":201}},{"index":{"_index":"logstash_test-2018.06.11","_type":"flb_type","_id":"Zfhx72MBChql05IASc5e","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field '@timestamp'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@6dda49d4; line: 1, column: 509]"}}}},{"index":{"_index":"logstash_test-2018.06.11","_type":"flb_type","_id":"Zvhx72MBChql05IASc5e","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"i_o_exception","reason":"Duplicate field '@timestamp'\n at [Source: org.elasticsearch.common.bytes.BytesReferenc

I tried to use Merge_JSON_Key to mitigate this, but the option seems to be disabled in the source code (without any mention of this in the docs; it took me some time to figure out why it does not work ;-)). In my opinion, Merge_JSON_Log should overwrite existing keys instead of producing duplicate keys.


convoi commented Jun 14, 2018

I have the same issue. Kibana, for example, produces logs containing @timestamp fields. For my own applications I was able to fix it by renaming the timestamp field.


calinah commented Jun 18, 2018

I'm also having the same issue. Has anyone found a workaround yet?

edsiper (Member) commented Jul 3, 2018

To avoid that duplicate field you can set an alternative name for the time field (Time_Key):

https://fluentbit.io/documentation/0.13/output/elasticsearch.html

Let me know if that fixes the issue.
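
For example, a minimal sketch based on the [OUTPUT] section posted above, using an alternative key name (es_time here is only an illustration):

    [OUTPUT]
        Name            es
        Match           *
        Host            HOST
        Port            9200
        Logstash_Format On
        # Write Fluent Bit's own timestamp under a different key so it
        # cannot collide with an @timestamp field already in the record
        Time_Key        es_time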

@smelchior (Author)

that should do it, thanks :)


nikolay commented Jul 31, 2018

Having to rename it is patchwork, so please think about providing better defaults instead.

@jgsqware

@edsiper I would reopen this.

I have this config:

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Merge_Log           On
        K8S-Logging.Parser  On

    [OUTPUT]
        Name            es
        Match           *
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format On
        Logstash_Prefix ${FLUENT_ELASTICSEARCH_PREFIX}
        Time_Key        es_time
        Retry_Limit     False
        Generate_ID     On

and keep getting errors like this one:

{"took":0,"errors":true,"items":[{"index":{"_index":"gslogs-2018.08.28","_type":"flb_type","_id":"0f921680-ccbc-0ba9-f08b-56a14e42386a","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field 'time'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@64099158; line: 1, column: 651]"}}}},{"index":{"_index":"gslogs-2018.08.28","_type":"flb_type","_id":"a75811af-ebc7-17a3-02a7-cc167a31a90a","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field 'time'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@632e4f1b; line: 1, column: 668]"}}}}]}

donbowman (Contributor) commented Sep 13, 2018

I have this issue with a build from git (latest). I'm not sure what a workaround would be.

I am using it via the Helm chart in Kubernetes, with mergeJSONLog: true enabled and an apache2 annotation on one of the pods. Since it is doing a tail -f on the Docker logs, they are first parsed as docker and then parsed as apache2. This causes a duplicate.

[2018/09/13 01:20:30] [ warn] [out_es] Elasticsearch error
{"took":6,"errors":true,"items":[{"index":{"_index":"logstash-2018.09.12","_type":"flb_type","_id":"T7KD0GUB5MRKjjOx59E0","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field '@timestamp'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@7aea1496; line: 1, column: 708]"}}}},{"index":{"_index":"logstash-2018.09.12","_type":"flb_type","_id":"ULKD0GUB5MRKjjOx59E0","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field '@timestamp'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@2873f968; line: 1, column: 708]"}}}},{"index":{"_index":"logstash-2018.09.12","_type":"flb_type","_id":"UbKD0GUB5MRKjjOx59E0","status":400,"error":{"type":"mapper_parsing_exception","reason"


goodfoo commented Nov 16, 2018

I used

    [FILTER]
        Name record_modifier
        Match *
        Remove_key @timestamp


lxfontes commented Dec 5, 2018

👋 late to the party, but here we are 😂

For those using the Helm chart, the below works with chart version 1.0.0:

  backend:
    type: es
    es:
      time_key: '@ts'

As you can see in the screenshot, @timestamp is still present, but that is a different issue.


@alwinmarkcf

@lxfontes this only solves the @timestamp problem for me; I still get duplicate "time" field errors. Why do we actually need this?

@alwinmarkcf

OK, so for me the fix was setting this in the Kubernetes chart:

  filter:
    mergeJSONLog: false

Now it will no longer try to merge keys. I guess this is a bug in fluent-bit, as the expected behavior should be that while doing the merge it MUST NOT append a field that already exists.

@raftAtGit

Isn't it more convenient to change the behavior of the Elasticsearch plugin so that it won't append the @timestamp key (or Time_Key in general) if it already exists?

ruzickap added a commit to ruzickap/k8s-istio-demo that referenced this issue Feb 11, 2019

wirehead commented Jun 5, 2019

This needs to be reopened.

Consider that the official Logstash formatter for log4j, https://github.com/logstash/log4j-jsonevent-layout, outputs @timestamp to standard out. So a developer encountering this bug is going to wonder why their Logstash-format JSON is considered invalid by something that describes itself as Logstash-compatible.

@pankajagrawal16

I am facing the same issue when generating logs in JSON format with the standard plugin https://github.com/logstash/log4j-jsonevent-layout. I agree with @wirehead's comment.

edsiper (Member) commented Aug 5, 2019

JSON with keys of the same name is not invalid, but it makes sense that it's a restriction for Elasticsearch. Your workarounds:

  • If you are in Kubernetes, use the option Merge_Log_Key in your Kubernetes filter, so your unpacked data will be under that new key, avoiding duplicates if any:

https://docs.fluentbit.io/manual/filter/kubernetes

  • If you are ingesting straight to Elasticsearch, just change the name of the key that holds the timestamp with the Time_Key option:

https://docs.fluentbit.io/manual/output/elasticsearch

Now, if I implement a kind of "sanitizer" option, take into account that it will affect performance. The options above should work; if they don't, please let me know.
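
For the first option, a minimal sketch of the kubernetes filter (the key name app is only an illustration):

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Merge_Log           On
        # Nest the unpacked JSON from the log field under its own key, so any
        # @timestamp/time fields inside it cannot collide with the top-level ones
        Merge_Log_Key       app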

@pankajagrawal16

The 2nd option sounded cleaner to me, and it worked well too.


linbingdouzhe commented Oct 23, 2019

I think this needs to be reopened.

@Vfialkin

Had to try all options to make it work with Kibana discovery + logs:

  • changing Time_Key breaks features in Kibana that expect @timestamp and doesn't solve duplicates in other fields (like time)
  • turning off merging with mergeJSONLog: false leaves all my application (Serilog) logs as unparsed, unusable JSON

The only thing that worked for me was adding a prefix to the merged fields:
mergeLogKey: "appLog"

@shree007

I fixed it with the following config:

    [PARSER]
        Name        docker
        Format      json
        # Time_Key  time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   off # On
        # Command         | Decoder  | Field | Optional Action
        # ================|==========|=======|=================
        Decode_Field_As     escaped    log

@minhnnhat


Hi @Vfialkin, I'm currently working with Serilog too. Following your instructions it works fine, but I still cannot get the whole log into the log field (see the attached screenshot). Sorry for my bad English, but do you know how to solve it? Thanks


Vfialkin commented May 8, 2020


Hi Minhnhat,
By default, the whole log message will also be added to the index as a 'Log' field, so it's not a problem. You can disable it via:

filter: |-
    Keep_Log  off

Take a look at my yml, maybe that will help.

I also tried to describe the full setup process in my blog.

@minhnnhat


Hi @Vfialkin, it's really helpful. Thank you!


zerkms commented May 14, 2020

@Vfialkin thanks for the article, it was super helpful.

One thing though:

extraEntries:
   input: |- 
     Exclude_Path   /var/log/containers/kibana*.log,/var/log/containers/kube*.log,/var/log/containers/etcd-*.log,/var/log/containers/dashboard-metrics*.log

This would not work as expected, at least in the current version of the chart: https://github.com/helm/charts/blob/b71c8c665e7de2ef22e915cd2f173d680cd7636c/stable/fluent-bit/templates/config.yaml

Those extra entries are appended to the end of the generated config, and if systemd is enabled, they end up appended to the systemd input's section instead of the tail input's.
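
For reference, a sketch of where the directive is meant to land, reusing the tail input from the configuration at the top of this issue; Exclude_Path only has the intended effect when it sits inside that [INPUT] section:

    [INPUT]
        Name             tail
        Path             /var/log/containers/*.log
        # Exclude_Path must end up inside this tail input section,
        # not appended after a later section such as the systemd input
        Exclude_Path     /var/log/containers/kibana*.log,/var/log/containers/kube*.log,/var/log/containers/etcd-*.log,/var/log/containers/dashboard-metrics*.log
        Parser           docker
        Tag              kube.*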

I sent a PR to the chart, see below

@Vfialkin

@zerkms thanks for the feedback! Nice catch with the systemd section; so it works by coincidence because systemd defaults to false 😁 It would definitely be better to have it as a param; I hope your PR gets merged soon. 🤞
