Collector: Some Spans are not persisted in ElasticSearch #1674

Closed
MahatmaFatalError opened this issue Jul 21, 2019 · 6 comments

@MahatmaFatalError

I am running Jaeger with an Elasticsearch backend via docker-compose:

version: "3"
services:
  els:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.1
    restart: always
    hostname: els
    environment:
      - bootstrap.memory_lock=true
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
    ports:
      - "9200:9200"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata1:/usr/share/elasticsearch/data
      - eslog:/usr/share/elasticsearch/logs
  
  kibana:
    image: docker.elastic.co/kibana/kibana:6.8.1
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_URL: http://els:9200
    depends_on:
      - els
  
  jaeger:
    image: "jaegertracing/all-in-one:latest" # must be 1.13 then
    restart: always
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://els:9200
      - LOG_LEVEL=debug   
      - ES_TAGS_AS_FIELDS=true    
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
      - "16686:16686"
      - "14268:14268"
      - "9411:9411"
    depends_on:
      - els
    links:
      - els
volumes:
  esdata1:
    driver: local
  eslog:
    driver: local

I am sending spans from two different sources (Java apps instrumented with OpenCensus). The trace starts in the first service and is eventually completed, a few seconds later, by the spans from the second service.

The symptom is that only the spans from the first service get stored in Elasticsearch and therefore appear in the Jaeger UI. The spans from the second service are missing: they show up neither in the Jaeger UI nor in Kibana.

Logs for trace d6966bce32be0699e5891fd498be48d6:

jaeger-collector_1   | {"level":"debug","ts":1563710537.0283642,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"6cdaab280365113c"}
jaeger-collector_1   | {"level":"debug","ts":1563710537.0288234,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"bd177501a97cd6d7"}
jaeger-collector_1   | {"level":"debug","ts":1563710542.0346122,"caller":"app/thrift_span_handler.go:82","msg":"Span batch processed by the collector.","ok":true}
jaeger-collector_1   | {"level":"debug","ts":1563710542.035041,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"f8a2c6b248c8f44"}
jaeger-collector_1   | {"level":"debug","ts":1563710542.0353873,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"3517592fc5c0af47"}
jaeger-collector_1   | {"level":"debug","ts":1563710542.036132,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"2854dbfb421329d6"}
jaeger-collector_1   | {"level":"debug","ts":1563710542.0363362,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"26e68cdaaa61e5b6"}
jaeger-collector_1   | {"level":"debug","ts":1563710556.9275262,"caller":"app/thrift_span_handler.go:82","msg":"Span batch processed by the collector.","ok":true}
els                  | [2019-07-21T12:02:36,934][WARN ][o.e.d.r.a.a.i.RestPutMappingAction] [PMb4ZlA] [types removal] The parameter include_type_name should be explicitly specified in create index requests to prepare for 7.0. In 7.0 include_type_name will default to 'false', and requests are expected to omit the type name in mapping definitions.
els                  | [2019-07-21T12:02:36,958][WARN ][o.e.d.i.m.MapperService  ] [PMb4ZlA] Setting index.mapper.dynamic is deprecated since indices may not have more than one type anymore.
els                  | [2019-07-21T12:02:36,960][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
els                  | [2019-07-21T12:02:36,961][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
els                  | [2019-07-21T12:02:36,963][WARN ][o.e.d.i.m.MapperService  ] [PMb4ZlA] [_default_] mapping is deprecated since it is not useful anymore now that indexes cannot have more than one type
els                  | [2019-07-21T12:02:36,965][INFO ][o.e.c.m.MetaDataCreateIndexService] [PMb4ZlA] [jaeger-service-2069-02-07] creating index, cause [api], templates [], shards [1]/[1], mappings [_default_, service]
els                  | [2019-07-21T12:02:36,975][WARN ][o.e.d.i.m.MapperService  ] [PMb4ZlA] Setting index.mapper.dynamic is deprecated since indices may not have more than one type anymore.
els                  | [2019-07-21T12:02:36,976][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
els                  | [2019-07-21T12:02:36,978][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
els                  | [2019-07-21T12:02:37,102][WARN ][o.e.d.i.m.MapperService  ] [PMb4ZlA] Setting index.mapper.dynamic is deprecated since indices may not have more than one type anymore.
els                  | [2019-07-21T12:02:37,105][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
els                  | [2019-07-21T12:02:37,108][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
els                  | [2019-07-21T12:02:37,109][WARN ][o.e.d.i.m.MapperService  ] [PMb4ZlA] [_default_] mapping is deprecated since it is not useful anymore now that indexes cannot have more than one type
els                  | [2019-07-21T12:02:37,111][INFO ][o.e.c.m.MetaDataCreateIndexService] [PMb4ZlA] [jaeger-span-2069-02-07] creating index, cause [api], templates [], shards [1]/[1], mappings [_default_, span]
els                  | [2019-07-21T12:02:37,119][WARN ][o.e.d.i.m.MapperService  ] [PMb4ZlA] Setting index.mapper.dynamic is deprecated since indices may not have more than one type anymore.
els                  | [2019-07-21T12:02:37,122][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
els                  | [2019-07-21T12:02:37,124][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
jaeger-collector_1   | {"level":"debug","ts":1563710557.2356038,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"53b378e69037eee"}
jaeger-collector_1   | {"level":"debug","ts":1563710557.2360857,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"8cf22431614d6a56"}
jaeger-collector_1   | {"level":"debug","ts":1563710557.236264,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"53f9494022e5aec4"}
jaeger-collector_1   | {"level":"debug","ts":1563710557.2363365,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"4c1d04ba62631207"}
jaeger-collector_1   | {"level":"debug","ts":1563710557.236264,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"8298a1a9b49a543b"}
jaeger-collector_1   | {"level":"debug","ts":1563710557.236444,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"da6d54d0c86ce7c3"}
jaeger-collector_1   | {"level":"debug","ts":1563710557.2364695,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"6cef70cd7693266d"}
jaeger-collector_1   | {"level":"debug","ts":1563710557.2365024,"caller":"app/span_processor.go:117","msg":"Span written to the storage by the collector","trace-id":"d6966bce32be0699e5891fd498be48d6","span-id":"ea7822f3c1a9702b"}
els                  | [2019-07-21T12:02:37,433][WARN ][o.e.d.i.m.MapperService  ] [PMb4ZlA] Setting index.mapper.dynamic is deprecated since indices may not have more than one type anymore.
els                  | [2019-07-21T12:02:37,435][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
els                  | [2019-07-21T12:02:37,437][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
els                  | [2019-07-21T12:02:37,439][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
els                  | [2019-07-21T12:02:37,441][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.
els                  | [2019-07-21T12:02:37,444][INFO ][o.e.c.m.MetaDataMappingService] [PMb4ZlA] [jaeger-span-2069-02-07/kZZ2jFjoSz2Id6_Rt30ImA] update_mapping [span]
els                  | [2019-07-21T12:02:37,448][WARN ][o.e.d.i.m.AllFieldMapper ] [PMb4ZlA] [_all] is deprecated in 6.0+ and will be removed in 7.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.

The first batch of spans (6cdaab280365113c to 26e68cdaaa61e5b6) is stored properly in ES; the second batch (53b378e69037eee to ea7822f3c1a9702b), however, is not.

What's interesting is that every span produces the message "Span written to the storage by the collector", yet after the first batch there is an additional message,
"app/thrift_span_handler.go:82","msg":"Span batch processed by the collector.","ok":true}
which is missing after the second batch.

Interestingly, if I disable ES and use Jaeger solely with the in-memory storage, all spans from both services get stored properly!
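
(For reference, switching the all-in-one container to the in-memory backend in the compose file above amounts to roughly the following; this is just a sketch, with the ES-specific variables dropped:)

  jaeger:
    image: "jaegertracing/all-in-one:latest"
    environment:
      - SPAN_STORAGE_TYPE=memory   # instead of elasticsearch + ES_SERVER_URLS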

@pavolloffay
Member

Hi @MahatmaFatalError, what version of Jaeger are you using?

There should be an error logged if spans are not stored.

Are you generating too much traffic? Maybe the spans are dropped by the ES bulk processor; you can try increasing these parameters:

      --es.bulk.actions int                                       The number of requests that can be enqueued before the bulk processor decides to commit (default 1000)
      --es.bulk.flush-interval duration                           A time.Duration after which bulk requests are committed, regardless of other thresholds. Set to zero to disable. By default, this is disabled. (default 200ms)
      --es.bulk.size int                                          The number of bytes that the bulk requests can take up before the bulk processor decides to commit (default 5000000)
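
In a compose file like yours these flags can be passed via the container's command section, e.g. (just a sketch; the values are only illustrative, not tuned recommendations):

  jaeger:
    image: "jaegertracing/all-in-one:latest"
    command:
      - "--es.bulk.actions=100"            # commit after 100 enqueued requests
      - "--es.bulk.flush-interval=200ms"   # or after 200ms, whichever comes first
      - "--es.bulk.size=5000000"           # or once the batch reaches ~5 MB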

@pavolloffay
Member

Metrics might also help you understand whether the spans were dropped: https://www.jaegertracing.io/docs/1.13/monitoring/.

e.g.:

# HELP jaeger_collector_spans_dropped_total spans.dropped
# TYPE jaeger_collector_spans_dropped_total counter
jaeger_collector_spans_dropped_total{host="localhost"} 0

Or jaeger_bulk* metrics.
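
If the collector admin port is published (14269 by default, if I recall correctly), you can check the counters with a plain curl, e.g.:

curl -s http://localhost:14269/metrics | grep -E 'spans_dropped|bulk_index'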

@MahatmaFatalError
Author

Thanks @pavolloffay for your reply. I am now using all-in-one:1.13.1

There should be an error logged if spans are not stored.

That's exactly the point which makes me crazy ;)

Since this is a local test environment, there is not much traffic. We are talking about 5-15 spans per trace, and I trigger the traces manually.

Regarding --es.bulk.actions, wouldn't it be better to decrease the value to ensure that my ~15 spans get flushed?
I tested command: ["--es.bulk.actions=3"] anyway, but the symptom did not change.

Metrics are looking OK:
jaeger_collector_spans_dropped_total{host="754b7e907dae"} 0
jaeger_bulk_index_attempts_total 5
jaeger_bulk_index_errors_total 0
jaeger_bulk_index_inserts_total 5

@pavolloffay
Member

Could you try ES_TAGS_AS_FIELDS=false to see if it has any impact? It would also be good to have a look at the indices and try to find the data manually.
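
For example, against the ES container from your compose file (a sketch; "your-service-name" is a placeholder, and process.serviceName is the field under which the Jaeger span documents store the service name):

curl -s 'http://localhost:9200/_cat/indices/jaeger-*?v'
curl -s 'http://localhost:9200/jaeger-span-*/_search?q=process.serviceName:your-service-name&size=5&pretty'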

@MahatmaFatalError
Author

Could you try ES_TAGS_AS_FIELDS=false to see if it has any impact?

Set it to false. No noticeable impact.

It would also be good to have a look at the indices and try to find the data manually.

I queried the jaeger-span-* index in Kibana for the specific serviceName: no result. Interestingly, though, the service name is available in the pull-down menu.

Then I queried the jaeger-service-* index in Kibana for the specific serviceName: results found!

I compared the payload of an entry of the jaeger-service-* index for both services:

Proper service A:

{
  "_index": "jaeger-service-2019-07-19",
  "_type": "service",
  "_id": "12daa6bcccaccac",
  "_version": 1,
  "_score": 1,
  "_source": {
    "serviceName": "hello-service",
    "operationName": "acquiring db connection from pool"
  }
}

Broken service B:

{
  "_index": "jaeger-service-2069-02-06",
  "_type": "service",
  "_id": "580051df358db725",
  "_version": 2,
  "_score": 1,
  "_source": {
    "serviceName": "PostgreSQL-log-converter",
    "operationName": "Seq Scan"
  }
}

Please note the index name of the latter one: jaeger-service-2069-02-06.
It seems the timestamp is somehow mixed up.

Then I searched the jaeger-span-* index for _index = jaeger-span-2069-02-06, which is offered in the filter pull-down menu, but the query returned no result.
This might be because Kibana adds a time filter of < now(). If I extend the time range into the year 2070,
all my missing spans are found!
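
The same data can also be reached without Kibana's time filter by querying the future-dated indices directly, e.g.:

curl -s 'http://localhost:9200/jaeger-span-2069*/_search?size=3&pretty'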

So to sum up my assumptions:

  • Somehow the timestamps of spans from the second service are broken, which leads to indices in the future (see the quick arithmetic check below).
  • The Jaeger Query UI only returns spans from the past, not from the future.
  • The root cause is probably the instrumentation of the second service, which needs further investigation.
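
One purely arithmetic observation supporting the first point (a hypothesis only, not a confirmed root cause): the broken index date is roughly what you get when the 2019 Unix timestamp is counted twice (GNU date):

date -u -d @1563710557          # 2019-07-21 12:02:37 UTC, the real time of the second batch
date -u -d @$((2 * 1563710557)) # 2069-02-06, matching the broken index names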

@pavolloffay
Member

Probably the time setting in your second service is wrong. For indices far in the future/past we have issue #841.

I am closing this as it is not an issue with Jaeger.
