Add initial support for indexing to data streams #4409

Merged

axw merged 15 commits into elastic:master from axw:apm-indexing-strategy

Nov 19, 2020

Member

axw commented Nov 11, 2020

Motivation/summary

This PR introduces initial support for indexing events into data streams, with the new apm-server.data_streams.enabled config. When this is set, the server will:

- add data_stream.{type,dataset,namespace} fields to each published event
- use these fields to route the events to the appropriate data stream ($type-$dataset-$namespace)
- disable ILM setup by default
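
The routing rule in the list above can be illustrated with a minimal sketch. This is not the server's Go implementation; `data_stream_name` and the sample event (including its dataset value) are hypothetical and only show how the three `data_stream.*` fields combine into `$type-$dataset-$namespace`.

```python
def data_stream_name(event: dict) -> str:
    """Build "$type-$dataset-$namespace" from an event's data_stream.* fields."""
    ds = event["data_stream"]
    return "-".join((ds["type"], ds["dataset"], ds["namespace"]))


# Hypothetical event; the dataset value is illustrative only.
event = {
    "processor": {"event": "transaction"},
    "data_stream": {
        "type": "traces",
        "dataset": "apm.myservice",
        "namespace": "default",
    },
}
print(data_stream_name(event))
```

Because the name is derived purely from fields on the event, routing needs no per-event index configuration.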

Still TODO, in followups:

- system tests
- set apm-server.data_streams.enabled when in Fleet mode
- make data_stream.namespace configurable (using config received from Elastic Agent)
- improve config validation, to ensure things like output.elasticsearch.indices cannot be set in conjunction with apm-server.data_streams.enabled

Checklist

I have signed the Contributor License Agreement.
I have updated CHANGELOG.asciidoc

I have considered changes for:
~~- [ ] documentation~~ (later maybe; more likely this will be undocumented and set indirectly by using Fleet)
~~- [ ] logging (add log lines, choose appropriate log selector, etc.)~~
~~- [ ] metrics and monitoring (create issue for Kibana team to add metrics to visualizations, e.g. Kibana#44001)~~

automated tests (add tests for the code changes, all unit tests pass locally)
telemetry
~~- [ ] Elasticsearch Service (https://cloud.elastic.co)~~
~~- [ ] Elastic Cloud Enterprise (https://www.elastic.co/products/ece)~~
~~- [ ] Elastic Cloud on Kubernetes (https://www.elastic.co/elastic-cloud-kubernetes)~~

How to test these changes

1. Run apm-server with -E apm-server.data_streams.enabled=true
2. Send some transactions, spans, errors, and runtime metrics
3. Confirm that transactions and spans go into traces-*, errors into logs-*, and runtime metrics into metrics-*.
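
The confirmation step could be scripted roughly as below. This is a sketch under assumptions: the event-kind labels used as dictionary keys, the `misrouted` helper, and the sample observations are all hypothetical, and the input is expected to be data stream names rather than backing index names.

```python
from fnmatch import fnmatch

# Expected data stream pattern per event kind, as listed in the steps above.
# The keys are assumed event-kind labels, not confirmed field values.
EXPECTED_PATTERNS = {
    "transaction": "traces-*",
    "span": "traces-*",
    "error": "logs-*",
    "metric": "metrics-*",
}


def misrouted(observations):
    """Return the (kind, data_stream) pairs that landed outside the expected pattern."""
    return [
        (kind, stream)
        for kind, stream in observations
        if not fnmatch(stream, EXPECTED_PATTERNS[kind])
    ]


# Hypothetical observations: (event kind, data stream name) pairs.
observations = [
    ("transaction", "traces-apm.myservice-default"),
    ("span", "traces-apm.myservice-default"),
    ("error", "logs-apm.myservice-default"),
    ("metric", "metrics-apm.myservice-default"),
]
print(misrouted(observations))
```

An empty result means every observed event kind was found under the expected pattern.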

Related issues

Partially addresses #4378

Contributor

apmmachine commented Nov 11, 2020

💔 Tests Failed

Build stats

Build Cause: [axw commented: jenkins run the tests please]
Start Time: 2020-11-19T02:59:58.618+0000
Duration: 58 min 21 sec

Test stats 🧪

Test	Results
Failed	7
Passed	4747
Skipped	145
Total	4899

Test errors

`Build and Test / APM Integration Tests / test_dotnet_error – tests.agent.test_dotnet`

 AssertionError: Expected 1, queried 3555

 dotnet = <tests.fixtures.agents.Agent object at 0x7f2a80eec850>

    @pytest.mark.version
    @pytest.mark.dotnet
    def test_dotnet_error(dotnet):
        utils.check_agent_error(
>           dotnet.oof, dotnet.apm_server.elasticsearch)

tests/agent/test_dotnet.py:18: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/utils.py:23: in check_agent_error
    check_elasticsearch_count(elasticsearch, ct, processor='error')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

elasticsearch = <tests.fixtures.es.es.<locals>.Elasticsearch object at 0x7f2a80eec9d0>
expected = 1, processor = 'error'
query = {'query': {'term': {'processor.name': 'error'}}}

    def check_elasticsearch_count(elasticsearch,
                                  expected,
                                  processor='transaction',
                                  query=None):
        if query is None:
            query = {'query': {'term': {'processor.name': processor}}}
    
        actual = 0
        retries = 0
        max_retries = 3
        while actual != expected and retries < max_retries:
            try:
                actual = elasticsearch.count(query)
                retries += 1
                time.sleep(10)
            except TimeoutError:
                retries += 1
                actual = -1
    
        assert actual == expected, "Expected {}, queried {}".format(
>           expected, actual)
E       AssertionError: Expected 1, queried 3555

tests/utils.py:62: AssertionError

`Build and Test / APM Integration Tests / test_go_nethttp_error – tests.agent.test_go`

 AssertionError: Expected 1, queried 3549

 go_nethttp = <tests.fixtures.agents.Agent object at 0x7f2a783c7d10>

    @pytest.mark.version
    @pytest.mark.go_nethttp
    def test_go_nethttp_error(go_nethttp):
        utils.check_agent_error(
>           go_nethttp.oof, go_nethttp.apm_server.elasticsearch)

tests/agent/test_go.py:18: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/utils.py:23: in check_agent_error
    check_elasticsearch_count(elasticsearch, ct, processor='error')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

elasticsearch = <tests.fixtures.es.es.<locals>.Elasticsearch object at 0x7f2a80eec9d0>
expected = 1, processor = 'error'
query = {'query': {'term': {'processor.name': 'error'}}}

    def check_elasticsearch_count(elasticsearch,
                                  expected,
                                  processor='transaction',
                                  query=None):
        if query is None:
            query = {'query': {'term': {'processor.name': processor}}}
    
        actual = 0
        retries = 0
        max_retries = 3
        while actual != expected and retries < max_retries:
            try:
                actual = elasticsearch.count(query)
                retries += 1
                time.sleep(10)
            except TimeoutError:
                retries += 1
                actual = -1
    
        assert actual == expected, "Expected {}, queried {}".format(
>           expected, actual)
E       AssertionError: Expected 1, queried 3549

tests/utils.py:62: AssertionError

`Build and Test / APM Integration Tests / test_java_spring_error – tests.agent.test_java`

 AssertionError: Expected 1, queried 3525

 java_spring = <tests.fixtures.agents.Agent object at 0x7f2a80e69490>

    @pytest.mark.java_spring
    def test_java_spring_error(java_spring):
        utils.check_agent_error(
>           java_spring.oof, java_spring.apm_server.elasticsearch)

tests/agent/test_java.py:16: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/utils.py:23: in check_agent_error
    check_elasticsearch_count(elasticsearch, ct, processor='error')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

elasticsearch = <tests.fixtures.es.es.<locals>.Elasticsearch object at 0x7f2a80eec9d0>
expected = 1, processor = 'error'
query = {'query': {'term': {'processor.name': 'error'}}}

    def check_elasticsearch_count(elasticsearch,
                                  expected,
                                  processor='transaction',
                                  query=None):
        if query is None:
            query = {'query': {'term': {'processor.name': processor}}}
    
        actual = 0
        retries = 0
        max_retries = 3
        while actual != expected and retries < max_retries:
            try:
                actual = elasticsearch.count(query)
                retries += 1
                time.sleep(10)
            except TimeoutError:
                retries += 1
                actual = -1
    
        assert actual == expected, "Expected {}, queried {}".format(
>           expected, actual)
E       AssertionError: Expected 1, queried 3525

tests/utils.py:62: AssertionError

`Build and Test / APM Integration Tests / test_express_error – tests.agent.test_nodejs`

 AssertionError: Expected 1, queried 3593

 express = <tests.fixtures.agents.Agent object at 0x7f2a5828d250>

    @pytest.mark.version
    def test_express_error(express):
        utils.check_agent_error(
>           express.oof, express.apm_server.elasticsearch)

tests/agent/test_nodejs.py:15: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/utils.py:23: in check_agent_error
    check_elasticsearch_count(elasticsearch, ct, processor='error')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

elasticsearch = <tests.fixtures.es.es.<locals>.Elasticsearch object at 0x7f2a80eec9d0>
expected = 1, processor = 'error'
query = {'query': {'term': {'processor.name': 'error'}}}

    def check_elasticsearch_count(elasticsearch,
                                  expected,
                                  processor='transaction',
                                  query=None):
        if query is None:
            query = {'query': {'term': {'processor.name': processor}}}
    
        actual = 0
        retries = 0
        max_retries = 3
        while actual != expected and retries < max_retries:
            try:
                actual = elasticsearch.count(query)
                retries += 1
                time.sleep(10)
            except TimeoutError:
                retries += 1
                actual = -1
    
        assert actual == expected, "Expected {}, queried {}".format(
>           expected, actual)
E       AssertionError: Expected 1, queried 3593

tests/utils.py:62: AssertionError

`Build and Test / APM Integration Tests / test_flask_error – tests.agent.test_python`

 AssertionError: Expected 2, queried 3630

 flask = <tests.fixtures.agents.Agent object at 0x7f2a58294190>

    @pytest.mark.version
    @pytest.mark.flask
    def test_flask_error(flask):
        # one from exception handler, one from logging handler
>       utils.check_agent_error(flask.oof, flask.apm_server.elasticsearch, ct=2)

tests/agent/test_python.py:17: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/utils.py:23: in check_agent_error
    check_elasticsearch_count(elasticsearch, ct, processor='error')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

elasticsearch = <tests.fixtures.es.es.<locals>.Elasticsearch object at 0x7f2a80eec9d0>
expected = 2, processor = 'error'
query = {'query': {'term': {'processor.name': 'error'}}}

    def check_elasticsearch_count(elasticsearch,
                                  expected,
                                  processor='transaction',
                                  query=None):
        if query is None:
            query = {'query': {'term': {'processor.name': processor}}}
    
        actual = 0
        retries = 0
        max_retries = 3
        while actual != expected and retries < max_retries:
            try:
                actual = elasticsearch.count(query)
                retries += 1
                time.sleep(10)
            except TimeoutError:
                retries += 1
                actual = -1
    
        assert actual == expected, "Expected {}, queried {}".format(
>           expected, actual)
E       AssertionError: Expected 2, queried 3630

tests/utils.py:62: AssertionError

`Build and Test / APM Integration Tests / test_django_error – tests.agent.test_python`

 AssertionError: Expected 1, queried 3621

 django = <tests.fixtures.agents.Agent object at 0x7f2a5828d610>

    @pytest.mark.version
    @pytest.mark.django
    def test_django_error(django):
>       utils.check_agent_error(django.oof, django.apm_server.elasticsearch)

tests/agent/test_python.py:66: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/utils.py:23: in check_agent_error
    check_elasticsearch_count(elasticsearch, ct, processor='error')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

elasticsearch = <tests.fixtures.es.es.<locals>.Elasticsearch object at 0x7f2a80eec9d0>
expected = 1, processor = 'error'
query = {'query': {'term': {'processor.name': 'error'}}}

    def check_elasticsearch_count(elasticsearch,
                                  expected,
                                  processor='transaction',
                                  query=None):
        if query is None:
            query = {'query': {'term': {'processor.name': processor}}}
    
        actual = 0
        retries = 0
        max_retries = 3
        while actual != expected and retries < max_retries:
            try:
                actual = elasticsearch.count(query)
                retries += 1
                time.sleep(10)
            except TimeoutError:
                retries += 1
                actual = -1
    
        assert actual == expected, "Expected {}, queried {}".format(
>           expected, actual)
E       AssertionError: Expected 1, queried 3621

tests/utils.py:62: AssertionError

`Build and Test / APM Integration Tests / test_rails_error – tests.agent.test_ruby`

 AssertionError: Expected 1, queried 3655

 rails = <tests.fixtures.agents.Agent object at 0x7f2a783ea110>

    @pytest.mark.version
    @pytest.mark.rails
    def test_rails_error(rails):
        utils.check_agent_error(
>           rails.oof, rails.apm_server.elasticsearch)

tests/agent/test_ruby.py:18: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/utils.py:23: in check_agent_error
    check_elasticsearch_count(elasticsearch, ct, processor='error')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

elasticsearch = <tests.fixtures.es.es.<locals>.Elasticsearch object at 0x7f2a80eec9d0>
expected = 1, processor = 'error'
query = {'query': {'term': {'processor.name': 'error'}}}

    def check_elasticsearch_count(elasticsearch,
                                  expected,
                                  processor='transaction',
                                  query=None):
        if query is None:
            query = {'query': {'term': {'processor.name': processor}}}
    
        actual = 0
        retries = 0
        max_retries = 3
        while actual != expected and retries < max_retries:
            try:
                actual = elasticsearch.count(query)
                retries += 1
                time.sleep(10)
            except TimeoutError:
                retries += 1
                actual = -1
    
        assert actual == expected, "Expected {}, queried {}".format(
>           expected, actual)
E       AssertionError: Expected 1, queried 3655

tests/utils.py:62: AssertionError

Steps errors

`Compress`

Took 0 min 0 sec.
Description: tar --exclude=coverage-files.tgz -czf coverage-files.tgz coverage

`Compress`

Took 0 min 0 sec.
Description: tar --exclude=system-tests-linux-files.tgz -czf system-tests-linux-files.tgz system-tests

`Test Sync`

Took 3 min 23 sec.
Description: ./script/jenkins/sync.sh

Log output

Last 100 lines of log output:

[2020-11-19T03:28:30.546Z] --- PASS: TestTransactionAggregationShutdown (3.21s)
[2020-11-19T03:28:30.546Z] === RUN   TestServiceDestinationAggregation
[2020-11-19T03:28:30.546Z] --- PASS: TestServiceDestinationAggregation (2.43s)
[2020-11-19T03:28:30.546Z] === RUN   TestAPIKeyCreate
[2020-11-19T03:28:30.546Z] --- PASS: TestAPIKeyCreate (2.44s)
[2020-11-19T03:28:30.546Z] === RUN   TestAPIKeyCreateExpiration
[2020-11-19T03:28:30.546Z] --- PASS: TestAPIKeyCreateExpiration (2.02s)
[2020-11-19T03:28:30.546Z] === RUN   TestAPIKeyInvalidateName
[2020-11-19T03:28:30.546Z] --- PASS: TestAPIKeyInvalidateName (3.04s)
[2020-11-19T03:28:30.546Z] === RUN   TestAPIKeyInvalidateID
[2020-11-19T03:28:30.546Z] --- PASS: TestAPIKeyInvalidateID (2.01s)
[2020-11-19T03:28:30.546Z] === RUN   TestAPMServerEnvironment
[2020-11-19T03:28:30.546Z] === RUN   TestAPMServerEnvironment/container
[2020-11-19T03:28:30.546Z] === PAUSE TestAPMServerEnvironment/container
[2020-11-19T03:28:30.546Z] === RUN   TestAPMServerEnvironment/systemd
[2020-11-19T03:28:30.546Z] === PAUSE TestAPMServerEnvironment/systemd
[2020-11-19T03:28:30.546Z] === RUN   TestAPMServerEnvironment/macos_service
[2020-11-19T03:28:30.546Z] === PAUSE TestAPMServerEnvironment/macos_service
[2020-11-19T03:28:30.546Z] === RUN   TestAPMServerEnvironment/windows_service
[2020-11-19T03:28:30.546Z] === PAUSE TestAPMServerEnvironment/windows_service
[2020-11-19T03:28:30.546Z] === CONT  TestAPMServerEnvironment/container
[2020-11-19T03:28:30.546Z] === CONT  TestAPMServerEnvironment/macos_service
[2020-11-19T03:28:30.546Z] === CONT  TestAPMServerEnvironment/systemd
[2020-11-19T03:28:30.546Z] === CONT  TestAPMServerEnvironment/windows_service
[2020-11-19T03:28:30.546Z] --- PASS: TestAPMServerEnvironment (0.00s)
[2020-11-19T03:28:30.546Z]     --- PASS: TestAPMServerEnvironment/container (0.40s)
[2020-11-19T03:28:30.546Z]     --- PASS: TestAPMServerEnvironment/macos_service (0.43s)
[2020-11-19T03:28:30.546Z]     --- PASS: TestAPMServerEnvironment/windows_service (0.43s)
[2020-11-19T03:28:30.546Z]     --- PASS: TestAPMServerEnvironment/systemd (0.49s)
[2020-11-19T03:28:30.547Z] === RUN   TestAPMServerInstrumentation
[2020-11-19T03:28:30.547Z] --- PASS: TestAPMServerInstrumentation (2.87s)
[2020-11-19T03:28:30.547Z] === RUN   TestJaegerGRPC
[2020-11-19T03:28:30.547Z] --- PASS: TestJaegerGRPC (2.64s)
[2020-11-19T03:28:30.547Z] === RUN   TestJaegerGRPCSampling
[2020-11-19T03:28:30.547Z] --- PASS: TestJaegerGRPCSampling (2.40s)
[2020-11-19T03:28:30.547Z] === RUN   TestAPMServerRequestLoggingValid
[2020-11-19T03:28:30.547Z] --- PASS: TestAPMServerRequestLoggingValid (0.23s)
[2020-11-19T03:28:30.547Z] === RUN   TestAPMServerMonitoring
[2020-11-19T03:28:30.547Z] --- PASS: TestAPMServerMonitoring (1.28s)
[2020-11-19T03:28:30.547Z] === RUN   TestAPMServerMonitoringBuiltinUser
[2020-11-19T03:28:30.547Z] --- PASS: TestAPMServerMonitoringBuiltinUser (1.99s)
[2020-11-19T03:28:30.547Z] === RUN   TestAPMServerOnboarding
[2020-11-19T03:28:30.547Z] --- PASS: TestAPMServerOnboarding (2.32s)
[2020-11-19T03:28:30.547Z] === RUN   TestRUMXForwardedFor
[2020-11-19T03:28:30.547Z] --- PASS: TestRUMXForwardedFor (2.43s)
[2020-11-19T03:28:30.547Z] === RUN   TestKeepUnsampled
[2020-11-19T03:28:30.547Z] === RUN   TestKeepUnsampled/false
[2020-11-19T03:28:30.547Z] === RUN   TestKeepUnsampled/true
[2020-11-19T03:28:30.547Z] --- PASS: TestKeepUnsampled (4.86s)
[2020-11-19T03:28:30.547Z]     --- PASS: TestKeepUnsampled/false (2.44s)
[2020-11-19T03:28:30.547Z]     --- PASS: TestKeepUnsampled/true (2.42s)
[2020-11-19T03:28:30.547Z] === RUN   TestTailSampling
[2020-11-19T03:28:30.547Z]     sampling_test.go:136: waiting for 100 "parent" transactions
[2020-11-19T03:28:30.547Z]     sampling_test.go:136: waiting for 100 "child" transactions
[2020-11-19T03:28:30.547Z] --- PASS: TestTailSampling (3.86s)
[2020-11-19T03:28:30.547Z] === RUN   TestTailSamplingUnlicensed
[2020-11-19T03:28:30.547Z] 2020/11/19 03:27:54 Starting container id: 993fe2cc0a34 image: quay.io/testcontainers/ryuk:0.2.3
[2020-11-19T03:28:30.547Z] 2020/11/19 03:27:54 Waiting for container id 993fe2cc0a34 image: quay.io/testcontainers/ryuk:0.2.3
[2020-11-19T03:28:30.547Z] 2020/11/19 03:27:55 Container is ready id: 993fe2cc0a34 image: quay.io/testcontainers/ryuk:0.2.3
[2020-11-19T03:28:30.547Z] 2020/11/19 03:27:55 Starting container id: 0b71551a704a image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2020-11-19T03:28:30.547Z] 2020/11/19 03:27:55 Waiting for container id 0b71551a704a image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2020-11-19T03:28:30.547Z] 2020/11/19 03:28:14 Container is ready id: 0b71551a704a image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT
[2020-11-19T03:28:30.547Z] --- PASS: TestTailSamplingUnlicensed (35.43s)
[2020-11-19T03:28:30.547Z] PASS
[2020-11-19T03:28:30.547Z] ok  	github.com/elastic/apm-server/systemtest	81.014s
[2020-11-19T03:28:30.547Z] === RUN   TestAPMServer
[2020-11-19T03:28:30.547Z] 2020/11/19 03:27:07 Building apm-server...
[2020-11-19T03:28:30.547Z] 2020/11/19 03:27:09 Built /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4409/src/github.com/elastic/apm-server/apm-server
[2020-11-19T03:28:30.547Z] --- PASS: TestAPMServer (4.80s)
[2020-11-19T03:28:30.547Z] === RUN   TestUnstartedAPMServer
[2020-11-19T03:28:30.547Z] --- PASS: TestUnstartedAPMServer (0.00s)
[2020-11-19T03:28:30.547Z] === RUN   TestExpvar
[2020-11-19T03:28:30.547Z] --- PASS: TestExpvar (0.28s)
[2020-11-19T03:28:30.547Z] PASS
[2020-11-19T03:28:30.547Z] ok  	github.com/elastic/apm-server/systemtest/apmservertest	5.088s
[2020-11-19T03:28:30.547Z] ?   	github.com/elastic/apm-server/systemtest/estest	[no test files]
[2020-11-19T03:28:30.547Z] + cleanup
[2020-11-19T03:28:30.547Z] + rm -rf /tmp/tmp.jHyWJbYSyx
[2020-11-19T03:28:30.547Z] + .ci/scripts/docker-get-logs.sh
[2020-11-19T03:28:31.529Z] Post stage
[2020-11-19T03:28:31.539Z] Running in /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4409/src/github.com/elastic/apm-server
[2020-11-19T03:28:31.563Z] Archiving artifacts
[2020-11-19T03:28:31.868Z] Recording test results
[2020-11-19T03:28:32.679Z] [WARN] tar: pathPrefix parameter is deprecated.
[2020-11-19T03:28:33.036Z] + tar --version
[2020-11-19T03:28:33.362Z] + tar --exclude=system-tests-linux-files.tgz -czf system-tests-linux-files.tgz system-tests
[2020-11-19T03:28:33.362Z] tar: system-tests: Cannot stat: No such file or directory
[2020-11-19T03:28:33.362Z] tar: Exiting with failure status due to previous errors
[2020-11-19T03:28:33.374Z] [INFO] system-tests-linux-files.tgz was not compressed or archived : script returned exit code 2
[2020-11-19T03:57:54.941Z] [INFO] For detailed information see: https://apm-ci.elastic.co/job/apm-integration-tests-selector-mbp/job/master/11890/display/redirect
[2020-11-19T03:58:16.786Z] Copied 64 artifacts from "APM Integration Test MBP Selector » master" build number 11890
[2020-11-19T03:58:17.852Z] Post stage
[2020-11-19T03:58:17.861Z] Recording test results
[2020-11-19T03:58:19.054Z] Running on Jenkins in /var/lib/jenkins/workspace/pm-server_apm-server-mbp_PR-4409
[2020-11-19T03:58:19.136Z] [INFO] getVaultSecret: Getting secrets
[2020-11-19T03:58:19.409Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-11-19T03:58:20.148Z] + chmod 755 generate-build-data.sh
[2020-11-19T03:58:20.149Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4409/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4409/runs/18 UNSTABLE 3501269
[2020-11-19T03:58:20.699Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4409/runs/18/steps/?limit=10000 -o steps-info.json
[2020-11-19T03:58:20.950Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-server/apm-server-mbp/PR-4409/runs/18/tests/?status=FAILED -o tests-errors.json

axw added 7 commits

November 11, 2020 11:30


          idxmgmt: add support for data streams

1688df1

Introduce `apm-server.data_streams.enabled` config,
which will be used by idxmgmt to route events to
data streams based on data_stream.* fields that are
expected to be in each published event.


          _meta/fields.common.yml: add data_stream.* fields

3b5b40a


          model: add data_stream.{type,dataset} fields

b728661

When transforming model objects into beat.Events,
set the data_stream.{type,dataset} fields. We add
data_stream.namespace elsewhere, using an event
processor.


          beater: handle apm-server.data_streams.enabled

33200d9

Handle the new apm-server.data_streams.enabled
config:
 - when enabled, we add data_stream.namespace to
   all published events
 - when disabled, we remove data_stream.* fields
   from all published events

For now we just set data_stream.namespace to "default".
Later this will be based on the config received from Fleet.
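
The enabled/disabled handling described in this commit message can be sketched as follows. This is an illustrative Python model, not the beater's Go code; `apply_data_streams_config` is a hypothetical name, and only the "default" namespace value comes from the commit message.

```python
import copy

# Namespace value stated in the commit message; it is expected to come from
# Fleet-provided config later.
DEFAULT_NAMESPACE = "default"


def apply_data_streams_config(event: dict, enabled: bool) -> dict:
    """Return a copy of the event adjusted for the data_streams.enabled setting."""
    out = copy.deepcopy(event)
    if enabled:
        # Enabled: make sure data_stream.namespace is present.
        out.setdefault("data_stream", {})["namespace"] = DEFAULT_NAMESPACE
    else:
        # Disabled: strip all data_stream.* fields from the event.
        out.pop("data_stream", None)
    return out


# Hypothetical event as produced by the model layer (type and dataset only).
event = {
    "processor": {"event": "span"},
    "data_stream": {"type": "traces", "dataset": "apm.myservice"},
}
print(apply_data_streams_config(event, enabled=True))
print(apply_data_streams_config(event, enabled=False))
```

The sketch copies the event rather than mutating it, which keeps the two branches easy to compare side by side.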


          processor/stream/package_tests: update tests

64c6bf9

There is a hack in here to inject data_stream.namespace
in all published events, since the tests do not use the
standard libbeat pipeline code.


          Update approvals

dd506f8


          Add changelog entry

6b5c23f

axw force-pushed the apm-indexing-strategy branch from b4135f3 to 6b5c23f

November 11, 2020 03:31

codecov-io commented Nov 11, 2020

Codecov Report

Merging #4409 (1e7379e) into master (f0dc152) will increase coverage by 0.05%.
The diff coverage is 86.02%.

@@            Coverage Diff             @@
##           master    #4409      +/-   ##
==========================================
+ Coverage   76.11%   76.17%   +0.05%     
==========================================
  Files         156      158       +2     
  Lines        9713     9782      +69     
==========================================
+ Hits         7393     7451      +58     
- Misses       2320     2331      +11

Impacted Files	Coverage Δ
tests/fields.go	`35.77% <0.00%> (-0.34%)`	⬇️
idxmgmt/supporter.go	`77.98% <58.33%> (-6.40%)`	⬇️
beater/beater.go	`63.28% <88.23%> (+1.82%)`	⬆️
beater/config/config.go	`65.00% <100.00%> (+0.59%)`	⬆️
beater/config/data_streams.go	`100.00% <100.00%> (ø)`
beater/http.go	`68.33% <100.00%> (+0.53%)`	⬆️
beater/telemetry.go	`84.00% <100.00%> (+0.66%)`	⬆️
datastreams/servicename.go	`100.00% <100.00%> (ø)`
idxmgmt/supporter_factory.go	`69.69% <100.00%> (+1.95%)`	⬆️
model/error.go	`98.59% <100.00%> (+0.03%)`	⬆️
... and 11 more

axw requested a review from a team

November 11, 2020 05:37

axw marked this pull request as ready for review

November 11, 2020 05:37

simitt reviewed

Contributor

simitt left a comment

Generally looks good to me, left a couple of comments.

I realized that apm-* templates are still set up when data streams are enabled, while they wouldn't be applied any more. Doesn't have to be changed in this PR but we could disable template setup in this case.

Out of curiosity - Is there a Kibana/APM issue already to adapt the UI to the data stream changes? (right now it is ofc broken when data streams are enabled)

datastreams/constants.go (outdated, resolved)

datastreams/servicename.go (resolved)

idxmgmt/supporter.go (resolved)

idxmgmt/supporter.go (resolved)

changelogs/head.asciidoc (outdated, resolved)

axw added 3 commits

November 16, 2020 10:56


          datastreams: address review comments

c41c11b


          changelogs: clarify feature is experimental

080ba4e


          Merge branch 'master' into apm-indexing-strategy

658e639

Member Author

axw commented Nov 16, 2020

> I realized that apm-* templates are still set up when data streams are enabled, while they wouldn't be applied any more. Doesn't have to be changed in this PR but we could disable template setup in this case.

Existing integrations disable this by injecting config, e.g. https://github.com/elastic/beats/blob/b6896ee6892da8ce4567d63ee0fa962512fb8701/x-pack/elastic-agent/spec/filebeat.yml#L3. I don't mind disabling it by default when enabling data streams though. Probably a bit neater.

> Out of curiosity - Is there a Kibana/APM issue already to adapt the UI to the data stream changes? (right now it is ofc broken when data streams are enabled)

Not yet AFAIK. It's broken by default, but you can load a template and update the APM app settings in Kibana to fetch transactions and spans from traces-* instead of apm-*, etc.

jalvz reviewed

datastreams/servicename.go (resolved)

docs/fields.asciidoc (outdated, resolved)

model/transaction.go (outdated)

@@ -111,7 +112,13 @@ func (e *Transaction) fields() common.MapStr {
 func (e *Transaction) Transform(_ context.Context, _ *transform.Config) []beat.Event {
 	transactionTransformations.Inc()
+	// Transactions are stored in a "traces" data stream along with spans.
+	dataset := datastreams.NormalizeServiceName(e.Metadata.Service.Name)

Contributor

jalvz Nov 16, 2020

The assumption that we don't need to differentiate spans from transactions is strong, but what if we are wrong? Isn't it cheap and harmless to add an apm.span or apm.transaction dataset prefix, just in case?

Member Author

axw Nov 17, 2020

> Isn't it cheap and harmless to add an apm.span or apm.transaction dataset prefix, just in case?

It's not necessarily harmless, as more datasets will lead to more shards. We can always revisit this in the future.

Contributor

jalvz Nov 18, 2020

So how much would it hurt?
I guess it's fine, but a similar argument can be made the other way around: by separating spans and transactions into different indices, it is likely that some queries in the APM UI can be changed to faster alternatives.
We would not get this benefit if we revisit this decision in the future, because the UI would have to query for both index name and index content...
The opposite is not true: if we go with separate spans and transactions to start with, and later find they don't add value (performance or user-flexibility wise), we would be able to optimize for fewer shards without any drawbacks.

Member Author

axw Nov 19, 2020

> So how much would it hurt?

We won't know for sure until we test.

This was discussed in the proposal, and the discussion concluded with this comment (from me):

"It seems to me that there's no clear answer here. In order to make progress I propose that we start out conservatively w.r.t. sharding, and have a combined data stream as described.
This will go out as experimental to start with, and we can adjust as necessary. We can replay some real data through the options to compare performance."

> I guess it's fine, but a similar argument can be made the other way around: by separating spans and transactions into different indices, it is likely that some queries in the APM UI can be changed to faster alternatives.

Span docs are currently used in two places in the UI, AFAIK:

1. displaying an individual trace waterfall
2. walking traces for the service map

I'm confident (1) is going to be cheap either way. We know (2) is a sub-optimal approach to building the service map, and we're investigating options to move away from it.

> We would not get this benefit if we revisit this decision in the future because the UI would have to query for both index name and index content...

Not sure I follow. We don't query spans by themselves; only ever spans and transactions together. If we did want to query only spans in the future, then we could split the data sets and make processor.event a constant_keyword; that would mean only the spans data streams would be considered in the query. This can be done without changing the index pattern, and without breaking backwards compatibility.

Anyway, as mentioned above: let's go with this for now and test it at scale. We can use an ingest node processor to split the transactions and spans into separate streams for testing.

model/profile.go (outdated, resolved)

model/metricset.go (outdated, resolved)

axw added 3 commits

November 17, 2020 09:17


          _meta: specify allowed values for data_stream.type

017289e


          Merge branch 'master' into apm-indexing-strategy

11406d1


          model: update datasets

798cd59

Use a common "apm." prefix, and place the service name last.

Member Author

axw commented Nov 17, 2020

jenkins run the tests please

axw closed this

axw reopened this

Member Author

axw commented Nov 17, 2020

jenkins run the tests please


          Merge branch 'master' into apm-indexing-strategy

e500a2b

simitt approved these changes


jalvz approved these changes



          Merge branch 'master' into apm-indexing-strategy

1e7379e

Member Author

axw commented Nov 19, 2020

jenkins run the tests please

Member Author

axw commented Nov 19, 2020

Something very wrong is going on in the integration tests...

Member Author

axw commented Nov 19, 2020

(I'm going to try reproducing this locally, so I can debug...)

Member Author

axw commented Nov 19, 2020

Same error is occurring in #4445 ...

Member Author

axw commented Nov 19, 2020

The integration tests have been failing for a while, and are unrelated to the server as far as I can see. I've captured my findings in elastic/apm-integration-testing#987.

axw merged commit 09951ba into elastic:master

axw deleted the apm-indexing-strategy branch

November 19, 2020 09:40

jalvz mentioned this pull request

Integrate with Elastic Agent #4004

Closed

15 tasks

axw mentioned this pull request

Addition of data_stream fields causes allocations when data streams are disabled #4460

Closed

axw added a commit to axw/apm-server that referenced this pull request


          Add initial support for indexing to data streams (elastic#4409)

5b48118

* idxmgmt: add support for data streams

Introduce `apm-server.data_streams.enabled` config,
which will be used by idxmgmt to route events to
data streams based on data_stream.* fields that are
expected to be in each published event.

* _meta/fields.common.yml: add data_stream.* fields

* model: add data_stream.{type,dataset} fields

When transforming model objects into beat.Events,
set the data_stream.{type,dataset} fields. We add
data_stream.namespace elsewhere, using an event
processor.

* beater: handle apm-server.data_streams.enabled

Handle the new apm-server.data_streams.enabled
config:
 - when enabled, we add data_stream.namespace to
   all published events
 - when disabled, we remove data_stream.* fields
   from all published events

For now we just set data_stream.namespace to "default".
Later this will be based on the config received from Fleet.

* processor/stream/package_tests: update tests

There is a hack in here to inject data_stream.namespace
in all published events, since the tests do not use the
standard libbeat pipeline code.

* Update approvals

* Add changelog entry

* datastreams: address review comments

* changelogs: clarify feature is experimental

* _meta: specify allowed values for data_stream.type

* model: update datasets

Use a common "apm." prefix, and place the service name last.
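The behaviour described in the commit message above can be sketched roughly as follows. This is a simplified illustration, not the server's actual code: `Event` is a stand-in for libbeat's `beat.Event`, the names `prepare` and `dataStreamName` are hypothetical, and the dataset value `apm.myservice` in the usage note is an assumed example. The `$type-$dataset-$namespace` format and the `"default"` namespace come from the PR description and commit message.

```go
package main

import "fmt"

// Event is a simplified stand-in for a published event: a flat map of
// field names to string values (an assumption made for this sketch).
type Event map[string]string

const (
	fieldType      = "data_stream.type"
	fieldDataset   = "data_stream.dataset"
	fieldNamespace = "data_stream.namespace"
)

// prepare returns a copy of the event adjusted for the
// apm-server.data_streams.enabled setting:
//   - enabled: data_stream.namespace is set to "default"
//   - disabled: all data_stream.* fields are removed
func prepare(e Event, dataStreamsEnabled bool) Event {
	out := make(Event, len(e)+1)
	for k, v := range e {
		out[k] = v
	}
	if dataStreamsEnabled {
		out[fieldNamespace] = "default"
		return out
	}
	delete(out, fieldType)
	delete(out, fieldDataset)
	delete(out, fieldNamespace)
	return out
}

// dataStreamName builds the "$type-$dataset-$namespace" name used for
// routing. It reports false if any of the three fields is missing.
func dataStreamName(e Event) (string, bool) {
	t, d, n := e[fieldType], e[fieldDataset], e[fieldNamespace]
	if t == "" || d == "" || n == "" {
		return "", false
	}
	return fmt.Sprintf("%s-%s-%s", t, d, n), true
}
```

For an event with `data_stream.type` set to `traces` and `data_stream.dataset` set to `apm.myservice`, `dataStreamName(prepare(e, true))` yields `traces-apm.myservice-default`. With data streams disabled, the fields are stripped and no data stream name can be derived.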

axw mentioned this pull request

[7.x] Add initial support for indexing to data streams (#4409) #4517

Merged

axw added a commit that referenced this pull request


          Add initial support for indexing to data streams (#4409) (#4517)

e4ec4e5


simitt added test-plan v7.11.0 labels

simitt self-assigned this

simitt added the test-plan-ok label

Contributor

simitt commented Dec 23, 2020

Tested with BC1 - sent some event data for errors, spans, transactions and metrics - everything ended up in the expected data streams.


Labels

test-plan test-plan-ok v7.11.0