[Telemetry] Report data shippers #64935

Merged
afharo merged 41 commits into master from telemetry/report-data-providers on Jul 2, 2020

Conversation

afharo
Member

@afharo afharo commented Apr 30, 2020

Summary

Closes #64790

Report if well-known data shippers are used to index documents in the cluster.
Tested on OSS, X-Pack and Monitoring collectors.

Manual tests to explain the behaviour

  1. Start a clean cluster:
    The expected payload is an empty array stack_stats.data = []

  2. I installed packetbeat on my machine and started it. stack_stats.data is now:

      "data": [
        {
          "shipper": "packetbeat",
          "index_count": 1,
          "ecs_index_count": 1,
          "doc_count": 56686,
          "size_in_bytes": 29042232
        }
      ],
    
  3. packetbeat indices created before 7.0 don't have the _meta.beat information, so we report them under pattern_name instead. We still report the shipper property because we know that index pattern is strictly linked to that shipper. The index pattern is defined as { pattern: 'packetbeat-*', patternName: 'packetbeat', shipper: 'packetbeat' }

        {
          "pattern_name": "packetbeat",
          "shipper": "packetbeat",
          "index_count": 1,
          "ecs_index_count": 0,
          "doc_count": 0,
          "size_in_bytes": 208
        }
  4. If I create a new index called citrix-1234, matching the pattern *citrix*, the following is added to the array.

        {
          "pattern_name": "citrix",
          "index_count": 1,
          "ecs_index_count": 0,
          "doc_count": 0,
          "size_in_bytes": 208
        }
  5. For the pattern { pattern: '*logs*', patternName: 'third-party-logs' }, I create some indices containing logs in their name:

    PUT logs-custom-index-1234
    PUT <logs-custom-index-{now%2Fd}-12345>
    PUT <custom-logs-index-{now%2Fd}-12345>
    

    The following payload is added to the data array:

        {
          "pattern_name": "third-party-logs",
          "index_count": 3,
          "ecs_index_count": 0,
          "doc_count": 0,
          "size_in_bytes": 624
        }
  6. If I create an index following the New Indexing Strategy

    PUT events-something-namespace-123124
    {
      "mappings": {
        "_meta": {
          "beat": "my-beat"
        },
        "properties": {
          "ecs": {
            "properties": {
              "version": {
                "type": "keyword"
              }
            }
          },
          "dataset": {
            "properties": {
              "name": {
                "type": "constant_keyword",
                "value": "something"
              },
              "type": {
                "type": "constant_keyword",
                "value": "events"
              }
            }
          }
        }
      }
    }
    

    We read the values from the mappings and add the following object to the data array:

        {
          "dataset": {
            "name": "something",
            "type": "events"
          },
          "shipper": "my-beat",
          "index_count": 1,
          "ecs_index_count": 1,
          "doc_count": 0,
          "size_in_bytes": 208
        }
  7. Final object after all the steps above:

    {
      "stack_stats": {
        "data": [
          {
            "shipper": "packetbeat",
            "index_count": 1,
            "ecs_index_count": 1,
            "doc_count": 56686,
            "size_in_bytes": 29042232
          },
          {
            "pattern_name": "packetbeat",
            "shipper": "packetbeat",
            "index_count": 1,
            "ecs_index_count": 0,
            "doc_count": 0,
            "size_in_bytes": 208
          },
          {
            "dataset": {
              "name": "something",
              "type": "events"
            },
            "shipper": "my-beat",
            "index_count": 1,
            "ecs_index_count": 1,
            "doc_count": 0,
            "size_in_bytes": 208
          },
          {
            "pattern_name": "citrix",
            "index_count": 1,
            "ecs_index_count": 0,
            "doc_count": 0,
            "size_in_bytes": 208
          },
          {
            "pattern_name": "third-party-logs",
            "index_count": 3,
            "ecs_index_count": 0,
            "doc_count": 0,
            "size_in_bytes": 624
          }
        ],
        "kibana": "..."
      }
    }
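
The steps above group indices into one entry per shipper and/or pattern name and sum their counters. A minimal TypeScript sketch of that aggregation follows. It is an illustration under stated assumptions, not the code in this PR: `IndexInfo`, `DataEntry`, `aggregateDataEntries` and the sample index names are all hypothetical, and the `dataset` case from step 6 is left out for brevity.

```typescript
// Hypothetical input shape: one record per index, already enriched with what
// could be read from its mappings and stats. None of these names come from the PR.
interface IndexInfo {
  name: string;
  shipper?: string; // e.g. the value of mappings._meta.beat
  patternName?: string; // set when the index matched a known name pattern
  isECS: boolean; // true when the mappings declare ecs.version
  docCount: number;
  sizeInBytes: number;
}

// Output shape mirroring the entries shown in the payloads above.
interface DataEntry {
  pattern_name?: string;
  shipper?: string;
  index_count: number;
  ecs_index_count: number;
  doc_count: number;
  size_in_bytes: number;
}

function aggregateDataEntries(indices: IndexInfo[]): DataEntry[] {
  const byKey: Record<string, DataEntry> = {};
  for (const idx of indices) {
    // Indices with the same shipper and pattern name share one entry.
    const key = `${idx.shipper || ''}|${idx.patternName || ''}`;
    if (!byKey[key]) {
      const created: DataEntry = {
        index_count: 0,
        ecs_index_count: 0,
        doc_count: 0,
        size_in_bytes: 0,
      };
      if (idx.patternName) created.pattern_name = idx.patternName;
      if (idx.shipper) created.shipper = idx.shipper;
      byKey[key] = created;
    }
    const entry = byKey[key];
    entry.index_count += 1;
    entry.ecs_index_count += idx.isECS ? 1 : 0;
    entry.doc_count += idx.docCount;
    entry.size_in_bytes += idx.sizeInBytes;
  }
  return Object.keys(byKey).map((key) => byKey[key]);
}

// Sample input loosely based on steps 2 and 5 (index names are made up).
const sampleEntries = aggregateDataEntries([
  { name: 'packetbeat-sample', shipper: 'packetbeat', isECS: true, docCount: 56686, sizeInBytes: 29042232 },
  { name: 'logs-custom-index-1234', patternName: 'third-party-logs', isECS: false, docCount: 0, sizeInBytes: 208 },
  { name: 'logs-custom-index-a', patternName: 'third-party-logs', isECS: false, docCount: 0, sizeInBytes: 208 },
  { name: 'custom-logs-index-b', patternName: 'third-party-logs', isECS: false, docCount: 0, sizeInBytes: 208 },
]);
```

With this input, `sampleEntries` holds two entries: one for the `packetbeat` shipper and one `third-party-logs` entry with `index_count` 3 and `size_in_bytes` 624, matching the numbers in step 5.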

When Monitoring is ON

With the currently limited information we can retrieve in Monitoring, the reported payload in the same scenario would be:

      "data": [
        {
          "pattern_name": "citrix",
          "index_count": 1,
          "doc_count": 0,
          "size_in_bytes": 208
        },
        {
          "pattern_name": "third-party-logs",
          "index_count": 2,
          "doc_count": 0,
          "size_in_bytes": 416
        },
        {
          "pattern_name": "packetbeat",
          "index_count": 2,
          "doc_count": 86535,
          "size_in_bytes": 43978411
        }
      ]

So I've removed the collection of this information when using Monitoring until we find a way to accurately retrieve it (#68998).

TODO:

Adding the v7.8.0 label as tentative only.

@afharo afharo force-pushed the telemetry/report-data-providers branch from 436f7a4 to c51aa04 Compare May 4, 2020 09:03
@afharo afharo force-pushed the telemetry/report-data-providers branch from c51aa04 to db8096b Compare May 4, 2020 09:25
Comment on lines 138 to 148
    // GET _cluster/state/metadata/<index>?filter_path=metadata.indices.*.version
    callCluster<ClusterState>('cluster.state', {
      index,
      metric: 'metadata',
      filterPath: [
        // The payload is huge and we are only after the name (no other useful stuff so far)
        'metadata.indices.*.version',
        // Does it have `ecs.version` in the mappings?
        'metadata.indices.*.mappings._doc.properties.ecs.properties.version.type',
      ],
    }),
Member Author

There is a concern about using this API because its documentation says:

The response is an internal representation of the cluster state and its format may change from version to version. If possible, you should obtain any information from the cluster state using the other, more stable, cluster APIs.

I've added a functional test in test/api_integration/apis/telemetry/telemetry_local.js to make sure it works as expected, although it brings the risk of flaky tests in the future if the API changes the way it works.

Contributor

How much tech debt do we want to get ourselves into? If it is possible to use a more stable API, I think it's worth changing.

Member Author

I think the only alternative so far is to use only the GET <index>/_stats API, but that requires the kibana user to have a more permissive role.

Alternatively, there are some ongoing talks with the ES team to modify the same Cluster State API to provide the aggregated data all at once (meaning ES shouldn't drop support or make any changes in that API).

Member

@Bamieh Bamieh May 6, 2020

Can we discuss with other teams the possibility of adding those permissions to the kibana role instead of using this API? I'm not completely against using this API if we have no other way, but I think it would make more sense to add those extra permissions and use the _stats API.

Also, if you check the compatibility grid: https://github.com/elastic/kibana/#version-compatibility-with-elasticsearch

Are we sure this API does not change behavior across the compatibility grid? Our tests will test the exact matching ES version, but not other compatible versions where the ES minor/patch versions are newer or the ES patch version is lower than Kibana's.

Member Author

I'm not against updating the roles to be more permissive (that will allow us to consistently be able to retrieve the doc_count and size_in_bytes properties) but it involves some additional security concerns.

Not being able to retrieve the data from the cluster state API because of the compatibility grid will only result in not being able to provide this parameter in the telemetry (it will return it as {}) but it shouldn't break any other logic unless the API request itself throws any errors.
The warning in that API's documentation is that the format may change, though, not the API behaviour as such.

N.B.: I just pushed a commit to catch errors in the method and safely return {} if any of the API calls fail.

I think this approach is safer than opening the kibana_system role to be able to read from any index. But I'm happy to revisit this implementation if we think that's the way to go, or consider any other approach (like ES providing this kind of info already embedded in the _cluster/stats API, if they are willing to do the change and aggregation on their end).
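
The fallback described in this comment (return {} when any API call fails) could look roughly like the TypeScript sketch below. It is only an illustration: `collectOrEmpty` and the two stand-in collectors are hypothetical names and do not come from the commit mentioned above.

```typescript
// Hypothetical helper, not the PR's code: runs an async collector and falls back
// to an empty object if it rejects, so one failing Elasticsearch request cannot
// break the rest of the telemetry payload.
async function collectOrEmpty<T extends object>(
  collect: () => Promise<T>
): Promise<T | Record<string, never>> {
  try {
    return await collect();
  } catch (err) {
    // Swallowing the error is deliberate in this sketch: the data is optional.
    return {};
  }
}

// Usage with two stand-in collectors (both invented for illustration).
const workingCollector = collectOrEmpty(async () => ({ index_count: 1 }));
const failingCollector = collectOrEmpty(async () => {
  throw new Error('cluster state request failed');
});
```

Here `workingCollector` resolves to the collected object, while `failingCollector` resolves to an empty object instead of rejecting. A real implementation would probably also log the error somewhere.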

Contributor

@TinaHeiligers TinaHeiligers May 6, 2020

Opening up the kibana_system role is not ideal. Could we explore creating a telemetry_system role that has all the permissions needed?
cc @kobelb

Contributor

I don't think creating a telemetry_system role would necessarily help us here... If we added the telemetry_system role to the kibana_system user we'd have an equivalent "threat profile". If we created a new telemetry_system user which had the telemetry_system role, it would make setup more complicated and have a slightly different "threat profile" as both users' credentials would be stored in the kibana.yml.

Augmenting the _cluster/stats API so we don't have to change the kibana_system privileges at all would be the safest option.

However, if we had to give the kibana_system role the monitor privilege on indices so we can use the index stats API, it's a lot safer than giving it access to read the documents themselves.

@afharo afharo marked this pull request as ready for review May 4, 2020 17:48
@afharo afharo requested a review from a team as a code owner May 4, 2020 17:48
@afharo afharo requested a review from a team May 4, 2020 17:48
@alexfrancoeur

alexfrancoeur commented May 4, 2020

@afharo minor suggestion, and fine to leave as is if this blocks anything, but would it be possible to rename ingest_solutions to ingest? At least initially, we aren't mapping the data providers directly to solutions but instead the shippers themselves

@kobelb kobelb self-requested a review May 4, 2020 19:28
@afharo
Member Author

afharo commented May 5, 2020

@afharo minor suggestion, and fine to leave as is if this blocks anything, but would it be possible to rename ingest_solutions to ingest? At least initially, we aren't mapping the data providers directly to solutions but instead the shippers themselves

@alexfrancoeur absolutely, it's a minor change. The only reason I decided not to use ingest on its own is that it may be too vague. We already have the concepts ingest pipelines, ingest nodes, ...

But happy to change it if you think it would make things easier :)

Contributor

@chrisronline chrisronline left a comment

LGTM from stack monitoring

) {
const responses = await Promise.all(
clusterUuids.map(async clusterUuid => {
// Should we take into consideration CCS? https://github.com/elastic/kibana/blob/3a396027f669803e1a3143237578973fb1ab20d0/x-pack/plugins/monitoring/server/routes/api/v1/elasticsearch/indices.js#L42
Contributor

@igoristic igoristic May 5, 2020

That's a good question. Are there any repercussions if we do use CCS by default? (like will it be less efficient, or slower?). If not then my other concern is it would probably need to be conditional based on licensing (and maybe also if there are any config options tied with it). I think the terms are kind of confusing since we actually mean "Multi-stack monitoring" here, right?
[Screenshot: Screen Shot 2020-05-05 at 11 36 10 AM]

Looks like it's only available for Gold license and above


However, I think CCS is available for all licenses:
[Screenshot: Screen Shot 2020-05-05 at 11 36 58 AM]

Source: https://www.elastic.co/subscriptions

I think it's fine the way it is right now, and CCS support can be added later if needed.

Contributor

@igoristic igoristic left a comment

Looks good from Stack Monitoring pov (code and functionality) ✅

Contributor

@TinaHeiligers TinaHeiligers left a comment

The solution looks good, although I did add a couple of questions.
I also pulled the code and ran the changed and added tests, all of which passed.

@afharo
Member Author

afharo commented May 6, 2020

@elasticmachine merge upstream

@TinaHeiligers
Contributor

@elasticmachine merge upstream

elasticmachine and others added 3 commits June 18, 2020 11:34
…emetry/get_data_telemetry.test.ts

Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
…emetry/get_data_telemetry.test.ts

Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
@afharo
Member Author

afharo commented Jun 23, 2020

@elasticmachine merge upstream

}

// Otherwise, try with the list of known index patterns
return DATA_DATASETS_INDEX_PATTERNS.find(({ pattern }) => {
Member Author

After a conversation with @alexfrancoeur and @kobelb, I need to change this to a .filter
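
To illustrate what switching from .find to .filter changes, here is a minimal TypeScript sketch. It assumes the patterns are simple `*` globs; `KnownPattern`, `KNOWN_PATTERNS`, `globToRegExp` and the index name are hypothetical, and the thread does not show how the real code matches patterns or state the reason for the change.

```typescript
interface KnownPattern {
  pattern: string;
  patternName: string;
  shipper?: string;
}

// A small subset of the patterns quoted in this PR's description.
const KNOWN_PATTERNS: KnownPattern[] = [
  { pattern: 'packetbeat-*', patternName: 'packetbeat', shipper: 'packetbeat' },
  { pattern: '*citrix*', patternName: 'citrix' },
  { pattern: '*logs*', patternName: 'third-party-logs' },
];

// Converts a glob with `*` wildcards into an anchored regular expression.
// Every other regex metacharacter in the glob is escaped and matched literally.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`);
}

// An invented index name that matches two of the patterns at once.
const indexName = 'citrix-logs-2020';

// .find stops at the first matching pattern...
const firstMatch = KNOWN_PATTERNS.find(({ pattern }) => globToRegExp(pattern).test(indexName));

// ...while .filter returns every pattern that matches.
const allMatches = KNOWN_PATTERNS.filter(({ pattern }) => globToRegExp(pattern).test(indexName));
```

In this sketch `firstMatch` is the `citrix` pattern only, whereas `allMatches` contains both `citrix` and `third-party-logs`, so an index whose name matches several patterns can be counted under each of them.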

@afharo
Member Author

afharo commented Jun 25, 2020

@elasticmachine merge upstream

@afharo afharo force-pushed the telemetry/report-data-providers branch from 9395313 to 4803f96 Compare June 30, 2020 15:42
@afharo
Member Author

afharo commented Jul 1, 2020

@elasticmachine merge upstream

@kibanamachine
Contributor

💚 Build Succeeded

Build metrics

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@afharo afharo merged commit 6607bf7 into elastic:master Jul 2, 2020
@afharo afharo deleted the telemetry/report-data-providers branch July 2, 2020 07:08
afharo added a commit to afharo/kibana that referenced this pull request Jul 2, 2020
Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
afharo added a commit that referenced this pull request Jul 2, 2020
Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jul 2, 2020
* master: (46 commits)
  [Visualize] Add missing advanced settings and custom label for pipeline aggs (elastic#69688)
  Use dynamic: false for config saved object mappings (elastic#70436)
  [Ingest Pipelines] Error messages (elastic#70167)
  [APM] Show transaction rate per minute on Observability Overview page (elastic#70336)
  Filter out error when calculating a label (elastic#69934)
  [Visualizations] Each visType returns its supported triggers (elastic#70177)
  [Telemetry] Report data shippers (elastic#64935)
  Reduce SavedObjects mappings for Application Usage (elastic#70475)
  [Lens] fix dimension label performance issues (elastic#69978)
  Skip failing endgame tests (elastic#70548)
  [SIEM] Reenabling Cypress tests (elastic#70397)
  [SIEM][Security Solution][Endpoint] Endpoint Artifact Manifest Management + Artifact Download and Distribution (elastic#67707)
  [Security] Adds field mapping support to rule creation (elastic#70288)
  SECURITY-ENDPOINT: add fields for events to metadata document (elastic#70491)
  Fixed assertion in hybrid index pattern test to iterate through indices (elastic#70130)
  [SIEM][Exceptions] - Exception builder component (elastic#67013)
  [Ingest Manager] Rename data sources to package configs (elastic#70259)
  skip suites blocking es snapshot promomotion (elastic#70532)
  [Metrics UI] Fix asynchronicity and error handling in Snapshot API (elastic#70503)
  fix export response (elastic#70473)
  ...
Labels
Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes v7.9.0 v8.0.0
Development

Successfully merging this pull request may close these issues.

[Telemetry] Report array of well-known technologies
10 participants