-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Telemetry] Report data shippers #64935
[Telemetry] Report data shippers #64935
Conversation
436f7a4
to
c51aa04
Compare
c51aa04
to
db8096b
Compare
// GET _cluster/state/metadata/<index>?filter_path=metadata.indices.*.version | ||
callCluster<ClusterState>('cluster.state', { | ||
index, | ||
metric: 'metadata', | ||
filterPath: [ | ||
// The payload is huge and we are only after the name (no other useful stuff so far) | ||
'metadata.indices.*.version', | ||
// Does it have `ecs.version` in the mappings? | ||
'metadata.indices.*.mappings._doc.properties.ecs.properties.version.type', | ||
], | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a concern about using this API because its documentation says:
The response is an internal representation of the cluster state and its format may change from version to version. If possible, you should obtain any information from the cluster state using the other, more stable, cluster APIs.
I've added a functional test in test/api_integration/apis/telemetry/telemetry_local.js
to make sure it works as expected. Although it brings up the risk of flaky tests in the future if the API changes the way it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much tech debt do we want to get ourselves into? If it is possible to use a more stable API, I think it's worth changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only alternative so far is to only use the GET <index>/_stats/
API. But that requires the kibana
user to have a more permissive role.
Alternatively, there are some ongoing talks with the ES team to modify the same Cluster State API to provide the aggregated data all at once (meaning ES shouldn't drop support or make any changes in that API).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we discuss with other teams the possiblity of adding those permissions to the kibana role instead of using this API? im not completely against using this API if we have no other way, but i think it would make more sense to add those extra permissions and use the _stats
api.
Also if you check the compatiblity grid: https://github.com/elastic/kibana/#version-compatibility-with-elasticsearch
Are we sure this API does not change behavior across the compatiblity grid? Our tests will test exact ES matching version, but not other compatible versions where ES minor/patch versinos are newer or ES patch version is lower than kibana's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not against updating the roles to be more permissive (that will allow us to consistently be able to retrieve the doc_count
and size_in_bytes
properties) but it involves some additional security concerns.
Not being able to retrieve the data from the cluster state API because of the compatibility grid will only result in not being able to provide this parameter in the telemetry (it will return it as {}
) but it shouldn't break any other logic unless the API request itself throws any errors.
The warning in that API is about the format may change though, not the API behaviour as such.
N.B.: I just pushed a commit to catch the method and safely return {}
if any of the API calls fail.
I think this approach is safer than opening the kibana_system
role to be able to read from any index. But I'm happy to revisit this implementation if we think that's the way to go or any other approach (like ES providing this kind of info already embedded in the _cluster/stats
API if they are willing the do the change and aggregation on their end).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opening up the kibana_system
role is not ideal. Could we explore creating a telemetry_system
role that has all the permissions needed?
cc @kobelb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think creating a telemetry_system
roles would necessarily help us here... If we added the telemetry_system
role to the kibana_system
user we'd have an equivalent "threat profile". If we created a new telemetry_system
user which had the telemetry_system
role, it would make setup more complicated and have a slightly different "threat profile" as both user's credentials would be stored in the kibana.yml
.
Augmenting the _cluster/stats
API so we don't have to change the kibana_system
privileges at all would be the safest option.
However, if we had to give the kibana_system
role the monitor
privilege on indices so we can use the index stats API, it's a lot safer than giving it access to read the documents themselves.
@afharo minor suggestion, and fine to leave as is if this blocks anything, but would it be possible to rename |
@alexfrancoeur absolutely, it's a minor change. The only reason I decided not to use But happy to change it you think it would make things easier :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from stack monitoring
) { | ||
const responses = await Promise.all( | ||
clusterUuids.map(async clusterUuid => { | ||
// Should we take into consideration CCS? https://github.com/elastic/kibana/blob/3a396027f669803e1a3143237578973fb1ab20d0/x-pack/plugins/monitoring/server/routes/api/v1/elasticsearch/indices.js#L42 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. Are there any repercussions if we do use CCS by default? (like will it be less efficient, or slower?). If not then my other concern is it would probably need to be conditional based on licensing (and maybe also if there are any config options tied with it). I think the terms are kind of confusing since we actually mean "Multi-stack monitoring" here, right?
Looks like it's only available for Gold license and above
However, I think CSS is available for all licenses:
Source: https://www.elastic.co/subscriptions
I think it's fine the way it is right now and can be added later if anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from Stack Monitoring pov (code and functionality) ✅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution looks good, although I did add a couple of questions.
I also pulled the code and ran the changed and added tests, all of which passed.
src/plugins/telemetry/server/telemetry_collection/ingest_solutions/constants.ts
Outdated
Show resolved
Hide resolved
src/plugins/telemetry/server/telemetry_collection/ingest_solutions/ingest_solutions.ts
Outdated
Show resolved
Hide resolved
// GET _cluster/state/metadata/<index>?filter_path=metadata.indices.*.version | ||
callCluster<ClusterState>('cluster.state', { | ||
index, | ||
metric: 'metadata', | ||
filterPath: [ | ||
// The payload is huge and we are only after the name (no other useful stuff so far) | ||
'metadata.indices.*.version', | ||
// Does it have `ecs.version` in the mappings? | ||
'metadata.indices.*.mappings._doc.properties.ecs.properties.version.type', | ||
], | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much tech debt do we want to get ourselves into? If it is possible to use a more stable API, I think it's worth changing.
@elasticmachine merge upstream |
@elasticmachine merge upstream |
…emetry/get_data_telemetry.test.ts Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
…emetry/get_data_telemetry.test.ts Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
@elasticmachine merge upstream |
} | ||
|
||
// Otherwise, try with the list of known index patterns | ||
return DATA_DATASETS_INDEX_PATTERNS.find(({ pattern }) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a conversation with @alexfrancoeur and @kobelb, I need to change this to a .filter
@elasticmachine merge upstream |
…ort-data-providers
…ort-data-providers
9395313
to
4803f96
Compare
@elasticmachine merge upstream |
💚 Build SucceededBuild metrics
History
To update your PR or re-run it, just comment with: |
Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* master: (46 commits) [Visualize] Add missing advanced settings and custom label for pipeline aggs (elastic#69688) Use dynamic: false for config saved object mappings (elastic#70436) [Ingest Pipelines] Error messages (elastic#70167) [APM] Show transaction rate per minute on Observability Overview page (elastic#70336) Filter out error when calculating a label (elastic#69934) [Visualizations] Each visType returns its supported triggers (elastic#70177) [Telemetry] Report data shippers (elastic#64935) Reduce SavedObjects mappings for Application Usage (elastic#70475) [Lens] fix dimension label performance issues (elastic#69978) Skip failing endgame tests (elastic#70548) [SIEM] Reenabling Cypress tests (elastic#70397) [SIEM][Security Solution][Endpoint] Endpoint Artifact Manifest Management + Artifact Download and Distribution (elastic#67707) [Security] Adds field mapping support to rule creation (elastic#70288) SECURITY-ENDPOINT: add fields for events to metadata document (elastic#70491) Fixed assertion in hybrid index pattern test to iterate through indices (elastic#70130) [SIEM][Exceptions] - Exception builder component (elastic#67013) [Ingest Manager] Rename data sources to package configs (elastic#70259) skip suites blocking es snapshot promomotion (elastic#70532) [Metrics UI] Fix asynchronicity and error handling in Snapshot API (elastic#70503) fix export response (elastic#70473) ...
Summary
Closes #64790
Report if well-known data shippers are used to index documents in the cluster.
Tested on OSS, X-Pack and Monitoring collectors.
Manual tests to explain the behaviour
Start a clean cluster:
The expected payload is an empty array
stack_stats.data = []
I installed
packetbeat
in my machine and started it.stack_stats.data
is now...For
packetbeat
indices pre-7.0, we didn't have the_meta.beat
information, so we'll report it underpattern_name
instead. But we'll report theshipper
property because we know that index pattern is strictly linked to that shipper. The index pattern is defined as{ pattern: 'packetbeat-*', patternName: 'packetbeat', shipper: 'packetbeat' }
If I create a new index called
citrix-1234
, matching the pattern*citrix*
, the following is added to the array.For the pattern
{ pattern: '*logs*', patternName: 'third-party-logs' }
, I create some indices containinglogs
in their name:The following payload is added to the
data
array:If I create an index following the New Indexing Strategy
We read the values from the mappings and include the following object to the
data
array:Final object after all
When Monitoring is ON
With the currently limited information we can retrieve in Monitoring, the reported payload in the same scenario would be:
So I've removed the collection of this information when using Monitoring until we find a way to accurately retrieve it (#68998).
TODO:
kibana_system
(Add monitor and view_index_metadata to built-inkibana_system
role elasticsearch#57755)*bro*
).ecs.version
doc_count
nor thesize
. I've got confirmation that that's OK if those 2 fields are optional.Adding thev7.8.0
label as tentative only.Checklist
Delete any items that are not applicable to this PR.
For maintainers