Enterprise Search Stack Monitoring #114303

kovyrin · 2021-10-07T15:16:57Z

Summary

This PR adds support for Enterprise Search to the Stack Monitoring plugin. It relies on the new metricbeat module we're shipping in 7.16 (already merged, with a PR open to improve the metricsets), which will be integrated into the Enterprise Search solution by default (running as a sidecar, controlled via the solution config).

The code in this PR is based primarily on the patterns and style of APM, Beats and Logstash modules and we tried to keep the changes extremely contained to avoid conflicts with any of the de-angularization work that is ongoing within the plugin. The overview page has been built with React (hence the react flag being enabled within the PR, we'll remove it before merging) to align with the new direction for the monitoring plugin.

Our team plans to support and keep developing this code going forward, and we're ready to make whatever changes are necessary to align it with the conventions followed by other parts of Stack Monitoring. If any help is needed with testing the changes, please let us know.

Event Structure

One thing to note in this PR: unlike all other events used by Stack Monitoring, Enterprise Search monitoring events do not have a cluster_uuid field at the root level. The events are generated by metricbeat, and the elasticsearch metricbeat module has already added cluster_uuid to the global schema as an alias for its own field, so we cannot take the same approach. We also did not want to add the field to the global schema directly, since that is not compatible with ECS. Instead, we changed the Stack Monitoring logic for fetching time series so that a flag can be passed to skip the implicit cluster_uuid filter applied to all queries. You can see the changes in get_metrics.ts and get_series.ts.
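To illustrate the idea of an opt-out flag for the implicit filter, here is a minimal sketch. It is not the code from get_metrics.ts or get_series.ts: the function name `buildSeriesFilters`, the option name `skipClusterUuidFilter`, and the exact query shape are assumptions made for illustration only.

```typescript
// Hypothetical sketch: names and query shape are illustrative and do not
// mirror the actual get_series.ts implementation.
interface SeriesFilterOptions {
  clusterUuid: string;
  min: number; // start of the time range, epoch millis
  max: number; // end of the time range, epoch millis
  // When true, the root-level cluster_uuid term filter is not added.
  skipClusterUuidFilter?: boolean;
}

type QueryClause = Record<string, unknown>;

function buildSeriesFilters(options: SeriesFilterOptions): QueryClause[] {
  const { clusterUuid, min, max, skipClusterUuidFilter = false } = options;

  // The time range filter is always applied.
  const filters: QueryClause[] = [
    { range: { '@timestamp': { gte: min, lte: max, format: 'epoch_millis' } } },
  ];

  // Events without a root-level cluster_uuid would never match this term
  // filter, so callers for such products can opt out of it.
  if (!skipClusterUuidFilter) {
    filters.unshift({ term: { cluster_uuid: clusterUuid } });
  }

  return filters;
}
```

Defaulting the flag to `false` keeps the behavior unchanged for every existing caller; only a product whose events lack the root-level field would pass `skipClusterUuidFilter: true`.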

Feature Progress

Screenshots

Main page

Enterprise Search Overview

Checklist

Delete any items that are not applicable to this PR.

Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
Documentation was added for features that require explanation or tutorials
Unit or functional tests were updated or added to match the most common scenarios
Any UI touched in this PR is usable by keyboard only (learn more about keyboard accessibility)
Any UI touched in this PR does not create any new axe failures (run axe in browser: FF, Chrome)
If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
This renders correctly on smaller devices using a responsive layout. (You can test this in your browser)
This was checked for cross-browser compatibility

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk	Probability	Severity	Mitigation/Notes
Multiple Spaces—unexpected behavior in non-default Kibana Space.	Low	High	Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces.
Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks.	High	Low	Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure.
Code should gracefully handle cases when feature X or plugin Y are disabled.	Medium	High	Unit tests will verify that any feature flag or plugin combination still results in our service operational.
See more potential risk examples

For maintainers

This was checked for breaking API changes and was labeled appropriately

phillipb

Looks pretty good! Small tweaks.

x-pack/plugins/monitoring/public/application/pages/enterprise_search/overview.tsx

x-pack/plugins/monitoring/public/components/enterprise_search/overview/overview.tsx

matschaffer · 2021-11-29T04:54:58Z

Guessing that if we address @phillipb's concerns here, we can merge this.

JasonStoltz · 2021-12-03T18:53:52Z

@elasticmachine merge upstream

phillipb

LGTM!

phillipb · 2021-12-07T02:56:01Z

@elasticmachine merge upstream

kibana-ci · 2021-12-07T04:12:30Z

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id	before	after	diff
`monitoring`	439	445	+6

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`monitoring`	436.6KB	445.5KB	+8.9KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id	before	after	diff
`monitoring`	23.6KB	23.6KB	+58.0B

History

💚 Build #11071 succeeded 26e5e8b
💚 Build #10929 succeeded 3dd0643
💔 Build #10795 failed 3d48164
💚 Build #10468 succeeded 5244518
💔 Build #10446 failed 68f3336
💔 Build #10434 failed 3a4951e

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @kovyrin

kibanamachine · 2021-12-07T15:12:14Z

The following labels were identified as gaps in your version labels and will be added automatically:

v8.1.0

If any of these should not be on your pull request, please manually remove them.

@timestamp

* Added enterprise search panel, corrected queries
* Update the index pattern for Enterprise Search
* Typescript error ignore
* Our timestamp fields are called @timestamp (per ECS)
* Adjust Enterprise Search index patterns with the rest of monitoring plugin patterns (including CCS, etc)
* Initial implementation of the Enterprise Search overview panel (health only)
* Add a basic stub for enterprise search response fields
* Cleanup aggs configs
* Bring back a file deleted by mistake
* Started working on the overview page
* Correctly use heap_max as the total heap
* Ent search breadcrumbs
* Simple overview
* Allow the cluster_uuid filter to be skipped while fetching metrics
* Cleanup
* Switch to module-level uuid field and use both types of events
* Add stats-based product usage metrics + apply filter paths to reduce traffic
* Change the name of the ent search overview class
* Move the standalone cluster hack in the the internal function
* Change the overview page to show product usage metrics + introduce enterprise search stats in addition to metrics (they are fetched differently and allow us to reuse the stats code we have for the main page panel)
* Cluster UUID is at the module level now
* Simplify ent search pages structure, only have one overview page
* Fix ent search icon
* Add total instances
* Product usage metric graphs
* Simplify metrics loading in the overview page since we load all metrics anyways
* Add more enterprise search overview metrics
* Avoid duplicate labels
* linting
* Revert "Simplify metrics loading in the overview page since we load all metrics anyways" This reverts commit 4bd67ab.
* Switch to multiple timeseries per graph
* Reorder graphs and metrics for better experience
* Typescript fixes
* i18n fixes
* Added a couple more JVM metrics
* Completely covered JVM metrics
* Convert Enterprise Search component to Typescript
* Switch config setting back
* Remove the nodes link since it raises more questions than it solves
* Update jest snapshots with the new metrics
* Remove console statement
* Properly handle cases when aggregations return no data for Enterprise Search
* Add a functional test for the Enterprise search cluster list panel
* Add a functional test for Enterprise Search overview page
* Update multicluster API response fixture with the new enterprise search response key
* Default uptime value is 0
* update overview fixture
* More fixture updates
* Remove fixmes
* Fix imports
* Properly export type
* Maybe fix the type checking error
* PR Feedback
* TS fixes

Co-authored-by: cdelgado <carlos.delgado@elastic.co>
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Jason Stoltzfus <jason.stoltzfus@elastic.co>

kibanamachine · 2021-12-07T15:15:39Z

💚 Backport successful

Status	Branch	Result
✅	8.0

This backport PR will be merged automatically after passing CI.


carlosdelest and others added 7 commits October 6, 2021 14:38

Added enterprise search panel, corrected queries

79922f5

Update the index pattern for Enterprise Search

4ea8faf

Typescript error ignore

ccc7167

Our timestamp fields are called @timestamp (per ECS)

01f5a20

Adjust Enterprise Search index patterns with the rest of monitoring plugin patterns (including CCS, etc)

5993a0b

Initial implementation of the Enterprise Search overview panel (health only)

fb64cac

Add a basic stub for enterprise search response fields

324c08b

kovyrin added Feature:Stack Monitoring 7.16 candidate labels Oct 7, 2021

kovyrin self-assigned this Oct 7, 2021

kovyrin added 20 commits October 7, 2021 11:21

Cleanup aggs configs

c71d85b

Bring back a file deleted by mistake

d8c4900

Started working on the overview page

4a320f0

Correctly use heap_max as the total heap

e04a604

Ent search breadcrumbs

785b816

Simple overview

8107643

Allow the cluster_uuid filter to be skipped while fetching metrics

a6427bc

Cleanup

fd9b065

Switch to module-level uuid field and use both types of events

174ffc1

Add stats-based product usage metrics + apply filter paths to reduce traffic

d8fb127

Change the name of the ent search overview class

e73ec50

Move the standalone cluster hack in the the internal function

da39948

Change the overview page to show product usage metrics + introduce enterprise search stats in addition to metrics (they are fetched differently and allow us to reuse the stats code we have for the main page panel)

aa3cc95

Cluster UUID is at the module level now

7739cd5

Simplify ent search pages structure, only have one overview page

6d7d95e

Fix ent search icon

5507957

Add total instances

c9b4e24

Product usage metric graphs

548a2a5

Simplify metrics loading in the overview page since we load all metrics anyways

4bd67ab

Add more enterprise search overview metrics

a090c45

Merge branch 'main' into kovyrin/ent-search-monitoring

62180c6

phillipb suggested changes Nov 11, 2021

matschaffer removed their request for review November 29, 2021 04:54

kovyrin added 2 commits December 1, 2021 11:42

Merge branch 'master' into kovyrin/ent-search-monitoring

e1c046a

Remove fixmes

37fa2ff

elastic deleted a comment from kibanamachine Dec 1, 2021

kovyrin and others added 5 commits December 1, 2021 12:04

Fix imports

3a4951e

Properly export type

68f3336

Maybe fix the type checking error

5244518

PR Feedback

3d48164

TS fixes

3dd0643

Merge branch 'main' into kovyrin/ent-search-monitoring

26e5e8b

phillipb approved these changes Dec 7, 2021

Merge branch 'main' into kovyrin/ent-search-monitoring

3b156b8

kovyrin merged commit 7929123 into main Dec 7, 2021

kovyrin deleted the kovyrin/ent-search-monitoring branch December 7, 2021 15:11

kibanamachine added the v8.1.0 label Dec 7, 2021

kibanamachine mentioned this pull request Dec 7, 2021

[8.0] Enterprise Search Stack Monitoring (#114303) #120630

Merged

neptunian mentioned this pull request Dec 14, 2021

[Stack Monitoring] Enterprise search module showing up in Cluster overview with no data #121192

Closed
