Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch #99031

Merged
merged 9 commits into from
Aug 30, 2021

Conversation

alexwizp
Copy link
Contributor

@alexwizp alexwizp commented May 3, 2021

Part of: #93770

This PR is a part of #93770 for src/plugins/visualizations/server/usage_collector/get_usage_collector.ts
What was done:

  • Size parameter was changed from 10000 to 1000
  • Code was refactored to use soClient instead of esClient
  • The paging is used through the soClient.createPointInTimeFinder method
  • Code was optimized, extra cycles were removed

space,
past_days: getPastDays(lastUpdated),
};
const finder = await soClient.createPointInTimeFinder({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my question applies to all refactored telemetries (#98903, #98914, #99023) where we replaced esClient with soClient.
During my testing today, I found that this finder returns only data for the default namespace. If I create visualizations for a non-default namespace, this will not be reflected in the telemetry.

I've tried to fix that by adding namespaces: ['*'], but with that argument pagination doesn't work. see:
image

@lukeelmers @afharo could you please help me a little cause we have a recommendation to use soClient ( #88604 (comment) ) for telemetry with pagination.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah we specifically decided against adding namespace support when using PIT until we had a concrete need for it. I guess we do now 🙂

I've created an issue to track this enhancement: #99044

@afharo WDYT, do we push forward on this work without soClient (or with using traditional pagination which typically works with <10k results), or do we wait until such a time as we have namespace support in #99044?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote to wait, because the code with soClient looks easier to me + we don't need to implement paging ourselves. If I'm not mistaken in esClient we don't have a simple wrapper like createPointInTimeFinder . + #88604 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we push forward on this work without soClient

IMO, it feels like we keep coming back and forth with this conversation. And, usually, AFAIK, the general recommendation is to use the SO Client whenever possible because using plain ES client requests would break if/when we change the internal implementations of SOs (underlying indices, ways to store them, ...).

I'm not fully familiar with the limitations of allowing namespaces: ['*']. I need to better understand the reasons exposed in #99044 before recommending migrating to esClient as a workaround :)

So ++ to wait :)

Copy link
Member

@afharo afharo May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been playing around with this PR locally, and I think these changes would solve the issue: alexwizp#6

I tested it locally and it works as expected to me :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I don't know how I thought it worked when I tried... It's def not working 🤷

Sorry about that! Then, we need to rely on the outcome of #99044

Copy link
Member

@lukeelmers lukeelmers May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess the question remains, do we adjust the scope of this PR as follows?

  • Size parameter was changed from 10000 to 1000
  • Code was refactored to use soClient instead of esClient
  • The paging is used through the soClient.createPointInTimeFinder method
  • Code was optimized, extra cycles were removed

Basically, since @alexwizp has already started the work, is it worth merging with the same 10k size, only switching to SO client instead? Or do we close this PR and consider it blocked by #99044? WDYT @alexwizp @afharo

(Either way it doesn't actually resolve #93770, it would just be a small refactor if we moved ahead)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a lot of sense just to switch esClient -> soClient. We started that work to resolve #93770.
But on the other hand I'm ok to use soClient with size === 10000. it adds more contributions to my piggy bank =)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ to what @alexwizp said. I guess the question is how long would it take us to fix #99044? If it's just a couple of weeks, probably it's worth putting this PR on hold and rebase when ready?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add #99044 to our list of items to discuss for our next sprint. I don't think it would be a particularly large amount of work, it's mostly about deciding on a strategy with the security team & ensuring there's adequate test coverage.

@alexwizp
Copy link
Contributor Author

alexwizp commented May 5, 2021

@elasticmachine merge upstream

@alexwizp
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

merge conflict between base and head

@alexwizp alexwizp added v7.16.0 Team:Visualizations Visualization editors, elastic-charts and infrastructure v8.0.0 release_note:skip Skip the PR/issue when compiling release notes Feature:Visualizations Generic visualization features (in case no more specific feature label is available) labels Aug 26, 2021
@alexwizp alexwizp self-assigned this Aug 26, 2021
@alexwizp alexwizp requested a review from stratoula August 26, 2021 13:37
const visSummaries: VisSummary[] = esResponse.hits.hits.map((hit) => {
const spacePhrases = hit._id.toString().split(':');
const lastUpdated: string = get(hit, '_source.updated_at');
const space = spacePhrases.length === 3 ? spacePhrases[0] : 'default'; // if in a custom space, the format of a saved object ID is space:type:id
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code was removed because we can get space in more legal way from so.namespaces

@alexwizp alexwizp marked this pull request as ready for review August 27, 2021 13:07
@alexwizp alexwizp requested a review from a team August 27, 2021 13:07
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@alexwizp
Copy link
Contributor Author

@afharo @stratoula, blocker issue was fixed - PR was rebased to latest master, tested locally and ready to your final review

@stratoula
Copy link
Contributor

@elasticmachine merge upstream

Copy link
Contributor

@stratoula stratoula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM, tested it locally, works fine

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @alexwizp

@alexwizp alexwizp merged commit aa8364a into elastic:master Aug 30, 2021
alexwizp added a commit to alexwizp/kibana that referenced this pull request Aug 30, 2021
…elays on Elasticsearch (elastic#99031)

* [Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch

Part of: elastic#93770

* fix CI

* fix typo

* fix namespaces issue

* fix tests

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
alexwizp added a commit that referenced this pull request Aug 30, 2021
…elays on Elasticsearch (#99031) (#110447)

* [Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch

Part of: #93770

* fix CI

* fix typo

* fix namespaces issue

* fix tests

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
jloleysens added a commit to jloleysens/kibana that referenced this pull request Aug 30, 2021
…eporting-to-v2

* 'master' of github.com:elastic/kibana: (120 commits)
  [Lens] should register "suffix" field formatter in setup lifecycle (elastic#110218)
  skip flaky suite (elastic#98463)
  skip flaky suite (elastic#108633)
  [Data Table] Expensive queries are causing unnecessary load and delays on Elasticsearch (elastic#98903)
  fixes failing tests (elastic#110436)
  [TSVB] Remove deprecated `IFieldType` (elastic#110404)
  [Lens] Remove deprecated `IFieldType` (elastic#109825)
  [Vega] Expensive queries are causing unnecessary load and delays on Elasticsearch (elastic#99023)
  [Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch (elastic#99031)
  [TSVB] Expensive queries are causing unnecessary load and delays on Elasticsearch (elastic#98914)
  Don't add split part of UI if we have one series (elastic#109483)
  [Discover] Migrate angular routing to react router (elastic#107042)
  [Security Solution][Endpoint][Event Filters] Fixes missing spacers between event filters cards (elastic#110282)
  [ML] Data Grid: Fix alignment of sorting arrow when histogram charts are enabled (elastic#110053)
  [canvas] Fix image argument form issues (elastic#109767)
  Fix asset in Pitch template (elastic#109742)
  chore(NA): moving @kbn/securitysolution-list-api to babel transpiler (elastic#110265)
  chore(NA): moving @kbn/securitysolution-list-constants to babel transpiler (elastic#110269)
  [Fleet] Fix upgrade link in Fleet policy table (elastic#110228)
  [ML] APM Latency Correlations: Fix empty state (elastic#109813)
  ...

# Conflicts:
#	src/plugins/data/common/query/timefilter/types.ts
FrankHassanabad added a commit that referenced this pull request Feb 15, 2022
…int in Time) and restructuring of folders (#124912)

## Summary

Changes the usage collector telemetry within security solutions to use PIT (Point in Time) and a few other bug fixes and restructuring.

* The main goal is to change the full queries for up to 10k items to be instead using 1k batched items at a time and PIT (Point in Time). See [this ticket](#93770) for more information and [here](#99031) for an example where they changed there code to use 1k batched items. I use PIT with SO object API, searches, and then composite aggregations which all support the PIT. The PIT timeouts are all set to 5 minutes and all the maximums of 10k to not increase memory more is still there. However, we should be able to increase the 10k limit at this point if we wanted to for usage collector to count beyond the 10k. The initial 10k was an elastic limitation that PIT now avoids.
* This also fixes a bug where the aggregations were only returning the top 10 items instead of the full 10k. That is changed to use [composite aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html). 
* This restructuring the folder structure to try and do [reductionism](https://en.wikipedia.org/wiki/Reductionism#In_computer_science) best we can. I could not do reductionism with the schema as the tooling does not allow it. But the rest is self-repeating in the way hopefully developers expect it to be. And also make it easier for developers to add new telemetry usage collector counters in the same fashion.
* This exchanges the hand spun TypeScript types in favor of using the `caseComments` and the `Sanitized Alerts` and the `ML job types` using Partial and other TypeScript tricks.
* This removes the [Cyclomatic Complexity](https://en.wikipedia.org/wiki/Cyclomatic_complexity) warnings coming from the linters by breaking down the functions into smaller units.
* This removes the "as casts" in all but 1 area which can lead to subtle TypeScript problems.
* This pushes down the logger and uses the logger to report errors and some debug information

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Visualizations Generic visualization features (in case no more specific feature label is available) release_note:skip Skip the PR/issue when compiling release notes Team:Visualizations Visualization editors, elastic-charts and infrastructure v7.16.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants