Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[data.search] Use incrementCounter for search telemetry #91230

Merged
merged 10 commits into from
Feb 18, 2021

Conversation

lukasolson
Copy link
Member

@lukasolson lukasolson commented Feb 11, 2021

Summary

Resolves #89744.

When reporting search telemetry, we query a saved object (type search-telemetry), which, prior to this PR, had successCount, errorCount, and averageDuration, and forward these to the telemetry cluster.

When search requests complete, prior to this PR, we would query the saved object, and update each of these properties in the saved object. This often resulted in many 409 conflict errors, because of race conditions in updating the saved object. In addition, there was a bug in how we calculated the averageDuration.

This PR updates the collection to use incrementCounter, which uses a script to update the counts instead of querying/updating the saved object, which should get rid of the race condition. It also replaces averageDuration in the collected saved object with totalDuration, and uses incrementCounter to increment that as well. Then, when reporting to the telemetry cluster, we query the saved object, and calculate the averageDuration, which is then reported to the telemetry cluster.

Open questions

Since averageDuration was being calculated incorrectly, and since the saved object will now have totalDuration instead, we need to "reset" the values for each of the properties stored in the saved object. How do we accomplish this?

@lukasolson lukasolson added review Feature:Search Querying infrastructure in Kibana Feature:Telemetry v8.0.0 Team:AppServices release_note:skip Skip the PR/issue when compiling release notes v7.12.0 labels Feb 11, 2021
@lukasolson lukasolson self-assigned this Feb 11, 2021
@lukasolson lukasolson requested a review from a team as a code owner February 11, 2021 22:32
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app-services (Team:AppServices)

Copy link
Contributor

@TinaHeiligers TinaHeiligers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a telemetry POV.
As for

"reset" the values for each of the properties stored in the saved object

You'd need a SO migration for that unless we simply throw the 'old' SO out and recreate the new one after the upgrade. I speak under correction here though, I'm not a SO expert! @Bamieh might be able to give more advice.

trackSuccess: async (duration: number) => {
const repository = await getRepository();
await repository.incrementCounter(SAVED_OBJECT_ID, SAVED_OBJECT_ID, [
{ fieldName: 'successCount' },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, we only migrate SO between versions id the data from one version to the next needs to be persisted (i.e. end-user facing SO). Since telemetry-related SO is fetched once every 24 hours and indexed and that hasn't changed, it doesn't really matter if we blow the SO object out the water and reinitialize a new one when the cluster upgrades. If you did want to persist the data, then I guess you'd need to add a migration. @Bamieh WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with tina. if a transformation is not possible of the old data simply using a new saved object id and dropping this one is fine. We can drop the existing data in the same savedobject type via migrations.

const migrate713 = (doc) => {
  return {
    ...doc,
    attributes: {},
  };
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this PR with a migration as suggested, @Bamieh could you take one more look? Thanks!

Copy link
Member

@Bamieh Bamieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Bamieh
Copy link
Member

Bamieh commented Feb 18, 2021

@elasticmachine merge upstream

},
]);
} catch (e) {
if (e.statusCode === 409 && retryCount < MAX_RETRY_COUNT) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use SavedObjectsErrorHelpers.isConflictError(err) ?

{ fieldName: 'errorCount' },
]);
} catch (e) {
if (e.statusCode === 409 && retryCount < MAX_RETRY_COUNT) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SavedObjectsErrorHelpers.isConflictError(err)

Copy link
Contributor

@lizozom lizozom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use SavedObjectsErrorHelpers.isConflictError(err) to detect conflicts, but otherwise LGTM

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@lukasolson lukasolson merged commit 2408d00 into elastic:master Feb 18, 2021
lukasolson added a commit to lukasolson/kibana that referenced this pull request Feb 18, 2021
* [data.search] Use incrementCounter for search telemetry

* Update reported type

* Retry conflicts

* Fix telemetry check

* Use saved object migration to drop previous document

* Review feedback

* Fix import

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
lukasolson added a commit to lukasolson/kibana that referenced this pull request Feb 18, 2021
* [data.search] Use incrementCounter for search telemetry

* Update reported type

* Retry conflicts

* Fix telemetry check

* Use saved object migration to drop previous document

* Review feedback

* Fix import

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
lukasolson added a commit that referenced this pull request Feb 19, 2021
)

* [data.search] Use incrementCounter for search telemetry

* Update reported type

* Retry conflicts

* Fix telemetry check

* Use saved object migration to drop previous document

* Review feedback

* Fix import

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
lukasolson added a commit that referenced this pull request Feb 19, 2021
)

* [data.search] Use incrementCounter for search telemetry

* Update reported type

* Retry conflicts

* Fix telemetry check

* Use saved object migration to drop previous document

* Review feedback

* Fix import

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
gmmorris added a commit to gmmorris/kibana that referenced this pull request Feb 19, 2021
* master: (111 commits)
  [Logs UI] Replace dependencies in the infra bundle (elastic#91503)
  [Search Source] Do not request unmapped fields if source filters are provided (elastic#91921)
  [APM] Kql Search Bar suggests values outside the selected time range (elastic#91918)
  Refactored component edit policy tests into separate folders and using client integration testing setup (elastic#91657)
  [Fleet] Don't error on missing package_assets value (elastic#91744)
  [Lens] Pass used histogram interval to chart (elastic#91370)
  [Indexpattern management] Use indexPatterns Service instead of savedObjects client (elastic#91839)
  [Security Solutions] Fixes Cypress tests for indicator match by making the selectors more specific (elastic#91947)
  [CI] backportrc can skip CI (elastic#91886)
  Revert "[SOM] fix flaky suites (elastic#91809)"
  [Fleet] Install Elastic Agent integration by default during setup (elastic#91676)
  [Fleet] Silently swallow 404 errors when deleting ingest pipelines (elastic#91778)
  [data.search] Use incrementCounter for search telemetry (elastic#91230)
  [Fleet] Bootstrap functional test suite (elastic#91898)
  [Alerts][Docs] Added API documentation for alerts plugin (elastic#91067)
  Use correct environment in anomaly detection setup link (elastic#91877)
  [FTSR] Convert to tasks and add jest/api integration suites (elastic#91770)
  [CI] Build and publish storybooks (elastic#87701)
  docs: add PHP agent info to docs (elastic#91773)
  [DOCS] Adds and updates Visualization advanced settings (elastic#91904)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Search Querying infrastructure in Kibana Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes review v7.12.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Telemetry] Possible race condition in stack_stats search reporting
6 participants