Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata telemetry not being recorded #2144

Closed
4 tasks
SpicyLemon opened this issue Sep 6, 2024 · 1 comment · Fixed by #2177
Closed
4 tasks

Metadata telemetry not being recorded #2144

SpicyLemon opened this issue Sep 6, 2024 · 1 comment · Fixed by #2177
Assignees
Labels
bug Something isn't working metadata Metadata Module
Milestone

Comments

@SpicyLemon
Copy link
Contributor

SpicyLemon commented Sep 6, 2024

Summary of Bug

None of the metadata telemetry stuff is actually being emitted.

Version

v1.19.1 (all the way back to v1.3.0 when it was "added").

Steps to Reproduce

  1. Connect a node to datadog.
  2. Try to find any telemetry info for the "metadata" key.

Note: I haven't actually tried those steps, but I'm certain telemetry isn't being recorded.

I found this by noticing that we are attempting to record telemetry using code like this:

	defer types.GetIncObjFunc(types.TLType_Scope, action)

That GetIncObjFunc function returns a func() that actually does the telemetry stuff.

When execution reaches that line, it queues up the call to types.GetIncObjFunc(types.TLType_Scope, action) to run at the end of the current function. So at the end, it calls GetIncObjFunc, which just creates a func(). But nothing tells go to actually execute that func, so it's simply discarded (similar to how defer iter.Close() just discards any error returned by iter.Close()).

If that line were this (note the extra () at the end):

	defer types.GetIncObjFunc(types.TLType_Scope, action)()

Then, when execution reaches that line, it would execute types.GetIncObjFunc(types.TLType_Scope, action), then it would queue up the resulting func() to run at the end of the current function. At the end, it'd then run the thing that updates telemetry before finishing it's return.

To verify that nothing is happening:

  1. I created a global var TelemetryCounter int in the events.go file.
  2. I updated GetIncObjFunc to increment TelemetryCounter at the end of the func() it returns (after the call(s) to telemetry.IncrCounterWithLabels).
  3. I updated some unit tests to check the TelemetryCounter value before and after executions that should be doing telemetry stuff.
  4. I wrote a simpler unit test that just did defer types.GetIncObjFunc(types.TLType_Scope, action).
  5. I found that the TelemetryCounter value was never changing.
  6. I then wrote a unit test that just did defer types.GetIncObjFunc(types.TLType_Scope, action)() and found that TelemetryCounter DID increment in that test.

The Problem

The GetIncObjFunc function returns a func() that updates telemetry data. However, the resulting func() is never being executed; it's being created then thrown away.

There's a couple ways to fix this.

  1. Add () to the end of all the invocations, e.g. defer types.GetIncObjFunc(types.TLType_Scope, action)().
  2. Refactor GetIncObjFunc to just do the telemetry updates instead of returning a function that does it; probably want to rename it too since it's now doing something instead of getting a function.

Further, I don't think we get anything by deferring these calls. In almost all cases, it's being done at the very end of the func anyway.

Further Consideration

It's probably worthwhile to have a discussion about whether this telemetry data is still wanted. I.e. should we fix this or delete that stuff?

Here's what it was supposed to track:

  • Changes in the number of metadata objects in state:
    • Type: counter.
    • Keys: "metadata", "stored-object".
    • Value: Either 1 if being created, or -1 if being deleted. If being updated, this telemetry entry isn't recorded; i.e. the value here is never 0.
    • Labels: Category, and Object Type (details below).
  • A count of each action performed for each metadata object type.
    • Type: counter.
    • Keys: "metadata", "object-action".
    • Value: 1 (always).
    • Labels: Category, Object Type, and Action (details below).

Labels:

  • Category:
    • Name: "category"
    • Possible Values: "entry", "specification", or "object-store-locator".
  • Object type:
    • Name: "object-type"
    • Possible Values: "scope", "session", "record", "scope-specification", "contract-specification", "record-specification", or "object-store-locator".
  • Action:
    • Name: "action"
    • Possible Values: "created", "updated", or "deleted".

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@SpicyLemon SpicyLemon added bug Something isn't working metadata Metadata Module labels Sep 6, 2024
@SpicyLemon SpicyLemon moved this from Todo to Backlog in Provenance Core Protocol Team Sep 6, 2024
@SpicyLemon SpicyLemon added this to the v1.20.0 milestone Sep 6, 2024
@SpicyLemon SpicyLemon moved this from Backlog to Todo in Provenance Core Protocol Team Sep 6, 2024
@iramiller
Copy link
Member

After the last discussion it probably doesn't make sense to track object counts in these kinds of metrics. Any data warehouse will already have a complete picture of the type and number of assets already and will not rely on these counters. If there was value here it would likely be in processing time counters that track how long it takes to validate records--but even there we can get this data based on overall request processing time which is handled separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working metadata Metadata Module
Projects
Development

Successfully merging a pull request may close this issue.

2 participants