
feat(server): move otlp server data from runs to its own table #3028

Merged: 9 commits from otlp-server-db-table into main on Aug 4, 2023

Conversation

@schoren (Contributor) commented on Aug 3, 2023

This PR makes the OTLP server save data in its own table instead of directly updating the run table with the received spans.

This avoids bugs caused by moving the queues from in memory to external.

There wasn't a good place to put the traces repository we need, so I took the resources approach: I moved Trace out of the model package and into the traces package, and added the repo there. As a result, a lot of the changed files only update references to the new traces.Trace and related types.
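
For illustration, here is a minimal sketch of what such a traces repository could look like; the method names and signatures are assumptions, not the exact interface added in this PR:

```go
package traces

import "context"

// Sketch only: the real TraceRepository in this PR may expose different methods.
type TraceRepository interface {
	// Insert persists the spans received by the OTLP server for a trace.
	Insert(ctx context.Context, trace Trace) error
	// Get returns the stored trace so the poller/executor can read it later.
	Get(ctx context.Context, traceID string) (Trace, error)
}
```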

Changes

  • moved model.Trace and related structs into traces.Trace
  • added a traces.TraceRepository used to store and access otlp server db data
  • updated otlp.Ingestor to interface with the traces.TraceRepository

Checklist

  • tested locally
  • added new dependencies
  • updated the docs
  • added a test

```diff
 	if !traceDB.ShouldRetry() {
 		return true, "TraceDB is not retryable"
 	}

 	maxTracePollRetry := job.PollingProfile.Periodic.MaxTracePollRetry()
 	// we're done if we have the same amount of spans after polling or `maxTracePollRetry` times
 	log.Printf("[PollerExecutor] Test %s Run %d: Job count %d, max retries: %d", job.Test.ID, job.Run.ID, job.EnqueueCount(), maxTracePollRetry)
-	if job.EnqueueCount() == maxTracePollRetry {
+	if job.EnqueueCount() >= maxTracePollRetry {
```
schoren (author):

There was a weird bug where a job got incorrectly requeued after hitting the max-retries limit: since the new EnqueueCount was bigger than maxTracePollRetry and the check was for equality, it got stuck in a polling loop forever.
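
For illustration (not project code), the fixed guard in isolation: with an equality check, a count that jumps past the limit never matches and the job is requeued forever, while `>=` stops at or beyond the limit.

```go
// Illustration only, not the actual poller code.
func shouldStopPolling(enqueueCount, maxTracePollRetry int) bool {
	return enqueueCount >= maxTracePollRetry // e.g. count 7, limit 5 -> stop
}
```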

"trace_id" varchar not null primary key,
"tenant_id" uuid,
"trace" JSONB,
"created_at" timestamp NOT NULL DEFAULT NOW()
schoren (author):

This field gets autofilled on create, and rows are always inserted: for updates we use a delete+insert approach, so we'll always have the correct value.

I think this might be useful for deleting old records from time to time.
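
A hedged sketch of that delete+insert flow, using the table and columns from the migration above (the helper and its signature are assumptions, not the repository's actual code):

```go
package example

import (
	"context"
	"database/sql"
)

// upsertTrace sketches the delete+insert update path described above.
// created_at is filled by the NOW() default on every insert, so it always
// reflects the latest write for that trace.
func upsertTrace(ctx context.Context, db *sql.DB, tenantID, traceID string, traceJSON []byte) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op if the tx was committed

	if _, err := tx.ExecContext(ctx,
		`DELETE FROM otlp_traces WHERE trace_id = $1 AND tenant_id = $2`,
		traceID, tenantID,
	); err != nil {
		return err
	}

	if _, err := tx.ExecContext(ctx,
		`INSERT INTO otlp_traces (tenant_id, trace_id, trace) VALUES ($1, $2, $3)`,
		tenantID, traceID, traceJSON,
	); err != nil {
		return err
	}

	return tx.Commit()
}

// A periodic cleanup based on created_at could then be as simple as
// (the retention window is an arbitrary example):
//   DELETE FROM otlp_traces WHERE created_at < NOW() - INTERVAL '30 days'
```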

schoren (author):

the code in this file was not being used

schoren (author):

this file was merged into server/traces/traces_test.go

```go
}

if run.State != test.RunStateAwaitingTrace {
	// test is not waiting for trace, so we can completely ignore those as they might
```
schoren (author):

test integrity is now safe, since the results are saved to another table. the ingestor has no impact on the runs table
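
A rough sketch of the resulting ingestion path (function name, import paths, and the repository method here are assumptions): the OTLP path writes only to the traces repository, never to the runs table.

```go
package otlpsketch

import (
	"context"

	v1 "go.opentelemetry.io/proto/otlp/trace/v1"

	"github.com/kubeshop/tracetest/server/traces"
)

// ingestSpans is a hypothetical call site, not the PR's actual ingestor code:
// spans go into the otlp_traces table via the repository, so run state is
// never touched from the OTLP server.
func ingestSpans(ctx context.Context, repo traces.TraceRepository, spans []*v1.Span) error {
	return repo.Insert(ctx, traces.FromSpanList(spans))
}
```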

```go
	return FromSpanList(flattenSpans)
}

func FromSpanList(input []*v1.Span) Trace {
```
schoren (author):

this was extracted as an exported function so it could be reused by the ingestor more easily
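
A small usage sketch of the exported function; the import paths and the hand-built spans are assumptions for illustration, not code from this PR:

```go
package main

import (
	"fmt"

	v1 "go.opentelemetry.io/proto/otlp/trace/v1"

	"github.com/kubeshop/tracetest/server/traces"
)

func main() {
	// Hand-built parent/child spans standing in for an OTLP export payload.
	traceID := make([]byte, 16)
	traceID[15] = 0x01
	rootID := make([]byte, 8)
	rootID[7] = 0x01
	childID := make([]byte, 8)
	childID[7] = 0x02

	spans := []*v1.Span{
		{TraceId: traceID, SpanId: rootID, Name: "root", Status: &v1.Status{}},
		{TraceId: traceID, SpanId: childID, ParentSpanId: rootID, Name: "child", Status: &v1.Status{}},
	}

	// Because FromSpanList is exported, callers outside the traces package
	// (like the OTLP ingestor) can build a Trace without going through model.
	tr := traces.FromSpanList(spans)
	fmt.Printf("built a traces.Trace from %d spans\n", len(spans))
	_ = tr
}
```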

```go
)

func TestTraces(t *testing.T) {
```
schoren (author):

this was merged from server/model/traces_test.go

@schoren removed the request for review from mathnogueira on August 4, 2023 00:26
@xoscar (Contributor) left a comment:

Great job @schoren I really like that you moved the traces model and logic to its own package!

server/executor/default_poller_executor.go (review thread outdated and resolved)
```go
	ctx,
	`INSERT INTO otlp_traces (tenant_id, trace_id, trace) VALUES ($1, $2, $3)`,
	sqlutil.TenantID(ctx),
	trace.ID.String(),
```
xoscar (Contributor):

Philosophical question: should we rely on each vendor to have a unique TraceID?

schoren (author):

This is a great question. I can't think of a simple way to make this more unique and still be able to accurately query the table. I will make the PK include the tenant_id so it's at least unique among tenants. I guess at some point we'll need to relate it to something else, but maybe we should wait until we have this problem and better understand the use case behind it.

schoren (author):

Adding the tenant ID to the PK makes it non-nullable, so it's no good. All the queries use the tenant_id, so while this is not enforced at the DB level, it is at the app level, which is essentially the same.
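
For illustration, the assumed shape of such a tenant-scoped read, mirroring the INSERT above (the helper name and query are assumptions, not the repository's actual code):

```go
package example

import (
	"context"
	"database/sql"
)

// getTraceJSON shows the assumed read path: every query filters by tenant_id
// as well as trace_id, so per-tenant uniqueness is enforced in the application
// even though the table's primary key is trace_id alone.
func getTraceJSON(ctx context.Context, db *sql.DB, traceID, tenantID string) ([]byte, error) {
	var raw []byte
	err := db.QueryRowContext(ctx,
		`SELECT trace FROM otlp_traces WHERE trace_id = $1 AND tenant_id = $2`,
		traceID, tenantID,
	).Scan(&raw)
	return raw, err
}
```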

@schoren force-pushed the otlp-server-db-table branch from e3308be to d046ae4 on August 4, 2023 15:05
@schoren merged commit 64da9a7 into main on Aug 4, 2023
@schoren deleted the otlp-server-db-table branch on August 4, 2023 15:42