Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add rsources stats for dropped events at processor #3852

Merged
merged 3 commits into from
Sep 21, 2023

Conversation

Sidddddarth
Copy link
Member

@Sidddddarth Sidddddarth commented Sep 11, 2023

Description

  • All events failed during the processor's pipeline (failed, filtered) are treated as dropped with respect to rudder-sources stats.
  • For dropped events we are not capturing failed records

Linear Ticket

Ensure drop events are marked as failed in source [job-status endpoint]

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@@ -2216,7 +2216,8 @@ func (proc *Handle) transformSrcDest(
var successCountMetadataMap map[string]MetricMetadata
eventsToTransform, successMetrics, successCountMap, successCountMetadataMap = proc.getDestTransformerEvents(response, commonMetaData, eventsByMessageID, destination, transformer.UserTransformerStage, trackingPlanEnabled, transformationEnabled)
failedJobs, failedMetrics, failedCountMap := proc.getFailedEventJobs(response, commonMetaData, eventsByMessageID, transformer.UserTransformerStage, transformationEnabled, trackingPlanEnabled)
proc.saveFailedJobs(failedJobs)
droppedJobs := proc.getDroppedJobs(response, eventList)
proc.saveFailedJobs(append(failedJobs, droppedJobs...))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that for rudder sources' use cases both failed and dropped jobs in processor's pipeline could be considered as dropped (i.e. non-retryable)

  1. We could rename
  • proc.saveFailedJobs to proc.saveDroppedJobs
  • rsources.NewFailedJobsCollector to rsources.NewDroppedJobsCollector
  • rsources.JobsFailed to rsources.JobsDropped
  1. Dropped jobs failed keys shouldn't be captured
  2. It would be better to report rsources dropped jobs during Store, once for all scenarios as the last operation of the method, after storing router, batchrouter & proc_errors

func (proc *Handle) getDroppedJobs(response transformer.Response, eventsToTransform []transformer.TransformerEvent) []*jobsdb.JobT {
// each messageID is one event when sending to the transformer
inputMessageIDs := lo.Map(eventsToTransform, func(e transformer.TransformerEvent, _ int) string {
return e.Metadata.MessageID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused why processor's pipeline operates on the assumption that messageIDs are unique. Dedup is a feature that can be turned off, right? Am I missing some other important fact with respect to message ids and the processor's pipeline?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that is correct but dedup ideally should never be turned off

Also for many of the downstream destinations like Warehouse they do operate under the assumption that the primary key is the messageID which is unique

@Sidddddarth Sidddddarth force-pushed the chore.rsourcesStatsForDroppedEvents branch 2 times, most recently from 20511d7 to 6c16717 Compare September 15, 2023 07:40
@Sidddddarth Sidddddarth requested a review from a team September 15, 2023 09:30
@Sidddddarth Sidddddarth force-pushed the chore.rsourcesStatsForDroppedEvents branch from 9eadca3 to 0dbe30f Compare September 15, 2023 12:17
@codecov
Copy link

codecov bot commented Sep 15, 2023

Codecov Report

Patch coverage: 88.76% and project coverage change: -0.08% ⚠️

Comparison is base (1fa2f45) 69.07% compared to head (a544978) 69.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3852      +/-   ##
==========================================
- Coverage   69.07%   69.00%   -0.08%     
==========================================
  Files         351      352       +1     
  Lines       52814    52858      +44     
==========================================
- Hits        36480    36473       -7     
- Misses      14043    14087      +44     
- Partials     2291     2298       +7     
Files Changed Coverage Δ
...ger/transformation/transformationStatusUploader.go 45.97% <0.00%> (+2.96%) ⬆️
processor/processor.go 87.50% <89.28%> (-0.01%) ⬇️
processor/events_response.go 91.48% <91.48%> (ø)
processor/transformer/transformer.go 90.39% <100.00%> (+0.17%) ⬆️
services/rsources/stats_collector.go 90.90% <100.00%> (+0.18%) ⬆️

... and 20 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Sidddddarth Sidddddarth force-pushed the chore.rsourcesStatsForDroppedEvents branch 2 times, most recently from 6514b03 to dbbd0f7 Compare September 15, 2023 14:40
@Sidddddarth Sidddddarth merged commit f5b8e7b into master Sep 21, 2023
@Sidddddarth Sidddddarth deleted the chore.rsourcesStatsForDroppedEvents branch September 21, 2023 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants