Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(forwarder): improve CSV handling #324

Merged
merged 1 commit into from
Jul 29, 2024
Merged

fix(forwarder): improve CSV handling #324

merged 1 commit into from
Jul 29, 2024

Conversation

jta
Copy link
Contributor

@jta jta commented Jul 29, 2024

A few simple performance fixes for converting CSV to JSON:

  • avoid unmarshalling if target is json.RawMessage, since that calls json.Valid.
  • avoid calling strconv.Quote where possible. Most of the inputs we handle in practice do not require special handling.
  • keep track of max record length and use it to initialize our buffer, which avoids resizing.

In profiling, these changes reduced processing time for a VPC flow logs file by ~33%.

@jta jta force-pushed the joao/csv branch 2 times, most recently from 238cca6 to 993878b Compare July 29, 2024 17:22
@jta jta marked this pull request as ready for review July 29, 2024 18:29
}

// avoid unmarshalling if possible
if r, ok := v.(*json.RawMessage); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no additional tests for this new branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually the only case we test today, since both the tests and the runner use json.RawMessage. Arguably we could add a testcase for the unused path, but that was tested prior so we do know it works. We would be ensuring that it continues to work going forward. I've update the test to verify this slow path. The test results differ due to key reordering (when we go through json.Unmarshal, keys are ordered alphabetically)

A few simple performance fixes for converting CSV to JSON:
- avoid unmarshalling if target is json.RawMessage, since that calls
  `json.Valid`.
- avoid calling `strconv.Quote` where possible. Most of the inputs we
  handle in practice do not require special handling.
- keep track of max record length and use it to initialize our buffer,
  which avoids resizing.

In profiling, these changes reduced processing time for a VPC flow logs
file by ~33%.
@jta jta merged commit 36b455a into main Jul 29, 2024
15 checks passed
@jta jta deleted the joao/csv branch July 29, 2024 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants