CSV writer improvements #5604

Merged
merged 5 commits into from
Jun 12, 2024

Conversation

rcaudy
Member

@rcaudy rcaudy commented Jun 11, 2024

  1. Change CSV writing code to use chunk-based reading code and stop allocating boxed primitives.
  2. Correct some trivial warnings.
  3. Fix column header separator-escaping bug.
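For readers unfamiliar with item 3, the following is a minimal sketch of what separator-escaping for header fields usually means in CSV output. It is not the code from this PR and does not reproduce the actual bug; the class `CsvHeaderSketch` and its methods are hypothetical names, and the quoting rule follows the common RFC 4180 convention.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the Deephaven CSV writer.
final class CsvHeaderSketch {

    // Quote a field when it contains the separator, a double quote, or a line break.
    // Embedded double quotes are doubled, as in RFC 4180.
    static String escape(String field, char separator) {
        boolean needsQuoting = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Build the header line, escaping column names exactly like data fields.
    static String headerLine(List<String> columnNames, char separator) {
        return columnNames.stream()
                .map(name -> escape(name, separator))
                .collect(Collectors.joining(String.valueOf(separator)));
    }

    public static void main(String[] args) {
        // prints: Sym,"Price,USD",Note
        System.out.println(headerLine(List.of("Sym", "Price,USD", "Note"), ','));
    }
}
```

The point of the sketch is that column names go through the same escaping path as cell values, so a name containing the separator cannot split into two header fields.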

@rcaudy rcaudy added the core (Core development tasks), NoDocumentationNeeded, csv, and ReleaseNotesNeeded (Release notes are needed) labels Jun 11, 2024
@rcaudy rcaudy added this to the June 2024 milestone Jun 11, 2024
@rcaudy rcaudy requested a review from lbooker42 June 11, 2024 20:49
@rcaudy rcaudy self-assigned this Jun 11, 2024
@rcaudy
Member Author

rcaudy commented Jun 12, 2024

Unit tests have been updated to provide full coverage for the new/changed code.
Some basic testing suggests significant speedup in cases where chunk filling is more performant than serial get.
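To make the "chunk filling versus serial get" distinction concrete, here is a small self-contained sketch. It does not use Deephaven's actual chunk or column source classes; `DoubleColumn`, `get`, and `fillChunk` are hypothetical stand-ins, assumed only for illustration. The serial path boxes one `Double` per row, while the chunked path copies primitives into a reusable `double[]`.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical illustration only; these are not Deephaven types.
final class ChunkedWriteSketch {

    interface DoubleColumn {
        long size();
        Double get(long row);                        // serial access: one boxed value per call
        int fillChunk(long firstRow, double[] dest); // bulk access: returns number of values copied
    }

    static DoubleColumn of(double... values) {
        return new DoubleColumn() {
            public long size() { return values.length; }
            public Double get(long row) { return values[(int) row]; }
            public int fillChunk(long firstRow, double[] dest) {
                int n = (int) Math.min(dest.length, values.length - firstRow);
                System.arraycopy(values, (int) firstRow, dest, 0, n);
                return n;
            }
        };
    }

    // Row-at-a-time writing: allocates a boxed Double for every row.
    static void writeSerial(DoubleColumn col, Writer out) throws IOException {
        for (long row = 0; row < col.size(); row++) {
            Double boxed = col.get(row);
            out.write(boxed.toString());
            out.write('\n');
        }
    }

    // Chunked writing: one reusable primitive buffer, no per-row boxing.
    static void writeChunked(DoubleColumn col, Writer out, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        double[] chunk = new double[chunkSize];
        long row = 0;
        while (row < col.size()) {
            int filled = col.fillChunk(row, chunk);
            for (int i = 0; i < filled; i++) {
                out.write(Double.toString(chunk[i]));
                out.write('\n');
            }
            row += filled;
        }
    }

    // Convenience wrapper so callers do not have to handle IOException.
    static String chunkedToString(DoubleColumn col, int chunkSize) {
        StringWriter out = new StringWriter();
        try {
            writeChunked(col, out, chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    static String serialToString(DoubleColumn col) {
        StringWriter out = new StringWriter();
        try {
            writeSerial(col, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        DoubleColumn col = of(1.5, 2.0, 3.25);
        // Both paths produce the same text; only the access pattern differs.
        System.out.println(serialToString(col).equals(chunkedToString(col, 2)));
    }
}
```

Whether the chunked path is actually faster depends on the underlying source, which matches the hedge in the comment above: the gain shows up where bulk filling is cheaper than repeated single-row access.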

@rcaudy rcaudy requested review from devinrsmith and removed request for lbooker42 June 12, 2024 16:20
Member

@devinrsmith devinrsmith left a comment

I see that using Table.columnIterators would be more expensive because they don't use a shared context and have individual keys; but that makes me wish we had an iterator-like construction that could be as efficient as this.

ColumnsIterator it = table.columnsIterator();
OfByte bytes = it.byteColumn("Foo");
OfDouble doubles = it.doubleColumn("Bar");
while (it.hasNext()) {
  byte myByte = bytes.getByte();
  double myDouble = doubles.getDouble();
  it.advance();
}
it.close();

Having something like this would make it easier IMO to splay out to these sorts of row-oriented formats without having to worry as much about low-level chunking details.

Happy to approve PR otherwise, but had a few Qs.

@rcaudy
Member Author

rcaudy commented Jun 12, 2024

That's a clever idea. It might be a lot easier to use for many use cases. It obviously wouldn't be a Java Iterator, though, which makes it less suitable for other use cases.
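As a rough illustration of the idea under discussion, the sketch below implements a cursor over in-memory arrays in which all column views share one row position and are refilled one chunk at a time. Nothing here is an existing Deephaven API: `ColumnsCursorSketch`, `byteColumn`, `doubleColumn`, `hasNext`, and `advance` are hypothetical names modeled on the pseudocode in the review comment, and the "table" is just a map from column name to a primitive array. Here `hasNext()` is taken to mean "the cursor is on a valid row", which is one reason this is a cursor and not a `java.util.Iterator`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not an existing Deephaven API.
final class ColumnsCursorSketch implements AutoCloseable {

    interface OfByte { byte getByte(); }
    interface OfDouble { double getDouble(); }

    private final Map<String, Object> columns; // column name -> byte[] or double[]
    private final long size;
    private final int chunkSize;
    private final List<Runnable> refills = new ArrayList<>();
    private long chunkStart = 0; // first row of the current chunk
    private int chunkLen;        // number of valid rows in the current chunk
    private int offset = 0;      // shared position within the current chunk

    ColumnsCursorSketch(Map<String, Object> columns, long size, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.columns = columns;
        this.size = size;
        this.chunkSize = chunkSize;
        this.chunkLen = (int) Math.min(chunkSize, size);
    }

    OfByte byteColumn(String name) {
        byte[] source = (byte[]) columns.get(name);
        byte[] chunk = new byte[chunkSize];
        Runnable refill = () -> System.arraycopy(source, (int) chunkStart, chunk, 0, chunkLen);
        refill.run();        // fill for the current chunk position
        refills.add(refill); // refill again whenever the cursor moves to a new chunk
        return () -> chunk[offset];
    }

    OfDouble doubleColumn(String name) {
        double[] source = (double[]) columns.get(name);
        double[] chunk = new double[chunkSize];
        Runnable refill = () -> System.arraycopy(source, (int) chunkStart, chunk, 0, chunkLen);
        refill.run();
        refills.add(refill);
        return () -> chunk[offset];
    }

    // True while the cursor is positioned on a valid row.
    boolean hasNext() {
        return offset < chunkLen;
    }

    // Move every column view to the next row; refill all buffers at chunk boundaries.
    void advance() {
        if (++offset < chunkLen) {
            return;
        }
        chunkStart += chunkLen;
        chunkLen = (int) Math.min(chunkSize, size - chunkStart);
        offset = 0;
        refills.forEach(Runnable::run);
    }

    @Override
    public void close() {
        refills.clear(); // a real implementation would release shared contexts here
    }

    // Render two columns row by row as "byte,double" lines.
    static String dump(byte[] foo, double[] bar, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        Map<String, Object> table = Map.of("Foo", foo, "Bar", bar);
        try (ColumnsCursorSketch it = new ColumnsCursorSketch(table, foo.length, chunkSize)) {
            OfByte bytes = it.byteColumn("Foo");
            OfDouble doubles = it.doubleColumn("Bar");
            while (it.hasNext()) {
                sb.append(bytes.getByte()).append(',').append(doubles.getDouble()).append('\n');
                it.advance();
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump(new byte[] {1, 2, 3}, new double[] {0.5, 1.5, 2.5}, 2));
    }
}
```

The caller sees a row-oriented loop, while the chunk size and refill logic stay inside the cursor, which is the convenience the review comment was asking for.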

devinrsmith
devinrsmith previously approved these changes Jun 12, 2024
@rcaudy rcaudy merged commit bf2fdec into deephaven:main Jun 12, 2024
15 checks passed
@rcaudy rcaudy deleted the rwc-csvwriter branch June 12, 2024 20:44
@github-actions github-actions bot locked and limited conversation to collaborators Jun 12, 2024