Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(query): Add NestedCheckpointReader for input format parser #6385

Merged
merged 8 commits into from
Jul 1, 2022

Conversation

youngsofun
Copy link
Member

@youngsofun youngsofun commented Jul 1, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

methods in TypeDeserializer may be called recursively, and in each recursive call, it may need to checkpoint.

add a new NestedCheckpointReader, and use it throughout the parse progress.

Changelog

  • Bug Fix

Related Issues

Fixes #6353

@vercel
Copy link

vercel bot commented Jul 1, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Updated
databend ⬜️ Ignored (Inspect) Jul 1, 2022 at 0:34AM (UTC)

@mergify
Copy link
Contributor

mergify bot commented Jul 1, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added the pr-bugfix this PR patches a bug in codebase label Jul 1, 2022
@mergify

This comment was marked as outdated.

@youngsofun youngsofun changed the title fix(query): Add NestedCheckpointReader fix(query): Add NestedCheckpointReader for input format parser Jul 1, 2022
use super::BufferRead;

// this struct can avoid NestedCheckpointReader<NestedCheckpointReader<R>> in recursive functions
pub struct NestedCheckpointReader<R: BufferRead> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is thatCheckpointReader renaming to NestedCheckpointReader ?

Copy link
Member Author

@youngsofun youngsofun Jul 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it can maintain multi checkpoints.
a big diff implementation:
keep using the buffer instead of preadd to the buffer of the inner reader when rollback

} else {
reader.rollback_to_checkpoint()?;
reader.pop_checkpoint();
self.inner.de_text_quoted(reader, format)?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can not construct a tmp CheckpointReader only when needed because the recursive call to de_text_quoted() , compiler complain about potential type CheckpointReader<CheckpointReader<CheckpointReader<...>>>

self.checkpoints.push(self.pos)
}

// todo(youngsofun): add CheckPointGuard to make it safer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be great to have a Guard

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do it in the next pr, with some other minor refactors. let`s fix the bug first.

@databend-bot
Copy link
Member

Wait for another reviewer approval

@databend-bot
Copy link
Member

CI Passed
Reviewers Approved
Let's Merge
Thank you for the PR @youngsofun

@databend-bot databend-bot merged commit bff565e into databendlabs:main Jul 1, 2022
format: &FormatSettings,
) -> Result<()> {
reader.push_checkpoint();
Copy link
Member Author

@youngsofun youngsofun Jul 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not find a good way to impl a guard for checkpoint

@sundy-li

  1. most of the time, need to pop_checkpoint early like here, guard does not help.
  2. guard need to hold a mut ref to impl Drop, so the flowing code need to use guard.reader.read(), it is not convenient. and can not pass it to inner consumer (like de_text_json here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-bugfix this PR patches a bug in codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

streaming_load just lost first 'N'
4 participants