enhancement(csv codec): added additional csv encoding options #18149
Thank you @scMarkus for submitting this PR!
/// In some variants of CSV, quotes are escaped using a special escape character
/// like \ (instead of escaping quotes by doubling them).
///
/// To use this `double_uotes` needs to be disabled as well
- /// To use this `double_uotes` needs to be disabled as well
+ /// To use this `double_quote` needs to be disabled as well
/// like \ (instead of escaping quotes by doubling them).
///
/// To use this `double_uotes` needs to be disabled as well
pub escape: u8,
Also, worth documenting what happens when `double_quote` is enabled.
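To make the two quote-escaping styles in this thread concrete, here is a minimal std-only sketch. `quote_field` is a hypothetical helper written for illustration; it is not the `csv` crate's implementation and not code from this PR. It always wraps the field in quotes and only handles embedded quote characters; how a literal escape character inside a field should be treated is left out. As far as I understand the `csv` crate documentation, the `escape` setting only takes effect when `double_quote` is disabled, which is the behavior the sketch mirrors.

```rust
/// Hypothetical helper, for illustration only (not the `csv` crate's code).
/// Wraps `field` in `quote` characters and escapes embedded quotes in one of
/// two styles:
/// - `double_quote == true`: an embedded quote is written twice;
/// - `double_quote == false`: an embedded quote is prefixed with `escape`.
fn quote_field(field: &str, quote: char, escape: char, double_quote: bool) -> String {
    let mut out = String::with_capacity(field.len() + 2);
    out.push(quote);
    for c in field.chars() {
        if c == quote {
            // With doubling enabled, `escape` is never consulted.
            out.push(if double_quote { quote } else { escape });
        }
        out.push(c);
    }
    out.push(quote);
    out
}
```

With `double_quote = true`, the field `say "hi"` becomes `"say ""hi"""`; with `double_quote = false` and `\` as the escape character, it becomes `"say \"hi\""`.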
#[derive(Debug, Clone)]
pub struct CsvSerializer {
    delimiter: u8,
    double_quote: bool,
    escape: u8,
    fields: Vec<ConfigTargetPath>,
This can be simplified:
- #[derive(Debug, Clone)]
- pub struct CsvSerializer {
-     delimiter: u8,
-     double_quote: bool,
-     escape: u8,
-     fields: Vec<ConfigTargetPath>,
+ #[derive(Debug, Clone, Default)]
+ pub struct CsvSerializer {
+     config: CsvSerializerConfig,
pub fn new(conf: CsvSerializerConfig) -> Self {
    Self {
        delimiter: conf.csv.delimiter,
        double_quote: conf.csv.double_quote,
        escape: conf.csv.escape,
        fields: conf.csv.fields,
    }
If you accept the suggestion above, then this can be removed.
- pub fn new(conf: CsvSerializerConfig) -> Self {
-     Self {
-         delimiter: conf.csv.delimiter,
-         double_quote: conf.csv.double_quote,
-         escape: conf.csv.escape,
-         fields: conf.csv.fields,
-     }
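A rough sketch of what the simplification suggested above could look like: the serializer stores the whole config instead of copying each option into its own field. The types below are simplified, hypothetical stand-ins (the real `fields` is a `Vec<ConfigTargetPath>`, and the default values here are assumed for illustration only). Keeping `new` as a thin constructor and adding accessor methods are my assumptions, not part of the review suggestion.

```rust
// Hypothetical, simplified stand-in for the options nested under `csv`.
#[derive(Debug, Clone)]
pub struct CsvSerializerOptions {
    pub delimiter: u8,
    pub double_quote: bool,
    pub escape: u8,
    pub fields: Vec<String>, // stand-in for Vec<ConfigTargetPath>
}

impl Default for CsvSerializerOptions {
    fn default() -> Self {
        // Assumed defaults, chosen only so the sketch compiles and runs.
        Self {
            delimiter: b',',
            double_quote: true,
            escape: b'"',
            fields: Vec::new(),
        }
    }
}

#[derive(Debug, Clone, Default)]
pub struct CsvSerializerConfig {
    pub csv: CsvSerializerOptions,
}

/// The serializer keeps the config as a single field, so adding a new
/// option no longer requires touching the struct or its constructor.
#[derive(Debug, Clone, Default)]
pub struct CsvSerializer {
    config: CsvSerializerConfig,
}

impl CsvSerializer {
    /// Thin constructor (an assumption; the field-by-field version is gone).
    pub fn new(config: CsvSerializerConfig) -> Self {
        Self { config }
    }

    /// Options are read through the stored config when needed.
    pub fn delimiter(&self) -> u8 {
        self.config.csv.delimiter
    }

    pub fn double_quote(&self) -> bool {
        self.config.csv.double_quote
    }
}
```

The trade-off is that every read goes through `self.config.csv`, in exchange for not having to keep two structs in sync.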
}

#[test]
fn custom_delimiter() {
Thank you for providing these tests!
If `custom_delimiter` fails due to #17261, we can (1) make sure this test passes and (2) make a note on current behavior vs. desired behavior.
From my point of view it is the other way around. When writing `custom_escape_char`, I could not get it to work. Digging into that, I wrote `custom_delimiter` and found out about this issue, which I think is a bug in the current version of Vector?
EDIT: I misread the comment. In fact, I already messed up in the initial description. The delimiter test is not failing, but `correct_quoting` is (I will edit the description). `custom_delimiter` runs successfully.
opts.fields = fields;
opts.delimiter = b' ';
opts.double_quote = true;
//opts.escape = b'\'';
Delete if not used.
.delimiter(self.delimiter)
.double_quote(self.double_quote)
.escape(self.escape)
.terminator(csv::Terminator::Any(b'\0')) // TODO: this needs proper 'nothig' value
Reminder to address TODO
Force-pushed from de307c8 to 109f1c1.
Hi @scMarkus, whenever you want this reviewed again, please mark it as "ready for review".
Thanks for the offer, @pront. As things stand at the moment, I would like to request your guidance on how to properly proceed with the bug at hand (which I really want fixed, since I intend to use csv quoting myself). To prove that omitting the line terminator in [...] Would it be reasonable to maintain this patch in the Vector repository for the time being? Or ignore the bug for now and implement only the new configuration feature? Or any better strategy you may come up with?
Let's focus on completing this config feature. Also, document expected caveats. As for the [...]
Force-pushed from 5da7233 to d2d4bea.
@pront I tried to document the situation as much as possible in the code. If there is any special syntax for referencing related issues or pull requests, please let me know. Additionally, I would like to ask you to take another detailed look at the implementation. I can see some more tests failing, but I am not quite sure what those are related to.
Force-pushed from d2d4bea to eb45586.
Hi @scMarkus, thank you for your efforts on this PR. I think this is looking pretty good; there are some details left to address, specifically about the [...] Also, I pinged the maintainer of [...]
* Initial
* Fixes
* Fixes
* Tests
* Add docs
* Add semantic
* Move url
* Fix url
* Add request docs
* Add batch docs
* Bump
* Clippy
* Apply feedback
* Apply feedback
* Add use
* Bump

Signed-off-by: ktf <krunotf@gmail.com>
found potential bug on writing lines with quoted fields
Force-pushed from 697e558 to 71dcd47.
Closes #17261
This will be a draft for now, since I may have found a bug in the existing implementation. So far this commit includes the discussed config changes, for which I happily accept critique since I am quite new to Rust. It also contains additional tests such as `correct_quoting`, which might surface a bug in the current implementation. To fix this behavior, I asked here for an enhancement in the respective csv lib. Opinions on that are appreciated as well.