Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(csv codec): added additional csv encoding options #18149

Closed
wants to merge 8 commits into from

Conversation

scMarkus
Copy link
Contributor

@scMarkus scMarkus commented Aug 3, 2023

Closes #17261

This will be a Draft for now since I potentially found a bug in the existing implementation. This commit so far includes the discussed config changes which I happily accept critique for since I am quite new to rust. Furthermore it contains additional tests like correct_quoting which might be surface a bug in the current implementation?

To fix this behavior I asked here for en enhancement in the respective csv lib. Opinions on that are appreciated as well

@netlify
Copy link

netlify bot commented Aug 3, 2023

Deploy Preview for vector-project canceled.

Name Link
🔨 Latest commit 71dcd47
🔍 Latest deploy log https://app.netlify.com/sites/vector-project/deploys/64d777416adfd70007af9d67

@netlify
Copy link

netlify bot commented Aug 3, 2023

Deploy Preview for vrl-playground ready!

Name Link
🔨 Latest commit 71dcd47
🔍 Latest deploy log https://app.netlify.com/sites/vrl-playground/deploys/64d77741d38efb00086ce56c
😎 Deploy Preview https://deploy-preview-18149--vrl-playground.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@jszwedko jszwedko requested a review from pront August 3, 2023 16:08
Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @scMarkus for submitting this PR!

/// In some variants of CSV, quotes are escaped using a special escape character
/// like \ (instead of escaping quotes by doubling them).
///
/// To use this `double_uotes` needs to be disabled as well
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// To use this `double_uotes` needs to be disabled as well
/// To use this `double_quote` needs to be disabled as well

/// like \ (instead of escaping quotes by doubling them).
///
/// To use this `double_uotes` needs to be disabled as well
pub escape: u8,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, worth documenting what happens when double_quote is enabled.

Comment on lines 99 to 104
#[derive(Debug, Clone)]
pub struct CsvSerializer {
delimiter: u8,
double_quote: bool,
escape: u8,
fields: Vec<ConfigTargetPath>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified:

Suggested change
#[derive(Debug, Clone)]
pub struct CsvSerializer {
delimiter: u8,
double_quote: bool,
escape: u8,
fields: Vec<ConfigTargetPath>,
#[derive(Debug, Clone, Default)]
pub struct CsvSerializer {
config: CsvSerializerConfig,

Comment on lines 109 to 115
pub fn new(conf: CsvSerializerConfig) -> Self {
Self {
delimiter: conf.csv.delimiter,
double_quote: conf.csv.double_quote,
escape: conf.csv.escape,
fields: conf.csv.fields,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you accept the suggestion above, then this can be removed.

Suggested change
pub fn new(conf: CsvSerializerConfig) -> Self {
Self {
delimiter: conf.csv.delimiter,
double_quote: conf.csv.double_quote,
escape: conf.csv.escape,
fields: conf.csv.fields,
}

}

#[test]
fn custom_delimiter() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for providing these tests!

If custom_delimiter fails due to #17261 we can (1) make sure this test passes and (2) make a note on current behavior vs desired behavior.

Copy link
Contributor Author

@scMarkus scMarkus Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view it is the other way around. Writing custom_escape_char I could not get it to work. Digging into that I wrote custom_delimiter as well as found out about this issue which I think is a bug in the current version of vector?
EDIT: I miss read the comment. In fact I messed up in the initial description already. The delimiter test in is not failing but correct_quoting is (I will edit the description). custom_delimiter runs successfully.

opts.fields = fields;
opts.delimiter = b' ';
opts.double_quote = true;
//opts.escape = b'\'';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete if not used.

.delimiter(self.delimiter)
.double_quote(self.double_quote)
.escape(self.escape)
.terminator(csv::Terminator::Any(b'\0')) // TODO: this needs proper 'nothig' value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to address TODO

@pront
Copy link
Member

pront commented Aug 4, 2023

Hi @scMarkus, whenever you want this reviewed again, please mark it as "ready for review".

@neuronull neuronull added the meta: awaiting author Pull requests that are awaiting their author. label Aug 4, 2023
@scMarkus
Copy link
Contributor Author

scMarkus commented Aug 5, 2023

Thanks for the offer @pront.

As thinks stand at the moment I would like to request your guidance in regards to how to properly proceed with the bug at hand (which I really want to be fixed since I intend to utilize csv quoting myself). To proof that omitting the line terminator in rust-csv works I created a patch myself here which works as intended for me locally. Still I am unsure on this being the correct way of doing things as well as on the acceptance criteria on said pull request.

Would it be reasonable to maintain this patch in the vector repository for the time being? Or ignore the bug at the moment and simply implement the new configuration feature only? Or any better strategy you may come up with?

@pront
Copy link
Member

pront commented Aug 7, 2023

Would it be reasonable to maintain this patch in the vector repository for the time being? Or ignore the bug at the moment and simply implement the new configuration feature only? Or any better strategy you may come up with?

Let's focus on completing this config feature. Also, document expected caveats.

As for the current_quoting test, make sure it passes and add a link to the rust-csv issue. When we update to a fixed version it will prompt us to update the test and the docs. Feel free to ping me if you hear back from rust-csv maintainers!

@scMarkus scMarkus force-pushed the csv_configurable branch 4 times, most recently from 5da7233 to d2d4bea Compare August 8, 2023 18:52
@scMarkus
Copy link
Contributor Author

scMarkus commented Aug 8, 2023

@pront I tried to document the situation as much as possible in the code. If there is any special syntax for referencing related issues or pull request please let me know. Additional I would like to ask you to have another detailed look at the implementation. I can see some more test failing but I am not quite sure what those are related to.

@scMarkus scMarkus marked this pull request as ready for review August 10, 2023 13:01
@bits-bot
Copy link

bits-bot commented Aug 11, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ scMarkus
✅ jszwedko
❌ robot-vector-vic
You have signed the CLA already but the status is still pending? Let us recheck it.

@pront
Copy link
Member

pront commented Aug 11, 2023

Hi @scMarkus, thank you for efforts on this PR. I think this looking pretty good, there are some details left to address specifically about the CsvSerializerOptions comments.

Also, I pinged the maintainer of rust-csv to see if we can get an update on your suggested fix 🤞

jszwedko and others added 8 commits August 12, 2023 10:37
* Initial

Signed-off-by: ktf <krunotf@gmail.com>

* Fixes

Signed-off-by: ktf <krunotf@gmail.com>

* Fixes

Signed-off-by: ktf <krunotf@gmail.com>

* Tests

Signed-off-by: ktf <krunotf@gmail.com>

* Add docs

Signed-off-by: ktf <krunotf@gmail.com>

* Add semantic

Signed-off-by: ktf <krunotf@gmail.com>

* Move url

Signed-off-by: ktf <krunotf@gmail.com>

* Fix url

Signed-off-by: ktf <krunotf@gmail.com>

* Add request docs

Signed-off-by: ktf <krunotf@gmail.com>

* Add batch docs

Signed-off-by: ktf <krunotf@gmail.com>

* Bump

Signed-off-by: ktf <krunotf@gmail.com>

* Clippy

Signed-off-by: ktf <krunotf@gmail.com>

* Apply feedback

Signed-off-by: ktf <krunotf@gmail.com>

* Apply feedback

Signed-off-by: ktf <krunotf@gmail.com>

* Add use

Signed-off-by: ktf <krunotf@gmail.com>

* Bump

Signed-off-by: ktf <krunotf@gmail.com>
found potential bug on writing lines with quoted fields
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta: awaiting author Pull requests that are awaiting their author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make CSV encoder configurable
6 participants