Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify quotes in changes.csv #768

Open
tschmidtb51 opened this issue Aug 12, 2024 · 2 comments · May be fixed by #839
Open

Clarify quotes in changes.csv #768

tschmidtb51 opened this issue Aug 12, 2024 · 2 comments · May be fixed by #839
Assignees
Labels
csaf 2.1 csaf 2.1 work editor-revision already worked on in the editor revision

Comments

@tschmidtb51
Copy link
Contributor

Currently, it is not clear whether quotes are mandatory in the changes.csv or not. We should have a clear statement about that.

Currently, I tend to say: "All fields SHOULD be quoted." However, we need to check whether that is allowed by the csv-RFC.

@tschmidtb51 tschmidtb51 added tc-discussion-needed csaf 2.1 csaf 2.1 work investigation needed anything that needs some investigation labels Aug 12, 2024
@tschmidtb51 tschmidtb51 self-assigned this Aug 12, 2024
@tschmidtb51
Copy link
Contributor Author

Here is the investigation result from @oxisto:

CSV Quoting

The relevant section for quoting is RFC 4180, Section 2.

Support of Quoted Fields

Section 2 Paragraph 5 states "Each field may or may not be enclosed in double quotes"
This means that anyone is free to quote fields and a valid CSV parser must be able to understand it.

Mandatory Quotes

Furthermore, it states "If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.".
This means that if any part of the value itself has double quotes, the field MUST be quoted. As far as I see there is no scenario where a value of the changes.csv actually contains a double quote, since the values it includes are filenames and timestamps, both cannot contain double quotes.

Similarly, Section 2 Paragraph 6 states "Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.". This means that in addition to above, also values containing line breaks and commas must be quoted.

Section 2 Paragraph 7 states "If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.". Example: "aaa","b""bb","ccc"

State of Implementation

Go

Go has the package encoding/csv. It supports reading quoted and unquoted fields. It even supports reading quotes in unquoted columns (using the LazyQuotes), although that is not really recommended by the standard. It does however only write quoted fields if it necessary (see https://cs.opensource.google/go/go/+/refs/tags/go1.23.3:src/encoding/csv/writer.go;l=62-64 and https://cs.opensource.google/go/go/+/refs/tags/go1.23.3:src/encoding/csv/writer.go;l=148-184). A force quote option was denied (golang/go#12755).

Java/Kotlin

Kotlin has a third-party library (https://github.com/jsoizo/kotlin-csv), it supports a "quote.mode" of ALL or CANONICAL, the latter being the default.

Python

The csv module has quite a lot of different options for reading and writing quotes. When setting the option, csv.QUOTE_ALL, all fields are quoted. The default option is csv.QUOTE_MINIMAL, which behaves like specified in the RFC Section 2, only quoting if necessary.

Rust

Rust has a community crate csv for CSV reading and writing. It also has a QuoteStyle. It can be set to Always, which always quotes, the default is Necessary, which is also similar to the RFC in Section 2.

Bash

Bash has no out-of-the box support for reading CSV, but it can be achieved with a multitude of different tools, such as read or awk or cut. Reading unquoted CSV files however is quote easy since for example read -r or cut -d , can easily split lines just seperated by comma or semicolon.

Note: In general, direct usage of bash reading input from an external source should always be discouraged because of the possibility of CSV injection attacks! Instead, a programming language should be used in order to avoid direct injection of shell commands.

Pro vs. Con

Con Always Quoted

  • Not necessary because of the type of the fields used in changes.csv
  • Not all (albeit the most) programming languages support writing "all" quotes
  • It deviates from the standard convention

Pro Always Quoted

  • Might be considered "cleaner" for a human to read
  • It is debatable, whether quoting helps with CSV Injections, because it is not trivially done with just double quoting but needs more quoting of special characters

@tschmidtb51 tschmidtb51 removed tc-discussion-needed investigation needed anything that needs some investigation labels Nov 20, 2024
@tschmidtb51
Copy link
Contributor Author

tschmidtb51 commented Nov 20, 2024

Suggested phrasing

for https://docs.oasis-open.org/csaf/csaf/v2.0/os/csaf-v2.0-os.html#7113-requirement-13-changescsv
(Thanks to @bernhardreiter for suggesting the phrasing below)

The changes.csv is a valid comma separated values format as defined by RFC 4180 without double quotes.

potentially for more verbosity or a rational:

(As consequence of requirements 2 and 11, there MUST NOT be characters within the file that would require quoting.)

alternatively to the previous sentence:

Note: As a consequence of 7.1.2 Requirement 2 for filenames and 7.1.11 Requirement 11 for directory names, there are no double-quote (") characters, because there is no necessity for quoting.

Note for implementors:
As CSAF 2.0 requires quotes, a RFC 4180 parser can read both format revisions.

tschmidtb51 added a commit to tschmidtb51/csaf that referenced this issue Nov 20, 2024
- addresses parts of oasis-tcs#768
- change changes.csv to non-quoted
- clarify wording
- add normative reference to RFC 4180
@tschmidtb51 tschmidtb51 added the editor-revision already worked on in the editor revision label Nov 20, 2024
tschmidtb51 added a commit to tschmidtb51/csaf that referenced this issue Dec 2, 2024
- addresses parts of oasis-tcs#768
- change changes.csv to non-quoted
- clarify wording
- add normative reference to RFC 4180
@tschmidtb51 tschmidtb51 linked a pull request Dec 2, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
csaf 2.1 csaf 2.1 work editor-revision already worked on in the editor revision
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant