Feature request: NUL-delimited output #1271
Comments
@charles-dyfis-net is it not simpler in this case to keep newline escaping instead of using raw output? That keeps a single item per line, which is easier to loop over in a shell script:
input.json
Filter
Command Line
$ jq '.[]' input.json
Output
Otherwise, you can actually add a character of your choice at the end of each line, directly from your Filter + NUL
Command Line + NUL
Output + NUL
Command Line + NUL as Raw (View as Hex)
Output + NUL as Raw (Viewed as Hex)
|
Thank you -- I actually have a few StackOverflow answers I'm going to want to amend in light of the patterns suggested in this ticket. That said, this still would be a desirable feature to have. Newline escaping requires the consumer's code to perform unescaping -- while The patterns given here are helpful: though |
If you use |
JSON (at least RFC 7159 JSON) does not permit unescaped ASCII control I'm not sure how you've come across this as an issue. Can you show me a
On Fri, Nov 11, 2016 at 10:17 AM Thedward Blevins notifications@github.com wrote:
@charles-dyfis-net If you use -j instead of -r then it won't output the newline (\u000a) |
@wtlangford, gladly. Consider the following contrived example:
...where the intended output is (something equivalent to -- not all ksh-derivative shells implement
Instead, as given above, the actual output is:
Now, to fix this, we can use NUL delimiters. That would modify our expression to be something like the following:
...and it does in fact work exactly as desired. The only problem is that it requires the user to use some idioms that aren't completely obvious unless they read this ticket. :) |
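The expressions from this comment did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of the NUL-delimiter idiom the ticket discusses, assuming bash (for `read -d ''` and arrays) and a jq that supports `-j`. The sample JSON and the variable names are illustrative and are not taken from the original comment.

```shell
#!/usr/bin/env bash
# Illustrative input: the first string contains an embedded newline.
json='["first\nitem", "second item"]'

items=()
# -j prints strings raw with no trailing newline; each value is followed by an explicit NUL.
while IFS= read -r -d '' item; do
  items+=("$item")
done < <(jq -j '.[] | (., "\u0000")' <<<"$json")

printf 'got %d items\n' "${#items[@]}"
printf '<<<%s>>>\n' "${items[@]}"
```

Note that, as raised later in the thread, a JSON string that itself contains `\u0000` would still be split by this loop.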
Ah. I see, you're using the raw output mode. It does, as you've found, I see your use case now. I'm not strictly averse to adding a new flag, but On Fri, Nov 11, 2016, 22:58 Charles Duffy notifications@github.com wrote:
|
@charles-dyfis-net you could also keep the list of values encoded as JSON, then use
#!/bin/sh
{
  # First jq run: emit each value as one line of JSON (newlines stay escaped).
  jq '.[] | .value' << INPUT_JSON
[
  {"value": "I am\na multiline\nvalue\twith a tab"},
  {"value": "I am a second value"}
]
INPUT_JSON
} | {
  while read -r jsonString
  do
    printf 'JSON Value: <<<%s>>>\n' "$jsonString"
    # Second jq run, once per value: decode the JSON string to raw text.
    printf 'Text Value: <<<%s>>>\n' "$( jq -r -n "$jsonString")"
  done
}
The conversion from JSON to text is done in |
@eric-brechemier, noted, though that's considerably less efficient than a single jq run. I think I'm entirely happy with @wtlangford's suggestion of treating this as a doc enhancement rather than a software enhancement -- now it's just a question of whether and when I have the time to assign this to myself and generate a wiki edit incorporating the many suggestions given here. :) |
@wtlangford without adding a new flag, you could repurpose the
|
It seems to me that the matter of enhancing jq to support "joining with NUL" is of rather low priority, and certainly much lower than several other issues (notably the release of jq 1.6). In any case, I suspect that most users who actually have the need to join with NUL can simply use the idiom:
That is, I suspect that most such users are working in an environment that has If using tr is not an option, then chances are that using the -c option in some other way, perhaps in conjunction with jq's support for @TSV and/or "\u0000", will suffice to solve the problem at hand. Rather than expending the very limited resources available on supporting NUL-as-delimiter, I believe it would be far better to enhance support for the application/json-seq MIME type. Specifically, it should be easy to use jq to accept a JSON stream as input but produce json-seq as output (and vice versa), but currently the (Note: To convert a stream of JSON texts to json-seq, one could use the form: jq -n --seq --slurpfile in <(STREAM) '$in[]' ) |
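The idiom referred to in this comment is not preserved in this copy. One plausible reading, offered only as an assumption, is to keep jq's compact JSON output (one JSON text per line) and translate the newline separators to NUL with `tr`. The sketch below uses hypothetical sample data and counts how many arguments a NUL-aware consumer receives.

```shell
#!/bin/sh
# Hypothetical sample data; -c keeps every JSON text on one line,
# so translating the newline separators to NUL is unambiguous.
json='[{"value": "I am\na multiline\nvalue"}, {"value": "I am a second value"}]'

count=$(
  printf '%s\n' "$json" |
    jq -c '.[] | .value' |
    tr '\n' '\0' |
    xargs -0 sh -c 'echo "$#"' sh
)
echo "items passed to the consumer: $count"
```

The items handed to the consumer are still JSON-encoded (quoted, with `\n` escapes), so a later step has to decode them if raw text is needed.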
@pkoppstein, |
@charles-dyfis-net - My point is that one can use jq -c (without the -r option) to insert the NULs, and then later on in the processing convert to "raw output" if that is really needed. |
@pkoppstein, ...so what you have then is essentially the same proposal offered by @eric-brechemier of using multiple passes, with the same performance overhead -- which is to say, the need to invoke a separate instance of |
@charles-dyfis-net - My comments were mainly directed to the question of whether joining with NUL is really needed, not to the example which you yourself described as contrived. For non-contrived problems, I suspect your concerns about efficiency are probably misplaced. Consider, for example, pipelines of the form:
while read -r line ; do MUNGE <<< "$line" | jq WHATEVER ; done < <(jq -c HEAVYLIFTING)
In realistic scenarios, the additional cost associated with the inner invocations of jq will almost certainly be relatively small, perhaps even to the point of insignificance if reasonable care is taken with the details. The real issue here is probably #147 |
@pkoppstein are you referring to this? |
@eric-brechemier - That does seem to be related. |
So, yeah, a -0 would actually be nice. |
Yes please... pretty, pretty please! |
I was thinking of working on this (it looks pretty simple), which option do people want?
Personally I think I would prefer the first one. |
I ended up implementing the first option, but I'll be happy to change the PR to the other option if people prefer that. |
Thanks! I suggested the second option to address the reluctance to introduce a new flag. |
When can we expect a release with this --nul-output support? |
Note that JSON strings can also contain null bytes ( |
The same security issue applies to the raw-output option, which could
contain LF `\n` characters.
There are several options for dealing with the security issue:
Document the issue in the manual page for the `-r` and `-0` options.
Error out when one of the output data items contains NUL or LF.
Add the length of the string as a prefix. There are language features
like `read -Nr` in bash that enable reading fixed length strings. That
isn't really useful though as shells use C strings which use NUL as a
terminator so they cannot represent data containing NUL as a variable.
It could be possible to load that into programs in other languages tho.
Add the individual output values as individual files in a directory.
Add options for other output formats that can escape NUL or LF chars.
…--
bye,
pabs
https://bonedaddy.net/pabs3/
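The second option in the list above (error out when an item contains the separator) can be approximated today inside the jq program itself. This is a minimal sketch, assuming the input is an array of strings; `emit_nul_checked` is a hypothetical helper name, not a jq feature.

```shell
#!/bin/sh
# emit_nul_checked: hypothetical helper. Reads a JSON array of strings on stdin and
# writes the items NUL-terminated, but only if no item contains a NUL code point itself.
emit_nul_checked() {
  jq -j '
    if any(.[]; any(explode[]; . == 0))
    then error("an item contains NUL; refusing to emit NUL-delimited output")
    else .[] | (., "\u0000")
    end'
}

if printf '%s\n' '["ok", "also\nok"]' | emit_nul_checked >/dev/null; then
  good=accepted
else
  good=rejected
fi

if printf '%s\n' '["ok", "bad\u0000value"]' | emit_nul_checked >/dev/null 2>&1; then
  bad=accepted
else
  bad=rejected
fi

echo "clean input: $good, input with embedded NUL: $bad"
```

Because the check runs over the whole array before the first item is written, nothing is emitted for an array that contains an offending value. The check is per input document, not across several inputs.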
|
To start with I've filed a pull request for documenting the issue:
#2350
If anyone wants to work on the other options that would be great.
…--
bye,
pabs
https://bonedaddy.net/pabs3/
|
Reported-by: @pcworld Reported-in: jqlang#1271 (comment)
There is no, and cannot be a solution or work-around to failing to properly encode data to the syntax of the consuming application. The only reason that The As to documentation, the news won't reach the audience that most needs it, they'll just cargo-cult some naïve code and suffer the consequences. If some guidance to the perplexed is to be delivered, it should be quite clear, that this is fundamentally a correctness issue that is germane to all programming languages and essentially all data formats. Yes, there can be security consequences to getting this wrong, but even absent a security issue, the result is liable to be wrong in various corner or even common cases. In terms of working with shell commands, the
and thus assuming the arguments are validated as part of building the shell command, one can be sure that the command is executed as intended, without deserialisation errors:
If the output is an SQL query, then the serialisation needs to be escaped correctly for the intended SQL dialect (perhaps not a job for JQ, and so one might pass JSON into some other tool that has an SQL API and can quote SQL data). So while I am not ultimately opposed to some mention of the issues in the docs, I don't think the currently pending PR is the right way to handle this. |
I wonder if the -0 option should just get removed. It is just as unsafe
as the -r option and people looking for a safe alternative to the -r
option are going to switch to -0 without knowing about NUL being
allowable in JSON strings. Probably also the -r option should be
deprecated or removed too, in favour of external programs checking and
transforming the JSON output of jq into the needed formats.
Do you want to keep -0 and -r? What changes to the docs do you suggest?
…--
bye,
pabs
https://bonedaddy.net/pabs3/
|
That would be my recommendation. IIRC it has not been released yet, and if so, it should not be released.
No, sorry, that would be completely unacceptable. It makes Just because some users are sloppy CANNOT mean that jq is then made unusable for everyone else. The cargo cultists can shoot themselves in the foot in any language, and They can also print raw strings in Python, Perl, ... and I don't see any warnings in those languages about the dangers of text output. |
OK, please revert the commit adding the -0 option and
document in the FAQ that -0 will not be added.
…--
bye,
pabs
https://bonedaddy.net/pabs3/
|
`--nul-output` / `-0` was removed RE: jqlang#1271 and jqlang#2350
@vdukhovni The You are suggesting that any code needs some formatting where actually here it is not the case: shell variables can hold any binary data that does not contain a NUL char, and pipes handle any binary data. As long as a program properly uses NUL as a separator and ensures that what it separates has no NUL char inside, you'll be orders of magnitude faster and safer than going through converters, formatters, and re-interpretation of data. As I see it, For these reasons, and for what it is worth, I'm not in favor of removing |
What is the compelling use-case for extracting a stream of multi-line strings from a JSON document to feed into a program that supports NUL-separated inputs? For I don't want to give users a false sense of security. Any "raw" output form (be it The suggestion to fail if an item for raw output already contains a NUL does provide some safety, at the cost of throwing errors that should have been handled in some manner before attempting to serialise the data in question as a NUL-separated (terminated) stream. If that's to be done, then one might argue that the same should be available (another option?) with newline-separated output, but even protecting against separator injection is not generally sufficient, sometimes injection of unexpected spaces or unexpected So if such a feature is to be provided, it should be more general:
Would support All that said, I am not convinced there are compelling practical and then sufficiently safe use-cases for this sort of feature. |
The idea is that you have some JSON data and want to safely pass parts of it to other programs via either stdin or command-line arguments. So you process the data with jq, output the data with a safe separator (usually NUL) and use xargs to convert stdin to command-line arguments. For extra safety you pass an option processing terminator before the arguments.
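The workflow described here can be sketched as follows. This is an illustration under assumptions: the file names are hypothetical, `rm` stands in for whatever command consumes the arguments, and `xargs -0` is assumed to be available (GNU and BSD xargs both provide it).

```shell
#!/bin/sh
# Work in a scratch directory so the example cannot touch real files.
workdir=$(mktemp -d) || exit 1

remaining=$(
  cd "$workdir" || exit 1
  # Hypothetical file names, including one that looks like an option and one with a space.
  touch -- '-rf' 'plain name' 'other'

  # Each name is written raw and followed by NUL; xargs -0 splits only on NUL,
  # and the "--" stops rm from treating "-rf" as options.
  printf '%s\n' '["-rf", "plain name"]' |
    jq -j '.[] | (., "\u0000")' |
    xargs -0 rm -f --

  ls -A
)
rm -rf -- "$workdir"
echo "left in scratch directory: $remaining"
```

Only the two names listed in the JSON are removed; the file named `-rf` is deleted as a file rather than parsed as flags.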
Agreed that injection attacks are always possible. The existing documentation for The handling of the failure when encountering output separators in the output data could be done by jq withholding all output until all of the input is processed. Or you could leave it to subsequent commands to handle the error exit code (likely via shell Without having the You can see here the original context where I personally wanted to use |
Ah, but if you just use |
Thanks for the examples. FWIW, instead of attempting to carefully serialise whatever happened to come in, I'd have restricted the values to a known safe subset
This is then safe to newline separate, and easier to work with. And I'd probably also take care with positional arguments that might look like short or long options, thus make sure to include a
Finally, it is still not clear to me whether the correct thing to do with unexpected values is to abort, or to just skip that value. |
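The filter from this comment is not preserved in this copy. A minimal sketch of the allowlist approach it describes, assuming a hypothetical pattern (lowercase letters, digits, `+`, `.` and `-`, starting with a letter or digit) and a jq built with regex support, might look like this. `safe_names` and `some-command` are illustrative names.

```shell
#!/bin/sh
# Hypothetical allowlist filter. \A and \z anchor to the whole string,
# so an embedded newline cannot sneak a partial match past the check.
safe_names() {
  jq -r '.[] | strings | select(test("\\A[a-z0-9][a-z0-9+.-]*\\z"))'
}

names=$(printf '%s\n' '["jq", "libjq1", "--force", "evil\nname", "two words"]' | safe_names)

# Every surviving value is newline-free by construction, so line-based reading is unambiguous.
printf '%s\n' "$names" | while IFS= read -r name; do
  printf 'would run: some-command -- %s\n' "$name"
done
```

This version silently skips values that fail the check; aborting instead would mean replacing `select(...)` with a conditional that calls `error`.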
The jq I've updated the wiki page to include your dpkg There could be an option for choosing to skip or abort on separator bytes in the output items, that could be made mandatory for The manual page is reasonably long as-is, so it feels OK to add a new section about safety in general, then the |
Ah, then do this: a) use This is much more general than |
That is a lot more complicated for folks who know shell much better than jq. |
I'm thinking we'll make |
Hmm, I thought jq always required JSON input, not randomly formatted input. Changing the meaning of an option is a major backwards compatibility issue too, so please don't do that. PS: my request to update the documentation to mention injection issues was already rejected in #2350, I can resubmit that if it is wanted. |
There's
There's no need to re-submit it. I'll review #2350. |
See also #2659. |
Repeating what I said in #2659, if
There is another option though: you could have a format filter I also like the idea suggested here, of having jq raise an error when the value to be output contains the terminator character. I agree with @vdukhovni's comments in #1271 that if you're about to output a value containing your separator, something else is probably wrong (e.g. proper input validation). |
I have no objections to FWIW, I use NUL for the ASCII code point and NULL for the pointer, but if that's considered obscure/esoteric by others, I can live with Finally, I am not sure whether this should throw an error, or just drop non-conforming inputs. I'd be inclined to silently drop them, and if someone wants errors, they can arrange for that with explicit checks, or we could have two versions:
With |
In #2660 we were discussing methods for handling errors — in that case in the input. I would personally prefer a more general way to specify what to do with encoding errors, rather than having multiple versions of each relevant output format. I'm also not in favour of silently dropping non-conforming characters as a default, as I'm against possibly surprising behaviour. Perhaps this could be the way to override the default error handling behaviour, inspired by #2660 (comment) :
Regarding 'NUL'/'null': The ASCII standard (also RFC 20) uses 'NUL' as an acronym for what it calls the 'null character'. ISO/IEC 6429:1992 does the same thing, and so do UNICODE, and POSIX, referring to the ISO standard. The C standard does not mention 'NUL' at all, and only talks about the 'null character'. ECMAScript mentions And for what it's worth, Wikipedia currently calls it the 'null character', 'often abbreviated as NUL (or NULL, though in some contexts that term is used for the null pointer)'. So I'd say 'null' is the name of the character/byte/code point, but 'NUL' is a common abbreviation, which has the advantage of being unambiguous. |
Right now, the standard-practice way to read an array from jq into a shell script is to use raw output and parse on newlines. However, JSON strings can contain literal newlines; this makes such parsing error-prone.
NUL-delimited output, allowing IFS= read -r -d '' string to read exactly one C string unambiguously, would resolve this.
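A minimal sketch of the failure mode and of the proposed reading loop, assuming bash (`read -d ''` is a bash/ksh/zsh extension, not POSIX). The two `printf` producers are stand-ins for newline-terminated raw output and for the requested NUL-terminated output; they are illustrative and do not invoke jq.

```shell
#!/usr/bin/env bash
# Two logical items; the first contains an embedded newline.
# emit_lines mimics newline-terminated raw output, emit_nul mimics NUL-terminated output.
emit_lines() { printf '%s\n' $'first\nitem' 'second item'; }
emit_nul()   { printf '%s\0' $'first\nitem' 'second item'; }

by_line=()
while IFS= read -r item; do
  by_line+=("$item")
done < <(emit_lines)

by_nul=()
while IFS= read -r -d '' item; do
  by_nul+=("$item")
done < <(emit_nul)

echo "newline-delimited: ${#by_line[@]} items"
echo "NUL-delimited:     ${#by_nul[@]} items"
```

The line-based loop sees three items because the embedded newline splits the first string, while the NUL-based loop recovers the original two.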