Reduce over raw files - EOF doesn't terminate inputs #2374

Open
eddyashton opened this issue Nov 23, 2021 · 7 comments
@eddyashton

eddyashton commented Nov 23, 2021

This looks like a bug, but perhaps I'm just missing something obvious. I'm trying to combine multiple raw (non-JSON) files and embed them in a JSON object. These files are ultimately found by a glob, so I don't think I can use --rawfile without extensive bash machinery on top; instead I need to reduce them with -R to read each line, and recombine them based on input_filename.

$ jq -Rn 'reduce inputs as $line ({}; .[input_filename] += [$line])' foo.txt bar.txt baz.txt

This almost works, but the last line of each file (except the final one) is merged with the first line of the next file and lands in the wrong list:

{
  "foo.txt": [
    "First line of foo.",
    "I'm a text file with multiple lines!"
  ],
  "bar.txt": [
    "Last line of foo.First line of bar.",
    "I'm also a text file!"
  ],
  "baz.txt": [
    "Last line of bar.First line of baz.",
    "I'm a third text file.",
    "Last line of baz."
  ]
}

It took me a while to work out, but this is because my files don't end with a trailing newline. If I add one to the test files, it works correctly, i.e., if I rewrite foo.txt from:

First line of foo.\nI'm a text file with multiple lines!\nLast line of foo.

to

First line of foo.\nI'm a text file with multiple lines!\nLast line of foo.\n

But I can't guarantee that the real files will end with a trailing newline. Surely inputs should 'split' at EOF, as well as at each newline? It clearly does for the final file, since we get a final entry for the last line there, so why does it combine the entries from earlier files, across EOF marks?
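To see which files would trigger this, a quick check along these lines may help. It is only a sketch: the function name `missing_final_newline` and the demo file names are made up for illustration, and it relies on command substitution stripping a trailing newline, so the result of `tail -c 1` is empty exactly when the file is empty or already ends in a newline.

```shell
# Print the name of each given file whose last byte is not a newline.
# "$(tail -c 1 FILE)" is empty when FILE is empty or ends in "\n",
# because command substitution strips trailing newlines.
missing_final_newline() {
  for f in "$@"; do
    if [ -n "$(tail -c 1 "$f")" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Demo on throwaway files (hypothetical names) in a temp directory.
demo_dir=$(mktemp -d)
printf 'a\nb'   > "$demo_dir/unterminated.txt"   # no trailing newline
printf 'a\nb\n' > "$demo_dir/terminated.txt"     # ends with a newline
missing_final_newline "$demo_dir"/*.txt
```

Only the unterminated file should be listed; those are the files whose last line gets glued onto the next file's first line.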

Just to confirm, this also happens with direct invocations of input:

$ jq -Rn '[input] + [input] + [input] + [input] + [input] + [input] + [input]' foo.txt bar.txt baz.txt
[
  "Last line of baz.",
  "I'm a third text file.",
  "Last line of bar.First line of baz.",
  "I'm also a text file!",
  "Last line of foo.First line of bar.",
  "I'm a text file with multiple lines!",
  "First line of foo."
]
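Since the final file's last line is emitted correctly, one possible workaround is to run jq once per file, so every file is "the final file", and then merge the results. This is a minimal sketch under that assumption, not a fix for the underlying behaviour; `combine_raw_files`, `$name`, and the sample file contents are made up for illustration.

```shell
# Run jq once per file so each file's unterminated last line is flushed
# at that file's own EOF, then merge the per-file objects with `add`.
combine_raw_files() {
  for f in "$@"; do
    # --arg passes the file name in as $name; [inputs] collects its lines.
    jq -Rn --arg name "$f" '{($name): [inputs]}' "$f"
  done | jq -s 'add'
}

# Demo on throwaway files without trailing newlines.
demo_dir=$(mktemp -d)
printf 'foo1\nfoo2' > "$demo_dir/foo.txt"
printf 'bar1\nbar2' > "$demo_dir/bar.txt"
(cd "$demo_dir" && combine_raw_files foo.txt bar.txt)
```

With these sample files, each file's last line should stay under its own key. The cost is one jq process per file, which may matter for very large globs.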
@eddyashton
Author

This is the behaviour on both 1.5 and 1.6, afaict.

@eddyashton
Author

Another data point - this weirdness on the last line of a file also affects input_line_number, which is off-by-one on the final line of the final file:

$ jq -Rn 'reduce inputs as $line ({}; .[input_filename] += [input_line_number, $line])' foo.txt bar.txt baz.txt
{
  "foo.txt": [
    1,
    "First line of foo.",
    2,
    "I'm a text file with multiple lines!"
  ],
  "bar.txt": [
    1,
    "Last line of foo.First line of bar.",
    2,
    "I'm also a text file!"
  ],
  "baz.txt": [
    1,
    "Last line of bar.First line of baz.",
    2,
    "I'm a third text file.",
    2,
    "Last line of baz."
  ]
}

@xguo-prestolabs

echo -n first line > a.txt
echo second line > b.txt
jq -R '.' a.txt b.txt   # gives "first linesecond line"
gojq -R '.' a.txt b.txt  # prints 2 lines, as expected

@pkoppstein
Contributor

pkoppstein commented Nov 25, 2021

Please note that, for better or worse, jq behaves just like cat:

$ cat foo.txt bar.txt
foo1
foo2bar1
bar2

In other words, stringing together file names on the jq command line is more akin to running cat than grep. Call that a bug if you wish, but jq's input functions all generally ignore EOF between files, e.g. with the single byte 1 in a file named one.txt, and the single byte 2 in two.txt:

$ jq . one.txt two.txt

yields the single number: 12

So for jq 1.3 through jq 1.6, this perhaps undesirable behavior is, for most intents and purposes, a "feature" given the "backwards compatibility" constraint for X.Y versions.
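The cat comparison is easy to reproduce without jq. In this sketch the file contents are an assumption inferred from the output shown above (foo.txt lacking a trailing newline), and the files are created in a temp directory:

```shell
# Recreate the inputs: foo.txt has no trailing newline, bar.txt has one.
demo_dir=$(mktemp -d)
printf 'foo1\nfoo2'   > "$demo_dir/foo.txt"
printf 'bar1\nbar2\n' > "$demo_dir/bar.txt"

# cat joins the byte streams as-is, so "foo2" and "bar1" share a line.
cat "$demo_dir/foo.txt" "$demo_dir/bar.txt"
# foo1
# foo2bar1
# bar2

# Same effect with two one-byte files: the bytes "1" and "2" become "12".
printf '1' > "$demo_dir/one.txt"
printf '2' > "$demo_dir/two.txt"
cat "$demo_dir/one.txt" "$demo_dir/two.txt"
```

The second cat prints 12 with no newline after it, which is the same byte stream jq parses as a single number in the example above.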

@eddyashton
Author

@pkoppstein Thanks for the clarification. So my proposed change in #2375 does actually break backwards compatibility here, as it now splits the input at EOF when processing raw input:

$ jq . one.txt two.txt
12
$ jq -R . one.txt two.txt
"12"
$ ./jq . one.txt two.txt
12
$ ./jq -R . one.txt two.txt
"1"
"2"

@emanuele6
Member

emanuele6 commented Nov 1, 2023

Another related problem is that jq normally ignores U+FEFF at the start of a file, but if you pass multiple files, it only ignores it at the start of the first file.

$ jq . /dev/stdin /dev/fd/3 <<<$'\ufeff'1 3<<<$'\ufeff'2
1
jq: parse error: Invalid numeric literal at line 3, column 0

@nicowilliams
Contributor

GitHub needs more emoticons, so we can express horror, for example.
