Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Flag to Avoid Treating NUL Separated Input as Binary #2974

Closed
LangLangBart opened this issue May 29, 2024 · 11 comments · Fixed by #2976
Closed

Add Flag to Avoid Treating NUL Separated Input as Binary #2974

LangLangBart opened this issue May 29, 2024 · 11 comments · Fixed by #2976
Labels
feature-request New feature or request

Comments

@LangLangBart
Copy link

LangLangBart commented May 29, 2024

Discussed in #2971


Issue

Currently, running a command like the following will print a warning:

printf "First\0" | bat -p
SCR-20240529-trdo

The warning is defined in src/printer.rs:

bat/src/printer.rs

Lines 435 to 444 in 8f8c953

if !self.config.style_components.header() {
if Some(ContentType::BINARY) == self.content_type && !self.config.show_nonprintable {
writeln!(
handle,
"{}: Binary content from {} will not be printed to the terminal \
(but will be present if the output of 'bat' is piped). You can use 'bat -A' \
to show the binary file contents.",
Yellow.paint("[bat warning]"),
input.description.summary(),
)?;

The decision to label the input as BINARY seems to be made in src/input.rs:

bat/src/input.rs

Lines 260 to 271 in 8f8c953

let mut first_line = vec![];
reader.read_until(b'\n', &mut first_line).ok();
let content_type = if first_line.is_empty() {
None
} else {
Some(content_inspector::inspect(&first_line[..]))
};
if content_type == Some(ContentType::UTF_16LE) {
reader.read_until(0x00, &mut first_line).ok();
}

A hacky workaround is to make the first line empty, use bat, and then remove the first line:

printf "\nFirst\0" | bat -p | sed '1d'

Proposed solution

A new flag that doesn't label content_type as BINARY when the first line ends with a NUL byte:

# naming the flag '--text' to align with 'grep/git diff'
printf "First\0" | bat -p --text

The crate 1 used to determine if content is binary states:

//! encoding). Note that **this analysis can fail**. For example, even if unlikely, UTF-8-encoded
//! text can legally contain NULL bytes. Conversely, some particular binary formats (like binary

Based on this, a --text flag would be very appropriate, similar to how grep and git diff have one as well.

printf "First\0" | grep 'First'
# grep: (standard input): binary file matches

printf "First\0" | grep --text 'First'
# First

Footnotes

  1. sharkdp/content_inspector: Fast inspection of binary buffers to guess/determine the type of content

@LangLangBart LangLangBart added the feature-request New feature or request label May 29, 2024
@domenicomastrangelo
Copy link

Hi @LangLangBart, would your issue be fixed by adding the flag -A to show non printable characters?

This would result in the following output:

image

@LangLangBart
Copy link
Author

LangLangBart commented May 30, 2024

by adding the flag -A

Thanks for the suggestion. I failed to mention this in the issue report here and only described it in the linked discussion. For my use case, the -A/--show-all flag would not be adequate.

I try to colorize my zsh history and pipe it into fzf.

# zsh only, the '-N'  flag separates the array elements by `NUL`
print -rNC1 -- "${(@uv)history}" | bat -pl zsh | fzf --read0

@domenicomastrangelo
Copy link

My bad, I missed the discussion link.

So the issue you're having, is that when printing something (in this case a line from the history file), in case it has a null char in it, it will give an error.

It feels like a very nieche problem to have, but I think it could be fixed, as you said, adding a --read0 or --read-null-bytes flag.

I could work on this as I'm looking for my first contribution to the project, but it would be good to have an opinion from a more senior contributor too :)

@keith-hall
Copy link
Collaborator

I'm personally in favor of the idea, but it would be great to wait for input from some of the other maintainers before spending time on it, in case we don't all agree 😉

@LangLangBart
Copy link
Author

It feels like a very niche problem to have, but I think it could be fixed, as you said, adding a --read0 or --read-null-bytes flag.

I have updated the description, and I would propose a --text flag to align with grep and git diff.

wait for input from some of the other maintainers

Agreed, we should wait for input from some of the maintainers.

@sharkdp
Copy link
Owner

sharkdp commented May 30, 2024

Sounds good to me. Let's think about making this an option, not a flag. Maybe there are other reasonable options that we want to add later (apart from a yes or no decision). Like whether or not we print that warning.

@LangLangBart
Copy link
Author

LangLangBart commented May 31, 2024

Project goals and alternatives
...
Be a drop-in replacement for (POSIX) cat

Question: Why was the binary message added at all ?

EDIT1: I found the reason in #248, and #336


Let's think about making this an option, not a flag.

How about this?
If the input was labeled binary, check if the first_line would also be labeled binary if the last char is not there, don't label the input as binary?

How about --input={text,auto,…} ?

@domenicomastrangelo
Copy link

great, so @LangLangBart you propose --input to specify which language to use for printing or did I get it wrong?

so basically (more or less)

  if input not set
     let mut first_line = vec![]; 
     reader.read_until(b'\n', &mut first_line).ok(); 
      
     let content_type = if first_line.is_empty() { 
         None 
     } else { 
         Some(content_inspector::inspect(&first_line[..])) 
     }; 
      
     if content_type == Some(ContentType::UTF_16LE) { 
         reader.read_until(0x00, &mut first_line).ok(); 
     }
  else
    content_type = get_content_type_from_input(input)
  endif

@LangLangBart
Copy link
Author

@domenicomastrangelo

@einfachIrgendwer0815 started already a PR.

Besides the color, it works well. Image below comparing 0.7.1 vs their PR.

@einfachIrgendwer0815
Copy link
Contributor

@LangLangBart the color issue should be fixed now

@domenicomastrangelo
Copy link

nice :)

tmeijn pushed a commit to tmeijn/dotfiles that referenced this issue Jan 8, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [sharkdp/bat](https://github.com/sharkdp/bat) | minor | `v0.24.0` -> `v0.25.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>sharkdp/bat (sharkdp/bat)</summary>

### [`v0.25.0`](https://github.com/sharkdp/bat/blob/HEAD/CHANGELOG.md#v0250)

[Compare Source](sharkdp/bat@v0.24.0...v0.25.0)

#### Features

-   Set terminal title to file names when Paging is not Paging::Never [#&#8203;2807](sharkdp/bat#2807) ([@&#8203;Oliver-Looney](https://github.com/Oliver-Looney))
-   `bat --squeeze-blank`/`bat -s` will now squeeze consecutive empty lines, see [#&#8203;1441](sharkdp/bat#1441) ([@&#8203;eth-p](https://github.com/eth-p)) and [#&#8203;2665](sharkdp/bat#2665) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   `bat --squeeze-limit` to set the maximum number of empty consecutive when using `--squeeze-blank`, see [#&#8203;1441](sharkdp/bat#1441) ([@&#8203;eth-p](https://github.com/eth-p)) and [#&#8203;2665](sharkdp/bat#2665) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   `PrettyPrinter::squeeze_empty_lines` to support line squeezing for bat as a library, see [#&#8203;1441](sharkdp/bat#1441) ([@&#8203;eth-p](https://github.com/eth-p)) and [#&#8203;2665](sharkdp/bat#2665) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Syntax highlighting for JavaScript files that start with `#!/usr/bin/env bun` [#&#8203;2913](sharkdp/bat#2913) ([@&#8203;sharunkumar](https://github.com/sharunkumar))
-   `bat --strip-ansi={never,always,auto}` to remove ANSI escape sequences from bat's input, see [#&#8203;2999](sharkdp/bat#2999) ([@&#8203;eth-p](https://github.com/eth-p))
-   Add or remove individual style components without replacing all styles [#&#8203;2929](sharkdp/bat#2929) ([@&#8203;eth-p](https://github.com/eth-p))
-   Automatically choose theme based on the terminal's color scheme, see [#&#8203;2896](sharkdp/bat#2896) ([@&#8203;bash](https://github.com/bash))
-   Add option `--binary=as-text` for printing binary content, see issue [#&#8203;2974](sharkdp/bat#2974) and MR [#&#8203;2976](sharkdp/bat#2976) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Make shell completions available via `--completion <shell>`, see issue [#&#8203;2057](sharkdp/bat#2057) and MR [#&#8203;3126](sharkdp/bat#3126) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Syntax highlighting for puppet code blocks within Markdown files, see [#&#8203;3152](sharkdp/bat#3152) ([@&#8203;liliwilson](https://github.com/liliwilson))

#### Bugfixes

-   Fix long file name wrapping in header, see [#&#8203;2835](sharkdp/bat#2835) ([@&#8203;FilipRazek](https://github.com/FilipRazek))
-   Fix `NO_COLOR` support, see [#&#8203;2767](sharkdp/bat#2767) ([@&#8203;acuteenvy](https://github.com/acuteenvy))
-   Fix handling of inputs with OSC ANSI escape sequences, see [#&#8203;2541](sharkdp/bat#2541) and [#&#8203;2544](sharkdp/bat#2544) ([@&#8203;eth-p](https://github.com/eth-p))
-   Fix handling of inputs with combined ANSI color and attribute sequences, see [#&#8203;2185](sharkdp/bat#2185) and [#&#8203;2856](sharkdp/bat#2856) ([@&#8203;eth-p](https://github.com/eth-p))
-   Fix panel width when line 10000 wraps, see [#&#8203;2854](sharkdp/bat#2854) ([@&#8203;eth-p](https://github.com/eth-p))
-   Fix compile issue of `time` dependency caused by standard library regression [#&#8203;3045](sharkdp/bat#3045) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Fix override behavior of --plain and --paging, see issue [#&#8203;2731](sharkdp/bat#2731) and MR [#&#8203;3108](sharkdp/bat#3108) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Fix bugs in `$LESSOPEN` support, see [#&#8203;2805](sharkdp/bat#2805) ([@&#8203;Anomalocaridid](https://github.com/Anomalocaridid))

#### Other

-   Upgrade to Rust 2021 edition [#&#8203;2748](sharkdp/bat#2748) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Refactor and cleanup build script [#&#8203;2756](sharkdp/bat#2756) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Checks changelog has been written to for MRs in CI [#&#8203;2766](sharkdp/bat#2766) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
    -   Use GitHub API to get correct MR submitter [#&#8203;2791](sharkdp/bat#2791) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Minor benchmark script improvements [#&#8203;2768](sharkdp/bat#2768) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Update Arch Linux package URL in README files [#&#8203;2779](sharkdp/bat#2779) ([@&#8203;brunobell](https://github.com/brunobell))
-   Update and improve `zsh` completion, see [#&#8203;2772](sharkdp/bat#2772) ([@&#8203;okapia](https://github.com/okapia))
-   More extensible syntax mapping mechanism [#&#8203;2755](sharkdp/bat#2755) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Use proper Architecture for Debian packages built for musl, see [#&#8203;2811](sharkdp/bat#2811) ([@&#8203;Enselic](https://github.com/Enselic))
-   Pull in fix for unsafe-libyaml security advisory, see [#&#8203;2812](sharkdp/bat#2812) ([@&#8203;dtolnay](https://github.com/dtolnay))
-   Update git-version dependency to use Syn v2, see [#&#8203;2816](sharkdp/bat#2816) ([@&#8203;dtolnay](https://github.com/dtolnay))
-   Update git2 dependency to v0.18.2, see [#&#8203;2852](sharkdp/bat#2852) ([@&#8203;eth-p](https://github.com/eth-p))
-   Improve performance when color output disabled, see [#&#8203;2397](sharkdp/bat#2397) and [#&#8203;2857](sharkdp/bat#2857) ([@&#8203;eth-p](https://github.com/eth-p))
-   Relax syntax mapping rule restrictions to allow brace expansion [#&#8203;2865](sharkdp/bat#2865) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Apply clippy fixes [#&#8203;2864](sharkdp/bat#2864) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Faster startup by offloading glob matcher building to a worker thread [#&#8203;2868](sharkdp/bat#2868) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Display which theme is the default one in basic output (no colors), see [#&#8203;2937](sharkdp/bat#2937) ([@&#8203;sblondon](https://github.com/sblondon))
-   Display which theme is the default one in colored output, see [#&#8203;2838](sharkdp/bat#2838) ([@&#8203;sblondon](https://github.com/sblondon))
-   Add aarch64-apple-darwin ("Apple Silicon") binary tarballs to releases, see [#&#8203;2967](sharkdp/bat#2967) ([@&#8203;someposer](https://github.com/someposer))
-   Update the Lisp syntax, see [#&#8203;2970](sharkdp/bat#2970) ([@&#8203;ccqpein](https://github.com/ccqpein))
-   Use bat's ANSI iterator during tab expansion, see [#&#8203;2998](sharkdp/bat#2998) ([@&#8203;eth-p](https://github.com/eth-p))
-   Support 'statically linked binary' for aarch64 in 'Release' page, see [#&#8203;2992](sharkdp/bat#2992) ([@&#8203;tzq0301](https://github.com/tzq0301))
-   Update options in shell completions and the man page of `bat`, see [#&#8203;2995](sharkdp/bat#2995) ([@&#8203;akinomyoga](https://github.com/akinomyoga))
-   Update nix dev-dependency to v0.29.0, see [#&#8203;3112](sharkdp/bat#3112) ([@&#8203;decathorpe](https://github.com/decathorpe))
-   Bump MSRV to [1.74](https://blog.rust-lang.org/2023/11/16/Rust-1.74.0.html), see [#&#8203;3154](sharkdp/bat#3154) ([@&#8203;keith-hall](https://github.com/keith-hall))
-   Update clircle dependency to remove winapi transitive dependency, see [#&#8203;3113](sharkdp/bat#3113) ([@&#8203;niklasmohrin](https://github.com/niklasmohrin))

#### Syntaxes

-   `cmd-help`: scope subcommands followed by other terms, and other misc improvements, see [#&#8203;2819](sharkdp/bat#2819) ([@&#8203;victor-gp](https://github.com/victor-gp))
-   Upgrade JQ syntax, see [#&#8203;2820](sharkdp/bat#2820) ([@&#8203;dependabot](https://github.com/dependabot)\[bot])
-   Add syntax mapping for quadman quadlets [#&#8203;2866](sharkdp/bat#2866) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Map containers .conf files to TOML syntax [#&#8203;2867](sharkdp/bat#2867) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Associate `.xsh` files with `xonsh` syntax that is Python, see [#&#8203;2840](sharkdp/bat#2840) ([@&#8203;anki-code](https://github.com/anki-code))
-   Associate JSON with Comments `.jsonc` with `json` syntax, see [#&#8203;2795](sharkdp/bat#2795) ([@&#8203;mxaddict](https://github.com/mxaddict))
-   Associate JSON-LD `.jsonld` files with `json` syntax, see [#&#8203;3037](sharkdp/bat#3037) ([@&#8203;vorburger](https://github.com/vorburger))
-   Associate `.textproto` files with `ProtoBuf` syntax, see [#&#8203;3038](sharkdp/bat#3038) ([@&#8203;vorburger](https://github.com/vorburger))
-   Associate GeoJSON `.geojson` files with `json` syntax, see [#&#8203;3084](sharkdp/bat#3084) ([@&#8203;mvaaltola](https://github.com/mvaaltola))
-   Associate `.aws/{config,credentials}`, see [#&#8203;2795](sharkdp/bat#2795) ([@&#8203;mxaddict](https://github.com/mxaddict))
-   Associate Wireguard config `/etc/wireguard/*.conf`, see [#&#8203;2874](sharkdp/bat#2874) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Add support for [CFML](https://www.adobe.com/products/coldfusion-family.html), see [#&#8203;3031](sharkdp/bat#3031) ([@&#8203;brenton-at-pieces](https://github.com/brenton-at-pieces))
-   Map `*.mkd` files to `Markdown` syntax, see issue [#&#8203;3060](sharkdp/bat#3060) and MR [#&#8203;3061](sharkdp/bat#3061) ([@&#8203;einfachIrgendwer0815](https://github.com/einfachIrgendwer0815))
-   Add syntax mapping for CITATION.cff, see [#&#8203;3103](sharkdp/bat#3103) ([@&#8203;Ugzuzg](https://github.com/Ugzuzg))
-   Add syntax mapping for kubernetes config files [#&#8203;3049](sharkdp/bat#3049) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Adds support for pipe delimiter for CSV [#&#8203;3115](sharkdp/bat#3115) ([@&#8203;pratik-m](https://github.com/pratik-m))
-   Add syntax mapping for `/etc/pacman.conf` [#&#8203;2961](sharkdp/bat#2961) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
-   Associate `uv.lock` with `TOML` syntax, see [#&#8203;3132](sharkdp/bat#3132) ([@&#8203;fepegar](https://github.com/fepegar))

#### Themes

-   Patched/improved themes for better Manpage syntax highlighting support, see [#&#8203;2994](sharkdp/bat#2994) ([@&#8203;keith-hall](https://github.com/keith-hall)).

#### `bat` as a library

-   Changes to `syntax_mapping::SyntaxMapping` [#&#8203;2755](sharkdp/bat#2755) ([@&#8203;cyqsimon](https://github.com/cyqsimon))
    -   `SyntaxMapping::get_syntax_for` is now correctly public
    -   \[BREAKING] `SyntaxMapping::{empty,builtin}` are removed; use `SyntaxMapping::new` instead
    -   \[BREAKING] `SyntaxMapping::mappings` is replaced by `SyntaxMapping::{builtin,custom,all}_mappings`
-   Make `Controller::run_with_error_handler`'s error handler `FnMut`, see [#&#8203;2831](sharkdp/bat#2831) ([@&#8203;rhysd](https://github.com/rhysd))
-   Improve compile time by 20%, see [#&#8203;2815](sharkdp/bat#2815) ([@&#8203;dtolnay](https://github.com/dtolnay))
-   Add `theme::theme` for choosing an appropriate theme based on the
    terminal's color scheme, see [#&#8203;2896](sharkdp/bat#2896) ([@&#8203;bash](https://github.com/bash))
    -   \[BREAKING] Remove `HighlightingAssets::default_theme`. Use `theme::default_theme` instead.
-   Add `PrettyPrinter::print_with_writer` for custom output destinations, see [#&#8203;3070](sharkdp/bat#3070) ([@&#8203;kojix2](https://github.com/kojix2))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS45MS40IiwidXBkYXRlZEluVmVyIjoiMzkuOTEuNCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90Il19-->
This was referenced Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants