Display binary file content for bat -A #640

Merged: 4 commits, Aug 31, 2019

pjsier (Contributor) commented Aug 27, 2019

Closes #623. When the -A flag is supplied, this attempts to print hex-escaped lines for binary files. The one caveat is that it escapes all non-ASCII characters, not just bytes that are invalid UTF-8.

From what I could tell, printing the hex-escaped version of UTF-8 text would be a lot more involved, but if there's an option I missed here I'm happy to make changes.
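To make the caveat concrete, here is a minimal sketch of the two behaviours. Both function names are hypothetical and neither is taken from the PR's diff: `escape_non_ascii` escapes every byte above 0x7F, while `escape_invalid_utf8` keeps valid UTF-8 text and escapes only the bytes that fail to decode. Control characters are left untouched in both, to keep the sketch short.

```rust
/// Hypothetical helper (not from the PR): keep ASCII bytes as-is and write
/// every other byte as `\xNN`, even if it is part of valid UTF-8 text.
fn escape_non_ascii(input: &[u8]) -> String {
    let mut output = String::with_capacity(input.len());
    for &byte in input {
        if byte.is_ascii() {
            output.push(byte as char);
        } else {
            output.push_str(&format!("\\x{:02x}", byte));
        }
    }
    output
}

/// Hypothetical helper (not from the PR): keep valid UTF-8 sequences and
/// write only the undecodable bytes as `\xNN`.
fn escape_invalid_utf8(mut input: &[u8]) -> String {
    let mut output = String::new();
    loop {
        match std::str::from_utf8(input) {
            Ok(valid) => {
                output.push_str(valid);
                return output;
            }
            Err(err) => {
                let (valid, rest) = input.split_at(err.valid_up_to());
                // This prefix was just validated, so decoding cannot fail.
                output.push_str(std::str::from_utf8(valid).unwrap());
                // `error_len()` is `None` when the input ends mid-sequence;
                // in that case the whole remainder is escaped.
                let bad_len = err.error_len().unwrap_or(rest.len());
                for byte in &rest[..bad_len] {
                    output.push_str(&format!("\\x{:02x}", byte));
                }
                input = &rest[bad_len..];
            }
        }
    }
}
```

For the input `"é"`, the first function yields `\xc3\xa9` and the second yields `é` unchanged; for the PNG signature byte `0x89`, both yield `\x89`.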

sharkdp (Owner) commented Aug 31, 2019

Thank you very much for your contribution!

There are a few things that could be improved.

For binary files, the grid styling is not quite right:

───────┬─────────────────────────────────────────────────
       │ File: test.png   <BINARY>
───────┴─────────────────────────────────────────────────
   1   │ \x89PNG\r\n
   2   │ \x1a\n
   3   │ \x00\x00\x00\rIHDR\x00\x00\x00\x80\x00\x00\x00D\
       │ x08\x02\x00\x00\x00\xc6%\xaa>\x00\x00\x00\xc2IDA
       │ Tx^\xed\xd4\x81\x06\xc30\x14@\xd1\xb74\xdd\xff\x
       │ ffo\xb3tV\xea\x89\x12l(s\xe2\xaa4I\x03\x87\xd6\x
       │ fe\xd8{\x89\xbbR\x8d;\x87\xfe\x01\x00\x80\x00\x0
       │ 0\x10\x00\x00\x02\x00@\x00\x00\x08\x00\x00\x01\x
       │ 00•\x00\x00\x04\x00\x80\x00\x00\x10\x00\x00\x02\
       │ x00@\x00\x00\x08\x00\x00\x01\x00•\x00\x00\x00\xd
       │ 4^jdK\x94\xf5\x98|\xd1\xf4\x92\\\\>\xcf\x9c?sqX_
       │ \xaf\x8by[\xee\x96\xb6G\xeb\xf1\xea\xd1\xce\xb6\
       │ xe3u;\xe6\xb9\x95\x8d\xc7\xce\x039\xc9\xaf\xc63\
       │ x93{f7\xcf\xab\xbf\xf9\xc9/\x08\x80\x00\x00\x10\
       │ x00\x00\x02\x00@\x00\x00\x08\x00\x00\x01\x00•\x0
       │ 0\x00\x04\x00\x80\x00\x00\x10\x00\x00\x02\x00@\x
       │ 00\x00\x08\x00\x00\x01\x00•\x00\x00\x8c7\xdbh\x0
       │ 3•\xfb\xed\x96e\x00\x00\x00\x00IEND\xaeB`\x82

Notice that the grid in the header section is not joined with the lower part (┴ instead of ┼). Also, there is no final separator line at the end. For comparison, here is a text file:

───────┬─────────────────────────────────────────────────
       │ File: example
───────┼─────────────────────────────────────────────────
   1   │ hello␊
───────┴─────────────────────────────────────────────────
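The expected layout can be sketched as follows. This is an illustration only, not bat's printer code: `Junction`, `horizontal_line`, and `render_grid` are hypothetical names, and the widths are passed in explicitly. The point is that the same sequence (top rule with ┬, header, middle rule with ┼, body, bottom rule with ┴) should be produced for text and binary input alike.

```rust
/// Which junction character joins a horizontal rule to the gutter bar.
/// Hypothetical type for illustration.
enum Junction {
    Top,
    Middle,
    Bottom,
}

/// Draw one horizontal rule of `total_width` characters with the junction
/// placed right after the gutter.
fn horizontal_line(junction: Junction, gutter_width: usize, total_width: usize) -> String {
    let joint = match junction {
        Junction::Top => '┬',
        Junction::Middle => '┼',
        Junction::Bottom => '┴',
    };
    let left = "─".repeat(gutter_width);
    let right = "─".repeat(total_width.saturating_sub(gutter_width + 1));
    format!("{}{}{}", left, joint, right)
}

/// Render header and body with all three rules, regardless of file type.
fn render_grid(
    header: &str,
    lines: &[&str],
    gutter_width: usize,
    total_width: usize,
) -> Vec<String> {
    let mut out = Vec::new();
    out.push(horizontal_line(Junction::Top, gutter_width, total_width));
    out.push(format!("{}│ {}", " ".repeat(gutter_width), header));
    // The header rule uses ┼ so it joins the gutter bar of the body below.
    out.push(horizontal_line(Junction::Middle, gutter_width, total_width));
    for (index, line) in lines.iter().enumerate() {
        // Centre the line number inside the gutter.
        out.push(format!("{:^width$}│ {}", index + 1, line, width = gutter_width));
    }
    // The closing rule that is missing in the binary case.
    out.push(horizontal_line(Junction::Bottom, gutter_width, total_width));
    out
}
```

Calling `render_grid("File: example", &["hello␊"], 7, 57)` returns five lines in the order shown in the text-file example.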

Implementation-wise, I think that the preprocessor module would be a much better place to make the non-ascii => hex conversion:

bat/src/preprocessor.rs

Lines 36 to 73 in c64ab29

pub fn replace_nonprintable(input: &str, tab_width: usize) -> String {
    let mut output = String::new();
    let tab_width = if tab_width == 0 { 4 } else { tab_width };
    for chr in input.chars() {
        match chr {
            // space
            ' ' => output.push('•'),
            // tab
            '\t' => {
                if tab_width == 1 {
                    output.push('↹');
                } else {
                    output.push('├');
                    output.push_str(&"─".repeat(tab_width - 2));
                    output.push('┤');
                }
            }
            // line feed
            '\x0A' => output.push('␊'),
            // carriage return
            '\x0D' => output.push('␍'),
            // null
            '\x00' => output.push('␀'),
            // bell
            '\x07' => output.push('␇'),
            // backspace
            '\x08' => output.push('␈'),
            // escape
            '\x1B' => output.push('␛'),
            // anything else
            _ => output.push(chr),
        }
    }
    output
}

This would have the added advantage that we could highlight the \x9a tokens in a nice way.
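A rough sketch of what that could look like, assuming a byte-oriented variant of the function above. The name `replace_nonprintable_bytes` and its signature are made up for illustration, and tab handling is omitted for brevity (so a tab falls through to the hex branch here):

```rust
/// Hypothetical byte-oriented variant of `replace_nonprintable`.
/// The name and signature are assumptions, not bat's actual API.
fn replace_nonprintable_bytes(input: &[u8]) -> String {
    let mut output = String::new();
    for &byte in input {
        match byte {
            // space
            b' ' => output.push('•'),
            // line feed
            b'\n' => output.push('␊'),
            // carriage return
            b'\r' => output.push('␍'),
            // null
            0x00 => output.push('␀'),
            // bell
            0x07 => output.push('␇'),
            // backspace
            0x08 => output.push('␈'),
            // escape
            0x1B => output.push('␛'),
            // remaining printable ASCII passes through unchanged
            0x21..=0x7E => output.push(byte as char),
            // everything else (other control bytes, non-ASCII) becomes \xNN
            _ => output.push_str(&format!("\\x{:02x}", byte)),
        }
    }
    output
}
```

Because the `\xNN` tokens are produced in a single match arm, that arm would also be the natural place to attach a style to them.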

sharkdp (Owner) commented Aug 31, 2019

A more serious issue:
This destroys some of the preprocessing that we did before where we replaced some characters like \n with Unicode characters. So this should definitely be done in the preprocessing module.

I'm working on something...
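The thread does not spell out the exact mechanism, so the following is only one plausible illustration of how two separate passes can interfere. Both functions are hypothetical: if symbol replacement runs first and a blanket non-ASCII escape runs afterwards, the replacement symbols are themselves non-ASCII and get escaped again.

```rust
/// Hypothetical first pass: swap a few control characters for Unicode symbols.
fn replace_control(input: &str) -> String {
    input
        .chars()
        .map(|chr| match chr {
            '\n' => '␊',
            '\r' => '␍',
            other => other,
        })
        .collect()
}

/// Hypothetical second pass: escape every non-ASCII character as the
/// `\xNN` form of its UTF-8 bytes.
fn escape_non_ascii_chars(input: &str) -> String {
    let mut output = String::new();
    for chr in input.chars() {
        if chr.is_ascii() {
            output.push(chr);
        } else {
            let mut buf = [0u8; 4];
            for byte in chr.encode_utf8(&mut buf).bytes() {
                output.push_str(&format!("\\x{:02x}", byte));
            }
        }
    }
    output
}
```

Chaining them on `"hi\n"` gives `hi\xe2\x90\x8a` instead of `hi␊`, which is why handling both conversions in one place in the preprocessing module avoids the problem.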

sharkdp (Owner) commented Aug 31, 2019

Some impressions:

[three screenshots]

Successfully merging this pull request may close these issues.

Add option to display file even if bat thinks it is binary