A space appears at the beginning of the file (Byte order mark) #1922
Comments
Thank you very much for the detailed bug report. There are a few things to consider here.
So I guess for now this is a question of whether or not a particular terminal emulator prints the
To be honest, I have never seen a UTF-8 BOM "in the wild". At least on Linux, every program seems to use the BOM-less version when writing UTF-8. That doesn't make this bug less relevant though, because UTF-16 files should suffer from the same problem.
@Enselic: IMO, this is not a Windows-specific bug. Files with UTF-8 BOMs might appear on non-Windows systems as well.
Further reading: Unicode standard, https://www.unicode.org/versions/Unicode6.1.0/ch16.pdf page 562
Great analysis! I confess to not having done that deep an analysis before putting the windows label on 😊. That turned out to be overhasty, because there is a similar problem on macOS 11.6 with Terminal.app Version 2.11 (440):
The problem persists even in
Interestingly, if we bypass the pager, the output is correct:
It turns out the output is correct even if the BOM is not first in the output, as long as the pager is bypassed:
This is with the current vanilla
So what if we try a later version of less? That seems to solve the problem on macOS:
@v-timofeev What pager and version are you using?
@Enselic Sorry for not answering for a long time!
I noticed that the behavior depends on the terminal emulator: For VNC (CentOS 8)
If no pager is specified,
Does it work if the BOM is the first thing in the output? Try both
I suspect the highlighting error is because most syntax regex patterns do not work with a BOM. So even if the terminal displays it properly (i.e. not at all), we still need to strip it whenever we want syntax highlighting to work. Even with
A nice side effect of that is that we also "fix" the cases where the pager and/or terminal in question do not display the BOM properly. I'm setting good-first-issue on this because it shouldn't be very hard to do.
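The stripping step suggested above could look roughly like the following. This is only a minimal sketch, assuming the file contents are available as a byte slice before they reach the highlighter; `strip_utf8_bom` is a hypothetical helper name and not a function from bat's code base.

```rust
/// Hypothetical helper (not part of bat's actual API):
/// returns the input without a leading UTF-8 byte order mark.
/// If the input does not start with the BOM, it is returned unchanged.
pub fn strip_utf8_bom(bytes: &[u8]) -> &[u8] {
    // The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
    const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

    if bytes.starts_with(&UTF8_BOM) {
        // Skip exactly the BOM bytes and keep the rest of the content.
        &bytes[UTF8_BOM.len()..]
    } else {
        bytes
    }
}
```

Only a BOM at the very start of the input is removed; the same byte sequence appearing later in the content is left alone.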
Describe the bug you encountered:
If you use bat on C# source files (.cs, .xaml and others), a space appears at the start of the first line. This is due to the byte order mark (BOM).
This can probably be reproduced with other files on Windows systems.
https://en.wikipedia.org/wiki/Byte_order_mark#Byte_order_marks_by_encoding
Sample file with BOM:
Program.cs.txt
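To check whether a file such as the sample above starts with a BOM, the first bytes can be compared against the known signatures. The sketch below is an illustration only: the `Bom` enum and `detect_bom` function are hypothetical names, and it covers just the UTF-8 and UTF-16 marks discussed in this thread (UTF-32 is deliberately left out, since its little-endian mark begins with the same two bytes as UTF-16 LE).

```rust
/// Hypothetical type for this example: the BOM variants covered here.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Hypothetical helper: inspects the first bytes of `bytes` and reports
/// which byte order mark, if any, they form.
pub fn detect_bom(bytes: &[u8]) -> Option<Bom> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        // U+FEFF encoded as UTF-8
        Some(Bom::Utf8)
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        // U+FEFF encoded as UTF-16, little-endian
        Some(Bom::Utf16Le)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        // U+FEFF encoded as UTF-16, big-endian
        Some(Bom::Utf16Be)
    } else {
        None
    }
}
```

Reading the file with `std::fs::read` and passing the resulting bytes to `detect_bom` would be one way to use it.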
What did you expect to happen instead?
If I delete these bytes:
bat works correctly:
How did you install bat?
GitHub release:
bat-v0.18.3-x86_64-pc-windows-gnu.zip
bat-v0.18.3-x86_64-pc-windows-msvc.zip
bat version and environment
Software version
bat 0.18.3 (b146958)
Operating system
Windows 6.2.9200
Command-line
Environment variables
Config file
Could not read contents of 'C:\Users\timoxa\AppData\Roaming\bat\config': The system cannot find the path specified. (os error 3).
Compile time information