🚀 [syntax highlighting] improved language mappings via shebang #485

Open
Kr1ss-XD opened this issue Jan 1, 2021 · 7 comments

@Kr1ss-XD
Contributor

Kr1ss-XD commented Jan 1, 2021

Some tools (notably bat) check source files for a shebang line and, if one is present, use it to select the corresponding syntax rules. I'm wondering whether this would be possible for delta, too.

Currently, a source file seems to be treated as a specific language only if its name or extension can be mapped to a known language. As a result, generically named files (e.g. executable shell scripts without a *.sh filename extension) are not syntax highlighted by delta. Checking for a shebang could be an option in addition to filenames or --map-syntax.

Maybe it's even possible to reuse the algorithm/regexes that bat has already implemented?
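
To make the idea concrete, here is a minimal Python sketch of shebang-based detection. It is not bat's or delta's actual algorithm; the function names and the interpreter-to-syntax table are hypothetical and only illustrate the approach.

```python
import posixpath
import re

# Hypothetical table: interpreter base name -> syntax name.
# This is NOT bat's or delta's real mapping.
INTERPRETER_TO_SYNTAX = {
    "sh": "sh",
    "bash": "bash",
    "zsh": "zsh",
    "python": "python",
    "ruby": "ruby",
    "perl": "perl",
    "node": "javascript",
}


def interpreter_from_shebang(first_line):
    """Return the interpreter's base name from a shebang line, or None."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    name = posixpath.basename(parts[0])
    if name == "env":
        # Skip env options (e.g. -S) and VAR=value assignments.
        rest = [p for p in parts[1:] if not p.startswith("-") and "=" not in p]
        if not rest:
            return None
        name = posixpath.basename(rest[0])
    # Drop a trailing version suffix, e.g. python3.11 -> python.
    return re.sub(r"[\d.]+$", "", name) or None


def syntax_for_first_line(first_line):
    """Map a file's first line to a syntax name, or None if unknown."""
    return INTERPRETER_TO_SYNTAX.get(interpreter_from_shebang(first_line))
```

A real implementation would need a more careful table and probably more edge cases (for example interpreters invoked through wrappers other than env).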

@dandavison
Owner

Hi @Kr1ss-XD, the core issue here is that bat has access to the entire file, whereas delta (in its current form) only has access to the section of the file that happens to be in the diff hunk. It would be possible to change delta so that it (optionally) tries to find the file on disk (or from the git repo via libgit2). I have wondered from the beginning whether we would want to do that. Of course, the file might not even exist, since delta simply accepts whatever diff is given to it on stdin, which could be entirely fictional.
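
The limitation can be illustrated with a small Python sketch: a shebang is visible in the diff itself only when a hunk happens to cover line 1 of the file. The function name and the simplified hunk-header regex below are assumptions made for illustration, not delta's parsing code.

```python
import re

# Simplified unified-diff hunk header, e.g. "@@ -1,3 +1,4 @@ optional context".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def first_line_from_hunk(hunk_lines):
    """Return line 1 of the new file if this hunk shows it, else None.

    hunk_lines is the hunk header followed by its body lines.
    """
    if not hunk_lines:
        return None
    match = HUNK_HEADER.match(hunk_lines[0])
    if match is None or int(match.group(2)) != 1:
        # The hunk does not start at line 1 of the new file.
        return None
    for line in hunk_lines[1:]:
        # Context (" ") and added ("+") lines belong to the new file.
        if line.startswith((" ", "+")):
            return line[1:]
    return None
```

For any hunk that starts further down the file this returns None, which is exactly the case where the diff alone cannot reveal the shebang.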

@Kr1ss-XD
Contributor Author

Kr1ss-XD commented Jan 1, 2021

Right, I'm aware that it's not as simple for delta as it is for bat, which is given a filename as an argument most of the time.

Since delta can recognize filenames in some cases, I wondered if it could use these to open the file and check its contents.

> Of course, the file might not even exist, since delta simply accepts whatever diff is given to it on stdin, which could be entirely fictional.

I hadn't considered this, though.

@dandavison
Owner

> Since delta can recognize filenames in some cases, I wondered if it could utilize these to open the file and check its contents.

Yes, I agree, this would be possible. And as you say, for things like executable shell scripts, I think it's the only way forwards.
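
As a rough Python sketch of that approach, assuming git-style "+++ b/<path>" header lines: take the path from the diff and try to read the file's first line from disk, treating a missing or unreadable file as "unknown". The helper names and the root parameter are hypothetical.

```python
import os


def path_from_header(line):
    """Extract the path from a '+++ b/<path>' diff header line, or None."""
    if not line.startswith("+++ "):
        return None
    path = line[4:].strip()
    if path == "/dev/null":
        # The file was deleted; there is nothing to open.
        return None
    if path.startswith("b/"):
        path = path[2:]
    return path


def read_first_line(path, root="."):
    """Return the first line of root/path, or None if it cannot be read.

    The diff on stdin may be entirely fictional, so failure is expected
    and must not be treated as an error.
    """
    try:
        with open(os.path.join(root, path), encoding="utf-8", errors="replace") as f:
            return f.readline().rstrip("\r\n")
    except OSError:
        return None
```

The working-tree copy may also differ from the revision being diffed, so the result is best treated as a hint rather than a guarantee.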

@zachriggle

zachriggle commented Jul 13, 2021

This would be really nice, @dandavison.

Problem

I have a lot of scripts (e.g. Python) without the .py extension, and having a colorized, Python-syntax-highlighted diff for these would be a game-changer.

delta should be able to auto-detect the language of files in the diff by parsing the hunk headers and running e.g. file on them.

This could be a step that is only run when the filename has no extension at all, so it shouldn't be computationally expensive. You can rely on file so there's no need to even parse the shebang line (which can get complicated).

$ file bin/my-issues
bin/my-issues: Python script text executable, Unicode text, UTF-8 text

$ git log -p bin/my-issues | delta -n
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
commit cde440bb4ea6ae2b957c6ba9fa59640c596af120 ┃
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━
Author: Zach Riggle <REDACTED>
Date:   Fri Jul 2 16:36:59 2021 -0500

    Add --entire-problem to my-issues


bin/my-issues
────────────────────────────────────────────────────────────────────────
<non syntax-highlighted diff>

Solution

I've created a ZSH script which automatically finds all of the interpreters known to file. You can find a copy of it here:
https://gist.github.com/zachriggle/e82ba2b7f6ea55df853fab03c243876d

To save you the time of running it, here's the output on my system.

ash: Neil Brown's ash script text executable, ASCII text
awk: awk script text executable, ASCII text
bash: Bourne-Again shell script text executable, ASCII text
csh: C shell script text executable, ASCII text
ksh: Korn shell script text executable, ASCII text
luacore: Lua script text executable, ASCII text
node: Node.js script text executable, ASCII text
perl: Perl script text executable
python: Python script text executable, ASCII text
ruby: Ruby script text executable, ASCII text
sh: POSIX shell script text executable, ASCII text
stapler: Systemtap script text executable, ASCII text
tclassutil: Tcl script text executable, ASCII text
tclsh: Tcl/Tk script text executable, ASCII text
tcsh: Tenex C shell script text executable, ASCII text
zsh: Paul Falstad's zsh script text executable, ASCII text

You should be able to add these few bits to git-delta and use file to autodetect syntax, using the above as a mapping.

@zachriggle

zachriggle commented Jul 13, 2021

@dandavison I created a simple solution to this issue: you should be able to run file on files without an extension and use the above mappings to automatically determine the syntax of a given file.

You may want to strip everything after the first comma (e.g. ASCII text or Unicode text).
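
A hedged Python sketch of that mapping: run file only when the path has no extension, keep the text before the first comma, and look the description up in a table. The descriptions come from the output listed above; the syntax names on the right and the function names are assumptions, and `file -b` (brief mode) is used so the filename prefix is omitted.

```python
import os
import subprocess

# Left: descriptions from the `file` output above, minus the common suffix.
# Right: hypothetical syntax names, chosen for illustration only.
FILE_DESCRIPTION_TO_SYNTAX = {
    "Neil Brown's ash": "sh",
    "awk": "awk",
    "Bourne-Again shell": "bash",
    "C shell": "csh",
    "Korn shell": "ksh",
    "Lua": "lua",
    "Node.js": "javascript",
    "Perl": "perl",
    "Python": "python",
    "Ruby": "ruby",
    "POSIX shell": "sh",
    "Tcl": "tcl",
    "Tcl/Tk": "tcl",
    "Tenex C shell": "tcsh",
    "Paul Falstad's zsh": "zsh",
}
SUFFIX = " script text executable"


def syntax_from_description(description):
    """Map a `file -b` description to a syntax name, or None."""
    head = description.split(",", 1)[0].strip()  # drop ", ASCII text" etc.
    if not head.endswith(SUFFIX):
        return None
    return FILE_DESCRIPTION_TO_SYNTAX.get(head[: -len(SUFFIX)])


def detect_syntax(path):
    """Ask `file` about extension-less paths only; None when undetermined."""
    if os.path.splitext(path)[1]:
        return None  # has an extension; leave it to filename-based mapping
    try:
        result = subprocess.run(
            ["file", "-b", path], capture_output=True, text=True, check=False
        )
    except OSError:
        return None  # `file` is not installed or could not be started
    return syntax_from_description(result.stdout)
```

Spawning a process per file has a cost, which is one reason to restrict the check to extension-less paths as suggested above.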

@dandavison
Owner

Thanks @zachriggle. One thing we should check before proceeding is whether there is a Rust crate that already does this and looks reliable. Let me know if you're aware of anything.

@flxai

flxai commented Feb 7, 2024

@dandavison There is syntect, which has a relevant function, find_syntax_by_first_line. Thanks to @jplatte for pointing this out in a private conversation.
