Provide a better long-term solution for detection of text files. #4

pyjarrett · 2021-05-25T03:23:49Z

Septum currently checks a very limited selection of extensions to determine whether a file is text. This speeds up loading of large source trees and keeps junk files out of memory, which reduces the memory footprint.

pyjarrett · 2021-05-27T02:35:02Z

Now that Septum supports configuration files, SP.Cache.Is_Text could accept a list of extensions from the Search, making this configurable on a per-user or per-project basis.
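A minimal sketch of the idea, written in Python for illustration only (Septum itself is Ada, and this is not its code). The function name `is_text`, the `extensions` parameter, and the default list are all hypothetical; the thread does not show Septum's actual built-in extensions.

```python
from pathlib import Path

# Hypothetical default list, for illustration only.
DEFAULT_TEXT_EXTENSIONS = frozenset({"ads", "adb", "c", "h", "md", "txt"})


def is_text(path, extensions=DEFAULT_TEXT_EXTENSIONS):
    """Return True if the file's extension is in the given list.

    The comparison is case-insensitive and ignores the leading dot.
    Files without an extension are never treated as text here.
    """
    suffix = Path(path).suffix.lower().lstrip(".")
    return suffix in extensions
```

A per-project configuration could then pass its own set, for example `is_text("notes.rst", extensions={"rst"})`, without changing the built-in default.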

kalkin · 2021-10-05T06:44:51Z

Have a look at kalkin/file-expert. I wrote a program for detecting the language type based on the data gathered by github/linguist. At some point I will refactor the code to provide C bindings for non-Rust library users, if someone is interested.
The other way is to reuse the data and rewrite file-expert as an Ada library. It should be pretty easy, a weekend or two of work.

pyjarrett · 2021-10-07T02:25:09Z

@kalkin, your project looks exciting! I'm not sure it helps solve this issue right now, since Septum's search is language-agnostic and the goal is just to determine whether a file is readable text or binary data. If Septum gains this need at some future point, I'd reconsider.
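For reference, one common content-based way to make the text-versus-binary call, independent of extensions, is to sample the start of the file and reject it if it contains a NUL byte or is not valid UTF-8. The sketch below is in Python and purely illustrative; it is not what Septum does, and the names `looks_like_text`, `is_probably_text`, and the 8192-byte sample size are assumptions.

```python
import codecs


def looks_like_text(data: bytes) -> bool:
    """Heuristic: no NUL bytes, and the sample decodes as UTF-8."""
    if b"\x00" in data:
        return False
    # final=False tolerates a multi-byte character cut off at the
    # end of the sample instead of reporting it as invalid.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return False
    return True


def is_probably_text(path, sample_size=8192):
    """Apply the heuristic to the first sample_size bytes of a file."""
    with open(path, "rb") as f:
        return looks_like_text(f.read(sample_size))
```

This trades the speed of an extension check for one small read per file, and it will misclassify text stored in encodings such as UTF-16, which contain NUL bytes.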

kalkin · 2021-10-07T14:03:42Z

@pyjarrett Thanks! Seems like I misunderstood how Septum works. I thought it did some basic language-specific parsing.

If you ever want to parse different languages, I strongly suggest looking at tree-sitter, if you do not know it yet. It's a way to specify how to parse your language (in JS :() and then it generates a library for you, which returns a universal AST containing all the line/character-range coordinates. Quite a few popular programming languages have tree-sitter support already. https://github.com/tree-sitter/tree-sitter

pyjarrett added the enhancement New feature or request label May 25, 2021

pyjarrett added this to the Beta milestone Jun 3, 2021
