Specify disambiguation rules in YAML #4087
This is now ready for review, although it could use some more testing and cleanup of
@smola: Looks like the heuristics got a bit messed up during the translation, as two language identification tests are failing. Can you look at them?
Waiting on #4189 to land. 👌
lib/linguist/heuristics.rb
Outdated
      end

      # Internal: Array of defined heuristics
      @heuristics = []

      # Internal
-     def initialize(exts_and_langs, &heuristic)
+     def initialize(exts_and_langs, rules)
        @exts_and_langs, @candidates = exts_and_langs.partition {|e| e =~ /\A\./}
It looks like the YAML file doesn't support candidates (which is fine since we don't use that anymore). Thus, the above line can be simplified, as well as the additional logic in matches? to handle candidates.
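For context, a minimal sketch of what the partition in the hunk above does, and why it becomes redundant when only extensions are passed in. The sample arrays are hypothetical and are not taken from the PR.

```ruby
# Hypothetical input mixing extensions and candidate language names,
# as the old block-based API allowed.
exts_and_langs = [".pl", ".pm", "Perl", "Prolog"]

# Entries starting with a dot go to the first array, the rest to the second.
extensions, candidates = exts_and_langs.partition { |e| e =~ /\A\./ }

# If the YAML rules only ever list extensions (an assumption based on the
# review comment), the second array is always empty and the split is a no-op.
yaml_extensions = [".pl", ".pm"]
only_exts, no_candidates = yaml_extensions.partition { |e| e =~ /\A\./ }
```

With extension-only input, the constructor could presumably store the array directly and drop the candidate handling.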
I'll perform this change next time I get to work on this.
      #                  key.
      # named_patterns - Key-value map of reusable named patterns.
      #
      # Please keep this list alphabetized.
Actually, there's a test for that in test_pedantic.rb, test_heuristics_are_sorted; could you update it to check the YAML file instead? We poor humans have a hard time keeping things alphabetized (speaking from experience :p).
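A rough sketch of what such a check against the YAML file might look like. The layout used here (a top-level "disambiguations" list whose entries carry an "extensions" array) is an assumption for illustration, not something confirmed in this thread, and the inline document stands in for heuristics.yml.

```ruby
require "yaml"

# Inline stand-in for heuristics.yml; the key names are assumed.
doc = YAML.safe_load(<<~YML)
  disambiguations:
    - extensions: ['.as']
    - extensions: ['.bb']
    - extensions: ['.cls']
YML

# Compare the first extension of each entry against its sorted order.
first_exts = doc["disambiguations"].map { |d| d["extensions"].first }
sorted = first_exts == first_exts.sort
```

In the real test the comparison would presumably be wrapped in an assertion inside test_heuristics_are_sorted.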
Updated the test.
lib/linguist/heuristics.yml
Outdated
  perl5: '\buse\s+(?:strict\b|v?5\.)'
  perl6: '^\s*(?:use\s+v6\b|\bmodule\b|\b(?:my\s+)?class\b)'
#TODO: add tests for missing heuristics
#TODO: add support to return multiple languages
Could you remove these? I'll open an issue for the first one so we don't forget and I don't think we supported the second one before anyway. We're very likely to forget TODOs at the end of the file, even if we later fix them.
Oh, I forgot about these.
Actually, there is currently a disambiguation rule that returns two languages: [Language["Linux Kernel Module"], Language["AMPL"]] for .mod.
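One way a rule loader could tolerate both a single language and a list of languages, sketched under the assumption that each rule is a hash with a "language" key. The key name and the first sample rule are hypothetical.

```ruby
# Hypothetical rule hashes: "language" may hold one name or several.
rules = [
  { "language" => "Some Language" },
  { "language" => ["Linux Kernel Module", "AMPL"] }
]

# Kernel#Array wraps a lone string in an array and leaves arrays untouched,
# so callers always receive a list of language names.
languages_per_rule = rules.map { |r| Array(r["language"]) }
```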
There is still one case I couldn't convert to a plain regular expression: However, I found no file that is misclassified with something simpler such as:
According to @Alhadis here 06c049b this is the rationale:
It seems usage of multiline comments with If we assumed this kind of comments are not there, we can just simplify the expression to look for a label without any preceding |
That would work. I was probably being overly cautious with accuracy considering the PR was my first real contribution to Linguist (and the open-ended fallback to MAXScript convinced me that keeping matches airtight was a wise thing to do). I don't remember seeing any GAS files with multi-line
@Alhadis 👍 Created a PR simplifying it: #4224
I'll change this PR to support combining negative and positive patterns at the same time. And also supporting multiple patterns with
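To make the idea concrete, a minimal sketch of a rule that must satisfy every positive pattern and no negative pattern. The :positive and :negative keys, the rule_matches? helper, and the sample patterns are all hypothetical and are not the schema this PR ends up with.

```ruby
# Hypothetical rule: require a label at the start of a line and
# reject content containing a C-style comment opener.
rule = { positive: [/^\s*\w+:/], negative: [/\/\*/] }

# True only when every positive pattern matches and no negative one does.
def rule_matches?(rule, data)
  rule.fetch(:positive, []).all? { |re| data =~ re } &&
    rule.fetch(:negative, []).none? { |re| data =~ re }
end
```

For example, rule_matches?(rule, "loop:\n  nop\n") holds, while the same content preceded by a "/* ... */" comment is rejected.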
Now with the support of
LGTM! Thanks @smola!
This looks great to me. I'm not even sure who has the final say on these PRs these days, but if nobody is opposed we can go ahead and merge. :)
Lovely. Now what am I gonna do? 😕 Change the YAML parsing logic to accept a filename property? |
@Alhadis I was thinking we could add something along the lines of
Alright, I'll add it. But this really needs to be more object-oriented, IMHO. For example, rather than specifying
pattern:
  - source: '/^foo/'
    target: data
  - source: '/^bar'
    target: name
  - source: '^qux'
    negated: true
Strings could still be used; simply treated as shorthand for an object with only the
pattern: '/^foo/'
# Which is shorthand for:
pattern:
  - source: '/^foo/'
    negated: false
    target: data
That way, we could adopt a cleaner implementation in
This would also solve the problem of not being able to specify modifier flags for the regular expression, such as
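The shorthand described above could be expanded with a small normalizer along these lines. The normalize_pattern helper is hypothetical; the defaults (negated: false, target: data) follow the example in the comment.

```ruby
# Defaults taken from the "shorthand for" example in the comment above.
PATTERN_DEFAULTS = { "negated" => false, "target" => "data" }.freeze

# Hypothetical helper: expand a bare string into the full object form,
# and fill in missing keys on an object that is already a hash.
def normalize_pattern(entry)
  case entry
  when String then PATTERN_DEFAULTS.merge("source" => entry)
  when Hash   then PATTERN_DEFAULTS.merge(entry)
  else raise ArgumentError, "unsupported pattern entry: #{entry.inspect}"
  end
end
```

Normalizing once at load time would let the matching code handle a single shape instead of branching on strings versus objects.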
Specify disambiguation rules in a YAML file. See rationale in #3746.
Description
All disambiguation rules are moved from heuristics.rb into heuristics.yml.
Checklist: