Skip to content
This repository has been archived by the owner on Mar 23, 2024. It is now read-only.

feat: PRCE-compliant RegEx updates for GitHub Linguist #57

Merged
merged 11 commits into from
Mar 15, 2023

Conversation

randi274
Copy link
Contributor

@randi274 randi274 commented Dec 12, 2022

What does this PR do?

This PR updates certain RegEx elements to be PCRE RegEx compliant (specifically: changing - to _ in named capture group identifiers).

The change is in this commit.

This is needed so that we can get the tmLanguage definition into GitHub's Linguist for language detection and syntax highlighting.

The other changes outside of the above linked commit are either dependency updates or reformatting of files (b/c of changed Prettier behavior).

What issues does this PR fix or reference?

#54

@muenzpraeger
Copy link
Contributor

@randi274 Any updates on this? TY

@@ -1914,15 +1914,15 @@
<dict>
<key>begin</key>
<string>(?x)
(?&lt;type-name&gt;
(?&lt;type_name&gt;
Copy link

@jeffb-sfdc jeffb-sfdc Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question: should everything between a <string></string> be changed from foo-bar to foo_bar? (or is it only that which starts with (?x)?)

If yes, what about line 1909, line 1899, etc...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@muenzpraeger can you help answer this, given the update needs for Github Linguist?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no dumb questions. ;-) Especially not with this change. It took me a while to find this small statement in the PCRE docs for named capture groups.

When PCRE2_UTF is not set, they may contain only ASCII alphanumeric characters and underscores, but must start with a non-digit.

So yes, the change only applies to (?<the_name>).

Also the source of the changes is src/apex.tmLanguage.yml, which is the source definition for this TextMate grammar. Looking at that will make it clearer.

The grammar outputs in the grammars folder are auto-generated via gulp.

(?:
(?:
(?:(?&lt;identifier&gt;@?[_[:alpha:]][_[:alnum:]]*)\s*\:\:\s*)? # alias-qualification
(?&lt;name-and-type-args&gt; # identifier + type arguments (if any)
(?&lt;name_and_type_args&gt; # identifier + type arguments (if any)
Copy link

@jeffb-sfdc jeffb-sfdc Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another dumb questions: Should alias-qualification on line 1920 be alias_qualification?
(also on line 1993, and others)

Or is that a comment, and it doesn't matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@muenzpraeger same question here

@randi274 randi274 closed this Mar 15, 2023
@randi274 randi274 reopened this Mar 15, 2023
@randi274 randi274 merged commit 1d3c709 into main Mar 15, 2023
@randi274 randi274 deleted the rw/regex-updates branch March 15, 2023 18:31
@dschach
Copy link

dschach commented Mar 21, 2023

@randi274 Thank you for doing this. It will make my port to a new version of highlightjs-apex much easier.

There are some extraneous bits in the regex (or there were - I haven't checked lately) that we can't do in Apex; as I get reacquainted with this codebase I'll raise a PR or an issue to remove those... which may be a bad idea, but I figure why not see what you and the hive mind think?

Thanks again. It's great to see how Apex is defined as a syntax.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants