Add B4X #6965

Merged
merged 14 commits into from
Aug 29, 2024

@DecimalTurn (Contributor) commented Jul 24, 2024

Description

Closes #6944

Notes on the regex used:
`(?:.*(?:\r?\n|\r)){0,9}` limits the search to the first 10 lines.
`\A\W{0,3}` allows for a UTF-8 BOM at the start of the file, which is represented by 3 non-alphanumeric characters. (More than 70% of B4X files have the BOM.)
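
To illustrate the line limit, here is a minimal Ruby sketch. It is not the pattern from this PR: `MARKER` is a hypothetical stand-in for whatever text the real heuristic looks for, and `marker_in_head?` is an illustrative helper. In this sketch the marker has to start a line.

```ruby
# Hypothetical pattern: the two fragments from the notes, followed by a
# made-up literal "MARKER" (an assumption, not the real B4X heuristic).
HEAD_ONLY = /\A\W{0,3}(?:.*(?:\r?\n|\r)){0,9}MARKER/

# Returns true when MARKER starts one of the first 10 lines of the text.
def marker_in_head?(text)
  HEAD_ONLY.match?(text)
end

on_line_10 = ("filler\n" * 9) + "MARKER\n"   # 9 lines skipped, marker on line 10
on_line_11 = ("filler\n" * 10) + "MARKER\n"  # would need 10 lines skipped

puts marker_in_head?(on_line_10) # => true
puts marker_in_head?(on_line_11) # => false
```

The `{0,9}` group can consume at most 9 complete lines, so the marker is only reachable on lines 1 through 10.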

Checklist:

@DecimalTurn DecimalTurn requested a review from a team as a code owner July 24, 2024 00:44
lildude previously approved these changes Jul 24, 2024

@lildude (Member) left a comment

LGTM. Thanks.

Note: this PR will not be merged until close to when the next release is made. See here for more details.

This commit moves the BOM check to the start of the file and fixes a potential compatibility problem with re2.
Note that re2 interprets `{3}?` as matching the previous token exactly 3 times, while the Oniguruma engine interprets it as matching 3 or 0 times.
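
The Ruby side of this difference can be checked with a small sketch. The patterns and constant names below are illustrative only and do not come from the PR; the re2 behavior is not exercised here and is only described per the note above.

```ruby
# In Ruby's regex engine, `x{3}?` is read as "x{3}, optionally",
# so it accepts either three x's or none.
AMBIGUOUS = /\Ax{3}?y\z/

# Spelling the optional group out explicitly avoids relying on how an
# engine parses `{3}?`.
EXPLICIT = /\A(?:x{3})?y\z/

%w[xxxy y xy].each do |sample|
  puts "#{sample}: #{AMBIGUOUS.match?(sample)} #{EXPLICIT.match?(sample)}"
end
# xxxy: true true
# y: true true
# xy: false false
```

An engine that treats `{3}?` as a lazy "exactly 3" would reject `y` for the first pattern, which is the kind of mismatch the commit avoids.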
@DecimalTurn (Contributor, Author)

Sorry for the many post-review edits; I'm done now. I just wanted to make the regex simpler, but then realized there might be a problem: Linguist matches against an ASCII_8BIT-encoded string, while some ports of Linguist might match in UTF-8 mode. Hence, it's safer to express the BOM as `\W{0,3}`, since it needs to match 3 characters in ASCII_8BIT for Ruby, but also a single character in UTF-8 mode for the various ports.
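
A small Ruby sketch of that point, assuming a sample string of my own (`"\uFEFFhello"`) rather than a real B4X file:

```ruby
BOM_PREFIX = /\A\W{0,3}/

utf8_text   = "\uFEFFhello" # BOM is a single character in UTF-8 mode
binary_text = utf8_text.b   # same bytes, tagged ASCII-8BIT (one char per byte)

utf8_match   = BOM_PREFIX.match(utf8_text)
binary_match = BOM_PREFIX.match(binary_text)

puts utf8_match[0].length     # => 1
puts binary_match[0].length   # => 3
puts utf8_match.post_match    # => hello
puts binary_match.post_match  # => hello
```

The same `\W{0,3}` prefix consumes the whole BOM in both modes, so whatever follows it in the pattern starts at the first real character of the file.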

Related:

@lildude lildude added this pull request to the merge queue Aug 29, 2024
Merged via the queue into github-linguist:master with commit 198bd6b Aug 29, 2024
5 checks passed
@DecimalTurn DecimalTurn deleted the b4x branch August 29, 2024 15:18
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Dec 20, 2024