Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Erlang application resource file extension #6297

Merged
merged 2 commits into from
Mar 15, 2023

Conversation

gionea
Copy link
Contributor

@gionea gionea commented Feb 23, 2023

Description

An application resource file (*.app) contains an application specification which defines an Erlang application. It is officially documented in the app Erlang manual page. The *.app file is similar to the *.app.src file (which is correctly categorized by the Linguist library; see #2964) in that an *.app.src file is used as input by various build systems to generate an *.app file. It is, however, also common to manually write an *.app file in which case no *.app.src file is present in the source tree.

The syntax of an *.app file (as that of an *.app.src file) is standard Erlang syntax.

Due to the generic nature of its name we may need some heuristics to distinguish this particular usage from others.

Checklist:

@gionea gionea requested a review from a team as a code owner February 23, 2023 22:40
Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the generic nature of its name we may need some heuristics to distinguish this particular usage from others.

Most definitely and it will need to be 100% accurate. There is no room for misclassification of generic extensions. There are over 27k files with this extension so there is massive scope for misclassification.

If you can come up with a heuristic that precisely matches these Erlang files and only these files, you'll also need to add the extension to the generic.yml file and add tests like for other heuristics and generic extensions.

If you can't come up with a 100% precise heuristic, it's best not to pursue this PR and let users use an override if they really want these files to be classified as Erlang.

I have included a real-world usage sample for all extensions added in this PR:

* Sample source(s):
  
  * [`otp/bootstrap/lib/kernel/ebin/kernel.app`](https://github.com/erlang/otp/blob/master/bootstrap/lib/kernel/ebin/kernel.app)
  * [`cowboy/ebin/cowboy.app`](https://github.com/ninenines/cowboy/blob/master/ebin/cowboy.app)

No you haven't. 😁 You've included a very "hello world" looking sample. If you're going to go ahead with this PR, please remove that sample and add the two you've referenced.

@gionea gionea marked this pull request as draft February 24, 2023 14:56
@gionea gionea force-pushed the erlang branch 2 times, most recently from fdda8e6 to ea05918 Compare February 27, 2023 21:35
@gionea gionea marked this pull request as ready for review February 27, 2023 21:36
lib/linguist/languages.yml Outdated Show resolved Hide resolved
- extensions: ['.app']
rules:
- language: Erlang
pattern: '^{\s*(?:application|''application'')\s*,\s*(?:[a-z]+[a-z\d_@]*|''[\s\S]+'')\s*,\s*\[[\s\S]*\]\s*}\.$'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The sequence '[\s\S]+' will match stuff like 'Foo's Bar', which I'm assuming isn't valid Erlang (I know nothing of the language). I'm assuming you meant '.+' where . includes newlines as well (i.e., the "dotall" modifier typical of most regex flavours). If so, I recommend using '[^']+' instead (which doesn't account for escape sequences, if such a thing is supported).

  2. Did you intend for [a-z] to match only lowercase characters? If not, [a-z\d_@] can be simplified into [\w@], as (IIRC) GitHub's Linguist process runs without full CLDR support, meaning \w will match its traditional POSIX definition ([[:alnum:]], or [A-Za-z0-9_]). @lildude will correct me if I'm wrong, I'm sure. 😉

  3. \.$ doesn't account for trailing whitespace. Unless that's a deliberate omission, I recommend using \.[ \t]*$ to match EOL.

Putting the changes proposed by all three points together, we get:

Suggested change
pattern: '^{\s*(?:application|''application'')\s*,\s*(?:[a-z]+[a-z\d_@]*|''[\s\S]+'')\s*,\s*\[[\s\S]*\]\s*}\.$'
- pattern: '^{\s*(?:application|''application'')\s*,\s*(?:[a-z]+[a-z\d_@]*|''[\s\S]+'')\s*,\s*\[[\s\S]*\]\s*}\.$'
+ pattern: '(?m)^{\s*(?:application|''application'')\s*,\s*(?:[a-z]+[\w@]*|''[^'']+'')\s*,\s*\[.*?\]\s*}\.[\t]*$'
Source (readable)
(?xm) ^
{ \s*
(?:
	application
	|
	'application'
)
\s* , \s*
(?:
	[a-z]+[\w@]*
	|
	'[^']+'
)
\s* , \s*
\[ .*? \]
\s* }
\. [ \t]* $
Source (YAML string)
'(?m)^{\s*(?:application|''application'')\s*,\s*(?:[a-z]+[\w@]*|''[^'']+'')\s*,\s*\[.*?\]\s*}\.[\t]*$'

Copy link
Contributor Author

@gionea gionea Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Your assumption is correct.
  2. The first [a-z] occurrence should only match lowercase letters. However, [a-z\d_@] can be reduced to [\w@] since, although uncommon, that particular segment of an atom can include uppercase letters.
  3. Not deliberate.

Thank you for your time. I have updated the code to reflect your suggestions.

@gionea gionea force-pushed the erlang branch 2 times, most recently from e107049 to a63f467 Compare March 13, 2023 12:55
@lildude lildude merged commit 9358f09 into github-linguist:master Mar 15, 2023
@gionea gionea deleted the erlang branch March 15, 2023 12:51
@github-linguist github-linguist deleted a comment from devilshezi0 Apr 2, 2023
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants