Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle multiple languages for an extension #1

Merged
merged 2 commits into from
Jun 23, 2021

Conversation

edoardopirovano
Copy link

Addresses the underlying cause of github/codeql-action#584, though will not close it yet as we still have to bump the version of this that the CodeQL Action depends on and add a test there to avoid a regression.

The language map that we import unfortunately has some ambiguity in what language an extension maps to since some programming languages where extensions. Currently, we just use whatever appears later in the alphabet, which results in (for example), Smalltalk overriding C# which is clearly not desirable. This PR adds a simple heuristic of languages that are important enough to take priority over other ones. The list is roughly based on a list of most popular programming languages in order of popularity, with some manual adjustments to resolve clashes that in my opinion ended up being resolved the wrong way by doing this. For example, I moved Ruby quite high up because .spec appears a lot in Ruby tests but is also an extension in Python where it used much less.

@edoardopirovano
Copy link
Author

Just in case you don't get tagged for PRs in this repository: @aeisenberg

Copy link
Owner

@aeisenberg aeisenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this PR! Just a handful of small nits. And a question.

src/languages.ts Outdated
* where the extension is ambiguous. The ordering of the list matters and
* languages earlier on will get a higher priority when resolving clashes.
*/
const importantLanguages = ["javascript", "typescript", "ruby", "python", "java", "c", "c++", "c#", "rust", "scala", "perl", "go"];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this out to a top-level constant?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just curious, can you give an example of where the ordering here is important?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only example I found where the ordering actually mattered was the case of .spec that I described in the PR description as this extension appears in a few languages but (to the best of my understanding) is mostly important for Ruby where it is common in unit tests. I still ordered the other languages roughly by popularity in case any ambiguities arise in their extensions in future, but at the moment there are no other ambiguities among these languages (except .h being shared between c and c++ but in CodeQL we don't care about that because they both map to the same thing).

src/languages.ts Outdated Show resolved Hide resolved
src/languages.ts Outdated Show resolved Hide resolved
src/languages.ts Outdated Show resolved Hide resolved
@edoardopirovano
Copy link
Author

PR comments above addressed, thanks 🙂

Copy link
Owner

@aeisenberg aeisenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I'll do a release shortly.

@aeisenberg aeisenberg merged commit a49cee0 into aeisenberg:master Jun 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants