From ee23976e820960dd14f11c5033dec3ae41681b30 Mon Sep 17 00:00:00 2001 From: Colin Seymour Date: Tue, 28 Feb 2023 10:48:53 +0000 Subject: [PATCH 1/6] Add color to PR template --- .github/PULL_REQUEST_TEMPLATE.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 0c30227b1a..d767f65c4b 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -5,7 +5,7 @@ ## Checklist: - + - [ ] **I am associating a language with a new file extension.** - [ ] The new extension is used in hundreds of repositories on GitHub.com @@ -28,6 +28,9 @@ - [URL to each sample source, if applicable] - Sample license(s): - [ ] I have included a syntax highlighting grammar: [URL to grammar repo] + + - [ ] I have added a color: [Enter color in hex as `#123456` so the color can be seen in the PR body] + - I chose this color because: [Please specify why you chose this color. It helps in future if there is a request to change the color.] - [ ] I have updated the heuristics to distinguish my language from others using the same extension. - [ ] **I am fixing a misclassified language** From f7b25a35c08781e6ed97be40cdcd625fb1b26293 Mon Sep 17 00:00:00 2001 From: Colin Seymour Date: Wed, 8 Mar 2023 10:01:19 +0000 Subject: [PATCH 2/6] Tweak wording --- .github/PULL_REQUEST_TEMPLATE.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index d767f65c4b..9805c02b51 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -7,7 +7,7 @@ -- [ ] **I am associating a language with a new file extension.** +- [ ] **I am adding a new extension to a language.** - [ ] The new extension is used in hundreds of repositories on GitHub.com - Search results for each extension: From 16d2db54f532a5977291cf6fc5bf236f0671e389 Mon Sep 17 00:00:00 2001 From: Colin Seymour Date: Wed, 8 Mar 2023 11:10:42 +0000 Subject: [PATCH 3/6] Update the contributing guidelines --- CONTRIBUTING.md | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 75ce50727e..c822b1d231 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -25,6 +25,7 @@ These components have their own dependencies - `icu4c`, and `cmake` and `pkg-con On macOS with [Homebrew](http://brew.sh/) the instructions below under Getting started will install these dependencies for you. On Ubuntu: + ```bash apt-get install cmake pkg-config libicu-dev docker.io ruby ruby-dev zlib1g-dev build-essential libssl-dev ``` @@ -62,17 +63,19 @@ To add support for a new extension: 1. Add at least one sample for your extension to the [samples directory][samples] in the correct subdirectory. We prefer examples of real-world code showing common usage. The more representative of the structure of the language, the better. + + **"Hello world" examples will not be accepted.** 1. Open a pull request, linking to a [GitHub search result][search-example] showing in-the-wild usage. If you are adding a sample, please state clearly the license covering the code. If possible, link to the original source of the sample. If you wrote the sample specifically for the PR and are happy for it to be included under the MIT license that covers Linguist, you can state this instead. -Additionally, if this extension is already listed in [`languages.yml`][languages] and associated with another language, then sometimes a few more steps will need to be taken: +Additionally, if this extension is already listed in [`languages.yml`][languages] and associated with another language, then a few more steps will need to be taken: + +1. Make sure that at least two example `.yourextension` files are present in the [samples directory][samples] for each language that uses `.yourextension`. +1. If the two languages look vaguely similar, or one of the languages has uniquely identifiable characteristics, consider writing a [heuristic][] to help with the classification. -1. Make sure that example `.yourextension` files are present in the [samples directory][samples] for each language that uses `.yourextension`. -1. Test the performance of the Bayesian classifier with a relatively large number (1000s) of sample `.yourextension` files (ping **@lildude** to help with this). - This ensures we're not misclassifying files. -1. If the Bayesian classifier does a bad job with the sample `.yourextension` files then a [heuristic][] may need to be written to help. +Remember, the goal here is to try and avoid false positives! See [My Linguist PR has been merged but GitHub doesn't reflect my changes][merged-pr] for details on when your changes will appear on GitHub after your PR has been merged. @@ -87,25 +90,30 @@ To add support for a new language: 1. Add an entry for your language to [`languages.yml`][languages]. Omit the `language_id` field for now. 1. Add a syntax-highlighting grammar for your language using: + ```bash script/add-grammar https://github.com/JaneSmith/MyGrammar ``` + This command will analyze the grammar and, if no problems are found, add it to the repository. If problems are found, please report them to the grammar maintainer as you will otherwise be unable to add it. + **Please only add grammars that have [one of these licenses][licenses].** 1. Add samples for your language to the [samples directory][samples] in the correct subdirectory. + We prefer examples of real-world code showing common usage. + The more representative of the structure of the language, the better. + + **"Hello world" examples will not be accepted.** 1. Generate a unique ID for your language by running `script/update-ids`. 1. Open a pull request, linking to [GitHub search results][search-example] showing in-the-wild usage. Please state clearly the license covering the code in the samples. Link directly to the original source if possible. If you wrote the sample specifically for the PR and are happy for it to be included under the MIT license that covers Linguist, you can state this instead. -In addition, if your new language defines an extension that's already listed in [`languages.yml`][languages] (such as `.foo`) then sometimes a few more steps will need to be taken: +In addition, if your new language defines an extension that is already listed in [`languages.yml`][languages] and associated with another language, then a few more steps will need to be taken: -1. Make sure that example `.foo` files are present in the [samples directory][samples] for each language that uses `.foo`. -1. Test the performance of the Bayesian classifier with a relatively large number (1000s) of sample `.foo` files (ping **@lildude** to help with this). - This ensures we're not misclassifying files. -1. If the Bayesian classifier does a bad job with the sample `.foo` files, then a [heuristic][] may need to be written to help. +1. Make sure that at least two example `.foo` files are present in the [samples directory][samples] for each language that uses `.foo`. +1. If the two languages look vaguely similar, or one of the languages has uniquely identifiable characteristics, consider writing a [heuristic][] to help with the classification. Remember, the goal here is to try and avoid false positives! @@ -119,7 +127,6 @@ This process can help differentiate between, for example, `.h` files which could Misclassifications can often be solved by either adding a new filename or extension for the language or adding more [samples][] to make the classifier smarter. - ## Fixing syntax highlighting Syntax highlighting in GitHub is performed using TextMate-compatible grammars. @@ -198,23 +205,18 @@ Here's our current build status: [![Actions Status](https://github.com/github/li Linguist is maintained with :heart: by: - **@Alhadis** -- **@larsbrinkhoff** - **@lildude** (GitHub staff) -- **@pchaigno** As Linguist is a production dependency for GitHub we have a couple of workflow restrictions: - Anyone with commit rights can merge Pull Requests provided that there is a :+1: from a GitHub staff member. - Releases are performed by GitHub staff so we can ensure GitHub.com always stays up to date with the latest release of Linguist and there are no regressions in production. - [grammars]: /vendor/README.md [heuristic]: https://github.com/github/linguist/blob/master/lib/linguist/heuristics.yml [languages]: /lib/linguist/languages.yml [licenses]: https://github.com/github/linguist/blob/9b1023ed5d308cb3363a882531dea1e272b59977/vendor/licenses/config.yml#L4-L15 -[new-issue]: https://github.com/github/linguist/issues/new [samples]: /samples [search-example]: https://github.com/search?utf8=%E2%9C%93&q=extension%3Aboot+NOT+nothack&type=Code&ref=searchresults -[gpr]: https://docs.github.com/packages/using-github-packages-with-your-projects-ecosystem/configuring-rubygems-for-use-with-github-packages [#5756]: https://github.com/github/linguist/issues/5756 [merged-pr]: /docs/troubleshooting.md#my-linguist-pr-has-been-merged-but-gitHub-doesnt-reflect-my-changes From 7d849b0b53e6baa14a5f017b7809ae079b9fe6dd Mon Sep 17 00:00:00 2001 From: Colin Seymour Date: Wed, 8 Mar 2023 11:13:34 +0000 Subject: [PATCH 4/6] Tweak readme formatting --- README.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index ed68ac3bb2..b142abf0c5 100644 --- a/README.md +++ b/README.md @@ -2,9 +2,6 @@ [![Actions Status](https://github.com/github/linguist/workflows/Run%20Tests/badge.svg)](https://github.com/github/linguist/actions) -[issues]: https://github.com/github/linguist/issues -[new-issue]: https://github.com/github/linguist/issues/new - This library is used on GitHub.com to detect blob languages, ignore binary or vendored files, suppress generated files in diffs, and generate language breakdown graphs. ## Documentation @@ -30,6 +27,7 @@ Accordingly, we highly recommend you install a version of Ruby using Homebrew, ` Linguist uses [`charlock_holmes`](https://github.com/brianmario/charlock_holmes) for character encoding and [`rugged`](https://github.com/libgit2/rugged) for libgit2 bindings for Ruby. These components have their own dependencies. + 1. charlock_holmes * cmake * pkg-config @@ -95,6 +93,7 @@ $ github-linguist #### Additional options ##### `--rev REV` + The `--rev REV` flag will change the git revision being analyzed to any [gitrevisions(1)](https://git-scm.com/docs/gitrevisions#_specifying_revisions) compatible revision you specify. This is useful to analyze the makeup of a repo as of a certain tag, or in a certain branch. @@ -118,12 +117,14 @@ $ github-linguist jekyll ``` And here is Jekyll's published website, from the gh-pages branch inside their repository. + ```console $ github-linguist jekyll --rev origin/gh-pages 100.00% 2568354 HTML ``` ##### `--breakdown` + The `--breakdown` or `-b` flag will additionally show the breakdown of files by language. You can try running `github-linguist` on the root directory in this repository itself: @@ -149,6 +150,7 @@ lib/linguist.rb ``` ##### `--json` + The `--json` or `-j` flag output the data into JSON format. ```console @@ -157,6 +159,7 @@ $ github-linguist --json ``` This option can be used in conjunction with `--breakdown` to get a full list of files along with the size and percentage data. + ```console $ github-linguist --breakdown --json {"Dockerfile":{"size":1212,"percentage":"0.31","files":["Dockerfile","tools/grammars/Dockerfile"]},"Ruby":{"size":264519,"percentage":"66.84","files":["Gemfile","Rakefile","bin/git-linguist","bin/github-linguist","ext/linguist/extconf.rb","github-linguist.gemspec","lib/linguist.rb",...]}} @@ -213,7 +216,6 @@ lib/linguist.rb Please check out our [contributing guidelines](CONTRIBUTING.md). - ## License The language grammars included in this gem are covered by their repositories' respective licenses. From d360e9fdeaa2ed0a599d8f3719a383049ece0aa4 Mon Sep 17 00:00:00 2001 From: Colin Seymour Date: Mon, 13 Mar 2023 09:53:25 +0000 Subject: [PATCH 5/6] Reword as per review suggestions --- .github/PULL_REQUEST_TEMPLATE.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 9805c02b51..003abc71cb 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -28,9 +28,10 @@ - [URL to each sample source, if applicable] - Sample license(s): - [ ] I have included a syntax highlighting grammar: [URL to grammar repo] - - - [ ] I have added a color: [Enter color in hex as `#123456` so the color can be seen in the PR body] - - I chose this color because: [Please specify why you chose this color. It helps in future if there is a request to change the color.] + + - [ ] I have added a color + - Hex value: `#RRGGBB` + - Rationale: - [ ] I have updated the heuristics to distinguish my language from others using the same extension. - [ ] **I am fixing a misclassified language** From a81216b8f2f7ef07b2e3ffebb7020b42c5dac06d Mon Sep 17 00:00:00 2001 From: Colin Seymour Date: Mon, 13 Mar 2023 09:55:21 +0000 Subject: [PATCH 6/6] Use incremental list items --- CONTRIBUTING.md | 16 ++++++++-------- docs/troubleshooting.md | 10 +++++----- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index c822b1d231..a545770f50 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -60,12 +60,12 @@ To add support for a new extension: 1. Add your extension to the language entry in [`languages.yml`][languages]. Keep the extensions in alphabetical order, sorted case-sensitively (uppercase before lowercase). The exception is the primary extension: it should always be first. -1. Add at least one sample for your extension to the [samples directory][samples] in the correct subdirectory. +2. Add at least one sample for your extension to the [samples directory][samples] in the correct subdirectory. We prefer examples of real-world code showing common usage. The more representative of the structure of the language, the better. **"Hello world" examples will not be accepted.** -1. Open a pull request, linking to a [GitHub search result][search-example] showing in-the-wild usage. +3. Open a pull request, linking to a [GitHub search result][search-example] showing in-the-wild usage. If you are adding a sample, please state clearly the license covering the code. If possible, link to the original source of the sample. If you wrote the sample specifically for the PR and are happy for it to be included under the MIT license that covers Linguist, you can state this instead. @@ -73,7 +73,7 @@ To add support for a new extension: Additionally, if this extension is already listed in [`languages.yml`][languages] and associated with another language, then a few more steps will need to be taken: 1. Make sure that at least two example `.yourextension` files are present in the [samples directory][samples] for each language that uses `.yourextension`. -1. If the two languages look vaguely similar, or one of the languages has uniquely identifiable characteristics, consider writing a [heuristic][] to help with the classification. +2. If the two languages look vaguely similar, or one of the languages has uniquely identifiable characteristics, consider writing a [heuristic][] to help with the classification. Remember, the goal here is to try and avoid false positives! @@ -89,7 +89,7 @@ To add support for a new language: 1. Add an entry for your language to [`languages.yml`][languages]. Omit the `language_id` field for now. -1. Add a syntax-highlighting grammar for your language using: +2. Add a syntax-highlighting grammar for your language using: ```bash script/add-grammar https://github.com/JaneSmith/MyGrammar @@ -99,13 +99,13 @@ To add support for a new language: If problems are found, please report them to the grammar maintainer as you will otherwise be unable to add it. **Please only add grammars that have [one of these licenses][licenses].** -1. Add samples for your language to the [samples directory][samples] in the correct subdirectory. +3. Add samples for your language to the [samples directory][samples] in the correct subdirectory. We prefer examples of real-world code showing common usage. The more representative of the structure of the language, the better. **"Hello world" examples will not be accepted.** -1. Generate a unique ID for your language by running `script/update-ids`. -1. Open a pull request, linking to [GitHub search results][search-example] showing in-the-wild usage. +4. Generate a unique ID for your language by running `script/update-ids`. +5. Open a pull request, linking to [GitHub search results][search-example] showing in-the-wild usage. Please state clearly the license covering the code in the samples. Link directly to the original source if possible. If you wrote the sample specifically for the PR and are happy for it to be included under the MIT license that covers Linguist, you can state this instead. @@ -113,7 +113,7 @@ To add support for a new language: In addition, if your new language defines an extension that is already listed in [`languages.yml`][languages] and associated with another language, then a few more steps will need to be taken: 1. Make sure that at least two example `.foo` files are present in the [samples directory][samples] for each language that uses `.foo`. -1. If the two languages look vaguely similar, or one of the languages has uniquely identifiable characteristics, consider writing a [heuristic][] to help with the classification. +2. If the two languages look vaguely similar, or one of the languages has uniquely identifiable characteristics, consider writing a [heuristic][] to help with the classification. Remember, the goal here is to try and avoid false positives! diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index c6344f8448..5428ad92bf 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -7,11 +7,11 @@ If the language stats bar is reporting a language that you don't expect: 1. Click on the name of the language in the stats bar to see a list of the files that are identified as that language. Keep in mind this performs a search so the [code search restrictions][search-limits] may result in files identified in the language statistics not appearing in the search results. [Installing Linguist locally](/README.md/#installation) and running it from the [command line](/README.md#command-line-usage) will give you accurate results. -1. If you see files that you didn't write in the search results, consider moving the files into one of the [paths for vendored code](/lib/linguist/vendor.yml), or use the [manual overrides](/docs/overrides.md) feature to ignore them. -1. If the files are misclassified, search for [open issues](https://github.com/github/linguist/issues) to see if anyone else has already reported the issue. +2. If you see files that you didn't write in the search results, consider moving the files into one of the [paths for vendored code](/lib/linguist/vendor.yml), or use the [manual overrides](/docs/overrides.md) feature to ignore them. +3. If the files are misclassified, search for [open issues](https://github.com/github/linguist/issues) to see if anyone else has already reported the issue. Any information you can add, especially links to public repositories, is helpful. You can also use the [manual overrides](/docs/overrides.md) feature to correctly classify them in your repository. -1. If there are no reported issues of this misclassification, [open an issue](https://github.com/github/linguist/issues/new) and include a link to the repository or a sample of the code that is being misclassified. +4. If there are no reported issues of this misclassification, [open an issue](https://github.com/github/linguist/issues/new) and include a link to the repository or a sample of the code that is being misclassified. [search-limits]: https://docs.github.com/github/searching-for-information-on-github/searching-code#considerations-for-code-search @@ -31,8 +31,8 @@ Linguist does not consider [vendored code](/docs/overrides.md#vendored-code), [g If the language statistics bar is not showing your language at all, it could be for a few reasons: 1. Linguist doesn't know about your language. -1. The extension you have chosen is not associated with your language in [`languages.yml`](/lib/linguist/languages.yml). -1. All the files in your repository fall into one of the categories listed above that Linguist excludes by default. +2. The extension you have chosen is not associated with your language in [`languages.yml`](/lib/linguist/languages.yml). +3. All the files in your repository fall into one of the categories listed above that Linguist excludes by default. If Linguist doesn't know about the language or the extension you're using, consider [contributing](/CONTRIBUTING.md) to Linguist by opening a pull request to add support for your language or extension. For everything else, you can use the [manual overrides](/docs/overrides.md) feature to tell Linguist to include your files in the language statistics.