Optimize spell-checking dictionary files across projects (OSOE-523) #346

BenedekFarkas · 2022-12-29T17:53:31Z

Currently https://github.com/Lombiq/GitHub-Actions is the owner spell-checking dictionary files, even for entries specific to OSOCE.

Here are a couple of ideas for improving how OSOCE and other repositories consume spell-checking configuration from LGHA, so there are no hidden dependencies. For example, Utility Scripts and Infrastructure Scripts relies completely on LGHA for its spell-checking dictionaries (most importantly allow.txt and expect.txt), but those are referenced by other repositories too. As a result, spell-checking reports in different repositories have a varying number of irrelevant entries, which makes it harder to maintain them.

With this, OSOCE will be a template for consuming LGHA as they were completely independent.

LGHA should continue to do its own spell-checking, but the allow.txt and expect.txt dictionary files should be specific to LGHA only.
Generic entries should be added to scope-specific files, such as lombiq.txt or orchard.txt and such. Entries that are specific to a submodule of OSOCE can also be considered generic if we can expect that they will be used as a submodule in other repositories too (let's just put them all in osoce.txt?).
OSOCE-specific entries should be added to allow.txt and expect.txt files in its root .github/actions/spelling folder. Unrecognized entries should be removed.
As a result of the previous points, OSOCE should skip LGHA from its own spell-checking.
Other consumer repositories (e.g., Utility Scripts and Infrastructure Scripts) need to be re-checked and some further entries added, so it will be a relatively complex endeavor.
Does LGHA actually need to be a submodule of OSOCE? If it's only used through actions/workflows, then decoupling would make LGHA contributions easier by not having to maintain PRs there too (and breaking changes can occur outside OSOCE, so testing with OSOCE should still happen, but it isn't always enough).

Jira issue

The text was updated successfully, but these errors were encountered:

Piedone · 2022-12-29T18:20:20Z

Can you show an example of "As a result, spell-checking reports in different repositories have a varying number of irrelevant entries"? I don't know what that means.

Re 1. and 3. I don't actually know why we have spell-checking in LGHA separate from OSOCE, but it's not something that we necessarily need to keep.

Technically it doesn't, but it's better this way, because a) most of the changes in LGHA needs to be tested with a project like OSOCE (while that may not necessarily be enough in itself, it mostly is and can almost never be left out) and b) it gets PS static code analysis this way with the same config as OSOCE. Depending on how we can implement Lint YAML files (OSOE-188) NodeJs-Extensions#11, it might necessitate YML files to be under a .NET project too.

BenedekFarkas · 2022-12-30T11:45:38Z

Replying out of order, because of dependencies, but formatting gets really weird...

(4.) The spell-checking report also lists allowed words that aren't actually used: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/3806450757/jobs/6475261798#step:4:139. But this is even more apparent in LGHA, because a lot of entries are added just for the sake of OSOCE: https://github.com/Lombiq/Open-Source-Orchard-Core-Extensions/actions/runs/3806450757/jobs/6475261798#step:4:115.

(6.) If we can enforce opening a PR in OSOCE (as in validate, like now) without it having to be a submodule, then managing the issues/PRs would be easier.

(1.) But even if you open a PR in OSOCE, you still need to make changes to the workflows to reference the issue branch, which is hard to enforce / validate automatically, and having LGHA as a submodule doesn't fix that.
(2.) Adding instructions about it to LGHA's issue template on GH would give clear instructions at least. I'm also thinking about external contributors here, but it would be helpful for our team as well.
(3.) Add GHA action/workflow to run PS analysis (OSOE-504) PowerShell-Analyzers#17 would remove another hurdle too. PS analyzers doesn't need to be a submodule of OSOCE either with that (but at least, it shouldn't be a submodule of both OSOCE and Utility Scripts, because that also makes managing PRs so much more complicated).

(1. and 3.): With the changes in 6., LGHA will need to have its own spell-checking, but that's fine. What I don't like about the current setup (and why more separation of concerns would be beneficial) is that OSOCE's spell-checking configuration affects the base LGHA configuration (and LGHA's own spell-checking), but also a bunch of completely independent projects through that as well. This makes it a lot harder to test changes (which I experienced when working on multiple complex spell-checking issues), because it's easy to break unrelated projects, like a closed client-project or Infrastructure Scripts.

Piedone · 2023-01-01T22:20:31Z

I'm not sure what you mean this as an argument for, but note that while OSOCE is a project (solution) in itself, we use the same extensions in a number of other repos/solutions. So, if you want spell-checking for a given repo, where we utilize OSOCE projects as submodules, then we'll also need to include the same Spelling configuration (barring if after Use per-line spell-checking ignore pattern and remove one-off ignores (OSOE-491) GitHub-Actions#134 it turns out that not much of OSOCE config should be in LGHA because it's not actually general enough). So, LGHA and OSOCE Spelling configuration overlapping is not an issue in itself, given that the majority of projects where we use its Spelling workflow are projects that use projects from OSOCE. If we can make this configuration shared (or cascading) better, then let's do that. I.e., still have a way that I can add the Spelling workflow to my OC project that utilizes OSOCE submodules, with no configuration additional configuration.
Let's see what we need for Lint YAML files (OSOE-188) NodeJs-Extensions#11. BTW @DAud-IcI the submodule was your idea, please chime in if you have something to add.

BenedekFarkas · 2023-01-10T11:55:43Z

(4.) Using a submodule (that is also a submodule of OSOCE) in a consumer project is a case we need to cover, sure.
But what I'm saying is that the current setup is not just an overlap of configuration, LGHA is completely in control of OSOCE's spell checking, even though LGHA is a generic "component". So even if you want to change something OSOCE-specific, you need to go through LGHA with a PR, which is extra work we can trim down.
OSOCE having its own dictionary/dictionaries that are specific to the submodules it hosts would simplify this process, because you wouldn't need to change branch references in LGHA either (since there would be no need for a new branch/PR in LGHA). If you have a submodule in another project, then:

You can ignore them from spell checking, because changing anything would require opening an OSOCE PR too anyway. OrchardCore.Commerce is a similar situation (but not the same, because it runs spell checking in itself, but the point is that spell checking is taken care of), that's why it's ignored from spell checking in DotNest Tenants Core.
You can reference OSOCE's own dictionary file in the consumer project.

(5.) Linting would be a great addition, but I wouldn't keep LGHA in OSOCE just to be able to use NodeJS Extensions from there. NodeJS Extensions providing an action/workflow for YAML linting (and might also be nice for other types of code analysis to make it more accessible and compelling for 3rd parties to use) would be better to avoid such technical dependencies between these projects that are otherwise independent, similarly to Lombiq/PowerShell-Analyzers#17.

Piedone · 2023-01-10T15:29:12Z

If you add an OSOCE submodule to your solution, and use the Spelling workflow, then without using the same config that OSOCE does, you may get false spelling errors. What I'm saying here is, that whatever we do, using OSOCE submodules and Spelling in a project shouldn't necessitate any more configuration than something like:

  spelling:
    name: Spelling
    uses: Lombiq/GitHub-Actions/.github/workflows/spelling.yml@dev

Or at most something like:

  spelling:
    name: Spelling
    uses: Lombiq/GitHub-Actions/.github/workflows/spelling.yml@dev
    with:
      spell-check-this: Lombiq/Open-Source-Orchard-Core-Extensions@dev

And if you need cascading overrides like we have with DotNest Tenants Core, then that should be possible too.

So, in the end, you get this:

If you need generic spell-checking in any project, you use the default config from LGHA.
If you need generic spell-checking for an OC solution using OSOCE submodules, you use the default config from somewhere (Also LGHA, another config? Or OSOCE?).
In any case, if you need customization, you can add that on top of the defaults (vs having to copy configuration from the defaults).

If we have this, then I'm all for it.

Ideally yes, but as NE stands today, it needs a .NET solution and .NET projects (due to MSBuild running the whole thing in the end).

BenedekFarkas · 2023-01-11T11:01:56Z

We are in agreement, also about making the necessary configuration as lean possible, but note that a very minimalistic configuration (for consumer projects) means that you have to generalize a lot (in LGHA). That's not why I opened this issue (but they are connected), more like the fact that every project that uses spell-checking by default inherits the config that is OSOCE-specific, which is not necessary (in currently half of the cases) and makes working on spell-checking in LGHA a lot harder by having to re-check every consumer project to avoid breaking changes. There are 8 of those currently, including OrchardCore.Commerce, which is not affected by recent breaking changes, because it fully copied dictionary files (which is also not desirable, because maintaining them becomes more difficult) with 200+ entries it doesn't actually need.

My original point is that development of the spelling component is very tedious due to too much shared configuration and reliance on OSOCE-specific configuration inherited from LGHA.

Re: 4.: Something similar should be possible and what I'm proposing doesn't prevent this. BUT: If the project requires any other additional dictionary file from an external source, then it's not possible like this, see 6.3 below.
The bottom line (and the explanation for my original point) is that OSOCE should have its own dictionary specific to its submodules and projects that you can reference in another project that shares a submodule with OSOCE. And we're still on the same page that it should be possible with the least configuration necessary, but 2 out of the 3 (AFAIK) projects that are affect already override the list of prefixes and dictionary files, so this would mean just adding one more line to both.

6.1. If you need generic spell-checking in any project, you use the default config from LGHA.
Depends on what you call generic. I'd say that it's the common denominator, i.e., the English dictionary, filetypes, software-terms and aws (for generic cloud terminology). We have css, npm, rust, scala in the default config even for projects that don't use any or all of those technologies, e.g., Utility Scripts and Infrastructure Scripts. Note that it's still possible that some words are used from those dictionaries, but I'd consider those cases a coincidence that need to be investigated not to produce false-negatives due to using those dictionaries.
We can also say that OC/web development should be considered generic, and Utility Scripts and Infrastructure Scripts can/should override the default list of dictionaries to remove what it doesn't need.

6.2. If you need generic spell-checking for an OC solution using OSOCE submodules, you use the default config from somewhere (Also LGHA, another config? Or OSOCE?).
That is part of my original vision of these changes, but from a different angle: another project that has nothing to do with OSOCE submodules shouldn't inherit OSOCE-specific configuration, so it should be opt-in.

6.3. In any case, if you need customization, you can add that on top of the defaults (vs having to copy configuration from the defaults).
That's not possible now either, because adding any prefix or dictionary file means that you have to override the default configuration list of prefixes / dictionary files.

6.4. And if you need cascading overrides like we have with DotNest Tenants Core, then that should be possible too.
AFAIR that's not what happens due to 6.3. DotNest Tenants Core and DotNest Core Sites SDK both have a completely custom configuration, but both reference allow.txt and expect.txt from LGHA.
This has actually been bothering me for a while too, because if you need the default configuration + one more dictionary file from another source, then you need to start with copying the default configuration, like DotNest Tenants Core and DotNest Sites Core SDK does.
This can be fixed by adding new workflow parameters that are merged with the defaults (so you can override the defaults if you want to, but you also have the option to keep them and add more) or define the defaults not as default value of the parameter, but merge with whatever is passed in to the current parameters (but in both cases, the action needs to take care of the merge logic).

Piedone · 2023-01-11T16:30:11Z

Great then!

Re 6.3 and 6.4: Please chime in here: check-spelling/check-spelling#45.

github-actions bot changed the title ~~Optimize spell-checking dictionary files across projects~~ Optimize spell-checking dictionary files across projects (OSOE-523) Dec 29, 2022

BenedekFarkas mentioned this issue Jan 18, 2023

OSOE-84: New words to allow for Node.js Extensions Lombiq/GitHub-Actions#186

Merged

BenedekFarkas self-assigned this Feb 7, 2023

This was referenced Feb 7, 2023

Use per-line spell-checking ignore pattern and remove one-off ignores (OSOE-491) Lombiq/GitHub-Actions#134

Closed

OSOE-523: Optimize spell-checking dictionary files across projects #397

Merged

BenedekFarkas mentioned this issue Feb 15, 2023

Use "advice" instead of "advices" in .github/actions/spelling/advice.md (OSOE-570) Lombiq/GitHub-Actions#210

Closed

BenedekFarkas mentioned this issue Feb 23, 2023

Spell-checking should allow reusable configuration files (OSOE-579) Lombiq/GitHub-Actions#214

Open

BenedekFarkas closed this as completed in #397 Feb 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize spell-checking dictionary files across projects (OSOE-523) #346

Optimize spell-checking dictionary files across projects (OSOE-523) #346

BenedekFarkas commented Dec 29, 2022 •

edited

Loading

Piedone commented Dec 29, 2022 •

edited

Loading

BenedekFarkas commented Dec 30, 2022 •

edited

Loading

Piedone commented Jan 1, 2023

BenedekFarkas commented Jan 10, 2023

Piedone commented Jan 10, 2023

BenedekFarkas commented Jan 11, 2023 •

edited

Loading

Piedone commented Jan 11, 2023

Optimize spell-checking dictionary files across projects (OSOE-523) #346

Optimize spell-checking dictionary files across projects (OSOE-523) #346

Comments

BenedekFarkas commented Dec 29, 2022 • edited Loading

Piedone commented Dec 29, 2022 • edited Loading

BenedekFarkas commented Dec 30, 2022 • edited Loading

Piedone commented Jan 1, 2023

BenedekFarkas commented Jan 10, 2023

Piedone commented Jan 10, 2023

BenedekFarkas commented Jan 11, 2023 • edited Loading

Piedone commented Jan 11, 2023

BenedekFarkas commented Dec 29, 2022 •

edited

Loading

Piedone commented Dec 29, 2022 •

edited

Loading

BenedekFarkas commented Dec 30, 2022 •

edited

Loading

BenedekFarkas commented Jan 11, 2023 •

edited

Loading