✨Feature/plugin non breaking spaces #55

Merged
merged 18 commits into from
Nov 5, 2021
Changes from 16 commits
2 changes: 2 additions & 0 deletions packages/plugin-non-breaking-spaces/.npmignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
tsconfig.json
src/
66 changes: 66 additions & 0 deletions packages/plugin-non-breaking-spaces/README.md
@@ -0,0 +1,66 @@
# @lokse/plugin-

Plugin for replacing whitespace after single-letter characters with a non-breaking space.

Some languages have special characters such as `§` or `¶`, or single-letter words like `a`.
By the typographic rules of those languages, these characters should not be left at the end of a line,
so we replace the whitespace after them with a non-breaking space, which forces them onto the next line together with the following word.

https://practicaltypography.com/nonbreaking-spaces.html

## Installation

```sh
$ yarn add -D @lokse/plugin-non-breaking-spaces
```

## Usage

Add it to the `plugins` section of the lokse config.

### Default patterns

This plugin uses regex patterns to find whitespace after single-letter words and other characters.

Pattern names match the languages defined in your Google Sheet. The plugin lowercases the keys, so a language provided as `cs-CZ` is looked up as `cs-cz`.

We currently provide these patterns by default:

(Feel free to contribute more.)

```js
{
  cs: /(\s|^)(a|i|k|o|s|u|v|z)(\s+)/gim,
  "cs-cz": /(\s|^)(a|i|k|o|s|u|v|z)(\s+)/gim,
}
```
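
As a rough illustration of the lookup described above, the sketch below lowercases the language code before indexing into the pattern map. `findPattern` is a hypothetical helper written for this example, not part of the plugin's API.

```ts
// Default patterns as listed above.
const defaultPatterns: Record<string, RegExp> = {
  cs: /(\s|^)(a|i|k|o|s|u|v|z)(\s+)/gim,
  "cs-cz": /(\s|^)(a|i|k|o|s|u|v|z)(\s+)/gim,
};

// Hypothetical helper: lowercase the language code, then look it up.
function findPattern(language: string): RegExp | undefined {
  return defaultPatterns[language.toLowerCase()];
}

const pattern = findPattern("cs-CZ"); // resolves to the "cs-cz" entry
const missing = findPattern("en-US"); // undefined, no default pattern for this language
```
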
### Options

`useNbsp`: Inserts the HTML entity for a non-breaking space instead of the regular non-breaking space character.

`customPatterns`: Extends the default patterns with custom per-language patterns. Note that you don't have to include the whitespace matchers `(\\s|^)` and `(\\s+)`; the plugin adds them by default.

```json
{
  "plugins": [
    {
      "name": "@lokse/plugin-non-breaking-spaces",
      // options are optional
      "options": {
        "useNbsp": true,
        "customPatterns": {
          // provide a custom regex pattern without flags
          // the default flags are gim
          // use the language code as key (has to be the same as your lang in the spreadsheet)
          "ad-HD": "(a|i|k|o|s|u|v|z)"
        }
      }
    }
  ]
}
```
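
To make the `customPatterns` behaviour concrete, here is a minimal sketch of how a pattern string from the config could be turned into a regex: the key is lowercased and the string is wrapped in the two whitespace groups with the `gim` flags. `normalizeCustomPatterns` below is a dependency-free stand-in written for this example, not the plugin's actual helper.

```ts
type CustomPatterns = Record<string, string>;

// Illustrative stand-in: lowercase each key and wrap each pattern string
// in the leading and trailing whitespace groups, using the `gim` flags.
function normalizeCustomPatterns(
  patterns: CustomPatterns
): Record<string, RegExp> {
  const result: Record<string, RegExp> = {};
  for (const [language, source] of Object.entries(patterns)) {
    result[language.toLowerCase()] = new RegExp(
      `(\\s|^)${source}(\\s+)`,
      "gim"
    );
  }
  return result;
}

const normalized = normalizeCustomPatterns({ "ad-HD": "(a|i|k|o|s|u|v|z)" });
```

Because the string is concatenated as-is, an alternation should be wrapped in its own parentheses, as in the `"ad-HD"` example; otherwise the `|` would also split the surrounding whitespace groups.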

Review comment: Please document the default patterns that work out of the box 🙏


## License
Lokse is licensed under the MIT License.
Documentation is licensed under Creative Commons License.
Created with ♥ by [@horaklukas](https://github.com/horaklukas) and all the great contributors.
31 changes: 31 additions & 0 deletions packages/plugin-non-breaking-spaces/package.json
@@ -0,0 +1,31 @@
{
  "name": "@lokse/plugin-non-breaking-spaces",
  "description": "",
  "version": "1.0.0",
  "author": {
    "name": "Filip Kubík",
    "email": "filip.kubik.dev@gmail.com"
  },
  "bugs": "https://github.com/AckeeCZ/lokse/issues",
  "dependencies": {
    "@lokse/core": "^2.0.0"
  },
  "devDependencies": {},
  "engines": {
    "node": ">=8.0.0"
  },
  "homepage": "https://github.com/AckeeCZ/lokse",
  "keywords": [
    "lokse",
    "lokse-plugin"
  ],
  "license": "MIT",
  "main": "lib/index.js",
  "repository": "AckeeCZ/lokse",
  "scripts": {
    "build": "tsc -b"
  },
  "publishConfig": {
    "access": "public"
  }
}
95 changes: 95 additions & 0 deletions packages/plugin-non-breaking-spaces/src/__tests__/index.test.ts
@@ -0,0 +1,95 @@
import { Line } from "@lokse/core";

import nonBreakingSpacesPlugin from "..";

import Transformer from "../../../core/lib/transformer/json";

Review comment: Please don't break package encapsulation 🙏

Suggested change:
- import Transformer from "../../../core/lib/transformer/json";
+ import { transformersByFormat, OutputFormat } from "@lokse/core";
+ const Transformer = transformersByFormat[OutputFormat.JSON];


describe("Non-breaking spaces plugin", () => {
  const logger = { warn: jest.fn(), log: jest.fn() };

  beforeEach(() => {
    logger.warn.mockReset();
  });

  describe("transformFullOutput hook", () => {
    it("should warn if language pattern is missing", async () => {
      const plugin = nonBreakingSpacesPlugin({ logger });

      const language = "ad-HD";

      const meta = { transformer: Transformer, language };

      await plugin.transformFullOutput("some string", meta);

      expect(logger.warn).toHaveBeenCalledWith(
        expect.stringMatching(
          `Pattern for current language ${language} was not found`
        )
      );
    });
  });

  describe("transformLine hook", () => {
    it("should replace white spaces after single letter chars with non-breaking spaces in CS lang", async () => {
      const plugin = nonBreakingSpacesPlugin({ logger });

      const initialValue =
        "A kdyby tady spadli marťani, tak je budeme schvalovat i třeba s marťany. Byly z divokých vajec. K večeři a obědu. O Vánocích. U tebe.";
      const targetValue =
        "A\u00A0kdyby tady spadli marťani, tak je budeme schvalovat i\u00A0třeba s\u00A0marťany. Byly z\u00A0divokých vajec. K\u00A0večeři a\u00A0obědu. O\u00A0Vánocích. U\u00A0tebe.";

      const line = new Line("test.key", initialValue);

      const meta = { key: line.key, language: "cs" };

      const transformedLine = await plugin.transformLine(line, meta);

      expect(transformedLine.value).toBe(targetValue);

      expect(logger.warn).not.toHaveBeenCalled();
    });

    it("should work with custom pattern and lang provided", async () => {
      const plugin = nonBreakingSpacesPlugin({
        logger,
        customPatterns: { "ad-HD": "(a|i|k|o|s|u|v|z)" },
      });

      const initialValue =
        "A kdyby tady spadli marťani, tak je budeme schvalovat i třeba s marťany. Byly z divokých vajec. K večeři a obědu. O Vánocích. U tebe.";
      const targetValue =
        "A\u00A0kdyby tady spadli marťani, tak je budeme schvalovat i\u00A0třeba s\u00A0marťany. Byly z\u00A0divokých vajec. K\u00A0večeři a\u00A0obědu. O\u00A0Vánocích. U\u00A0tebe.";

      const language = "ad-HD";

      const line = new Line("test.key", initialValue);

      const meta = { key: line.key, language };

      const transformedLine = await plugin.transformLine(line, meta);

      expect(transformedLine.value).toBe(targetValue);

      expect(logger.warn).not.toHaveBeenCalled();
    });

    it("should replace white spaces after single letter chars with &nbsp; HTML entity", async () => {
      const plugin = nonBreakingSpacesPlugin({ logger, useNbsp: true });

      const initialValue =
        "A kdyby tady spadli marťani, tak je budeme schvalovat i třeba s marťany. Byly z divokých vajec. K večeři a obědu. O Vánocích. U tebe.";
      const targetValue =
        "A&nbsp;kdyby tady spadli marťani, tak je budeme schvalovat i&nbsp;třeba s&nbsp;marťany. Byly z&nbsp;divokých vajec. K&nbsp;večeři a&nbsp;obědu. O&nbsp;Vánocích. U&nbsp;tebe.";

      const line = new Line("test.key", initialValue);

      const meta = { key: line.key, language: "cs" };

      const transformedLine = await plugin.transformLine(line, meta);

      expect(transformedLine.value).toBe(targetValue);

      expect(logger.warn).not.toHaveBeenCalled();
    });
  });
});
66 changes: 66 additions & 0 deletions packages/plugin-non-breaking-spaces/src/index.ts
@@ -0,0 +1,66 @@
import { createPlugin } from "@lokse/core";
import type { GeneralPluginOptions, LoksePlugin } from "@lokse/core";

import { lowerCaseKeys, regexifyValues } from "./utils";

export interface Patterns {
  [key: string]: RegExp | string;
}
export interface CustomPatterns {
  [key: string]: string;
}

const defaultPatterns: Patterns = {
  cs: /(\s|^)(a|i|k|o|s|u|v|z)(\s+)/gim,
  "cs-cz": /(\s|^)(a|i|k|o|s|u|v|z)(\s+)/gim,
};

export interface PluginOptions extends GeneralPluginOptions {
  useNbsp?: boolean;
  customPatterns?: CustomPatterns;
}

// We have to do this in order to process the custom patterns from JSON plugin settings
const normalizeCustomPatterns = (patterns: CustomPatterns) => {
  const lowerCasedCustomPatterns = lowerCaseKeys(patterns);
  const regexifiedValues = regexifyValues(lowerCasedCustomPatterns);

  return regexifiedValues;
};

export default function (options: PluginOptions): LoksePlugin {
  const patterns: Patterns = {
    ...defaultPatterns,
    ...(options.customPatterns
      ? normalizeCustomPatterns(options.customPatterns)
      : {}),
  };

  return createPlugin({
    transformFullOutput: async (output, meta) => {
      const { language } = meta;

      const pattern = patterns[language.toLowerCase()];

      if (!pattern) {
        options.logger.warn(
          `Pattern for current language ${language} was not found`
        );

Review comment:

1. Are you sure you want to log a warning for each line in a language that doesn't have a pattern to replace?
2. What if I want to use the same patterns for all supported languages? Am I forced to define the same patterns for each language separately?

Author:

1. Good point, no. Moved it to transformFullOutput.
2. Using the same pattern for all languages might not be wise. Languages have different rules. Sure, it makes sense for UK and US English, but that's about it. So I don't know, what do you suggest?

Reviewer:

1. OK, I understand, let's keep it as it is. I didn't realize the grammatical differences the feature is based on.

We could maybe provide default definitions for some of the most used languages, like EN/DE. Please create an enhancement ticket for it 🙏

Author:

Alright, makes sense.

#57

      }

      return output;
    },
    transformLine: (line, meta) => {
      const { language } = meta;

      const pattern = patterns[language.toLowerCase()];

      if (pattern) {
        const replacement = options.useNbsp ? "$1$2&nbsp;" : "$1$2\u00A0";
        line.setValue((value) => value.replace(pattern, replacement));
      }

      return line;
    },
  });
}
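
For reference, the replacement step in `transformLine` can be tried in isolation. The sketch below is a standalone approximation, assuming the `useNbsp` replacement is the `&nbsp;` entity that the README describes; `replaceSpaces` is a hypothetical name and does not exist in the plugin.

```ts
const csPattern = /(\s|^)(a|i|k|o|s|u|v|z)(\s+)/gim;

// Hypothetical standalone version of the replacement done in `transformLine`:
// keep the leading whitespace ($1) and the letter ($2), swap the trailing whitespace.
function replaceSpaces(value: string, pattern: RegExp, useNbsp = false): string {
  const replacement = useNbsp ? "$1$2&nbsp;" : "$1$2\u00A0";
  return value.replace(pattern, replacement);
}

const plain = replaceSpaces("K večeři a obědu.", csPattern);
// → "K\u00A0večeři a\u00A0obědu."
const entity = replaceSpaces("K večeři a obědu.", csPattern, true);
// → "K&nbsp;večeři a&nbsp;obědu."

// Two single-letter words in a row share one space, which the first match
// consumes, so the second word is left untouched.
const adjacent = replaceSpaces("a v lese", csPattern);
// → "a\u00A0v lese"
```

The last case suggests that consecutive single-letter words may need a second pass or a lookahead-based pattern, should that ever matter for a given language.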
12 changes: 12 additions & 0 deletions packages/plugin-non-breaking-spaces/src/utils.ts
@@ -0,0 +1,12 @@
import { mapKeys, mapValues } from "lodash";

import { CustomPatterns } from ".";

Review comment: Are you sure about this import from '.'? 🙂


export const lowerCaseKeys = (patterns: CustomPatterns) =>
  mapKeys(patterns, (_, key) => key.toLowerCase());

const convertStringToRegex = (string: string) =>
  new RegExp(`(\\s|^)${string}(\\s+)`, "gim");

export const regexifyValues = (patterns: CustomPatterns) =>
  mapValues(patterns, (value) => convertStringToRegex(value));
13 changes: 13 additions & 0 deletions packages/plugin-non-breaking-spaces/tsconfig.json
@@ -0,0 +1,13 @@
{
  "extends": "../../tsconfig.json",
  "compilerOptions": {
    "outDir": "lib",
    "rootDir": "src",
  },
  "include": [
    "src/**/*",
  ],
  "exclude": [
    "src/**/__tests__/**"
  ]
}