feat: Improve validation
This commit introduces two different validation modes:
- Strict (default): Only allows letters, digits, hyphens
- Lax: Allows any octets and just checks for the max lengths

This allows domains to have an underscore character. Closes #134

BREAKING CHANGE: Introduces a dependency on the global `TextEncoder` constructor, which should be available in all modern engines (see https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder). The strict validation mode (which is the default) is also slightly stricter than before: it now rejects hyphens at the beginning or end of a domain label and requires top-level domain names not to be all-numeric.
jhnns committed Jan 23, 2022
1 parent 4985cc7 commit 171a8c8
Showing 6 changed files with 387 additions and 80 deletions.
83 changes: 78 additions & 5 deletions README.md
@@ -15,7 +15,7 @@ Since domain name registrars organize their namespaces in different ways, it's n
import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain(
  // This should be a string with basic latin characters only.
  // This should be a string with basic latin letters only.
  // More information below.
  "www.some.example.co.uk"
);
@@ -32,7 +32,7 @@ if (parseResult.type === ParseResultType.Listed) {
}
```

This package has been designed for modern Node and browser environments, supporting both CommonJS and ECMAScript modules. It assumes an ES2015 environment with [`Symbol()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol) and [`URL()`](https://developer.mozilla.org/en-US/docs/Web/API/URL) globally available. You need to transpile it down to ES5 (e.g. by using [Babel](https://babeljs.io/)) if you need to support older environments.
This package has been designed for modern Node and browser environments, supporting both CommonJS and ECMAScript modules. It assumes an ES2015 environment with [`Symbol()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol), [`URL()`](https://developer.mozilla.org/en-US/docs/Web/API/URL) and [`TextEncoder()`](https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder) globally available. You need to transpile it down to ES5 (e.g. by using [Babel](https://babeljs.io/)) if you need to support older environments.

The list of top-level domains is stored in a [trie](https://en.wikipedia.org/wiki/Trie) data structure and serialization format to ensure the fastest lookup and the smallest possible library size. The library is side-effect free (this is important for proper [tree-shaking](https://webpack.js.org/guides/tree-shaking/)).

@@ -96,7 +96,7 @@ When parsing a hostname there are 5 possible results:

### 👉 Invalid domains

The given input is first validated against [RFC 1034](https://tools.ietf.org/html/rfc1034). If the validation fails, `parseResult.type` will be `ParseResultType.Invalid`:
The given input is first validated against [RFC 3696](https://datatracker.ietf.org/doc/html/rfc3696#section-2) (the domain labels are limited to basic latin letters, numbers and hyphens). If the validation fails, `parseResult.type` will be `ParseResultType.Invalid`:

```javascript
import { parseDomain, ParseResultType } from "parse-domain";
@@ -108,6 +108,20 @@ console.log(parseResult.type === ParseResultType.Invalid); // true

Check out the [API](#api-ts-ValidationError) if you need more information about the validation error.

If you don't want the characters to be validated (e.g. because you need to allow underscores in hostnames), there's also a more relaxed validation mode (according to [RFC 2181](https://www.rfc-editor.org/rfc/rfc2181#section-11)).

```javascript
import { parseDomain, ParseResultType, Validation } from "parse-domain";

const parseResult = parseDomain("_jabber._tcp.gmail.com", {
  validation: Validation.Lax,
});

console.log(parseResult.type === ParseResultType.Listed); // true
```

See also [#134](https://github.com/peerigon/parse-domain/issues/134) for the discussion.
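
The lax mode still enforces length limits, and those limits are counted in octets rather than characters. The following is a minimal sketch of that idea, not the library's actual implementation: the helper names `octetLength` and `passesLaxLengthChecks` are hypothetical, the limits of 63 and 253 octets are taken from the test suite in this commit, and trailing dots are not handled.

```typescript
// Illustrative sketch only; not parse-domain's actual implementation.
const LABEL_MAX_OCTETS = 63; // label limit used in this commit's tests
const DOMAIN_MAX_OCTETS = 253; // domain limit used in this commit's tests

const encoder = new TextEncoder();

// Hypothetical helper: number of UTF-8 octets needed to encode a string.
const octetLength = (value: string): number => encoder.encode(value).length;

// Hypothetical helper: true if the whole hostname and each of its labels
// stay within the octet limits. Characters themselves are not inspected.
const passesLaxLengthChecks = (hostname: string): boolean => {
  if (octetLength(hostname) > DOMAIN_MAX_OCTETS) {
    return false;
  }

  return hostname.split(".").every((label) => {
    const length = octetLength(label);

    return length >= 1 && length <= LABEL_MAX_OCTETS;
  });
};
```

Because lengths are measured after encoding, a non-ASCII character such as `"ä"` counts as two octets, so fewer of them fit into a label than ASCII characters would.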

### 👉 IP addresses

If the given input is an IP address, `parseResult.type` will be `ParseResultType.Ip`:
@@ -273,17 +287,27 @@ console.log(topLevelDomains); // []
🧬 = TypeScript export

<h3 id="api-js-parseDomain">
🧩 <code>export parseDomain(hostname: string | typeof <a href="#api-js-NO_HOSTNAME">NO_HOSTNAME</a>): <a href="#api-ts-ParseResult">ParseResult</a></code>
🧩 <code>export parseDomain(hostname: string | typeof <a href="#api-js-NO_HOSTNAME">NO_HOSTNAME</a>, options?: <a href="#api-ts-ParseDomainOptions">ParseDomainOptions</a>): <a href="#api-ts-ParseResult">ParseResult</a></code>
</h3>

Takes a hostname (e.g. `"www.example.com"`) and returns a [`ParseResult`](#api-ts-ParseResult). The hostname must only contain basic latin characters, digits, hyphens and dots. International hostnames must be puny-encoded. Does not throw an error, even with invalid input.
Takes a hostname (e.g. `"www.example.com"`) and returns a [`ParseResult`](#api-ts-ParseResult). The hostname must only contain basic latin letters, digits, hyphens and dots. International hostnames must be puny-encoded. Does not throw an error, even with invalid input.

```javascript
import { parseDomain } from "parse-domain";

const parseResult = parseDomain("www.example.com");
```

Use `Validation.Lax` if you want to allow all characters:

```javascript
import { parseDomain, Validation } from "parse-domain";

const parseResult = parseDomain("_jabber._tcp.gmail.com", {
  validation: Validation.Lax,
});
```

<h3 id="api-js-fromUrl">
🧩 <code>export fromUrl(input: string): string | typeof <a href="#api-js-NO_HOSTNAME">NO_HOSTNAME</a></code>
</h3>
@@ -296,6 +320,54 @@ Takes a URL-like string and tries to extract the hostname. Requires the global [

`NO_HOSTNAME` is a symbol that is returned by [`fromUrl`](#api-js-fromUrl) when it was not able to extract a hostname from the given string. When passed to [`parseDomain`](#api-js-parseDomain), it will always yield a [`ParseResultInvalid`](#api-ts-ParseResultInvalid).

<h3 id="api-ts-ParseDomainOptions">
🧬 <code>export type ParseDomainOptions</code>
</h3>

```ts
export type ParseDomainOptions = {
  /**
   * If no validation is specified, Validation.Strict will be used.
   **/
  validation?: Validation;
};
```

<h3 id="api-js-Validation">
🧩 <code>export Validation</code>
</h3>

An object that holds all possible [Validation](#api-ts-Validation) `validation` values:

```javascript
export const Validation = {
  /**
   * Allows any octets as labels
   * but still restricts the length of labels and the overall domain.
   *
   * @see https://www.rfc-editor.org/rfc/rfc2181#section-11
   **/
  Lax: "LAX",

  /**
   * Only allows ASCII letters, digits and hyphens (aka LDH),
   * forbids hyphens at the beginning or end of a label
   * and requires top-level domain names not to be all-numeric.
   *
   * This is the default if no validation is configured.
   *
   * @see https://datatracker.ietf.org/doc/html/rfc3696#section-2
   */
  Strict: "STRICT",
};
```
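
The three character rules documented for `Validation.Strict` can be sketched roughly as follows. This is an assumption-laden illustration, not the library's code: `passesStrictCharacterChecks` and both regular expressions are hypothetical names, and the length checks that both modes share are left out.

```typescript
// Illustrative sketch only; not parse-domain's actual implementation.
const LDH_ONLY = /^[a-z0-9-]+$/i; // ASCII letters, digits and hyphens
const ALL_NUMERIC = /^\d+$/; // ASCII digits only

// Hypothetical helper mirroring the rules documented for Validation.Strict.
const passesStrictCharacterChecks = (hostname: string): boolean => {
  const labels = hostname.split(".");
  // split() always returns at least one element; the fallback keeps
  // stricter compiler settings happy.
  const lastLabel = labels[labels.length - 1] ?? "";

  const labelsAreValid = labels.every(
    (label) =>
      LDH_ONLY.test(label) && !label.startsWith("-") && !label.endsWith("-"),
  );

  // The top-level domain must not consist of digits only.
  return labelsAreValid && !ALL_NUMERIC.test(lastLabel);
};
```

Under these rules a hostname like `"_jabber._tcp.gmail.com"` is rejected because of the underscores, which is the case `Validation.Lax` is meant to cover.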

<h3 id="api-ts-Validation">
🧬 <code>export Validation</code>
</h3>

This type represents all possible `validation` values.
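
One common way to derive such a type from the value object is shown below. This is only a sketch under the assumption that the type is the union of the object's values; the `resolveValidation` helper is hypothetical and merely illustrates the documented default.

```typescript
// Same shape as the documented Validation object.
// `as const` preserves the string literal types "LAX" and "STRICT".
const Validation = {
  Lax: "LAX",
  Strict: "STRICT",
} as const;

// Assumed derivation: the union of all values, i.e. "LAX" | "STRICT".
type Validation = (typeof Validation)[keyof typeof Validation];

// Hypothetical helper: falls back to the documented default (Strict)
// when no validation mode is given.
const resolveValidation = (validation?: Validation): Validation =>
  validation ?? Validation.Strict;
```

Sharing one name between the value and the type lets callers write `Validation.Lax` at runtime and `Validation` in type annotations.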

<h3 id="api-ts-ParseResult">
🧬 <code>export ParseResult</code>
</h3>
@@ -391,6 +463,7 @@ const ValidationErrorType = {
LabelMinLength: "LABEL_MIN_LENGTH",
LabelMaxLength: "LABEL_MAX_LENGTH",
LabelInvalidCharacter: "LABEL_INVALID_CHARACTER",
LastLabelInvalid: "LAST_LABEL_INVALID",
};
```

3 changes: 2 additions & 1 deletion package.json
@@ -20,7 +20,8 @@
"import": "./build-esm/src/main.js"
},
"scripts": {
"test": "jest",
"test": "run-p test:*",
"test:suite": "jest",
"posttest": "run-s build posttest:*",
"posttest:lint": "eslint --cache --ext js,ts *.js src bin",
"build": "run-s build:*",
2 changes: 1 addition & 1 deletion src/main.ts
@@ -10,4 +10,4 @@ export {
ParseResultListed,
} from "./parse-domain";
export { fromUrl, NO_HOSTNAME } from "./from-url";
export { ValidationError, ValidationErrorType } from "./sanitize";
export { Validation, ValidationError, ValidationErrorType } from "./sanitize";
189 changes: 160 additions & 29 deletions src/parse-domain.test.ts
@@ -1,5 +1,5 @@
import { parseDomain, ParseResultType } from "./parse-domain";
import { ValidationErrorType } from "./sanitize";
import { Validation, ValidationErrorType } from "./sanitize";
import { fromUrl } from "./from-url";

const ipV6Samples = [
@@ -244,25 +244,27 @@ describe(parseDomain.name, () => {
});
});

test("returns type ParseResultType.Invalid and error information for a hostname with an empty label", () => {
expect(parseDomain(".example.com")).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
type: ValidationErrorType.LabelMinLength,
message:
'Label "" is too short. Label is 0 octets long but should be at least 1.',
column: 1,
}),
]),
});
expect(parseDomain("www..example.com")).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
column: 5,
}),
]),
test("returns type ParseResultType.Invalid and error information for a hostname with an empty label (both validation modes)", () => {
[Validation.Lax, Validation.Strict].forEach((validation) => {
expect(parseDomain(".example.com", { validation })).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
type: ValidationErrorType.LabelMinLength,
message:
'Label "" is too short. Label is 0 octets long but should be at least 1.',
column: 1,
}),
]),
});
expect(parseDomain("www..example.com", { validation })).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
column: 5,
}),
]),
});
});
});

@@ -277,31 +279,72 @@ describe(parseDomain.name, () => {
});
});

test("returns type ParseResultType.Invalid and error information for a hostname with a label that is too long", () => {
test("returns type ParseResultType.Invalid and error information for a hostname with a label that is too long (both validation modes)", () => {
const labelToLong = new Array(64).fill("x").join("");

expect(parseDomain(`${labelToLong}.example.com`)).toMatchObject({
[Validation.Lax, Validation.Strict].forEach((validation) => {
expect(parseDomain(labelToLong, { validation })).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
type: ValidationErrorType.LabelMaxLength,
message: `Label "${labelToLong}" is too long. Label is 64 octets long but should not be longer than 63.`,
column: 1,
}),
]),
});
expect(
parseDomain(`www.${labelToLong}.example.com`, { validation })
).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
column: 5,
}),
]),
});
});
// Should work with 63 octets
expect(parseDomain(new Array(63).fill("x").join(""))).toMatchObject({
type: ParseResultType.NotListed,
});
});

test("returns type ParseResultType.Invalid and error information for a hostname that is too long", () => {
const domain = new Array(254).fill("x").join("");

// A single long label
expect(parseDomain(new Array(254).fill("x").join(""))).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
type: ValidationErrorType.LabelMaxLength,
message: `Label "${labelToLong}" is too long. Label is 64 octets long but should not be longer than 63.`,
column: 1,
type: ValidationErrorType.DomainMaxLength,
message: `Domain "${domain}" is too long. Domain is 254 octets long but should not be longer than 253.`,
column: 254,
}),
]),
});
expect(parseDomain(`www.${labelToLong}.example.com`)).toMatchObject({

// Multiple labels
expect(parseDomain(new Array(128).fill("x").join("."))).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
column: 5,
type: ValidationErrorType.DomainMaxLength,
}),
]),
});

// Should work with 253 octets
expect(parseDomain(new Array(127).fill("x").join("."))).toMatchObject({
type: ParseResultType.NotListed,
});
});

test("returns type ParseResultType.Invalid and error information for a hostname that is too long", () => {
const domain = new Array(127).fill("x").join(".") + "x";
test("interprets the hostname as octets", () => {
// The "ä" character is 2 octets long which is why we only need
// 127 of them to exceed the limit
const domain = new Array(127).fill("ä").join("");

expect(parseDomain(domain)).toMatchObject({
type: ParseResultType.Invalid,
@@ -362,6 +405,90 @@ describe(parseDomain.name, () => {
});
});

test("accepts any character as labels with Validation.Lax", () => {
// Trying out 2^10 characters
getCharCodesUntil(2 ** 10)
.map((octet) => String.fromCharCode(octet))
.filter((hostname) => hostname !== ".")
.forEach((hostname) => {
const result = parseDomain(hostname, { validation: Validation.Lax });

expect(result).toMatchObject({
type: ParseResultType.NotListed,
});
});
});

test("returns type ParseResultType.Invalid and error information for a hostname where some labels start or end with a -", () => {
expect(parseDomain("-example")).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
type: ValidationErrorType.LabelInvalidCharacter,
message:
'Label "-example" contains invalid character "-" at column 1.',
column: 1,
}),
]),
});
expect(parseDomain("-example.com")).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
type: ValidationErrorType.LabelInvalidCharacter,
message:
'Label "-example" contains invalid character "-" at column 1.',
column: 1,
}),
]),
});
expect(parseDomain("example-")).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
type: ValidationErrorType.LabelInvalidCharacter,
message:
'Label "example-" contains invalid character "-" at column 8.',
column: 8,
}),
]),
});
expect(parseDomain("example-.com")).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
type: ValidationErrorType.LabelInvalidCharacter,
message:
'Label "example-" contains invalid character "-" at column 8.',
column: 8,
}),
]),
});
});

test("returns type ParseResultType.Invalid and error information for a hostname where the last label just contains numbers", () => {
expect(parseDomain("123")).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
type: ValidationErrorType.LabelInvalidCharacter,
message: 'Last label "123" must not be all-numeric.',
column: 1,
}),
]),
});
expect(parseDomain("example.123")).toMatchObject({
type: ParseResultType.Invalid,
errors: expect.arrayContaining([
expect.objectContaining({
type: ValidationErrorType.LabelInvalidCharacter,
message: 'Last label "123" must not be all-numeric.',
column: 9,
}),
]),
});
});

test("returns type ParseResultType.Invalid and error information if the input was not domain like", () => {
// @ts-expect-error This is a deliberate error for the test
expect(parseDomain(undefined)).toMatchObject({
@@ -465,3 +592,7 @@ describe(parseDomain.name, () => {
});
});
});

const getCharCodesUntil = (length: number) => {
  return Array.from({ length }, (_, i) => i);
};