Natural language parser, for the Dutch language, that produces nlcst.
- What is this?
- When should I use this?
- Install
- Use
- API
- Algorithm
- Types
- Compatibility
- Security
- Related
- Contribute
- License
This package exposes a parser that takes Dutch natural language and produces a syntax tree.
If you want to handle Dutch natural language as syntax trees manually, use this.
Alternatively, you can use the retext plugin retext-dutch, which wraps this
project to also parse natural language at a higher-level (easier) abstraction.
For English or most Latin-script languages, you can instead use parse-english
or parse-latin.
This package is ESM only. In Node.js (version 16.0+), install with npm:
npm install parse-dutch
In Deno with esm.sh:
import {ParseDutch} from 'https://esm.sh/parse-dutch@7'
In browsers with esm.sh:
<script type="module">
import {ParseDutch} from 'https://esm.sh/parse-dutch@7?bundle'
</script>
import {inspect} from 'unist-util-inspect'
import {ParseDutch} from 'parse-dutch'
const tree = new ParseDutch().parse(
  'Kunt U zich ’s morgens melden bij het afd. hoofd dhr. Venema?'
)
console.log(inspect(tree))
Yields:
RootNode[1] (1:1-1:62, 0-61)
└─0 ParagraphNode[1] (1:1-1:62, 0-61)
    └─0 SentenceNode[24] (1:1-1:62, 0-61)
        ├─0  WordNode[1] (1:1-1:5, 0-4)
        │    └─0 TextNode "Kunt" (1:1-1:5, 0-4)
        ├─1  WhiteSpaceNode " " (1:5-1:6, 4-5)
        ├─2  WordNode[1] (1:6-1:7, 5-6)
        │    └─0 TextNode "U" (1:6-1:7, 5-6)
        ├─3  WhiteSpaceNode " " (1:7-1:8, 6-7)
        ├─4  WordNode[1] (1:8-1:12, 7-11)
        │    └─0 TextNode "zich" (1:8-1:12, 7-11)
        ├─5  WhiteSpaceNode " " (1:12-1:13, 11-12)
        ├─6  WordNode[2] (1:13-1:15, 12-14)
        │    ├─0 PunctuationNode "’" (1:13-1:14, 12-13)
        │    └─1 TextNode "s" (1:14-1:15, 13-14)
        ├─7  WhiteSpaceNode " " (1:15-1:16, 14-15)
        ├─8  WordNode[1] (1:16-1:23, 15-22)
        │    └─0 TextNode "morgens" (1:16-1:23, 15-22)
        ├─9  WhiteSpaceNode " " (1:23-1:24, 22-23)
        ├─10 WordNode[1] (1:24-1:30, 23-29)
        │    └─0 TextNode "melden" (1:24-1:30, 23-29)
        ├─11 WhiteSpaceNode " " (1:30-1:31, 29-30)
        ├─12 WordNode[1] (1:31-1:34, 30-33)
        │    └─0 TextNode "bij" (1:31-1:34, 30-33)
        ├─13 WhiteSpaceNode " " (1:34-1:35, 33-34)
        ├─14 WordNode[1] (1:35-1:38, 34-37)
        │    └─0 TextNode "het" (1:35-1:38, 34-37)
        ├─15 WhiteSpaceNode " " (1:38-1:39, 37-38)
        ├─16 WordNode[2] (1:39-1:43, 38-42)
        │    ├─0 TextNode "afd" (1:39-1:42, 38-41)
        │    └─1 PunctuationNode "." (1:42-1:43, 41-42)
        ├─17 WhiteSpaceNode " " (1:43-1:44, 42-43)
        ├─18 WordNode[1] (1:44-1:49, 43-48)
        │    └─0 TextNode "hoofd" (1:44-1:49, 43-48)
        ├─19 WhiteSpaceNode " " (1:49-1:50, 48-49)
        ├─20 WordNode[2] (1:50-1:54, 49-53)
        │    ├─0 TextNode "dhr" (1:50-1:53, 49-52)
        │    └─1 PunctuationNode "." (1:53-1:54, 52-53)
        ├─21 WhiteSpaceNode " " (1:54-1:55, 53-54)
        ├─22 WordNode[1] (1:55-1:61, 54-60)
        │    └─0 TextNode "Venema" (1:55-1:61, 54-60)
        └─23 PunctuationNode "?" (1:61-1:62, 60-61)
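The tree is made of plain objects, so it can be walked without extra tooling. The following is a minimal sketch, not part of this package: it assumes the node shape shown in the output above (parents carry `children`, literals carry `value`), uses a small hand-built fragment instead of real parser output, and the helper names `nodeToString` and `collectWords` are hypothetical.

```javascript
// A hand-built fragment shaped like the tree above (positions omitted for brevity).
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'het'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {
              type: 'WordNode',
              children: [
                {type: 'TextNode', value: 'afd'},
                {type: 'PunctuationNode', value: '.'}
              ]
            },
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'hoofd'}]},
            {type: 'PunctuationNode', value: '?'}
          ]
        }
      ]
    }
  ]
}

// Serialize a node: literals return their `value`, parents join their children.
function nodeToString(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(nodeToString).join('')
}

// Collect the text of every WordNode, depth first.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    words.push(nodeToString(node))
  } else if (node.children) {
    for (const child of node.children) collectWords(child, words)
  }
  return words
}

const words = collectWords(tree)
console.log(words) // [ 'het', 'afd.', 'hoofd' ]
console.log(nodeToString(tree)) // het afd. hoofd?
```

Note that the abbreviation's full stop stays inside its WordNode, so `afd.` comes back as one word, while the final `?` is a sibling of the words in the sentence.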
This package exports the identifier ParseDutch.
There is no default export.
Create a new parser.
ParseDutch extends ParseLatin.
See parse-latin for API docs.
All of parse-latin is included, and the following support for the Dutch
natural language:
- unit and time abbreviations (gr., sec., min., ma., vr., vrij., febr., mrt., and more)
- lots of abbreviations (Mr., Mv., Sr., Em., bijv., zgn., amb., and more)
- common elision (omission of letters) (d’, ’n, ’ns, ’t, ’s, ’er, ’em, ’ie, and more)
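To see why abbreviation support matters, here is a toy sketch that compares naive sentence splitting with an abbreviation-aware variant. It is an illustration of the problem only, not this package's implementation: the `abbreviations` set is a tiny hypothetical sample, and the function names are made up for the example.

```javascript
// Hypothetical, tiny abbreviation sample for illustration only.
const abbreviations = new Set(['afd', 'dhr', 'bijv', 'zgn'])

// Naive approach: every full stop followed by whitespace ends a sentence.
function splitNaive(text) {
  return text.split(/(?<=\.)\s+/)
}

// Abbreviation-aware approach: do not break after a known abbreviation.
function splitWithAbbreviations(text) {
  const sentences = []
  let current = ''
  for (const token of text.split(/\s+/)) {
    current = current ? current + ' ' + token : token
    const endsSentence = /[.?!]$/.test(token)
    // Drop the final character and compare case-insensitively.
    const word = token.slice(0, -1).toLowerCase()
    if (endsSentence && !abbreviations.has(word)) {
      sentences.push(current)
      current = ''
    }
  }
  if (current) sentences.push(current)
  return sentences
}

const text = 'Meld u bij het afd. hoofd dhr. Venema. Hij wacht.'
console.log(splitNaive(text).length) // 4
console.log(splitWithAbbreviations(text).length) // 2
```

The naive version cuts after `afd.` and `dhr.`, producing four fragments, while the abbreviation-aware version keeps the two sentences intact.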
This package is fully typed with TypeScript. It exports no additional types.
Projects maintained by me are compatible with maintained versions of Node.js.
When I cut a new major release, I drop support for unmaintained versions of
Node.
This means I try to keep the current release line, parse-dutch@^7, compatible
with Node.js 16.
This package is safe.
- parse-latin: Latin-script natural language parser
- parse-english: English natural language parser
Yes please! See How to Contribute to Open Source.
MIT © Titus Wormer