ClaudiuCeia/ts-duckling


ts-duckling

A TypeScript library for Deno that parses text into structured data. It is inspired by Duckling, but takes a more naive approach based on parser combinators.

In practice, this means the library is easy to extend and relatively lightweight, but it performs poorly on larger data sets and is more error-prone, because the rules for each entity are hard-coded.
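
To make the trade-off concrete, here is a minimal, self-contained sketch of the parser-combinator idea. The names `word`, `digits`, `seq`, and `extract` are hypothetical stand-ins written for this illustration; they are not ts-duckling's internals or the API of its parser library.

```typescript
// Hypothetical minimal combinators, for illustration only.
type Result<T> = { value: T; end: number } | null;
type Parser<T> = (input: string, pos: number) => Result<T>;

// Match a literal word at `pos`, ignoring case.
const word = (w: string): Parser<string> => (input, pos) => {
  const slice = input.slice(pos, pos + w.length);
  return slice.toLowerCase() === w.toLowerCase()
    ? { value: slice, end: pos + w.length }
    : null;
};

// Match one or more ASCII digits at `pos` and return them as a number.
const digits: Parser<number> = (input, pos) => {
  const m = /^[0-9]+/.exec(input.slice(pos));
  return m ? { value: Number(m[0]), end: pos + m[0].length } : null;
};

// Run two parsers back to back and pair their values.
const seq = <A, B>(a: Parser<A>, b: Parser<B>): Parser<[A, B]> =>
  (input, pos) => {
    const ra = a(input, pos);
    if (ra === null) return null;
    const rb = b(input, ra.end);
    return rb === null ? null : { value: [ra.value, rb.value], end: rb.end };
  };

type Entity<T> = { start: number; end: number; text: string; value: T };

// Try the parser at every offset. This naive scan is simple to extend,
// but it re-runs the rule at each position, which adds up on large inputs.
const extract = <T>(p: Parser<T>, input: string): Entity<T>[] => {
  const found: Entity<T>[] = [];
  let pos = 0;
  while (pos < input.length) {
    const r = p(input, pos);
    if (r !== null && r.end > pos) {
      found.push({ start: pos, end: r.end, text: input.slice(pos, r.end), value: r.value });
      pos = r.end;
    } else {
      pos++;
    }
  }
  return found;
};

// A hard-coded rule: digits followed by " km".
const distance = seq(digits, word(" km"));
const entities = extract(distance, "Ran 5 km, then 12 KM more");
console.log(entities.map((e) => e.text)); // [ "5 km", "12 KM" ]
```

The rule knows nothing about the surrounding sentence, so any text that happens to fit the pattern is reported as an entity.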

You can test this online here.

When would I use this?

If you have a Deno TypeScript codebase and want to extract entities from relatively small data samples (e.g. blog posts, comments, or messages), this will probably work for you, even more so if the format of the data is relatively stable.

However, you can expect false positives as well as false negatives in some scenarios since ts-duckling doesn't understand the context surrounding the entities:

// ts-duckling falsely assumes that 6/2022 is a date
const res = Duckling([Time.parser]).extract("6/2022 is 0.00296735905");
/**
   [
      {
        end: 7,
        kind: "time",
        start: 0,
        text: "6/2022 ",
        value: {
          grain: "day",
          when: "2022-01-05T22:00:00.000Z",
        },
      },
    ]
*/
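
One way a caller might cope with such false positives is to post-filter the extracted entities. The sketch below is only an illustration under stated assumptions: the `Entity` shape is copied from the sample output above (the library's real types may differ), and `looksLikeBareFraction` and `dropAmbiguous` are hypothetical helpers, not part of ts-duckling.

```typescript
// Shape copied from the sample output; the real ts-duckling types may differ.
type Entity = {
  kind: string;
  start: number;
  end: number;
  text: string;
  value: unknown;
};

// Hypothetical heuristic: a "time" match made only of digits/digits is
// ambiguous (it could be a date or a fraction), so treat it as suspect.
const looksLikeBareFraction = (e: Entity): boolean =>
  e.kind === "time" && /^\d+\/\d+$/.test(e.text.trim());

// Keep everything except the suspect matches.
const dropAmbiguous = (entities: Entity[]): Entity[] =>
  entities.filter((e) => !looksLikeBareFraction(e));

const sample: Entity[] = [
  // Taken from the example output above.
  {
    kind: "time",
    start: 0,
    end: 7,
    text: "6/2022 ",
    value: { grain: "day", when: "2022-01-05T22:00:00.000Z" },
  },
  // Made-up entity, included only to show that unambiguous matches survive.
  { kind: "time", start: 20, end: 35, text: "5th of May 2022", value: null },
];

const kept = dropAmbiguous(sample);
console.log(kept.map((e) => e.text)); // [ "5th of May 2022" ]
```

The cost of a filter like this is the opposite error: it also discards "6/2022" when the text really does mean June 2022, so it only suits data where that form is rarely a date.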

Adding new entity types

type FizzBuzzLanguage = EntityLanguage<
  {
    fizz: Parser<string>;
    buzz: Parser<string>;
    fizzbuzz: Parser<string>;
  },
  string
>;

const Fizzbuzz = createLanguage<FizzBuzzLanguage>({
  fizz: () => fuzzyCase("fizz"),
  buzz: () => fuzzyCase("buzz"),
  fizzbuzz: (s) => mapJoin(seq(s.fizz, s.buzz)),
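  // List the longest alternative first: assuming `either` tries its
  // alternatives in order, `fizz` would otherwise match the start of "fizzbuzz".
  parser: (s) => either(s.fizzbuzz, s.fizz, s.buzz),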
});

Duckling([Fizzbuzz.parser]).extract(`
    FizzBuzz is a programming problem where you print fizz for multiples
    of 3, buzz for multiples of 5, and fizzbuzz for multiples of both 3 and 5.
`);
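
When a language has alternatives that share a prefix, as `fizz` and `fizzbuzz` do, the order of the alternatives can matter. The self-contained sketch below assumes an ordered-choice combinator, meaning the first alternative that succeeds wins. Whether the library's `either` behaves this way is an assumption here; `word` and `firstOf` are hypothetical helpers written for this illustration.

```typescript
type Result = { value: string; end: number } | null;
type Parser = (input: string, pos: number) => Result;

// Match a lowercase literal at `pos`, ignoring the case of the input.
const word = (w: string): Parser => (input, pos) => {
  const slice = input.slice(pos, pos + w.length);
  return slice.toLowerCase() === w ? { value: slice, end: pos + w.length } : null;
};

// Ordered choice: return the result of the first alternative that succeeds.
const firstOf = (...parsers: Parser[]): Parser => (input, pos) => {
  for (const p of parsers) {
    const r = p(input, pos);
    if (r !== null) return r;
  }
  return null;
};

const shortFirst = firstOf(word("fizz"), word("buzz"), word("fizzbuzz"));
const longFirst = firstOf(word("fizzbuzz"), word("fizz"), word("buzz"));

const a = shortFirst("FizzBuzz", 0);
const b = longFirst("FizzBuzz", 0);
console.log(a?.value); // "Fizz": the longer alternative is never reached
console.log(b?.value); // "FizzBuzz"
```

Under this assumption, putting the longest alternative first is what lets "fizzbuzz" be reported as a single entity instead of as "fizz" followed by "buzz".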

License

MIT © Claudiu Ceia
