astoilkov/segmenter

segmenter

Work with graphemes, words, and sentences through a small, simple, and fast API built on Intl.Segmenter.

Install

npm install segmenter

Why

  • Intl.Segmenter is supported in all major browsers, and 94% of users have it available. It's time for adoption.
  • If you have a use case other than iterating over all graphemes/words/sentences in a text, then Intl.Segmenter might be a little hard to work with.
  • In many cases, working with graphemes is preferable to working with individual characters. Graphemes are what the end user sees. For example, the emoji 👨‍🔧️:
    • is a single grapheme
    • has a length of 6 ('👨‍🔧️'.length returns 6)
    • takes 4 iterations in a for...of loop
  • Before Intl.Segmenter, working with graphemes required libraries like graphemer, which is 94KB in size.
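
The numbers in the emoji example above can be checked with plain Intl.Segmenter, without this library. The snippet below is a minimal sketch; the variable names are illustrative, and the emoji is written with escapes so that its four code points are visible.

```typescript
// The technician emoji spelled out by code point:
// MAN + ZERO WIDTH JOINER + WRENCH + VARIATION SELECTOR-16.
const technician = "\u{1F468}\u200D\u{1F527}\uFE0F";

// .length counts UTF-16 code units: 2 for each pictograph,
// 1 for the joiner, and 1 for the variation selector.
const codeUnits = technician.length;

// String iteration (for...of, spread) walks code points, not graphemes.
const codePoints = [...technician].length;

// "grapheme" granularity yields user-perceived characters.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemeCount = [...segmenter.segment(technician)].length;

console.log({ codeUnits, codePoints, graphemeCount });
// { codeUnits: 6, codePoints: 4, graphemeCount: 1 }
```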

Usage

import { graphemeAt, graphemeRangeAt, wordAt, wordRangeAt } from "segmenter";

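// Positions are UTF-16 code unit indices, so any index from 0 to 5
// falls inside the emoji and resolves to the same grapheme.
graphemeAt("👨‍🔧️ the fixer", 0); // "👨‍🔧️"
graphemeAt("👨‍🔧️ the fixer", 5); // "👨‍🔧️"

graphemeRangeAt("👨‍🔧️ the fixer", 0); // { start: 0, end: 6 }
graphemeRangeAt("👨‍🔧️ the fixer", 3); // { start: 0, end: 6 }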

wordAt("hello-world", 0); // "hello"

wordRangeAt("hello-world", 0); // { start: 0, end: 5 }

API

Graphemes

graphemeAt(string: string, position: number): string | undefined

Get the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.

graphemeRangeAt(string: string, position: number): { start: number; end: number; } | undefined

Get the start and end positions of the grapheme at position in string. Returns undefined if position is out of bounds or string is empty.
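
For readers curious how a positional lookup like this can work, here is a rough sketch built directly on the standard `containing()` method of Intl.Segmenter. The function names `graphemeRangeAtSketch` and `graphemeAtSketch` are hypothetical stand-ins; this is not necessarily how the library itself is implemented.

```typescript
type SegmentRange = { start: number; end: number };

const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper (not the library's code): range of the grapheme
// that covers the given UTF-16 code unit index.
function graphemeRangeAtSketch(text: string, position: number): SegmentRange | undefined {
  // containing() returns undefined when the index is negative or
  // >= text.length, which also covers the empty string.
  const data = graphemeSegmenter.segment(text).containing(position);
  if (data === undefined) return undefined;
  return { start: data.index, end: data.index + data.segment.length };
}

// Hypothetical helper: the grapheme itself, sliced out by its range.
function graphemeAtSketch(text: string, position: number): string | undefined {
  const range = graphemeRangeAtSketch(text, position);
  return range === undefined ? undefined : text.slice(range.start, range.end);
}

const fixer = "\u{1F468}\u200D\u{1F527}\uFE0F the fixer";
console.log(graphemeRangeAtSketch(fixer, 3)); // { start: 0, end: 6 }
console.log(graphemeAtSketch("", 0)); // undefined
```

Creating the segmenter once and reusing it avoids constructing a new Intl.Segmenter on every call.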

graphemes(string: string): string[]

Get all graphemes in the string as an array.

Words

wordAt(string: string, position: number): string | undefined

Get the word at position in string. Returns undefined if position is out of bounds or string is empty.

wordRangeAt(string: string, position: number): { start: number; end: number; } | undefined

Get the start and end positions of the word at position in string. Returns undefined if position is out of bounds or string is empty.

words(string: string): string[]

Get all words in the string as an array.
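
With "word" granularity, Intl.Segmenter also emits segments for punctuation and whitespace, and marks real words with `isWordLike`. The sketch below shows one way to collect only the words. Whether this library filters segments the same way is an assumption, and `rawWordSegments` and `wordsSketch` are hypothetical names.

```typescript
const wordSegmenter = new Intl.Segmenter("en", { granularity: "word" });

// Every segment the segmenter produces, including punctuation and spaces.
function rawWordSegments(text: string): string[] {
  return Array.from(wordSegmenter.segment(text), (part) => part.segment);
}

// Hypothetical helper (assumed behavior, not confirmed by the README):
// keep only segments that the segmenter flags as word-like.
function wordsSketch(text: string): string[] {
  const result: string[] = [];
  for (const part of wordSegmenter.segment(text)) {
    if (part.isWordLike) result.push(part.segment);
  }
  return result;
}

console.log(rawWordSegments("hello-world")); // [ "hello", "-", "world" ]
console.log(wordsSketch("hello-world")); // [ "hello", "world" ]
```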

Sentences

Note: Intl.Segmenter doesn't detect sentences perfectly. For example, "I went to Dr. Smith's office" will be split into two sentences.

sentenceAt(string: string, position: number): string | undefined

Get the sentence at position in string. Returns undefined if position is out of bounds or string is empty.

sentenceRangeAt(string: string, position: number): { start: number; end: number; } | undefined

Get the start and end positions of the sentence at position in string. Returns undefined if position is out of bounds or string is empty.

sentences(string: string): string[]

Get all sentences in the string as an array.
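
As an illustration of what sentence segmentation produces, the sketch below splits a string with plain Intl.Segmenter. `sentencesSketch` is a hypothetical name, and the library may post-process segments differently (for example, by trimming whitespace), which this sketch does not do.

```typescript
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

// Hypothetical helper (not the library's code): raw sentence segments.
// Segments are contiguous, so joining them reproduces the input.
function sentencesSketch(text: string): string[] {
  return Array.from(sentenceSegmenter.segment(text), (part) => part.segment);
}

const parts = sentencesSketch("Hello world. How are you?");
console.log(parts); // [ "Hello world. ", "How are you?" ]
```

Note that the trailing space stays attached to the first sentence, so callers that display sentences individually may want to trim them.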
