
Linear-progressive text discovery engine exposing its functionality through simple service APIs. Break text into a stream of token/non-token slices. Tokens can be annotated with search-term matches. Using adapters for popular DOM libraries (HtmlAgilityPack, AngleSharp), you can highlight HTML, break HTML at a word count, and more.


mvantzet/TextDiscovery

 
 


TextDiscovery

Linear-progressive text discovery engine in C#, exposing its functionality through simple service APIs. Break plain text into a sequence of slices that can be reconstituted as annotated text. Generate meta-rich tokens from a search expression and use them to annotate matches in the source text; noise-word detection, tokenization, and matching options are configurable.

Through a common adapter interface with interchangeable DOM libraries (HtmlAgilityPack, AngleSharp, etc.), you can mark search hits in the DOM, create HTML excerpts at a given word count with configurable element-breaking rules, and extract text content with selectively preserved formatting indicators.

The engine is highly extensible through dependency injection. Regex can be used in advanced configurations, but it is not required.
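The slicing-and-annotation idea described above can be illustrated with a small, self-contained sketch. This is not the TextDiscovery API: the helper names `Slice` and `Annotate`, the rule that a token is a maximal run of letters or digits, the case-insensitive exact-match lookup, and the `<mark>` wrapper are all hypothetical choices made for illustration. Concatenating the slices in order reproduces the source text exactly, which is what allows annotated output to be rebuilt around the matched tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical sketch, NOT the TextDiscovery API: split text into alternating
// token / non-token slices. A token is assumed to be a maximal run of
// letters or digits; everything else (spaces, punctuation) is a non-token slice.
static List<(string Text, bool IsToken)> Slice(string text)
{
    var slices = new List<(string Text, bool IsToken)>();
    int start = 0;
    while (start < text.Length)
    {
        bool isToken = char.IsLetterOrDigit(text[start]);
        int end = start;
        // Extend the slice while characters stay in the same class.
        while (end < text.Length && char.IsLetterOrDigit(text[end]) == isToken)
            end++;
        slices.Add((text.Substring(start, end - start), isToken));
        start = end;
    }
    return slices;
}

// Hypothetical annotator: reconstitute the slices in order, wrapping tokens
// that match a search term in <mark> tags. Non-token slices pass through as-is.
static string Annotate(IEnumerable<(string Text, bool IsToken)> slices, ISet<string> terms)
{
    var sb = new StringBuilder();
    foreach (var (text, isToken) in slices)
    {
        if (isToken && terms.Contains(text))
            sb.Append("<mark>").Append(text).Append("</mark>");
        else
            sb.Append(text);
    }
    return sb.ToString();
}

var sample = "Break text, then mark text hits.";
var parts = Slice(sample);
// Case-insensitive matching is an assumption of this sketch.
var queryTerms = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "text" };

// Prints: Break <mark>text</mark>, then mark <mark>text</mark> hits.
Console.WriteLine(Annotate(parts, queryTerms));
```

Because non-token slices are carried through untouched, whitespace and punctuation survive annotation unchanged. In the real engine, the configurable tokenization and noise-word detection would take the place of the simple `char.IsLetterOrDigit` rule assumed here.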



Languages

  • C# 74.1%
  • HTML 25.9%