Skip to content

mattmacleod/aho_corasick_matcher

 
 

Repository files navigation

Aho-Corasick Matcher Build Status

A Ruby gem for finding strings in text using the Aho-Corasick string matching search.

Aho-Corasick is O(n + m) where n is the size of the string to be searched and m is the size of the dictionary. This means it's particularly suited for searching for occurrences of words using large dictionaries, as the runtime increases only linearly.

It's quite memory-intensive, and building a matcher is expensive – but once it's been built, matching terms is very fast.

Current version: 1.0.2
Supported Ruby versions: 1.8.7, 1.9.2, 1.9.3, 2.0, 2.1, 2.2, jruby-1.7, rbx-2.5

Usage

require 'aho_corasick_matcher'

matcher = AhoCorasickMatcher.new(['a', 'b', 'ab'])
matcher.match('aba')
#=> ['a', 'ab', 'b', 'a']

matcher = AhoCorasickMatcher.new(['thistle', 'sift', 'thistles'])
matcher.match('Theophilus thistle, the successful thistle sifter, in sifting a sieve full of un-sifted thistles, thrust three thousand thistles through the thick of his thumb.')
#=> ["thistle", "thistle", "sift", "sift", "sift", "thistle", "thistles", "thistle", "thistles"]

Thanks

Loosely based on Tim Cowlishaw's implementation of the same algorithm.

License

Copyright © 2015-2016 Altmetric LLP

Distributed under the MIT License.

About

Fast, pure-Ruby Aho-Corasick string search

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Ruby 100.0%