A Crystal Shard for iterating over arbitrarily-sized and typed N-grams. Given
any Iterator(T), Ngram(T, N) will yield StaticArray(T, N) until the Iterator is
exhausted. The first element and any final element(s) of the yielded sequence
of n-grams are the bumper item for T, bumper_item(T), which must be defined for
any T. The shard comes with definitions for Char, String, and any union with
Nil.
This is based on my ngram_iter Rust crate, which was in turn loosely inspired by the ngrams crate.
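To make the windowing concrete, here is a minimal standard-library sketch of the leading-bumper behavior described above. It is not the shard's implementation: `leading_padded_windows` and `BUMPER` are hypothetical names, the helper returns plain Arrays rather than StaticArray, and it models only the leading bumper (trailing bumpers are deliberately left out).

```crystal
# Stand-in for the Char bumper (U+2060, as seen in the usage examples).
# Hypothetical constant, not exported by the shard.
BUMPER = '\u2060'

# Hypothetical helper, NOT part of the shard: prepend one bumper, then
# slide a window of `n` items over the padded input.
def leading_padded_windows(items : Array(Char), n : Int32) : Array(Array(Char))
  padded = [BUMPER] + items
  windows = [] of Array(Char)
  # Enumerable#each_cons yields each run of `n` consecutive elements.
  padded.each_cons(n) { |window| windows << window }
  windows
end

bigrams = leading_padded_windows(['a', 'b', 'c'], 2)
bigrams.first == [BUMPER, 'a'] # => true
bigrams.size                   # => 3

trigrams = leading_padded_windows(['a', 'b', 'c'], 3)
trigrams.first == [BUMPER, 'a', 'b'] # => true
```

Unlike this eager sketch, Ngram(T, N) wraps an Iterator and yields one n-gram per call to next.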
- Add the dependency to your shard.yml:

  dependencies:
    ngram:
      github: dscottboggs/ngram.cr

- Run shards install
require "ngram"
# Each Ngram consumes its source iterator, so give each one a fresh iterator.
bigrams = Ngram(Char, 2).new ('a'..'z').each
bigrams.next # => StaticArray['\u2060', 'a']
trigrams = Ngram(Char, 3).new ('a'..'z').each
trigrams.next # => StaticArray['\u2060', 'a', 'b']
ten_grams = Ngram(Char, 10).new ('a'..'z').each
ten_grams.next # => StaticArray['\u2060', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
corpus = ([1, 2] of Int32?).each
bigrams = Ngram(Int32?, 2).new corpus
bigrams.next # => StaticArray[nil, 1]
# N must be 2 or more.
Ngram(Char, 1).new ['1'].each # This will fail to compile
See the tests for more details on the behavior.
Arbitrary types can be used, but for any such type T, an overload
bumper_item(T.class) must be implemented in the top-level namespace. Crystal
doesn't really offer a way to document this in the code, but compilation will
fail if the overload isn't present.
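record MyType, data : Int32, valid : Bool do
  # The value used as the bumper: explicitly marked invalid.
  def self.invalid
    new 0, false
  end

  # Values built from data alone are considered valid.
  def initialize(@data : Int32)
    @valid = true
  end

  def to_s
    if valid
      data.to_s
    else
      "invalid"
    end
  end

  # IO#write expects Bytes, so append the String with << instead.
  def to_s(io : IO)
    io << to_s
  end
end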
def bumper_item(_type : MyType.class)
  MyType.invalid
end

data = [MyType.new(1), MyType.new(2)]
ngrams = Ngram(MyType, 2).new data.each
if (ngram = ngrams.next).is_a? Iterator::Stop
  raise "unreachable"
else
  ngram.map(&.to_s).to_a # => ["invalid", "1"]
end
This is a bit of a contrived example, but it demonstrates the flexibility of
the shard. Of course, you can always just use a nullable type, in which case
the bumper will be nil, but it may make more sense to use, for example,
Float64::INFINITY or a particular enum variant.
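As a rough sketch of those two alternatives, the overloads might look like the following. The Token enum and its Boundary variant are hypothetical names chosen for illustration, and the sketch assumes the shard does not already define a bumper for Float64 (only Char, String, and Nil unions are listed above). The block runs without the shard, since it only defines and calls the overloads.

```crystal
# Hypothetical enum for illustration; `Boundary` is the variant reserved
# for use as the bumper.
enum Token
  Boundary
  Word
  Punctuation
end

# Top-level overload for the enum: the bumper is a dedicated variant.
def bumper_item(_type : Token.class)
  Token::Boundary
end

# Top-level overload for Float64: the bumper is a sentinel value.
# Assumption: the shard does not ship its own Float64 overload.
def bumper_item(_type : Float64.class)
  Float64::INFINITY
end

bumper_item(Token) == Token::Boundary     # => true
bumper_item(Float64) == Float64::INFINITY # => true
```

With the shard required, these overloads are what Ngram(Token, N) and Ngram(Float64, N) would need in order to compile. A sentinel like this only works if it can never appear as real data in the corpus.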
- Fork it (https://github.com/dscottboggs/ngram/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
- D. Scott Boggs - creator and maintainer