
Notes on processing of text documents and unstructured data.


zero-overhead/Information-Retrieval


Information Retrieval


Counting frequencies of unique words

... and return the n most frequent words, one per line, in descending order of frequency.

Raku

$*IN: Standard input filehandle, STDIN.

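```raku
#!/usr/bin/env raku

sub MAIN {
    # Top 3 words: split STDIN on whitespace, count each word with a Bag,
    # sort by descending count and print one "word => count" pair per line.
    .say for $*IN.words.Bag.sort(-*.value).head(3);
}
```

```shell
bin/counting-frequencies-of-unique-words.raku -n=3 -format=json -in=inputs/kjvbible.txt
```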
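With `sort(-*.value)` alone, the relative order of words that have the same count depends on the Bag's unspecified iteration order. A minimal sketch, assuming you want deterministic output, that takes the text and `n` as arguments and breaks ties alphabetically. The sub name `top-words` is hypothetical; this is not the code in `bin/`.

```raku
#!/usr/bin/env raku

# Hypothetical helper: the $n most frequent whitespace-separated words in $text,
# sorted by count (descending), ties broken by the word itself (ascending).
sub top-words(Str $text, Int $n --> List) {
    $text.words                      # split on whitespace
         .Bag                        # word => count
         .sort({ -.value, .key })    # compare (-count, word) element by element
         .head($n)                   # keep the $n most frequent
         .List
}

# Prints "a => 3" and "b => 2", one per line.
.say for top-words("b a c a b a", 2);
```

Passing `$*IN.slurp` instead of the literal string reads the text from standard input, as the script above does.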

Create N-grams from strings

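```raku
#!/usr/bin/env raku
use v6.e.PREVIEW;
sub MAIN {
    # create-n-grams-minimal.raku < inputs/kjvbible.txt
    # 5 most common character 3-grams: keep words with at least 3 characters,
    # split each word into overlapping 3-character windows with comb(3 => -2),
    # count the windows with a Bag and print one "3-gram => count" pair per line.
    .say for $*IN.words.grep(*.chars >= 3).map({ .comb(3 => -2).Slip }).Bag.sort(-*.value).head(5);
}
```

```shell
bin/create-n-grams.raku -n=3 -m=10 --format=json < inputs/kjvbible.txt
```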
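A sketch of a per-word character n-gram count that does not need `use v6.e.PREVIEW`. `rotor($n => -($n - 1))` slides a window of `$n` characters forward one position at a time and drops windows shorter than `$n`, so words with fewer than `$n` characters contribute nothing, which is the job the `grep(*.chars >= 3)` filter does in the script above. The sub names `char-ngrams` and `top-ngrams` are hypothetical, ties are broken alphabetically, and the `-n`/`-m`/`--format` options of `bin/create-n-grams.raku` are not reproduced here.

```raku
#!/usr/bin/env raku

# Hypothetical helper: overlapping character n-grams of a single word.
sub char-ngrams(Str $word, Int $n where * > 0 --> List) {
    # rotor($n => -($n - 1)): windows of $n characters, each starting one
    # character after the previous one; partial windows are dropped by default.
    $word.comb.rotor($n => -($n - 1)).map(*.join).List
}

# Hypothetical helper: the $top most common character n-grams across all words.
sub top-ngrams(Str $text, Int $n, Int $top --> List) {
    $text.words
         .map({ char-ngrams($_, $n).Slip })   # flatten every word's n-grams
         .Bag                                 # n-gram => count
         .sort({ -.value, .key })             # count descending, then n-gram
         .head($top)
         .List
}

# "the", "then", "there" share "the"; prints "the => 3" then "ere => 1".
.say for top-ngrams("the then there", 3, 2);
```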
