This is a Go implementation of the Double-ARray Trie System (Darts). It is a clone of the C++ version.
Darts can be used as a simple hash dictionary. It also supports very fast Common Prefix Search, which is essential for morphological analysis, such as word segmentation for CJK text indexing and searching.
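To make the idea of Common Prefix Search concrete, here is a minimal sketch that uses a plain pointer-based trie rather than a double array. It is not how darts stores its data; the names node, insert, and commonPrefixes are hypothetical and exist only for this illustration. Given a text, the search walks the trie once and reports every dictionary key that is a prefix of that text.

```go
package main

import "fmt"

// node is one state of a plain pointer-based trie (illustration only,
// not the double-array layout used by darts).
type node struct {
	children map[rune]*node
	terminal bool // true if a dictionary key ends at this node
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// insert adds key to the trie rooted at root.
func insert(root *node, key string) {
	n := root
	for _, r := range key {
		next, ok := n.children[r]
		if !ok {
			next = newNode()
			n.children[r] = next
		}
		n = next
	}
	n.terminal = true
}

// commonPrefixes returns every non-empty dictionary key that is a prefix
// of text, shortest first. The walk stops at the first missing transition.
func commonPrefixes(root *node, text string) []string {
	var found []string
	runes := []rune(text)
	n := root
	for i, r := range runes {
		next, ok := n.children[r]
		if !ok {
			break
		}
		n = next
		if n.terminal {
			found = append(found, string(runes[:i+1]))
		}
	}
	return found
}

func main() {
	root := newNode()
	for _, k := range []string{"考察", "考察队", "考察队员", "队员"} {
		insert(root, k)
	}
	// "队员" is in the dictionary but is not a prefix of the text,
	// so only the three keys starting with "考察" are reported.
	fmt.Println(commonPrefixes(root, "考察队员出发"))
}
```

A word segmenter can call such a search at each position of the input to enumerate candidate words starting there, which is why the operation matters for CJK text.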
What is Trie
An Implementation of Double-Array Trie
- Support building the Double-Array from a DAWG (directed acyclic word graph), which reduces the on-disk dictionary to about half the size of the Trie-based one. Lookup performance increases by 25%.
- Documentation/comments
- Benchmark
gofmt -tabs=false -tabwidth=4 -r='rune /*Key_type*/ -> byte /*Key_type*/' -w darts.go
gofmt -tabs=false -tabwidth=4 -r='rune /*Key_type*/ -> byte /*Key_type*/' -w dawg.go
Key\tFreq
Each key occupies one line. The file should be UTF-8 encoded.
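As an illustration of the Key\tFreq layout described above, the following sketch reads such a file into memory. It is not part of darts: entry, parseDict, and parseDictString are hypothetical names, and the sketch assumes that Freq is a decimal integer, which the format description does not state explicitly.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// entry is one dictionary line: a key and its frequency.
type entry struct {
	Key  string
	Freq int // assumption: the frequency is an integer
}

// parseDict reads "Key<TAB>Freq" lines from r, skipping empty lines.
func parseDict(r io.Reader) ([]entry, error) {
	var entries []entry
	sc := bufio.NewScanner(r)
	lineNo := 0
	for sc.Scan() {
		lineNo++
		line := strings.TrimRight(sc.Text(), "\r") // tolerate CRLF files
		if line == "" {
			continue
		}
		parts := strings.SplitN(line, "\t", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("line %d: expected Key<TAB>Freq", lineNo)
		}
		freq, err := strconv.Atoi(strings.TrimSpace(parts[1]))
		if err != nil {
			return nil, fmt.Errorf("line %d: bad frequency: %v", lineNo, err)
		}
		entries = append(entries, entry{Key: parts[0], Freq: freq})
	}
	return entries, sc.Err()
}

// parseDictString is a convenience wrapper for in-memory input.
func parseDictString(s string) ([]entry, error) {
	return parseDict(strings.NewReader(s))
}

func main() {
	entries, err := parseDictString("考察\t120\n考察队员\t15\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Println(e.Key, e.Freq)
	}
}
```

Go strings hold raw bytes, so a UTF-8 encoded file needs no extra decoding step here; converting a key with []rune(key) or []byte(key) then yields the two key types used in the examples.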
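package main

import (
    "darts"
    "fmt"
)

func main() {
    // Load the dictionary; report the error instead of ignoring it.
    d, err := darts.Import("darts.txt", "darts.lib", true)
    if err != nil {
        fmt.Println("import failed:", err)
        return
    }
    // The key is converted to []rune first; 0 is the second argument of the search.
    if d.ExactMatchSearch([]rune("考察队员"), 0) {
        fmt.Println("考察队员 is in dictionary")
    }
}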
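package main

import (
    "darts"
    "fmt"
)

func main() {
    // Load the dictionary; report the error instead of ignoring it.
    d, err := darts.Import("darts.txt", "darts.lib", true)
    if err != nil {
        fmt.Println("import failed:", err)
        return
    }
    // This example uses a byte key, so PrefixLen is used as a byte offset into key.
    key := []byte("考察队员")
    r := d.CommonPrefixSearch(key, 0)
    for i := 0; i < len(r); i++ {
        // Print each dictionary entry that is a prefix of key.
        fmt.Println(string(key[:r[i].PrefixLen]))
    }
}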
Using a 100K-item dictionary, a simple search on each key takes 46 ms with a Go map, 14 ms with the byte_key version of darts, and 9.5 ms with the unicode_key version of darts.
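For readers who want a rough feel for the Go map side of this comparison, here is a minimal sketch of how such a baseline might be timed. It is not the benchmark that produced the numbers above: buildKeys and countHits are hypothetical helpers, the keys are synthetic rather than taken from a real dictionary, and the timing will vary by machine.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// buildKeys returns n synthetic keys; a real benchmark would use
// the entries of the dictionary under test instead.
func buildKeys(n int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = "key" + strconv.Itoa(i)
	}
	return keys
}

// countHits looks every key up once and reports how many were found.
func countHits(dict map[string]int, keys []string) int {
	hits := 0
	for _, k := range keys {
		if _, ok := dict[k]; ok {
			hits++
		}
	}
	return hits
}

func main() {
	keys := buildKeys(100000)
	dict := make(map[string]int, len(keys))
	for i, k := range keys {
		dict[k] = i
	}

	// Time only the lookups, not the construction of the map.
	start := time.Now()
	hits := countHits(dict, keys)
	fmt.Printf("%d hits in %v\n", hits, time.Since(start))
}
```

For more stable measurements, the same loop could be moved into a benchmark function run with go test -bench, which repeats the work until the timing settles.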
Apache License 2.0