Go package words provides capabilities for extracting words from a string according to a collection of rules.
- Invalid UTF-8 strings will not be split.
- Hyphenated words are split into their individual parts unless this is disabled. E.g.
"small-town" => []string{"small", "town"}
- Spaces, punctuation and symbols are discarded unless this is disabled. E.g.
"my_string here" => []string{"my", "string", "here"}
- Consecutive characters of the same type are grouped together.
- If the current character is lowercase and the previous word ends in an uppercase letter, that uppercase letter is moved to the start of the lowercase word. E.g.
"YAMLParser" => []string{"YAML", "Parser"}
$ go get github.com/imbue11235/words
words.Extract("Do you prefer camelCase to snake_case?")
// => []string{"Do", "you", "prefer", "camel", "case", "to", "snake", "case"}
words.Extract("YAMLParser")
// => []string{"YAML", "Parser"}
words.Extract("Bose QC35")
// => []string{"Bose", "QC", "35"}
To further customize the extraction, options can be passed to the Extract function.
To include punctuation
words.Extract("So, now punctuation will be included.", words.IncludePunctuation())
// => []string{"So", ",", "now", "punctuation", "will", "be", "included", "."}
To include spaces
words.Extract("So many spaces", words.IncludeSpaces())
// => []string{"So", " ", "many", " ", "spaces"}
To include symbols
words.Extract("Some>String", words.IncludeSymbols())
// => []string{"Some", ">", "String"}
To allow hyphenated words
words.Extract("An anti-clockwise direction", words.AllowHyphenatedWords())
// => []string{"An", "anti-clockwise", "direction"}
To use multiple options at the same time
words.Extract("Using multiple options!", words.IncludeSpaces(), words.IncludePunctuation())
// => []string{"Using", " ", "multiple", " ", "options", "!"}
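Passing any number of options after the input string is commonly done in Go with variadic "functional options". The sketch below shows that general pattern only; `config`, `option`, `withSpaces`, `withPunctuation` and `newConfig` are hypothetical names, and nothing here is taken from this package's source.

```go
package main

import "fmt"

// config holds the switches an extractor might consult (hypothetical).
type config struct {
	spaces      bool
	punctuation bool
}

// option mutates a config; each constructor below returns one.
type option func(*config)

func withSpaces() option      { return func(c *config) { c.spaces = true } }
func withPunctuation() option { return func(c *config) { c.punctuation = true } }

// newConfig starts from the defaults (everything off) and applies
// the given options in order.
func newConfig(opts ...option) config {
	var c config
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(withSpaces(), withPunctuation())
	fmt.Println(c.spaces, c.punctuation)
}
```

With this pattern, calling a function with no options yields the default behaviour, and new options can be added later without changing the function's signature.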
This project is licensed under the MIT license.