Training used samples of the corpora: 50% of the blogs, 40% of the news, and 60% of the Twitter data. This sample covered 75% of the unique words across all corpora.
TSafer uses interpolated Kneser-Ney smoothing over 4-, 3-, 2- and 1-grams, with a back-off model for unseen words. The higher-order n-gram coefficients are computed with the formula:
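The formula itself appears to be missing here. For reference, the standard interpolated Kneser-Ney recursion, which matches the description above, is:

```latex
P_{KN}(w_i \mid w_{i-n+1}^{i-1})
  = \frac{\max\!\big(c(w_{i-n+1}^{i}) - d,\ 0\big)}{c(w_{i-n+1}^{i-1})}
  + \lambda(w_{i-n+1}^{i-1})\, P_{KN}(w_i \mid w_{i-n+2}^{i-1})
\qquad
\lambda(w_{i-n+1}^{i-1})
  = \frac{d}{c(w_{i-n+1}^{i-1})}\,
    \big|\{\, w : c(w_{i-n+1}^{i-1} w) > 0 \,\}\big|
```

where \(c(\cdot)\) are n-gram counts (continuation counts at the lower orders), \(d\) is the discount, and \(\lambda\) is the back-off weight that redistributes the discounted mass to the shorter context.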
To keep queries fast, TSafer uses:
- precomputed Kneser-Ney coefficients for every word,
- stored in an R data.table with a hashed index,
- with each query processed by a recursive back-off function.
Together, these make the app respond very quickly.
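The look-up scheme above can be sketched as follows. This is an illustrative Python analogue, not the app's code: the real implementation stores the precomputed coefficients in an R data.table with a hashed key, but the recursion is the same idea. All names and numbers here are hypothetical.

```python
# Hypothetical precomputed tables: each n-gram maps to a discounted
# probability, and each context maps to a back-off weight (lambda).
prob = {
    ("i", "am"): 0.4,   # bigram "i am"
    ("am",): 0.5,       # unigram "am"
    (): 0.001,          # floor probability for unseen words
}
backoff = {
    ("i",): 0.3,
}

def score(context, word):
    """Recursive interpolated look-up: try the longest stored n-gram,
    otherwise back off to the shorter context, scaled by lambda."""
    ngram = context + (word,)
    if ngram in prob:
        return prob[ngram]
    if not context:          # no shorter context left: unseen word
        return prob[()]
    return backoff.get(context, 1.0) * score(context[1:], word)
```

Because every coefficient is precomputed, each query is just a handful of keyed look-ups, one per back-off level.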
Text preprocessing was done with regular expressions, and n-gram tokenization with RWeka. Both training and processing are parallelized with doParallel.
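A rough Python analogue of the preprocessing step (the app itself uses R regexps plus RWeka's n-gram tokenizer; the function name and regex here are illustrative):

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Clean text with a regex, tokenize, and count n-grams."""
    tokens = re.findall(r"[a-z']+", text.lower())  # drop punctuation/digits
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

counts = ngram_counts("The cat sat on the mat.", 2)
```

Each corpus chunk can be processed this way independently, which is what makes the doParallel split across workers straightforward.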