R wrapper for fastText C++ code from Facebook.
FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.
© Contributors, 2018. Licensed under an MIT license.
You can install the `fastrtext` package from CRAN or GitHub as follows:
# From CRAN
install.packages("fastrtext")
# From GitHub (requires devtools; uncomment the next line if it is not installed)
# install.packages("devtools")
devtools::install_github("pommedeterresautee/fastrtext")
All the updated documentation can be reached at this address.
API documentation can be reached at this address.
In particular, command line options are listed there.
Data for a multi-class task are embedded in this package.
Follow this link to learn a model and then measure the accuracy in 5 minutes.
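The workflow covered by that tutorial can be sketched as follows. This is a minimal sketch, not a tuned recipe: it assumes the `train_sentences` and `test_sentences` datasets shipped with the package expose `text` and `class.text` columns, and it relies on the package's `execute()`, `load_model()` and `predict()` functions. The hyperparameter values are illustrative only.

```r
library(fastrtext)

data("train_sentences")
data("test_sentences")

# fastText expects one document per line, with the label prefixed by "__label__".
train_lines <- paste(
  paste0("__label__", train_sentences[, "class.text"]),
  tolower(train_sentences[, "text"])
)
test_texts <- tolower(test_sentences[, "text"])

train_file <- tempfile(fileext = ".txt")
model_file <- tempfile()
writeLines(train_lines, con = train_file)

# Train a supervised classifier; the arguments mirror the fastText command line.
# The values below are illustrative assumptions, not recommendations.
execute(commands = c("supervised", "-input", train_file, "-output", model_file,
                     "-dim", 20, "-lr", 1, "-epoch", 20, "-wordNgrams", 2,
                     "-verbose", 1))

# Load the trained model and predict the most likely label for each test sentence.
model <- load_model(model_file)
predictions <- predict(model, sentences = test_texts)

# Each prediction is a named probability; the name is the predicted label.
accuracy <- mean(names(unlist(predictions)) == test_sentences$class.text)
print(accuracy)
```

Training writes its input to a temporary file, but the predictions themselves are computed in memory from an R character vector.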
Data for a word representation learning task are embedded in this package.
Follow this link for a 5-minute tutorial on learning vector representations of words (also known as word embeddings).
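Learning word embeddings follows the same pattern as classification, minus the labels. The sketch below assumes the embedded `train_sentences` dataset has a `text` column and uses the package's `execute()`, `load_model()`, `get_dictionary()` and `get_word_vectors()` functions. It trains with fastText's default `skipgram` settings, which is an illustrative choice rather than a recommendation.

```r
library(fastrtext)

data("train_sentences")

# Unsupervised training needs raw text only, one document per line.
texts <- tolower(train_sentences[, "text"])

corpus_file <- tempfile(fileext = ".txt")
model_file <- tempfile()
writeLines(texts, con = corpus_file)

# Learn skipgram word vectors with default hyperparameters.
execute(commands = c("skipgram", "-input", corpus_file, "-output", model_file,
                     "-verbose", 1))

model <- load_model(model_file)

# Look up the vectors of the first few words in the learned vocabulary.
dict <- get_dictionary(model)
word_vectors <- get_word_vectors(model, head(dict, 5))
print(dim(word_vectors))
```

The result is a matrix with one row per requested word, so it can be passed directly to standard R tools for distance computation or clustering.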
Why not use the command line client?
- You can call the client from R using `system("fasttext ...")`;
- To get predictions, you need to write a file, make predictions from the command line, then read the results back;
- `fastrtext` makes your life easier by performing all these operations in memory;
- It takes less time and uses fewer commands;
- It is easy to install directly from R.
Why not use fastTextR?
- `fastrtext` implements both the supervised and unsupervised parts of `fastText` (`fastTextR` implements only the unsupervised part);
- with `fastrtext`, predictions can be done in memory (`fastTextR` requires you to write the sentences to disk and then read the predictions back);
- the original fastText source code embedded in `fastTextR` is not up to date (it is missing several new features and bug fixes released since January 2017).
Please cite [1] if you use this code for learning word representations, or [2] if you use it for text classification.
[1] P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information
@article{bojanowski2016enriching,
title={Enriching Word Vectors with Subword Information},
author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
journal={arXiv preprint arXiv:1607.04606},
year={2016}
}
[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification
@article{joulin2016bag,
title={Bag of Tricks for Efficient Text Classification},
author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
journal={arXiv preprint arXiv:1607.01759},
year={2016}
}
[3] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models
@article{joulin2016fasttext,
title={FastText.zip: Compressing text classification models},
author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\'e}gou, Herv{\'e} and Mikolov, Tomas},
journal={arXiv preprint arXiv:1612.03651},
year={2016}
}
(* These authors contributed equally.)