Releases: bnosac/udpipe
CRAN Release 0.8.11
CRAN Release 0.8.10
CHANGES IN udpipe VERSION 0.8.10
- use snprintf instead of sprintf to handle the R CMD check deprecation note on M1mac
- reduction of timings of the examples of document_term_matrix, document_term_frequencies, document_term_frequencies_statistics, cooccurrence, dtm_bind, keywords_collocation
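For readers unfamiliar with the snprintf item above, here is a minimal sketch of the difference. It is not taken from the udpipe sources; the helper name format_bounded and the 8-byte buffer are assumptions chosen only for illustration. sprintf writes without any bound, while snprintf receives the buffer size and truncates instead of overflowing.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of udpipe): formats "id=<value>" into a
// deliberately small buffer to show how snprintf bounds its output.
std::string format_bounded(int value) {
  char buffer[8];
  // Unbounded variant that triggers the deprecation note:
  //   std::sprintf(buffer, "id=%d", value);
  // Bounded variant: writes at most sizeof(buffer) - 1 characters plus '\0'.
  int needed = std::snprintf(buffer, sizeof(buffer), "id=%d", value);
  // A negative return value signals an encoding error; otherwise it is the
  // length the full text would have needed, even if it was truncated.
  assert(needed >= 0);
  return std::string(buffer);
}

// Small self-check: short values fit, long values are truncated safely.
void check_format_bounded() {
  assert(format_bounded(42) == "id=42");
  assert(format_bounded(123456789) == "id=1234");
}
```

Comparing the return value of snprintf with the buffer size is the usual way to detect truncation when it matters to the caller.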
CRAN Release 0.8.9
CHANGES IN udpipe VERSION 0.8.9
- fix R CMD check message on Fedora clang infrastructure: rcpp_udpipe.cpp:243:8: warning: use of bitwise '&' with boolean operands
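The clang warning quoted above fires when the bitwise '&' is applied to two boolean expressions. The sketch below is an illustration only and does not reproduce the code at rcpp_udpipe.cpp:243; the function is_positive_at is hypothetical. It shows why the logical '&&' is usually what is meant: it short-circuits, so the right-hand side is skipped when the left-hand side is false.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example (not from udpipe): checks whether the element at
// position i exists and is positive.
bool is_positive_at(const std::vector<int>& values, std::size_t i) {
  // With bitwise '&', both operands would always be evaluated, so values[i]
  // would be read even when i is out of range (undefined behaviour), and
  // clang would warn about '&' with boolean operands:
  //   return (i < values.size()) & (values[i] > 0);
  // With logical '&&', the element is only read after the bounds check passed.
  return i < values.size() && values[i] > 0;
}

// Small self-check covering an in-range hit, an in-range miss and an
// out-of-range index.
void check_is_positive_at() {
  std::vector<int> values{3, -1};
  assert(is_positive_at(values, 0));
  assert(!is_positive_at(values, 1));
  assert(!is_positive_at(values, 5));
}
```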
CRAN Release 0.8.8
CHANGES IN udpipe VERSION 0.8.8
- dtm_svd_similarity: fix to make sure that if provided a dtm with features which are all missing/zero, the scoring still works as expected instead of removing features which contain no data whatsoever. This allows dtm_svd_similarity to be used alongside embeddings of R package word2vec, which might contain words that are not in the dtm. See the example in ?dtm_svd_similarity
- added txt_grepl
CHANGES IN udpipe VERSION 0.8.7
- txt_count now always returns an integer, even in the border case where a character vector of length 0 is supplied
CRAN Release 0.8.6
CHANGES IN udpipe VERSION 0.8.6
- Downloading models to paths containing non-ASCII characters now works (issue #95)
- strsplit.data.frame gains ... which are passed on to strsplit (e.g. to use fixed=TRUE for speeding up)
- read_connlu is now using fixed=TRUE when splitting by newline symbol (for speeding up parsing with function udpipe)
- Added txt_paste
- Added txt_context
- Use html_vignette instead of html_document in the vignettes in order to reduce package size
CRAN Release 0.8.5
CHANGES IN udpipe VERSION 0.8.5
- Added document_term_matrix.default, document_term_matrix.integer and document_term_matrix.numeric
- Added groups argument to dtm_colsums and dtm_rowsums
- Added dtm_align
- Added dtm_sample
- Added document_term_matrix.matrix
- dtm_cbind and dtm_rbind now accept more than 2 sparse matrices
- cbind_morphological gains the argument 'which' to specify which morphological features to extract
- txt_count now returns NA when NA is provided instead of an error
- txt_contains now returns NA when NA is provided instead of FALSE, unless value is set to TRUE
- txt_collapse now also works if provided a list of character vectors
- paste.data.frame now works as well if a data.table is passed instead of a data.frame
- txt_recode gains an extra argument na.rm
CRAN Release 0.8.4-1
CHANGES IN udpipe VERSION 0.8.4-1
- Fixing the Solaris compilation issue in ufal::udpipe::multiword_splitter::append_token
CRAN Release 0.8.4
CHANGES IN udpipe VERSION 0.8.4
- Update to UDPipe 1.2.1 (28 Sep 2018)
  - this adds segment_size and learning_rate_final parameters to tokenizer training
  - correctly set SpaceAfter for last token when normalizing spaces.
- Default of udpipe_download_model is now changed: it now downloads models built on Universal Dependencies 2.5 instead of the models built on Universal Dependencies 2.4
- Added txt_count
- Added txt_overlap
- Added dtm_conform
- Added dtm_chisq
- Added dtm_svd_similarity
- Added as_fasttext
- Added unlist_tokens
- txt_recode_ngram now also works gracefully in case ngram is set to 1 although the intention is not to use it when ngram is set to 1
- Experimental changes regarding cbind_dependencies which might change in a subsequent release.
  - cbind_dependencies has now been implemented for type 'child'.
  - cbind_dependencies now allows adding the row numbers of the parent or children to which the token is linked, using the dependency parsing output.
- Experimental and unfinished work on making it easy to query dependency relations
CRAN Release 0.8.3
CHANGES IN udpipe VERSION 0.8.3
- Default of udpipe_download_model is now changed: it now downloads models built on Universal Dependencies 2.4 instead of the models built on Universal Dependencies 2.3
- also allow strsplit.data.frame to work if the data argument is a data.table
- in case the model loaded with udpipe_load_model is a nil pointer (most likely due to users who restarted their R sessions without knowing), try reloading the model file in udpipe_annotate
- fix issue in udpipe_reconstruct giving wrong values in start/end positions of the token in case a token had both SpacesBefore and SpacesAfter. Users of versions prior to 0.8.3 can circumvent this issue by removing leading/trailing white space with trimws on the text before using udpipe::udpipe.
- document_term_matrix now gains argument weight allowing to select another column to put into the matrix cells
- add txt_contains
CRAN Release 0.8.2
CHANGES IN udpipe VERSION 0.8.2
- udpipe::udpipe now gains 2 arguments: parallel.cores and parallel.chunksize in order to annotate in parallel over your CPU cores.
- document_term_matrix.data.frame now preserves order of the documents (issue #44)
- dtm_remove_lowfreq, dtm_remove_tfidf, dtm_remove_terms gain extra argument remove_emptydocs
- explicitly add drop=FALSE to internal dtm_... calls
- add dtm_remove_sparseterms (issue #44)
- make sure downloading a model fails gracefully if the GitHub internet resource is not available on CRAN machines
- udpipe_download_model now also returns download_failed/download_message indicating if the download failed due to internet connectivity issues