Releases: tidymodels/textrecipes

textrecipes 1.0.6

15 Nov 17:29
  • textfeatures has been removed from Suggests. (#255)

  • step_textfeatures() no longer returns a politeness feature. (#254)

textrecipes 1.0.5

20 Oct 22:15
  • step_untokenize() and step_normalization() now return factors instead of strings. (#247)

textrecipes 1.0.4

17 Aug 21:25

Improvements

  • step_clean_names() now throws an informative error if needed non-standard role columns are missing during bake(). (#235)

  • The keep_original_cols argument has been added to step_tokenmerge(). With this change, every step that produces new columns should now have the keep_original_cols argument. (#242)

  • Many internal changes to improve consistency, along with slight speed increases.

Bug Fixes

  • Fixed a bug where step_dummy_hash() and step_texthash() would add new columns before old columns. (#235)

  • Fixed a bug where vocabulary_size wasn't tunable in step_tokenize_bpe(). (#239)

textrecipes 1.0.3

14 Apr 23:06

Improvements

  • Steps with tunable arguments now have those arguments listed in the documentation.

  • All steps that add new columns will now throw an informative error if a name collision occurs.

Bug Fixes

  • Fixed a bug where the weight argument of step_tf() wasn't tunable.

textrecipes 1.0.2

21 Dec 17:51
  • Setting token = "tweets" in step_tokenize() has been deprecated because tokenizers::tokenize_tweets() is deprecated. (#209)

  • step_sequence_onehot(), step_dummy_hash(), and step_dummy_texthash() now return integers. step_tf() returns integers when weight_scheme is "binary" or "raw count".

  • All steps now have required_pkgs() methods.

textrecipes 1.0.1

06 Oct 03:14
  • Examples no longer include if (require(...)) code.

textrecipes 1.0.0

02 Jul 17:46
  • Indicate which steps support case weights (none), to align documentation with other packages.

textrecipes 0.5.2

04 May 16:49
  • Removed use of okc_text in the vignette.

  • Fixed a bug in the printing of tokenlists.

textrecipes 0.5.1

29 Mar 22:54
  • step_tfidf() now correctly saves the idf values and applies them to the testing data set.

  • tidy.step_tfidf() now returns calculated IDF weights.
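
The two changes above can be sketched as follows. This is a minimal illustration, assuming the recipes and textrecipes packages are installed; the data frames `train` and `new_data` and the column `text` are made up for the example and do not come from the release notes.

```r
library(recipes)
library(textrecipes)

# Hypothetical toy data; not from the release notes.
train <- data.frame(
  text = c("the cat sat", "the dog ran", "a cat ran"),
  stringsAsFactors = FALSE
)

rec <- recipe(~ text, data = train) %>%
  step_tokenize(text) %>%
  step_tfidf(text)

# prep() estimates the IDF weights on the training data.
trained <- prep(rec, training = train)

# tidy() on the trained tf-idf step (step number 2) lists the stored weights.
idf_weights <- tidy(trained, number = 2)
idf_weights

# bake() applies the stored weights to new data rather than re-estimating them.
new_data <- data.frame(text = "the cat ran", stringsAsFactors = FALSE)
baked <- bake(trained, new_data = new_data)
```

Inspecting `idf_weights` before baking is one way to check that the weights applied to test data are the ones learned from the training set.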

textrecipes 0.5.0

20 Mar 22:45

New steps

  • step_dummy_hash() generates binary indicators (possibly signed) from simple factor or character vectors.

  • step_tokenize() has gained three cousin functions, step_tokenize_bpe(), step_tokenize_sentencepiece(), and step_tokenize_wordpiece(), which wrap {tokenizers.bpe}, {sentencepiece}, and {wordpiece} respectively (#147).

Improvements and Other Changes

  • Added all_tokenized() and all_tokenized_predictors() to more easily select tokenized columns (#132).

  • Use show_tokens() to more easily debug a recipe involving tokenization.

  • Reorganize documentation for all recipe step tidy methods (#126).

  • Steps now have a dedicated subsection detailing what happens when tidy() is applied. (#163)

  • All recipe steps now officially support empty selections to be more aligned with dplyr and other packages that use tidyselect (#141).

  • step_ngram() has been given a speed increase to put it in line with other packages' performance.

  • step_tokenize() will now try to error if the vocabulary size is too low when using engine = "tokenizers.bpe" (#119).

  • The warning given by step_tokenfilter() when filtering fails to apply now refers to the correct argument name (#137).

  • step_tf() now returns 0 instead of NaN when there aren't any tokens present (#118).

  • step_tokenfilter() now has a new argument, filter_fun, which takes a function that can be used to filter tokens. (#164)

  • tidy.step_stem() now correctly shows whether a custom stemmer was used.

  • Added the keep_original_cols argument to step_lda(), step_texthash(), step_tf(), step_tfidf(), step_word_embeddings(), step_dummy_hash(), step_sequence_onehot(), and step_textfeatures() (#139).
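
Several of the additions above (all_tokenized(), show_tokens(), and filter_fun) can be combined in one short recipe. The sketch below assumes the recipes and textrecipes packages are installed; the data frame `df`, the column `text`, and the length-based filter are illustrative choices, not taken from the release notes.

```r
library(recipes)
library(textrecipes)

# Hypothetical toy data; not from the release notes.
df <- data.frame(
  text = c("the cat sat", "the dog ran fast"),
  stringsAsFactors = FALSE
)

rec <- recipe(~ text, data = df) %>%
  step_tokenize(text) %>%
  # all_tokenized() selects every tokenized column;
  # filter_fun takes a character vector and returns a logical vector,
  # here keeping only tokens longer than three characters.
  step_tokenfilter(all_tokenized(), filter_fun = function(x) nchar(x) > 3)

# show_tokens() preps and bakes the recipe, then returns the tokens of `text`.
tokens <- show_tokens(rec, text)
tokens
```

Because show_tokens() works on an untrained recipe, it can be called after each new step while building up a tokenization pipeline.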

Breaking Changes

  • Steps with a prefix argument now create names according to the pattern prefix_variablename_name/number. (#124)
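
To illustrate the naming pattern, here is a small sketch using step_tf(), assuming the recipes and textrecipes packages are installed. The data frame `df` and the column `text` are hypothetical; with prefix = "tf" the new columns are expected to start with `tf_text_`, followed by the token.

```r
library(recipes)
library(textrecipes)

# Hypothetical toy data; not from the release notes.
df <- data.frame(
  text = c("the cat sat", "the dog ran"),
  stringsAsFactors = FALSE
)

rec <- recipe(~ text, data = df) %>%
  step_tokenize(text) %>%
  step_tf(text, prefix = "tf")

# Column names follow prefix_variablename_name: one column per token.
new_names <- names(bake(prep(rec), new_data = NULL))
new_names
```

Code that selected columns by the old naming scheme may need its selectors updated, for example to `starts_with("tf_text_")`.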