2.0.0
Deprecations and Removals
-
#5757: Removed previously deprecated packages rasa_nlu and rasa_core.
Use imports from rasa.core and rasa.nlu instead.
-
#5758: Removed previously deprecated classes:
- event brokers (EventChannel and FileProducer, KafkaProducer, PikaProducer, SQLProducer)
- intent classifier EmbeddingIntentClassifier
- policy KerasPolicy
Removed previously deprecated methods:
- Agent.handle_channels
- TrackerStore.create_tracker_store
Removed support for pipeline templates in config.yml.
Removed deprecated training data keys entity_examples and intent_examples from the
JSON training data format.
-
#5834: Removed restaurantbot example as it was confusing and not a great way to build a bot.
-
#6296: LabelTokenizerSingleStateFeaturizer is deprecated. To replicate
LabelTokenizerSingleStateFeaturizer functionality, add a Tokenizer with
intent_tokenization_flag: True and CountVectorsFeaturizer to the NLU pipeline.
An example of elements to be added to the pipeline is shown in the improvement
changelog entry for #6296.
BinarySingleStateFeaturizer is deprecated and will be removed in the future.
We recommend switching to SingleStateFeaturizer.
-
#6354: Specifying the parameters force and save_to_default_model_directory as part of the
JSON payload when training a model using POST /model/train is now deprecated.
Please use the query parameters force_training and save_to_default_model_directory
instead. See the API documentation for more information.
-
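For #6354, a minimal sketch of building a training request URL that carries the two options as query parameters rather than in the payload. The host and port and the string values "true" / "false" are assumptions made for illustration; check the API documentation for the accepted values.

```python
from urllib.parse import urlencode

# Assumption: a Rasa server reachable at this address.
base_url = "http://localhost:5005"

# The two options named in this entry, passed as query parameters.
params = {
    "force_training": "true",
    "save_to_default_model_directory": "false",
}

train_url = f"{base_url}/model/train?{urlencode(params)}"
print(train_url)
```

The request body then only needs to carry the training data itself.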
#6409: The conversation event form was renamed to active_loop. Rasa Open Source
will continue to be able to read and process old form events. Note that
serialized trackers will no longer have the active_form field. Instead the
active_loop field will contain the same information. Story representations
in Markdown and YAML will use active_loop instead of form to represent the
event.
-
#6453: Removed support for the queue argument in PikaEventBroker (use queues instead).
Domain file:
- Removed support for the templates key (use responses instead).
- Removed support for string responses (use dictionaries instead).
NLU Component:
- Removed support for the provides attribute; it is not needed anymore.
- Removed support for the requires attribute (use required_components() instead).
Removed the _guess_format() utils method from rasa.nlu.training_data.loading
(use guess_format instead).
Removed several config options for TED Policy, DIETClassifier and ResponseSelector:
hidden_layers_sizes_pre_dial
hidden_layers_sizes_bot
droprate
droprate_a
droprate_b
hidden_layers_sizes_a
hidden_layers_sizes_b
num_transformer_layers
num_heads
dense_dim
embed_dim
num_neg
mu_pos
mu_neg
use_max_sim_neg
C2
C_emb
evaluate_every_num_epochs
evaluate_on_num_examples
Please check the documentation for more information.
-
#6463: The conversation event form_validation was renamed to loop_interrupted.
Rasa Open Source will continue to be able to read and process old form_validation
events.
-
#6658: SklearnPolicy was deprecated. TEDPolicy is the preferred machine-learning
policy for dialogue models.
-
#6809: Slots of type unfeaturized are now deprecated and will be removed in
Rasa Open Source 3.0. Instead you should use the property
influence_conversation: false for every slot type as described in the
migration guide.
-
#6934: Conversation sessions are now enabled by default
if your Domain does not contain a session configuration.
Previously a missing session configuration was treated as if conversation sessions
were disabled. You can explicitly disable conversation sessions using the following
snippet:

    session_config:
      # A session expiration time of `0`
      # disables conversation sessions
      session_expiration_time: 0
-
#6952: Using the default action action_deactivate_form to deactivate
the currently active loop / Form is deprecated.
Please use action_deactivate_loop instead.
Features
-
#4745: Added template name to the metadata of bot utterance events.
The BotUttered event contains a template_name property in its metadata for any
new bot message.
-
#5086: Added a --num-threads CLI argument that can be passed to rasa train
and will be used to train NLU components.
-
#5510: You can now define what kind of features should be used by what component
(see Choosing a Pipeline).
You can set an alias via the option alias for every featurizer in your pipeline.
The alias can be anything; by default it is set to the full featurizer class name.
You can then specify, for example, on the DIETClassifier what features from which
featurizers should be used.
If you don't set the option featurizers, all available features will be used.
This is also the default behavior.
Check components to see what components have the option featurizers available.
Here is an example pipeline that shows the new option.
We define an alias for all featurizers in the pipeline.
All features will be used in the DIETClassifier.
However, the ResponseSelector only takes the features from the
ConveRTFeaturizer and the CountVectorsFeaturizer (word level).

    pipeline:
    - name: ConveRTTokenizer
    - name: ConveRTFeaturizer
      alias: "convert"
    - name: CountVectorsFeaturizer
      alias: "cvf_word"
    - name: CountVectorsFeaturizer
      alias: "cvf_char"
      analyzer: char_wb
      min_ngram: 1
      max_ngram: 4
    - name: RegexFeaturizer
      alias: "regex"
    - name: LexicalSyntacticFeaturizer
      alias: "lsf"
    - name: DIETClassifier
    - name: ResponseSelector
      epochs: 50
      featurizers: ["convert", "cvf_word"]
    - name: EntitySynonymMapper

:::caution
This change is model-breaking. Please retrain your models.
:::
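For #5510, the selection rule can be sketched in a few lines of Python. This is an illustration only, not Rasa's implementation: the function name select_features and the dictionary layout are assumptions made for the example.

```python
from typing import Dict, List, Optional


def select_features(
    features_by_alias: Dict[str, List[float]],
    featurizers: Optional[List[str]] = None,
) -> Dict[str, List[float]]:
    """Return only the features produced by the listed featurizer aliases.

    A missing or empty `featurizers` option means "use everything",
    mirroring the default behavior described in the entry.
    """
    if not featurizers:
        return dict(features_by_alias)
    return {
        alias: feats
        for alias, feats in features_by_alias.items()
        if alias in featurizers
    }


# Hypothetical features keyed by the aliases from the example pipeline.
available = {
    "convert": [0.1, 0.2],
    "cvf_word": [1.0, 0.0],
    "cvf_char": [0.0, 1.0],
    "regex": [0.0],
    "lsf": [1.0],
}

diet_input = select_features(available)  # option not set: all features
selector_input = select_features(available, ["convert", "cvf_word"])
```

Here diet_input keeps all five feature sets, while selector_input keeps only those aliased convert and cvf_word.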
-
#5837: Added --port commandline argument to the interactive learning mode to allow
changing the port for the Rasa server running in the background.
-
#5957: Add new entity extractor RegexEntityExtractor. The entity extractor extracts
entities using the lookup tables and regexes defined in the training data.
For more information see RegexEntityExtractor.
-
#5996: Introduced a new YAML format for Core training data and implemented a parser
for it. Rasa Open Source can now read stories in both Markdown and YAML format.
-
#6020: You can now enable threaded message responses from Rasa through the Slack connector.
This option is enabled using an optional configuration in the credentials.yml file:

    slack:
      slack_token:
      slack_channel:
      use_threads: True

Button support has also been added in the Slack connector.
-
#6066: The NLU interpreter is now passed to the Policies during training and
inference time. Note that this requires an additional parameter interpreter in the
method predict_action_probabilities of the Policy interface. In case a
custom Policy implementation doesn't provide this parameter, Rasa Open Source
will print a warning and omit passing the interpreter.
-
#6088: Added the new dialogue policy RulePolicy which will replace the old “rule-like”
policies Mapping Policy, Fallback Policy, Two-Stage Fallback Policy, and
Form Policy. These policies are now deprecated and will be removed in the future.
Please see the rules documentation for more information.
Added new NLU component FallbackClassifier which predicts an intent nlu_fallback
in case the confidence was below a given threshold. The intent nlu_fallback may
then be used to write stories / rules to handle the fallback in case of low NLU
confidence.

    pipeline:
    # Other NLU components ...
    - name: FallbackClassifier
      # If the highest ranked intent has a confidence lower than the threshold then
      # the NLU pipeline predicts an intent `nlu_fallback` which can then be used in
      # stories / rules to implement an appropriate fallback.
      threshold: 0.5
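For #6088, the thresholding rule behind FallbackClassifier can be sketched as follows. This is not the component's code: the function name apply_fallback, the shape of the ranking list and the original_intent key are assumptions made for illustration.

```python
from typing import Dict, List

FALLBACK_INTENT = "nlu_fallback"


def apply_fallback(ranking: List[Dict], threshold: float = 0.5) -> Dict:
    """Return the top intent, or the fallback intent if its confidence is too low.

    `ranking` is a list of {"name": str, "confidence": float} dicts
    (a layout assumed for this sketch).
    """
    top = max(ranking, key=lambda intent: intent["confidence"])
    if top["confidence"] < threshold:
        # Keep the original prediction around so a rule or story can inspect it.
        return {"name": FALLBACK_INTENT, "original_intent": top["name"]}
    return top


confident = apply_fallback([{"name": "greet", "confidence": 0.9}])
uncertain = apply_fallback(
    [{"name": "greet", "confidence": 0.3}, {"name": "bye", "confidence": 0.2}]
)
```

With the default threshold of 0.5, confident stays greet and uncertain becomes nlu_fallback.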
-
#6132: Added possibility to split the domain into separate files. All YAML files
under the path specified with --domain will be scanned for domain
information (e.g. intents, actions, etc) and then combined into a single domain.
The default value for --domain is still domain.yml.
-
#6275: Add optional metadata argument to NaturalLanguageInterpreter's parse method.
-
#6354: The Rasa Open Source API endpoint POST /model/train now supports training data
in YAML format. Please specify the header Content-Type: application/yaml when
training a model using YAML training data.
See the API documentation for more information.
-
#6374: Added a YAML schema and a writer for 2.0 Training Core data.
-
#6404: Users can now use the rasa data convert {nlu|core} -f yaml command to convert
training data from Markdown format to YAML format.
-
#6536: Add option use_lemma to CountVectorsFeaturizer. By default it is set to True.
use_lemma indicates whether the featurizer should use the lemma of a word for counting
(if available) or not. If this option is set to False it will use the word as it is.
Improvements
-
#4536: Add support for Python 3.8.
-
#5368: Changed the project structure for Rasa projects initialized with the
CLI (using the rasa init command): actions.py -> actions/actions.py.
actions is now a Python package (it contains a file actions/__init__.py).
In addition, the __init__.py at the root of the project has been removed.
-
#5481: DIETClassifier now also assigns a confidence value to entity predictions.
-
#5637: Added behavior to the rasa --version command. It will now also list information
about the operating system, Python version and rasa-sdk. This will make it easier
for users to file bug reports.
-
#5743: Support for additional training metadata.
Training data messages now support kwargs and the Rasa JSON data reader
includes all fields when instantiating a training data instance.
-
#5748: Standardize testing output. The following test output can be produced for intents,
responses, entities and stories:
- report: a detailed report with testing metrics per label (e.g. precision,
  recall, accuracy, etc.)
- errors: a file that contains incorrect predictions
- successes: a file that contains correct predictions
- confusion matrix: plot of confusion matrix
- histogram: plot of confidence distribution (not available for stories)
-
#5756: To avoid the problem of our entity extractors predicting entity labels for
just a part of a word, we introduced a cleaning method that ran after the prediction
was done. It is better to avoid the incorrect prediction in the first place.
To achieve this, we no longer tokenize words into sub-words.
Instead, we take the mean of the sub-words' feature vectors as the feature vector of the word.
:::caution
This change is model-breaking. Please retrain your models.
:::
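For #5756, the averaging step can be illustrated with plain Python lists. The function name word_vector and the toy vectors are hypothetical; the featurizers themselves work on real embedding arrays.

```python
from typing import List


def word_vector(sub_word_vectors: List[List[float]]) -> List[float]:
    """Average the sub-word vectors dimension by dimension."""
    count = len(sub_word_vectors)
    return [sum(dimension) / count for dimension in zip(*sub_word_vectors)]


# Hypothetical 2-dimensional vectors for the three sub-words of one word.
pieces = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
print(word_vector(pieces))  # [2.0, 5.0]
```

Because each word keeps a single vector, an entity label can no longer be assigned to only part of a word.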
-
#5759: Move option case_sensitive from the tokenizers to the featurizers.
- Remove the option from the WhitespaceTokenizer and ConveRTTokenizer.
- Add option case_sensitive to the RegexFeaturizer.
-
#5766: If a user sends a voice message to the bot using Facebook, the user's message was set to the attachment's URL. The same is now also done for the rest of the attachment types (image, video, and file).
-
#5794: Creating a Domain using Domain.from_dict can no longer alter the input dictionary.
Previously, there could be problems when the input dictionary was re-used for other
things after creating the Domain from it.
-
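For #5794, the usual way to get this guarantee is to work on a deep copy of the input. The sketch below uses a hypothetical build_from_dict function and made-up normalization steps; it does not reproduce how Domain is actually built.

```python
import copy
from typing import Any, Dict


def build_from_dict(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical constructor that normalizes its input without mutating it."""
    working = copy.deepcopy(data)  # private copy; the caller's dict is untouched
    working.setdefault("intents", []).sort()
    working.pop("session_config", None)
    return working


source = {"intents": ["greet", "bye"], "session_config": {}}
snapshot = copy.deepcopy(source)

built = build_from_dict(source)
print(source == snapshot)  # True: the input dictionary is unchanged
```

A shallow copy would not be enough here, because sorting the nested intents list would still modify the caller's list.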
#5805: The debug-level logs when instantiating an SQLTrackerStore
no longer show the password in plain text. Now, the URL is displayed with the password
hidden, e.g. postgresql://username:***@localhost:5432.
-
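For #5805, one way to hide a password before logging a connection URL, using only the standard library. The helper name mask_password is hypothetical and this is not the tracker store's own code; it also ignores edge cases such as IPv6 hosts.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_password(url: str, mask: str = "***") -> str:
    """Return `url` with its password component replaced by `mask`."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:{mask}@{host}"
    return urlunsplit(parts._replace(netloc=netloc))


print(mask_password("postgresql://username:secret@localhost:5432"))
# postgresql://username:***@localhost:5432
```

URLs without a password, such as a local SQLite path, are returned unchanged.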
#5855: Shorten the information in tqdm during training ML algorithms based on the log
level. If you train your model in debug mode, all available metrics will be
shown during training; otherwise, the information is shortened.
-
#5913: Ignore conversation test directory tests/ when importing a project
using MultiProjectImporter and use_e2e is False.
Previously, any story data found in a project subdirectory would be imported
as training data.
-
#5985: Implemented model checkpointing for DIET (including the response selector) and TED. The best model during training will be stored instead of just the last model. The model is evaluated on the basis of evaluate_every_number_of_epochs and evaluate_on_number_of_examples.
Checkpointing is enabled if and only if the following is set for the models in the
config.yml file:
- checkpoint_model: True
- evaluate_on_number_of_examples > 0
The model is stored to whatever location has been specified with the
--out parameter when calling rasa train nlu/core ....
-
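For #5985, the idea of keeping the best model instead of the last one can be sketched generically. The function train_with_checkpoint and its callbacks are hypothetical stand-ins; the real checkpointing lives in the DIET and TED training loops.

```python
from typing import Callable, Dict, Optional, Tuple


def train_with_checkpoint(
    epochs: int,
    evaluate_every: int,
    train_one_epoch: Callable[[int], Dict],
    evaluate: Callable[[Dict], float],
) -> Tuple[Optional[Dict], float]:
    """Keep the weights with the best validation score seen so far."""
    best_weights, best_score = None, float("-inf")
    for epoch in range(1, epochs + 1):
        weights = train_one_epoch(epoch)
        if epoch % evaluate_every == 0:
            score = evaluate(weights)
            if score > best_score:
                best_weights, best_score = dict(weights), score
    return best_weights, best_score


# Toy stand-ins: validation accuracy peaks at epoch 4 and then degrades.
accuracy = {2: 0.70, 4: 0.85, 6: 0.80}
best, score = train_with_checkpoint(
    epochs=6,
    evaluate_every=2,
    train_one_epoch=lambda epoch: {"epoch": epoch},
    evaluate=lambda weights: accuracy[weights["epoch"]],
)
```

In this toy run the epoch-4 weights are kept even though training continues to epoch 6, which is why an evaluation set (evaluate_on_number_of_examples > 0) is required.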
#6024: rasa data split nlu now makes sure that there is at least one example per
intent and response in the test data.
-
#6039: The method ensure_consistent_bilou_tagging now also considers the confidence
values of the predicted tags when updating the BILOU tags.
-
#6045: We updated the way we save and use features in our NLU pipeline.
The message object now has a dedicated field, called features, to store the
features that are generated in the NLU pipeline. We adapted all our featurizers in a
way that sequence and sentence features are stored independently. This allows us to
keep different kinds of features for the sequence and the sentence. For example, the
LexicalSyntacticFeaturizer does not produce any sentence features anymore, as our
experiments showed that those did not bring any performance gain, just quite a lot of
additional values to store.
We also modified the DIET architecture to process the sequence and sentence
features independently at first. The features are concatenated just before
the transformer.
We also removed the __CLS__ token again. Our Tokenizers will not
add this token anymore.
:::caution
This change is model-breaking. Please retrain your models.
:::
-
#6052: Add endpoint kwarg to rasa.jupyter.chat to enable using a custom action server while chatting with a model in a Jupyter notebook.
-
#6055: Support for rasa conversation id with special characters on the server side - necessary for some channels (e.g. Viber)
-
#6134: Log the number of examples per intent during training. Logging can be enabled using rasa train --debug.
-
#6237: Support for other remote storages can be achieved by using an external library.
-
#6273: Add output_channel query param to the /conversations/<conversation_id>/tracker/events route, along with the boolean execute_side_effects to optionally schedule/cancel reminders and forward bot messages to the output channel.
-
#6276: Allow Rasa to boot when model loading exception occurs. Forward HTTP Error responses to standard log output.
-
#6294: Rename DucklingHTTPExtractor to DucklingEntityExtractor.
-
#6296:
- Modified functionality of SingleStateFeaturizer.
  SingleStateFeaturizer uses a trained NLU Interpreter to featurize intents and action names.
  This modified SingleStateFeaturizer can replicate LabelTokenizerSingleStateFeaturizer
  functionality. LabelTokenizerSingleStateFeaturizer is deprecated from now on.
  To replicate LabelTokenizerSingleStateFeaturizer functionality,
  add a Tokenizer with intent_tokenization_flag: True and CountVectorsFeaturizer
  to the NLU pipeline. Please update your configuration file.
  For example:

      language: en
      pipeline:
      - name: WhitespaceTokenizer
        intent_tokenization_flag: True
      - name: CountVectorsFeaturizer

  Please train both NLU and Core (using rasa train) to use a trained tokenizer and
  featurizer for core featurization.
  The new SingleStateFeaturizer stores slots, entities and forms in sparse features
  for more lightweight storage.
  BinarySingleStateFeaturizer is deprecated and will be removed in the future.
  We recommend switching to SingleStateFeaturizer.
- Modified TEDPolicy to handle sparse features. As a result, TEDPolicy may require
  more epochs than before to converge.
- Default TEDPolicy featurizer changed to MaxHistoryTrackerFeaturizer with infinite
  max history (takes all dialogue turns into account).
- Default batch size for TED increased from [8, 32] to [64, 256].
-
#6323: Response selector templates now support all features that
domain utterances do. They use the YAML format instead of Markdown now.
This means you can now use buttons, images, ... in your FAQ or chitchat responses
(assuming they are using the response selector).
As a consequence, training data in Markdown format has to have the file
suffix .md from now on to allow proper file type detection.
-
#6457: Support for test stories written in yaml format.
-
#6466: Response Selectors are now trained on retrieval intent labels by default instead of the actual response text. For most models, this should improve training time and accuracy of the ResponseSelector.
If you want to revert to the pre-2.0 default behavior, add the use_text_as_label=true parameter to your ResponseSelector component.
You can now also have multiple response templates for a single sub-intent of a retrieval intent. The first response template containing the text attribute is picked for training (if use_text_as_label=True) and a random template is picked for the bot's utterance, just as other utter_ templates are picked.
All response selector related evaluation artifacts (report.json, successes.json, errors.json, confusion_matrix.png) now use the sub-intent of the retrieval intent as the target and predicted labels instead of the actual response text.
The output schema of ResponseSelector has changed: full_retrieval_intent and name have been deprecated in favour of intent_response_key and response_templates respectively. Additionally, a key all_retrieval_intents is added to the response selector output which will hold a list of all retrieval intents (faq, chitchat, etc.) that are present in the training data. An example output looks like this:

    "response_selector": {
      "all_retrieval_intents": ["faq"],
      "default": {
        "response": {
          "id": 1388783286124361986,
          "confidence": 1.0,
          "intent_response_key": "faq/is_legit",
          "response_templates": [
            {
              "text": "absolutely",
              "image": "https://i.imgur.com/nGF1K8f.jpg"
            },
            {
              "text": "I think so."
            }
          ]
        },
        "ranking": [
          {
            "id": 1388783286124361986,
            "confidence": 1.0,
            "intent_response_key": "faq/is_legit"
          }
        ]
      }
    }

An example bot demonstrating how to use the ResponseSelector is added to the examples folder.
-
#6472: Do not modify conversation tracker's latest_input_channel property when using POST /trigger_intent or ReminderScheduled.
-
#6555: Do not set the output dimension of the sparse-to-dense layers to the same dimension as the dense features.
Update default value of dense_dimension and concat_dimension for text in DIETClassifier to 128.
-
#6591: Retrieval actions with respond_ prefix are now replaced with usual utterance actions with utter_ prefix.
If you were using retrieval actions before, rename all of them to start with the utter_ prefix. For example, respond_chitchat becomes utter_chitchat.
Also, in order to keep the response templates more consistent, you should now add the utter_ prefix to all response templates defined for retrieval intents. For example, a response template chitchat/ask_name becomes utter_chitchat/ask_name. Note that the NLU examples for this will still be under the chitchat/ask_name intent.
The example responseselectorbot should help clarify these changes further.
-
#6613: Added telemetry reporting. Rasa uses telemetry to report anonymous usage information.
This information is essential to help improve Rasa Open Source for all users.
Reporting will be opt-out. More information can be found in our
telemetry documentation. -
#6757: Update extract_other_slots method inside FormAction to fill a slot from an entity
with a different name if the corresponding slot mapping of from_entity type is unique.
-
#6809: Slots of any type can now be ignored during a conversation.
To do so, specify the property influence_conversation: false for the slot.

    slots:
      a_slot:
        type: text
        influence_conversation: false

The property influence_conversation is set to true by default. See the
documentation for slots for more information.
A new slot type any was added. Slots of this type can store
any value. Slots of type any are always ignored during conversations.
-
#6856: Improved exception handling within Rasa Open Source.
All exceptions that are somewhat expected (e.g. errors in file formats like
configurations or training data) will share a common base class RasaException.
:::warning Backwards Incompatibility
Base class for the exception raised when an action can not be found has been changed
from a NameError to a ValueError.
:::
Some other exceptions have also slightly changed:
- raise YamlSyntaxException instead of YAMLError (from ruamel) when
  failing to load a yaml file with information about the line where loading failed
- introduced MissingDependencyException as an exception raised if packages
  need to be installed
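For #6856, the effect of the base class change on existing handlers can be shown with stand-in classes. RasaException below is a local placeholder for the class named in this entry, and ActionNotFound is a hypothetical name. Letting it inherit from both RasaException and ValueError is an assumption made for this sketch.

```python
class RasaException(Exception):
    """Local stand-in for the common base class named in this entry."""


class ActionNotFound(RasaException, ValueError):
    """Hypothetical name for the 'action can not be found' error."""


def find_action(name: str, known_actions: set) -> str:
    if name not in known_actions:
        raise ActionNotFound(f"Cannot find action '{name}'.")
    return name


caught_by = []
try:
    find_action("utter_missing", {"utter_greet"})
except NameError:
    caught_by.append("NameError")  # pre-2.0 style handler, no longer matches
except ValueError:
    caught_by.append("ValueError")

print(caught_by)  # ['ValueError']
```

Code that still catches NameError for this case needs to be updated to catch ValueError.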
-
#6900: Debug logs from matplotlib libraries are now hidden by default and are configurable with the LOG_LEVEL_LIBRARIES environment variable.
-
#6943: Update KafkaEventBroker to support SASL_SSL and PLAINTEXT protocols.
Bugfixes
-
#3597: Fixed issue where temporary model directories were not removed after pulling from a model server.
If the model pulled from the server was invalid, this could lead to large amounts of local storage usage.
-
#5038: Fixed a bug in the CountVectorsFeaturizer which resulted in the very first
message after loading a model being processed incorrectly due to the vocabulary
not being loaded yet.
-
#5135: Fixed Rasa shell skipping button messages if buttons are attached to
a message previous to the latest. -
#5385: Stack level for FutureWarning updated to level 2.
-
#5453: Fixed custom utter messages failing to be returned when they contained no value
or an integer value. The template is now converted to a string.
-
#5617: Don't create TensorBoard log files during prediction.
-
#5638: Fixed DIET breaking with empty spaCy model.
-
#5737: Pinned the library version for the Azure
Cloud Storage to 2.1.0 since the
persistor is currently not compatible with later versions of the azure-storage-blob
library. -
#5755: Remove clean_up_entities from extractors that extract pre-defined entities.
Just keep the clean up method for entity extractors that extract custom entities.
-
#5792: Fixed issue where the DucklingHTTPExtractor component would
not work if its url contained a trailing slash.
-
#5808: Changed the variable CERT_URI in hangouts.py to a string type.
-
#5850: Slots will be correctly interpolated for button responses.
Previously this resulted in no interpolation due to a bug.
-
#5905: Remove option token_pattern from CountVectorsFeaturizer.
Instead all tokenizers now have the option token_pattern.
If a regular expression is set, the tokenizer will apply the token pattern.
-
#5921: Allow user to retry failed file exports in interactive training.
-
#5964: Fixed a bug when custom metadata passed with the utterance always restarted the session.
-
#5998: WhitespaceTokenizer does not remove vowel signs in Hindi anymore.
-
#6042: Convert entity values coming from DucklingHTTPExtractor to string
during evaluation to avoid mismatches due to different types.
-
#6053: Update FeatureSignature to store just the feature dimension instead of the
complete shape. This change fixes the usage of the option share_hidden_layers
in the DIETClassifier.
-
#6087: Unescape the \n, \t, \r, \f, \b tokens on reading NLU data from Markdown files.
On converting JSON files into Markdown, the tokens mentioned above are escaped. These tokens need to be unescaped on loading the data from Markdown to ensure that the data is treated in the same way.
-
#6120: Fix the way training data is generated in rasa test nlu when using the -P flag.
Each percentage of the training dataset used to be formed as a part of the last
sampled training dataset and not as a sample from the original training dataset.
-
#6143: Prevent WhitespaceTokenizer from outputting an empty list of tokens.
-
#6198: Add EntityExtractor as a required component for EntitySynonymMapper in a pipeline.
-
#6222: Better handling of input sequences longer than the maximum sequence length that the HFTransformersNLP models can handle.
During training, messages with longer sequence length should result in an error, whereas during inference they are
gracefully handled but a debug message is logged. Ideally, passing messages longer than the acceptable maximum sequence
lengths of each model should be avoided.
-
#6231: Fixed an issue where, when using the DynamoTrackerStore with more than 100 DynamoDB tables, the tracker store could attempt to re-create an existing table if that table was not among the first 100 listed by the DynamoDB API.
-
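For #6231, the underlying issue is pagination: the DynamoDB ListTables call returns at most 100 names per page, plus a LastEvaluatedTableName marker that is passed back as ExclusiveStartTableName to fetch the next page. The sketch below uses an in-memory FakeDynamoClient (a stand-in written for this example, so it runs without AWS) and a hypothetical table_exists helper; it is not the tracker store's code.

```python
from typing import Dict, List, Optional


class FakeDynamoClient:
    """In-memory stand-in that pages results like DynamoDB's ListTables."""

    def __init__(self, tables: List[str], page_size: int = 100) -> None:
        self._tables = sorted(tables)
        self._page_size = page_size

    def list_tables(self, ExclusiveStartTableName: Optional[str] = None) -> Dict:
        start = 0
        if ExclusiveStartTableName is not None:
            start = self._tables.index(ExclusiveStartTableName) + 1
        page = self._tables[start : start + self._page_size]
        response: Dict = {"TableNames": page}
        if start + self._page_size < len(self._tables):
            response["LastEvaluatedTableName"] = page[-1]
        return response


def table_exists(client, name: str) -> bool:
    """Walk every page of table names instead of checking only the first."""
    kwargs: Dict = {}
    while True:
        response = client.list_tables(**kwargs)
        if name in response["TableNames"]:
            return True
        last = response.get("LastEvaluatedTableName")
        if last is None:
            return False
        kwargs = {"ExclusiveStartTableName": last}


client = FakeDynamoClient([f"table_{i:03d}" for i in range(250)])
```

With 250 tables, table_249 is missing from the first page but is still found by table_exists, so no attempt would be made to create it again.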
#6282: Fixed a deprecation warning that pops up due to changes in numpy.
-
#6291: Update rasabaster to fix an issue with syntax highlighting on the "Prototype an Assistant" page.
Update default stories and rules on the "Prototype an Assistant" page.
-
#6419: Fixed a bug in the serialise method of the EvaluationStore class which resulted in a wrong end-to-end evaluation of the predicted entities.
-
#6535: Forms with slot mappings defined in domain.yml must now be a
dictionary (with form names as keys). The previous syntax where forms was simply a
list of form names is still supported.
-
#6577: Remove BILOU tag prefix from role and group labels when creating entities.
-
#6601: Fixed a bug in the featurization of the boolean slot type. Previously, to set a slot value to "true",
you had to set it to "1", which is in conflict with the documentation. In older versions, true (without quotes) was also possible, but more recently it raised an error during YAML validation.
-
#6603: Fixed a bug in rasa interactive. Now it exports the stories and nlu training data as yml file.
-
#6711: Fixed slots not being featurized before first user utterance.
Fixed AugmentedMemoizationPolicy to forget the first action on the first going back
-
#6741: Fixed the remote URL of ConveRT model as it was recently updated by its authors.
-
#6755: Treat the length of OOV token as 1 to fix token align issue when OOV occurred.
-
#6757: Fixed the bug when entity was extracted even
if it had a role or group but roles or groups were not expected. -
#6803: Fixed the bug that caused supported_language_list of Component to not work correctly.
To avoid confusion, only one of supported_language_list and not_supported_language_list can now be set to a value other than None.
-
#6897: Fixed issue where responses including text: "" and no custom key would incorrectly fail domain validation.
-
#6898: Fixed issue where extra keys other than title and payload inside of buttons made a response fail domain validation.
-
#6919: Do not filter training data in model.py but on component side.
-
#6929: Check if a model was provided when executing rasa test core.
If not, print a useful error message and stop.
-
#6805: Transfer only response templates for retrieval intents from domain to NLU Training Data.
This avoids retraining the NLU model if one of the non retrieval intent response templates are edited.
Improved Documentation
- #4441: Added documentation on ambiguity_threshold parameter in Fallback Actions page.
- #4605: Remove outdated whitespace tokenizer warning in Testing Your Assistant documentation.
- #5640: Updated Facebook Messenger channel docs with supported attachment information
- #5675: Update rasa shell documentation to explain how to recreate external
  channel session behavior.
- #5811: Event brokers documentation should say url instead of host.
- #5952: Update rasa init documentation to include tests/conversation_tests.md
  in the resulting directory tree.
- #6819: Update "Validating Form Input" section to include details about
  how FormValidationAction class makes it easier to validate form slots in custom actions and how to use it.
- #6823: Update the examples in the API docs to use YAML instead of Markdown