Merge branch 'master' of https://github.com/RasaHQ/rasa into replace-os.path-pathlib
RomuloSouza committed Nov 10, 2020
2 parents 5370197 + df7a5b9 commit 4fdd821
Showing 31 changed files with 2,577 additions and 1,037 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/security-scans.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
name: Security Scans

on: [push, pull_request]
on:
  pull_request:
    types: [opened, synchronize, labeled]

jobs:
cleanup_runs:
5 changes: 5 additions & 0 deletions changelog/6285.improvement.md
@@ -0,0 +1,5 @@
Predictions of the [`FallbackClassifier`](components.mdx#fallbackclassifier) are
ignored when
[evaluating the NLU model](testing-your-assistant.mdx#evaluating-an-nlu-model).
Note that the `FallbackClassifier` predictions still apply to
[test stories](testing-your-assistant.mdx#writing-test-stories).
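
One way to picture what "ignoring" a fallback prediction could mean during NLU evaluation is to score the best remaining intent instead. The sketch below is illustrative only: the parse-result layout (`intent`, `intent_ranking`), the `nlu_fallback` intent name, and the helper `undo_fallback` are assumptions made for the example, not the code changed in this commit.

```python
from typing import Any, Dict

FALLBACK_INTENT = "nlu_fallback"  # assumed name of the fallback intent


def undo_fallback(parse_result: Dict[str, Any]) -> Dict[str, Any]:
    """Replace a fallback prediction with the best-ranked other intent, if any."""
    if parse_result.get("intent", {}).get("name") != FALLBACK_INTENT:
        # Nothing to undo: the top prediction is a regular intent.
        return parse_result
    ranking = [
        intent
        for intent in parse_result.get("intent_ranking", [])
        if intent.get("name") != FALLBACK_INTENT
    ]
    if not ranking:
        # No alternative available, so keep the result unchanged.
        return parse_result
    return {**parse_result, "intent": ranking[0], "intent_ranking": ranking}


# Hypothetical parse result in which the fallback intent ranked first.
result = {
    "text": "hmm",
    "intent": {"name": "nlu_fallback", "confidence": 0.7},
    "intent_ranking": [
        {"name": "nlu_fallback", "confidence": 0.7},
        {"name": "greet", "confidence": 0.25},
        {"name": "goodbye", "confidence": 0.05},
    ],
}
evaluated = undo_fallback(result)
print(evaluated["intent"]["name"])
```

With this kind of post-processing, evaluation metrics reflect the underlying intent classifier rather than the confidence threshold of the fallback component.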
1 change: 1 addition & 0 deletions changelog/6973.bugfix.md
@@ -0,0 +1 @@
Ignore rules when validating stories
1 change: 1 addition & 0 deletions changelog/6973.doc.md
@@ -0,0 +1 @@
Correct data validation docs
6 changes: 6 additions & 0 deletions changelog/7027.improvement.md
@@ -0,0 +1,6 @@
Remove dependency between `ConveRTTokenizer` and `ConveRTFeaturizer`. The `ConveRTTokenizer` is now deprecated, and the
`ConveRTFeaturizer` can be used with any other `Tokenizer`.

Remove dependency between `HFTransformersNLP`, `LanguageModelTokenizer`, and `LanguageModelFeaturizer`. Both
`HFTransformersNLP` and `LanguageModelTokenizer` are now deprecated. `LanguageModelFeaturizer` implements the behavior
of the stack and can be used with any other `Tokenizer`.
2 changes: 1 addition & 1 deletion changelog/README.md
@@ -18,7 +18,7 @@ Each file should be named like `<ISSUE>.<TYPE>.md`, where
* `feature`: new user facing features, like new command-line options and new behavior.
* `improvement`: improvement of existing functionality, usually without requiring user intervention.
* `bugfix`: fixes a reported bug.
* `doc`: documentation improvement, like rewording an entire session or adding missing docs.
* `doc`: documentation improvement, like rewording an entire section or adding missing docs.
* `removal`: feature deprecation or feature removal.
* `misc`: fixing a small typo or internal change, will not be included in the changelog.

23 changes: 23 additions & 0 deletions data/test_stories/rules_without_stories_and_wrong_names.md
@@ -0,0 +1,23 @@
>> rule 1
- form{"name": "loop_q_form"} <!-- condition that form is active-->
- slot{"requested_slot": "some_slot"} <!-- some condition -->
- ...
* some_intent_that_doesnt_exist{"some_slot":"bla"} <!-- can be ANY -->
- loop_q_form <!-- can be internal core action, can be anything -->

>> rule 2
- form{"name": "loop_q_form"} <!-- condition that form is active-->
- slot{"requested_slot": "some_slot"} <!-- some condition -->
- ...
* explain <!-- can be anything -->
- utter_some_action_that_doesnt_exist
- loop_q_form
- form{"name": "loop_q_form"} <!-- condition that form is active-->

>> rule 3
- form{"name": "loop_q_form"} <!-- condition that form is active-->
- ...
- loop_q_form <!-- condition that form is active -->
- form{"name": null}
- slot{"requested_slot": null}
- action_stop_q_form
9 changes: 9 additions & 0 deletions data/test_stories/stories_with_rules_conflicting.md
@@ -0,0 +1,9 @@
>> rule 1
* greet
- utter_noworries

## ML story 1
* greet
- utter_greet
* thankyou
- utter_noworries
27 changes: 15 additions & 12 deletions docs/docs/command-line-interface.mdx
@@ -313,31 +313,34 @@ rasa data convert nlg --help

## rasa data validate

You can check your domain, NLU data, or conversation data for mistakes and inconsistencies.
You can check your domain, NLU data, or story data for mistakes and inconsistencies.
To validate your data, run this command:

```bash
rasa data validate
```

By default, the validator searches only for errors in the data, e.g. the same training
example being listed as an example for two intents.
To catch minor issues that don't prevent training a model but might indicate messy data
(e.g. unused intents), use the `--fail-on-warnings` flag.
The validator searches for errors in the data, e.g. two intents that have some
identical training examples.
The validator also checks if you have any stories where different assistant actions follow from the same
dialogue history. Conflicts between stories will prevent a model from learning the correct
pattern for a dialogue.
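
To make the idea of a story conflict concrete, here is a minimal sketch, not the validator's actual implementation. It assumes stories are flat lists of events (`* intent` for user turns, `- action` for assistant turns) and counts `max_history` in events; the real validator works on featurized dialogue states, and the function name `find_conflicts` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A story is a flat list of events: "* intent" (user) or "- action" (assistant).
Story = List[str]


def find_conflicts(
    stories: Dict[str, Story], max_history: int
) -> Dict[Tuple[str, ...], Set[str]]:
    """Return every truncated history that is followed by more than one action."""
    next_actions: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for events in stories.values():
        for i, event in enumerate(events):
            if event.startswith("- "):
                # Only the last `max_history` events count as the history.
                history = tuple(events[max(0, i - max_history):i])
                next_actions[history].add(event)
    return {h: a for h, a in next_actions.items() if len(a) > 1}


stories = {
    "story 1": ["* greet", "- utter_greet"],
    "story 2": ["* greet", "- utter_noworries"],
    "story 3": ["* thankyou", "- utter_noworries"],
}
for history, actions in find_conflicts(stories, max_history=5).items():
    print(history, "->", sorted(actions))
```

In this toy data, the history `* greet` is followed by two different actions, which is exactly the situation a model cannot learn a single correct pattern from.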

You can also validate the story structure by running this command:
If you pass a `max_history` value to one or more policies in your `config.yml` file, provide the
smallest of those values in the validator command using the `--max-history <max_history>` flag.
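
Picking the value to pass can be automated. The following sketch assumes `config.yml` has already been parsed into a dictionary (for example with a YAML library); the helper name `smallest_max_history` and the sample policies are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def smallest_max_history(config: Dict[str, Any]) -> Optional[int]:
    """Return the smallest `max_history` set on any policy, or None if none is set."""
    values = [
        policy["max_history"]
        for policy in config.get("policies") or []
        if policy.get("max_history") is not None
    ]
    return min(values) if values else None


# Stand-in for a parsed `config.yml`; the policy entries are only examples.
config = {
    "policies": [
        {"name": "MemoizationPolicy", "max_history": 3},
        {"name": "TEDPolicy", "max_history": 5, "epochs": 100},
        {"name": "RulePolicy"},
    ]
}
print(smallest_max_history(config))
```

If the function returns `None`, no policy sets `max_history` explicitly and the flag can be omitted.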

You can also validate only the story structure by running this command:

```bash
rasa data validate stories
```

This validator checks if you have any stories where different assistant actions follow from the same
dialogue history. Conflicts between stories will prevent a model from learning the correct
pattern for a dialogue.
:::note
Running `rasa data validate` does **not** test if your [rules](./rules.mdx) are consistent with your stories.
However, during training, the `RulePolicy` checks for conflicts between rules and stories. Any such conflict will abort training.
:::

If you have a [Memoization Policy](./policies.mdx#memoization-policy) in your
`config.yml` file, run the validator with the `--max-history` argument and provide the `max_history`
value set in `config.yml`. If you didn't set `max_history` in the config file, provide the default value of `5`.
To interrupt validation even for minor issues such as unused intents or responses, use the `--fail-on-warnings` flag.

:::caution check your story names
The `rasa data validate stories` command assumes that all your story names are unique!
100 changes: 73 additions & 27 deletions docs/docs/components.mdx
@@ -139,6 +139,10 @@ word vectors in your pipeline.

### HFTransformersNLP

:::caution Deprecated
The `HFTransformersNLP` is deprecated and will be removed in a future release. The [LanguageModelFeaturizer](./components.mdx#languagemodelfeaturizer)
now implements its behavior.
:::

* **Short**

@@ -406,6 +410,10 @@ word vectors in your pipeline.

### ConveRTTokenizer

:::caution Deprecated
The `ConveRTTokenizer` is deprecated and will be removed in a future release. The [ConveRTFeaturizer](./components.mdx#convertfeaturizer)
now implements its behavior. Any [tokenizer](./components.mdx#tokenizers) can be used in its place.
:::

* **Short**

@@ -466,42 +474,46 @@ word vectors in your pipeline.

### LanguageModelTokenizer

:::caution Deprecated
The `LanguageModelTokenizer` is deprecated and will be removed in a future release. The [LanguageModelFeaturizer](./components.mdx#languagemodelfeaturizer)
now implements its behavior. Any [tokenizer](./components.mdx#tokenizers) can be used in its place.
:::

* **Short**
* **Short**

Tokenizer from pre-trained language models
Tokenizer from pre-trained language models



* **Outputs**
* **Outputs**

`tokens` for user messages, responses (if present), and intents (if specified)
`tokens` for user messages, responses (if present), and intents (if specified)



* **Requires**
* **Requires**

[HFTransformersNLP](./components.mdx#hftransformersnlp)
[HFTransformersNLP](./components.mdx#hftransformersnlp)



* **Description**
* **Description**

Creates tokens using the pre-trained language model specified in upstream [HFTransformersNLP](./components.mdx#hftransformersnlp) component.
Must be used whenever the [LanguageModelFeaturizer](./components.mdx#languagemodelfeaturizer) is used.
Creates tokens using the pre-trained language model specified in upstream [HFTransformersNLP](./components.mdx#hftransformersnlp) component.
Must be used whenever the [LanguageModelFeaturizer](./components.mdx#languagemodelfeaturizer) is used.



* **Configuration**
* **Configuration**

```yaml-rasa
pipeline:
- name: "LanguageModelTokenizer"
  # Flag to check whether to split intents
  "intent_tokenization_flag": False
  # Symbol on which intent should be split
  "intent_split_symbol": "_"
```
```yaml-rasa
pipeline:
- name: "LanguageModelTokenizer"
  # Flag to check whether to split intents
  "intent_tokenization_flag": False
  # Symbol on which intent should be split
  "intent_split_symbol": "_"
```


## Featurizers
@@ -644,7 +656,7 @@ Note: The `feature-dimension` for sequence and sentence features does not have t

* **Requires**

[ConveRTTokenizer](./components.mdx#converttokenizer)
`tokens`



@@ -667,7 +679,7 @@ Note: The `feature-dimension` for sequence and sentence features does not have t
:::

:::note
To use `ConveRTTokenizer`, install Rasa Open Source with `pip3 install rasa[convert]`.
To use `ConveRTFeaturizer`, install Rasa Open Source with `pip3 install rasa[convert]`.

:::

@@ -698,7 +710,7 @@ Note: The `feature-dimension` for sequence and sentence features does not have t

* **Requires**

[HFTransformersNLP](./components.mdx#hftransformersnlp) and [LanguageModelTokenizer](./components.mdx#languagemodeltokenizer)
`tokens`.



@@ -711,8 +723,7 @@ Note: The `feature-dimension` for sequence and sentence features does not have t
* **Description**

Creates features for entity extraction, intent classification, and response selection.
Uses the pre-trained language model specified in upstream [HFTransformersNLP](./components.mdx#hftransformersnlp) component to compute vector
representations of input text.
Uses a pre-trained language model to compute vector representations of input text.

:::note
Please make sure that you use a language model which is pre-trained on the same language corpus as that of your
@@ -724,14 +735,49 @@ Note: The `feature-dimension` for sequence and sentence features does not have t

* **Configuration**

Include [HFTransformersNLP](./components.mdx#hftransformersnlp) and [LanguageModelTokenizer](./components.mdx#languagemodeltokenizer) components before this component. Use
[LanguageModelTokenizer](./components.mdx#languagemodeltokenizer) to ensure tokens are correctly set for all components throughout the pipeline.
Include a [Tokenizer](./components.mdx#tokenizers) component before this component.

You should specify what language model to load via the parameter `model_name`. See the table below for the
available language models.
You can also choose the architecture variation of the selected language model via the
parameter `model_weights`.
The full list of supported architectures can be found in the
[HuggingFace documentation](https://huggingface.co/transformers/pretrained_models.html).
If `model_weights` is left empty, the component uses the default model architecture that the original Transformers library loads (see table below).

```
+----------------+--------------+-------------------------+
| Language Model | Parameter | Default value for |
| | "model_name" | "model_weights" |
+----------------+--------------+-------------------------+
| BERT | bert | rasa/LaBSE |
+----------------+--------------+-------------------------+
| GPT | gpt | openai-gpt |
+----------------+--------------+-------------------------+
| GPT-2 | gpt2 | gpt2 |
+----------------+--------------+-------------------------+
| XLNet | xlnet | xlnet-base-cased |
+----------------+--------------+-------------------------+
| DistilBERT | distilbert | distilbert-base-uncased |
+----------------+--------------+-------------------------+
| RoBERTa | roberta | roberta-base |
+----------------+--------------+-------------------------+
```

The following configuration loads the language model BERT:

```yaml-rasa
pipeline:
- name: "LanguageModelFeaturizer"
```
- name: LanguageModelFeaturizer
  # Name of the language model to use
  model_name: "bert"
  # Pre-Trained weights to be loaded
  model_weights: "rasa/LaBSE"
  # An optional path to a specific directory to download and cache the pre-trained model weights.
  # The `default` cache_dir is the same as https://huggingface.co/transformers/serialization.html#cache-directory .
  cache_dir: null
```

### RegexFeaturizer

28 changes: 28 additions & 0 deletions docs/docs/migration-guide.mdx
@@ -10,6 +10,34 @@ description: |
This page contains information about changes between major versions and
how you can migrate from one version to another.

## Rasa 2.0 to Rasa 2.1

### Deprecations

`ConveRTTokenizer` is now deprecated. [ConveRTFeaturizer](./components.mdx#convertfeaturizer) now implements
its behaviour. To migrate, replace `ConveRTTokenizer` with any other tokenizer, e.g.:

```yaml
pipeline:
- name: WhitespaceTokenizer
- name: ConveRTFeaturizer
  model_url: <Remote/Local path to model files>
...
```

`HFTransformersNLP` and `LanguageModelTokenizer` components are now deprecated.
[LanguageModelFeaturizer](./components.mdx#languagemodelfeaturizer) now implements their behaviour.
To migrate, replace both the above components with any tokenizer and specify the model architecture and model weights
as part of `LanguageModelFeaturizer`, e.g.:

```yaml
pipeline:
- name: WhitespaceTokenizer
- name: LanguageModelFeaturizer
  model_name: "bert"
  model_weights: "rasa/LaBSE"
...
```
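
The two migrations above can also be expressed as a small transformation over a parsed pipeline. This is a rough sketch under stated assumptions: the pipeline is a list of dictionaries as it would come out of a YAML parser, `HFTransformersNLP` is assumed to carry its model settings under `model_name`, `model_weights`, and `cache_dir`, and the helper `migrate_pipeline` is hypothetical rather than something Rasa ships.

```python
from typing import Any, Dict, List

DEPRECATED_TOKENIZERS = {"ConveRTTokenizer", "LanguageModelTokenizer"}
MODEL_KEYS = ("model_name", "model_weights", "cache_dir")  # assumed option names


def migrate_pipeline(
    pipeline: List[Dict[str, Any]], tokenizer: str = "WhitespaceTokenizer"
) -> List[Dict[str, Any]]:
    """Return a new pipeline without the deprecated components."""
    # Collect the model settings that used to live on HFTransformersNLP.
    model_settings: Dict[str, Any] = {}
    for component in pipeline:
        if component["name"] == "HFTransformersNLP":
            model_settings = {k: component[k] for k in MODEL_KEYS if k in component}

    migrated: List[Dict[str, Any]] = []
    for component in pipeline:
        name = component["name"]
        if name == "HFTransformersNLP":
            continue  # its settings move to LanguageModelFeaturizer
        if name in DEPRECATED_TOKENIZERS:
            # Options of the old tokenizer are dropped in this sketch.
            migrated.append({"name": tokenizer})
        elif name == "LanguageModelFeaturizer":
            merged = dict(component)
            for key, value in model_settings.items():
                merged.setdefault(key, value)  # explicit featurizer options win
            migrated.append(merged)
        else:
            migrated.append(dict(component))
    return migrated


old_pipeline = [
    {"name": "HFTransformersNLP", "model_name": "bert", "model_weights": "rasa/LaBSE"},
    {"name": "LanguageModelTokenizer"},
    {"name": "LanguageModelFeaturizer"},
    {"name": "DIETClassifier", "epochs": 100},
]
for component in migrate_pipeline(old_pipeline):
    print(component)
```

The default replacement tokenizer mirrors the `WhitespaceTokenizer` used in the examples above; any other tokenizer name can be passed instead.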

## Rasa 1.10 to Rasa 2.0

25 changes: 18 additions & 7 deletions docs/docs/setting-up-ci-cd.mdx
@@ -38,20 +38,29 @@ you can make a test run only if the pull request has a certain label (e.g. “NL

### Validating Data and Stories

Data validation verifies that there are no mistakes or major inconsistencies in your domain, NLU
data, or conversation data. To validate your data, have your CI run this command:
Data validation verifies that no mistakes or major inconsistencies appear in your domain, NLU
data, or story data. To validate your data, have your CI run this command:

```bash
rasa data validate --fail-on-warnings --max-history <max_history>
rasa data validate
```

If you pass a `max_history` value to a Memoization policy in your `config.yml` file, provide the
same value in the above validator command. Otherwise, provide the default value of `5`.
If you pass a `max_history` value to one or more policies in your `config.yml` file, provide the
smallest of those values via the `--max-history` flag:

If data validation results in errors, training a model will also fail, so it's
```bash
rasa data validate --max-history <max_history>
```

If data validation results in errors, training a model can also fail or yield bad performance, so it's
always good to run this check before training a model. By including the
`--fail-on-warnings` flag, this step will fail on warnings indicating more minor issues.
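
In a CI script, the validation command can be assembled from the two flags discussed here. The sketch below only builds and prints the argument list; the helper name `build_validate_command` is an assumption, and actually running the command (for instance through `subprocess.run`) requires `rasa` to be installed in the CI environment.

```python
from typing import List, Optional


def build_validate_command(
    max_history: Optional[int] = None, fail_on_warnings: bool = False
) -> List[str]:
    """Assemble the `rasa data validate` call as an argument list."""
    command = ["rasa", "data", "validate"]
    if max_history is not None:
        command += ["--max-history", str(max_history)]
    if fail_on_warnings:
        command.append("--fail-on-warnings")
    return command


print(" ".join(build_validate_command(max_history=3, fail_on_warnings=True)))
```

Keeping the command as a list avoids shell quoting issues when it is later handed to a process runner.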

:::note
Running `rasa data validate` does **not** test if your [rules](./rules.mdx) are consistent with your stories.
However, during training, the `RulePolicy` checks for conflicts between rules and stories. Any such conflict will abort training.
:::

To read more about the validator and all of the available options, see [the documentation for
`rasa data validate`](./command-line-interface.mdx#rasa-data-validate).

@@ -95,8 +104,10 @@ as you make improvements to your assistant. A good rule of thumb to follow is th
to be representative of the true distribution of real conversations.
Rasa X makes it easy to [add test conversations based on real conversations](https://rasa.com/docs/rasa-x/user-guide/test-assistant/#how-to-create-tests).

Note: Running test stories does **not** execute your action code. You will need to
:::note
Running test stories does **not** execute your action code. You will need to
[test your action code](./setting-up-ci-cd.mdx#testing-action-code) in a separate step.
:::

### Comparing NLU Performance
