Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

load schema files for pykwalify to avoid global yaml usage #7970

Merged
merged 2 commits into from
Feb 17, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions changelog/7970.bugfix.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Fixed a YAML validation error which happened when executing multiple validations
concurrently. This could e.g. happen when sending concurrent requests to server
endpoints which process YAML training data.
8 changes: 7 additions & 1 deletion rasa/shared/utils/validation.py
Original file line number Diff line number Diff line change
Expand Up @@ -154,9 +154,15 @@ def validate_yaml_schema(yaml_file_content: Text, schema_path: Text) -> None:
PACKAGE_NAME, SCHEMA_EXTENSIONS_FILE
)

# Load schema content using our YAML loader as `pykwalify` uses a global instance
# which can fail when used concurrently
schema_content = rasa.shared.utils.io.read_yaml_file(schema_file)
schema_utils_content = rasa.shared.utils.io.read_yaml_file(schema_utils_file)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively we can convert all schema files to json as this would achieve the same effect and might be better re performance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even better: define the schemas as Python objects - this saves us reading things entirely. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this is related to my insights in #7892 (comment))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might make sense to define schemas as Python objects (don't see any need to go with JSON in this case) — but we can also create an issue and do it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I'll add a changelog, create a new issue + merge 👍🏻

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema_content = dict(schema_content, **schema_utils_content)

c = Core(
source_data=source_data,
schema_files=[schema_file, schema_utils_file],
schema_data=schema_content,
extensions=[schema_extensions],
)

Expand Down
55 changes: 55 additions & 0 deletions tests/shared/utils/test_validation.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
from threading import Thread

import pytest

from pep440_version_utils import Version
Expand All @@ -12,6 +14,7 @@
DOMAIN_SCHEMA_FILE,
LATEST_TRAINING_DATA_FORMAT_VERSION,
)
from rasa.shared.nlu.training_data.formats.rasa_yaml import NLU_SCHEMA_FILE
from rasa.shared.utils.validation import KEY_TRAINING_DATA_FORMAT_VERSION


Expand Down Expand Up @@ -181,3 +184,55 @@ async def test_invalid_training_data_format_version_warns():
for version in [invalid_version_1, invalid_version_2]:
with pytest.warns(UserWarning):
assert validation_utils.validate_training_data_format_version(version, "")


def test_concurrent_schema_validation():
successful_results = []

def validate() -> None:
payload = """
version: "2.0"
nlu:
- intent: greet
examples: |
- hey
- hello
- hi
- hello there
- good morning
- good evening
- moin
- hey there
- let's go
- hey dude
- goodmorning
- goodevening
- good afternoon
- intent: goodbye
examples: |
- good afternoon
- cu
- good by
- cee you later
- good night
- bye
- goodbye
- have a nice day
- see you around
- bye bye
- see you later
"""
rasa.shared.utils.validation.validate_yaml_schema(payload, NLU_SCHEMA_FILE)
successful_results.append(True)

threads = []
for i in range(10):
threads.append(Thread(target=validate))

for thread in threads:
thread.start()

for thread in threads:
thread.join()

assert len(successful_results) == len(threads)