load schema files for `pykwalify` to avoid global `yaml` usage #7970

wochinge · 2021-02-16T19:43:29Z

Proposed changes:

pykwalify uses a global ruamel.yaml instance. This instance sets object attributes while reading files (e.g. the YAML schema files themselves!). This can lead to problems when we run multiple YAML validations concurrently. Fix: either load the YAML files for pykwalify ourselves (what I did in this PR) or provide the schema files as JSON.
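To make the failure mode concrete, here is a minimal sketch using a hypothetical `ToyLoader` class (not ruamel.yaml or pykwalify) that, like the global instance described above, stores the document it is currently reading as an attribute. A `threading.Barrier` forces two threads to interleave between writing and reading that attribute, so a shared instance hands both threads the same text, while one loader per call keeps them apart.

```python
import threading
from typing import Dict


class ToyLoader:
    """Hypothetical loader that keeps per-read state on the instance.

    This is NOT ruamel.yaml; it only mimics the pattern described above, where
    the loader stores the document it is currently reading as an attribute.
    """

    def __init__(self, barrier: threading.Barrier) -> None:
        self.barrier = barrier
        self.current_text = ""

    def load(self, text: str) -> str:
        self.current_text = text  # per-read state written onto the instance
        self.barrier.wait()  # make both threads reach this point before parsing
        return self.current_text.upper()  # "parse" whatever is stored right now


def run_concurrently(shared: bool) -> Dict[str, str]:
    """Load "a" and "b" on two threads, using one shared or two separate loaders."""
    barrier = threading.Barrier(2)
    global_loader = ToyLoader(barrier)  # plays the role of the global instance
    results: Dict[str, str] = {}

    def worker(text: str) -> None:
        loader = global_loader if shared else ToyLoader(barrier)
        results[text] = loader.load(text)

    threads = [threading.Thread(target=worker, args=(text,)) for text in ("a", "b")]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With `shared=True` both threads return the same value (whichever text was written last), so one of them gets the other's document; with `shared=False` the result is `{"a": "A", "b": "B"}`.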

Status (please check what you already did):

added some tests for the functionality
updated the documentation
updated the changelog (please check changelog for instructions)
reformat files using black (please check Readme for instructions)

wochinge · 2021-02-16T19:44:04Z

rasa/shared/utils/validation.py

```diff
+    # Load schema content using our YAML loader as `pykwalify` uses a global instance
+    # which can fail when used concurrently
+    schema_content = rasa.shared.utils.io.read_yaml_file(schema_file)
+    schema_utils_content = rasa.shared.utils.io.read_yaml_file(schema_utils_file)
```


Alternatively, we could convert all schema files to JSON, as this would achieve the same effect and might be better performance-wise.

Or even better: define the schemas as Python objects. This saves us from reading files entirely. What do you think?

(this is related to my insights in #7892 (comment))
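A minimal sketch of the "schemas as Python objects" idea, assuming a tiny pykwalify-like subset (`type`, `mapping`, `sequence`). `DOMAIN_SCHEMA` and `validate` are hypothetical names: this is neither pykwalify's API nor Rasa's actual schema. The schema lives in code, so nothing is read from disk at validation time, and the validator only reads it, so sharing the module-level object between threads involves no per-call state.

```python
from typing import Any, Dict, List

# Hypothetical schema in a pykwalify-like shape; not Rasa's real schema.
DOMAIN_SCHEMA: Dict[str, Any] = {
    "type": "map",
    "mapping": {
        "version": {"type": "str"},
        "intents": {"type": "seq", "sequence": [{"type": "str"}]},
    },
}

# Maps the schema's type names to the Python types a YAML loader produces.
_PYTHON_TYPES = {"str": str, "map": dict, "seq": list}


def validate(data: Any, schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return error messages for `data`; an empty list means it is valid.

    The schema is only read, never modified.
    """
    expected = _PYTHON_TYPES[schema["type"]]
    if not isinstance(data, expected):
        return [f"{path or '/'}: expected {schema['type']}, got {type(data).__name__}"]

    errors: List[str] = []
    if schema["type"] == "map":
        mapping = schema.get("mapping", {})
        for key, value in data.items():
            if key not in mapping:
                errors.append(f"{path}/{key}: key is not allowed")
            else:
                errors.extend(validate(value, mapping[key], f"{path}/{key}"))
    elif schema["type"] == "seq":
        item_schema = schema["sequence"][0]
        for index, item in enumerate(data):
            errors.extend(validate(item, item_schema, f"{path}/{index}"))
    return errors
```

For example, `validate({"version": 3}, DOMAIN_SCHEMA)` returns `["/version: expected str, got int"]`.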

I think it might make sense to define schemas as Python objects (I don't see any need to go with JSON in this case), but we can also create an issue and do it later.

True. I'll add a changelog, create a new issue + merge 👍🏻

wochinge · 2021-02-17T08:33:22Z

@alwx Not ready for review yet; I'd rather get your input before I start making bigger changes.

joejuzl · 2021-02-17T09:31:01Z

I wonder if this is the best fix. Maybe it would be safer going forward to just stop using multithreading? We could easily run into more issues like this in the future unless we ensure the thread safety of all the code we write and all the libraries we use! We could use multiprocessing instead (it should be a very easy change).

wochinge · 2021-02-17T10:12:26Z

@joejuzl Great suggestion! Unfortunately, this is not as easy as expected:

Pickle error for the local function run: that one was easy to fix.
Pickle error for the decorated function, as it's a local function as well (all part of create_app) 🤯
Pickle error for the request object.

That's the code I used:

```python
import asyncio
import concurrent.futures
import functools
from typing import Any, Callable, Coroutine

from sanic.request import Request
from sanic.response import HTTPResponse


def _run_in_process(
    f: Callable[..., Coroutine], request: Request, *args: Any, **kwargs: Any
) -> Any:
    # This runs in a separate worker process, so we need to create and set a new
    # event loop there
    process_loop = asyncio.new_event_loop()
    asyncio.set_event_loop(process_loop)

    try:
        return process_loop.run_until_complete(f(request, *args, **kwargs))
    finally:
        process_loop.close()


def run_in_thread(f: Callable[..., Coroutine]) -> Callable:
    """Decorator which runs a request in a separate process.

    Some requests (e.g. training or cross-validation) are computationally intensive.
    This means that they will block the event loop and hence the processing of other
    requests. This decorator can be used to process these requests in a separate
    process to avoid blocking the processing of incoming requests.

    Args:
        f: The request handler function which should be decorated.

    Returns:
        The decorated function.
    """

    @functools.wraps(f)
    async def decorated_function(
        request: Request, *args: Any, **kwargs: Any
    ) -> HTTPResponse:
        # `f`, `request` and all arguments must be picklable to reach the worker
        with concurrent.futures.ProcessPoolExecutor() as pool:
            return await request.app.loop.run_in_executor(
                pool, functools.partial(_run_in_process, f, request, *args, **kwargs)
            )

    return decorated_function
```
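The decorator above passes `f` and `request` to `ProcessPoolExecutor`, which has to pickle the callable and every argument to ship them to a worker process. A minimal sketch of that constraint, using hypothetical `create_app`, `FakeRequest`, and `is_picklable` names (not Sanic's or Rasa's actual objects): a function defined inside a factory cannot be pickled, and neither can an object holding unpicklable state such as a lock, while plain data can.

```python
import pickle
import threading
from typing import Any, Callable, Dict


def is_picklable(obj: Any) -> bool:
    """Return True if `obj` can be pickled, which ProcessPoolExecutor needs for
    the callable and every argument it sends to a worker process."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def create_app() -> Callable[[], str]:
    # Hypothetical: a handler defined inside a factory function is a "local
    # object", so pickle cannot look it up by module and qualified name.
    def train_handler() -> str:
        return "done"

    return train_handler


class FakeRequest:
    """Hypothetical stand-in for a web request. It holds a lock to mimic the
    connection state a real request object typically references."""

    def __init__(self, body: Dict[str, Any]) -> None:
        self.body = body
        self._lock = threading.Lock()
```

Under these assumptions, a process-based decorator would need a module-level handler plus plain data extracted from the request, rather than the local handler and the request object itself.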

Do you have any ideas? Otherwise I'd suggest going with the current solution for now. We can still iterate on it if we see threading errors elsewhere.

joejuzl · 2021-02-17T10:30:56Z

@wochinge ahaa - that's annoying 🤦
Then yeah, let's fix it like this and create an issue to follow up.

wochinge · 2021-02-17T10:40:26Z

It was a very nice bug 😄 And now I have more ideas and opinions about our YAML validation.

Follow-up issue: #7980

load schema files for pykwalify to avoid global yaml usage (commit b1fc73f)


wochinge requested a review from alwx February 17, 2021 08:32

alwx approved these changes Feb 17, 2021


add changelog (commit b7175d3)

wochinge marked this pull request as ready for review February 17, 2021 08:40

wochinge requested a review from a team February 17, 2021 08:40

wochinge added the status:ready-to-merge label Feb 17, 2021

wochinge mentioned this pull request Feb 17, 2021 in: Specify YAML schemas as Python objects #7977 (closed)

wochinge removed the status:ready-to-merge label Feb 17, 2021

wochinge merged commit 1fe10db into 2.3.x Feb 17, 2021

wochinge deleted the fast-schema-validation-fix branch February 17, 2021 10:34

wochinge mentioned this pull request Feb 17, 2021 in: replace run_in_thread decorator with run_in_process #7980 (closed)

wochinge self-assigned this Feb 17, 2021

erohmensing mentioned this pull request Feb 22, 2021 in: 2.3.0 dramatically increases time to process training data files #8012 (closed)
