-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
load schema files for pykwalify
to avoid global yaml
usage
#7970
Conversation
# Load schema content using our YAML loader as `pykwalify` uses a global instance | ||
# which can fail when used concurrently | ||
schema_content = rasa.shared.utils.io.read_yaml_file(schema_file) | ||
schema_utils_content = rasa.shared.utils.io.read_yaml_file(schema_utils_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively we can convert all schema files to json
as this would achieve the same effect and might be better re performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even better: define the schemas as Python objects - this saves us reading things entirely. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this is related to my insights in #7892 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might make sense to define schemas as Python objects (don't see any need to go with JSON in this case) — but we can also create an issue and do it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. I'll add a changelog, create a new issue + merge 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alwx Not ready for review yet, I rather want your input before I start making bigger changes |
I wonder if this is the best fix? Maybe it would be safer going forward to just stop using multithreading? We could easily run into more issues like this in the future unless we ensure the threadsafety of all the code we write, and all the libraries we use! We could use multiprocessing instead (should be a very easy change). |
@joejuzl Great suggestion! Unfortunately this is not as easy expected.
That's the code I used: def _run_in_process(f, request: Request, *args: Any, **kwargs: Any) -> None:
# This is a new thread, so we need to create and set a new event loop
thread_loop = asyncio.new_event_loop()
asyncio.set_event_loop(thread_loop)
try:
return thread_loop.run_until_complete(f(request, *args, **kwargs))
finally:
thread_loop.close()
def run_in_thread(f: Callable[..., Coroutine]) -> Callable:
"""Decorator which runs request on a separate thread.
Some requests (e.g. training or cross-validation) are computional intense requests.
This means that they will block the event loop and hence the processing of other
requests. This decorator can be used to process these requests on a separate thread
to avoid blocking the processing of incoming requests.
Args:
f: The request handler function which should be decorated.
Returns:
The decorated function.
"""
@wraps(f)
async def decorated_function(
request: Request, *args: Any, **kwargs: Any
) -> HTTPResponse:
with concurrent.futures.ProcessPoolExecutor() as pool:
return await request.app.loop.run_in_executor(
pool, functools.partial(_run_in_process, f, request, *args, **kwargs)
)
return decorated_function Do you have an idea? Otherwise I'd suggest to go with the current solution for now. We can still iterate on the solution in case we see threading errors elsewhere |
@wochinge ahaa - that's annoying 🤦 |
it was a very nice bug 😄 And now I have more ideas / opinions about our YAML validation. Follow up issue #7980 |
Proposed changes:
pykwalify
uses a globalruamel.yaml
instance. This instance sets object attributes while reading files (e.g. the YAML schema files itself!) . This can lead to problem when we have multiple concurrent yaml validations. Fix: Either load the yaml files for pykwalify (what I did in my PR) or provide the schema files in jsonStatus (please check what you already did):
black
(please check Readme for instructions)