Addressing Limitations in Python Function Execution Model with Asyncio Event Loop in Azure Functions #1574

Open
YunchuWang opened this issue Sep 6, 2024 · 0 comments

YunchuWang commented Sep 6, 2024

Description:

We have identified foundational limitations in the Python function execution model of the Azure Functions Python Worker that have affected multiple customers running durable Python functions. The current model runs all asynchronous function calls on a single asyncio event loop, which leads to the following constraints:

Execution Order of Invocations:

The order in which invocations are executed is determined by the scheduling logic of the asyncio event loop, so it may not match the order in which they were received. Currently, we cannot guarantee a first-come, first-served execution model.
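
One way this can surface is sketched below, using plain asyncio rather than the worker's own code. The names `invocation`, `run_concurrently`, `run_fifo`, and `JOBS` are hypothetical, and the sleeps stand in for awaited I/O. The first runner shares the loop between all invocations, so they finish in an order unrelated to arrival; the second serializes them through an `asyncio.Queue`, which preserves arrival order but gives up concurrency.

```python
import asyncio


async def invocation(name: str, delay: float, log: list) -> None:
    # Hypothetical stand-in for a user function; the sleep models awaited I/O.
    await asyncio.sleep(delay)
    log.append(name)


async def run_concurrently(jobs: list) -> list:
    # All invocations share one loop, so they complete in whatever order
    # their awaits resolve, not in the order they arrived.
    log: list = []
    await asyncio.gather(*(invocation(n, d, log) for n, d in jobs))
    return log


async def run_fifo(jobs: list) -> list:
    # A single consumer draining a queue runs one invocation at a time,
    # strictly in arrival order, at the cost of concurrency.
    log: list = []
    queue: asyncio.Queue = asyncio.Queue()
    for job in jobs:
        queue.put_nowait(job)
    while not queue.empty():
        name, delay = queue.get_nowait()
        await invocation(name, delay, log)
    return log


# Arrival order is first, second, third; the earliest arrival waits longest.
JOBS = [("first", 0.06), ("second", 0.04), ("third", 0.02)]
concurrent_order = asyncio.run(run_concurrently(JOBS))
fifo_order = asyncio.run(run_fifo(JOBS))
```

Full serialization is only one possible trade-off; it is shown here to make the ordering difference visible, not as a proposed design.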

Invocation Timeout Tracking:

There is no mechanism for tracking the real status of an invocation when it times out. The timeout could come from a genuinely long-running invocation, or from the worker not picking up the invocation for an extended period (e.g., 15 minutes) because the event loop is busy with other tasks. The loop may also become stuck on one or more "bad" async calls (i.e., calls declared as async that perform blocking operations). The Function Host sends the request to the worker and begins measuring the timeout, but there is currently no explicit acknowledgment, status check, or fast-fail mechanism for these bad async calls.
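
To illustrate the difference between "long-running" and "never picked up", here is a minimal sketch in plain asyncio. It is not the worker's implementation: `bad_async_call`, `tracked`, and the `stats` dictionary are hypothetical names. A coroutine that blocks with `time.sleep` holds the loop, and a second task records how long it waited between being received and actually starting.

```python
import asyncio
import time


async def bad_async_call() -> None:
    # Declared async but blocks the thread, so the loop cannot run anything else.
    time.sleep(0.2)


async def tracked(name: str, received_at: float, stats: dict) -> None:
    # This line runs only once the loop actually picks the task up, so the
    # difference is pick-up latency rather than execution time.
    stats[name] = time.monotonic() - received_at


async def main() -> dict:
    stats: dict = {}
    blocker = asyncio.create_task(bad_async_call())
    victim = asyncio.create_task(tracked("victim", time.monotonic(), stats))
    await asyncio.gather(blocker, victim)
    return stats


pickup_delay = asyncio.run(main())
```

Recording a pick-up timestamp like this is one way an acknowledgment could be derived. Separately, asyncio's debug mode logs callbacks that run longer than `loop.slow_callback_duration`, which can help locate blocking calls during development.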

Monitoring Event Loop Status:

There is currently no platform support for monitoring the asyncio event loop's running status in real time, or for taking snapshots to diagnose potential issues.

Reference to the relevant code: https://github.com/Azure/azure-functions-python-worker/blob/b734c57b3b81b3cad2f84951ee79c3a493504e32/azure_functions_worker/dispatcher.py#L659C13-L669C62
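
As a rough idea of what loop monitoring could look like, the sketch below uses only standard asyncio calls. It is an assumption for illustration, not an existing platform feature: `measure_loop_lag`, `snapshot_tasks`, `blocking_work`, and the task names are hypothetical. A heartbeat task measures how late its sleeps wake up, and `asyncio.all_tasks()` gives a snapshot of unfinished tasks.

```python
import asyncio
import time


async def measure_loop_lag(interval: float, samples: int) -> list:
    # Sleep for `interval` and record how much later than requested we woke up.
    lags = []
    for _ in range(samples):
        start = time.monotonic()
        await asyncio.sleep(interval)
        lags.append(max(0.0, time.monotonic() - start - interval))
    return lags


def snapshot_tasks() -> list:
    # Names of all unfinished tasks on the currently running loop.
    return sorted(task.get_name() for task in asyncio.all_tasks())


async def blocking_work() -> None:
    # Hypothetical "bad" invocation: yields once, then blocks the loop.
    await asyncio.sleep(0.01)
    time.sleep(0.2)


async def main():
    monitor = asyncio.create_task(measure_loop_lag(0.05, 3), name="loop-monitor")
    worker = asyncio.create_task(blocking_work(), name="invocation-1")
    await asyncio.sleep(0)  # let both tasks start before taking the snapshot
    names = snapshot_tasks()
    lags = await monitor
    await worker
    return names, lags


task_names, lags = asyncio.run(main())
```

A large lag sample suggests the loop was held by blocking work at that moment. A heartbeat that runs on the same loop cannot report while the loop is stuck, so a real monitor would likely need to observe from another thread.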

These challenges create significant difficulties for customers running both durable and non-durable Python functions, and we are looking for enhancements that address them.

++ @davidmrdavid @andystaples @vrdmr @gavin-aguiar @hallvictoria @fabiocav

Expected Behavior

No response

Relevant sample code snippet

No response

Additional Information

No response
