Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{Core} aaz: Fix importlib deadlocks while loading requests.models in multi threads #26287

Merged
merged 4 commits into from
May 9, 2023

Conversation

kairu-ms
Copy link
Contributor

@kairu-ms kairu-ms commented Apr 28, 2023

closed: #26272
closed: #26297

Related command

Description

In python 3.10.10, we encountered import deadlocks in aaz LRO which try to load the requests.models in two different threads. It's a known issue psf/requests#2925

In this PR, we provide a workaround by importing these modules in the main thread before spawning child threads.

Testing Guide

History Notes

[Component Name 1] BREAKING CHANGE: az command a: Make some customer-facing breaking change
[Component Name 2] az command b: Add some customer-facing feature


This checklist is used to make sure that common guidelines for a pull request are followed.

@azure-client-tools-bot-prd
Copy link

azure-client-tools-bot-prd bot commented Apr 28, 2023

️✔️AzureCLI-FullTest
️✔️acr
️✔️2020-09-01-hybrid
️✔️3.10
️✔️3.9
️✔️latest
️✔️3.10
️✔️3.9
️✔️acs
️✔️2020-09-01-hybrid
️✔️3.10
️✔️3.9
️✔️latest
️✔️3.10
️✔️3.9
️✔️advisor
️✔️latest
️✔️3.10
️✔️3.9
️✔️ams
️✔️latest
️✔️3.10
️✔️3.9
️✔️apim
️✔️latest
️✔️3.10
️✔️3.9
️✔️appconfig
️✔️latest
️✔️3.10
️✔️3.9
️✔️appservice
️✔️latest
️✔️3.10
️✔️3.9
️✔️aro
️✔️latest
️✔️3.10
️✔️3.9
️✔️backup
️✔️latest
️✔️3.10
️✔️3.9
️✔️batch
️✔️latest
️✔️3.10
️✔️3.9
️✔️batchai
️✔️latest
️✔️3.10
️✔️3.9
️✔️billing
️✔️latest
️✔️3.10
️✔️3.9
️✔️botservice
️✔️latest
️✔️3.10
️✔️3.9
️✔️cdn
️✔️latest
️✔️3.10
️✔️3.9
️✔️cloud
️✔️latest
️✔️3.10
️✔️3.9
️✔️cognitiveservices
️✔️latest
️✔️3.10
️✔️3.9
️✔️config
️✔️latest
️✔️3.10
️✔️3.9
️✔️configure
️✔️latest
️✔️3.10
️✔️3.9
️✔️consumption
️✔️latest
️✔️3.10
️✔️3.9
️✔️container
️✔️latest
️✔️3.10
️✔️3.9
️✔️core
️✔️2018-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2019-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2020-09-01-hybrid
️✔️3.10
️✔️3.9
️✔️latest
️✔️3.10
️✔️3.9
️✔️cosmosdb
️✔️latest
️✔️3.10
️✔️3.9
️✔️databoxedge
️✔️2019-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2020-09-01-hybrid
️✔️3.10
️✔️3.9
️✔️latest
️✔️3.10
️✔️3.9
️✔️dla
️✔️latest
️✔️3.10
️✔️3.9
️✔️dls
️✔️latest
️✔️3.10
️✔️3.9
️✔️dms
️✔️latest
️✔️3.10
️✔️3.9
️✔️eventgrid
️✔️latest
️✔️3.10
️✔️3.9
️✔️eventhubs
️✔️latest
️✔️3.10
️✔️3.9
️✔️feedback
️✔️latest
️✔️3.10
️✔️3.9
️✔️find
️✔️latest
️✔️3.10
️✔️3.9
️✔️hdinsight
️✔️latest
️✔️3.10
️✔️3.9
️✔️identity
️✔️latest
️✔️3.10
️✔️3.9
️✔️iot
️✔️2019-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2020-09-01-hybrid
️✔️3.10
️✔️3.9
️✔️latest
️✔️3.10
️✔️3.9
️✔️keyvault
️✔️2018-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2020-09-01-hybrid
️✔️3.10
️✔️3.9
️✔️latest
️✔️3.10
️✔️3.9
️✔️kusto
️✔️latest
️✔️3.10
️✔️3.9
️✔️lab
️✔️latest
️✔️3.10
️✔️3.9
️✔️managedservices
️✔️latest
️✔️3.10
️✔️3.9
️✔️maps
️✔️latest
️✔️3.10
️✔️3.9
️✔️marketplaceordering
️✔️latest
️✔️3.10
️✔️3.9
️✔️monitor
️✔️latest
️✔️3.10
️✔️3.9
️✔️netappfiles
️✔️latest
️✔️3.10
️✔️3.9
️✔️network
️✔️2018-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️latest
️✔️3.10
️✔️3.9
️✔️policyinsights
️✔️latest
️✔️3.10
️✔️3.9
️✔️privatedns
️✔️latest
️✔️3.10
️✔️3.9
️✔️profile
️✔️latest
️✔️3.10
️✔️3.9
️✔️rdbms
️✔️latest
️✔️3.10
️✔️3.9
️✔️redis
️✔️latest
️✔️3.10
️✔️3.9
️✔️relay
️✔️latest
️✔️3.10
️✔️3.9
️✔️resource
️✔️2018-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2019-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️latest
️✔️3.10
️✔️3.9
️✔️role
️✔️latest
️✔️3.10
️✔️3.9
️✔️search
️✔️latest
️✔️3.10
️✔️3.9
️✔️security
️✔️latest
️✔️3.10
️✔️3.9
️✔️servicebus
️✔️latest
️✔️3.10
️✔️3.9
️✔️serviceconnector
️✔️latest
️✔️3.10
️✔️3.9
️✔️servicefabric
️✔️latest
️✔️3.10
️✔️3.9
️✔️signalr
️✔️latest
️✔️3.10
️✔️3.9
️✔️sql
️✔️latest
️✔️3.10
️✔️3.9
️✔️sqlvm
️✔️latest
️✔️3.10
️✔️3.9
️✔️storage
️✔️2018-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2019-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2020-09-01-hybrid
️✔️3.10
️✔️3.9
️✔️latest
️✔️3.10
️✔️3.9
️✔️synapse
️✔️latest
️✔️3.10
️✔️3.9
️✔️telemetry
️✔️2018-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2019-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2020-09-01-hybrid
️✔️3.10
️✔️3.9
️✔️latest
️✔️3.10
️✔️3.9
️✔️util
️✔️latest
️✔️3.10
️✔️3.9
️✔️vm
️✔️2018-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2019-03-01-hybrid
️✔️3.10
️✔️3.9
️✔️2020-09-01-hybrid
️✔️3.10
️✔️3.9
️✔️latest
️✔️3.10
️✔️3.9

@ghost ghost added Auto-Assign Auto assign by bot Core CLI core infrastructure labels Apr 28, 2023
@ghost ghost requested a review from jiasli April 28, 2023 03:03
@yonzhan
Copy link
Collaborator

yonzhan commented Apr 28, 2023

Core

@ghost ghost requested a review from yonzhan April 28, 2023 03:03
@ghost ghost assigned jiasli Apr 28, 2023
@@ -12,6 +12,10 @@
from azure.core.polling.base_polling import LROBasePolling
from azure.core.tracing.common import with_current_context
from azure.core.tracing.decorator import distributed_trace
# import LROPoller in main thread to resolve import deadlock between threads in python 3.10.10
# reference https://github.com/psf/requests/issues/2925 and https://github.com/Azure/azure-cli/issues/26272
from msrest.polling.poller import LROPoller # pylint: disable=unused-import
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut feeling tells me this may not be safe enough, as there may be other dependencies imported by the sub-thread and main thread at the same time.

The SDK implementation is safer because the first PUT operation is performed on the main thread, so subsequent polling GET operations on the sub-thread will have the full dependency tree cached already.

Besides LROPoller, there are also AzureOperationPoller and AzureCoreLROPoller imported from msrestazure.azure_operation and azure.core.polling respectively:

def _is_poller(obj):
# Since loading msrest is expensive, we avoid it until we have to
if obj.__class__.__name__ in ['AzureOperationPoller', 'LROPoller', 'AAZLROPoller']:
return isinstance(obj, poller_classes())
return False

def poller_classes():
from msrestazure.azure_operation import AzureOperationPoller
from msrest.polling.poller import LROPoller
from azure.core.polling import LROPoller as AzureCoreLROPoller
from azure.cli.core.aaz._poller import AAZLROPoller
return (AzureOperationPoller, LROPoller, AzureCoreLROPoller, AAZLROPoller)

Though I haven't got time to look into their actual usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is enough to mitigate the issue. Moving the first request in the main thread will involve a huge change.
We cannot resolve them all because it's python issue python/cpython#91351.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given #26297, do not use command index caching can walk around it. @jiasli

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When loading other modules, some of them will import requests. That's why it can mitigate the issue.

@kairu-ms kairu-ms assigned kairu-ms and unassigned jiasli May 4, 2023
@kairu-ms kairu-ms merged commit a5ef29a into Azure:dev May 9, 2023
yash-nisar pushed a commit to yash-nisar/azure-cli that referenced this pull request May 16, 2023
… in multi threads (Azure#26287)

* import LROPoller in main thread to resolve importlib deadlock in python 3.10.10

* use 'import requests'

* fix style issue

* update comments
avgale pushed a commit to avgale/azure-cli that referenced this pull request Aug 24, 2023
… in multi threads (Azure#26287)

* import LROPoller in main thread to resolve importlib deadlock in python 3.10.10

* use 'import requests'

* fix style issue

* update comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Auto-Assign Auto assign by bot Core CLI core infrastructure
Projects
None yet
5 participants