iohub register race condition fix #227

Merged
merged 2 commits into from
Aug 27, 2024

Conversation

mamin506
Contributor

@mamin506 mamin506 commented Aug 27, 2024

  1. Add mailbox_res_record to mailbox_channel. The res record could also collect statistics to help with debugging and analysis, so this is a good move.

  2. Move mailbox_rx_worker before mailbox_irq_handler for better readability.

  3. In mailbox_irq_handler(), we are aware that clearing iohub can race with the FW setting iohub. When that happens, the iohub register is not able to trigger an MSI-X interrupt, which leads to the application hanging.
    The idea is to fix this on the host. In mailbox_irq_handler(), after clearing iohub and launching the worker, the handler keeps reading iohub, up to 4 times.
    If all of these reads are 0, there was no race during this period, and the handler can exit safely.
    If any of the reads is 1, the FW wants to trigger an interrupt. The handler clears iohub again and enqueues another work item. This is not a perfect solution in theory, but because the handler runs very fast and the FW is slower to trigger the next interrupt, this change looks very promising.
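
The re-check described in item 3 can be sketched as a small userspace model. This is only an illustration, not the driver code: `struct fake_mbox`, `iohub_read()`, `iohub_clear()`, `queue_rx_work()` and `reads_until_fw_set` are hypothetical stand-ins for the real register access and workqueue calls, and the sketch assumes the loop simply continues with its remaining reads after a re-clear.

```c
#include <assert.h>
#include <stdint.h>

#define IOHUB_RECHECK_MAX 4 /* "up to 4 times", per the description above */

/* Hypothetical userspace model of one mailbox channel; not the driver's struct. */
struct fake_mbox {
    uint32_t iohub;              /* simulated iohub interrupt register */
    unsigned reads_until_fw_set; /* FW sets iohub just before the Nth read; 0 = never */
    unsigned work_queued;        /* number of times rx work was enqueued */
};

static void iohub_clear(struct fake_mbox *mb) { mb->iohub = 0; }
static void queue_rx_work(struct fake_mbox *mb) { mb->work_queued++; }

/* Read the register; the simulated FW may set it right before the read. */
static uint32_t iohub_read(struct fake_mbox *mb)
{
    if (mb->reads_until_fw_set > 0 && --mb->reads_until_fw_set == 0)
        mb->iohub = 1;
    return mb->iohub;
}

/* Sketch of the handler logic: clear, enqueue, then re-check for a racing FW write. */
static void mailbox_irq_handler_sketch(struct fake_mbox *mb)
{
    int i;

    iohub_clear(mb);
    queue_rx_work(mb);

    for (i = 0; i < IOHUB_RECHECK_MAX; i++) {
        if (iohub_read(mb) == 0)
            continue; /* no FW write observed on this read */

        /* FW set iohub after our clear; no new interrupt is expected
         * for it, so handle it here. */
        iohub_clear(mb);
        queue_rx_work(mb);
    }
}

/* Exercises a quiet case and a case where the FW write lands on the 2nd re-read. */
int sketch_demo(void)
{
    struct fake_mbox quiet = { 0, 0, 0 };
    struct fake_mbox racy = { 0, 2, 0 };

    mailbox_irq_handler_sketch(&quiet);
    assert(quiet.work_queued == 1);

    mailbox_irq_handler_sketch(&racy);
    assert(racy.work_queued == 2 && racy.iohub == 0);
    return 0;
}
```

In this model a FW write that lands after the last re-read would still be missed, which matches the note above that the approach is not perfect in theory.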

In my stress test, with TDR disabled in the driver, it ran overnight without issue. Without this change, my test hangs in less than half an hour.

Signed-off-by: Min Ma <min.ma@amd.com>
@mamin506 mamin506 requested a review from maxzhen August 27, 2024 17:51
…t is obvious

Signed-off-by: Min Ma <min.ma@amd.com>
@mamin506 mamin506 merged commit 4bb5966 into amd:main Aug 27, 2024
@mamin506 mamin506 deleted the mbox_iohub branch August 27, 2024 22:40