ASoC: SOF: ipc4: change trigger order for chain DMA #4798
The host DMA (controlled by BE ops) must be stopped before sending PAUSE/STOP IPC (sent from FE ops) to chain DMA. Unless this is done, the DMA stop flow does not follow the programming sequence and the DMA engine may get stuck in the busy state.
Link: thesofproject/sof#8792
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
For Intel folks, I scheduled test job #37444 that tests this kernel PR against a SOF built with a newer Zephyr (that raises an error if GBUSY is stuck in the DMA).
UPDATE: the test still fails. @RanderWang I need your help to understand the original problem sequence in thesofproject/sof#8686. After your patch, TGL now fails in every PR plan and we need to figure out a way to unblock integration. Reverting your patch is one option, but I guess we'd then get #8686 back. Plus, a DMA stuck in GBUSY after stop just seems wrong, so we need to fix this as well (but this might take more time, so we can't hold off integration while we debug).
@kv2019i the problem sequence is simple. In the pipeline stop case, the host DMA is stopped by two paths in the following SOF FW code. If the time interval between the two stops is very short (a rare case), the bug happens.

int ipc4_pipeline_trigger(struct ipc_comp_dev *ppl_icd, uint32_t cmd, bool *delayed)
{
    .....
    /* trigger the component */
    ret = pipeline_trigger(host->cd->pipeline, host->cd, cmd); // first path: stop host DMA, cmd = COMP_TRIGGER_STOP
    if (ret < 0) {
        .........
    } else if (cmd == COMP_TRIGGER_STOP) {
        /*
         * reset the pipeline components if STOP trigger is executed in
         * the same thread.
         * Otherwise, the pipeline will be reset after the STOP trigger
         * has finished executing in the pipeline task.
         */
        ret = pipeline_reset(host->cd->pipeline, host->cd); // second path: stop host DMA
        if (ret < 0)
            ret = IPC4_INVALID_REQUEST;
    }
}

My patch was aligned with the REF FW. As you mentioned, the bug only happens on CAVS 2.5 platforms. I checked the REF FW again and found that the HDA DMA code in the current REF FW is totally different from the old CAVS FW code (from 2 years ago). There was a complete rework of HDA DMA. The wait for host DMA exists only in the current FW code (HW change?), not in the old CAVS FW code. I don't know whether the current REF FW code can support CAVS platforms. It is very likely that we need to disable the wait for CAVS platforms. One problem is that the HDA DMA driver in Zephyr is shared by all Intel platforms: how do we identify HW platforms in the HDA DMA code?
Hmm, but even if we have racy calls to DMA stop, intel_adsp_hda_dma_stop() (Zephyr HD-DMA driver) will first check intel_adsp_hda_is_enabled() and that should cover the race, right? I don't understand how we get past that and end up with imbalanced PM:
GEN and FIFORDY are cleared as the first thing in the stop function, so even if we get another stop immediately, the function should behave correctly. So is the "if (!intel_adsp_hda_is_enabled())" check broken? And if yes, can we fix that to protect against racing stops?
Sure, the problem is that intel_adsp_hda_is_enabled() returns true for the second stop, so the DMA is freed again. The current wait is there to fix the second intel_adsp_hda_is_enabled() check.
This PR is based on bad assumptions... and does not work. Closing.
@RanderWang wrote:
Sorry, I still don't understand. intel_adsp_hda_is_enabled() checks the "DGCS_GEN | DGCS_FIFORDY" bits at function entry. If they are set, the driver then clears these bits. There is no link to the state of the GBUSY bit. I've checked multiple cases where GBUSY is stuck and GEN/FIFORDY remain cleared (this happens with the chain-dma flow). So I cannot see how a second stop (even an immediate one) could see intel_adsp_hda_is_enabled() return true unless we have two calls in parallel, but that should not be happening. I'll try to make an alternative fix, but given how chain-dma works, returning an error on GBUSY is not OK.
@kv2019i This bug is hard to reproduce. I saved the mtrace log, which is 130MB+; the failure happened after 10911 seconds. Thanks for your hard work! In the log, "----host put" marks an attempt to stop the DMA, printed before intel_adsp_hda_is_enabled() is checked, and "host put 4000020" marks a completed DMA stop, printed after intel_adsp_hda_is_enabled() returned true. For two successive dma_stop ops there should be only one "host put 4000020", because intel_adsp_hda_is_enabled() returns false for the second stop, so the second "----host put" should have no matching "host put 4000020". You can see that the first two DMA stops for pipe16 dropped one PM ref too many, so the error surfaced at the last host DMA stop for pipeline 14.
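The check described above (two successive stop attempts should yield only one completed stop) can be sketched as a small counter over log lines. The two marker strings are the ones quoted in the comment. Everything else is an assumption for illustration: the function and struct names are hypothetical, and the real mtrace line layout (timestamps, prefixes) is not shown in this thread, so the sketch matches the markers as substrings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper types, not part of SOF or Zephyr. */
struct stop_counts {
    int attempts;     /* lines containing "----host put" */
    int completions;  /* other lines containing "host put " */
};

/* Count stop attempts and completed stops in an array of log lines. */
static struct stop_counts count_host_put(const char *const *lines, size_t n)
{
    struct stop_counts out = {0, 0};

    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Test the attempt marker first, since it also contains "host put". */
        if (strstr(lines[i], "----host put") != NULL)
            out.attempts++;
        else if (strstr(lines[i], "host put ") != NULL)
            out.completions++;
    }
    return out;
}
```

Fed the lines for one pair of successive stops, the expected result per the comment above is two attempts and one completion. Two completions for the same pair would indicate the extra PM ref drop.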