Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i2c_master: hangs when SCL is randomly pulled low during communication (IDFGH-12655) #13647

Closed
KJ7LNW opened this issue Apr 19, 2024 · 7 comments
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally

Comments

@KJ7LNW
Copy link
Contributor

KJ7LNW commented Apr 19, 2024

Hi @mythbuster5,

Thank you for your recent commits that help with I2C stability. At this point the only way that I can lock up/crash the chip (in a way that it cannot do error handling) is by randomly grounding the SCL line by hand with a pushbutton.

Eventually I can get it to hang with this WDT warning/trace:

E (60864) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (60864) task_wdt:  - IDLE (CPU 0)
E (60864) task_wdt: Tasks currently running:
E (60864) task_wdt: CPU 0: i2c_task
E (60864) task_wdt: Print CPU 0 (current core) backtrace
Print CPU 0 (current core) registers
Core  0 register dump:
MEPC    : 0x4200fb30  RA      : 0x4200fb5a  SP      : 0x40812800  GP      : 0x4080b8a4  
0x4200fb30: i2c_ll_master_clr_bus at /home/ewheeler/src/esp32/esp-idf/components/hal/esp32c6/include/hal/i2c_ll.h:767 (discriminator 1)
 (inlined by) s_i2c_master_clear_bus at /home/ewheeler/src/esp32/esp-idf/components/esp_driver_i2c/i2c_master.c:77 (discriminator 1)
0x4200fb5a: s_i2c_hw_fsm_reset at /home/ewheeler/src/esp32/esp-idf/components/esp_driver_i2c/i2c_master.c:115

TP      : 0x40812910  T0      : 0x40028192  T1      : 0x80000000  T2      : 0x00000000  
0x40028192: __retarget_lock_release_recursive in ROM

S0/FP   : 0x40812a6c  S1      : 0x00000103  A0      : 0x40811788  A1      : 0x40812428  
A2      : 0x00000000  A3      : 0x00000800  A4      : 0x60004000  A5      : 0x00000013  
A6      : 0x00000001  A7      : 0x0000000a  S2      : 0x00000006  S3      : 0x40812830  
S4      : 0x408128b7  S5      : 0x00000001  S6      : 0x00000000  S7      : 0x00000000  
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000  
T3      : 0x00000000  T4      : 0x4080d808  T5      : 0x00001800  T6      : 0x00000000  
MSTATUS : 0x4201edbc  MTVEC   : 0x420229b8  MCAUSE  : 0x00000366  MTVAL   : 0x42010c8a  
0x42010c8a: s_i2c_synchronous_transaction at /home/ewheeler/src/esp32/esp-idf/components/esp_driver_i2c/i2c_master.c:877

Which causes it to spin here:

while (hw->scl_sp_conf.scl_rst_slv_en);

Note that pushing the reset button always "fixes" it and communication resumes after the reset.

Question

Based on the comment at the top of the function, perhaps nothing else that can be done to prevent this... but before I acquiesce to this limitation, I have a question:

Is there a way to know if the 9x SCL pulses been sent by the hardware have completed, so that it can stop spinning and return control to the program, even if the bus really is locked up?

For example, can we just read the GPIO line level and return from the spinning while loop if the line is high?

Here is the entire function for your reference:

/**
* @brief Clear I2C bus, when the slave is stuck in a deadlock and keeps pulling the bus low,
* master can controls the SCL bus to generate 9 CLKs.
*
* Note: The master cannot detect if deadlock happens, but when the scl_st_to interrupt is generated, a deadlock may occur.
*
* @param hw Beginning address of the peripheral registers
* @param slave_pulses When I2C master is IDLE, the number of pulses will be sent out.
*
* @return None
*/
static inline void i2c_ll_master_clr_bus(i2c_dev_t *hw, uint32_t slave_pulses)
{
hw->scl_sp_conf.scl_rst_slv_num = slave_pulses;
hw->scl_sp_conf.scl_rst_slv_en = 1;
hw->ctr.conf_upgate = 1;
// hardware will clear scl_rst_slv_en after sending SCL pulses,
// and we should set conf_upgate bit to synchronize register value.
while (hw->scl_sp_conf.scl_rst_slv_en);
hw->ctr.conf_upgate = 1;
}

@espressif-bot espressif-bot added the Status: Opened Issue is new label Apr 19, 2024
@github-actions github-actions bot changed the title i2c_master: hangs when SCL is randomly pulled low during communication i2c_master: hangs when SCL is randomly pulled low during communication (IDFGH-12655) Apr 19, 2024
@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Apr 20, 2024

As a proof of concept, this is one way to prevent the hang, by waiting only while the SCL line is low.

To prevent the spinning on hw->scl_sp_conf.scl_rst_slv_en I had to modify a chip specific header (esp32c6 in my case) and move while (hw->scl_sp_conf.scl_rst_slv_en) outside the low level function. Thus, adding something like a nowait parameter to i2c_ll_master_clr_bus() across chip implementations could be a solution.

With the diff below I can no longer hang the bus. You know the code best, so maybe there is an even better way to solve this:

diff --git a/components/esp_driver_i2c/i2c_master.c b/components/esp_driver_i2c/i2c_master.c
index 4ee7c2b409..ef56bb01c5 100644
--- a/components/esp_driver_i2c/i2c_master.c
+++ b/components/esp_driver_i2c/i2c_master.c
@@ -75,6 +75,15 @@ static esp_err_t s_i2c_master_clear_bus(i2c_bus_handle_t handle)
 #else
     i2c_hal_context_t *hal = &handle->hal;
     i2c_ll_master_clr_bus(hal->dev, I2C_LL_RESET_SLV_SCL_PULSE_NUM_DEFAULT);
+
+    // It would be a good idea to add timout logic here and return an error,
+    // so the program can proceed without hanging if SCL is stuck low:
+    i2c_dev_t *hw = hal->dev;
+    while (hw->scl_sp_conf.scl_rst_slv_en && !gpio_get_level(handle->scl_num));
+    hw->ctr.conf_upgate = 1;
+
+    // Log if it would have hung, this is only for debug:
+    if (hw->scl_sp_conf.scl_rst_slv_en)
+        ESP_LOGE(TAG, "********************* I2C transaction would have hung!");
+
 #endif
     return ESP_OK;
 }
diff --git a/components/hal/esp32c6/include/hal/i2c_ll.h b/components/hal/esp32c6/include/hal/i2c_ll.h
index 5d0ce895d3..340c4f8496 100644
--- a/components/hal/esp32c6/include/hal/i2c_ll.h
+++ b/components/hal/esp32c6/include/hal/i2c_ll.h
@@ -764,8 +764,8 @@ static inline void i2c_ll_master_clr_bus(i2c_dev_t *hw, uint32_t slave_pulses)
     hw->ctr.conf_upgate = 1;
     // hardware will clear scl_rst_slv_en after sending SCL pulses,
     // and we should set conf_upgate bit to synchronize register value.
-    while (hw->scl_sp_conf.scl_rst_slv_en);
-    hw->ctr.conf_upgate = 1;
 }
 
 /**

@mythbuster5
Copy link
Collaborator

This is a good question. Yes. Usually I2C fsm got something wrong, we reset the bus, that is send 9 SCL pulses. However, during sending SCL pulses, there is another disturbance work on I2C bus, I2C hardware will completely confused.

So, in this part, I will have a talk with our hardware design colleagues to see if there is any better solution.

@AxelLin
Copy link
Contributor

AxelLin commented May 10, 2024

Seems I also hit the issue with v5.2.1-443-g2ba5320112 and v5.3-dev-3675-ge486f3b944.

E (8324) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (8324) task_wdt:  - IDLE (CPU 0)
E (8324) task_wdt: Tasks currently running:
E (8324) task_wdt: CPU 0: main
E (8324) task_wdt: Aborting.


Stack dump detected
Core  0 register dump:
MEPC    : 0x4204d196  RA      : 0x4204e4c0  SP      : 0x4082ce80  GP      : 0x4081a974
0x4204d196: i2c_ll_master_clr_bus at /home/axel/esp/esp-idf/components/hal/esp32c6/include/hal/i2c_ll.h:732 (discriminator 1)
 (inlined by) i2c_master_clear_bus at /home/axel/esp/esp-idf/components/driver/i2c/i2c.c:635 (discriminator 1)
 (inlined by) i2c_hw_fsm_reset at /home/axel/esp/esp-idf/components/driver/i2c/i2c.c:667 (discriminator 1)
0x4204e4c0: i2c_master_cmd_begin at /home/axel/esp/esp-idf/components/driver/i2c/i2c.c:1549

TP      : 0x407e9bf0  T0      : 0x00000000  T1      : 0x40826000  T2      : 0xffffffff
S0/FP   : 0x40836dd8  S1      : 0x00000000  A0      : 0x00000000  A1      : 0x00000000
A2      : 0x4081e464  A3      : 0x00000800  A4      : 0x00000013  A5      : 0x60004000
A6      : 0x00000004  A7      : 0x00000014  S2      : 0x00000001  S3      : 0x000003e8
S4      : 0x000003e8  S5      : 0x00000622  S6      : 0x00000a0a  S7      : 0x00000007
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000
T3      : 0x40826000  T4      : 0x00000001  T5      : 0x40826000  T6      : 0x00000001
MSTATUS : 0x00000000  MTVEC   : 0x40836cac  MCAUSE  : 0x00000000  MTVAL   : 0x00000000
MHARTID : 0x00000006  


Backtrace:


0x4204d196 in i2c_ll_master_clr_bus (slave_pulses=9, hw=0x60004000) at /home/axel/esp/esp-idf/components/hal/esp32c6/include/hal/i2c_ll.h:732
732         while (hw->scl_sp_conf.scl_rst_slv_en);
#0  0x4204d196 in i2c_ll_master_clr_bus (slave_pulses=9, hw=0x60004000) at /home/axel/esp/esp-idf/components/hal/esp32c6/include/hal/i2c_ll.h:732
#1  i2c_master_clear_bus (i2c_num=I2C_NUM_0) at /home/axel/esp/esp-idf/components/driver/i2c/i2c.c:635
#2  i2c_hw_fsm_reset (i2c_num=i2c_num@entry=I2C_NUM_0) at /home/axel/esp/esp-idf/components/driver/i2c/i2c.c:667
#3  0x4204e4c0 in i2c_master_cmd_begin (i2c_num=I2C_NUM_0, cmd_handle=cmd_handle@entry=0x40836fc0, ticks_to_wait=ticks_to_wait@entry=1000) at /home/axel/esp/esp-idf/components/driver/i2c/i2c.c:1548
#4  0x4201c246 in i2c_dev_probe (dev=dev@entry=0x40836cac, operation_type=operation_type@entry=I2C_DEV_WRITE) at /home/axel/esp/esp-idf-dev/esp-idf-lib/components/i2cdev/i2cdev.c:264

@eriksl
Copy link

eriksl commented Jun 4, 2024

This is a good question. Yes. Usually I2C fsm got something wrong, we reset the bus, that is send 9 SCL pulses. However, during sending SCL pulses, there is another disturbance work on I2C bus, I2C hardware will completely confused.

So, in this part, I will have a talk with our hardware design colleagues to see if there is any better solution.

Just my $0.02. I have been working with I2C for 30 years now, have also written an extensive driver for ESP8266. The recommended (by Philips/NXP) way to resolve a "stuck" I2C bus (SDA), is to cycle the SCL line at least 8 times (and leave SDA floating high). This will make most state machines in the slaves reset to their initial state. And stop outputting to SDA (data + ack). If a slave hogs SCL (clock stretching) you're on your own. There is no way whatsoever to resolve a bus with a stuck SCL.

So I guess Espressif is doing the right thing here.

@KJ7LNW
Copy link
Contributor Author

KJ7LNW commented Jun 5, 2024

So I guess Espressif is doing the right thing here.

That may be true in terms of how the bus reset is performed, but it is the current error handling implementation which is problematic:

This issue is that when the bus enters a stuck state, the driver hangs in an interrupt handler and crashes the software via WDT.

The correct software fix would return an error (or a timeout) from s_i2c_master_clear_bus() instead of crashing the entire device.

@eriksl
Copy link

eriksl commented Jun 6, 2024

Ah yes ok, clear!

@dane-io
Copy link

dane-io commented Aug 28, 2024

Just wanted to add, I encountered the exact error originally posted; found out it was due to me using vTaskDelay() to wait for some conversions on an I2C device while also using CONFIG_FREERTOS_USE_TICKLESS_IDLE=y and CONFIG_PM_ENABLE=y with light sleep and Zigbee. Fixed my issue (so far, still testing) by implementing esp_pm_lock_acquire() and related functions around my I2C calls in the application.

Seems that the I2C driver (at least the new one, maybe the old also?) might have some compatibility issues with light sleep.

@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Opened Issue is new labels Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

6 participants