Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double I2C read in one transaction skips a clock pulse (#5528) #6654

Merged
merged 9 commits into from
Oct 27, 2019

Conversation

TD-er
Copy link
Contributor

@TD-er TD-er commented Oct 19, 2019

See #5528 and the more elaborate description
where @maarten-pennings did all the hard work, but as far as I could see, no PR was made.

I also noticed some code duplication, which I moved to separate functions.

According to this documentation on I2C clock stretching it is not allowed to have some slave keep the clock low during transmission of a byte, only after an ack.
So that's why it is not done in the while loop.
But I wondered if there should be an extra delay between the sequence of pulses and the extra added clock "valley". See my comment in the code.

Fixes #5528
Supersedes: #6592 and #6579

See esp8266#5528  and the more elaborate [description](https://github.com/maarten-pennings/I2C-tool/blob/master/I2Ctest8266/README.md#how-to-fix)
where @maarten-pennings did all the hard work, but as far as I could see, no PR was made.

I also noticed some code duplication, which I moved to separate functions.

According to [this documentation on I2C clock stretching](https://www.i2c-bus.org/clock-stretching/) it is not allowed to have some slave keep the clock low during transmission of a byte, only after an ack.
So that's why it is not done in the while loop.
But I wondered if there should be an extra delay between the sequence of pulses and the extra added clock "valley". See my comment in the code.

Fixes esp8266#5528
@Jason2866
Copy link
Contributor

@TD-er merged this PR in a setup. You can use with PlatformIO
platform = https://github.com/Jason2866/platform-espressif8266.git#newfeature/stage

@Tech-TX
Copy link
Contributor

Tech-TX commented Oct 20, 2019

According to this documentation on I2C clock stretching it is not allowed to have some slave keep the clock low during transmission of a byte, only after an ack.
So that's why it is not done in the while loop.
But I wondered if there should be an extra delay between the sequence of pulses and the extra added clock "valley". See my comment in the code.

That note at I2C.org is specifically for High-Speed-Mode, and isn't required for 400KHz Fast-mode, which is all this library can support at 80MHz CPU, possibly 600KHz Fast-Mode Plus with 160MHz CPU. In reality, I've only ever seen slaves stretch the ACK, but they're allowed to stretch the address and data clocks in Standard and Fast-modes.

@TD-er
Copy link
Contributor Author

TD-er commented Oct 20, 2019

@Tech-TX So what is your suggestion?
To have the clock stretch also present during transfers themselves?

@Tech-TX
Copy link
Contributor

Tech-TX commented Oct 22, 2019

I think I have my compile issues corrected, so I'm looking at your fix now. I think it's missing something, but I have a hard time reading those in-line changes on one page that Github uses... I'll fire it into Beyond Compare so I can make sense of it. It'll take me a bit to duplicate his set-up so I can test the failure conditions and see if I duplicate the failures with the current code base. I have a CCS811 chip myself, and I had problems with the 2.5.2 library so moved to the Enjoyneering library. The current Git looks significantly different, so I'll have to start testing all over again.

@Tech-TX
Copy link
Contributor

Tech-TX commented Oct 23, 2019

Hmm. His I2C-tool code compiles with errors... I'll have to correct it before I continue. It did finish the compile, but a couple of the errors need looking at before I can trust it. One I see immediately: he has a comma instead of a period in a printf.

C:\Users\Admin\Desktop\micro\I2C Test\I2C-tool\I2C-tool.ino: In function 'void i2c_isr()':

C:\Users\Admin\Desktop\micro\I2C Test\I2C-tool\I2C-tool.ino:310:28: warning: comparison is always true due to limited range of data type [-Wtype-limits]

           uint8_t val= (0<=slave_cra && slave_cra<SLAVE_REGCOUNT) ? slave_regs[slave_cra] : 0x55;

                            ^

C:\Users\Admin\Desktop\micro\I2C Test\I2C-tool\I2C-tool.ino:346:30: warning: comparison is always true due to limited range of data type [-Wtype-limits]

             uint8_t val= (0<=slave_cra && slave_cra<SLAVE_REGCOUNT) ? slave_regs[slave_cra] : 0x55;

                              ^

C:\Users\Admin\Desktop\micro\I2C Test\I2C-tool\I2C-tool.ino:360:22: warning: comparison is always true due to limited range of data type [-Wtype-limits]

               if( 0<=slave_cra && slave_cra<SLAVE_REGCOUNT ) slave_regs[slave_cra]= i2c.data;

                      ^

C:\Users\Admin\Desktop\micro\I2C Test\I2C-tool\I2C-tool.ino:396:30: warning: comparison is always true due to limited range of data type [-Wtype-limits]

             uint8_t val= (0<=slave_cra && slave_cra<SLAVE_REGCOUNT) ? slave_regs[slave_cra] : 0x55;

                              ^

C:\Users\Admin\Desktop\micro\I2C Test\I2C-tool\I2C-tool.ino: In function 'void setup()':

C:\Users\Admin\Desktop\micro\I2C Test\I2C-tool\I2C-tool.ino:547:49: warning: left operand of comma operator has no effect [-Wunused-value]

   Serial,printf("Waiting for i2C traffic...\n\n");

                                                 ^

C:\Users\Admin\Desktop\micro\I2C Test\I2C-tool\I2C-tool.ino: In function 'void loop()':

C:\Users\Admin\Desktop\micro\I2C Test\I2C-tool\I2C-tool.ino:579:5: warning: format '%x' expects argument of type 'unsigned int', but argument 8 has type 'long unsigned int' [-Wformat=]

     );  

     ^

C:\Users\Admin\Desktop\micro\I2C Test\I2C-tool\I2C-tool.ino:579:5: warning: format '%x' expects argument of type 'unsigned int', but argument 10 has type 'long unsigned int' [-Wformat=]

edit: The first 4 warnings are all the same; the first term in the IF is always true, uint8_t slave_cra is always >= 0. The other 2 warnings are in printf statements, and I could care less about them as I won't be using the serial port, only looking at the physical results on the bus.

@TD-er
Copy link
Contributor Author

TD-er commented Oct 23, 2019

This I2C tool you try to build, is it from this repo: https://github.com/maarten-pennings/I2C-tool ?
Maybe @maarten-pennings can help here? After all, he is also the one to initiate the original filed issue.

@Tech-TX
Copy link
Contributor

Tech-TX commented Oct 23, 2019

Yes, I want to verify that the failure still exists and hasn't been corrected in a previous change to the wire library. His tests were done against the 2.4.2 core, and many things have changed since then. I do not want to back-trace every change to the wire libraries since then, as it's a pointless exercise. I haven't seen a repeated start miss a clock in anything I've done in the last few months.

He never checked his compiler warnings. In the I2Ctest8266 program he's constantly mixing INT and UINT8_T, throwing warnings all over the place about signed vs unsigned, and narrowing of int to uint8. I haven't finished that one yet, but my initial impression was to globally change 'int' to 'uint8_t' as I didn't see any place where his numbers exceeded ~70. I'm still walking down each warning.

I worked in 'sustaining engineering' at one company, and had to clean up after people like this many years ago. What works initially may not work years later when conditions change. Things that you ignore will eventually come back to haunt you.

@Tech-TX
Copy link
Contributor

Tech-TX commented Oct 24, 2019

Test results: it doesn't matter whether I clean up the mess of int vs uint8_t or not. I've tried both his original code on both ends and my modified code on both ends. With no changes to the slave code, it doesn't respond and throws exceptions because the ISRs are in flash. When I correct that, ALL of the two-byte tests pass with any combination of his/mine on either end, but ALL of the 4-byte tests fail with any combination. During the 4-byte tests that failed, the bus was always perfect (no missed clocks, no obvious incorrect data from the master). I don't doubt there's an error in his 4-byte loopback logic in the slave code,

If you were using his analysis of clock stretch errors as a reason for tweaking the library, then you may have started from a false assumption. I don't see a need for an extra clock. His test may have been valid with 2.4.2, but possibly not with the current Git. I can't go backwards at the moment to see if it ever showed a problem with 2.4.2, and that would be a futile exercise. The Wire library is only used in his 'I2Ctest8266' module that is driving the test, and the Wire library is not used in the I2C-tool module that is the loopback slave. The errors that I see appear to be in the slave, which is his own slave driver code.

After looking at the dog's breakfast of INT/UINT mix-ups, it doesn't surprise me that the test setup has errors. Oh, well.

@Tech-TX
Copy link
Contributor

Tech-TX commented Oct 24, 2019

Never mind, I see what he's talking about. SigRok is decoding a state that isn't there (the NACK from the master just before the repeated start). I'll look into it this weekend and verify that your fix is solid.

@TD-er
Copy link
Contributor Author

TD-er commented Oct 24, 2019

In the previous revision of this PR, there were reports of user for which it did fix (also?) other issues like when running at 160 MHz.
See the linked comments here: #6592 (comment)
Can you relate to these claims to fix those issues with the changes made in this PR?

And also there is the still open question whether the clock stretch should also be available on individual bits.
To me it does make sense when it can also be used per bit instead of per transfer and I don't think it will hurt the functionality when it is not allowed according to the standard. It looks like it does make it more flexible or tolerant.

@Tech-TX
Copy link
Contributor

Tech-TX commented Oct 25, 2019

Yes, I can recompile both ends of the code for 160MHz easily. If the timing calculations are off, that may show an issue. I just recompiled, and maarten-pennings code seems to work the same on both ends (master and slave) at 160MHz, but I haven't looked at the bus timing yet. I'll add your code in for the next tests.

The test code that maarten-pennings did already verifies that the clock stretch works on all of the normal states of the bus. Someone apparently fixed the first issue he noticed in a previous update. In reality, most of the time the stretch is only on the ACK of the slave address, but I have seen it on the read bytes long ago. Clock stretch should also be checked when you're doing a bus recovery, which maarten-pennings' code doesn't test. I can verify that with my setup by causing a 'stuck slave'.
Once I have it all working, I'll use a GPIO for an error output test point so I can capture any errors with the logic analyzer. I can let it run for several hours that way to make sure there are no error events in any of the tests.

The Twi::Status function is still broken, so maybe we can add my fix in here as well. I shut down the other pull because my compiler went crazy and I wasn't able to compile anything reliably. The compile appears to be working now.

Unrelated: I saw an error with a DS3231 RTC chip last week: it utterly stuck SDA low, and the bus recovery clocks were not successful in releasing the chip; I had to power it down to get it out of whatever state it was in. I only saw that once. The module doesn't have a RESET pin, so power cycle was the only other option. It's obviously a clone part due to the price of the module, so specs are not available and likely don't match the Dallas datasheet.

@Tech-TX
Copy link
Contributor

Tech-TX commented Oct 26, 2019

I have the results: the test passed, and I can see the added clock both with a scope and a logic analyzer. The timing looks good. I have 2 hours (600 full test cycles each) at 80MHz and 2 hours at 160MHz CPU speed. Each test cycle has 2 different groups of 79 tests with the stretched clock at different locations in the bitstream, so just under 100.000 tests total, half of them proving that your fix is functional and survives a variable clock stretch.

fixed NACK clock

write race condition

However, I noticed something else (unrelated to your fix); see the circled area at the left. There's a glitch at the WRITE bit. Zoomed in, the master is releasing SDA as soon as the clock falls for the WRITE bit. The I2C spec has a minimum HOLD time of 0ns, but actually DOING that in the real world will cause problems with variations in setup and components.

I modified the test setup a little. Maarten Pennings didn't mention external pullup resistors on SCL or SDA. He's using the Wire library on the master side, and his own slave driver on the other end. Both drivers have INPUT_PULLUP on the I2C lines. I've measured the D1 Mini boards I have here, and come up with an average among several boards of roughly 45K equivalent pullup resistance in the GPIOs, so that's 22.5K total pullup for the bus. I'm a purist, so I added 10K resistors on both lines, about 8.2K total pullup resistance. I also had both a logic analyzer and it's probe capacitance PLUS a couple of scope probes and their capacitance added to the bus, so the signals looks pretty messy. Here's what that WRITE glitch looks like on a 'scope:

plot0003

Both signals are passing through the indeterminate region at about the same time. The thresholds on the logic analyzer make it appear like there's 0.4us of hold time between the clock falling and the data changing state, but the scope says that's debatable. Depending on chips, wires, added capacitance, insufficient or excessive termination and the phase of the moon, you might see erratic WRITE bits. I'll walk that problem down tomorrow and propose a fix for it. From what I see, that 3us glitch is the delay before the slave starts sending the ACK, The master should probably maintain the WRITE bit level for at least 1/4 clock cycle beyond the falling edge of SCL to insure SCL and SDA don't change state simultaneously.

Maarten Pennings' test code can only run at 50KHz bus speed, so I'll look at re-tooling that to use the slave function in the current Git so I can run the tests at the maximum speed. All of my experience with the ESP8266 so far was with the 2.5.2 release, so this last week has been my first shot at the current Git code. I don't think that WRITE glitch was there before, as I always look for things like that.

@devyte
Copy link
Collaborator

devyte commented Oct 26, 2019

@Tech-TX I suggest one thing at a time.

You have done the analysis of the fix proposed by @maarten-pennings , and made this PR as a result. Does the fix address the relevant issue? If you confirm the fix, I'll merge this.

As for the other issue you found, please open a new issue and post the results of your investigation there. Once you find a fix, please make a PR with it, and I'll take a look at the code.

BTW, about that other issue, please make sure to look at open PRs related to I2C to make sure a fix isn't already proposed.

@Tech-TX
Copy link
Contributor

Tech-TX commented Oct 26, 2019

TD-er did the PR, I verified that it's working correctly. I haven't tried a full-speed test yet, but from looking at the code it's not speed-dependent, so it should work from DC to 1MHz bus speed. I'll need to minimize the capacitive loading of my test setup so that the rise and fall times meet spec at 700KHz clock. The Maarten Pennings test is only run at 50KHz due to limitations of his slave code.

Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now. Seems like we also have 3rd party confirmation it's working. Great!

@Tech-TX
Copy link
Contributor

Tech-TX commented Oct 26, 2019

If you're going to put that clock inside a function, you might as well make it

void Twi::twi_scl_valley(void)
{
    SCL_LOW();
    busywait(twi_dcount);
    SCL_HIGH();
    WAIT_CLOCK_STRETCH();
}

and then remove the duplicate calls at lines 402, 435 & 443 after your calls to twi_scl_valley

Here's what it looks like with Earle's latest change put back in:

core_esp8266_si2c 10-26

I was wrong earlier: one of my D1 Minis is dying. The tHD;DAT is fine in this code; it's about 268ns with 100KHz bus, which nicely avoids a race unless your hardware is damaged. ;-)

@TD-er
Copy link
Contributor Author

TD-er commented Oct 26, 2019

Yep good point. Will move that call to the clock stretch into the valley function.

@earlephilhower earlephilhower merged commit 36c369b into esp8266:master Oct 27, 2019
@TD-er TD-er deleted the bugfix/issue5528_I2C_3rd branch October 27, 2019 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Double I2C read in one transaction skips a clock pulse
5 participants