FastLED breaking Ethernet using RMT (LAN8720) ? And wrong LED pattern using I2S #19

Open
Pacheu opened this issue Sep 29, 2020 · 11 comments

Pacheu commented Sep 29, 2020

Hi ! I am currently developing something on an ESP32 using ESP IDF and I was adding your library to my project to light up a Neopixel Ring (24 pixels).

I encountered some problems regarding the use of Wi-Fi and FastLED, which I believe I solved (I will describe it later), but now I am stuck with a new problem : On the first call to FastLED.show(), my Ethernet connectivity breaks ; I don't see any receives.

Details on my working environment :

  • I'm using an Olimex EVB ESP32 board (with an ESP32 chip, obviously), which has a LAN Port (LAN8720) .
  • I'm using ESP IDF (idf.py --version gives ESP-IDF v4.3-dev-907-g6c17e3a64-dirty), and Visual Studio Code on Ubuntu.
  • I power the board via USB (5 V), and another USB port (5 V) with an adapter powers my Neopixel Ring. Plus, I added a 10 kΩ resistor on the pin to which the ring is connected. I didn't add the capacitor recommended on the Adafruit website, because my power supply isn't that powerful.
  • My PC and the board are connected via LAN to the same router, so they are on the same subnet, and the router assigns their IPs via DHCP.
  • I can use either GPIO 12 / 02 / 04 / 05
  • Here's my code to show what I am seeing : I just started my example from your main, so if someone wants to do this on his side, just replace the main folder of this repo. This is not my project, but this example is showing the same issue.
    main.zip

I'm also using the RMT driver, by commenting out the (#define FASTLED_ESP32_I2S) line in FastLED.h and un-commenting ("platforms/esp/32/clockless_rmt_esp32.cpp") in CMakeLists.txt. Indeed, I am using the WS2812 LED chipset, and as I understand it, this uses the RMT driver. Finally, I added a line inside the IDF code, in "esp_eth_mac_esp32.c" in the esp_eth component, to print the transmits of the MAC driver:

  • ESP_LOGD(TAG, "transmit len= %d, buff len= %d", sent_len, length); // (line226)

As my example is running on the Olimex, I am pinging it from my PC. After the initialization, I have no problem pinging it. But on the first call to FastLED.show(), receives stop, and the device no longer responds to my pings. I can stop and restart the Ethernet handler; this doesn't change anything. The only way to recover is to destroy it (uninstall the driver), re-install it and start it again. But once again, on the next show(), it stops working. If I don't restart it, receives don't happen, and after a while I also start to see that this line in "esp_eth_mac_esp32.c":
MAC_CHECK(sent_len == length, "insufficient TX buffer size", err, ESP_ERR_INVALID_SIZE); // line227
is reporting an error.

At least my LEDs are showing correctly =) .

There is also a Wi-Fi handler initialized, but removing it won't solve the problem.

I have no idea what is causing the problem: I have looked everywhere and cannot find anything on it, except this:
espressif/esp-idf#2644
But the person's problem was on the RMT side with RX, not on the Ethernet side.
I don't know if the RMT here is interfering with the EMAC pins (I don't know why it would do that).
I don't know either if it is interrupt-related (I also could not see why it would be, because I believed the RMT implementation was not using any). EDIT: I may have been wrong, and I now believe that there are actually interrupts, because the RMT sends the signal in chunks.
It might be my implementation somehow, but what is in the example is pretty basic (just esp_eth usage and one FastLED controller with not that many pins).

Do you think I should open an issue on the FastLED GitHub page or on the ESP-IDF one? I mean, if the issue is RMT-related and not really FastLED-idf-related, I can understand that solving this is not really in your interest.
Otherwise, do you have any idea what might be the issue here ? Ah, and here is a capture of the logging, where we can see during the blinking that no receives happen, until I destroy-restart the eth handler :

[Image: Ethernet-FastLED]

PS: The problem I had with Wi-Fi and FastLED was due, I think, to the fact that the Wi-Fi main task is pinned to core 0, and calling FastLED.show() on this core results in a wrong LED display: for example, 5 LEDs light up instead of 2. So I'm now calling show() in a task pinned to core 1, and the problem disappeared. Have you ever seen this issue with Wi-Fi before? Thank you for reading!

@NeliusNDL

I am busy investigating LED control on the ESP32 and I stumbled on this. Reading bbulkow's interesting README, I see that he mentions on several occasions that the I2S module works much better and is more stable. Have you tried the I2S module instead of the RMT? I am also going to use the Olimex EVB... I hope I don't get the same problem 😳

Pacheu (Author) commented Oct 2, 2020

Hi @NeliusNDL ! Thanks for your comment ! I confirm that this problem does not occur with I2S (after commenting / un-commenting the lines in CMakeLists.txt and FastLED.h). That is, pings are answered while the LEDs are blinking.
BUT, now the LEDs are not showing correctly, with or without Wi-Fi initialized:

  • With the example above, with NUM_LEDS at 12, sometimes one more led is lit up with another color, sometimes 5 more are always green ... let's say that there is an almost random pattern of LEDs that are lit but should not be.
  • Using my project which has a lot more going on, it is even more chaotic.

I will dig more into this I2S issue in my examples. If anyone has ideas or links about either of those two issues, I would be glad to have them!

Pacheu changed the title from "FastLED breaking Ethernet (LAN8720) ?" to "FastLED breaking Ethernet using RMT (LAN8720) ? And wrong LED pattern using I2S" on Oct 2, 2020

bbulkow (Owner) commented Oct 5, 2020

Hi @Pacheu . This sure sounds like the Olimex board is using a pin that you're trying to use. It looks like the EVB board is kinda tough because given the number of features it has, you have to find free pins. I see that all the pins you've tried are bootstrapping pins, I see on the Olimex support board lots of questions about which pins do what. It looks like the EMAC pins can be remapped through the gpio_matrix_* calls.... but I figure it's not doing any of that? Reading the schematic for the EVB, I see Ethernet using 19,22,21,25,26,23,18,27, which is a lot of pins. Looking at the Phy demo, I see the same pins. However, in the schematic, I see even more pins labeled as EMAC, such as GPIO0,1,4,5,12,13,14,15,16.

I also see a bunch of statements in the forum about voltages, as if using the ethernet is simply pulling power that you don't have? This is a complicated board....

The fact that Olimex is not forthcoming about what pins you can use is quite unsettling - but how about this? I just looked at toggle_relay.c in their examples directory, and I see they're using gpio 32,33,34. The I2C example is using 16 and 13. watering is using 32, because it's where the relay is.

GPIO2 is listed as SD_DATA0, GPIO4 is EMAC_TX_ER, GPIO5 is EMAC_RX_CLK, 12 is both SD_DATA2 and EMAC_TXD3... I'm getting that by eyeballing the schematic, checking out the ESP-WROOM-32 component, and reading carefully how they've labeled the pins. It's the only thing I can see on the Olimex site at all regarding what pins are safe.... this would seem a good question for the Olimex support forum?

Ok, so how about pin 32? It's RELAY1, you can disable (and not init) any code using the relays, and use ethernet and FastLED? I also like the looks of 5 and 35, the canbus pins, because you're probably not using canbus --- just make sure you don't init or compile in CAN, and those pins are likely safe.

If it's not a pin collision problem, and it's not a power problem, I'd turn on as much stack protection code as you can find.

Pacheu (Author) commented Oct 6, 2020

Hi @bbulkow ! Thank you for your time. After reading your comment I ran some more tests on Power supply, pins, etc ...

  • Using RMT and 1 LED ring :

    • I powered the Olimex board using an external Power supply (5V, up to 3A) : Did not change a thing
    • I tried with pins 32 then 33 (Relay 1 and 2) alone, with no other stuff except Ethernet, had the same Eth RX issue
    • I tried with the Olimex Gateway and the Olimex POE ISO boards, and I had the same issue on pins that should not interfere with LAN
    • I tried defining FASTLED_RMT_BUILTIN_DRIVER to 1 before including FastLED.h, but then both Ethernet and LED changes stopped working
  • Using RMT and 4 LED rings :

    • I witnessed a new thing: using 3 LED rings, nothing changes, same Eth issue. Then, once the 4th ring is added, the Ethernet client reports the events ETHERNET_EVENT_CONNECTED and ETHERNET_EVENT_DISCONNECTED, although those events never occurred before after the first show(). This happens no matter what pin is given for this 4th LED controller.
    • However, LED sequences are as expected for all LED rings.
  • Using I2S

    • Since RMT is giving me such a hard time, I ran some tests using I2S and, like I said before, this Ethernet issue never occurs, but LED sequences are not what I expect. And this with or without WiFi running or initialized, even with the same example in your main.
    • However, I tried this with an ESP-WROVER-KIT, and the I2S controllers were giving the right LED sequences with these rings, but this board does not have a LAN port so I cannot use it in my project.

Also, what do you mean by turning " as much stack protection code as I can find " ? I used in IDF configs the Stack smashing protection mode with CONFIG_COMPILER_STACK_CHECK_MODE set to option Overall (COMPILER_STACK_CHECK_MODE_ALL) but nothing changed.

I never had this Eth interference issue before. I have used almost all the GPIOs that are available according to this Pinout from RIOT OS, for ADC and common GPIO with or without interrupts, and I never broke anything in the Eth driver or stack. That is why my guess was on RMT.

EDIT: in my project's real setup, with 4 stripes of 70+ LEDs (powered with 5 V, up to 3 A), when using I2S with the ESP-WROVER-KIT, there are some artifacts just past the number of LEDs I am using: for example, if I want to use 50 LEDs, the 51st one is sometimes lit and the 4 after that are always red. But using RMT with either the WROVER KIT or the Olimex EVB, every lit LED is expected.

bbulkow (Owner) commented Oct 6, 2020

Thanks for all that testing, but tell me, where can we go next? The RMT driver is specifically built to only use that one pin and attaches the RMT driver to it. How else could it cause problems to the Ethernet driver and stack? My theory was pin conflict, but given your testing, I agree that seems impossible - although thank you for the pointer to RIOT OS documentation. It seems to also state, for example with GATEWAY, that there are few safe pins, but if you have the problem with just one LED string configured, and Just one 'addLeds' and only on the safest of pins, you've covered the bases.

My other theory would be that something is hurting the stack. For this there are several entries in menuconfig, such as "Compiler options -> stack smashing protection -> strong", and there's also the chance of heap corruption, so see "Component config -> heap memory debugging -> heap corruption detection", which can be set to different levels. There are also controls for the amount of stack space used by interrupt handlers and similar; these are under "Component config -> Common ESP-Related". Since we are guessing at what a cause would even be, I can only suggest making these values bigger and seeing what happens.
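For reference, the menuconfig entries above correspond roughly to the following sdkconfig.defaults lines. This is a sketch that assumes the ESP-IDF v4.x option names; they have changed between IDF releases, so check them against the menuconfig of the version in use. The stack size shown is an arbitrary illustrative value, not a recommendation.

```text
# Compiler options -> Stack smashing protection mode -> Strong
CONFIG_COMPILER_STACK_CHECK_MODE_STRONG=y
# Component config -> Heap memory debugging -> Heap corruption detection -> Comprehensive
CONFIG_HEAP_POISONING_COMPREHENSIVE=y
# Component config -> Common ESP-related -> Main task stack size (bytes, illustrative value)
CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192
```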

However, today, I can only think that it must be a timing issue related to the amount of CPU used by the RMT system or I2S. To test this, one could try lowering the priority of the RMT system and changing the core, to see if there is a difference. The core for RMT interrupts is determined by the core where the initialization happens: in the main.cpp example code, you see there is an xTaskCreate.... that is on core 0, so the RMT interrupt will be on core 0 also. You can try changing this; I don't know which core the Ethernet system is using. To lower the priority (which might cause pattern problems but at least give us a hint), look for the esp_intr_alloc() call in clockless_rmt_esp32.cpp and change the ESP_INTR_FLAG_IRAM and ESP_INTR_FLAG_LEVEL3 flags as per the documentation for that function. LEVEL3 is the highest level allowed for C functions.

Your measurement that adding a 4th "ring" creates disconnects could support this suspicion, but it's not a great theory. Disconnects might be caused by the Ethernet driver not getting enough time, true, but with 4 strings (and pins) they are still driven in parallel, so I wouldn't expect the FastLED driver to take much more CPU, only a little. Enough to tip over the edge into disconnects? That seems possible. If so, lowering the interrupt level and changing the core should be an interesting test.

Another test I can imagine is to decrease the number of LEDs (again as a test). If the interrupt handler has to do a certain amount of work, and that's causing Ethernet problems, then doing less work would be a way of sniffing out the cause. Since you are using "rings" I think you must not have too many LEDs, though? Is this a possible area of test? I see your real project is 70, so perhaps with the "rings" you are already using fewer? Going down to 1 LED might be an interesting test.

I am also wondering if the Ethernet driver is conflicting with the RMT driver as well as the reverse. There is a define "FASTLED_ESP32_SHOWTIMINGS". This prints out, after every "led show" call, the jitter at the interrupt level. I used it for determining the best MEM_BLOCK_NUM. Perhaps you can turn this on and see what the jitter is, compared to your WROVER?

I have no Olimex systems to test, myself. I was thinking though, just a few days ago before I saw your post, of getting one of the POE systems, as they are small but mighty. It would be unfortunate if we couldn't get the olimex ethernet running with FastLED, because Ethernet really is the best way to synchronize and control LEDs like this.

Since you have these boards, are there other differences that might give us a clue? For example, are they using Rev 0 chips, with more known errata? Are they using a different XTAL than the internal one, so might have major timing differences?

The fact that the patterns "are not what you expect" with I2S is also interesting, no? What would be causing that? The usual reason would be that the generated timing is not correct. The interesting point of the RMT driver is that it has "bail code", that is, it detects when there has been interrupt jitter and stops setting pixels. The I2S code doesn't have this, so if there is jitter and thus a pause of more than 50 µs, it will create very bad patterns, much worse than an extra pixel here or there. This is why I suggest turning on SHOWTIMINGS; it might show whether the Ethernet driver is greedy.
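To make the difference concrete, here is a minimal host-side sketch of the two behaviours. It is not the FastLED driver code: the function names are made up, stalls are simplified to fall between whole pixels, and the only hardware fact assumed is that a WS2812 strip treats a low period longer than about 50 µs as a reset, after which incoming data is addressed from the first LED again.

```cpp
#include <cstddef>

// A WS2812 strip treats a data line held low for more than ~50 us as a reset:
// the next bits it sees are taken as belonging to the first LED again.
constexpr unsigned kResetGapUs = 50;

// gap_us[i] is the idle time (in microseconds) that preceded pixel i's data.
// gap_us[0] is ignored because the first pixel always starts a frame.

// Driver WITH bail-out (the behaviour described for RMT): stop at the first
// stall. Returns how many pixels were actually sent.
constexpr std::size_t pixels_sent_with_bail(const unsigned* gap_us, std::size_t count) {
    for (std::size_t i = 1; i < count; ++i) {
        if (gap_us[i] > kResetGapUs) return i;
    }
    return count;
}

// Driver WITHOUT bail-out (the behaviour described for I2S): keep sending.
// Returns the physical LED that ends up showing logical pixel `index`.
constexpr std::size_t landing_led_without_bail(const unsigned* gap_us, std::size_t index) {
    std::size_t led = 0;
    for (std::size_t i = 1; i <= index; ++i) {
        // A stall restarts addressing at LED 0; otherwise move to the next LED.
        led = (gap_us[i] > kResetGapUs) ? 0 : led + 1;
    }
    return led;
}
```

With gaps of {0, 1, 1, 80, 1} µs, the bail-out variant stops after 3 pixels and leaves the rest of the strip as it was, while the other variant writes pixels 3 and 4 onto LEDs 0 and 1, overwriting data that was already correct.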

Regarding your test with I2S on a WROVER, I have seen a similar artifact. Interestingly, it happens only on some patterns, and when I wrote test patterns that were simpler, I was unable to reproduce it, so I started suspecting my pattern code and not the driver code. It also seemed to be just one LED too long sometimes, same as you say. If you have simple code to reproduce the "extra LED" on a more standard ESP32, I would love to see it, maybe I can track that down - if you can share that would be great.

Pacheu (Author) commented Oct 8, 2020

Hi ! First of all I'd like to thank you again for your time and your answers.
So, once again, after reading your post I have continued to run some tests :

  • Concerning configurations protecting stack / heap memory :

    • Turned on Compiler options -> stack smashing protection -> overall
    • Turned on Component config -> heap memory debugging -> heap corruption detection
    • Looked at Component config -> Common ESP-Related and changed some options, turned on WD timers and abort on WDT expired ...
    • ==> No changes. I kept these options for the other tests
    • So, with this easy and basic code with wifi, eth and FastLED, I feel like I don't have memory / task stack / heap issues and tasks don't trigger the WDT.
  • Concerning priorities and tasks pinned to core 1 or 0 :

    • I changed into esp-eth code the priority of ETH RX task and fixed it to 32 (I believe it's the highest priority) and fixed the task to core 0.
    • I fixed the task calling FastLED.show() on core 1, with lowest priority 1.
    • I changed in clockless_rmt_esp32.cpp the bit mask for the interrupt: the values ESP_INTR_FLAG_IRAM and ESP_INTR_FLAG_LEVEL3. I tried all combinations of levels 1, 2 and 3 with ESP_INTR_FLAG_SHARED, ESP_INTR_FLAG_EDGE or ESP_INTR_FLAG_IRAM, just to be sure.
    • I also tried to turn on Component config -> FreeRTOS -> only on first core enabled and disabled
    • ==> No changes. I kept these priorities and ESP_INTR_FLAG_IRAM + ESP_INTR_FLAG_LEVEL1 for the next steps.
  • Concerning number of LEDs and Timings :

    • I enabled the jitter printings with FASTLED_ESP32_SHOWTIMING
    • I tried with 71 LEDs on 4 stripes, and 1 / 2 / 3 Leds on only 1 stripe on the GPIO 32 (REL 1 of Olimex)
    • I tried with different values of MEM_BLOCK_NUM (1, 2, 4)
    • I then compared the values with the WROVER KIT
    • Results : First, Eth is still broken.
      • With MEM_BLOCK_NUM = 1, with 3 LEDs, the "rmt irq print" values are around 36 to 41.
      • With MEM_BLOCK_NUM = 1, with 71 LEDs, the "rmt irq print" values are in the same interval.
      • With MEM_BLOCK_NUM = 2, with 3 or 71 LEDs, the "rmt irq print" values are around 74 to 79.
      • With MEM_BLOCK_NUM = 4, with 71 LEDs, the "rmt irq print" values are around 150 to 159. Moreover, if I have, let's say, 5 LEDs per stripe, another 5 LEDs after them are lit, sometimes with the same pattern, but it can also be random. This is just a weird thing that I noticed.
      • With WROVER KIT, MEM_BLOCK_NUM = 2, with 3 or 71 LEDs, the "rmt irq print" values are in the same interval as on the Olimex with the same parameters. Moreover, on the Olimex I have Wi-Fi + Eth, but on the WROVER I have neither of them, and there is still no timing difference between the two.
  • Concerning I2S and the artifact:

    • Here is a simple example on WROVER, made out of your own main but with only the necessary, to show the artifact on the NUM_LEDS + 1 LED. main.zip
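One possible reading of the "rmt irq print" values listed above, offered as a sketch and not as a statement about how FastLED measures them: they sit close to the time the RMT needs to drain half of its buffer. The calculation assumes that one RMT memory block holds 64 items, that one item encodes one WS2812 bit lasting 1.25 µs (800 kHz), that the driver refills half of the buffer per interrupt, and that the printed values are microseconds. The function name is made up for illustration.

```cpp
// Assumptions (not confirmed in this thread): 64 RMT items per memory block,
// one item per WS2812 bit, 1.25 us per bit, half of the buffer refilled on
// each interrupt.
constexpr int kItemsPerBlock = 64;
constexpr double kBitPeriodUs = 1.25;

// Expected time between two RMT refill interrupts, in microseconds.
constexpr double refill_interval_us(int mem_block_num) {
    return (kItemsPerBlock * mem_block_num / 2) * kBitPeriodUs;
}
```

This gives 40 µs, 80 µs and 160 µs for MEM_BLOCK_NUM = 1, 2 and 4, which would be consistent with the measured 36 to 41, 74 to 79 and 150 to 159, and would also explain why the values do not depend on the number of LEDs or on the board.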

I will now dig more into Olimex's boards, if there is somewhere any clue on any Ethernet bug caused by FreeRTOS or any I/O.

bbulkow (Owner) commented Oct 10, 2020

@Pacheu I had a think while I was sleeping, and want to confirm a fact.

The I2S driver implementation DOES NOT impact the Ethernet use, it simply is wrong.

If that's the case, since it uses the same amount of memory and mostly the same code paths until it actually manipulates drivers, then my new theory is the Ethernet driver is using the RMT hardware in some way. This can probably best be validated by reading the Ethernet source code (although looking at the driver state might work too). I don't love this theory, because when FastLED attempts to configure the RMT hardware, I would think we would get a failure and we do check for error codes now.

The best thing I could do, if this is true, is to simply find the bug in the I2S code so it works. There are many reports of I2S being more stable so this would be a benefit to the driver -- perhaps you can look into the Ethernet driver, but especially do a quick search for whether it uses RMT.

I also think turning this over to the Olimex folks makes sense. They have a forum where they reply, and they don't need hardware: one simply runs the test program, without any particular hardware hooked up to the pins, and the Ethernet stops functioning, as you say. They should be able to track that down.

And, oh, my Olimex boards arrived yesterday. I got 4 of them: Two with isolation, one without ( the little POE boards ), and one big EVB board like you've got. I was planning on getting one anyway because Ethernet is the obvious choice for LED control so these boards and FastLED would be a great choice for several projects I'm working on.

Thank you again for your effort on this --- we'll crack it!

Pacheu (Author) commented Oct 12, 2020

Hi ! @bbulkow Thank you a lot for your enthusiasm ! I confirm that the I2S does not impact anything else and that it is just wrong, like you said.

Also, I am almost certain that Ethernet here does not use RMT module. I looked over the code, "rmt" does not appear anywhere in any file from esp_eth, and eth_phy + eth_phy_lan8720 files use driver/gpio or eth_phy special structures. Moreover, after initializing Ethernet with LAN8720, I used "rmt_driver_uninstall" on all possible channels and the log result said "No RMT for this channel", for all of them.

I created a post in OLIMEX's forum and linked this GitHub issue to it. A moderator replied and advised using boards other than the EVB. He also explained and answered the XTAL and chip questions you asked. I just put the link here so you can check it ! Link to Olimex forum post.

I also did some more tests (and I found the source of the issue while doing my last test and writing this post):

  • I tried the same scenario with Arduino (original FastLED lib + esp32 libs, with or without WiFi or FreeRTOS core/priority stuff), and I tried with both RMT and I2S, I had no issue with Ethernet or LEDs. Just at some point, with a fast fps rate, I started to see some artifacts after my led num count, or a weird stop of a task, but that is all.
  • And the interesting test I did was to use the ESP-IDF RMT example that also drives LED stripes, the "led_strip" example. Using it as it is (I just changed the LED count and the pin, and added my Ethernet + WiFi initialization), it worked well with no Eth bug (only some LED artifacts). BUT, I managed to reproduce the Ethernet stopping and see what the "added" thing that breaks the Eth was:
    • This example initializes the rmt driver for only one LED stripe: when creating the config structure for the rmt, it uses RMT_DEFAULT_CONFIG_TX with the gpio used for the stripe.
    • So, I wanted to use more than one stripe, and I did it the way it is done in the FastLED code: put GPIO 0 in this config structure, and call rmt_set_pin before calling any write function. At this point, Eth broke.
    • But, when I put one of the pins that I want to use instead of GPIO 0, and still call rmt_set_pin before writing so as to actually write to the right stripe, Eth never breaks !!!
    • I tried it also in your library: in "clockless_rmt_esp32.cpp", I changed on lines 279 and 285 the basic GPIO num to one of my used pins, and this also works !!!
    • I now have no more "insufficient TX buffer size" error prints, and I can ping while doing show().
    • ==> The issue is then caused by configuring the rmt driver on a channel with GPIO 0 as the first config pin, and GPIO 0 is EMAC_TX_CLK on ESP32.
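The workaround above can be sketched as a small pin-selection helper: configure the RMT channel with a GPIO that really drives a stripe, never with a placeholder that the Ethernet MAC owns. The pin list below is taken from the schematic reading earlier in this thread plus GPIO 0 (EMAC_TX_CLK), so it is specific to this board and may be incomplete. Both helper names are hypothetical and are not part of FastLED or ESP-IDF.

```cpp
#include <cstddef>

// GPIOs reported in this thread as used by Ethernet on the Olimex ESP32-EVB,
// plus GPIO 0, which carries EMAC_TX_CLK. Board-specific, possibly incomplete.
constexpr int kEmacPins[] = {0, 18, 19, 21, 22, 23, 25, 26, 27};

// True if `gpio` appears in the Ethernet pin list above.
constexpr bool is_emac_pin(int gpio) {
    for (std::size_t i = 0; i < sizeof(kEmacPins) / sizeof(kEmacPins[0]); ++i) {
        if (kEmacPins[i] == gpio) return true;
    }
    return false;
}

// Hypothetical helper: choose the GPIO for the initial RMT channel config.
// Returns the first stripe pin that Ethernet does not use, or -1 if none.
constexpr int rmt_config_pin(const int* stripe_pins, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (!is_emac_pin(stripe_pins[i])) return stripe_pins[i];
    }
    return -1;
}
```

The returned pin would then be the one placed in the RMT config structure (for example through RMT_DEFAULT_CONFIG_TX), with rmt_set_pin still switching to the actual stripe before each write.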

What is really curious though is that Eth does not break until I do an actual show(), that is to say an actual rmt writing. But, before writing, the rmt_set_pin function is called and changes the pin. So we never write to GPIO 0, right ?

That does not resolve the I2S problem, but at least, I can now use RMT driver to write my LEDs and have no issue anymore =D .

bbulkow (Owner) commented Oct 17, 2020

Wow, great sleuthing! I've never liked the approach of setting the pin to "0" when initializing. It is intended to allow a very large number of pins to be used without constantly changing over the PIN structure. It would seem we could change the code to have a "default pin", that is, the pin that we're going to configure as the "first pin", so it could be something other than 0, and thus at least give an option of not messing with someone else's driver.

But your research shows that there's likely to be even more issues, because by the time the 'show' is called, something must have remembered the PIN 0. If we're going to be strictly correct, then, we'd have to init the channel if it wasn't the old value before.

Let me see if I can at least add a section to the README about the issue, and create a config #define for "default pin" to allow changing one simple #define and allowing boards like this to work for others.

Thanks for the excellent sleuthing! I look forward to having my olimex boards work the first try :-)

bbulkow (Owner) commented Oct 17, 2020

@Pacheu Hey! Thanks for the simple test code. It clearly shows the I2S fail. I'll follow up on the other incident when I have status.

Pacheu (Author) commented Oct 21, 2020

Hi @bbulkow ! Yeah, I did quite a lot of testing to find this one! But thank you for the guidance! I am glad it helps you strengthen this library's code. I hope you will get through the I2S problem, and I wish you the best with the Olimex boards!
