ESP32 I2S 3-step Cadence Faster encoding #855
-
Make note of the endianness of variables. I believe I ran into an issue where one of the ESP32 variants (the C3?) uses a different core (ARM) and its endianness layout was different; part of the reason for the move to working only in bytes was to reduce complexity and fix some bugs around this. So you need to test all the platforms to make sure nothing unexpected appears.
-
I have created a pull request as part of the DMX512 encoding work for the faster 3-step encoding, and I have added support for big-endian byte order in it as well. I have not found any boards that use it, and on the Arduino forum no one knew of one that does, but if the macro is defined it should work just fine. I did not go as far as providing for the (new to me) PDP-endian order. Anyway, it's there; have a look if you have time.
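For reference, a compile-time check along these lines is one common way to handle it; the macro name `NEO_BIG_ENDIAN` and the helper below are only illustrative, not necessarily what the pull request actually uses:

```cpp
#include <stdint.h>

// Illustrative only: detect a big-endian target at compile time using the
// GCC-provided byte-order macros. NEO_BIG_ENDIAN is a made-up name here.
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define NEO_BIG_ENDIAN 1
#endif

// Return a 16-bit value laid out in little-endian wire order regardless of
// the host byte order, so the DMA buffer bytes come out the same everywhere.
static inline uint16_t HostToLittleEndian16(uint16_t value)
{
#if defined(NEO_BIG_ENDIAN)
    return (uint16_t)((value >> 8) | (value << 8));
#else
    return value;
#endif
}
```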
-
With the new 3-step encoding for the ESP32 I2S, the trade-off is less memory use at the cost of encoding speed. While I was busy with the DMX512 method (which, as a by-product, was much easier for me to add), it occurred to me that the 3-step encoding could be made significantly quicker, so I decided to have a go. My first idea was to use a double (2 x 16-entry, 32-bit) lookup table, similar to the 4-step but wider, since the 12-bit result per nibble eventually has to end up as a multiple of 8 bits. The way to speed things up over the current method was to remove as many bit-shifts as possible: bit-shifts are relatively slow, and shifting a variable 8 bits either way takes 8 times as long as shifting it 1 bit (unlike, for instance, addition, where x + 8 takes just as long as x + 17).
I was even looking for a way to read the high nibble directly from the source byte. If I remember correctly, the Z80 had a specific instruction to do just that, but that probably doesn't exist on a modern ESP anymore. So anyway, I came up with this.
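In outline (simplified, with the 0b100 / 0b110 cadence bit patterns and the little-endian byte swap as assumptions rather than the exact code), the approach is: two 16-entry 32-bit tables, one per nibble, with the 12-bit patterns pre-positioned and pre-byte-swapped so the hot loop is just two lookups, an OR and a 3-byte memcpy():

```cpp
#include <stdint.h>
#include <stddef.h>
#include <string.h>

// 3-step cadence: every source bit expands to three output bits,
// assumed here as 0 -> 0b100 and 1 -> 0b110, so one byte becomes 24 bits.

// Two 16-entry tables, one per nibble position. The 12-bit patterns are
// pre-positioned and pre-byte-swapped (for a little-endian core) so the
// hot loop only needs a lookup, an OR and a 3-byte memcpy().
static uint32_t s_encodeHigh[16];
static uint32_t s_encodeLow[16];

// build the 12-bit cadence pattern for one nibble, MSB first
static uint16_t NibbleTo3Step(uint8_t nibble)
{
    uint16_t pattern = 0;
    for (uint8_t bit = 0; bit < 4; bit++)
    {
        pattern <<= 3;
        pattern |= (nibble & 0x08) ? 0b110 : 0b100;
        nibble <<= 1;
    }
    return pattern;
}

void InitEncodeTables(void)
{
    for (uint8_t n = 0; n < 16; n++)
    {
        // high nibble lands in output bits 23..12, low nibble in bits 11..0
        uint32_t high = (uint32_t)NibbleTo3Step(n) << 12;
        uint32_t low = NibbleTo3Step(n);

        // swap the three bytes so that on a little-endian core the first
        // byte in memory is the one that must be sent first (bits 23..16)
        s_encodeHigh[n] = ((high >> 16) & 0xff) | (high & 0xff00) | ((high & 0xff) << 16);
        s_encodeLow[n] = ((low >> 16) & 0xff) | (low & 0xff00) | ((low & 0xff) << 16);
    }
}

// encode count source bytes into 3 * count output bytes
void Encode3StepBuffer(const uint8_t* src, uint8_t* dst, size_t count)
{
    while (count--)
    {
        uint8_t value = *src++;
        uint32_t combined = s_encodeHigh[value >> 4] | s_encodeLow[value & 0x0f];
        memcpy(dst, &combined, 3); // only three of the four bytes are meaningful
        dst += 3;
    }
}
```

The table contents are computed once with shifts, so the per-byte work in the loop contains no shifting beyond extracting the two nibbles.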
First I tested the resulting bit pattern on my UNO and compared it to what the current encoding produces, and after some fiddling with the memcpy() pointers I got it to match. A quick speed comparison showed great promise.
Then another thought occurred to me: get rid of the memcpy() and assign directly into 16-bit variables, using 6 x 16-entry 16-bit lookup tables.
Again it took a bit of fiddling to get right, but it appeared to be marginally slower than the first attempt.
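In outline, that second attempt looks something like this (again simplified, with the same assumed cadence patterns and my own table names). Two source bytes always produce exactly three 16-bit words, and two of the four nibble patterns straddle a word boundary, which is why six tables are needed:

```cpp
#include <stdint.h>
#include <stddef.h>

// Same pattern builder as in the previous sketch: 0 -> 0b100, 1 -> 0b110,
// MSB first, so one nibble becomes a 12-bit cadence pattern.
static uint16_t NibbleTo3Step(uint8_t nibble)
{
    uint16_t pattern = 0;
    for (uint8_t bit = 0; bit < 4; bit++)
    {
        pattern <<= 3;
        pattern |= (nibble & 0x08) ? 0b110 : 0b100;
        nibble <<= 1;
    }
    return pattern;
}

// Two source bytes expand to 48 bits = three 16-bit words. The four nibble
// patterns land at bit offsets 36, 24, 12 and 0 of that 48-bit group, so two
// of them straddle a word boundary; six 16-entry tables hold the pre-shifted
// halves so the encode loop needs no shifts beyond nibble extraction.
static uint16_t s_t0[16];  // high nibble of byte 0 -> word 0, bits 15..4
static uint16_t s_t1a[16]; // low nibble of byte 0  -> word 0, bits 3..0
static uint16_t s_t1b[16]; // low nibble of byte 0  -> word 1, bits 15..8
static uint16_t s_t2a[16]; // high nibble of byte 1 -> word 1, bits 7..0
static uint16_t s_t2b[16]; // high nibble of byte 1 -> word 2, bits 15..12
static uint16_t s_t3[16];  // low nibble of byte 1  -> word 2, bits 11..0

void InitEncode16Tables(void)
{
    for (uint8_t n = 0; n < 16; n++)
    {
        uint16_t p = NibbleTo3Step(n);
        s_t0[n] = (uint16_t)(p << 4);
        s_t1a[n] = (uint16_t)(p >> 8);
        s_t1b[n] = (uint16_t)((p & 0x00ff) << 8);
        s_t2a[n] = (uint16_t)(p >> 4);
        s_t2b[n] = (uint16_t)((p & 0x000f) << 12);
        s_t3[n] = p;
    }
}

// encode an even number of source bytes into 16-bit words (3 words per 2 bytes)
void Encode3StepWords(const uint8_t* src, uint16_t* dst, size_t countBytes)
{
    for (size_t i = 0; i < countBytes; i += 2)
    {
        uint8_t b0 = *src++;
        uint8_t b1 = *src++;
        *dst++ = s_t0[b0 >> 4] | s_t1a[b0 & 0x0f];
        *dst++ = s_t1b[b0 & 0x0f] | s_t2a[b1 >> 4];
        *dst++ = s_t2b[b1 >> 4] | s_t3[b1 & 0x0f];
    }
}
```

This variant writes whole 16-bit words so no memcpy() is needed, at the cost of more table reads per source byte.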
So I migrated to the ESP32 (which, unlike the UNO, is not always on my desk) and performed a speed test using this sketch.
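(The test sketch itself isn't reproduced here, but a minimal timing harness along these lines is all that is needed; the buffer size, iteration count and the Encode3StepBuffer() call are placeholders for whichever encoder variant is being measured.)

```cpp
#include <Arduino.h>

// Hypothetical encoder under test; swap in whichever variant is being measured.
extern void Encode3StepBuffer(const uint8_t* src, uint8_t* dst, size_t count);

static const size_t PixelBytes = 1024;    // placeholder buffer size
static uint8_t s_source[PixelBytes];
static uint8_t s_encoded[PixelBytes * 3]; // 3-step output is 3x the input

void setup()
{
    Serial.begin(115200);

    // fill the source with something non-trivial so the tables get exercised
    for (size_t i = 0; i < PixelBytes; i++)
    {
        s_source[i] = (uint8_t)(i * 37);
    }

    const uint32_t iterations = 100;
    uint32_t start = micros();
    for (uint32_t i = 0; i < iterations; i++)
    {
        Encode3StepBuffer(s_source, s_encoded, PixelBytes);
    }
    uint32_t elapsed = micros() - start;

    Serial.print("average us per encode pass: ");
    Serial.println((float)elapsed / iterations);
}

void loop()
{
}
```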
And the results (using a 160 MHz clock rate):
Conclusion: the first attempt at 3-step encoding is marginally quicker than the second and is more than 5x as fast as the current method. With small pixel buffers it is less than twice as slow as the 4-step, and with large buffers it is almost as fast as the 4-step. The temporary memory demand is a bit higher than for the 4-step: 2 x 16 x 32-bit lookup tables (128 bytes) vs 16 x 16-bit (32 bytes), so 96 bytes more in lookup tables.
The quickest would of course be 256-entry lookup tables, but that seems excessive, wasting a whole KB on them.
Anyway, I thought I'd share it. I'll get the whole cloning and branching thing sorted soon.