Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SoftUART module fixes and code simplification #3104

Merged
merged 7 commits into from
Sep 4, 2020

Conversation

galjonsfigur
Copy link
Member

  • This PR is for the dev branch rather than for master.
  • This PR is compliant with the other contributing guidelines as well (if not, please describe why).
  • I have thoroughly tested my contribution.
  • The code changes are reflected in the documentation at docs/*.

This is a response to @TerryE comments about C Module coding standards(#1028). Just by using luaL_argcheck code became easier to read. It's still not the best possible implementation of Lua interface as I'm still trying to wrap my head around Lua C API and all available functions and macros.
Changes include: fixing multiple SoftUART rx instances, checking if interrupt was attached successfully and a bit more examples and fixes in documentation. The rest is just code simplification with use of @TerryE tips.

It's still a draft, as I would want to simplify it even more. Any feedback welcome.

@nwf
Copy link
Member

nwf commented Jun 12, 2020

Please rebase on the new dev so we can consider this for inclusion in the next release. :)

@nwf nwf added this to the Next release milestone Jun 12, 2020
- Simplify code by using lua_L* functions and using userdata properly
- Fix some edge-cases
- Add more examples to documentation
@galjonsfigur
Copy link
Member Author

@nwf I rebased my branch - now it should be fine. It's still rough draft because I'm sure that at least Lua API part of code could be simplified even more.

@NicolSpies
Copy link

@galjonsfigur, @vsky279 , I apologize for only getting to test this now. The feedback is as follows:

The previous test method detailed in #2996 was used:

  • I am using a digital scope to send RS485 data requests and replies via the softuart module using a MAX3485 RS485 tranceiver. The requests consists of 8 bytes and replies of 7 bytes, respectively.
  • The the transmitted and received data are confirmed using the scope and compared to the raw data transmitted / received by the softuart.

Test observations:

  • Data transmission does not show any problems.
  • The timing issue, detailed in Fixes to softuart module #2996 related to decoding of received bytes, is still present at a baud rate of 115200. The problem still manifests where the decoding of every second byte is incorrect due to the inclusion of the start bit of the 8N1 byte received as part of the 8 bit data. The first byte received is always decoded correctly, the second incorrectly etc.
  • Like reported before, if the baud rate is slowed to 57600, the decoding is fine and the start bit is never seen as part of the 8 bit data.
  • The tests were extended by sequentially transmitting and receiving 13 different requests and replies at a baud rate of 57600 .
  • The first reply of 7 bytes decodes correctly matching what is measured on the scope.
  • In all subsequent of the 13 replies received, an empty frame (0x00), is inserted in front of the data received.
  • All the tests and results have been confirmed to consistently provide the same results.

@galjonsfigur
Copy link
Member Author

@NicolSpies Thanks for extensive tests!
The best way to make data receive more robust would be moving to the approach similar to Arduino SoftwareSerial - having two interrupt handlers - one for per-bit data capture for lower baudrates and one for whole data frame capture for fast (>76800baud) transmissions (so the similar one softuart uses currently). Can you try swapping this fragment:

			// Wait till start bit is half over so we can sample the next one in the center
			if (elapsed_time < s->bit_time / 2) {
				uint16_t wait_time = s->bit_time / 2 - elapsed_time;
				while ((uint32_t)(asm_ccount() - start_time) < wait_time);
				start_time += wait_time;
			}

to

			// Wait till start bit is half over so we can sample the next one in the center
			uint16_t wait_time = (s->bit_time / 2) - elapsed_time;
			while ((uint32_t)(asm_ccount() - start_time) < wait_time);
			start_time += wait_time;
		

and report results? I will try to rewrite this ISR because now it has problems both in very low and very high baud rates and work on it in the near future but for now I don't have equipment at hand.

@NicolSpies
Copy link

@galjonsfigur, the swapped code fragment displays the same problem as before where the frame start bit is taken as the LSB of the data received in every second frame received at a baud rate of 115200. Like before at slower baud rates it does not occur.

The insertion of the empty frame before the received data still happens as described before at a baud rate of 57600. At slower baud rates the effect is bigger with 2 or 3 empty frames inserted in front of the received data.

@galjonsfigur
Copy link
Member Author

@NicolSpies Thanks for the feedback!
I will try to just rewrite and test this ISR because this one is just too big function for an ISR and not only it takes too many instructions to execute but also on Xtensa architecture there is certain interrupt latency, so every instruction counts.

@galjonsfigur
Copy link
Member Author

I can't get consistent testing environment for the RX module at the moment, but I thought I can be a good idea to make ISR a little smaller by moving part of the code to another function that gets scheduled by ISR with medium priority instead of low. Also now softuart object should de-register it's ISR when the object is GC-ed.
Again - I don't think I have enough skill to make this module as good as espsoftwareserial project that has support for inverted transmission, various parity modes and data frame length and using very interesting way of reading bits from frames.

@NicolSpies
Copy link

@galjonsfigur, 👍 , I will test tomorrow.

@NicolSpies
Copy link

@galjonsfigur , the latest change breaks the data received completely. The data is still transmitted correctly.
Before this change the module did not de-register, with the latest change the module appears to de-register.

I propose rolling back to the code before the latest changes but including the de-registration functionality. That would would enable me to run automated tests to verify the issues observed over many tests.

@galjonsfigur
Copy link
Member Author

I made a rollback of the ISR code - it should be just as before. It's really tricky to get this right - I will be looking for a better way to get correct timings and to get the ISR as small as possible.

@NicolSpies
Copy link

@galjonsfigur , I have tested and have feedback maybe we need to do the feedback offline not to clutter the list. My email is in my github profile.

@NicolSpies
Copy link

@galjonsfigur , the rollback performance is as described before with some improvement allowing the module to be also usable when node.setcpufreq(node.CPU.160MHZ) is used. Like before the first frame decode after module initialization is always correct. In most of the following frames a empty packet (0x00) is added, now and again two empty packets are added and sometimes a packet decodes correctly like the first packet. By using error checking on the received packets with re-request of data it is possible to consistently use the module as is.

The blocking problem is a memory leak. The softuart instance is not de-registering preventing the GC to free the instance memory.

@galjonsfigur
Copy link
Member Author

@NicolSpies 👍 Thanks for checking!
I'm still not sure why empty packets get into the buffer and I still cannot investigate it, but having not the most reliable UART is still better than not having any. Maybe someone with more knowledge on ESP8266 interrupts and C knowledge will notice something.

The blocking problem is a memory leak. The softuart instance is not de-registering preventing the GC to free the instance memory.

What do you mean exactly? Now because softuart_t struct is allocated as Lua userdata it should be automatically freed after calling softuart_gcdelete, where additional global values are zeroed. I haven't noticed any memory leak nor I can see any in the code itself. But I tested it only on Lua 5.3 and before calling node.heap() I called collectgarbage() to get clean results.

@NicolSpies
Copy link

@galjonsfigur , By using

r=debug.getregistry()
for k,v in pairs(r) do print(k,v) end

I have identified an extra function in registry on every iteration of my function using the softuart module.
By using r=debug.getregistry(); r[17]() where 17 is the function to verify, I have traced the extra function on every iteration to the sofuart.port:write() method not deregistering for garbage collection.

}
// Register callback
softuart_rx_cb_ref[softuart->pin_rx] = luaL_ref(L, LUA_REGISTRYINDEX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if() { unref } ref dance can be simplified to @TerryE's new luaL_reref function:

luaL_reref(L, LUA_REGISTRYINDEX, &softuart_rx_cb_ref[softuart->pin_rx])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nwf 👍 Thanks for looking into it - I added this change and unref2 macro in the newest commit.

Copy link

@NicolSpies NicolSpies Jul 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@galjonsfigur , I apologize, no leak in the module but in my code, a local timer for some reason does not de-register, a simple tmr=nil did the trick. Thanks for helping.

@NicolSpies
Copy link

@galjonsfigur, update regarding testing of the empty frame issue reported.

The problem is not present, no empty frames, when changing only the ESP8266 module and keeping everything else consistent.
This might point to the softuart timing somehow dependent on ESP module hardware.
I will confirm this going forward testing using different modules.

@marcelstoer marcelstoer requested a review from nwf September 1, 2020 20:01
Copy link
Member

@nwf nwf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some very minor nits, but I'm not in a position to test this.

volatile softuart_buffer_t buffer;
uint16_t bit_time;
uint16_t need_len; // Buffer length needed to run callback function
char end_char; // Used to run callback if last char in buffer will be the same
uint8_t armed;
uint8_t pin_rx;
uint8_t pin_tx;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were these moved? Does it reduce padding within the structure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the intent - I'm not sure if this would help, but having the buffer as a first element of the struct should make access to it a bit easier and maybe instruction or two faster - but I haven't tested it.

docs/modules/softuart.md Show resolved Hide resolved
@galjonsfigur
Copy link
Member Author

@NicolSpies This is very interesting - this would mean that some ESP modules have greater interrupt latency than the others. I'm not really sure what I could do about it - I will look into some other ISR tricks that other authors have been using to make bit-banged transmission more reliable, but I didn't expect that it would differ so much depending on the hardware. Will look into that - but maybe not in this PR.

@NicolSpies
Copy link

@galjonsfigur, since posting I have confirmed it on 3 units, from the same production batch. Another possibility is that the original unit used for testing is suspect.

@marcelstoer
Copy link
Member

As this has gone through many iterations and heavy testing it should now be ready for prime -> merging

@marcelstoer marcelstoer merged commit e7620b0 into nodemcu:dev Sep 4, 2020
vsky279 pushed a commit to vsky279/nodemcu-firmware that referenced this pull request Nov 21, 2020
* SoftUART fixes:

- Simplify code by using lua_L* functions and using userdata properly
- Fix some edge-cases
- Add more examples to documentation

* Don't de-register interrupt hook if there is more RX instances

* More bug fixes and registering simplification with luaL_reref and unref2

* Correct documentation of SoftUART module
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants