Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read serial line in chunks instead of character by character #164

Draft
wants to merge 2 commits into
base: ros2
Choose a base branch
from

Conversation

wentasah
Copy link

@wentasah wentasah commented Apr 1, 2023

This significantly reduces CPU load caused by this driver.

Without this change, CPU load on a powerful x86 machine with a USB-connected GPS is 36%. The output of the strace command looks like this:

...
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160})
read(33, "$", 1)                        = 1
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160})
read(33, "G", 1)                        = 1
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160})
read(33, "P", 1)                        = 1
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997180})
read(33, "R", 1)                        = 1
...

It can be seen that characters are read from the serial line one by one. This is due to how IOBase.readline() interacts with pySerial.

In this commit, we implement the same functionality as IOBase.readline(), but in a way allowing to read the serial line characters in bigger chunks. With it, CPU load decreases to 7.5% and strace reports this:

...
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=926286000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=906510472})
read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 422) = 143
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=906287000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=886945652})
read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 279) = 143
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=886721000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=866882570})
read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 136) = 136
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999996000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999991549})
read(33, ",,*57\r\n", 1024)             = 7
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999827000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=985949688})
read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 1017) = 143
pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=985746000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=966235438})
read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 874) = 143
...

Reading from the kernel happens most of the time in chunks of 143 bytes (for our GPS) and therefore, the system call overhead is 143× lower.

Tested on Septentrio mosaic-H GPS.

@wentasah wentasah force-pushed the fast-serial-read branch 3 times, most recently from 95edd26 to 239d727 Compare April 4, 2023 13:47
@wentasah wentasah marked this pull request as ready for review April 4, 2023 13:49
@evenator evenator self-assigned this Apr 7, 2023
@evenator evenator added enhancement Fix in Review A fix has been created. Awaiting original reporter's review. labels Apr 7, 2023
This significantly reduces CPU load caused by this driver.

Without this change, CPU load on a powerful x86 machine with a
USB-connected GPS is 36%. The output of the strace command looks
like this:

    ...
    pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160})
    read(33, "$", 1)                        = 1
    pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160})
    read(33, "G", 1)                        = 1
    pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997160})
    read(33, "P", 1)                        = 1
    pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999999000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999997180})
    read(33, "R", 1)                        = 1
    ...

It can be seen that characters are read from the serial line one by
one. This is due to how IOBase.readline() interacts with pySerial.

In this commit, we implement the same functionality as
IOBase.readline(), but in a way allowing to read the serial line
characters in bigger chunks. With it, CPU load decreases to 7.5% and
strace reports this:

    ...
    pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=926286000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=906510472})
    read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 422) = 143
    pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=906287000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=886945652})
    read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 279) = 143
    pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=886721000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=866882570})
    read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 136) = 136
    pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999996000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=999991549})
    read(33, ",,*57\r\n", 1024)             = 7
    pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=999827000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=985949688})
    read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 1017) = 143
    pselect6(35, [33 34], [], [], {tv_sec=1, tv_nsec=985746000}, NULL) = 1 (in [33], left {tv_sec=1, tv_nsec=966235438})
    read(33, "$GPRMC,,V,,,,,,,,,,N*53\r\n$GPGGA,"..., 874) = 143
    ...

Reading from the kernel happens most of the time in chunks of 143
bytes (for our GPS) and therefore, the system call overhead is 143×
lower.

Tested on Septentrio mosaic-H GPS.
@evenator
Copy link
Collaborator

I'm testing this locally and seeing some pretty problematic behavior due to the interaction of the timeout (2 seconds) and the buffer length (1024 char) for my device. The call to read() will always wait for 1024 characters or the full two seconds, which can hold several complete sentences in the buffer, introducing a lot of latency (i.e. up to 2 seconds).

Would GPS.read_until(size=1024) achieve the same performance improvement while avoiding the buffering latency?

@evenator evenator assigned wentasah and unassigned evenator Apr 19, 2023
This commit copies the implementation of seria.Serial.read() and
removes one line from it: "if timeout.expired():". With this change,
the function provides classical POSIX semantics of read(), which
returns as soon as some bytes are received. The implementation in
pySerial either waits for the full buffer or timeout expiration -
nothing in between. This means that with pySerial, you either pay big
overherd by reading character-by-character or have increased latency
due to waiting.

Unfortunately, there is no way to fix this in pySerial without
changing the semantics and possibly breaking some applications or
introducing a fixed function with a different name. Also, it seems
that pySerial is no loger maintained (last commit 16 months ago, PRs
without maintainer responses).

It would be much simpler not to use pySerial and use /dev/ttyXXX
directly. This project does not need the features offered by pySerial
(mainly abort read from another thread). The only problem of not using
pySerial would be lower portability.
@wentasah
Copy link
Author

Unfortunately, GPS.read_until would have the same overhead. I tried to come up with a better way of using pySerial API and there is no way how to use it without either overhead or latency.

In the new commit, I copied the implementation of read() from pySerial and fixed the problem by removing the line. With it, the both overhead and latency is low. I copy my commit message here:

This commit copies the implementation of seria.Serial.read() and removes one line from it: "if timeout.expired():". With this change, the function provides classical POSIX semantics of read(), which returns as soon as some bytes are received. The implementation in pySerial either waits for the full buffer or timeout expiration - nothing in between. This means that with pySerial, you either pay big overherd by reading character-by-character or have increased latency due to waiting.

Unfortunately, there is no way to fix this in pySerial without changing the semantics and possibly breaking some applications or introducing a fixed function with a different name. Also, it seems that pySerial is no loger maintained (last commit 16 months ago, PRs without maintainer responses).

It would be much simpler not to use pySerial and use /dev/ttyXXX directly. This project does not need the features offered by pySerial (mainly abort read from another thread). The only problem of not using pySerial would be lower portability.

What do you think? Do you want to use pySerial to stay compatible with many OSes? If you care only about Linux, I'd suggest getting rid of pySerial. I'll provide the implementation.

@ggaessler
Copy link
Contributor

ggaessler commented Apr 19, 2023

Wouldn't a good and simple solution be to just set the timeout to zero and the read size to something like the proposed 1024? That should be fine for performance and latency. That's also the way I used it for quite some years by now without any problems:
https://github.com/boschresearch/nmea_navsat_driver/blob/master/src/libnmea_navsat_driver/nodes/nmea_serial_driver.py#L177-L186
EDIT: The line separation would still be needed. I don't need it because I anyway process mixed binary and ASCII data from the receiver since I also enabled proprietary messages.

@wentasah
Copy link
Author

I think that this way, you would end up busy waiting for the serial line. The CPU load would always be 100%. At least this is what I saw when digging through the source code. Is your experience different?

@evenator
Copy link
Collaborator

evenator commented Apr 19, 2023 via email

@ggaessler
Copy link
Contributor

Sorry my changes were a long time back and I didn't check the full code again. Yes, there indeed was some additional sleep for a time short in GNSS, but long in CPU terms, by default 10 ms, just as @evenator proposed.
I've never tried setting the timeout parameter that way, but that could be the cleaner solution compared to my timeout zero plus sleep.

@wentasah wentasah marked this pull request as draft July 9, 2023 18:48
@evenator
Copy link
Collaborator

@wentasah Are you still working on this enhancement, or should I close the PR as abandoned?

@wentasah
Copy link
Author

wentasah commented Dec 29, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Fix in Review A fix has been created. Awaiting original reporter's review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants