Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regex pseudo converter failing misbehaving? #97

Closed
hinxx opened this issue Mar 7, 2024 · 15 comments
Closed

regex pseudo converter failing misbehaving? #97

hinxx opened this issue Mar 7, 2024 · 15 comments

Comments

@hinxx
Copy link

hinxx commented Mar 7, 2024

I have a device sending an ASCII string with fields of different types (strings, chars, decimal integer, hex integer,..), delimited by ':'. I want to parse it out into several records. A sample input might look like this: 66:+44:255:0x122:1:R:190:A03. FWIW, values '66', '+44' and '190' may change from one message to another; and take up 1 - 5 chars each.

If I do not use regex pseudo converter to convert 1 char value to a string all looks fine with this protocol:

in "%*i:%(\$1)d:%*i:0x%(\$2)03x:%(\$3)d:%(\$4)c:%(\$5)i:%(\$6)s";

$4 is setting a value of a stringin record with a single char.

Then I discovered regex pseudo converter and hoped I could convert a single char into more representative string and set a value of mbbi record, instead.
Here is the protocol that does that:

in "%#/:R:/:Idle:/%#/:M:/:Ramping:/%#/:F:/:Fault:/%*i:%(\$1)d:%*i:0x%(\$2)03x:%(\$3)d:%(\$4)[a-zA-Z0-9]:%(\$5)i:%(\$6)s";

It works .. most of the time (no errors reported), but quite often I do see an error:

2024/03/07 10:39:16.783738 A03 Status: Input "A03" does not match format "%s"

Status PV is being processed every 2 s and sometimes the error happens and sometimes not. The error is always about the last token that needs to be parsed.

I've attached two dumps of debug messages, first one for successfully parsed string and the second one for the failed.
stream-parsing.txt

From what I can tell the buffer that regex touches seems to overflow (sometimes?). For example this would be reported in debug for failing case:

buffer after: A03:+11:255:0x122:1:F:12585:A033<00><00><00><00><00><00><00><00>

while this is seen in case of success:

buffer after: A03<00><00><00><00><00><00><00><00>
@dirk-zimoch
Copy link
Member

If you simply want to have more meaningful strings in an mbbi record, I suggest to use the %{} enum converter. It reads (the ugly) strings, and converts them to integers. Then the mbbi can take that integer and convert it to "nice" strings.
E.g. %{R|M|F} together with an mbbi with ZRST="Idle", ONST="Ramping" and TWST="Fault" should give you the conversion you need.
When using regular expression substitution, keep in mind that it converts the whole rest of the input. That means, if you have :R: anywhere in your input, it will be replaced with :Idle:. Make sure that this has no unintended side effects.
The error confuses me, because "A03" should match %s. I will have a closer look at the debug output...

@dirk-zimoch
Copy link
Member

The buffer you see is the 40 char string that should store the value "A03" before passing it to the record with dbPutField.
It looks as if during conversion new data is appended.
After the regex substitutions, the input buffer was 0:+11:255:0x122:1:Fault:12585:A03, which looks correct. Then the initial 7 conversions have been successful: %*i ignores the 0, %d reads the +11, %*i ignores the 255, %03x reads the 0x122 (which looks like a bug: It should read only 3 characters: '0x1'), %d reads the 1, %[a-zA-Z0-9] reads the Fault and finally %d reads the 12585.
Finally, %s should read the A03. However it seems that meanwhile new data has been added to the input buffer. Now %s reads A03:+11:255:0x122:1:F:12585:A033, i.e. the remaining A03 plus new data (If it was new input, it should start with 0:+11. Where did the initial 0 go? And where does the final 3 come from?). At the end, %s consumes 32 characters but was originally called when only 3 characters were left. That inconsistency causes the error.

I need to check what happens here. It does not seem to be new input, because I see no second readHandlercall in between. Maybe something wrong happens with the buffer?

My string buffers work like a circular buffer. New data is appended at the end and consumed data simply moves the start index without moving actual content. Only when the buffer would overflow when new data is appended, remaining valid content is copied from the end to the beginning. It seems that it is not properly terminated here but merges with old content. I need to check this. Maybe related to #96.

I need to dig through my code.
Anyway, thanks for spotting and reporting this!

@hinxx
Copy link
Author

hinxx commented Mar 7, 2024

Thanks for the input! I'll look at the enum converter.
I did a quick test by doing regex pseudo conversion from R|M|F into X|Y|Z (aka single char conversion) and I do not see any errors.

When using regular expression substitution, keep in mind that it converts the whole rest of the input.

Right, I did notice that hence the :<char>: in the regex. I also know what can be in the whole string so regex part is solid, IMHO.

However it seems that meanwhile new data has been added to the input buffer.

My guess is that it is just some junk at the end of the buffer that was there before. The Status is processed at a very slow rate (2 s) and I would not expect new string to arrive until this one was processed. In fact I think I can provoke this behavior even by doing manual / passive scan.
I'll need to look in more closely what follows the A03, but I can tell right now it is not always an ASCII string (ie. I've seen nonASCII chars (and not \0 following the A03 which makes me think would break %s.

It seems that it is not properly terminated here but merges with old content.

Sounds about right. Been digging a little bit myself..

@dirk-zimoch
Copy link
Member

Maybe StreamBuffer::replace has a bug if the inserted string is longer than the removed one. I am checking this right now...

@dirk-zimoch
Copy link
Member

Can't find anything wrong. So this may take a while...

@hinxx
Copy link
Author

hinxx commented Mar 7, 2024

As demonstrated in the PR #98 I think there are two issues at play, AFAICT. The PR is too dirty for the merge and I think you can come up with better/cleaner more complete fix.

@dirk-zimoch
Copy link
Member

Thanks a lot! I will have a look.

@dirk-zimoch
Copy link
Member

I think I found the actual bug.

if (newlen+offs<=cap)

This needs to be a < instead of <=.
A <= may not leave the terminating 0x00 intact if the new content would just fit the buffer capacity, because the terminating 0x00 is not included in len/newlen.
Can you test if that fixes your problem?
(And the following // comment should be in the else block. I will correct that, too.)
I will push an update once you confirm that is fixes the problem.
Thank you for your debugging and analysis which lead me to this!

@dirk-zimoch
Copy link
Member

dirk-zimoch commented Mar 19, 2024

Found another, related bug in

if (newlen<len) memset(buffer+newlen, 0, len-newlen);

This needs to use len+offs instead of len because content is moved from offs to 0.
This may be the actual bug that you have seen. If content grows (newlen>len) and is moved to buffer start, there is still old content to clear up to (old) offs+len if newlen is shorter than off+len.

@hinxx
Copy link
Author

hinxx commented Mar 19, 2024

This needs to use len+offs instead of len because content is moved from offs to 0.

You mean like this?

if (newlen<len) memset(buffer+newlen, 0, len+offs-newlen);

@dirk-zimoch
Copy link
Member

if (newlen<len+offs) memset(buffer+newlen, 0, len+offs-newlen);

@dirk-zimoch
Copy link
Member

How the buffer looks like before and after:

|---junk---|##content_start##|/////////remove_this////////|##content_end##|0000|
|<- offs ->|<-- remstart --->|<--------- remlen --------->|               |    |
|          |<--------------------- len ---------------------------------->|    |
0         offs           offs+remstart                offs+remend    offs+len cap

                                                                  |<clear>|
|##content_start##|++++++++inserted_text++++++++++|##content_end##|0000000|0000|
|<--------------------- newlen ---------------------------------->|       |

@hinxx
Copy link
Author

hinxx commented Mar 19, 2024

Got it. Changed and running the test. Preliminary results look good, let me run this for a couple of hours.

@hinxx
Copy link
Author

hinxx commented Mar 20, 2024

I've been with running the fix over the night and I can confirm that the above two changes solve the issue for me.
Thank you!

@dirk-zimoch
Copy link
Member

Fixed in commit 1496089 tagged 2.8.25.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants