Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoRaWAN unprocessable messages also contain unknown messages #3520

Open
BobClaerhout opened this issue Jul 25, 2023 · 6 comments
Open

LoRaWAN unprocessable messages also contain unknown messages #3520

BobClaerhout opened this issue Jul 25, 2023 · 6 comments

Comments

@BobClaerhout
Copy link
Contributor

BobClaerhout commented Jul 25, 2023

The metric hono_telemetry_payload_bytes_count contains a status field. This field is either forwarded, unprocessable or undeliverable. We added prometheus alerts to follow up unprocessable messages in the lora adapter. However, in the lora adapter we are getting status unprocessable when the LNS is providing a message that is not an uplink message. This is completely different from an actual uplink message that is unprocessable because of any other unexpected exception. For this we would like to add an additional status: unknown, indicating that the message we received is an unknown message and thus not implemented which is fundamentally different from unrpocessable messages.

BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Jul 25, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Jul 25, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Jul 25, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Jul 25, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Jul 25, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Jul 25, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
@sophokles73
Copy link
Contributor

sophokles73 commented Jul 25, 2023

I would argue that there might be different reasons for why a message cannot be processed. The message might not adhere to the supported/expected syntax or there might be a problem while processing the message due to e.g. a failing back end service. In both cases the messages are unprocessable. So, what we are talking about here seems to be having a way to also indicate the reason why a message could not be processed.

Having a fourth status unknown does not really serve this purpose FMPOV as it seems to suggest that the message does not fall into the unprocessable category (which it still does). Also, changing the semantics of the unprocessable status in this way may lead to existing alarms failing to detect this situation, if the corresponding messages no longer are being tagged as unprocessable.

Another less intrusive solution might be to add another tag for indicating the cause explicitly. This tag could then have values like bad-syntax, unsupported etc. while the status tag would still have value unprocessable

WDYT?

@BobClaerhout
Copy link
Contributor Author

I agree with your reasoning.

I would suggest reason to be the additional tag.
Furthermore, I would suggest following values:

  • bad-syntax: json could not be converted
  • unknown-type: type of the message could not be deducted
  • unsupported-type: type of the message is not supported for uplink
  • unknown: reason could not be deducted from the error

BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Jul 25, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Jul 25, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Jul 25, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Jul 25, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Jul 25, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
@BobClaerhout
Copy link
Contributor Author

@sophokles73
Copy link
Contributor

We should also consider whether we only want to introduce the reason tag to the metrics reported by the Lora adapter or if this is relevant to all adapters. IMHO this is mostly relevant to the Lora adapter at the moment because it is the only adapter that actually inspects/parses the message payload. All other adapters simply forward the payload as-is.

WDYT?
@calohmn any opinion on this?

@calohmn
Copy link
Contributor

calohmn commented Aug 12, 2023

The implementation in #3519 already adds the reason tag for reportTelemetry invocations of other adapters as well, and I think that makes sense. I think it would also be useful to extend the reportCommand method accordingly (in a separate PR).

BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Aug 24, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Aug 24, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
BobClaerhout added a commit to BobClaerhout/hono that referenced this issue Sep 12, 2023
Signed-off-by: Bob Claerhout <claerhout.bob@gmail.com>
@sophokles73
Copy link
Contributor

sophokles73 commented Sep 15, 2023

@BobClaerhout
I still do not like the idea of putting concrete reasons for (adapter specific) failures into the general metrics tags. All of the information you want to put into the reason tag is available from the traces. Many of the reasons seem to have meaning only in the context of the Lora adapter (bad-syntax, unknown-type, unsupported-type) because Hono itself is payload agnostic while for the others I have a hard time seeing why I would need to count their occurrences. Is this because you want to detect during runtime that some devices have not properly been implemented?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants