fix: return erring response if Lightpush content topic is empty #2083

Merged
merged 1 commit into from
Oct 9, 2023

Conversation

s-tikhomirov
Contributor

Description

The Lightpush RFC prohibits empty content topics. The implementation did not reflect this. This PR fixes this: the server now checks if the content topic is empty, and if it is, sends an erring response to the client (no interaction with the P2P network occurs). A new test is added to test this behavior.

Note: this PR only considers the "empty content topic" erring case. There are other erring cases (e.g., decoding error), for which the server does not return a response. I think these are to be addressed in a separate issue / branch.

P.S. I wanted to put a link to the RFC that says that content topic must not be empty, but cannot find it (apparently, it's not 19/WAKU2-LIGHTPUSH). Can anyone point to the spec that specifies this restriction?

Changes

  • in Lightpush protocol: check if the content topic is empty and return an erring response if so
  • in Lightpush tests: add a test that tests this behavior (existing tests are not modified).

How to test

./env.sh bash
nim c -r ./tests/test_waku_lightpush.nim

Issue

closes #1641

@github-actions

github-actions bot commented Sep 26, 2023

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2083

Built from c1515c6

Contributor

@SionoiS SionoiS left a comment

LGTM

Thank you!

@s-tikhomirov
Contributor Author

I see that two CI tests are failing...

2023-09-26T15:23:36.7503236Z   1) Waku Light Push [node only] - custom pubsub topic
2023-09-26T15:23:36.7503689Z        Push message:
2023-09-26T15:23:36.7504342Z      TypeError: Cannot read properties of undefined (reading 'toString')
2023-09-26T15:23:36.7505745Z       at Context.<anonymous> (file:///home/runner/work/nwaku/nwaku/packages/tests/tests/light-push/custom_pubsub.node.spec.ts:43:39)
2023-09-26T15:23:36.7510507Z       at processTicksAndRejections (node:internal/process/task_queues:95:5)
2023-09-26T15:23:36.7511700Z 
2023-09-26T15:23:36.7512046Z   2) Waku Light Push [node only]
2023-09-26T15:23:36.7512614Z        "before each" hook for "Push message with short payload":
2023-09-26T15:23:36.7513932Z      Error: Timeout of 15000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/home/runner/work/nwaku/nwaku/packages/tests/tests/light-push/index.spec.ts)
2023-09-26T15:23:36.7514733Z       at listOnTimeout (node:internal/timers:569:17)
2023-09-26T15:23:36.7515243Z       at processTimers (node:internal/timers:512:7)

The second error may be connected to the fact that in the test that I've added ("request with an empty content topic should fail") I got rid of asynchronicity because a) it caused some other issue; b) it's not really needed. In this case, there is no "the server pushes a message to the network, let's wait what comes back" scenario. The server rejects the request before interacting with the Relay network. Hence, my test has no handlerFuture.

I'm not sure how to explain the first failing test.

Contributor

@jm-clius jm-clius left a comment

Thanks! I think the approach here of trying to have a default response is the right one. I'm requesting changes because the bug is about an unset field and an RPC decoding failure (not about the content topic being an "empty string"). My suggestion is to first try to trigger this error condition in the unit test and then make sure that these failure cases also lead to a properly constructed response with an error message. The same condition is triggered when the WakuMessage is encoded in any other non-supported way and can't be decoded, as reported in #2059. This PR should also close that issue.

Comment on lines 62 to 65
if contentTopic == "":
  response = PushResponse(is_success: false, info: some(emptyContentTopicFailure))
  waku_lightpush_errors.inc(labelValues = [emptyContentTopicFailure])
  error "empty content topic", error=emptyContentTopicFailure
Contributor

I don't quite interpret the bug in #1641 in this way. The issue is not that the content topic is an "empty string"; it's that the field is unset in the protocol buffer, resulting in a wrongly encoded RPC that fails to decode. The same issue states:

Send a light push rpc request that contains a missing protobuf field such as content topic

So it's not related to the content topic specifically, but any missing field that causes an rpc decode failure. Since we still return in lines 45 and 51 on invalid RPC (protobufs), afaict the original issue won't be fixed.

Contributor Author

Thank you, now I think I understand the issue properly!

If we want to return a PushResponse after a decoding failure, what should its requestId be set to? Normally, it is copied from the corresponding field in PushRequest, but if the request cannot be decoded properly, we can't count on that. Can we just omit the requestId field in that case? It is not wrapped in Option in the definition of PushRPC... Or shall we use some default value?

Contributor

Mmm. This is a good point. I would actually guess that in almost all cases decoding fails because of an inner field (either PushRequest or WakuMessage) failing to decode, rather than the outer PushRPC. In other words, it may be possible to change to a two-stage decoding process where the requestId can be extracted before attempting to decode the encapsulated request.

However, for now this may be overkill and the best option is probably to initialise the requestId as an empty string.
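
A minimal sketch of that fallback, assuming simplified stand-in types: the PushRPC and PushResponse below are illustrative rather than the nwaku definitions, errorRpc is a hypothetical helper, and "decode_rpc_failure" is a made-up error label. When no requestId could be recovered, the reply carries an empty one but is otherwise well-formed.

```nim
import std/options

type
  # Simplified stand-ins, not the real nwaku types.
  PushResponse = object
    isSuccess: bool
    info: Option[string]
  PushRPC = object
    requestId: string
    response: Option[PushResponse]

proc errorRpc(requestId: Option[string], reason: string): PushRPC =
  ## Builds a well-formed error reply. Falls back to "" when the
  ## requestId could not be extracted from the undecodable request.
  PushRPC(
    requestId: requestId.get(""),
    response: some(PushResponse(isSuccess: false, info: some(reason))))

# Hypothetical error label; the request could not be decoded at all.
let reply = errorRpc(none(string), "decode_rpc_failure")
echo reply.requestId.len
```

A client that matches replies by stream rather than by requestId could presumably still read the error from such a reply.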

@fryorcraken @danisharora099 since the PushRPC and PushRequest fail to decode properly in this bug, it's difficult to extract a requestId. Would a decent solution (from js-waku's perspective) be to return a well-formed response, with error code, but with the requestId set to an empty string?

Collaborator

@fryorcraken @danisharora099 since the PushRPC and PushRequest fail to decode properly in this bug, it's difficult to extract a requestId. Would a decent solution (from js-waku's perspective) be to return a well-formed response, with error code, but with the requestId set to an empty string?

This is a fair solution IMO.

We currently do not use the requestId; we are able to assume the response is on the same multiplexed stream as the request (or at least it translates this way in the js-libp2p API).

client = newTestWakuLightpushClient(clientSwitch)
serverPeerId = serverSwitch.peerInfo.toRemotePeerInfo()
message = fakeWakuMessage(contentTopic="")
Contributor

The bug is that an empty response is received. It is not about the content topic being an empty string, but about the WakuMessage and PushRPC being constructed without this field, so that decoding fails.

@jm-clius
Contributor

P.S. I wanted to put a link to the RFC that says that content topic must not be empty

Ah, I think it's not up to the lightpush service node to interpret this. In other words, it's not about content topics, but about properly constructed protocol buffers and fields. The issue is that the message being published MUST be a well-formed WakuMessage (as in https://rfc.vac.dev/spec/14/). If for any reason this message field (or other field in the PushRequest) fails to decode properly, we should return a proper response.

@s-tikhomirov
Contributor Author

s-tikhomirov commented Sep 28, 2023

Thinking more about this, I feel that I don't grasp some fundamentals of what's happening here.

What exactly does it mean to "decode" a message? In particular, how can the following line not fail, if there is no, say, contentTopic inside message?

message = req.request.get().message

After all, the left-hand-side message is type-inferred to WakuMessage, and WakuMessage must have a content topic by definition... How come a statically typed language allows us to put a value into a variable that is not guaranteed to be compatible with that variable's type?

UPD: More precisely, I don't understand why the following code compiles:

message = WakuMessage(
  payload: newSeq[byte](),
  #contentTopic: DefaultContentTopic,
  meta: newSeq[byte](),
  version: 2,
  timestamp: now(),
  ephemeral: false
)

Now I understand it; see my next comment.

@s-tikhomirov
Contributor Author

s-tikhomirov commented Sep 28, 2023

Experimenting a bit more, I am not sure I understand how to reproduce the behavior in question. AFAICT, Nim assigns default values to object fields if they are not provided. I modified a playground example with a simplified version of WakuMessage (live playground):

import strformat

type WakuMessage = object
  contentTopic: string
  ephemeral: bool

let messages = [
  WakuMessage(contentTopic: "SomeTopic", ephemeral: true),
  WakuMessage(contentTopic: "DefaultEphemeral"),
  WakuMessage()
]

for message in messages:
  echo(fmt"Message for content topic {message.contentTopic} is ephemeral: {message.ephemeral}")

The result is:

Message for content topic SomeTopic is ephemeral: true
Message for content topic DefaultEphemeral is ephemeral: false
Message for content topic  is ephemeral: false

My point is: how do I generate a WakuMessage with a missing field (which is not the same as a field equal to an empty string)? If it is not possible, then how should I reproduce the behavior we're trying to fix?

@jm-clius
Contributor

jm-clius commented Sep 28, 2023

The problem here is not related to any fields "missing" inside a Nim object - as you say, during construction of the object default values will be filled into each field regardless if initialised or not (sidenote is that ref types can indeed be nil, but this is not relevant to this issue). The issue is about reading serialised bytes from the wire (as received from a remote client) that must adhere to a protocol buffer specification and attempting to parse these with our protocol buffer deserialisation (or "decoding") into a corresponding Nim object. We explicitly ensure that these (protobuf) fields adhere to specification when decoding before the Nim object (e.g. WakuMessage) is fully instantiated. If during this parsing process we encounter a violation, we explicitly raise an error and the object does not get instantiated.

For example, this line:

let reqDecodeRes = PushRPC.decode(buffer)

attempts to decode the raw bytes from the wire according to the protobuf definition in https://rfc.vac.dev/spec/19/#payloads.
This in turn attempts to decode raw bytes into WakuMessage using this codec:
proc decode*(T: type WakuMessage, buffer: seq[byte]): ProtobufResult[T] =

However, if WakuMessage does not contain a content topic, the decoding stops and returns an error:
if not ?pb.getField(2, topic):
  return err(ProtobufError.missingRequiredField("content_topic"))
else:
  msg.contentTopic = topic
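
To illustrate why an absent field differs from an empty string on the wire, here is a rough sketch that scans a buffer for field 2 using only a small subset of the protobuf wire format (length-delimited fields with single-byte tags and lengths). findContentTopic and requireContentTopic are hypothetical helpers, not the nwaku codec, and field 1 merely stands in for a payload.

```nim
import std/options

# Tag byte for field number 2 with wire type 2 (length-delimited): 0x12.
const contentTopicTag = byte((2 shl 3) or 2)

proc findContentTopic(buf: seq[byte]): Option[string] =
  ## Returns some(value) if field 2 is present (even when empty),
  ## none(string) if it is absent or the buffer is malformed.
  var i = 0
  while i + 1 < buf.len:
    let tag = buf[i]
    let size = int(buf[i + 1])
    let start = i + 2
    # Only single-byte tags and lengths of length-delimited fields
    # are understood by this sketch.
    if tag > 127'u8 or (tag and 0x07'u8) != 2'u8:
      return none(string)
    if size > 127 or start + size > buf.len:
      return none(string)
    if tag == contentTopicTag:
      var value = newString(size)
      for j in 0 ..< size:
        value[j] = char(buf[start + j])
      return some(value)
    i = start + size
  none(string)

proc requireContentTopic(buf: seq[byte]): string =
  ## Absence is an error; an empty string is not.
  if findContentTopic(buf).isSome(): "ok"
  else: "missing required field: content_topic"

let withEmptyTopic = @[0x0A'u8, 0x01, 0xFF, 0x12, 0x00]  # field 1 + empty field 2
let withoutTopic = @[0x0A'u8, 0x01, 0xFF]                # field 1 only
echo requireContentTopic(withEmptyTopic)
echo requireContentTopic(withoutTopic)
```

In this sketch the first buffer passes the check even though its content topic is empty, while the second is rejected because field 2 never appears in the bytes.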

True enough, this is not easy to reproduce: creating a WakuMessage Nim object first and then serialising it to protobuf (to use as a test vector) always yields well-constructed, parseable protobuf bytes. One possible way of reproducing it would be to generate a set of raw bytes that should clearly fail to decode as a proper PushRPC (i.e. just garbage), make sure that it does fail to decode, and then prove that we can now send a proper response when a decoding failure occurs.

From the perspective of lightpush it's not really necessary to test that decoding fails for every type of spec violation (i.e. we do not have to test for missing content topic specifically - that should be tested with the rest of the WakuMessage codec tests). The important thing here is simply to test that if decoding fails (for whichever reason), we return a proper response.

@s-tikhomirov
Contributor Author

simply generate a set of raw bytes

Makes sense. It's not clear yet how to use these raw bytes. I can no longer use client.publish(), as passing raw bytes as message doesn't type-check, and rightly so:

let requestRes = await client.publish(topic, message, peer=serverPeerId)

Similarly, wl.sendPushRequest() expects a PushRequest and not raw bytes:

return await wl.sendPushRequest(pushRequest, peer)

Going down the call stack, it seems that here I can insert random bytes instead of rpc.encode().buffer:

await connection.writeLP(rpc.encode().buffer)

I'm unsure though, what is the proper way to bring this logic into the test? Copy-pasting the whole sendPushRequest from client.nim to the test with one minor modification (push raw bytes instead of a valid encoding) just feels wrong. Is there a better way?

@jm-clius
Contributor

Mmm. I see your point. I think this is a good indication that our code is not very unit testable (a sign that concerns aren't well separated).

I wouldn't test this from the client perspective, but rather try to test the lightpush (server-side) protocol directly. One way to do this is to extract the code inside the handler

https://github.com/waku-org/nwaku/blob/master/waku/waku_lightpush/protocol.nim#L39-L77

that returns a PushRPC response into a separate function, taking the raw bytes as argument and returning a PushRPC response. You could then simply unit test this new function. The benefit is that the logic will now be much better separated and maintainable. The handle call will then become something like:

proc handle(conn: Connection, proto: string) {.async, gcsafe, closure.} =
    let buffer = await conn.readLp(MaxRpcSize.int)
    let response = handleRequest(buffer)
    await conn.writeLp(response.encode().buffer)

with handleRequest being unit testable with any raw buffer.
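
As a rough illustration of why this split helps testing, the sketch below feeds raw bytes straight into a pure handleRequest. Everything here is an assumption for the sake of the example: the types are simplified stand-ins, decodeRequestId is a toy codec (not the nwaku protobuf decoder), the error label is made up, and the real function would likely be async and need access to the WakuLightPush instance in order to push the message.

```nim
import std/options

type
  # Simplified stand-ins, not the real nwaku types.
  PushResponse = object
    isSuccess: bool
    info: string
  PushRPC = object
    requestId: string
    response: PushResponse

proc decodeRequestId(buffer: seq[byte]): Option[string] =
  ## Toy codec: byte 0 is the requestId length, followed by that
  ## many bytes. Anything else fails to decode.
  if buffer.len == 0:
    return none(string)
  let idLen = int(buffer[0])
  if idLen == 0 or 1 + idLen > buffer.len:
    return none(string)
  var id = newString(idLen)
  for i in 0 ..< idLen:
    id[i] = char(buffer[1 + i])
  some(id)

proc handleRequest(buffer: seq[byte]): PushRPC =
  ## Raw bytes in, well-formed reply out, even when decoding fails.
  let decoded = decodeRequestId(buffer)
  if decoded.isNone():
    return PushRPC(
      requestId: "",
      response: PushResponse(isSuccess: false, info: "decode_rpc_failure"))
  PushRPC(
    requestId: decoded.get(),
    response: PushResponse(isSuccess: true, info: "OK"))

let garbage = @[0xFF'u8, 0x00, 0x13]        # claims a 255-byte id, has 2 bytes
let valid = @[2'u8, byte('i'), byte('d')]   # requestId "id"
echo handleRequest(garbage).response.info
echo handleRequest(valid).requestId
```

A unit test can then assert on the returned object directly, without a client, a switch, or a connection.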

WDYT of this option?

@s-tikhomirov
Contributor Author

Thank you @jm-clius for the tip, I agree with the approach.

taking the raw bytes as argument and returning a PushRPC response

It should also probably take a WakuLightPush as an argument, as producing a response implies, among other things, pushing a message into the network? (If decoding is successful, that is.)

Contributor

@SionoiS SionoiS left a comment

The logic is good, but here are some tips on style.

@s-tikhomirov
Contributor Author

minimize rightward drift

What would be the best way to reconcile this with avoiding early returns? My idea was that as handleRequest progresses with the checks, it sets the appropriate values for the error message, but the actual response object with that error message is constructed just once, in the end. I want something like: "if decoding fails, set error message to decoding error and skip other checks". Goto-like behavior, if you will. What's the Nim way to achieve this?
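
One way to get this "skip the remaining checks" behavior in Nim is a named block with break, which leaves a single place where the response is built. A minimal sketch, assuming a simplified PushResponse, hypothetical error strings, and boolean flags standing in for the real decode and push steps:

```nim
import std/options

type
  # Simplified stand-in, not the real nwaku type.
  PushResponse = object
    isSuccess: bool
    info: string

proc respond(decodedOk, pushOk: bool): PushResponse =
  ## Runs the checks in order, stops at the first failure, and
  ## constructs the response exactly once at the end.
  var errorMsg = none(string)
  block checks:
    if not decodedOk:
      errorMsg = some("decode_rpc_failure")    # hypothetical label
      break checks                             # skip the remaining checks
    if not pushOk:
      errorMsg = some("message_push_failure")  # hypothetical label
      break checks
  # Single exit point: logging and metrics could also go here.
  PushResponse(isSuccess: errorMsg.isNone(), info: errorMsg.get("OK"))

echo respond(decodedOk = false, pushOk = true).info
echo respond(decodedOk = true, pushOk = true).info
```

The checks stay at one indentation level, and the remembered error is available for logging before the response is assembled.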

Contributor

@jm-clius jm-clius left a comment

LGTM! Some minor comments and nitpicks below.

Resolved review threads:

  • waku/waku_lightpush/protocol_metrics.nim (outdated)
  • tests/test_waku_lightpush.nim (outdated)
  • waku/waku_lightpush/protocol.nim (outdated, three threads)

@SionoiS
Contributor

SionoiS commented Oct 3, 2023

minimize rightward drift

What would be the best way to reconcile this with avoiding early returns? My idea was that as handleRequest progresses with the checks, it sets the appropriate values for the error message, but the actual response object with that error message is constructed just once, in the end. I want something like: "if decoding fails, set error message to decoding error and skip other checks". Goto-like behavior, if you will. What's the Nim way to achieve this?

This avoids var response and is more compact. Also, no need for try/catch, PushMessageHandler returns an error.

let response =
  if contentTopic == "":
    error "empty content topic", error=emptyContentTopicFailure
    waku_lightpush_errors.inc(labelValues = [emptyContentTopicFailure])
    PushResponse(is_success: false, info: some(emptyContentTopicFailure))
  elif (let res = await wl.pushHandler(conn.peerId, pubsubTopic, message); res.isErr()):
    waku_lightpush_errors.inc(labelValues = [messagePushFailure])
    error "pushed message handling failed", error=res.error
    PushResponse(is_success: false, info: some(res.error))
  else:
    PushResponse(is_success: true, info: some("OK"))

@s-tikhomirov
Contributor Author

(Judging by if contentTopic == "", it seems to me that this code is modified based on an outdated commit, but anyway.)

We don't only generate a PushResponse here, but also we need to extract the requestId (but only if the buffer is decoded properly), and log the error (if it occurs). With that in mind, I'm not sure moving if-elif-else into let response = applies well here: we still need to remember what error occurred to log it later. And as long as we're doing that, why not also construct PushResponse in the end as well (before it is in turn used to construct PushRPC).

@SionoiS
Contributor

SionoiS commented Oct 4, 2023

Feel free to ignore; it was just a suggestion.

Collaborator

@Ivansete-status Ivansete-status left a comment

It looks great indeed! I'd simply recommend enriching the message a little in case of error.

Resolved review threads:

  • waku/waku_lightpush/protocol.nim (outdated)
  • tests/test_waku_lightpush.nim

Collaborator

@Ivansete-status Ivansete-status left a comment

LGTM! Thanks!

@github-actions

github-actions bot commented Oct 6, 2023

This PR may contain changes to configuration options of one of the apps.

If you are introducing a breaking change (i.e. the set of options in latest release would no longer be applicable) make sure the original option is preserved with a deprecation note for 2 following releases before it is actually removed.

Please also make sure the label release-notes is added to make sure any changes to the user interface are properly announced in changelog and release notes.

@github-actions

github-actions bot commented Oct 6, 2023

This PR may contain changes to database schema of one of the drivers.

If you are introducing any changes to the schema, make sure the upgrade from the latest release to this change passes without any errors/issues.

Please make sure the label release-notes is added to make sure upgrade instructions properly highlight this change.

@s-tikhomirov
Contributor Author

I tried to retroactively sign my older commits (which GitHub required) but apparently messed something up. Now all my commits in this branch and a prior commit by @vpavlin are squashed together.

Shall I squash-and-merge anyway? If so, should a commit message describe my changes on top of the older changes squashed here? The whole thing looks ugly, I must admit, and I'm sorry for that.

@alrevuelta
Contributor

Shall I squash-and-merge anyway? If so, should a commit message describe my changes on top of the older changes squashed here? The whole thing looks ugly, I must admit, and I'm sorry for that.

The diff doesn't look right, and afaik there is nothing to squash: I can see just 1 commit. It also seems to be 30 commits behind master, so I would start with rebasing the PR :)

@s-tikhomirov s-tikhomirov force-pushed the fix-lightpush-empty-response branch from 3fb5a2b to cdfb72c Compare October 6, 2023 15:23
@s-tikhomirov
Contributor Author

I would start with rebasing the PR :)

That was a great idea, thanks @alrevuelta ! Now at least I only see relevant changes in this PR. However, they are now seemingly "authored by Vaclav and committed by me", and this commit is not signed, blocking the merge again (which is why I started this local-squash business in the first place). Any ideas on how to go about this?

@alrevuelta
Contributor

Any ideas on how to go about this?

mm as an easy fix I would say:

  • undo the last commit: git reset HEAD^1
  • commit again: git add -A, then git commit -m "your commit name"
  • (if you have configured your keys in the previous step, that new commit should be signed)
  • git push --force-with-lease to update the PR

@s-tikhomirov s-tikhomirov force-pushed the fix-lightpush-empty-response branch from cdfb72c to 4f0b7b6 Compare October 9, 2023 13:58
@s-tikhomirov s-tikhomirov merged commit 2c5eb42 into master Oct 9, 2023
8 of 10 checks passed
@s-tikhomirov s-tikhomirov deleted the fix-lightpush-empty-response branch October 9, 2023 14:38
Development

Successfully merging this pull request may close these issues.

bug: light push response is empty when protobuf decode fails
6 participants