Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mediation not working with low settle_timeout and no user notification. #918

Closed
LefterisJP opened this issue Aug 23, 2017 · 10 comments
Closed

Comments

@LefterisJP
Copy link
Contributor

Problem Definition

If 2 channels are opened with settle_timeout being 31 and the default reveal_timeout of 30 then we are in a very interesting situation.

When those channels are used for mediation the node will happily accept to mediate a transfer.

But it will be unable to. During mediation the timeout_blocks will be less than the reveal_timeout and as such there will be no safe route to the target.

The problem lies in the fact that then we will try to make a RefundTransfer back to the initiator but because of the same reason the events_for_refund_transfer() function will return an empty list and no refund will happen.

So we get to the point where the initiator is still waiting for the protocol to continue for his sent mediated transfer, the mediator does nothing because of the timeout as shown above and nobody has a clue what happened or how to proceed from there.

Solution

The proper solution can be discussed further. But we at least need some form of logging from the mediator's side to be able to see why the transfer is not mediated.

It would also be nice if we can somehow notify the initiator that his transfer is not going anywhere.

@hackaugusto
Copy link
Contributor

When those channels are used for mediation the node will happily accept to mediate a transfer.

The mediator must accept it to synchronize it's local state with the sender's state, otherwise all subsequent messages will be rejected because of nonce / locksroot mismatches.

The problem lies in the fact that then we will try to make a RefundTransfer back to the initiator but because of the same reason the events_for_refund_transfer() function will return an empty list and no refund will happen.

Refund transfers are not meant to fix cases were path exhausted the timeout blocks, only in cases were the available paths don't have enough capacity and there are timeout blocks left. From my perspective this is a liquidity issue and I think we should use a lock with two secrets.

the mediator does nothing because of the timeout as shown above and nobody has a clue what happened or how to proceed from there.

The python API must set the async result to False, and the REST api must return a 4xx/5xx error.

But we at least need some form of logging from the mediator's side to be able to see why the transfer is not mediated

IO was intentionally avoided in the state tasks. Ideally we should have all the info available in the event objects, allowing an user to figure it out what is happening through the events API.

It would also be nice if we can somehow notify the initiator that his transfer is not going anywhere.

How would this work?

Btw, at first sight I don't like this, we cannot possible implement a notification reliably and without trust, the network may fail and the partner can exploit these informational messages without consequences. Additionally this could make the default error mode (expiration) a corner case, if the notification to the API user is distinguishable from the standard expiration.

@LefterisJP
Copy link
Contributor Author

IO was intentionally avoided in the state tasks. Ideally we should have all the info available in the event objects, allowing an user to figure it out what is happening through the events API.

Yes this is also fine. We can create informational events to be emitted by the state machine and then let the event handler of the state machine events do the logging/notification.

How would this work?

Future TODO: Send an informational message back to him?

we cannot possible implement a notification reliably and without trust, the network may fail and the partner can exploit these informational messages without consequences.

I am not sure I understand what you want to say here. Can you explain a bit more? How can the partner exploit these informational messages?

@hackaugusto
Copy link
Contributor

hackaugusto commented Aug 23, 2017

I am not sure I understand what you want to say here. Can you explain a bit more? How can the partner exploit these informational messages?

Depends on how it would be implemented, assume the following scenario:

A - B - C
A -> B Mediated Transfer
B -> C Mediated Transfer
C -> A Secret Request
A -> C Secret Reveal
C -> B Whatever message saying that there are not enough blocks left.

What should B do?

@LefterisJP
Copy link
Contributor Author

I guess B would have to wait for the lock to expire? But at least we would know why the protocol did not progress.

@hackaugusto
Copy link
Contributor

I guess B would have to wait for the lock to expire? But at least we would know why the protocol did not progress.

Definitely B needs to wait for one of the following:

  • the lock expires
  • C also reveals the secret
  • C withdraws on chain

So, I guess the proposed extension is to add a message, allowing node n to inform node n-1 what failure happened and node n-1 will only use this data if the lock expires.

IOW, a message like this no interaction with the protocol, has no externally visible side effect, and is not exposed individually.

@hackaugusto
Copy link
Contributor

The proper message to use here is the RemoveExpiredLock, but this message is only sent after the lock has expired, which will take settle_timeout blocks. This can be reduce with the secret registry, because the lock expiration can be constant, but we will still need to wait for the lock expiration.

@hackaugusto
Copy link
Contributor

related #505

@hackaugusto
Copy link
Contributor

Update:

Opening a channel with a settle timeout that is smaller than the reveal timeout is not allowed anymore:

raiden/raiden/api/python.py

Lines 371 to 374 in 3e43309

if settle_timeout < self.raiden.config["reveal_timeout"] * 2:
raise InvalidSettleTimeout(
"settle_timeout can not be smaller than double the reveal_timeout"
)

Configuring the reveal_timeout is not allowed, so the settle timeout will always be valid from the perspective of the node that opened the channel. (Issue to make the reveal timeout configurable has a bullet point to make sure the new value will be validated #4905)

The problem here is from the partner node and mediators. Here we have two things to consider:

  1. The transfer should not use the invalid route to begin with
  2. If it does, it has to expire and the transfer fail.

2 was implemented by #193 , it will emit the events EventPaymentSentFailed EventRouteFailed EventRouteFailed That signal the payment failed. This event will set the status of the transfer as a failure, which will then return a CONFLICT to the user. This process could be made faster by not requiring capacity in the backwards direction, but at the moment I think it works fine.

@palango @karlb what is the status for the scenario 1? How does the scenario player ignore the invalid routes?

hackaugusto added a commit to hackaugusto/raiden that referenced this issue Sep 17, 2019
@hackaugusto hackaugusto mentioned this issue Sep 17, 2019
13 tasks
@hackaugusto
Copy link
Contributor

One additional note, we currently prevent a node from opening a channel that itself cannot use, but that lacks information of the partner node (because the partner may have a higher minimum setting for the settlement or reveal timeouts). To fix this #864 is needed.

@palango
Copy link
Contributor

palango commented Sep 17, 2019

hackaugusto added a commit that referenced this issue Sep 18, 2019
@karlb karlb closed this as completed Nov 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants