Mediation not working with low settle_timeout and no user notification. #918

LefterisJP · 2017-08-23T13:40:56Z

Problem Definition

If 2 channels are opened with settle_timeout being 31 and the default reveal_timeout of 30 then we are in a very interesting situation.

When those channels are used for mediation the node will happily accept to mediate a transfer.

But it will be unable to. During mediation the timeout_blocks will be less than the reveal_timeout and as such there will be no safe route to the target.

The problem lies in the fact that then we will try to make a RefundTransfer back to the initiator but because of the same reason the events_for_refund_transfer() function will return an empty list and no refund will happen.

So we get to the point where the initiator is still waiting for the protocol to continue for his sent mediated transfer, the mediator does nothing because of the timeout as shown above and nobody has a clue what happened or how to proceed from there.

Solution

The proper solution can be discussed further. But we at least need some form of logging from the mediator's side to be able to see why the transfer is not mediated.

It would also be nice if we can somehow notify the initiator that his transfer is not going anywhere.

The text was updated successfully, but these errors were encountered:

hackaugusto · 2017-08-23T14:06:03Z

When those channels are used for mediation the node will happily accept to mediate a transfer.

The mediator must accept it to synchronize it's local state with the sender's state, otherwise all subsequent messages will be rejected because of nonce / locksroot mismatches.

The problem lies in the fact that then we will try to make a RefundTransfer back to the initiator but because of the same reason the events_for_refund_transfer() function will return an empty list and no refund will happen.

Refund transfers are not meant to fix cases were path exhausted the timeout blocks, only in cases were the available paths don't have enough capacity and there are timeout blocks left. From my perspective this is a liquidity issue and I think we should use a lock with two secrets.

the mediator does nothing because of the timeout as shown above and nobody has a clue what happened or how to proceed from there.

The python API must set the async result to False, and the REST api must return a 4xx/5xx error.

But we at least need some form of logging from the mediator's side to be able to see why the transfer is not mediated

IO was intentionally avoided in the state tasks. Ideally we should have all the info available in the event objects, allowing an user to figure it out what is happening through the events API.

It would also be nice if we can somehow notify the initiator that his transfer is not going anywhere.

How would this work?

Btw, at first sight I don't like this, we cannot possible implement a notification reliably and without trust, the network may fail and the partner can exploit these informational messages without consequences. Additionally this could make the default error mode (expiration) a corner case, if the notification to the API user is distinguishable from the standard expiration.

LefterisJP · 2017-08-23T14:12:43Z

IO was intentionally avoided in the state tasks. Ideally we should have all the info available in the event objects, allowing an user to figure it out what is happening through the events API.

Yes this is also fine. We can create informational events to be emitted by the state machine and then let the event handler of the state machine events do the logging/notification.

How would this work?

Future TODO: Send an informational message back to him?

we cannot possible implement a notification reliably and without trust, the network may fail and the partner can exploit these informational messages without consequences.

I am not sure I understand what you want to say here. Can you explain a bit more? How can the partner exploit these informational messages?

hackaugusto · 2017-08-23T14:23:32Z

I am not sure I understand what you want to say here. Can you explain a bit more? How can the partner exploit these informational messages?

Depends on how it would be implemented, assume the following scenario:

A - B - C
A -> B Mediated Transfer
B -> C Mediated Transfer
C -> A Secret Request
A -> C Secret Reveal
C -> B Whatever message saying that there are not enough blocks left.

What should B do?

LefterisJP · 2017-08-23T16:43:26Z

I guess B would have to wait for the lock to expire? But at least we would know why the protocol did not progress.

hackaugusto · 2017-08-24T03:34:09Z

I guess B would have to wait for the lock to expire? But at least we would know why the protocol did not progress.

Definitely B needs to wait for one of the following:

the lock expires
C also reveals the secret
C withdraws on chain

So, I guess the proposed extension is to add a message, allowing node n to inform node n-1 what failure happened and node n-1 will only use this data if the lock expires.

IOW, a message like this no interaction with the protocol, has no externally visible side effect, and is not exposed individually.

hackaugusto · 2018-03-29T16:10:12Z

The proper message to use here is the RemoveExpiredLock, but this message is only sent after the lock has expired, which will take settle_timeout blocks. This can be reduce with the secret registry, because the lock expiration can be constant, but we will still need to wait for the lock expiration.

hackaugusto · 2018-08-24T11:04:25Z

related #505

hackaugusto · 2019-09-17T13:41:56Z

Update:

Opening a channel with a settle timeout that is smaller than the reveal timeout is not allowed anymore:

raiden/raiden/api/python.py

Lines 371 to 374 in 3e43309

    
           if settle_timeout < self.raiden.config["reveal_timeout"] * 2: 
        
               raise InvalidSettleTimeout( 
        
                   "settle_timeout can not be smaller than double the reveal_timeout" 
        
               )

Configuring the reveal_timeout is not allowed, so the settle timeout will always be valid from the perspective of the node that opened the channel. (Issue to make the reveal timeout configurable has a bullet point to make sure the new value will be validated #4905)

The problem here is from the partner node and mediators. Here we have two things to consider:

The transfer should not use the invalid route to begin with
If it does, it has to expire and the transfer fail.

2 was implemented by #193 , it will emit the events EventPaymentSentFailed EventRouteFailed EventRouteFailed That signal the payment failed. This event will set the status of the transfer as a failure, which will then return a CONFLICT to the user. This process could be made faster by not requiring capacity in the backwards direction, but at the moment I think it works fine.

@palango @karlb what is the status for the scenario 1? How does the scenario player ignore the invalid routes?

hackaugusto · 2019-09-17T13:45:29Z

One additional note, we currently prevent a node from opening a channel that itself cannot use, but that lacks information of the partner node (because the partner may have a higher minimum setting for the settlement or reveal timeouts). To fix this #864 is needed.

palango · 2019-09-17T13:45:53Z

The PFS filters out routes that do not fulfill this requirement:
https://github.com/raiden-network/raiden-services/blob/fab27b4e4b31f6d7dfcd4737d2339b3652fe1b41/src/pathfinding_service/model/token_network.py#L136-L148

hackaugusto added the Flag / Usability label Jun 20, 2018

hackaugusto added a commit to hackaugusto/raiden that referenced this issue Sep 17, 2019

Added a test for raiden-network#918

dee721c

hackaugusto mentioned this issue Sep 17, 2019

Added a test for #918 #4906

Merged

13 tasks

hackaugusto added a commit that referenced this issue Sep 18, 2019

Added a test for #918

0d31a74

karlb closed this as completed Nov 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mediation not working with low settle_timeout and no user notification. #918

Mediation not working with low settle_timeout and no user notification. #918

LefterisJP commented Aug 23, 2017

hackaugusto commented Aug 23, 2017

LefterisJP commented Aug 23, 2017

hackaugusto commented Aug 23, 2017 •

edited

Loading

LefterisJP commented Aug 23, 2017

hackaugusto commented Aug 24, 2017

hackaugusto commented Mar 29, 2018

hackaugusto commented Aug 24, 2018

hackaugusto commented Sep 17, 2019

hackaugusto commented Sep 17, 2019

palango commented Sep 17, 2019

Mediation not working with low settle_timeout and no user notification. #918

Mediation not working with low settle_timeout and no user notification. #918

Comments

LefterisJP commented Aug 23, 2017

Problem Definition

Solution

hackaugusto commented Aug 23, 2017

LefterisJP commented Aug 23, 2017

hackaugusto commented Aug 23, 2017 • edited Loading

LefterisJP commented Aug 23, 2017

hackaugusto commented Aug 24, 2017

hackaugusto commented Mar 29, 2018

hackaugusto commented Aug 24, 2018

hackaugusto commented Sep 17, 2019

hackaugusto commented Sep 17, 2019

palango commented Sep 17, 2019

hackaugusto commented Aug 23, 2017 •

edited

Loading