Send messages getting stuck in mediator and not redelivered #2111
Comments
@ianco FYI.
Thanks @jleach. Per the discussion last week, if we:
This will help prevent the mediator from receiving undeliverable messages. @swcurran your thoughts? The mediator message re-delivery issue is separate from the above. I think the mediator solution should involve Redis: Redis can hold a message that is undeliverable for whatever reason, and the message will survive an agent restart.
If by v2 you mean OOB, then yes, we do OOB. I'm a little unclear about 2.0 beyond that. Even if AFJ did 2.0, I think it would still leave a hole for all other frameworks that don't do it, including older agents that don't regularly update ACA-py. I don't think ACA-py supports v2 connections, does it? I seem to remember an issue when using AFJ I had to dig for the legacy DID
AIP 1.0 had RFC 0160 Connections as its way to establish connections. It sucked because of the issue with marking a connection as “complete”, the very thing you are running into, which we’ve known about since the participants first implemented it and started to use it in the wild. It also sucked because it doesn’t support connection reuse (below). AIP 2.0 replaces RFC 0160 with OOB and RFC 0023 (lower number, what???). The changes (quick summary):
The challenge has been that the mobile wallets (Trinsic and the like) have been very slow to move to AIP 2.0, and without wallet support it’s hard to deploy issuers and verifiers that use AIP 2.0. That is why we’ve been encouraging, for a while, getting AFJ to AIP 2.0, getting Bifold to AIP 2.0, and then phasing out (as quickly as possible) the use of AIP 1.0 and especially the use of RFC 0160 Connections. So the question is: does Bifold support both OOB and DID Exchange, and if so, can we phase out using RFC 0160 on the server side?
@swcurran Yup, AFJ does both OOB (RFC 0434) and DID Exchange Protocol (RFC 0023).
I'm documenting this issue with video and sample code here. The TL;DR version of the issue is that messages can become stuck in an ACA-py mediator under certain circumstances. The mediator does not retry sending the messages, and they are only picked up by other agents if those agents specifically request message delivery. For example, in AFJ the API initiateMessagePickup() must be called when a connection enters the "completed" state to see if any messages are queued up.
This is not a bug per se in ACA-py, as it appears to be a shortcoming in the V1 protocol whereby a connection is considered valid in the "request-sent" state. Given this, a good mitigation strategy would be to have queued messages redelivered automatically when a connection transitions to "completed", or by some other trigger.
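The client-side workaround mentioned above (request pickup once a connection reaches "completed") can be sketched roughly as follows. This is a language-neutral model written in Python, not AFJ code: the `PickupOnCompleted` class, its `on_state_changed` method, and the `initiate_pickup` callable are hypothetical stand-ins, with `initiate_pickup` playing the role of AFJ's initiateMessagePickup().

```python
from typing import Callable, List, Set

COMPLETED = "completed"  # state name as used in this issue


class PickupOnCompleted:
    """Hypothetical listener: asks for queued messages once a connection completes."""

    def __init__(self, initiate_pickup: Callable[[], None]) -> None:
        # initiate_pickup is an assumption standing in for AFJ's initiateMessagePickup().
        self._initiate_pickup = initiate_pickup
        self._already_requested: Set[str] = set()

    def on_state_changed(self, connection_id: str, state: str) -> bool:
        """Handle one state-change event; return True if a pickup was requested."""
        if state != COMPLETED or connection_id in self._already_requested:
            return False
        self._already_requested.add(connection_id)
        self._initiate_pickup()
        return True


# Usage: only the first transition to "completed" triggers a pickup request.
calls: List[str] = []
listener = PickupOnCompleted(lambda: calls.append("pickup"))
listener.on_state_changed("conn-1", "request-sent")  # too early, nothing requested
listener.on_state_changed("conn-1", "completed")     # pickup requested
listener.on_state_changed("conn-1", "completed")     # duplicate event ignored
```

In a real wallet the event would come from the framework's connection state notifications rather than from direct calls like these.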
The following is from the repo I used to document the issue and provide sample code to trigger in AFJ:
Trigger the Issue
press 1 to offer a credential.
in the message being queued on the mediator.
is fully setup:
will display no notifications (offers).
Same Scenario, No Issue
will display the notification (offer).
Expected Behaviour
RFC 0160 does not require an acknowledgement that a connection is completed before messages can be sent over it. This is addressed in V2.
An ACA-py mediator should attempt delivery of any queued messages when the related connection becomes "completed" to remediate this issue.
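The behaviour requested here can be modelled with a small in-memory sketch. It is a toy illustration, not ACA-py's implementation: `MediatorQueueSketch`, its methods, and the `deliver` callback are all hypothetical names. A message that arrives before the connection is "completed" stays queued, and the transition to "completed" flushes the queue.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple


class MediatorQueueSketch:
    """Toy model of a mediator that flushes its queue when a connection completes."""

    def __init__(self, deliver: Callable[[str, bytes], bool]) -> None:
        self._deliver = deliver  # hypothetical transport; returns True on success
        self._state: Dict[str, str] = {}
        self._queues: Dict[str, Deque[bytes]] = defaultdict(deque)

    def forward(self, connection_id: str, message: bytes) -> None:
        """Queue the message, then try to deliver if the connection is usable."""
        self._queues[connection_id].append(message)
        if self._state.get(connection_id) == "completed":
            self.flush(connection_id)

    def set_state(self, connection_id: str, state: str) -> None:
        """Record a state change; completing a connection triggers re-delivery."""
        self._state[connection_id] = state
        if state == "completed":
            self.flush(connection_id)

    def flush(self, connection_id: str) -> int:
        """Deliver queued messages in order; stop at the first failure."""
        queue = self._queues[connection_id]
        sent = 0
        while queue and self._deliver(connection_id, queue[0]):
            queue.popleft()
            sent += 1
        return sent

    def pending(self, connection_id: str) -> int:
        return len(self._queues[connection_id])


# Usage: the offer arrives while the connection is still "request-sent".
delivered: List[Tuple[str, bytes]] = []


def record_delivery(connection_id: str, message: bytes) -> bool:
    delivered.append((connection_id, message))
    return True


mediator = MediatorQueueSketch(record_delivery)
mediator.set_state("conn-1", "request-sent")
mediator.forward("conn-1", b"credential-offer")
queued_before_completion = mediator.pending("conn-1")
mediator.set_state("conn-1", "completed")
```

Queueing first and flushing second keeps messages in arrival order even when some are already waiting.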
Q & A
In AFJ the fn initiateMessagePickup can be called to trigger the delivery of messages. The outstanding offer will be delivered.
This problem exists on two mediators running similar versions of ACA-py hosted on different infrastructure by two different companies. It can also be reproduced locally using Docker.
It can be reproduced locally in Docker using ACA-py 0.7.3 and 1.0.0-rc1. The cloud hosted agents both run ACA-py 0.7.x versions.
In one test we used an ACA-py 0.7.3 mediator on a cloud platform which had been running for 1 day under light load. The problem was evident. On the same cloud platform, with an ACA-py mediator running 0.7.4-rc2 which had been running for <10 min, the problem did not present. This leads us to believe that as a mediator is used, performance degrades enough for the problem to present.
Unlikely. The situation can be reproduced using the BC Showcase demo. Using the older mediator mentioned in #4 above, the automated showcase demo fails. Using the fresh mediator from #4 above, the demo succeeds.
V1.
Maybe. RFC 0160 does not require acknowledgement when a connection enters the "completed" state. It is by convention that a controller should confirm the state before sending a message (offer) over the connection. This is a recognized shortcoming that should be addressed in V2.
However, it may be a bug in that an ACA-py mediator does not attempt re-delivery of a queued message if that message comes in before the connection is "completed".
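The controller-side convention described above (confirm the connection state before sending an offer) might look something like the sketch below. `wait_until_completed` and the `get_state` callable are hypothetical: how the state is actually fetched, for example via the agent's admin API or a webhook, is left as an assumption, and the state names are the ones used in this issue.

```python
import time
from typing import Callable


def wait_until_completed(
    get_state: Callable[[], str],
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.5,
) -> bool:
    """Poll get_state() until it reports "completed" or the timeout expires.

    get_state is a hypothetical callable supplied by the controller.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state() == "completed":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


# Usage with a fake state source: the offer is sent only once the wait succeeds.
fake_states = iter(["request-sent", "request-sent", "completed"])
ready = wait_until_completed(lambda: next(fake_states), timeout_s=5.0, poll_interval_s=0.0)
offer_sent = ready  # a real controller would send the credential offer here
```

Polling is only one option; reacting to a state-change notification avoids the delay but needs the same "completed" check.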