Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 "Could not delete lock key" -- Improve concurrency for ACA-Py Events Processor #959

Closed
wdbasson opened this issue Aug 7, 2024 · 1 comment · Fixed by #963
Closed
Assignees
Labels
improvement improving on something that is already existing (not a new feature as such)

Comments

@wdbasson
Copy link
Collaborator

wdbasson commented Aug 7, 2024

{"message":"Could not delete lock key: lock:acapy-record-with-state-base. Perhaps it expired? | ","levelname":"WARNING","date":"2024-08-07 14:56:19.860175+00:00","name":"webhooks.services.acapy_events_processor","record":{"exception":null,"file":"/webhooks/services/acapy_events_processor.py","function":"_attempt_process_list_events","line":219,"process":{"id":1,"name":"MainProcess"},"thread":{"id":140042948758400,"name":"MainThread"},"time":{"repr":1723042579.860175,"uptime_h:m:s":"0:04:20.180110"}}}
@wdbasson wdbasson changed the title Cloud not delete lock key Could not delete lock key Aug 7, 2024
@ff137 ff137 self-assigned this Aug 7, 2024
@ff137 ff137 added the improvement improving on something that is already existing (not a new feature as such) label Aug 7, 2024
@ff137 ff137 changed the title Could not delete lock key 🐛 "Could not delete lock key" -- Improve concurrency for ACA-Py Events Processor Aug 7, 2024
@ff137
Copy link
Collaborator

ff137 commented Aug 7, 2024

"lock keys" are set in the ACA-Py Events Processor to ensure that only one instance will process a given item (a redis list of webhook events, per wallet).

These lock keys have an expiry of 500ms, where an async background task will re-set the lock key to increment the expiry time, so that it exists as long as is needed.

When load is very high, this async background task might not start soon enough, leaving a window where the lock doesn't exist anymore, resulting in the "Could not delete lock key" warning.

This may be a problem, as it opens up for the same list of events to be processed by an additional Webhooks Service instance.

The reason for the lock expiry is so that locks cannot exist indefinitely, in the case of Webhooks Service getting a sigterm mid-processing, resulting in a permanent lock.

A reasonable solution would be just to make the expiry much longer, perhaps up to 5 seconds long, so that processing time should never exceed that expiry time, and an async background task is not necessary to extend the lock. This is a trade-off in availability, as a sigterm could cause a webhook to take 5s longer to arrive than usual.

Alternatively, increase the frequency of the background task's re-setting of the lock key. This has a trade-off in increased redis traffic.

I think the first option is preferable, but let's see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement improving on something that is already existing (not a new feature as such)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants