Add thrashing of old txn sessions to prevent OOM #12477
/ci-repeat 5 |
/ci-repeat 5 |
/ci-repeat 5 |
3a91912 to 54c87cd
/ci-repeat 5 |
/ci-repeat 5 |
I am wondering whether, when the tx_id limit is breached, we should simply return an error until old transactions expire or complete. Also, why don't other Kafka protocol implementations have a limit on the maximum number of concurrent transactions? |
We can't rely on expiration because it's controlled by users, and Flink recommends setting it to fairly large values.
Other implementations don't have the limit, but they have the same problem: see KIP-936. Once it's approved and implemented we'll support it for compatibility, but we can't wait, so for now we'll use the limit to fix the existing problem. |
Known issue: #12659 |
KIP-936 mentions producer_id, not transactional_id, so it is still not clear to me how it will help. |
Reread it; yes, it only mentions OOMs caused by producer ids. Maybe they'll introduce quotas for the txn coordinator too. |
pure refactoring, nothing new added
pure refactoring, nothing new added
`limit_init_tm_tx` will be developed into a method that controls the number of active tx sessions. Once the number of txs passes the threshold, the method will terminate old sessions to make room for new txs and prevent OOM.
An LRU tx list is added to track and look up the oldest tx session
`limit_init_tm_tx` is updated to control the number of active tx sessions. Once the number of tx sessions on a single coordinator passes the threshold (`max_transactions_per_coordinator`), the method terminates old tx sessions to make room for new txs and prevent OOM.
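For readers who want the idea in miniature, here is a rough Python sketch of LRU-based session limiting. It is not the code from this PR: the class `TxSessionLimiter`, its methods, and the `max_sessions` parameter are all hypothetical names, with `max_sessions` standing in for `max_transactions_per_coordinator`. It assumes eviction simply drops the least recently used session, where a real coordinator would also abort the evicted transaction.

```python
from collections import OrderedDict


class TxSessionLimiter:
    """Hypothetical stand-in for a coordinator's table of active tx sessions."""

    def __init__(self, max_sessions):
        if max_sessions < 1:
            raise ValueError("max_sessions must be at least 1")
        self.max_sessions = max_sessions
        # Key order doubles as the LRU order: least recently used first.
        self._sessions = OrderedDict()

    def touch(self, tx_id):
        """Mark an existing session as recently used; unknown ids are ignored."""
        if tx_id in self._sessions:
            self._sessions.move_to_end(tx_id)

    def init_tx(self, tx_id):
        """Register tx_id and return the list of ids evicted to make room."""
        evicted = []
        if tx_id in self._sessions:
            # Re-initialising a known session only refreshes its position.
            self._sessions.move_to_end(tx_id)
            return evicted
        while len(self._sessions) >= self.max_sessions:
            oldest, _ = self._sessions.popitem(last=False)
            # Assumption: a real coordinator would abort this session here.
            evicted.append(oldest)
        self._sessions[tx_id] = True
        return evicted

    def active(self):
        """Return active session ids, least recently used first."""
        return list(self._sessions)
```

With `max_sessions=2`, calling `init_tx("a")`, `init_tx("b")`, `touch("a")` and then `init_tx("c")` evicts `"b"`, because `"a"` was used more recently.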
Only known issues fail:
|
/backport v23.2.x |
/backport v23.1.x |
Failed to run cherry-pick command. I executed the commands below:
|
PR adds tx thrashing to abort & evict old txn sessions once the number of known txn sessions passes a threshold (new `max_transactions_per_coordinator` property).

Fixes #12424
Backports Required
Release Notes