Solve connection leak issue in S3 backup remote delete on versioned buckets #637
I'm experiencing a connection leak in clickhouse-backup: the number of open connections in the container grows by the thousands each time a remote `delete` command against S3 is executed. After a few days, the node exhausts its TCP memory and drops almost all new TCP connections.
The clickhouse-backup version is `altinity/clickhouse-backup:2.1.3`. I'm running ClickHouse on EKS with the ClickHouse operator, on the arm64 EKS AMI 1.25.7-20230406 with Linux kernel 5.10.
I noticed that the number of created connections is roughly equal to the number of objects being deleted for that shard/day. My backups are uploaded to a versioned S3 bucket.
I believe the bug is in getObjectVersion(): s3.GetObject() actually retrieves the object's body, which holds a connection open until you close it.
Use s3.GetObjectAttributes() (which is also cheaper than s3.GetObject()) to retrieve the object's version instead.
/cc @Slach who kindly assisted in troubleshooting!