
Implement local cache with LRU eviction for object storage backends #21212

Open
r0bj opened this issue Nov 18, 2024 · 1 comment
Labels
kind/requirement New feature or idea on top of harbor

Comments


r0bj commented Nov 18, 2024

Is your feature request related to a problem? Please describe.
When object storage (e.g., AWS S3, Google Cloud Storage) is enabled as the backend for Harbor, all image layers and artifacts are fetched from the remote storage every time they are pulled. While object storage scales well and reduces the need to maintain large local disk volumes, especially in bare-metal setups, the frequent data transfers introduce cost, bandwidth, and latency overhead.

Describe the solution you'd like
Introduce a local caching mechanism in Harbor for frequently accessed image blobs when an object storage backend is in use. The cache would store recently pulled blobs on local storage and use a Least Recently Used (LRU) eviction policy to manage cleanup and keep disk usage bounded; a rough sketch follows the benefits list below.

Benefits:

  • Cost Reduction: Minimizes bandwidth costs associated with repeatedly fetching the same images from object storage.
  • Improved Performance: Decreases latency for image pulls by serving frequently requested images from the local cache.
  • Optimized Resource Utilization: Maintains the benefits of object storage (scalability, reduced local storage requirements) while enhancing efficiency.
  • Increased Reliability: Allows Harbor to serve cached images even during object storage failures, ensuring continuous availability of frequently used images.
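
To make the idea concrete, here is a minimal sketch of what the read path of such a cache could look like, written in Go since that is Harbor's language. Everything in it is hypothetical rather than existing Harbor code: the blobCache type, the fetch callback standing in for an object storage GET, and the in-memory byte slices standing in for blob files on local disk.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// blobCache is a hypothetical size-bounded LRU cache keyed by blob digest.
// A real implementation would keep blobs on local disk; byte slices are used
// here only to keep the sketch self-contained.
type blobCache struct {
	mu       sync.Mutex
	capacity int64                    // max total bytes to retain
	size     int64                    // current total bytes
	order    *list.List               // front = most recently used
	entries  map[string]*list.Element // digest -> element in order
}

type cacheEntry struct {
	digest string
	data   []byte
}

func newBlobCache(capacity int64) *blobCache {
	return &blobCache{
		capacity: capacity,
		order:    list.New(),
		entries:  make(map[string]*list.Element),
	}
}

// Get returns the cached blob, falling back to fetch (standing in for an
// object storage GET) on a miss and caching the result. Hits move the entry
// to the front of the recency list.
func (c *blobCache) Get(digest string, fetch func(string) ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	if el, ok := c.entries[digest]; ok {
		c.order.MoveToFront(el)
		data := el.Value.(*cacheEntry).data
		c.mu.Unlock()
		return data, nil
	}
	c.mu.Unlock()

	data, err := fetch(digest) // remote fetch happens outside the lock
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.entries[digest]; !ok {
		c.entries[digest] = c.order.PushFront(&cacheEntry{digest, data})
		c.size += int64(len(data))
		// Evict least recently used entries until we fit the byte budget,
		// always keeping at least the entry we just inserted.
		for c.size > c.capacity && c.order.Len() > 1 {
			el := c.order.Back()
			entry := el.Value.(*cacheEntry)
			c.order.Remove(el)
			delete(c.entries, entry.digest)
			c.size -= int64(len(entry.data))
		}
	}
	return data, nil
}

func main() {
	cache := newBlobCache(10) // tiny budget to demonstrate eviction
	fetch := func(digest string) ([]byte, error) {
		fmt.Println("fetching from object storage:", digest)
		return []byte("12345678"), nil // stand-in for a remote download
	}
	cache.Get("sha256:aaa", fetch) // miss -> remote fetch
	cache.Get("sha256:aaa", fetch) // hit  -> served locally
	cache.Get("sha256:bbb", fetch) // miss -> evicts sha256:aaa (over budget)
}
```

A real implementation would keep blobs as files and derive sizes from file metadata, but the LRU bookkeeping and the byte-budget eviction would look essentially the same.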
wy65701436 added the kind/requirement label on Nov 18, 2024
r0bj (Author) commented Nov 21, 2024

In our use case, we see approximately 140 Mbit/s of traffic from clients to the container registry, averaged over 24 hours to simplify the calculations. Most of this traffic is served from the local cache of our current container registry solution [1], while traffic from object storage is much smaller [2]: averaged over 24 hours it is about 2 Mbit/s, which costs us $50 per month in S3 egress.

Now assume we ran Harbor without a local cache, so that all of this traffic had to come from object storage, since every image pull would be fetched from it. That translates to approximately 1.5 TB of data transferred per day (45 TB per month) from object storage. At the current AWS S3 egress price of $0.09 per GB (Data Transfer OUT From Amazon S3 To Internet), this amounts to $135 per day and $4,050 per month. With our traffic pattern, the difference between $50 and $4,050 per month is huge.
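
For anyone who wants to reproduce those figures, here is a quick back-of-the-envelope check (decimal units; the 30-day month and the rounding to 1.5 TB/day before pricing are assumptions that match the numbers above):

```go
package main

import "fmt"

func main() {
	const (
		mbitPerSec = 140.0   // average client traffic (from above)
		secPerDay  = 86400.0 // seconds in a day
		usdPerGB   = 0.09    // S3 Data Transfer OUT to Internet
	)
	// Mbit/s -> GB/day: divide by 8 (bits -> bytes), then by 1000 (MB -> GB).
	gbPerDay := mbitPerSec / 8 / 1000 * secPerDay
	fmt.Printf("%.0f GB/day (≈ %.1f TB/day)\n", gbPerDay, gbPerDay/1000) // 1512 GB/day ≈ 1.5 TB/day
	fmt.Printf("$%.0f/day, $%.0f per 30-day month\n",
		gbPerDay*usdPerGB, gbPerDay*usdPerGB*30) // ≈ $136/day; rounding to 1.5 TB first gives the $135/$4,050 quoted above
}
```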

[1] Download throughput to the local image cache (screenshot)

[2] Download throughput to object storage (screenshot)

The graphs show that traffic to the local cache is substantial compared to traffic to object storage, which shows only isolated spikes when images are first fetched and stored in the cache.
