Is your feature request related to a problem? Please describe.
When object storage (e.g., AWS S3, Google Cloud Storage) is enabled as the backend for Harbor, all image layers and artifacts are fetched from the remote storage every time they are pulled. While using object storage is advantageous for scalability and reduces the need to maintain large local disk volumes—especially in bare-metal setups—it introduces inefficiencies in terms of cost, bandwidth usage, and latency due to frequent data transfers.
Describe the solution you'd like
Introduce a local caching mechanism for image blobs within Harbor for frequently accessed images when using an object storage backend. The cache would store recently pulled images on local storage, utilizing a Least Recently Used (LRU) eviction policy to manage cache cleanup and ensure optimal use of disk space.
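To make the eviction behavior concrete, here is a minimal sketch of the bookkeeping such a cache could do, written in Go since that is Harbor's implementation language. All identifiers here (`blobCache`, `Get`, `Put`) are hypothetical and not Harbor internals; a real implementation would also delete evicted blob files from disk and handle concurrent access:

```go
package main

import (
	"container/list"
	"fmt"
)

// blobCache is a minimal LRU sketch for locally cached image blobs,
// keyed by digest. Sizes are tracked so eviction can respect a
// configurable disk budget.
type blobCache struct {
	capacityBytes int64
	usedBytes     int64
	order         *list.List               // front = most recently used
	entries       map[string]*list.Element // digest -> element in order
}

type blobEntry struct {
	digest string
	size   int64
}

func newBlobCache(capacityBytes int64) *blobCache {
	return &blobCache{
		capacityBytes: capacityBytes,
		order:         list.New(),
		entries:       make(map[string]*list.Element),
	}
}

// Get reports whether a blob is cached, marking it most recently used.
func (c *blobCache) Get(digest string) bool {
	if el, ok := c.entries[digest]; ok {
		c.order.MoveToFront(el)
		return true
	}
	return false
}

// Put records a freshly fetched blob, evicting least recently used
// entries until the cache fits within its byte budget.
func (c *blobCache) Put(digest string, size int64) {
	if el, ok := c.entries[digest]; ok {
		c.order.MoveToFront(el)
		return
	}
	el := c.order.PushFront(&blobEntry{digest: digest, size: size})
	c.entries[digest] = el
	c.usedBytes += size
	for c.usedBytes > c.capacityBytes {
		oldest := c.order.Back()
		if oldest == nil {
			break
		}
		entry := c.order.Remove(oldest).(*blobEntry)
		delete(c.entries, entry.digest)
		c.usedBytes -= entry.size
		// A real implementation would delete the blob file from
		// local disk here.
	}
}

func main() {
	cache := newBlobCache(100) // tiny byte budget for demonstration
	cache.Put("sha256:aaa", 60)
	cache.Put("sha256:bbb", 60)          // evicts sha256:aaa
	fmt.Println(cache.Get("sha256:aaa")) // false
	fmt.Println(cache.Get("sha256:bbb")) // true
}
```

Because blobs are content-addressed by digest, cached entries never go stale, which makes a simple LRU policy safe here.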
Benefits:

- Cost Reduction: Minimizes bandwidth costs associated with repeatedly fetching the same images from object storage.
- Improved Performance: Decreases latency for image pulls by serving frequently requested images from the local cache.
- Optimized Resource Utilization: Maintains the benefits of object storage (scalability, reduced local storage requirements) while enhancing efficiency.
- Increased Reliability: Allows Harbor to serve cached images even during object storage failures, ensuring continuous availability of frequently used images.
In our use case, we see approximately 140 Mbit/s of traffic from clients to the container registry, averaged over 24 hours to simplify calculations. Most of this traffic is served from the local cache of our container registry solution [1], while traffic to object storage is much smaller [2], averaging 2 Mbit/s over 24 hours, which costs us about $50 per month in S3 egress.
Let's assume we used Harbor without a local cache, so every image pull would have to be fetched from object storage. That translates to approximately 1.5 TB of data transferred daily (45 TB monthly) from object storage. At the current AWS S3 egress price of $0.09 per GB (Data Transfer OUT From Amazon S3 To Internet), this amounts to $135 per day, or $4,050 per month. With our traffic pattern, the difference between $50 and $4,050 per month is huge.
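For reference, here is a small Go program reproducing that back-of-the-envelope arithmetic. The 30-day month and decimal GB are assumptions; the slight differences from the figures above come from rounding 1.512 TB/day down to 1.5 TB:

```go
package main

import "fmt"

// egressCostPerMonth converts a 24-hour average throughput in Mbit/s
// into daily volume and monthly S3 egress cost, using the $0.09/GB
// price quoted above and an assumed 30-day month.
func egressCostPerMonth(mbitPerSec float64) (gbPerDay, dollars float64) {
	const (
		secondsPerDay = 86400.0
		pricePerGB    = 0.09 // AWS S3 Data Transfer OUT to Internet, $/GB
		daysPerMonth  = 30.0
	)
	gbPerDay = mbitPerSec / 8.0 * secondsPerDay / 1000.0 // Mbit/s -> GB/day
	dollars = gbPerDay * daysPerMonth * pricePerGB
	return gbPerDay, dollars
}

func main() {
	// With the local cache: only ~2 Mbit/s reaches object storage.
	gb, cost := egressCostPerMonth(2)
	fmt.Printf("cached:   %6.0f GB/day, ~$%.0f/month\n", gb, cost)

	// Without a cache: the full ~140 Mbit/s would hit object storage.
	gb, cost = egressCostPerMonth(140)
	fmt.Printf("uncached: %6.0f GB/day, ~$%.0f/month\n", gb, cost)
}
```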
[1] Download throughput to the local image cache
[2] Download throughput to object storage
We can see from the graphs that traffic served from the local cache is significant compared to traffic to object storage, which shows only isolated spikes when images are first fetched and stored in the cache.