Source controller listing objects from S3 bucket failed probably due to larger size of the bucket #693

veeraghgoudar · 2022-04-27T17:14:00Z

We are running Flux version 0.28.5 and using an S3 bucket as a source. The bucket is around 2TB in size.

The reconciliation always fails with the error below:

flux reconcile source bucket s3-bucket-name --verbose
► annotating Bucket s3-bucket-name in flux-system namespace
✔ Bucket annotated
◎ waiting for Bucket reconciliation
✗ Bucket reconciliation failed: 'indexation of objects from bucket 's3-bucket-name' failed: listing objects from bucket 's3-bucket-name' failed: Get "https://s3.dualstack.us-east-1.amazonaws.com/s3-bucket-name/?continuation-token=1lTiNsKzHoVAVOu0PmalPgNkJEFDnybzDBu8XuqkkoZlCP7DtzXiQm%!!(MISSING)B(MISSING)Hea2BxSsoEprb4N3Wm%!!(MISSING)F(MISSING)3EYVLJ18P%!!(MISSING)F(MISSING)KRR0XAJe6kkXPw%!!(MISSING)D(MISSING)%!!(MISSING)D(MISSING)&delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=": context deadline exceeded'

I tried adding ignore rules as shown below, but it did not work. I also tried increasing the timeout up to 10m for testing purposes.

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
  name: s3-bucket-name
  namespace: flux-system
spec:
  interval: 30s
  provider: aws
  bucketName: s3-bucket-name
  endpoint: s3.amazonaws.com
  region: us-east-1
  timeout: 300s
  ignore: |
    # exclude all
    /*
    # include flux dir
    !/flux
    # exclude file extensions from deploy dir
    /flux/**/*.md
    /flux/**/*.txt

I am not sure whether the ignore option is taken into account by the filter here: https://github.com/fluxcd/source-controller/blob/main/pkg/minio/minio.go#L112

When I test with a different, much smaller S3 bucket, the reconciliation works fine.
Is there a way to get this working?

stefanprodan · 2022-04-28T07:26:47Z

I guess you don't have 2TB of Kubernetes YAMLs in there? I would create a dedicated bucket for Flux and have a Lambda function that syncs the YAML files from the 2TB bucket to the Flux one.

> I have tried add the ignore files as shown below but it did not work.

To ignore files, we need to fetch all the file paths from the bucket. If you have a billion files in there, that takes time, maybe hours.

hiddeco · 2022-04-28T07:37:06Z

As Stefan said, a Bucket is more like an enriched key/value store, so we have to iterate over every key to see if it matches; we can't, for example, "skip directories". The only trick left in this area that might help in your case would be for us to support defining a "prefix" that files must match. This would make the filtering a server-side operation and decrease the number of iterations we have to do.
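
To make the difference concrete, here is a minimal Go sketch that simulates a paginated listing over an in-memory, sorted list of keys. It does not use the source-controller or MinIO client code; `listPage` and `listAll` are hypothetical helpers, and the bucket contents are made up for illustration. It compares listing everything and filtering on the client with asking the "server" for a prefix.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listPage mimics one list request against a sorted key space. It returns
// up to pageSize keys that start with prefix and sort after startAfter.
// Keys that do not match the prefix are never returned to the caller.
func listPage(sortedKeys []string, prefix, startAfter string, pageSize int) []string {
	// Keys sharing a prefix are contiguous in sorted order, so start at the
	// first key >= prefix and stop at the first key without the prefix.
	i := sort.SearchStrings(sortedKeys, prefix)
	if startAfter != "" {
		j := sort.Search(len(sortedKeys), func(k int) bool { return sortedKeys[k] > startAfter })
		if j > i {
			i = j
		}
	}
	var page []string
	for ; i < len(sortedKeys) && len(page) < pageSize; i++ {
		if !strings.HasPrefix(sortedKeys[i], prefix) {
			break
		}
		page = append(page, sortedKeys[i])
	}
	return page
}

// listAll pages through the simulated bucket and reports the keys received
// and the number of list requests issued.
func listAll(sortedKeys []string, prefix string, pageSize int) (keys []string, requests int) {
	startAfter := ""
	for {
		page := listPage(sortedKeys, prefix, startAfter, pageSize)
		requests++
		keys = append(keys, page...)
		if len(page) < pageSize {
			return keys, requests // a short page means the listing is done
		}
		startAfter = page[len(page)-1]
	}
}

func main() {
	// Made-up bucket: many unrelated artifacts plus a few Flux files.
	var bucket []string
	for i := 0; i < 5000; i++ {
		bucket = append(bucket, fmt.Sprintf("artifacts/build-%05d.tar.gz", i))
	}
	bucket = append(bucket, "flux/apps/deployment.yaml", "flux/apps/service.yaml", "flux/README.md")
	sort.Strings(bucket)

	// Client-side filtering: every key is fetched, then most are discarded.
	all, reqAll := listAll(bucket, "", 1000)
	var kept []string
	for _, k := range all {
		if strings.HasPrefix(k, "flux/") {
			kept = append(kept, k)
		}
	}
	fmt.Printf("client-side: %d requests, %d keys fetched, %d kept\n", reqAll, len(all), len(kept))

	// Server-side prefix: only matching keys are ever transferred.
	pref, reqPref := listAll(bucket, "flux/", 1000)
	fmt.Printf("prefix:      %d requests, %d keys fetched\n", reqPref, len(pref))
}
```

In this toy setup the ignore-style approach needs six list requests to end up with three keys, while the prefix approach needs one. The gap grows with the number of unrelated objects in the bucket.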

veeraghgoudar · 2022-04-28T08:00:21Z

> I guess you don't have 2TB of Kubernetes YAMLs in there? I would create a dedicated bucket for Flux and have a Lambda function that syncs the YAML files from the 2TB bucket to the Flux one.
>
> I have tried add the ignore files as shown below but it did not work.
>
> To ignore files, we need to fetch the all file paths from the bucket, if you have a billion files in there, then it takes time, hours maybe.

@stefanprodan: Yes, you are right. The bucket contains a lot of other artifacts, not only YAML files. As you suggested, we will go with a dedicated bucket for Flux.

@hiddeco: If the prefix option is supported in the future, we might start using it. I thought the ignore option would solve this problem, but I was wrong. Thanks for the explanation.

hiddeco added the area/bucket Bucket related issues and pull requests label Apr 28, 2022

stefanprodan mentioned this issue Sep 18, 2023

bucket: Add prefix filtering capability #1228

Merged

stefanprodan closed this as completed in #1228 Oct 17, 2023
