Source controller listing objects from S3 bucket failed probably due to larger size of the bucket #693

Closed
veeraghgoudar opened this issue Apr 27, 2022 · 3 comments · Fixed by #1228
Labels
area/bucket Bucket related issues and pull requests

Comments

@veeraghgoudar

veeraghgoudar commented Apr 27, 2022

We are running Flux version 0.28.5 with an S3 bucket as a source. The bucket is around 2 TB in size.

Reconciliation always fails with the error below:

flux reconcile source bucket s3-bucket-name --verbose
► annotating Bucket s3-bucket-name in flux-system namespace
✔ Bucket annotated
◎ waiting for Bucket reconciliation
✗ Bucket reconciliation failed: 'indexation of objects from bucket 's3-bucket-name' failed: listing objects from bucket 's3-bucket-name' failed: Get "https://s3.dualstack.us-east-1.amazonaws.com/s3-bucket-name/?continuation-token=1lTiNsKzHoVAVOu0PmalPgNkJEFDnybzDBu8XuqkkoZlCP7DtzXiQm%!!(MISSING)B(MISSING)Hea2BxSsoEprb4N3Wm%!!(MISSING)F(MISSING)3EYVLJ18P%!!(MISSING)F(MISSING)KRR0XAJe6kkXPw%!!(MISSING)D(MISSING)%!!(MISSING)D(MISSING)&delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=": context deadline exceeded'

I have tried adding the ignore rules as shown below, but it did not work. I have also tried increasing the timeout up to 10m for testing purposes.

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
  name: s3-bucket-name
  namespace: flux-system
spec:
  interval: 30s
  provider: aws
  bucketName: s3-bucket-name
  endpoint: s3.amazonaws.com
  region: us-east-1
  timeout: 300s
  ignore: |
    # exclude all
    /*
    # include flux dir
    !/flux
    # exclude file extensions from deploy dir
    /flux/**/*.md
    /flux/**/*.txt

I am not sure whether the ignore option is applied as a filter here: https://github.com/fluxcd/source-controller/blob/main/pkg/minio/minio.go#L112

When I test with a different, much smaller S3 bucket, reconciliation works fine.
Is there a way to get this working with the large bucket?

@stefanprodan
Member

stefanprodan commented Apr 28, 2022

I guess you don't have 2TB of Kubernetes YAMLs in there? I would create a dedicated bucket for Flux and have a Lambda function that syncs the YAML files from the 2TB bucket to the Flux one.
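
A minimal sketch of the sync step such a function might perform, assuming the manifests live under a `flux/` prefix in the large bucket. The bucket names, the `sync_manifests` helper, and the suffix filter are hypothetical; the only AWS calls used are boto3's `list_objects_v2` paginator and `copy_object`. The client is passed in as an argument so the logic can be exercised with a stub.

```python
# Hypothetical names, for illustration only.
SOURCE_BUCKET = "s3-bucket-name"
FLUX_BUCKET = "flux-manifests"
MANIFEST_SUFFIXES = (".yaml", ".yml")


def sync_manifests(s3, source: str, dest: str, prefix: str = "flux/") -> list[str]:
    """Copy manifest objects under `prefix` from `source` to `dest`.

    `s3` is expected to behave like a boto3 S3 client. Passing Prefix to
    the paginator lets S3 do the filtering, so only matching keys are listed.
    """
    copied = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=source, Prefix=prefix):
        # A page with no matching keys has no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(MANIFEST_SUFFIXES):
                s3.copy_object(
                    Bucket=dest,
                    Key=key,
                    CopySource={"Bucket": source, "Key": key},
                )
                copied.append(key)
    return copied


def handler(event, context):
    """Lambda entry point (assumed to run on a schedule or an S3 event)."""
    import boto3  # imported lazily so the helper can be tested without AWS

    copied = sync_manifests(boto3.client("s3"), SOURCE_BUCKET, FLUX_BUCKET)
    return {"copied": copied}
```

The Flux `Bucket` source would then point at the small dedicated bucket, so indexing only ever sees the copied manifests.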

> I have tried adding the ignore rules as shown below, but it did not work.

To ignore files, we need to fetch all the file paths from the bucket. If you have a billion files in there, that takes time, maybe hours.
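
A rough estimate of why a full listing can run for hours. S3's ListObjectsV2 returns at most 1,000 keys per request, and each page needs the continuation token from the previous one, so the requests run sequentially. The 100 ms per-request latency below is an assumed figure, not a measurement.

```python
import math

MAX_KEYS_PER_PAGE = 1000  # ListObjectsV2 upper bound per request


def listing_estimate(object_count: int, seconds_per_request: float) -> tuple[int, float]:
    """Return (number of list requests, total seconds) for a full listing."""
    requests = math.ceil(object_count / MAX_KEYS_PER_PAGE)
    return requests, requests * seconds_per_request


# Assumption: one billion objects, 100 ms per request.
requests, seconds = listing_estimate(1_000_000_000, 0.1)
print(f"{requests:,} requests, about {seconds / 3600:.1f} hours")
# prints "1,000,000 requests, about 27.8 hours"
```

Under the same assumed latency, a 300s timeout would cover only about 3,000 requests, or roughly three million keys.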

@hiddeco
Member

hiddeco commented Apr 28, 2022

As Stefan said, a Bucket is more like an enriched key/value store, so we have to iterate over every key to see whether it matches; we can't, for example, "skip directories". The only trick left in this area that might help in your case is supporting a "prefix" that files must match. This would make the filtering a server-side operation and reduce the number of iterations we have to do.
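
The difference between client-side ignore rules and a server-side prefix can be sketched with an in-memory stand-in for the bucket. `FakeBucket` and `index` are hypothetical helpers, not source-controller code; the page size of 1,000 mirrors the ListObjectsV2 limit, and the key names are made up.

```python
from typing import Iterator

PAGE_SIZE = 1000  # ListObjectsV2 returns at most 1,000 keys per request


class FakeBucket:
    """In-memory stand-in for an object store that counts list requests."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.requests = 0

    def list_pages(self, prefix: str = "") -> Iterator[list[str]]:
        # The prefix filter is applied by the "server", before paging.
        matching = [k for k in self.keys if k.startswith(prefix)]
        for start in range(0, max(len(matching), 1), PAGE_SIZE):
            self.requests += 1
            yield matching[start:start + PAGE_SIZE]


def index(bucket: FakeBucket, prefix: str = "", keep=lambda key: True) -> list[str]:
    """List the bucket (optionally by prefix), then filter keys client-side."""
    return [k for page in bucket.list_pages(prefix) for k in page if keep(k)]


keys = [f"artifacts/{i:06d}.bin" for i in range(50_000)]
keys += ["flux/app.yaml", "flux/notes.md"]

# Ignore-style filtering: every key is listed, then most are discarded.
client_side = FakeBucket(keys)
a = index(client_side, keep=lambda k: k.startswith("flux/") and k.endswith(".yaml"))

# Prefix filtering: the server only returns keys under "flux/".
server_side = FakeBucket(keys)
b = index(server_side, prefix="flux/", keep=lambda k: k.endswith(".yaml"))

print(a == b, client_side.requests, server_side.requests)
# prints "True 51 1"
```

Both runs select the same file, but the ignore-style run pages through the whole bucket first, which is where the timeout is spent.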

hiddeco added the area/bucket label on Apr 28, 2022
@veeraghgoudar
Author

> I guess you don't have 2TB of Kubernetes YAMLs in there? I would create a dedicated bucket for Flux and have a Lambda function that syncs the YAML files from the 2TB bucket to the Flux one.
>
> > I have tried adding the ignore rules as shown below, but it did not work.
>
> To ignore files, we need to fetch all the file paths from the bucket. If you have a billion files in there, that takes time, maybe hours.

@stefanprodan: Yes, you are right. The bucket contains a lot of other artifacts, not only YAML files. As you suggested, we will go with a dedicated bucket for Flux.

@hiddeco: If a prefix option is supported in the future, we might start using it. I thought the ignore option would solve this problem, but I was wrong. Thanks for the explanation.
