Sync pod sidecar containers do not have resources properly defined for k8s clusters with resource quotas #10589

Closed
rcheatham-q opened this issue Feb 23, 2022 · 8 comments
Labels
autoteam community team/tse Technical Support Engineers type/bug Something isn't working

Comments

@rcheatham-q
Contributor

rcheatham-q commented Feb 23, 2022

Environment

  • Airbyte version: 0.35.30-alpha (though the offending code remains unchanged in the master branch and in version 0.35.36-alpha, the current version as of creating this issue)
  • OS Version / Instance: macOS
  • Deployment: Kubernetes via official helm chart
  • Source Connector and version: Salesforce 0.1.23
  • Destination Connector and version: S3 0.2.7
  • Severity: Critical (Cannot run Airbyte in this environment currently)
  • Step where error happened: Sync job

Current Behavior

Airbyte fails to create the sync pod because the sidecar and init containers do not have requests.cpu set. In Kubernetes clusters with resource quotas implemented for CPU requests, these values must be specified (see the fifth bullet point on the resource quotas page).
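To make the failure mode concrete, here is a minimal sketch in plain Java of the rule the quota enforces: once a namespace quota constrains a field such as requests.cpu, every container in a new pod, including sidecar and init containers, has to set that field or the pod is rejected. This is a simplified model for illustration, not the Kubernetes admission code. The class, the container names, and the resource values are hypothetical, and the assumption that the sidecar sets only a memory request is made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class QuotaCheckSketch {

    // Hypothetical, simplified stand-in for a container spec: a name plus the
    // resource fields it sets, keyed like "requests.cpu" or "limits.memory".
    static final class Container {
        final String name;
        final Map<String, String> resources;

        Container(String name, Map<String, String> resources) {
            this.name = name;
            this.resources = resources;
        }
    }

    // Returns one entry per (field, container) pair that the quota requires
    // but the container leaves unset. An empty list means the pod would pass
    // this simplified check.
    static List<String> missingFields(List<String> quotaFields, List<Container> containers) {
        List<String> missing = new ArrayList<>();
        for (String field : quotaFields) {
            for (Container c : containers) {
                if (!c.resources.containsKey(field)) {
                    missing.add(field + " for: " + c.name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed pod layout: a fully specified main container, a sidecar with
        // only a memory request, and an init container with no resources.
        List<Container> pod = List.of(
            new Container("main", Map.of("requests.cpu", "500m", "requests.memory", "256Mi")),
            new Container("sidecar", Map.of("requests.memory", "25Mi")),
            new Container("init", Map.of()));

        for (String problem : missingFields(List.of("requests.cpu", "requests.memory"), pod)) {
            System.out.println("must specify " + problem);
        }
    }
}
```

In this model a single container without the required field is enough to block the whole pod, which matches the behavior reported here.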

Expected Behavior

Airbyte creates the sync pod normally.

Logs

LOG

at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:151) ~[temporal-sdk-1.6.0.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73) ~[temporal-sdk-1.6.0.jar:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: io.temporal.failure.ApplicationFailure: message='Error while getting spec from image my-registry/airbyte/destination-s3:1.15', type='io.airbyte.workers.WorkerException', nonRetryable=false
	at io.airbyte.workers.DefaultGetSpecWorker.run(DefaultGetSpecWorker.java:78) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at io.airbyte.workers.DefaultGetSpecWorker.run(DefaultGetSpecWorker.java:23) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:155) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: io.temporal.failure.ApplicationFailure: message='Failure executing: POST at: https://172.20.0.1/api/v1/namespaces/mynamespace/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "sync-6609a393" is forbidden: failed quota: request-quota: must specify requests.cpu,requests.memory.', type='io.airbyte.workers.WorkerException', nonRetryable=false
	at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:148) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at io.airbyte.workers.process.AirbyteIntegrationLauncher.spec(AirbyteIntegrationLauncher.java:44) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at io.airbyte.workers.DefaultGetSpecWorker.run(DefaultGetSpecWorker.java:48) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at io.airbyte.workers.DefaultGetSpecWorker.run(DefaultGetSpecWorker.java:23) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:155) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: io.temporal.failure.ApplicationFailure: message='Failure executing: POST at: https://172.20.0.1/api/v1/namespaces/mynamespace/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "sync-6609a393" is forbidden: failed quota: request-quota: must specify requests.cpu,requests.memory.', type='io.fabric8.kubernetes.client.KubernetesClientException', nonRetryable=false
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:583) ~[kubernetes-client-5.3.1.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:520) ~[kubernetes-client-5.3.1.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:487) ~[kubernetes-client-5.3.1.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:448) ~[kubernetes-client-5.3.1.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:263) ~[kubernetes-client-5.3.1.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:870) ~[kubernetes-client-5.3.1.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:365) ~[kubernetes-client-5.3.1.jar:?]
	at io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:53) ~[kubernetes-client-5.3.1.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:411) ~[kubernetes-client-5.3.1.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:86) ~[kubernetes-client-5.3.1.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:394) ~[kubernetes-client-5.3.1.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:86) ~[kubernetes-client-5.3.1.jar:?]
	at io.airbyte.workers.process.KubePodProcess.<init>(KubePodProcess.java:481) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:144) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at io.airbyte.workers.process.AirbyteIntegrationLauncher.spec(AirbyteIntegrationLauncher.java:44) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at io.airbyte.workers.DefaultGetSpecWorker.run(DefaultGetSpecWorker.java:48) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at io.airbyte.workers.DefaultGetSpecWorker.run(DefaultGetSpecWorker.java:23) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:155) ~[io.airbyte-airbyte-workers-0.35.30-alpha.jar:?]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[?:?]

Steps to Reproduce

(these have not been explicitly tested but are the most likely steps to reproduce)

  1. Create a k8s cluster that supports resource quotas
  2. Create a non-default namespace
  3. Set a resource quota for the namespace that defines the maximum total CPU units that can be requested
  4. Deploy Airbyte in the non-default namespace and set up a sync
  5. Attempt to run any sync job

Are you willing to submit a PR?

Potentially. The code changes required are fairly simple. All changes would be made in ./airbyte-workers/src/main/java/io/airbyte/workers/process/KubePodProcess.java. The DEFAULT_SIDECAR_RESOURCES variable defined on line 104 would be changed to add CPU requests, and the init container defined on line 164 would need all resource requests defined.

My current configuration only requires the resource request to be defined, but other resource quotas will require the resource limits to be defined, so a full fix will include all resource requests and limits.
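The shape of such a fix can be sketched in plain Java: give every container a complete set of requests and limits, keeping any values that are already configured. This is only an illustration of the idea, not the Airbyte code. The class name, the `withDefaults` helper, and the default values are hypothetical placeholders, and real numbers would have to fit the cluster's quota.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ResourceDefaultsSketch {

    // Hypothetical fallback values; they are NOT taken from the Airbyte code
    // base and would need tuning for a real cluster.
    static final Map<String, String> DEFAULTS = defaults();

    static Map<String, String> defaults() {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("requests.cpu", "100m");
        d.put("requests.memory", "25Mi");
        d.put("limits.cpu", "200m");
        d.put("limits.memory", "50Mi");
        return d;
    }

    // Returns a copy of the container's resources in which every field from
    // DEFAULTS is present; values the container already sets are kept.
    static Map<String, String> withDefaults(Map<String, String> configured) {
        Map<String, String> merged = new LinkedHashMap<>(DEFAULTS);
        merged.putAll(configured);
        return merged;
    }

    public static void main(String[] args) {
        // Sidecar that sets only a memory request (assumed for illustration).
        System.out.println(withDefaults(Map.of("requests.memory", "40Mi")));
        // Init container with no resources at all.
        System.out.println(withDefaults(Map.of()));
    }
}
```

Filling in both requests and limits covers quotas that constrain either kind of field, which is the "full fix" described above.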

@rcheatham-q
Contributor Author

I found the official walkthrough from Kubernetes that demonstrates how to build a namespace with resource quotas, which may be helpful in testing: https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/quota-memory-cpu-namespace/

@bleonard bleonard added autoteam team/tse Technical Support Engineers labels Apr 26, 2022
@rcheatham-q
Contributor Author

It appears this was fixed in PR #10759. I will test it as soon as I can.

@amitza

amitza commented Jul 6, 2022

> It appears this was fixed in this PR: #10759. I will test this as soon as I can

I don't see how this can be configured using env vars or a config file.
I'm working on a cluster with strict CPU and Memory minimum restrictions.

Is there any way I can help make this configurable?

@marcosmarxm
Member

Closing due to inactivity. @rcheatham-q please open a new issue if the error persists.

@k0t3n

k0t3n commented Apr 17, 2023

@amitza hello! Did you solve this?

@sookeke

sookeke commented Dec 25, 2023

not fixed ->

@marcosmarxm
Member

@sookeke if the issue is not fixed, please reopen it with more information, including which version you are using today.

@mateocolina

Hi @marcosmarxm we are currently using Airbyte version 0.58.0 with Helm chart version 0.64.308 and this issue still persists.

Caused by: io.temporal.failure.ApplicationFailure: message='Failure executing: POST at: https://kubernetes.cluster.ip:443/api/v1/namespaces/namespace/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "orchestrator-repl-job-3-attempt-4" is forbidden: failed quota: kaas-namespace-resourcequota: must specify limits.cpu for: init; limits.memory for: init; requests.cpu for: init; requests.memory for: init.', type='io.fabric8.kubernetes.client.KubernetesClientException', nonRetryable=false

So the init containers do not get a resource definition assigned.
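When triaging errors like this one, it can help to pull the container names and missing fields out of the message. Below is a minimal sketch in plain Java that handles the two message shapes quoted in this thread. The parser, its class name, and the "(pod)" placeholder are hypothetical, and the message format is assumed from these logs rather than from any documented contract.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QuotaMessageSketch {

    // Groups the fields named after "must specify" by container. Fields that
    // appear without a "for:" part are grouped under the placeholder "(pod)".
    static Map<String, List<String>> missingByContainer(String message) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        String marker = "must specify ";
        int start = message.indexOf(marker);
        if (start < 0) {
            return result; // not a quota message of the assumed shape
        }
        String tail = message.substring(start + marker.length());
        // Cut at the quote that closes message='...' in the log line, if any.
        int quote = tail.indexOf('\'');
        if (quote >= 0) {
            tail = tail.substring(0, quote);
        }
        tail = tail.trim();
        // Drop the sentence-ending period (field names contain inner dots).
        if (tail.endsWith(".")) {
            tail = tail.substring(0, tail.length() - 1);
        }
        for (String part : tail.split(";")) {
            String[] pieces = part.trim().split(" for: ", 2);
            String container = pieces.length == 2 ? pieces[1].trim() : "(pod)";
            for (String field : pieces[0].split(",")) {
                if (!field.trim().isEmpty()) {
                    result.computeIfAbsent(container, k -> new ArrayList<>()).add(field.trim());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String message = "failed quota: kaas-namespace-resourcequota: must specify "
            + "limits.cpu for: init; limits.memory for: init; "
            + "requests.cpu for: init; requests.memory for: init.'";
        System.out.println(missingByContainer(message));
    }
}
```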
