[helm] Using GCS Results in could not find attempt stats for job_id #48503

Open
helcim-adam-c opened this issue Nov 14, 2024 · 3 comments
Labels
area/platform issues related to the platform community team/platform-move type/bug Something isn't working

Comments

@helcim-adam-c

Helm Chart Version

1.1.0

What step the error happened?

None

Relevant information

Hi, we're currently having issues implementing a Datadog integration with our GKE-hosted Airbyte deployment. We use Argo CD, which runs `helm template` on the 1.1.0 Helm chart to deploy all the resources.

Here's our current override.yaml file

# Global params that are overwritten with umbrella chart
global:
  auth:
    enabled: true
  # -- The URL where Airbyte will be reached; This should match your Ingress host
  airbyteUrl: "https://airbyte.org.com"

  storage:
    type: "GCS"
    storageSecretName: airbyte-gcs-log-creds
    bucket:
      log: data-airbyte-logs
      state: data-airbyte-logs
      workloadOutput: data-airbyte-logs
    gcs:
      projectId: <project-id>
      credentialsPath: /secrets/gcs-log-creds/gcp.json

  database:
    type: "external" # "external"

    # -- Secret name where database credentials are stored
    secretName: "prod-airbyte-psql-password" # e.g. "airbyte-config-secrets"

    # -- The database host
    host: "10.233.128.5"

    # -- The database port
    port: "5432"

    # -- The database name
    database: "db-airbyte"

    # -- The database user
    user: "DATABASE_USER"
    # -- The key within `secretName` where the user is stored
    userSecretKey: DATABASE_USER

    # -- The database password
    password: "DATABASE_PASSWORD"
    # -- The key within `secretName` where the password is stored
    #passwordSecretKey: "" # e.g."database-password"

  env_vars:
    HTTP_IDLE_TIMEOUT: 25m
    READ_TIMEOUT: 30m
    
    
  
  metrics:
    metricClient: datadog

server:
  enabled: true

  env_vars:
    METRIC_CLIENT: datadog
    DD_AGENT_HOST: datadog-agent.datadog.svc.cluster.local
    DD_DOGSTATSD_PORT: 8125
    
worker:
  enabled: true

  env_vars:
    DD_AGENT_HOST: datadog-agent.datadog.svc.cluster.local
    DD_DOGSTATSD_PORT: 8125

workload-launcher:
  enabled: true

  env_vars:
    DD_AGENT_HOST: datadog-agent.datadog.svc.cluster.local
    DD_DOGSTATSD_PORT: 8125

workload-api-server:
  enabled: true
  
  env_vars:
    DD_AGENT_HOST: datadog-agent.datadog.svc.cluster.local
    DD_DOGSTATSD_PORT: 8125


metrics:
  enabled: true

  env_vars:
    PUBLISH_METRICS: true
    METRIC_CLIENT: datadog
    DD_AGENT_HOST: datadog-agent.datadog.svc.cluster.local
    DD_DOGSTATSD_PORT: 8125

postgresql:
  # -- Switch to enable or disable the PostgreSQL helm chart
  enabled: false
  image:
    repository: airbyte/db
  # -- Airbyte Postgresql username
  postgresqlUsername: airbyte
  # -- Airbyte Postgresql password
  postgresqlPassword: airbyte
  # -- Airbyte Postgresql database
  postgresqlDatabase: db-airbyte

externalDatabase:
  # -- Database host
  host: "10.233.128.5"
  # -- non-root Username for Airbyte Database
  user: "airbyte"
  # -- Database password
  password: ""
  # -- Name of an existing secret resource containing the DB password
  existingSecret: "prod-airbyte-psql-password"
  # -- Name of an existing secret key containing the DB password
  existingSecretPasswordKey: "DATABASE_PASSWORD"
  # -- Database name
  database: "db-airbyte"
  # -- Database port number
  port: "5432"
  # -- Database full JDBC URL (ex: jdbc:postgresql://host:port/db?parameters)
  jdbcUrl: ""

airbyte-bootloader:
  extraEnv:
    - name: DATABASE_USER
      value: airbyte

I've provided the log output we're seeing from the server. It looks like the server is not able to get the job stats from GCS, but from what I can tell it's not running into an authentication or connection issue.

Relevant log output

io.airbyte.commons.server.errors.IdNotFoundKnownException: Could not find attempt stats for job_id: 2199 and attempt no: 0
	at io.airbyte.commons.server.handlers.AttemptHandler.getAttemptCombinedStats(AttemptHandler.java:248) ~[io.airbyte-airbyte-commons-server-1.1.0.jar:?]
	at io.airbyte.server.apis.AttemptApiController.lambda$getAttemptCombinedStats$2(AttemptApiController.java:69) ~[io.airbyte-airbyte-server-1.1.0.jar:?]
	at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28) ~[io.airbyte-airbyte-server-1.1.0.jar:?]
	at io.airbyte.server.apis.AttemptApiController.getAttemptCombinedStats(AttemptApiController.java:69) ~[io.airbyte-airbyte-server-1.1.0.jar:?]
	at io.airbyte.server.apis.$AttemptApiController$Definition$Exec.dispatch(Unknown Source) ~[io.airbyte-airbyte-server-1.1.0.jar:?]
	at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invokeUnsafe(AbstractExecutableMethodsDefinition.java:461) ~[micronaut-inject-4.6.5.jar:4.6.5]
	at io.micronaut.context.DefaultBeanContext$BeanContextUnsafeExecutionHandle.invokeUnsafe(DefaultBeanContext.java:4350) ~[micronaut-inject-4.6.5.jar:4.6.5]
	at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:272) ~[micronaut-router-4.6.5.jar:4.6.5]
	at io.micronaut.web.router.DefaultUriRouteMatch.execute(DefaultUriRouteMatch.java:38) ~[micronaut-router-4.6.5.jar:4.6.5]
	at io.micronaut.http.server.RouteExecutor.executeRouteAndConvertBody(RouteExecutor.java:498) ~[micronaut-http-server-4.6.5.jar:4.6.5]
	at io.micronaut.http.server.RouteExecutor.lambda$callRoute$5(RouteExecutor.java:475) ~[micronaut-http-server-4.6.5.jar:4.6.5]
	at io.micronaut.core.execution.ExecutionFlow.lambda$async$1(ExecutionFlow.java:87) ~[micronaut-core-4.6.5.jar:4.6.5]
	at io.micronaut.core.propagation.PropagatedContext.lambda$wrap$3(PropagatedContext.java:211) ~[micronaut-core-4.6.5.jar:4.6.5]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2024-11-14 22:56:36 ERROR i.a.c.s.e.h.IdNotFoundExceptionHandler(handle):33 - Not found exception class NotFoundKnownExceptionInfo {
    id: null
    message: Id not found: Could not find attempt stats for job_id: 2199 and attempt no: 0
    exceptionClassName: io.airbyte.commons.server.errors.IdNotFoundKnownException
    exceptionStack: [io.airbyte.commons.server.errors.IdNotFoundKnownException: Id not found: Could not find attempt stats for job_id: 2199 and attempt no: 0, 	at io.airbyte.commons.server.errors.handlers.IdNotFoundExceptionHandler.handle(IdNotFoundExceptionHandler.java:32), 	at io.airbyte.commons.server.errors.handlers.IdNotFoundExceptionHandler.handle(IdNotFoundExceptionHandler.java:23), 	at io.micronaut.http.server.RequestLifecycle.lambda$handlerExceptionHandler$10(RequestLifecycle.java:308), 	at io.micronaut.http.server.RequestLifecycle.handlerExceptionHandler(RequestLifecycle.java:319), 	at io.micronaut.http.server.RequestLifecycle.onErrorNoFilter(RequestLifecycle.java:248), 	at io.micronaut.http.server.RequestLifecycle.lambda$onErrorNoFilter$2(RequestLifecycle.java:210), 	at io.micronaut.core.execution.ImperativeExecutionFlowImpl.onErrorResume(ImperativeExecutionFlowImpl.java:112), 	at io.micronaut.core.execution.DelayedExecutionFlowImpl$OnErrorResume.apply(DelayedExecutionFlowImpl.java:313), 	at io.micronaut.core.execution.DelayedExecutionFlowImpl.work(DelayedExecutionFlowImpl.java:51), 	at io.micronaut.core.execution.DelayedExecutionFlowImpl.complete0(DelayedExecutionFlowImpl.java:64), 	at io.micronaut.core.execution.DelayedExecutionFlowImpl.completeExceptionally(DelayedExecutionFlowImpl.java:75), 	at io.micronaut.core.execution.ExecutionFlow.lambda$async$0(ExecutionFlow.java:92), 	at io.micronaut.core.execution.ImperativeExecutionFlowImpl.onComplete(ImperativeExecutionFlowImpl.java:132), 	at io.micronaut.core.execution.ExecutionFlow.lambda$async$1(ExecutionFlow.java:87), 	at io.micronaut.core.propagation.PropagatedContext.lambda$wrap$3(PropagatedContext.java:211), 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144), 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642), 	at java.base/java.lang.Thread.run(Thread.java:1583), Caused by: 
io.airbyte.commons.server.errors.IdNotFoundKnownException: Could not find attempt stats for job_id: 2199 and attempt no: 0, 	at io.airbyte.commons.server.handlers.AttemptHandler.getAttemptCombinedStats(AttemptHandler.java:248), 	at io.airbyte.server.apis.AttemptApiController.lambda$getAttemptCombinedStats$2(AttemptApiController.java:69), 	at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28), 	at io.airbyte.server.apis.AttemptApiController.getAttemptCombinedStats(AttemptApiController.java:69), 	at io.airbyte.server.apis.$AttemptApiController$Definition$Exec.dispatch(Unknown Source), 	at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invokeUnsafe(AbstractExecutableMethodsDefinition.java:461), 	at io.micronaut.context.DefaultBeanContext$BeanContextUnsafeExecutionHandle.invokeUnsafe(DefaultBeanContext.java:4350), 	at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:272), 	at io.micronaut.web.router.DefaultUriRouteMatch.execute(DefaultUriRouteMatch.java:38), 	at io.micronaut.http.server.RouteExecutor.executeRouteAndConvertBody(RouteExecutor.java:498), 	at io.micronaut.http.server.RouteExecutor.lambda$callRoute$5(RouteExecutor.java:475), 	... 5 more]
    rootCauseExceptionClassName: java.lang.Class
    rootCauseExceptionStack: [io.airbyte.commons.server.errors.IdNotFoundKnownException: Could not find attempt stats for job_id: 2199 and attempt no: 0, 	at io.airbyte.commons.server.handlers.AttemptHandler.getAttemptCombinedStats(AttemptHandler.java:248), 	at io.airbyte.server.apis.AttemptApiController.lambda$getAttemptCombinedStats$2(AttemptApiController.java:69), 	at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:28), 	at io.airbyte.server.apis.AttemptApiController.getAttemptCombinedStats(AttemptApiController.java:69), 	at io.airbyte.server.apis.$AttemptApiController$Definition$Exec.dispatch(Unknown Source), 	at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invokeUnsafe(AbstractExecutableMethodsDefinition.java:461), 	at io.micronaut.context.DefaultBeanContext$BeanContextUnsafeExecutionHandle.invokeUnsafe(DefaultBeanContext.java:4350), 	at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:272), 	at io.micronaut.web.router.DefaultUriRouteMatch.execute(DefaultUriRouteMatch.java:38), 	at io.micronaut.http.server.RouteExecutor.executeRouteAndConvertBody(RouteExecutor.java:498), 	at io.micronaut.http.server.RouteExecutor.lambda$callRoute$5(RouteExecutor.java:475), 	at io.micronaut.core.execution.ExecutionFlow.lambda$async$1(ExecutionFlow.java:87), 	at io.micronaut.core.propagation.PropagatedContext.lambda$wrap$3(PropagatedContext.java:211), 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144), 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642), 	at java.base/java.lang.Thread.run(Thread.java:1583)]
}
@reidab
Contributor

reidab commented Nov 15, 2024

@helcim-adam-c There's a chance this is a similar issue to the one I described in #48502. Is CONTAINER_ORCHESTRATOR_SECRET_NAME set to the correct value on your worker and workload launcher pods?

@helcim-adam-c
Author

@reidab Thanks, we took a look at that environment variable and it seems to be templated correctly on our worker and workload launcher pods. I'm only seeing this error on the airbyte-server pod when we run a job.
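
For anyone wanting to repeat this kind of check, here is a minimal sketch of one way to confirm how an environment variable is set on each container, by reading the JSON that `kubectl get pods -o json` prints. The pod names, container names, and values in the sample are hypothetical and only illustrate the shape of the data; the function reads the `env` list only, so variables injected through `envFrom` would not show up.

```python
import json
from typing import Dict, Optional


def find_env_var(pod_list: dict, var_name: str) -> Dict[str, Optional[str]]:
    """Map "pod/container" to how var_name is set there (None if absent from env)."""
    results: Dict[str, Optional[str]] = {}
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for container in pod["spec"].get("containers", []):
            key = f"{pod_name}/{container['name']}"
            results[key] = None
            for env in container.get("env", []):
                if env.get("name") != var_name:
                    continue
                if "value" in env:
                    # Literal value set directly on the container.
                    results[key] = env["value"]
                else:
                    # Set indirectly (e.g. configMapKeyRef or secretKeyRef):
                    # report the reference, not the resolved value.
                    ref = json.dumps(env.get("valueFrom", {}), sort_keys=True)
                    results[key] = "valueFrom: " + ref
    return results


# Hypothetical sample in the shape of `kubectl get pods -o json` output.
SAMPLE_POD_LIST = {
    "items": [
        {
            "metadata": {"name": "worker-pod"},
            "spec": {"containers": [{
                "name": "worker",
                "env": [{"name": "CONTAINER_ORCHESTRATOR_SECRET_NAME",
                         "value": "example-secret"}],
            }]},
        },
        {
            "metadata": {"name": "launcher-pod"},
            "spec": {"containers": [{
                "name": "launcher",
                "env": [{"name": "CONTAINER_ORCHESTRATOR_SECRET_NAME",
                         "valueFrom": {"configMapKeyRef": {
                             "name": "example-env", "key": "EXAMPLE_KEY"}}}],
            }]},
        },
        {
            "metadata": {"name": "server-pod"},
            "spec": {"containers": [{"name": "server", "env": []}]},
        },
    ]
}

report = find_env_var(SAMPLE_POD_LIST, "CONTAINER_ORCHESTRATOR_SECRET_NAME")
for target in sorted(report):
    value = report[target]
    print(f"{target}: {value if value is not None else 'not set in env'}")
```

To run it against a real cluster, one could save the output of `kubectl get pods -n <namespace> -o json` to a file and pass the parsed JSON to `find_env_var` in place of the sample.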

@marcosmarxm marcosmarxm changed the title Using GCS Results in could not find attempt stats for job_id [helm] Using GCS Results in could not find attempt stats for job_id Nov 18, 2024
@marcosmarxm
Member

I've forwarded the issue to the platform team responsible for this part of the code.
@airbytehq/platform-move Can someone take a look at this issue?
