Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

airbyte-common-workers: Collect trace message on failed connection_status #20721

Merged
merged 29 commits into from
Jan 12, 2023

Conversation

alafanechere
Copy link
Contributor

@alafanechere alafanechere commented Dec 20, 2022

What

Closes #18865

Synchronous commands like SPEC, DISCOVER and CHECK can output trace messages that are not currently captured as failure reasons when the process exits with a 0 status code.

This PR aims at normalizing the behavior of these three commands:

  • Do not consider the process exit code to characterize an error
  • When a trace message is received: always add it to the job output
  • When no trace message or catalog/connection status/spec is received -> throw a WorkerException

How

Implement an explicit logic in:

public ConnectorJobOutput run(final JobGetSpecConfig config, final Path jobRoot) throws WorkerException {

public ConnectorJobOutput run(final StandardCheckConnectionInput input, final Path jobRoot) throws WorkerException {

public ConnectorJobOutput run(final StandardDiscoverCatalogInput discoverSchemaInput, final Path jobRoot) throws WorkerException {

  • Update the unit tests

@alafanechere alafanechere temporarily deployed to more-secrets December 20, 2022 15:37 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets December 20, 2022 15:37 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets December 20, 2022 16:03 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets December 20, 2022 16:04 — with GitHub Actions Inactive
Copy link
Contributor

@pedroslopez pedroslopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this on a call, but leaving a comment for posterity:

Looks good overall, but we should:

  • Add testing for the changes on both WorkerUtils and DefaultCheckConnectionWorker
  • Look into following the same approach of always looking for trace messages in the other synchronous commands (spec, discover)

@alafanechere alafanechere marked this pull request as ready for review December 21, 2022 18:29
@alafanechere
Copy link
Contributor Author

@pedroslopez I tried to increase the shared logic between SPEC, DISCOVER and CHECK. I'd be interested in getting your opinion about the throwOnNoFailureReason approach.

@alafanechere alafanechere temporarily deployed to more-secrets December 21, 2022 18:30 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets December 21, 2022 18:31 — with GitHub Actions Inactive
Copy link
Contributor

@pedroslopez pedroslopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good logically, most of my comments are around making things a bit clearer but non-blocking.

The only thing I'd like to see before approving is some additional tests for WorkerUtils / Default***Workers.

@alafanechere alafanechere temporarily deployed to more-secrets December 27, 2022 11:09 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets December 27, 2022 11:09 — with GitHub Actions Inactive
@alafanechere
Copy link
Contributor Author

@pedroslopez I tried to make things a bit more readable and explicit:
We do not check the exit code anymore: we throw a WorkerException if:

  • we can't find the object we're trying to read from the output (connection status, spec, catalog)
  • AND we don't have a failure reason

This logic is kept within the worker classes (Default*Worker.java), the Worker utils class only has utils to extract useful objects from messages.

I'll add the tests if the overall logic looks good to you.

@alafanechere alafanechere temporarily deployed to more-secrets December 27, 2022 14:40 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets December 27, 2022 14:41 — with GitHub Actions Inactive
Copy link
Contributor

@pedroslopez pedroslopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think this makes it much clearer. Only one comment about reassigning the job output that is non-blocking - just waiting on the tests 😄

@octavia-squidington-iv octavia-squidington-iv added area/platform issues related to the platform area/worker Related to worker labels Dec 28, 2022
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 17:20 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 17:20 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 17:20 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 17:20 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 17:34 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 17:34 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 17:34 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 17:34 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 17:58 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 17:58 — with GitHub Actions Inactive
@alafanechere alafanechere enabled auto-merge (squash) January 12, 2023 19:23
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 19:24 — with GitHub Actions Inactive
@alafanechere alafanechere temporarily deployed to more-secrets January 12, 2023 19:25 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 12, 2023 21:14 — with GitHub Actions Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets January 12, 2023 21:14 — with GitHub Actions Inactive
@alafanechere alafanechere merged commit 2dc5b2f into master Jan 12, 2023
@alafanechere alafanechere deleted the augustin/check/collect-trace-message-on-0-exit branch January 12, 2023 21:49
Copy link
Contributor

@erohmensing erohmensing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks pretty good -- I added a few clarifying questions mostly for my own understanding! Following the logic, it looks like this

  • looks for a connection status message
  • looks for failure reasons in check connection messages
  • sets the failure reason of the job if there's failure reasons
  • if we got a connection status message, set the checkConnection output of the jobOutput
  • if no connection status message and no failure reason, throw
  • return the jobOutput

I followed the logic for checkConnection, and glanced over the other 2 jobs.

A few things that might have helped understand this better at first glance of a new code base:

  • the existence CONNECTION_STATUS messages are being used to determine success, I think. could one consider a connectionStatus to be something like a connectionSuccess or connectionConfirmation?
  • I wonder if it would be worth adding a method isSuccess or isSuccessful that returns output.getCheckConnection != null? (and similar for the other methods)? On spec, it's relatively intuitive that if getSpec is null, the job failed, but for getCheckConnection, as someone new to the code base, it feels like a method named getCheckConnection could contain information about a failed job or a successful job - I have to look at the code to see that it only gets set in the case of success.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/platform issues related to the platform area/server area/worker Related to worker
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Collect trace message on failed connection_status
5 participants