airbyte-ci: run poe tasks declared in pyproject.toml file of internal poetry packages #34736

@alafanechere alafanechere commented Feb 1, 2024

What

Closes #33880
This PR fundamentally changes how the airbyte-ci test command operates.
This command tests internal packages and will be renamed to airbyte-ci poetry ci in a follow-up PR.

The change is the following:

  • Detect modified internal packages on the current branch
  • Dynamically run poe tasks declared in the pyproject.toml of these packages
  • Customize CI execution with other options declared in [tool.airbyte-ci] (e.g. mounting the Docker socket)
  • Run everything in parallel: modified packages and poe tasks

It follows the same pattern as airbyte-ci connectors test:

  • You can pass multiple --poetry-package-path options to select the packages to run CI on
  • We expose a --modified option, which is handy in a CI context to run CI on the packages modified on the branch
  • We remove as much logic as possible from the GHA workflow

Recommended reading order

  1. The command entrypoint: airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/test/commands.py
  2. The list of packages that are declared internal: airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/test/__init__.py
  3. The 🍖 of the change: airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/test/pipeline.py having all the logic to run containerized poe tasks in parallel
  4. A refactor to our classes handling results: airbyte-ci/connectors/pipelines/pipelines/models/steps.py

🚨 User Impact 🚨

  • New internal packages must be declared in the list in airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/test/__init__.py
  • New internal packages must declare poe tasks and an airbyte-ci config in their pyproject.toml to be run automatically in CI
  • Detailed CI checks are added to PRs during poe task execution:
Screenshot 2024-02-02 at 10 25 55

Regression checks

Follow up work

  • Migrate the airbyte-ci test command to airbyte-ci poetry ci

Contributor Author

This workflow is getting leaner because:

  • The detection of modified packages happens at airbyte-ci execution time
  • The packages' test options are now dynamically loaded from their pyproject.toml files.

Contributor Author

It's renamed to internal_poetry_packages_ci.yml in a downstream PR, but I kept the current name on this branch to make sure the edited workflow is triggered as it is.

Contributor Author

This can be removed, as CAT tests will be triggered on modification by the airbyte-ci-tests.yml workflow.

Contributor Author

note for reviewer: there's no logical change to this file.
The use of kw_only in the StepResult dataclass required this change.

@@ -263,18 +262,3 @@ async def connectors(
ctx.obj["enable_dependency_scanning"],
)
log_selected_connectors(ctx.obj["selected_connectors_with_modified_files"])


async def get_modified_files(git_branch: str, git_revision: str, diffed_branch: str, is_local: bool, ci_context: CIContext) -> Set[str]:

Contributor Author

I moved this function to the pipelines.helpers.git module, as it's reused in the airbyte-ci test command for file change detection.
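
For illustration, a set of modified files can be mapped to the packages that contain them roughly as follows. This is a sketch only: `find_modified_packages` is a hypothetical helper, the second package path is made up, and the real detection in this PR may work differently.

```python
from pathlib import PurePosixPath

# The first path appears in this PR; the second one is a made-up example.
INTERNAL_POETRY_PACKAGES = [
    "airbyte-ci/connectors/pipelines",
    "airbyte-ci/connectors/some_other_package",
]


def find_modified_packages(modified_files: set[str], package_paths: list[str]) -> set[str]:
    """Return the package paths that contain at least one modified file."""
    modified_packages = set()
    for package in package_paths:
        package_path = PurePosixPath(package)
        # is_relative_to compares whole path components, so "pipelines_extra"
        # is not mistaken for a file inside "pipelines".
        if any(PurePosixPath(f).is_relative_to(package_path) for f in modified_files):
            modified_packages.add(package)
    return modified_packages


files = {
    "airbyte-ci/connectors/pipelines/pipelines/models/steps.py",
    "airbyte-ci/connectors/pipelines_extra/README.md",
    "docs/index.md",
}
print(find_modified_packages(files, INTERNAL_POETRY_PACKAGES))
```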

Contributor Author

note for reviewer: there's no logical change to this file.
The use of kw_only in the CommandResult dataclass required this change.

Contributor Author

This command entrypoint got leaner:

  • We manage package change detection
  • Parallelize the package processing in a task group
  • Call pipeline.run_poe_tasks_for_package where all the magic happens
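
The two levels of parallelism described above (packages, then the poe tasks of each package) can be sketched with plain asyncio. Here `asyncio.gather` stands in for the task group, `run_poe_task` is a hypothetical placeholder for the containerized execution, and the signature given to `run_poe_tasks_for_package` is an assumption rather than the one used in pipeline.py.

```python
import asyncio


async def run_poe_task(package: str, task: str) -> tuple[str, str, bool]:
    """Placeholder for running one poe task in a container; always succeeds here."""
    await asyncio.sleep(0)  # stands in for the real containerized execution
    return package, task, True


async def run_poe_tasks_for_package(package: str, tasks: list[str]) -> list[tuple[str, str, bool]]:
    """Run all poe tasks of one package concurrently."""
    return await asyncio.gather(*(run_poe_task(package, task) for task in tasks))


async def main(packages: dict[str, list[str]]) -> list[tuple[str, str, bool]]:
    """Run every package concurrently, each running its own tasks concurrently."""
    per_package = await asyncio.gather(
        *(run_poe_tasks_for_package(package, tasks) for package, tasks in packages.items())
    )
    # Flatten the per-package lists into a single list of results.
    return [result for results in per_package for result in results]


results = asyncio.run(main({"pkg_a": ["test", "lint"], "pkg_b": ["test"]}))
print(results)
```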

Contributor Author

I refactored the different Result classes here:

  • StepResult (used in connectors ci)
  • CommandResult (used in format)
  • PoeTaskResult (used in poetry package testing)

They all inherit from the same dataclass.
Dataclasses require fields without default values to be declared before fields with defaults, and this ordering gets mixed up with inheritance. This is why I used kw_only=True: all parameters become keyword-only arguments.

@alafanechere alafanechere force-pushed the augustin/02-01-airbyte-ci_run_poe_tasks_declared_in_pyproject.toml_file_of_internal_poetry_packages branch from 5b8c2ea to 62dd35b Compare February 1, 2024 15:32
@alafanechere alafanechere marked this pull request as ready for review February 1, 2024 15:35
@alafanechere alafanechere requested review from a team, erohmensing and girarda February 1, 2024 15:35
@alafanechere alafanechere force-pushed the augustin/02-01-airbyte-ci_run_poe_tasks_declared_in_pyproject.toml_file_of_internal_poetry_packages branch from 62dd35b to c16e156 Compare February 2, 2024 07:23
@alafanechere alafanechere force-pushed the augustin/02-01-airbyte-ci_run_poe_tasks_declared_in_pyproject.toml_file_of_internal_poetry_packages branch 4 times, most recently from 3059f88 to d3814e5 Compare February 2, 2024 10:02
@alafanechere alafanechere force-pushed the augustin/02-01-internal_poetry_packages_declare_poe_tasks_and_airbyte-ci_sections_in_pyproject.toml branch from f29b0cd to c3f2ec3 Compare February 2, 2024 10:08
@alafanechere alafanechere force-pushed the augustin/02-01-airbyte-ci_run_poe_tasks_declared_in_pyproject.toml_file_of_internal_poetry_packages branch from d3814e5 to 76338b4 Compare February 2, 2024 10:08

from pathlib import Path

INTERNAL_POETRY_PACKAGES = [

Contributor

is this specific to poetry? Why not call these "internal_packages", or "utility_packages"?

context: the fact that we manage dependencies doesn't seem relevant. what we care about is that this set of packages is part of a single group

Contributor Author

@alafanechere alafanechere Feb 5, 2024

I named this INTERNAL_POETRY_PACKAGES because the values in this list constrain the values the option accepts at the CLI level.
E.g. airbyte-ci test --poetry-package-path=<package-not-managed-by-poetry> will fail because the path does not match a value in this list.

If we'd call it INTERNAL_PACKAGES I think the airbyte-cdk should be in there 😄 but it's not powered by poetry so it can't be in this list.

I think a nicer approach would be to glob for pyproject.toml file and keep the paths which are not in the connectors folder... Wdyt?

Contributor

> If we'd call it INTERNAL_PACKAGES I think the airbyte-cdk should be in there

that's fair.

> I think a nicer approach would be to glob for pyproject.toml file and keep the paths which are not in the connectors folder... Wdyt?

I don't think this will work because the CDK also has a pyproject.

I'm fine with the name if the fact that the projects are using poetry matters, which appears to be the case since it's required by the poetry tasks.

@@ -95,3 +96,18 @@ def get_git_repo() -> git.Repo:
def get_git_repo_path() -> str:
"""Retrieve the git repo path."""
return str(get_git_repo().working_tree_dir)


async def get_modified_files(git_branch: str, git_revision: str, diffed_branch: str, is_local: bool, ci_context: CIContext) -> Set[str]:

Contributor

this is cool. we rely on a GH action to know which files are modified when running mypy on the cdk. It's not great. I'm looking forward to integrating cdk development in airbyte-ci

"""A dataclass to capture the result of a step."""

step: Step
@dataclass(kw_only=True)

Contributor

why is this not frozen anymore?

Contributor Author

Because it's mutated in some tests... I can try to patch it at test time but was not sure how important it is to keep it frozen.
I'll try to keep it frozen for sanity.

Contributor

It definitely feels safer to keep frozen unless needed. Why are the objects mutated in the tests if they're not mutated in the production code paths?

Contributor Author

@girarda I figured out that the __post_init__ hook was the reason a mutation happened.
In __post_init__ we reset the stdout and stderr attributes to new values to redact secrets from them.
Setting these attributes with object.__setattr__ avoids the error and allows us to keep the classes frozen:

    def __post_init__(self) -> None:
        if self.stderr:
            object.__setattr__(self, "stderr", self.redact_secrets_from_string(self.stderr))
        if self.stdout:
            object.__setattr__(self, "stdout", self.redact_secrets_from_string(self.stdout))

@octavia-squidington-iv octavia-squidington-iv requested a review from a team February 3, 2024 02:10
Contributor

@girarda girarda left a comment

two small comments but nothing I feel the need to block on ✅

if pipeline_context.params["modified"]:
poetry_package_paths = await find_modified_internal_packages(pipeline_context)

return poetry_package_paths.union(set(pipeline_context.params["poetry_package_paths"]))

Contributor

the command description says --modified should be a filter to only run on modified internal packages, but this seems to run on both the explicitly specified packages and the modified ones.

Which behavior is correct?
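
To make the two readings concrete, here is a small sketch. `select_union` mirrors what the snippet above appears to do; `select_filter` is one possible interpretation of "filter" and is purely hypothetical.

```python
def select_union(explicit: set[str], modified: set[str], use_modified: bool) -> set[str]:
    """Explicitly passed packages plus, when --modified is set, the modified ones."""
    return explicit | (modified if use_modified else set())


def select_filter(explicit: set[str], modified: set[str], use_modified: bool) -> set[str]:
    """One reading of 'filter': with --modified, keep only modified packages."""
    if not use_modified:
        return explicit
    # With no explicit paths, fall back to every modified package.
    return explicit & modified if explicit else modified


explicit_paths = {"pkg_a", "pkg_b"}
modified_paths = {"pkg_b", "pkg_c"}
print(select_union(explicit_paths, modified_paths, use_modified=True))
print(select_filter(explicit_paths, modified_paths, use_modified=True))
```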

"""A dataclass to capture the result of a step."""

step: Step
@dataclass(kw_only=True)

Contributor

It definitely feels safer to keep frozen unless needed. Why are the objects mutated in the tests if they're not mutated in the production code paths?

@alafanechere alafanechere force-pushed the augustin/02-01-internal_poetry_packages_declare_poe_tasks_and_airbyte-ci_sections_in_pyproject.toml branch from c3f2ec3 to 41ced4b Compare February 7, 2024 13:43
@alafanechere alafanechere force-pushed the augustin/02-01-airbyte-ci_run_poe_tasks_declared_in_pyproject.toml_file_of_internal_poetry_packages branch from 76338b4 to f8e0ca3 Compare February 7, 2024 13:44
@alafanechere alafanechere force-pushed the augustin/02-01-internal_poetry_packages_declare_poe_tasks_and_airbyte-ci_sections_in_pyproject.toml branch from 41ced4b to 2061f6c Compare February 7, 2024 14:30
@alafanechere alafanechere force-pushed the augustin/02-01-airbyte-ci_run_poe_tasks_declared_in_pyproject.toml_file_of_internal_poetry_packages branch 2 times, most recently from cf56130 to 56cb825 Compare February 7, 2024 15:04
Base automatically changed from augustin/02-01-internal_poetry_packages_declare_poe_tasks_and_airbyte-ci_sections_in_pyproject.toml to master February 7, 2024 15:08
@alafanechere alafanechere force-pushed the augustin/02-01-airbyte-ci_run_poe_tasks_declared_in_pyproject.toml_file_of_internal_poetry_packages branch 2 times, most recently from 25e7d08 to 97f7afe Compare February 7, 2024 16:21
@alafanechere alafanechere force-pushed the augustin/02-01-airbyte-ci_run_poe_tasks_declared_in_pyproject.toml_file_of_internal_poetry_packages branch from 97f7afe to 6197f20 Compare February 7, 2024 16:43
@alafanechere alafanechere merged commit 5af9696 into master Feb 7, 2024
24 checks passed
@alafanechere alafanechere deleted the augustin/02-01-airbyte-ci_run_poe_tasks_declared_in_pyproject.toml_file_of_internal_poetry_packages branch February 7, 2024 17:02
sentry-io bot commented Feb 7, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ TransportError: Unexpected response from engine: Server error '502 Bad Gateway' for url 'http://127.0.0.1:41655/q... dagger.client._core in execute View Issue
  • ‼️ LiveError: Only one live display may be active at once rich.console in set_live View Issue
  • ‼️ TransportError: Unexpected response from engine: Server error '502 Bad Gateway' for url 'http://127.0.0.1:34443/q... dagger.client._core in execute View Issue
  • ‼️ ExecError: process "git remote add --fetch --track augustin/02-12-source-faker_adopt_our_base_image --track ... dagger.client._core in execute View Issue
  • ‼️ TransportError: Unexpected response from engine: Server error '502 Bad Gateway' for url 'http://127.0.0.1:45535/q... pipelines.helpers.git in get_modified_files_in_... View Issue

xiaohansong pushed a commit that referenced this pull request Feb 13, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 21, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
xiaohansong pushed a commit that referenced this pull request Feb 27, 2024
Development

Successfully merging this pull request may close these issues.

airbyte-ci: evolutive CI for poetry packages