-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
airbyte-ci: run poe tasks declared in pyproject.toml file of internal poetry packages #34736
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @alafanechere and the rest of your teammates on Graphite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This worklow is gettign leaner as:
- The detection of modified packages happens at airbyte-ci execution time
- The packages test option are now dynamically loaded from their
pyproject.toml
files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's renamed to internal_poetry_packages_ci.yml
in a downstream PR, but I kept the current name on this branch to make sure the edited workflow is triggered as it is.
.github/workflows/cat-tests.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed as CAT test will be triggered on modification in the airbyte-ci-tests.yml
workflow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
@@ -263,18 +262,3 @@ async def connectors( | |||
ctx.obj["enable_dependency_scanning"], | |||
) | |||
log_selected_connectors(ctx.obj["selected_connectors_with_modified_files"]) | |||
|
|||
|
|||
async def get_modified_files(git_branch: str, git_revision: str, diffed_branch: str, is_local: bool, ci_context: CIContext) -> Set[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this function in the pipelines.helpers.git
module as its reused in airbyte-ci test
command for file change detection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the CommandResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This command entrypoint got leaner:
- We manage package change detection
- Parallelize the package processing in a task group
- Call
pipeline.run_poe_tasks_for_package
where all the magic happens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored the different Result
class here:
StepResult
(used in connectors ci)CommandResult
(used in format)PoeTaskResult
(used in poetry package testing)
They all inherit from the same dataclass.
Dataclass enforces the declaration of positional arguments to come before the named arguments, this gets mixed up with inheritance, this is why I usedkw_only=True
: all parameters should become named arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewer: there's no logical change to this file.
The use of kw_only
in the StepResult
dataclass required this change.
5b8c2ea
to
62dd35b
Compare
62dd35b
to
c16e156
Compare
3059f88
to
d3814e5
Compare
f29b0cd
to
c3f2ec3
Compare
d3814e5
to
76338b4
Compare
|
||
from pathlib import Path | ||
|
||
INTERNAL_POETRY_PACKAGES = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this specific to poetry? Why not call these "internal_packages", or "utility_packages"?
context: the fact that we manage dependencies doesn't seem relevant. what we care about is that this set of packages is part of a single group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I named this INTERNAL_POETRY_PACKAGES
because the values of this list conditions the options value at the CLI level.
E.G airbyte-ci test --poetry-package-path=<non-manage-by-poetry-package>
will fail because the path does not match a value in this list.
If we'd call it INTERNAL_PACKAGES
I think the airbyte-cdk should be in there 😄 but it's not powered by poetry so it can't be in this list.
I think a nicer approach would be to glob for pyproject.toml
file and keep the paths which are not in the connectors folder... Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we'd call it INTERNAL_PACKAGES I think the airbyte-cdk should be in there
that's fair.
I think a nicer approach would be to glob for pyproject.toml file and keep the paths which are not in the connectors folder... Wdyt?
I don't think this will work because the CDK also has a pyproject.
I'm fine with the name if the fact that the projects are using poetry matters, which appears to be the case since it's required by the poetry tasks.
@@ -95,3 +96,18 @@ def get_git_repo() -> git.Repo: | |||
def get_git_repo_path() -> str: | |||
"""Retrieve the git repo path.""" | |||
return str(get_git_repo().working_tree_dir) | |||
|
|||
|
|||
async def get_modified_files(git_branch: str, git_revision: str, diffed_branch: str, is_local: bool, ci_context: CIContext) -> Set[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is cool. we rely on a GH action to know which files are modified when running mypy on the cdk. It's not great. I'm looking forward to integrating cdk development in airbyte-ci
"""A dataclass to capture the result of a step.""" | ||
|
||
step: Step | ||
@dataclass(kw_only=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this not frozen anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's mutated in some tests... I can try to patch it at test time but was not sure how important it is to keep it frozen.
I'll try to keep it frozen for sanity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely feels safer to keep frozen unless needed. Why are the objects mutated in the tests if they're not mutated in the production code paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@girarda I figured that the __post_init__
was the reason a mutation happened.
With the __post_init__
hook we reset the stdout
and stderr
attribute to a new value to redact secrets from these attributes.
Setting these attributes in a different way avoids the error and allow us to keep frozen classes:
def __post_init__(self) -> None:
if self.stderr:
object.__setattr__(self, "stderr", self.redact_secrets_from_string(self.stderr))
if self.stdout:
object.__setattr__(self, "stdout", self.redact_secrets_from_string(self.stdout))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two small comments but nothing I feel the need to block on ✅
if pipeline_context.params["modified"]: | ||
poetry_package_paths = await find_modified_internal_packages(pipeline_context) | ||
|
||
return poetry_package_paths.union(set(pipeline_context.params["poetry_package_paths"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the command description says --modified should be a filter to only run on modified internal packages, but this seems to run both on the explicitly specific packages as well as the modified ones.
Which behavior is correct?
"""A dataclass to capture the result of a step.""" | ||
|
||
step: Step | ||
@dataclass(kw_only=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely feels safer to keep frozen unless needed. Why are the objects mutated in the tests if they're not mutated in the production code paths?
c3f2ec3
to
41ced4b
Compare
76338b4
to
f8e0ca3
Compare
41ced4b
to
2061f6c
Compare
cf56130
to
56cb825
Compare
25e7d08
to
97f7afe
Compare
97f7afe
to
6197f20
Compare
Suspect IssuesThis pull request was deployed and Sentry observed the following issues:
Did you find this useful? React with a 👍 or 👎 |
… poetry packages (#34736)
… poetry packages (#34736)
What
Closes #33880
This PR fundamentally changes how the
airbyte-ci test
command operates.This command is made to test internal packages, and will be renamed to
airbyte-ci poetry ci
in a follow up PR.The change is the following:
poe
tasks declared in thepyproject.toml
of these packages[tool.airbyte-ci]
(e.g. mounting the docker socket etc.)It follows the same pattern as
airbyte-ci connectors test
:--poetry-package-path
to run CI on--modified
option which is handy in CI context to run CI on packages modified on the branchRecommended reading order
airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/test/commands.py
airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/test/__init__.py
airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/test/pipeline.py
having all the logic to run containerized poe tasks in parallelairbyte-ci/connectors/pipelines/pipelines/models/steps.py
🚨 User Impact 🚨
airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/test/__init__.py
pyproject.toml
to be run automatically in CIRegression checks
Step
refactoairbyte-ci connectors --name=source-faker --name=source-postgres test
airbyte-ci connectors --name=source-pokeapi --name=destination-bigquery publish --pre-release
Follow up work
airbyte-ci test
command toairbyte-ci poetry ci