-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
create auto_merge package #38019
create auto_merge package #38019
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# `Auto merge` | ||
|
||
## Purpose | ||
|
||
This Python package is made to merge pull requests automatically on the Airbyte Repo. It is used in | ||
the [following workflow](.github/workflows/auto_merge.yml). | ||
|
||
A pull request is currently considered as auto-mergeable if: | ||
|
||
- It has the `auto-merge` Github label | ||
- It only modifies files in connector-related directories | ||
- All the required checks have passed | ||
|
||
We want to auto-merge a specific set of connector pull requests to simplify the connector updates in | ||
the following use cases: | ||
|
||
- Pull requests updating Python dependencies or the connector base image | ||
- Community contributions when they've been reviewed and approved by our team but CI is still | ||
running: to avoid an extra review iteration just to check CI status. | ||
|
||
## Install and usage | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: |
||
|
||
### Get a Github token | ||
|
||
You need to create a Github token with the following permissions: | ||
|
||
- Read access to the repository to list open pull requests and their statuses | ||
- Write access to the repository to merge pull requests | ||
|
||
### Local install and run | ||
|
||
``` | ||
poetry install | ||
export GITHUB_TOKEN=<your_github_token> | ||
# By default no merge will be done, you need to set the AUTO_MERGE_PRODUCTION environment variable to true to actually merge the PRs | ||
poetry run auto-merge | ||
``` | ||
|
||
### In CI | ||
|
||
``` | ||
export GITHUB_TOKEN=<your_github_token> | ||
export AUTO_MERGE_PRODUCTION=true | ||
poetry install | ||
poetry run auto-merge | ||
``` | ||
|
||
The execution will set the `GITHUB_STEP_SUMMARY` env var with a markdown summary of the PRs that | ||
have been merged. |
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
[tool.poetry] | ||
name = "auto-merge" | ||
version = "0.1.0" | ||
description = "" | ||
authors = ["Airbyte <contact@airbyte.io>"] | ||
readme = "README.md" | ||
packages = [ | ||
{ include = "auto_merge", from = "src" }, | ||
] | ||
|
||
[tool.poetry.dependencies] | ||
python = "^3.10" | ||
pygithub = "^2.3.0" | ||
anyio = "^4.3.0" | ||
|
||
|
||
[tool.poetry.group.dev.dependencies] | ||
mypy = "^1.10.0" | ||
ruff = "^0.4.3" | ||
pytest = "^8.2.0" | ||
pyinstrument = "^4.6.2" | ||
|
||
[tool.ruff.lint] | ||
select = [ | ||
"I" # isort | ||
] | ||
|
||
[tool.poetry.scripts] | ||
auto-merge = "auto_merge.main:auto_merge" | ||
|
||
[build-system] | ||
requires = ["poetry-core"] | ||
build-backend = "poetry.core.masonry.api" | ||
|
||
[tool.poe.tasks] | ||
test = "pytest tests" | ||
type_check = "mypy src --disallow-untyped-defs" | ||
lint = "ruff check src" | ||
|
||
[tool.airbyte_ci] | ||
optional_poetry_groups = ["dev"] | ||
poe_tasks = ["type_check", "lint",] |
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
@@ -0,0 +1,12 @@ | ||||
# Copyright (c) 2024 Airbyte, Inc., all rights reserved. | ||||
|
||||
from __future__ import annotations | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
I think |
||||
|
||||
AIRBYTE_REPO = "airbytehq/airbyte" | ||||
AUTO_MERGE_LABEL = "auto-merge" | ||||
BASE_BRANCH = "master" | ||||
CONNECTOR_PATH_PREFIXES = { | ||||
"airbyte-integrations/connectors", | ||||
"docs/integrations/sources", | ||||
"docs/integrations/destinations", | ||||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# Copyright (c) 2024 Airbyte, Inc., all rights reserved. | ||
|
||
from __future__ import annotations | ||
|
||
import os | ||
|
||
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"] | ||
PRODUCTION = os.environ.get("AUTO_MERGE_PRODUCTION", "false").lower() == "true" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💎 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not required for this PR, but we have various ways that we define truthy in airbyte-ci. At some point we should probably try to make them consistent. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Copyright (c) 2024 Airbyte, Inc., all rights reserved. | ||
|
||
from __future__ import annotations | ||
|
||
import time | ||
from typing import TYPE_CHECKING | ||
|
||
if TYPE_CHECKING: | ||
from github.PullRequest import PullRequest | ||
|
||
|
||
def generate_job_summary_as_markdown(merged_prs: list[PullRequest]) -> str: | ||
"""Generate a markdown summary of the merged PRs | ||
|
||
Args: | ||
merged_prs (list[PullRequest]): The PRs that were merged | ||
|
||
Returns: | ||
str: The markdown summary | ||
""" | ||
summary_time = time.strftime("%Y-%m-%d %H:%M:%S") | ||
header = "# Auto-merged PRs" | ||
details = f"Summary generated at {summary_time}" | ||
if not merged_prs: | ||
return f"{header}\n\n{details}\n\n**No PRs were auto-merged**\n" | ||
merged_pr_list = "\n".join([f"- [#{pr.number} - {pr.title}]({pr.html_url})" for pr in merged_prs]) | ||
return f"{header}\n\n{details}\n\n{merged_pr_list}\n" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
# Copyright (c) 2024 Airbyte, Inc., all rights reserved. | ||
|
||
from __future__ import annotations | ||
|
||
import logging | ||
import os | ||
import time | ||
from collections.abc import Iterable, Iterator | ||
from contextlib import contextmanager | ||
from typing import TYPE_CHECKING | ||
|
||
from github import Auth, Github | ||
|
||
from .consts import AIRBYTE_REPO, AUTO_MERGE_LABEL, BASE_BRANCH, CONNECTOR_PATH_PREFIXES | ||
from .env import GITHUB_TOKEN, PRODUCTION | ||
from .helpers import generate_job_summary_as_markdown | ||
from .pr_validators import ENABLED_VALIDATORS | ||
|
||
if TYPE_CHECKING: | ||
from git.luolix.topmit import Commit as GithubCommit | ||
from github.PullRequest import PullRequest | ||
from github.Repository import Repository as GithubRepo | ||
|
||
logging.basicConfig() | ||
logger = logging.getLogger("auto_merge") | ||
logger.setLevel(logging.INFO) | ||
|
||
|
||
@contextmanager | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💎 Oooh gettting fancy I see |
||
def github_client() -> Iterator[Github]: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Neat! |
||
client = None | ||
try: | ||
client = Github(auth=Auth.Token(GITHUB_TOKEN), seconds_between_requests=0) | ||
yield client | ||
finally: | ||
if client: | ||
client.close() | ||
|
||
|
||
def check_if_pr_is_auto_mergeable(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Ok so here is the most important logic in the system. The business logic. Its important that its
I think youve achieved that so this is suggestion is very optional: What if we used a validator pattern def _validate_has_auto_merge_label((head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool, Optional[str]:
has_auto_merge_label = any(label.name == AUTO_MERGE_LABEL for label in pr.labels)
if not has_auto_merge_label:
return False, f"PR {pr.number} does not have the {AUTO_MERGE_LABEL} label"
return True, None
def check_if_pr_is_auto_mergeable(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool:
validators = [
_validate_has_auto_merge_label,
# ...
]
for pr_validator in validators:
is_valid, error = pr_validator(head_commit, pr, required_checks)
if not is_valid
logging.info(error)
return False (this is an example and could be improved (Named Why this way?
Anyway again very optional |
||
"""Run all enabled validators and return if they all pass. | ||
|
||
Args: | ||
head_commit (GithubCommit): The head commit of the PR | ||
pr (PullRequest): The PR to check | ||
required_checks (set[str]): The set of required passing checks | ||
|
||
Returns: | ||
bool: True if the PR is auto-mergeable, False otherwise | ||
""" | ||
for validator in ENABLED_VALIDATORS: | ||
is_valid, error = validator(head_commit, pr, required_checks) | ||
if not is_valid: | ||
if error: | ||
logger.info(f"PR #{pr.number} - {error}") | ||
return False | ||
return True | ||
|
||
|
||
def process_pr(repo: GithubRepo, pr: PullRequest, required_passing_contexts: set[str], dry_run: bool) -> None | PullRequest: | ||
"""Process a PR to see if it is auto-mergeable and merge it if it is. | ||
|
||
Args: | ||
repo (GithubRepo): The repository the PR is in | ||
pr (PullRequest): The PR to process | ||
required_passing_contexts (set[str]): The set of required passing checks | ||
dry_run (bool): Whether to actually merge the PR or not | ||
|
||
Returns: | ||
None | PullRequest: The PR if it was merged, None otherwise | ||
""" | ||
logger.info(f"Processing PR #{pr.number}") | ||
head_commit = repo.get_commit(sha=pr.head.sha) | ||
if check_if_pr_is_auto_mergeable(head_commit, pr, required_passing_contexts): | ||
if not dry_run: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💎 |
||
pr.merge() | ||
logger.info(f"PR #{pr.number} was auto-merged") | ||
return pr | ||
else: | ||
logger.info(f"PR #{pr.number} is auto-mergeable but dry-run is enabled") | ||
return None | ||
|
||
|
||
def back_off_if_rate_limited(github_client: Github) -> None: | ||
"""Sleep if the rate limit is reached | ||
|
||
Args: | ||
github_client (Github): The Github client to check the rate limit of | ||
""" | ||
remaining_requests, _ = github_client.rate_limiting | ||
if remaining_requests < 100: | ||
logging.warning(f"Rate limit almost reached. Remaining requests: {remaining_requests}") | ||
if remaining_requests == 0: | ||
logging.warning(f"Rate limited. Sleeping for {github_client.rate_limiting_resettime - time.time()} seconds") | ||
time.sleep(github_client.rate_limiting_resettime - time.time()) | ||
return None | ||
|
||
|
||
def auto_merge() -> None: | ||
"""Main function to auto-merge PRs that are candidates for auto-merge. | ||
If the AUTO_MERGE_PRODUCTION environment variable is not set to "true", this will be a dry run. | ||
""" | ||
dry_run = PRODUCTION is False | ||
with github_client() as gh_client: | ||
repo = gh_client.get_repo(AIRBYTE_REPO) | ||
main_branch = repo.get_branch(BASE_BRANCH) | ||
logger.info(f"Fetching required passing contexts for {BASE_BRANCH}") | ||
required_passing_contexts = set(main_branch.get_required_status_checks().contexts) | ||
candidate_issues = gh_client.search_issues(f"repo:{AIRBYTE_REPO} is:pr label:{AUTO_MERGE_LABEL} base:{BASE_BRANCH} state:open") | ||
prs = [issue.as_pull_request() for issue in candidate_issues] | ||
logger.info(f"Found {len(prs)} open PRs targeting {BASE_BRANCH} with the {AUTO_MERGE_LABEL} label") | ||
merged_prs = [] | ||
for pr in prs: | ||
back_off_if_rate_limited(gh_client) | ||
if merged_pr := process_pr(repo, pr, required_passing_contexts, dry_run): | ||
merged_prs.append(merged_pr) | ||
if PRODUCTION: | ||
os.environ["GITHUB_STEP_SUMMARY"] = generate_job_summary_as_markdown(merged_prs) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
# Copyright (c) 2024 Airbyte, Inc., all rights reserved. | ||
|
||
from __future__ import annotations | ||
|
||
from typing import TYPE_CHECKING, Optional, Tuple | ||
|
||
from .consts import AUTO_MERGE_LABEL, BASE_BRANCH, CONNECTOR_PATH_PREFIXES | ||
|
||
if TYPE_CHECKING: | ||
from git.luolix.topmit import Commit as GithubCommit | ||
from github.PullRequest import PullRequest | ||
|
||
|
||
def has_auto_merge_label(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> Tuple[bool, Optional[str]]: | ||
has_auto_merge_label = any(label.name == AUTO_MERGE_LABEL for label in pr.labels) | ||
if not has_auto_merge_label: | ||
return False, f"does not have the {AUTO_MERGE_LABEL} label" | ||
return True, None | ||
|
||
|
||
def targets_main_branch(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> Tuple[bool, Optional[str]]: | ||
if not pr.base.ref == BASE_BRANCH: | ||
return False, f"does not target {BASE_BRANCH}" | ||
return True, None | ||
|
||
|
||
def only_modifies_connectors(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> Tuple[bool, Optional[str]]: | ||
modified_files = pr.get_files() | ||
for file in modified_files: | ||
if not any(file.filename.startswith(prefix) for prefix in CONNECTOR_PATH_PREFIXES): | ||
return False, "is not only modifying connectors" | ||
return True, None | ||
|
||
|
||
def head_commit_passes_all_required_checks( | ||
head_commit: GithubCommit, pr: PullRequest, required_checks: set[str] | ||
) -> Tuple[bool, Optional[str]]: | ||
successful_status_contexts = [commit_status.context for commit_status in head_commit.get_statuses() if commit_status.state == "success"] | ||
successful_check_runs = [check_run.name for check_run in head_commit.get_check_runs() if check_run.conclusion == "success"] | ||
successful_contexts = set(successful_status_contexts + successful_check_runs) | ||
if not required_checks.issubset(successful_contexts): | ||
return False, "not all required checks passed" | ||
return True, None | ||
|
||
|
||
# A PR is considered auto-mergeable if: | ||
# - it has the AUTO_MERGE_LABEL | ||
# - it targets the BASE_BRANCH | ||
# - it touches only files in CONNECTOR_PATH_PREFIXES | ||
# - the head commit passes all required checks | ||
|
||
# PLEASE BE CAREFUL OF THE VALIDATOR ORDERING | ||
# Let's declared faster checks first as the check_if_pr_is_auto_mergeable function fails fast. | ||
ENABLED_VALIDATORS = [has_auto_merge_label, targets_main_branch, only_modifies_connectors, head_commit_passes_all_required_checks] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding it to the index!
Nit: can you also add the live testing tool to this index in here or separate quick PR?