Implement persistent bucket fixtures for integration tests #301

Open · wants to merge 14 commits into master
Conversation

kris-konina-reef (Collaborator) commented:

Key changes:

  • Tests now share a smaller pool of buckets while maintaining isolation, thanks to a new PersistentBucketAggregate class (a rough sketch follows below).
  • Fixtures and utility functions were added to manage these persistent buckets.
  • Existing tests were updated to use this new functionality.

Benefits:

  • Reduced bucket usage from 53 to 37 buckets and cut average test execution time by ~14%.
  • Implemented lifecycle rules to automatically clean up resources and minimize potential costs.
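
For illustration, a minimal sketch of what such an aggregate might look like (hypothetical; the actual class in this PR may differ):

from dataclasses import dataclass, field
from uuid import uuid4

@dataclass
class PersistentBucketAggregate:
    # A long-lived bucket shared across tests, plus a per-test subfolder
    # that keeps individual tests isolated from each other.
    bucket_name: str
    subfolder: str = field(default_factory=lambda: f"test-{uuid4().hex[:8]}")

    @property
    def virtual_bucket_name(self) -> str:
        # Tests treat "<bucket>/<subfolder>" as their private namespace.
        return f"{self.bucket_name}/{self.subfolder}"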

@mjurbanski-reef left a comment:

Two major comments:

  • we need to have the same solution in the SDK
  • IMO we should rely on lifecycle rules more instead of the time-consuming process of manual one-by-one file deletion (see the sketch below)
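
For example, the persistent bucket could be created with lifecycle rules so that leftover test files expire on their own, with no per-file deletion calls (a hypothetical sketch using b2sdk; the rule values are illustrative, not the PR's actual settings):

from b2sdk.v2 import B2Api

def create_persistent_bucket(b2_api: B2Api, name: str):
    # Files are hidden one day after upload and deleted one day after hiding,
    # so leftovers from test runs clean themselves up without extra API calls.
    lifecycle_rules = [
        {
            "fileNamePrefix": "",
            "daysFromUploadingToHiding": 1,
            "daysFromHidingToDeleting": 1,
        }
    ]
    return b2_api.create_bucket(name, "allPrivate", lifecycle_rules=lifecycle_rules)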

@@ -0,0 +1 @@
Introduce PersistentBucketAggregate class to manage bucket name and subfolder.

@mjurbanski-reef:

users won't care for these details in the CLI tool changelog

btw, what is your plan for the SDK? you started with the CLI, but won't it be harder to apply the same thing to the SDK?
I can kinda see these changelog messages making sense if they were part of the public API of the SDK that is used here.

kris-konina-reef (Collaborator, Author):

Wasn't aware the SDK was part of the same ticket, will proceed with it.

@mjurbanski-reef:

yeah, I wouldn't rely on the ticket's description much, more on what makes sense - and fixing a problem in one place while an identical one exists in another doesn't make sense.

@@ -0,0 +1 @@
Update integration tests to use persistent buckets.

@mjurbanski-reef:

seems like a duplicate

IMO, as they are right now, all of these would go under our "infrastructure" change category, i.e. not something that typical users care about, since this is not what they get with the shipped version of the CLI tool.

# of a persistent bucket, whose identity is shared across tests.
persistent_bucket = get_or_create_persistent_bucket(b2_api)
b2_api.clean_bucket(persistent_bucket)
b2_api.api.list_buckets()

@mjurbanski-reef:

what is this call here for?

kris-konina-reef (Collaborator, Author):

overlooked, redundant

# when tests tear down, as otherwise we'd lose the main benefit
# of a persistent bucket, whose identity is shared across tests.
persistent_bucket = get_or_create_persistent_bucket(b2_api)
b2_api.clean_bucket(persistent_bucket)

@mjurbanski-reef:

won't this break concurrently run tests?

@mjurbanski-reef:

This is why lifecycle rules were suggested for cleanup in the first place.

kris-konina-reef (Collaborator, Author):

It cleans the buckets once, before the tests run.
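
For context, the mechanism described above is roughly a session-scoped fixture (hypothetical sketch, not the PR's actual code; get_or_create_persistent_bucket and clean_bucket follow the diff excerpt above, and the import path is assumed):

import pytest

# assumed import path; in this PR the helper lives in test/integration/persistent_bucket.py
from persistent_bucket import get_or_create_persistent_bucket

@pytest.fixture(scope="session")
def persistent_bucket(b2_api):
    # Runs once per test session: reuse (or create) the shared bucket and
    # clean it up front, instead of deleting and recreating it per test.
    bucket = get_or_create_persistent_bucket(b2_api)
    b2_api.clean_bucket(bucket)
    return bucket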

@mjurbanski-reef:

ok, so what will happen if someone opens a second PR while one is being tested?

kris-konina-reef (Collaborator, Author):

the same thing that would happen if no changes were introduced: such a scenario disrupts the test bucket lifecycle, persistent or not. The question is, is it frequent enough to warrant addressing?

@mjurbanski-reef:

The bucket cleanup process only removed stale buckets, not all of them, so previously we did support concurrent GHA jobs. This change breaks that, and for no reason AFAIK, since the solution is simply to leave that bucket alone forever.

test/integration/persistent_bucket.py (outdated; resolved)


@backoff.on_exception(backoff.expo, Exception, max_tries=3, max_time=10)
def delete_files(bucket: Bucket, subfolder: str):

@mjurbanski-reef:

Suggested change:
- def delete_files(bucket: Bucket, subfolder: str):
+ def delete_files(bucket: Bucket, subfolder: str | None = None):

seems like these two methods should be combined (a rough sketch of a combined helper follows below)
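
For illustration, a combined version might look something like this (hypothetical sketch, assuming b2sdk's Bucket.ls and delete_file_version; not the PR's actual code):

import backoff
from b2sdk.v2 import Bucket

@backoff.on_exception(backoff.expo, Exception, max_tries=3, max_time=10)
def delete_files(bucket: Bucket, subfolder: str | None = None):
    # Delete every file version, optionally limited to a subfolder prefix;
    # passing no subfolder cleans the whole bucket.
    for file_version, _ in bucket.ls(subfolder or "", latest_only=False, recursive=True):
        bucket.delete_file_version(file_version.id_, file_version.file_name)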

@mjurbanski-reef:

Seems to me like these calls are going to:
a) slow down tests (so +++ cost on the development side & GHA)
b) incur extra cost on the B2 API side - but I guess that doesn't matter much for us

Simply waiting for lifecycle rules to do their thing also means some extra B2 storage costs, but it is much faster since we have to do nothing.

kris-konina-reef (Collaborator, Author):

Even though the exec time and the number of buckets used dropped compared to the previous state, I was clearly overcorrecting by clearing the subfolders between cases, trying to fully mimic the behavior of recreated buckets. Clearly redundant.

test/integration/persistent_bucket.py (outdated; resolved)
Comment on lines 56 to 60:
# CI environment
repo_id = os.environ.get("GITHUB_REPOSITORY_ID")
if not repo_id:
    raise ValueError("GITHUB_REPOSITORY_ID is not set")
bucket_hash = hashlib.sha256(repo_id.encode()).hexdigest()
@mjurbanski-reef commented on Sep 17, 2024:

I kinda like the source of ID you used (especially the account id), but please note the tests are also run under Jenkins in the case of the staging environment.

Probably best to simply test for GITHUB_REPOSITORY_ID presence and use account_id.
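
A rough sketch of that suggestion (hypothetical, not the PR's code; it assumes direct access to a b2sdk B2Api object, while the test fixture here may wrap it, and the name format is illustrative):

import os
from b2sdk.v2 import B2Api

def get_persistent_bucket_name(b2_api: B2Api) -> str:
    # GITHUB_REPOSITORY_ID is only checked for presence, to tell GitHub
    # Actions apart from Jenkins/staging runs; the name itself is derived
    # from the account id, which is available in both environments.
    on_github_actions = "GITHUB_REPOSITORY_ID" in os.environ
    environment = "gha" if on_github_actions else "staging"
    account_id = b2_api.account_info.get_account_id()
    return f"b2cli-int-test-{environment}-{account_id}"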

test/integration/conftest.py (outdated; resolved)
test/integration/persistent_bucket.py (resolved)
@kris-konina-reef removed the request for review from pawelpolewicz on September 19, 2024, 14:24
@@ -0,0 +1 @@
Improve internal testing infrastructure by updating integration tests to use persistent buckets.

@mjurbanski-reef:

as indicated in #301 (comment), this should go under the infrastructure category, not changed, as it is not changing the exposed API of the b2 CLI tool. The CI & tests are not even part of the releases.
