Implement persistent bucket fixtures for integration tests #301

Open · wants to merge 14 commits into master
Conversation

kris-konina-reef (Collaborator) commented:

Key changes:

  • Tests now share a smaller pool of buckets while maintaining isolation, thanks to a new PersistentBucketAggregate class (a rough sketch follows below).
  • Fixtures and utility functions were added to manage these persistent buckets.
  • Existing tests were updated to use this new functionality.

Benefits:

  • Reduced bucket usage from 53 to 37 buckets and cut average test execution time by ~14%.
  • Implemented lifecycle rules to automatically clean up resources and minimize potential costs.
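
For illustration, a minimal sketch of what such an aggregate might look like (hypothetical; the actual class in this PR may differ):

from dataclasses import dataclass, field
from uuid import uuid4

@dataclass
class PersistentBucketAggregate:
    # A long-lived bucket shared across tests, plus a per-test subfolder
    # that keeps individual tests isolated from each other.
    bucket_name: str
    subfolder: str = field(default_factory=lambda: f"test-{uuid4().hex[:8]}")

    @property
    def virtual_bucket_name(self) -> str:
        # Tests treat "<bucket>/<subfolder>" as their private namespace.
        return f"{self.bucket_name}/{self.subfolder}"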

@mjurbanski-reef left a comment:

Two major comments:

  • we need to have the same solution in the SDK
  • IMO we should rely on lifecycle rules more instead of the time-consuming process of manual one-by-one file deletion (see the sketch below)
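
For example, the persistent bucket could be created with lifecycle rules so that leftover test files expire on their own, with no per-file deletion calls (a hypothetical sketch using b2sdk; the rule values are illustrative, not the PR's actual settings):

from b2sdk.v2 import B2Api

def create_persistent_bucket(b2_api: B2Api, name: str):
    # Files are hidden one day after upload and deleted one day after hiding,
    # so leftovers from test runs clean themselves up without extra API calls.
    lifecycle_rules = [
        {
            "fileNamePrefix": "",
            "daysFromUploadingToHiding": 1,
            "daysFromHidingToDeleting": 1,
        }
    ]
    return b2_api.create_bucket(name, "allPrivate", lifecycle_rules=lifecycle_rules)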

@@ -0,0 +1 @@
Introduce PersistentBucketAggregate class to manage bucket name and subfolder.

@mjurbanski-reef:

users won't care for these details in the CLI tool changelog

btw, what is your plan for the SDK? you started with the CLI, but won't it be harder to apply the same thing to the SDK?
I can kinda see these changelog messages making sense if they were part of the public API of the SDK that is used here.

kris-konina-reef (Collaborator, Author):

Wasn't aware the SDK was part of the same ticket, will proceed with it.

@mjurbanski-reef:

yeah, I wouldn't rely on the ticket's description much, more on what makes sense - and fixing a problem in one place while an identical one exists in another doesn't make sense.

@@ -0,0 +1 @@
Update integration tests to use persistent buckets.

@mjurbanski-reef:

seems like a duplicate

IMO, as they are right now, all of these would go under our "infrastructure" change category, i.e. not something that typical users care about, since this is not what they get with the shipped version of the CLI tool.

# of a persistent bucket, whose identity is shared across tests.
persistent_bucket = get_or_create_persistent_bucket(b2_api)
b2_api.clean_bucket(persistent_bucket)
b2_api.api.list_buckets()

@mjurbanski-reef:

what is this call here for?

kris-konina-reef (Collaborator, Author):

overlooked, redundant

# when tests tear down, as otherwise we'd lose the main benefit
# of a persistent bucket, whose identity is shared across tests.
persistent_bucket = get_or_create_persistent_bucket(b2_api)
b2_api.clean_bucket(persistent_bucket)

@mjurbanski-reef:

won't this break concurrently run tests?

@mjurbanski-reef:

This is why lifecycle rules were suggested for cleanup in the first place.

kris-konina-reef (Collaborator, Author):

It cleans the buckets once, before the tests run.
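
For context, the mechanism described above is roughly a session-scoped fixture (hypothetical sketch, not the PR's actual code; get_or_create_persistent_bucket and clean_bucket follow the diff excerpt above, and the import path is assumed):

import pytest

# assumed import path; in this PR the helper lives in test/integration/persistent_bucket.py
from persistent_bucket import get_or_create_persistent_bucket

@pytest.fixture(scope="session")
def persistent_bucket(b2_api):
    # Runs once per test session: reuse (or create) the shared bucket and
    # clean it up front, instead of deleting and recreating it per test.
    bucket = get_or_create_persistent_bucket(b2_api)
    b2_api.clean_bucket(bucket)
    return bucket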

@mjurbanski-reef:

ok, so what will happen if someone opens a second PR while one is being tested?

kris-konina-reef (Collaborator, Author):

the same thing that would happen if no changes were introduced: such a scenario disrupts the test bucket lifecycle, persistent or not. The question is, is it frequent enough to warrant addressing?

@mjurbanski-reef:

The bucket cleanup process only removed stale buckets, not all of them, so previously we did support concurrent GHA jobs. This change breaks that, and for no reason AFAIK, since the solution is simply to leave that bucket alone forever.

test/integration/persistent_bucket.py (outdated; resolved)


@backoff.on_exception(backoff.expo, Exception, max_tries=3, max_time=10)
def delete_files(bucket: Bucket, subfolder: str):

@mjurbanski-reef:

Suggested change:
- def delete_files(bucket: Bucket, subfolder: str):
+ def delete_files(bucket: Bucket, subfolder: str | None = None):

seems like these two methods should be combined (a rough sketch of a combined helper follows below)
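
For illustration, a combined version might look something like this (hypothetical sketch, assuming b2sdk's Bucket.ls and delete_file_version; not the PR's actual code):

import backoff
from b2sdk.v2 import Bucket

@backoff.on_exception(backoff.expo, Exception, max_tries=3, max_time=10)
def delete_files(bucket: Bucket, subfolder: str | None = None):
    # Delete every file version, optionally limited to a subfolder prefix;
    # passing no subfolder cleans the whole bucket.
    for file_version, _ in bucket.ls(subfolder or "", latest_only=False, recursive=True):
        bucket.delete_file_version(file_version.id_, file_version.file_name)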

@mjurbanski-reef:

Seems to me like these calls are going to:
a) slow down tests (so +++ cost on the development side & GHA)
b) incur extra cost on the B2 API side - but I guess that doesn't matter much for us

Simply waiting for lifecycle rules to do their thing also means some extra B2 storage costs, but it is much faster since we have to do nothing.

kris-konina-reef (Collaborator, Author):

Even though the exec time and the number of buckets used dropped compared to the previous state, I was clearly overcorrecting by clearing the subfolders between cases, trying to fully mimic the behavior of recreated buckets. Clearly redundant.

test/integration/persistent_bucket.py (outdated; resolved)
Comment on lines 56 to 60:
# CI environment
repo_id = os.environ.get("GITHUB_REPOSITORY_ID")
if not repo_id:
    raise ValueError("GITHUB_REPOSITORY_ID is not set")
bucket_hash = hashlib.sha256(repo_id.encode()).hexdigest()
@mjurbanski-reef commented on Sep 17, 2024:

I kinda like the source of ID you used (especially the account id), but please note the tests are also run under Jenkins in the case of the staging environment.

Probably best to simply test for GITHUB_REPOSITORY_ID presence and use account_id.
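
A rough sketch of that suggestion (hypothetical, not the PR's code; it assumes direct access to a b2sdk B2Api object, while the test fixture here may wrap it, and the name format is illustrative):

import os
from b2sdk.v2 import B2Api

def get_persistent_bucket_name(b2_api: B2Api) -> str:
    # GITHUB_REPOSITORY_ID is only checked for presence, to tell GitHub
    # Actions apart from Jenkins/staging runs; the name itself is derived
    # from the account id, which is available in both environments.
    on_github_actions = "GITHUB_REPOSITORY_ID" in os.environ
    environment = "gha" if on_github_actions else "staging"
    account_id = b2_api.account_info.get_account_id()
    return f"b2cli-int-test-{environment}-{account_id}"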

test/integration/conftest.py (outdated; resolved)
test/integration/persistent_bucket.py (resolved)
@kris-konina-reef removed the request for review from pawelpolewicz on September 19, 2024, 14:24
@@ -0,0 +1 @@
Improve internal testing infrastructure by updating integration tests to use persistent buckets.

@mjurbanski-reef:

as indicated in #301 (comment), this should go under the infrastructure category, not changed, as it is not changing the exposed API of the b2 CLI tool. The CI & tests are not even part of the releases.
