Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add additional_blob_attributes to upload_many_from_filenames #1162

Merged
merged 3 commits into from
Oct 12, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions google/cloud/storage/transfer_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -427,6 +427,8 @@ def upload_many_from_filenames(
raise_exception=False,
worker_type=PROCESS,
max_workers=DEFAULT_MAX_WORKERS,
*,
additional_blob_attributes=None,
):
"""Upload many files concurrently by their filenames.

Expand Down Expand Up @@ -557,6 +559,17 @@ def upload_many_from_filenames(
and the default is a conservative number that should work okay in most
cases without consuming excessive resources.

:type additional_blob_attributes: dict
:param additional_blob_attributes:
A dictionary of blob attribute names and values. This allows the
configuration of blobs beyond what is possible with
blob_constructor_kwargs. For instance, {"cache_control": "no-cache"}
would set the cache_control attribute of each blob to "no-cache".

As with blob_constructor_kwargs, this affects the creation of every
blob identically. To fine-tune each blob individually, use `upload_many`
and create the blobs as desired before passing them in.

:raises: :exc:`concurrent.futures.TimeoutError` if deadline is exceeded.

:rtype: list
Expand All @@ -567,13 +580,17 @@ def upload_many_from_filenames(
"""
if blob_constructor_kwargs is None:
blob_constructor_kwargs = {}
if additional_blob_attributes is None:
additional_blob_attributes = {}

file_blob_pairs = []

for filename in filenames:
path = os.path.join(source_directory, filename)
blob_name = blob_name_prefix + filename
blob = bucket.blob(blob_name, **blob_constructor_kwargs)
for prop, value in additional_blob_attributes.items():
setattr(blob, prop, value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use blob._patch_property() that is broadly used in the library? Blob property changes are propagated through blob._get_writable_metadata and blob._get_upload_arguments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider that but decided there was no need to use a private method on the blob class when the typical public interface of attribute assignment would do the same thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how the blob properties are actually picked up in blob._changes. I thought we're relying on blob._changes to get writable metadata for the upload arguments

object_metadata = {"name": self.name}
for key in self._changes:
if key in _WRITABLE_FIELDS:
object_metadata[key] = self._properties[key]
return object_metadata

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the _PropertyMixin, all the fields set as a _scalar_property have a custom setter that updates _changes. This solution using setattr will also activate that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, that makes sense, thanks! One last nit on the naming, maybe consider additional_blob_properties for consistency? Property is used in the official docs and in the lib.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, that was actually the name in my first revision.

I changed it to "attributes" because it's not only _scalar_property but any attribute that we can set with setattr. Also, since we're using setattr the nomenclature matches.

file_blob_pairs.append((path, blob))

return upload_many(
Expand Down
19 changes: 19 additions & 0 deletions tests/system/test_transfer_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,25 @@ def test_upload_many_skip_if_exists(
assert len(blobs_to_delete) == 1


def test_upload_many_from_filenames_with_attributes(
listable_bucket, listable_filenames, file_data, blobs_to_delete
):
SOURCE_DIRECTORY, FILENAME = os.path.split(file_data["logo"]["path"])

transfer_manager.upload_many_from_filenames(
listable_bucket,
[FILENAME],
source_directory=SOURCE_DIRECTORY,
additional_blob_attributes={"cache_control": "no-cache"},
raise_exception=True,
)

blob = listable_bucket.blob(FILENAME)
blob.reload()
blobs_to_delete.append(blob)
assert blob.cache_control == "no-cache"


def test_download_many(listable_bucket):
blobs = list(listable_bucket.list_blobs())
with tempfile.TemporaryDirectory() as tempdir:
Expand Down
32 changes: 32 additions & 0 deletions tests/unit/test_transfer_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -482,6 +482,38 @@ def test_upload_many_from_filenames_minimal_args():
bucket.blob.assert_any_call(FILENAMES[1])


def test_upload_many_from_filenames_additional_properties():
bucket = mock.Mock()
blob = mock.Mock()
bucket_blob = mock.Mock(return_value=blob)
blob.cache_control = None
bucket.blob = bucket_blob

FILENAME = "file_a.txt"
ADDITIONAL_BLOB_ATTRIBUTES = {"cache_control": "no-cache"}
EXPECTED_FILE_BLOB_PAIRS = [(FILENAME, mock.ANY)]

with mock.patch(
"google.cloud.storage.transfer_manager.upload_many"
) as mock_upload_many:
transfer_manager.upload_many_from_filenames(
bucket, [FILENAME], additional_blob_attributes=ADDITIONAL_BLOB_ATTRIBUTES
)

mock_upload_many.assert_called_once_with(
EXPECTED_FILE_BLOB_PAIRS,
skip_if_exists=False,
upload_kwargs=None,
deadline=None,
raise_exception=False,
worker_type=transfer_manager.PROCESS,
max_workers=8,
)

for attrib, value in ADDITIONAL_BLOB_ATTRIBUTES.items():
assert getattr(blob, attrib) == value


def test_download_many_to_path():
bucket = mock.Mock()

Expand Down