Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Azure Storage Blob #444

Merged
merged 29 commits into from
May 17, 2020
Merged

Support Azure Storage Blob #444

merged 29 commits into from
May 17, 2020

Conversation

nclsmitchell
Copy link
Contributor

@nclsmitchell nclsmitchell commented Mar 26, 2020

Support Azure Storage Blob

Motivation

Support of reading and writing blobs with Azure Storage Blob.

If you're adding a new feature, then consider opening a ticket and discussing it with the maintainers before you actually do the hard work.

Tests

If you're fixing a bug, consider test-driven development:

  1. Create a unit test that demonstrates the bug. The test should fail.
  2. Implement your bug fix.
  3. The test you created should now pass.

If you're implementing a new feature, include unit tests for it.

Make sure all existing unit tests pass.
You can run them locally using:

pytest smart_open

If there are any failures, please fix them before creating the PR (or mark it as WIP, see below).

Work in progress

If you're still working on your PR, include "WIP" in the title.
We'll skip reviewing it for the time being.
Once you're ready to review, remove the "WIP" from the title, and ping one of the maintainers (e.g. mpenkov).

Checklist

Before you create the PR, please make sure you have:

  • Picked a concise, informative and complete title
  • Clearly explained the motivation behind the PR
  • Linked to any existing issues that your PR will be solving
  • Included tests for any new functionality
  • Checked that all unit tests pass

Workflow

Please avoid rebasing and force-pushing to the branch of the PR once a review is in progress.
Rebasing can make your commits look a bit cleaner, but it also makes life more difficult from the reviewer, because they are no longer able to distinguish between code that has already been reviewed, and unreviewed code.

Copy link

@gfournier gfournier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add more unit tests with the seek function.

smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/asb.py Show resolved Hide resolved
smart_open/asb.py Show resolved Hide resolved
smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/asb.py Outdated Show resolved Hide resolved
@nclsmitchell nclsmitchell changed the title [WIP] Support Azure Storage Blob Support Azure Storage Blob Apr 2, 2020
smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/asb.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@mpenkov mpenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a detailed review and left you some comments. Please have a look.

smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/tests/test_asb.py Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/asb.py Show resolved Hide resolved
smart_open/asb.py Outdated Show resolved Hide resolved
smart_open/tests/test_asb.py Outdated Show resolved Hide resolved
smart_open/tests/test_asb.py Outdated Show resolved Hide resolved
smart_open/tests/test_asb.py Outdated Show resolved Hide resolved
@nclsmitchell
Copy link
Contributor Author

The CI builds are failing. Can you please make them pass?

https://travis-ci.org/github/RaRe-Technologies/smart_open/builds/673755778

The CI builds are now passing correctly

@nclsmitchell nclsmitchell requested a review from piskvorky April 21, 2020 15:51
@petedannemann
Copy link
Contributor

petedannemann commented Apr 22, 2020

Do we want integration tests for this? Additionally can we update the README examples and credentials sections so people can understand how to use this plugin?

@mpenkov
Copy link
Collaborator

mpenkov commented Apr 23, 2020

Yes and yes, please.

@mpenkov mpenkov changed the base branch from master to develop April 25, 2020 09:05
@mpenkov
Copy link
Collaborator

mpenkov commented May 13, 2020

@petedannemann Ping on integration tests. How's it coming along?

@petedannemann
Copy link
Contributor

I have a branch with some progress. How do you suggest we get my changes into this PR? I cannot open another PR on @nclsmitchell's branch

@petedannemann petedannemann mentioned this pull request May 13, 2020
5 tasks
@mpenkov
Copy link
Collaborator

mpenkov commented May 13, 2020

@petedannemann I think the easiest way to proceed is to merge this, and then open a new PR with the integration tests.

This looks like it's ready to merge, right?

@petedannemann
Copy link
Contributor

petedannemann commented May 13, 2020

@nclsmitchell did you ever perform any functional tests? I can't get this to work due to azure.storage.blob.BlobServiceClient having a required positional argument account_url. It looks like we always try to instantiate the BlobServiceClient without any arguments in this PR

@petedannemann
Copy link
Contributor

petedannemann commented May 13, 2020

There are seven different ways to authenticate to ASB with Python. They all require different configuration and they all have to be used explicitly. I think we should refactor this such that client is a required argument to open and it is up to the user to decide which authentication method they want to use as ASB does not appear to support inferring credentials.

@piskvorky
Copy link
Owner

piskvorky commented May 13, 2020

I can't get this to work due to azure.storage.blob.BlobServiceClient having a required positional argument account_url. It looks like we always try to instantiate the BlobServiceClient without any arguments in this PR

Wait, how did the CI tests pass then?

@petedannemann
Copy link
Contributor

petedannemann commented May 13, 2020

I can't get this to work due to azure.storage.blob.BlobServiceClient having a required positional argument account_url. It looks like we always try to instantiate the BlobServiceClient without any arguments in this PR

Wait, how did the CI tests pass then?

@piskvorky the CI only runs mocked unit tests for ASB. This PR's mock for BlobServiceClient does not have any arguments in its constructor. This mock (and possibly others) need to be reworked as part of the change I am proposing.

@piskvorky
Copy link
Owner

OK. I assume @nclsmitchell was scratching his own itch. It's weird the PR would not work beyond mock tests.

setup.py Show resolved Hide resolved
@petedannemann
Copy link
Contributor

petedannemann commented May 13, 2020

OK. I assume @nclsmitchell was scratching his own itch. It's weird the PR would not work beyond mock tests.

The PR does work, just the client passed to open is actually required, not optional. The code needs to be updated in several places for this to be obvious to the user and to prevent broken azure.storage.blob.BlobServiceClient instantiations from occurring.

@petedannemann
Copy link
Contributor

petedannemann commented May 13, 2020

@nclsmitchell it also appears that containers are optional and that blobs can be written / read from the root container. Currently this fails if the container is omitted. I think we will want to accommodate for this.

@petedannemann
Copy link
Contributor

I believe I have addressed all issues / feature additions discussed above in this additional PR #498. It seems like the best way to proceed is to merge this into develop and then review that PR, as @mpenkov suggested earlier

@mpenkov mpenkov merged commit b38a4bc into piskvorky:develop May 17, 2020
@mpenkov
Copy link
Collaborator

mpenkov commented May 17, 2020

Merged.

@petedannemann Please go ahead with your improvements.

@nclsmitchell Thank you for your contribution!

@prakharcode
Copy link

Hi I can see this is merged and I have installed with
pip install git+https://github.com/RaRe-Technologies/smart_open.git@develop#egg=smart_open but still when I try

import os 
from smart_open import open 
from azure.storage.blob import BlobServiceClient 
azure_storage_connection_string = os.environ.get('AZURE_STORAGE_CONNECTION_STRING', "") 
client = BlobServiceClient.from_connection_string(azure_storage_connection_string) 
fin = open('azure://my_container/my_blob.txt', transport_params=dict(client=client))

I'm getting

NotImplementedError: Unable to handle scheme 'azure', expected one of ('', 'asb', 'file', 'hdfs', 'http', 'https', 's3', 's3a', 's3n', 's3u', 'scp', 'sftp', 'ssh', 'webhdfs'). Extra dependencies required by 'azure' may be missing. See <https://github.com/RaRe-Technologies/smart_open/blob/master/README.rst> for details.

I went into site-packages to confirm if the right branch with the right changes is picked up and the package is exactly the same as the develop branch.

What am I doing wrong here?

@prakharcode
Copy link

prakharcode commented Jun 12, 2020

oh got it, in the error, it specifies to use asb in the scheme, shouldn't it be consistent with azure?

@petedannemann
Copy link
Contributor

@prakharcode please refer to #501 for discussion on the naming of the Azure Blob Storage integration

@prakharcode
Copy link

prakharcode commented Jun 15, 2020

@petedannemann Hi, I am working with this, but I'm working on a codebase that uses the older API interface by azure i.e. v1.2 which provides BlockBlobService. I see in the client this accepts BlobServiceClient. Could it be used with BlockBlobService?

Talking about this here.

@petedannemann
Copy link
Contributor

@prakharcode this PR really isn't the appropriate place to discuss this. Please open an issue where it can be seen and easily searched for by other smart-open users

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Azure BLOB storage
6 participants