-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source: files (s3 and local) #653
base: main
Are you sure you want to change the base?
Conversation
9473aba
to
6525433
Compare
TODO:
|
def _get_client(self) -> S3Client: | ||
return boto_client("s3", **self._credentials) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be self._get_client
too.
BTW, maybe you could replace the _get_client
with a _client
property and a good explanation as to why it reinstantiates a client every time. Then self._client: Optional[S3Client] = None
may be removed from init.
def _get_client(self) -> S3Client: | |
return boto_client("s3", **self._credentials) | |
@property | |
def _client(self) -> S3Client: | |
return boto_client("s3", **self._credentials) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it doesn't need to instantiate one every time...it basically must be instantiated after it's kicked off as a multiprocess
Process via Source.run()
(which is where the pickling error is occurring).
The only reason this is a problem at all is I need to count the partition folders to see what the expected partition count is, which occurs before actually performing the run operation, so that's the thing I have to work around (by not storing that instance of the client when I do that call).
Ideally I'd like to not have to instantiate a new client every time I call it...currently I'm setting the client on the first call I know that is made after .run()
and then I can just re-use it as originally desired.
Co-authored-by: Remy Gwaramadze <gwaramadze@users.noreply.github.com>
1cb3802
to
4726928
Compare
This is ready to merge, depending on final thoughts on the client stuff for S3. |
Not 100% done yet but here is a fairly refined draft. Just a little more cleanup to do and a tiny bit more work left on Azure, and then documentation assuming we like the approach.
Currently extends
FileSource
(which itself had some minor adjustments to better accommodate extension).