IO Implementation using Go CDK #176
Signed-off-by: Loïc Alleyne <loicalleyne@gmail.com>
@dwilson1988 I saw your note about wanting to work on the CDK features. If you're able to provide some feedback, that would be great.
@loicalleyne - happy to take a look. We use this internally in some of our software with Parquet and implemented a ReaderAt. I'll do a more thorough review when I get a chance, but my first thought was to leave it completely separate from the
My goal today was just to "get something on paper" to move this forward, since the other PR has been stalled since July. I used the other PR as a starting point, so I mostly followed the existing patterns. Very open to moving things around if it makes sense. Do you have a sense of how your idea would work with the interfaces defined in io.go?
Understood! I'll dig into your last question and get back to you.
Okay, played around a bit and here's where my head is at. The main reason I'd like to isolate the creation of a What I came up with is changing

    // CreateBlobFileIO creates a new BlobFileIO instance
    func CreateBlobFileIO(parsed *url.URL, bucket *blob.Bucket) *BlobFileIO {
        ctx := context.Background()
        return &BlobFileIO{Bucket: bucket, ctx: ctx, opts: &blob.ReaderOptions{}, prefix: parsed.Host + parsed.Path}
    }

The URL is still critical there, but now we don't have to concern ourselves with credentials to open the bucket except for in Thoughts on this?
@dwilson1988
@loicalleyne, this looks really good to me! I'm not a maintainer of this repo, so I can't give the final word or anything, but this is exactly the direction I was thinking.
I'm happy to give Azure a go after this is merged.
io/blob.go
Outdated
    // BlobFileIO represents a file system backed by a bucket in object store. It implements the `iceberg-go/io.FileIO` interface.
    type BlobFileIO struct {
        *blob.Bucket
        ctx    context.Context
        opts   *blob.ReaderOptions
        prefix string
    }
I tend to be more conservative in what we actually export. Is there any need to export this type as opposed to just let it be used through the interfaces?
Changed to unexported
io/blob.go
Outdated
    // Open a Blob from a Bucket using the BlobFileIO. Note this
    // function is copied from blob.Bucket.Open, but extended to
    // return a iceberg-go/io.File instance instead of io/fs.File
Could we just wrap and extend blob.Bucket.Open instead of duplicating it here?
Looking at it again, it has to be a copy because the CDK iofsFileInfo is unexported and doesn't support the io.ReaderAt interface.
This could be accomplished if gocloud changed this to an io.ReadSeeker, which would slightly alter the public interface. Otherwise, I can't see a way to do this without copying:
@loicalleyne is this still on your radar?
hi @dwilson1988
Cool - just checking. I'll be patient. 🙂
@dwilson1988 I made the suggested changes. There's a deprecation warning on the S3 config EndpointResolver methods that I haven't had time to look into; maybe you could take a look?
Yes, can probably take a look next week
Hi @dwilson1988, do you think you'll have time to take a look at this?
I opened a PR on your branch earlier today
Extends PR #111
Implements #92. The Go CDK has well-maintained implementations for accessing object stores from S3, Azure, and GCS via an io/fs.FS-like interface. However, its file interface doesn't support the io.ReaderAt interface or the Seek() function that Iceberg-Go requires for files. Furthermore, the File components are private. So we copy the wrappers and implement the remaining functions inside of Iceberg-Go directly.
In addition, we add support for S3 Read IO using the CDK, providing the option to choose between the existing and new implementation using an extra property.
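The selection between the two S3 implementations could be sketched as follows. This is an assumption-laden illustration: the property key `s3.use-cdk`, the `readIO` interface, and both implementation types are hypothetical, since the description above does not name the actual property or types.

```go
package main

import (
	"fmt"
	"strconv"
)

// propUseCDK is a hypothetical property name; the PR description does not
// state the actual key.
const propUseCDK = "s3.use-cdk"

// readIO is a hypothetical minimal interface shared by both implementations.
type readIO interface{ Name() string }

type legacyS3IO struct{}

func (legacyS3IO) Name() string { return "existing-s3" }

type cdkS3IO struct{}

func (cdkS3IO) Name() string { return "cdk-s3" }

// newS3IO keeps the existing implementation as the default and switches to
// the CDK-backed one only when the property is explicitly set to true.
func newS3IO(props map[string]string) (readIO, error) {
	v, ok := props[propUseCDK]
	if !ok {
		return legacyS3IO{}, nil
	}
	useCDK, err := strconv.ParseBool(v)
	if err != nil {
		return nil, fmt.Errorf("invalid %s value %q: %w", propUseCDK, v, err)
	}
	if useCDK {
		return cdkS3IO{}, nil
	}
	return legacyS3IO{}, nil
}

func main() {
	fio, err := newS3IO(map[string]string{propUseCDK: "true"})
	if err != nil {
		panic(err)
	}
	fmt.Println(fio.Name()) // prints "cdk-s3"
}
```

Defaulting to the existing implementation keeps current users unaffected until they opt in.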
GCS connection options can be passed in the properties map.