-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge wip-v2 into master #178
Conversation
Since we are releasing a new major version, define a new go module for the CAR v2 implementation, following the recommendations here: - https://blog.golang.org/v2-go-modules The module is versioned as go 1.15 to maintain compatibility as pointed out by comment here: - #75 (comment)
We might want to get this fixed at source; it is valid for a module to have no go files i think, and CI should tolerate that.
Define the magic CAR v2 prefix proposed in ipld/specs#248 and assert that it remains a valid CAR v1 header, along with other tests that verify the expected length, content and version number. Define basic structs to represent CAR v2 header, along with placeholder for the future "characteristics" bitfield, and the optional padding between CAR v1 dump and index added to the end of a CAR v2 archive. Relates to: - ipld/specs#248 (comment)
Reflecting on PR review comments, no harm in exposing this and it may be useful to the users of the library.
Reflecting on the review comment, adding this field could provide optimisation opportunities in the future in the context of block alignment. See: - https://github.com/ipld/go-car/pull/80/files#r649241583
Now that the `CarV1Offset` is added to the header, the size is increased from `32` to `40` bytes. Make it so in the test case name.
Define a constructor with sensible defaults, and the ability to customize the defaults conveniently. Implement `io.WriterTo` interface for both header and characteristics to provide a consistent standard API for writing data into a given `io.Writer`.
Implement a CAR v2 writer, that produces binary structure corresponding to the specification of car V2, consisting of: 1. Version prefix 2. CAR v2 header 3. Dump of Car v1 4. Carbs index The implementation also facilities optional padding before dump of CAR v1 and Carbs index to provide scope for future optimisations. The padding is defined as a dedicated type that writes zero-valued bytes for a given padding size. The CAR v2 header, then captures the offsets from the beginning of the file for both the CAR v1 dump and Carbs index. This allows the user to quickly skip to the part of the CAR v2 they need as well as offset alignment, if necessary, by altering the value of padding. Note, this is an intermediate implementation, and does not correctly count the number of bytes written by the writer, pending the transfer/refactor of Carbs implementation in this repo. In the meantime, This implementation depends on Carbs as a go module. The implementation of extensive writer tests is postponed until Carbs is transferred and the written byte-count can be returned from the index marshaller. Address review comments - Avoid representing `Characteristics` as a pointer for easier casting of memory regions. - Remove anonymous fields in structs as a human readable way to document what interfaces a struct implements. - Simplify Header writing and written byte count calculation. The change reduces the number of lines by assuming that `writeUint64To` will always write 8 bytes. - Avoid `_` in package names. Use `carv1` and `carv2` to name corresponding packages. - Rename `Marshal` in tests to reflect `WriteTo` related tests. The method was renamed in the implementation but tests were not renamed. - Write padding in bulk to reduce redundant memory allocation. Write padding in bulks of `1024` bytes to reduce large memory allocation when padding itself is large. - Avoid using # in docs. Use go syntax instead, i.e. `.`. - Remove redundant type in constants, since it is implicitly converted. - Simplify over-refactored prefix write Since it is called only once. - Avoid `Header` pointer, since its size is small and this would reduce unnecessary allocations. - Use buffer in writer to store encoded Car V1. This is to avoid reallocation of bytes buffer. - Add TODO re optimisation of index generation. The current implementation reads the entire CAR v1 into memory in order to index it, because `carbs` API requires `io.Reader`. Once `a`carbs` is incorporated into this repo consider refactoring the API to make this a streaming operation that avoids copying all the bytes since CAR v1 can be large. - Remove a dedicated var for empty characteristics, since, we assume the default to be all zero, and it is easy enough to construct a zero valued Characteristics. - Unexport PrefixBytesSize, because we can, and if it is public it might have to stay public forever. So, unexport until we know we need it exported. - Remove `.Size()` on CAR v2 structs. Because we have the constant for them and don't want to expose them twice. - Use CamelCase for sub-test naming. This is to establish CamelCase for sub-test naming as the convention for this repo since it keeps the test names consistent in code and in the output ot `go test`. - Unexport padding as a type since it is only used internally
The implementation comes from here: - https://github.com/willscott/carbs The rationale for copying the code here is to be able to taylor the indexing API to the needs of CAR v2 in one place.
Remove unused structs, convert error messages to lower case and remove redundant `return` statements. Address review comments - Reoder imports using `gofumpt`. - Use consistent import alias for `carv1`. - Rename structs for better readability. - Add TODO to fix logging, tests, etc. - Move test related files to `testdata`.
Because `SectionReader` requires the max number of bytes to read, since it implements `Seek`. We need something like the `SectionReader` to read the index at the end of a CAR v2 that does not require the user to know the number of readable bytes. This is because, we do not store the size of index in CAR v2 header, since it is always added as the last section. The `OffsetReader` works just like `SectionReader`, except if `n`, the number of bytes to read, is set to zero it simply carries on reading until the underlying `io.ReaderAt` returns EOF. Consequently, `OffsetReader` does not implement `Seek`.
Implement a CAR v2 reader that allows access to each section of the CAR, i.e. CAR v1 dump and Index, as well as a higher level API that provides a `blockstore.BlockStore` from a CAR v2 file. Address Review comments - Simplify naming of section size constants - Simplify OffsetReader by removing the dual SectionReader functionality - Improve read efficiency by reading header in one chunk - Add TODOs to improve write efficiency in a similar manner
Make the copied carbon code part of the go-car/v2 module, and address `staticcheck` issues. Move README content from the original implementation into `doc.go`.
Remove the carbs copied without history
partial finish interface license / readme add cli for hydration indirection extensible indexes fixed int sizes additional typed int expose car Roots fix issues with index restoration add gob-based hash index typo in util/hydrate Create go.yml Create codeql-analysis.yml Create dependabot.yml work on testing round trip test passes mmap idx fix issue in sorted index gets add progress bar
Bumps [github.com/ipfs/go-ipld-cbor](https://github.com/ipfs/go-ipld-cbor) from 0.0.4 to 0.0.5. - [Release notes](https://github.com/ipfs/go-ipld-cbor/releases) - [Commits](ipfs/go-ipld-cbor@v0.0.4...v0.0.5) Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/ipfs/go-ipfs-blockstore](https://github.com/ipfs/go-ipfs-blockstore) from 1.0.1 to 1.0.2. - [Release notes](https://github.com/ipfs/go-ipfs-blockstore/releases) - [Commits](ipfs/go-ipfs-blockstore@v1.0.1...v1.0.2) Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/ipfs/go-ipfs-blockstore](https://github.com/ipfs/go-ipfs-blockstore) from 1.0.2 to 1.0.3. - [Release notes](https://github.com/ipfs/go-ipfs-blockstore/releases) - [Commits](ipfs/go-ipfs-blockstore@v1.0.2...v1.0.3) Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/multiformats/go-multihash](https://github.com/multiformats/go-multihash) from 0.0.14 to 0.0.15. - [Release notes](https://github.com/multiformats/go-multihash/releases) - [Commits](multiformats/go-multihash@v0.0.14...v0.0.15) Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/ipfs/go-block-format](https://github.com/ipfs/go-block-format) from 0.0.2 to 0.0.3. - [Release notes](https://github.com/ipfs/go-block-format/releases) - [Commits](ipfs/go-block-format@v0.0.2...v0.0.3) Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/cheggaaa/pb/v3](https://github.com/cheggaaa/pb) from 3.0.5 to 3.0.7. - [Release notes](https://github.com/cheggaaa/pb/releases) - [Commits](cheggaaa/pb@v3.0.5...v3.0.7) Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/cheggaaa/pb/v3](https://github.com/cheggaaa/pb) from 3.0.7 to 3.0.8. - [Release notes](https://github.com/cheggaaa/pb/releases) - [Commits](cheggaaa/pb@v3.0.7...v3.0.8) Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github.com/ipld/go-car](https://github.com/ipld/go-car) from 0.2.2 to 0.3.0. - [Release notes](https://github.com/ipld/go-car/releases) - [Commits](v0.2.2...v0.3.0) Signed-off-by: dependabot[bot] <support@github.com>
Remove dependency to `bufio.Reader` in internal `carv1` package that seems to be mainly used for peeking a byte to return appropriate error when stream abruptly ends, relating to #36. This allows simplification of code across the repo and remove all unnecessary wrappings of `io.Reader` with `bufio.Reader`. This will also aid simplify the internal IO utilities which will be done in future PRs. For now we simply remove dependency to `bufio.Reader` See: - #36
Implement the ability to generate index from a CARv1 payload given only an `io.Reader`, where the previous implementation required `io.ReadSeeker`. The rationale is to be minimal in what we expect in the API, since index generation from a CARv1 payload never need to rewind the reader and only moves forward in the stream. Refactor utility IO functions that convert between types in one place. Implement constructor functions that only instantiate wrappers when the passed argument does not satisfy a required interface. Fixes: - #146 Relates to: - #145
Unexport the CARv2 writer API until it is reimplemented using WriterAt or WriteSeeker to be more efficient. This API is not used anyway and we can postpone releasing it until SelectiveCar writer API is figured out. For now we unexport it to carry on with interface finalization and alpha release.
All the "New" APIs take IO interfaces, so they don't open any file by themselves. However, the APIs that take a path to disk and return a blockstore or a reader need closing, so the prefix "Open" helps clarify that. Plus, it makes names more consistent.
Add tests that asserts: - when padding options are set the payload is as expected - when finalized, blockstore calls panic - when resumed from mismatching padding error is as expected - when resumed from non-v2 file error is as expected Remove redundant TODOs in code
Use both CID v0 and v1 in testing `ReadWrite` blockstore. Use consistent import package name `merkledag` across tests.
Fix an issue where single width index not finding a given key returns `0` offset instead of `index.ErrNotFound`. Reflect the changes in blockstore to return appropriate IPFS blockstore error. Add tests that asserts error types both in index and blockstore packages. Remove redundant TODOs. Relates to: - #158
Remove the duplicate test CARv1 file that existed both in `testdata` and `blockstore/testdata` in favour of the one in root. Reflect change in tests. Duplicity checked using `md5` digest: ```sh $ md5 testdata/sample-v1.car MD5 (testdata/sample-v1.car) = 14fcffc271e50bc56cfce744d462a9bd $ md5 blockstore/testdata/test.car MD5 (blockstore/testdata/test.car) = 14fcffc271e50bc56cfce744d462a9bd ```
Improve reader type conversion by checking if type satisfies ReaderAt to avoid unnecessary wrapping. Move io converters into one place. Fixes: - #145
We've agreed to have unified options, since many will be shared between the root and blockstore packages. Include docs, and update the tests.
Add examples that show how to read and write an index to/from file. Test marshalling and unmarshalling index files. Run `gofumt` on repo.
Remove the memory intensive CARv2 writer implementation since it needs to be reimplemented anyway and not used. We will release writers if needed when it comes to SelectiveCar writer implementation. Just now it is not clear if we want a CARv2 writer that works with the deprecated `format.NodeGetter`. Add tests for V1 wrapping functionality. Reformat code in repo for consistency using `gofumpt`.
Rename terminology to match what the spec uses to describe the inner CARv1 payload, i.e. Data payload. Update docs to use CARv1 and CARv2 consistently. Fix typo in Options API. See: - https://ipld.io/specs/transport/car/carv2/
Please do a merge commit when merging this, because it's a lot of commits and we want the merge to be recorded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If appendage is needed there is already `car.AttachIndex`.
We are continuing to send PRs to wip-v2 while this is waiting for a final ACK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with merging. I believe we also got an implicit acknowledgement/approval from @Stebalien.
Great, thank you! We have two other PRs in flight targeting wip-v2, so we'll merge those soon and then merge this. |
It's two lines of code extra and we already have read from and write to APIs for index. Remove `hydrate` cli since the assumptions it makes about appending index are no longer true. header needs updating for example. Removing to be re-added when needed as part of a propper CARv2 CLI.
Implement examples that show how to open a read-only blockstore, and a read-write blockstore along with resumption from the same file. The example surfaced a race condition where if the `AllKeysChan` is partially consumed and file is closed then the gorutie started by chan population causes the race condition. Locking on Close will make the closure hang. Better solution is needed but choices are limited due to `AllKeysChan` signature. Fixes: - #124
We've been working on CARv2 in the wip-v2 branch for some weeks.
In preparation for the first v2 beta, merge all of those commits into master.
The commits are mostly as they were originally, minus some minor squashing to clean up the history.