
pow: support offline Filecoin batch preparation #848

Merged
jsign merged 5 commits into master from jsign/aggr on Jul 2, 2021

Conversation

jsign (Contributor) commented on Jul 2, 2021

This PR adds support for pow offline prepare --aggregate, which takes a folder as an argument and:

  • Dagifies all the files
  • Batches them into a Filecoin batch as described by this spec.
  • Calculates the PieceCid and PieceSize.

And outputs:

  • Prints to standard output (--json can be used for JSON formatting; highly recommended):
    • The PayloadCid, PieceCid, and PieceSize.
    • Each processed file with its corresponding Cid.
  • Generates the following assets in the current folder:
    • The CAR file of the UnixFS batch.
    • The manifest file that was generated and lives inside the batch as described by the spec.
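
For reference, a hedged example invocation; the second positional argument is an assumption inferred from the diff below, where args[1] names the output (the manifest is written as args[1] + ".manifest"):

pow offline prepare --aggregate --json ./my-data-folder my-batch.car

Under that assumption, the run would leave my-batch.car (the CAR file of the UnixFS batch) and my-batch.car.manifest in the current folder, with the JSON summary printed to standard output.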

jsign added 2 commits July 1, 2021 16:04
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
jsign self-assigned this on Jul 2, 2021
jsign added 2 commits July 2, 2021 09:46
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@@ -36,6 +41,7 @@ func init() {
prepare.Flags().String("tmpdir", os.TempDir(), "path of folder where a temporal blockstore is created for processing data")
prepare.Flags().String("ipfs-api", "", "IPFS HTTP API multiaddress that stores the cid (only for Cid processing instead of file/folder path)")
prepare.Flags().Bool("json", false, "avoid pretty output and use json formatting")
prepare.Flags().Bool("aggregate", false, "aggregates a folder of files")
jsign (author):

New flag that indicates the files in the provided folder should be aggregated with the aggregator library.

c.Fatal(fmt.Errorf("get api addr flag: %s", err))
}
if ipfsAPI != "" && aggregate {
c.Fatal(errors.New("the --aggregate flag can't be used with a remote go-ipfs node"))
jsign (author):

At least for now... we could keep working to support it.

Comment on lines +270 to +297
if aggregate {
rootNd, err := dagService.Get(ctx, payloadCid)
if err != nil {
c.Fatal(fmt.Errorf("get root node: %s", err))
}
manifestLnk := rootNd.Links()[0]
manifestNd, err := dagService.Get(ctx, manifestLnk.Cid)
if err != nil {
c.Fatal(fmt.Errorf("get manifest node: %s", err))
}
manifestF, err := unixfile.NewUnixfsFile(context.Background(), dagService, manifestNd)
if err != nil {
c.Fatal(fmt.Errorf("get unifxfs manifest file: %s", err))
}
manifest := manifestF.(files.File)
fManifest, err := os.Create(args[1] + ".manifest")
if err != nil {
c.Fatal(fmt.Errorf("creating manifest file: %s", err))

}
defer func() {
if err := fManifest.Close(); err != nil {
c.Fatal(fmt.Errorf("closing manifest file: %s", err))
}
}()
if _, err := io.Copy(fManifest, manifest); err != nil {
c.Fatal(fmt.Errorf("writing manifest file: %s", err))
}
jsign (author):

This code section retrieves the manifest file that lives inside the generated batch.
The manifest is the first file in the root folder (as defined in the spec), so Links()[0] on line 275 will always succeed and give us the file.

The rest pulls the file out and writes it to the current folder.

Comment on lines 461 to 489
if !aggregate {
dataCid, err = dataprep.Dagify(ctx, dagService, path, progressChan)
if err != nil {
return cid.Undef, nil, fmt.Errorf("creating dag for data: %s", err)
}
} else {
var lst []aggregator.AggregateDagEntry
err = filepath.Walk(path, func(p string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if info.IsDir() {
return nil
}
dcid, err := dataprep.Dagify(ctx, dagService, p, progressChan)
if err != nil {
return err
}
lst = append(lst, aggregator.AggregateDagEntry{
RootCid: dcid,
})
aggregatedFiles = append(aggregatedFiles, aggregatedFile{Name: p, Cid: dcid.String()})
return err
})
dataCid, err = aggregator.Aggregate(ctx, dagService, lst)
if err != nil {
return cid.Undef, nil, fmt.Errorf("aggregating: %s", err)
}
}
jsign (author):

If --aggregate is provided, we walk the folder, ingest each file, and then aggregate them.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
jsign requested a review from merlinran on July 2, 2021 at 12:55
c.Fatal(fmt.Errorf("get unifxfs manifest file: %s", err))
}
manifest := manifestF.(files.File)
fManifest, err := os.Create(args[1] + ".manifest")

merlinran (reviewer):

Is it possible that we generate the manifest file on the fly in dagify after walking the directory, without writing anything to the local directory?

jsign (author):

Actually, I'm writing it to the local directory as a feature.
The manifest file is generated on the fly by the aggregator library and included in the batch UnixFS DAG.

Also, dagify operates per file, while the manifest is a batch-scoped concept.

merlinran (reviewer):

Ah gotcha. I confused the order ;)

@@ -81,4 +83,6 @@ require (
nhooyr.io/websocket v1.8.6 // indirect
)

replace github.com/ipfs/go-unixfs => github.com/ipfs/go-unixfs v0.2.2

merlinran (reviewer):

Why this instead of setting the version directly?

jsign (author):

Because a lot of libs in this ecosystem use v0. That makes the dependency graph very brittle: including a new dependency that ends up requiring a newer version of an indirect dependency can break things, given how Go MVS version resolution works.

A lot of libs in the IPFS ecosystem that are v0 make breaking changes in later v0 releases, and that doesn't play well with Go MVS (which selects the maximum referenced minor version within a major version).

Yes... horrible. Usually you have some luck trying to upgrade other deps, hoping they will stop referencing the old v0 version that forces this downgrade, but that's usually not an easy task.
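
A minimal sketch of the failure mode (the v0.2.6 version below is hypothetical; v0.2.2 is the version this PR pins):

// go.mod of this module: code expects the older behaviour
require github.com/ipfs/go-unixfs v0.2.2
// go.mod of a newly added dependency (hypothetical version)
require github.com/ipfs/go-unixfs v0.2.6

MVS selects the maximum referenced version within the major version, so the build would use v0.2.6; if that release changed behaviour incompatibly, this module breaks even though its own requirement never changed. The replace directive forces v0.2.2 for the whole build (replace only takes effect in the main module's go.mod).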

jsign (author):

Long story short: if you don't use strict semantic versioning in the Go ecosystem, you'll create nightmares for your clients.

merlinran (reviewer):

Okay, I see this is to force its direct and indirect dependencies to use this specific version instead of relying on MVS. Maybe worth a comment? Or maybe I'm ignorant enough to not understand the intention in the first place ;)

jsign (author):

Yes, that's exactly the intention. We usually only use replace to solve MVS problems on dependencies that don't follow strict semantic versioning. So that's the default correct interpretation.
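
A sketch of what such a comment could look like next to the directive in go.mod (wording hypothetical; the replace line itself is from this PR):

// Several IPFS ecosystem libraries break compatibility within v0, and MVS
// would otherwise select a newer go-unixfs v0.x required by other
// dependencies. Pin v0.2.2 until the dependency graph catches up.
replace github.com/ipfs/go-unixfs => github.com/ipfs/go-unixfs v0.2.2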

merlinran (reviewer) left a comment:

👍

jsign merged commit 34dc0fe into master on Jul 2, 2021
jsign deleted the jsign/aggr branch on July 2, 2021 at 13:34