Feat/migration ipfs download #8064

gammazero · 2021-04-08T15:59:12Z

go-ipfs can now fetch migrations via IPFS

Project proposal: protocol/web3-dev-team#86

mvdan

Just did a first pass as it's quite late. I can do a second one tomorrow.

repo/fsrepo/migrations/ipfsfetcher.go

mvdan · 2021-04-08T20:00:18Z

repo/fsrepo/migrations/ipfsfetcher_test.go

+)
+
+func TestIpfsFetcher(t *testing.T) {
+	t.Skip("manually-run dev test only")


maybe hide it behind a flag, then :) otherwise the only way to run it is by editing the source.

is there any way to eventually make this test work in a local way without extra setup? e.g. by setting up some IPFS/HTTP servers/nodes.

Done - hidden behind flag

For the HttpFetcher I created a dummy test server. I am not really sure there is a great way to do this for IPFS. Mocking interfaces seems a bit excessive.

I'm just a bit worried that we have a test that will practically never be run. Mocking isn't great, but if running a real IPFS node for the test is too difficult, at least testing with a mock by default would be better than testing nothing by default. The flag could still be used to test against a real IPFS node instead of a mock.

Another pattern is to use an environment variable and put the name of it in the Skip message. Inspired by this blog post https://peter.bourgon.org/blog/2021/04/02/dont-use-build-tags-for-integration-tests.html

This was handled by adding a skipUnlessEpic function, as is done here:
https://github.com/ipfs/go-ipfs/blob/ef866a1400b3b2861e5e8b6cc9edc8633b890a0a/test/integration/addcat_test.go#L173?

repo/fsrepo/migrations/ipfsfetcher_test.go

gammazero · 2021-04-08T23:40:01Z

@mvdan I addressed all review comments and made suggested changes. One other big change I needed to make was to move IpfsFetcher into its own package. This is so that things outside of ipfs, that import migrations, are not forced to link with all the IpfsFetcher baggage.

gammazero · 2021-04-09T11:30:52Z

cmd/ipfs/daemon.go

aschmahmann · 2021-04-12T16:37:09Z

cmd/ipfs/daemon.go

@@ -288,9 +291,30 @@ func daemonFunc(req *cmds.Request, re cmds.ResponseEmitter, env cmds.Environment
 			return fmt.Errorf("fs-repo requires migration")
 		}

+		// Read from existing config
+		cfg, err := cctx.GetConfig()


Unfortunately, we can't get the config until the migration has been performed, since the format of the config may have changed between versions.

An alternative way to do this is to have a --migration-temp-node-config (or better name 😊) option that takes a config file to use for the temporary node. This could protect us from a variety of bugs/corner cases as it allows users to configure the temporary node as if it was a regular one.

Some potential corner cases include:

If we change/move/rename any of the config options we use

Peering.Peers might not be what you're looking for since they might not have the migrations and might not be DHT servers

DHT bootstrap nodes might not be sufficient since your node might need to turn off DHT usage for some reason

We could add future routing/friend-ing options that wouldn't get picked up from a config file used for an older node version

It seems like allowing an entire config to be specified for a temp node also invites many problems, as users will not necessarily know what the new configuration looks like until after the migration and may typically just use their existing config (leading to issues mentioned above). Also, the configuration for a temp node, used only to download a migration or two, does not need more than a default config. Users not knowing this may think it best to use a config similar to what they are using for their real node.

Instead of a config file, WDYT about letting the user specify peer(s) on the command line, if they happen to know one or more that may be useful for migration?

I have implemented the above, and peers can be specified using the migration-peers flag. I am not sure this is an ideal solution, but it does provide a way to specify peers until a decision is made regarding config files. Maybe do that in a follow-up PR?

It seems like allowing an entire config to be specified for a temp node also invites many problems...

Ya, it's using a pretty big hammer to deal with a smaller set of problems but needing to modify the config for the purposes of migrations at all seems like a potentially advanced requirement for an end user, perhaps sufficiently so that they might as well get access to the full range of tools.

However, there are also potentially a large number of options that could be specified during a migration so idk if command line flags for each of them is really a reasonable alternative. For example, if someone is on a private network and therefore needs the network symmetric key they're a bit out of luck with just some peer flags. If we make our routing systems more configurable we're similarly out of luck.

cc @Stebalien as this and #8064 (comment) seem to be related to the comments in the proposal PR (https://github.com/protocol/web3-dev-team/pull/86/files#diff-991744cab1b93840a21929cbf5aa7bc00a37ae4244b5c569df086214e36e47cfR66 and https://github.com/protocol/web3-dev-team/pull/86/files#r610934803)

An alternative way to do this is to have a --migration-temp-node-config (or better name 😊) option that takes a config file to use for the temporary node. This could protect us from a variety of bugs/corner cases as it allows users to configure the temporary node as if it was a regular one.

We could, maybe, later, support that for advanced users/distros. That's not going to really help in this case because users will be very confused about why their migration options aren't being applied. Users are going to be less confused if we start a temporary node that ignores their config.

I've left a comment on the proposal. IMO, we should just try to read the relevant part of the config and ignore the rest. Downsides:

Migrations can't touch the migration part of the config. Honestly, I'm fine with that.

If we decide to move the config, things could get... tricky. But trying a set of configs in order of precedence, just to read a small portion probably isn't a huge issue.

We could also do this on the command-line, but it could get icky (as @aschmahmann said, lots of options).

I think I'm fine with this approach. If we define the migrations config options as described in the proposal and then we try to deserialize the config file (or object) into a struct that only has the migration values that should be fine.

This means that, unlike the rest of the config, which is nominatively typed in that backwards-incompatible changes are tied to a repo version, we are likely to use structural typing for the migrations-specific section (although if this proves too hard we can add a field to the migrations struct to track versioning).

This reliance on structural typing means we basically don't have to worry too much about the potential migration config changes for the time being and it's possible we'll end up with a repo/package that contains all the migration configs (like what we used to do with fs-repo-migrations, but really small since it's only for a portion of the config file).
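
To make the "deserialize into a struct that only has the migration values" idea concrete, here is a minimal sketch. It assumes the config is JSON and that the section is named Migration with DownloadSources and Keep fields; the Keep field, the type name migrationOnly and the function name readMigrationSection are assumptions for illustration, not necessarily what the PR ends up with.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// migrationOnly mirrors just the part of the config file that the migration
// code needs. The field names are assumptions made for this sketch.
type migrationOnly struct {
	Migration struct {
		DownloadSources []string
		Keep            string
	}
}

// readMigrationSection decodes only the Migration section of a JSON config.
// encoding/json ignores object keys that have no matching struct field, so
// sections whose shape changed between repo versions are never inspected.
func readMigrationSection(raw string) (migrationOnly, error) {
	var cfg migrationOnly
	err := json.NewDecoder(strings.NewReader(raw)).Decode(&cfg)
	return cfg, err
}

func main() {
	// "Datastore" stands in for a section the current code may not understand.
	raw := `{
		"Datastore": {"Spec": 42},
		"Migration": {"DownloadSources": ["IPFS"], "Keep": "cache"}
	}`
	cfg, err := readMigrationSection(raw)
	if err != nil {
		fmt.Println("cannot read migration config:", err)
		return
	}
	fmt.Println(cfg.Migration.DownloadSources, cfg.Migration.Keep)
}
```

If the Migration section is missing entirely, the struct keeps its zero values and the caller can substitute defaults.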

For the time being we can probably punt on dealing with private swarms or people with custom bootstrap nodes since they can:

Just use HTTP, like they've been doing so far

Download the migrations binary manually, or run the full fs-repo-migrations binary or ipfs-update

If this becomes a problem and we need the sledge hammer of passing in --migration-temp-node-config we can do that, but let's delay until we have a user + use case who needs this.

I am going to leave the configuration for a later time. Currently, go-ipfs accepts a couple of optional flags that allow the user to specify how to download the migrations (--download-migration) and if/how to keep downloaded migrations (--keep-migration). Even if this information is available from a config file in the future, I think these flags will still be useful to override that.

So, I thought this solution would be fine... but I've changed my mind. We have way too many special-purpose daemon flags that should be settings. We're already trying to get rid of the existing daemon flags, I don't want to add more that we'll need to maintain into the future for backwards compatibility.

If we want to be able to override config options, we need a coordinated system to do that but this isn't that.

@Stebalien Done. Now reading migration setting from config file, and using defaults for any values not present. Removed CLI flags for specifying migration configuration.

aschmahmann · 2021-04-12T16:49:25Z

repo/fsrepo/migrations/fetcher.go

+	err := fmt.Errorf("fetch errors:")
+	for i := range errs {
+		err = fmt.Errorf("%s\n%d) %s", err.Error(), i, errs[i])


What's the idea here, to return the error from the last fetcher? We should probably just use something like https://github.com/hashicorp/go-multierror (it's already used in go-ipfs).

Returns list of errors if all fetchers fail. Switched to using multierror

FYI, https://github.com/uber-go/multierr/ is nicer in my experience (doesn't matter too much).

daemon was already using https://github.com/hashicorp/go-multierror. 🤷

aschmahmann · 2021-04-12T17:11:18Z

repo/fsrepo/migrations/ipfsfetcher/ipfsfetcher.go

+	// configure the temporary node
+	cfg.Routing.Type = "dhtclient"
+
+	// Disable listening for inbound connections


We could do that, but then if you try and import the data into the new node from the old node for seeding purposes you're going to need to either:

Use the import/export APIs (e.g. use CAR files to move the data around)

Have the temporary node connect to the new node so it can pull the data using Bitswap

Since the files are already available on the local disk (since they need to be run), it is just as easy for me to add the files to the real node after migration. Unless there is a compelling reason to transfer via bitswap instead of reading the files, then I think this is OK.

it is just as easy for me to add the files to the real node after migration

Right, but if we do this it needs to be as a CAR file not a UnixFS import of some binary since we want the CIDs to match and ipfs add <file> is not guaranteed to be the same across versions and boxing ourselves into some parameters now wouldn't be a good idea either. i.e. go-ipfs needs to understand "I am importing this DAG" not "I am importing this binary", whether it's a CAR file via DAG import, or a UnixFS DAG via Bitswap isn't really a big deal.

When migrations are downloaded via IPFS, add the migration archives to IPFS by connecting to the temporary migration node and getting the migrations over IPFS.

When migrations are downloaded via HTTP, add the migration archive files directly to IPFS. After talking to @Stebalien he thought it was still worth adding the files directly, even though it might result in a different DAG. If we decide later to not do that, it is a trivial change.

repo/fsrepo/migrations/ipfsfetcher/ipfsfetcher.go

repo/fsrepo/migrations/ipfsfetcher/ipfsfetcher_test.go

aschmahmann · 2021-04-12T17:20:20Z

repo/fsrepo/migrations/ipfsfetcher/ipfsfetcher_test.go

+
+func TestIpfsFetcher(t *testing.T) {
+	if !*runIpfsTest {
+		t.Skip("manually-run dev test, use '-ipfstest' flage to run")


Is the reason this test is manual because it makes network requests?

Yes, and I did not know a good way to simulate or mock an IPFS node, but I wanted to have some test that a developer could run to check if things are working. WDYT? Leave it, toss it, create a better test?

There are some tests in the integration test package (e.g. https://github.com/ipfs/go-ipfs/blob/master/test/integration/three_legged_cat_test.go) that could be of inspiration here.

If it's a pain to write the test lmk and we can decide on an alternative. Using a flag/env var seems fine (I'm not sure if we have an appropriate one, although perhaps this one https://github.com/ipfs/go-ipfs/blob/ef866a1400b3b2861e5e8b6cc9edc8633b890a0a/test/integration/addcat_test.go#L173 would do), although I don't really trust tests that don't run in CI since they aren't reliably run before merging PRs and are subject to breaking.

@Stebalien lmk if you have any additional thoughts here given you've been thinking about our CI situation a bit, or if it's just 👍.

We need to test this in CI but we can use sharness tests. If a test is disabled in CI, it might as well not exist.

@aschmahmann I will see what I can do with sharness, but likely this is going to have to become a job for your team if it needs to get done before the upcoming release.

aschmahmann · 2021-04-13T02:07:02Z

cmd/ipfs/migration.go

+			return err
+		}
+
+		ipfsPath, err := ufs.Add(ctx, files.NewReaderFile(f), options.Unixfs.Pin(pin))


This is the thing that doesn't work #8064 (comment).

For example, if we add the migration binaries using the current defaults for ipfs add and then change the defaults in a subsequent release then the swarms won't converge.

Note: @gammazero if this is a pain to implement for you given time constraints I'm happy to either drop this requirement for the time being or I'll just code up the CAR import/export stuff on my own (likely requires cut-pasting code from the commands package for DAG Import/Export into a function on the IpfsNode and then calling it).

I went with the approach of connecting to the temporary migration node and then getting the data over IPFS. Exposing a CAR interface on the node may be a better solution in the long run, and we can revisit that.

gammazero · 2021-04-15T15:29:21Z

bump

gammazero · 2021-04-15T19:13:37Z

Rebased on master to pick up fixed sharness tests.

repo/fsrepo/migrations/ipfsfetcher/ipfsfetcher.go

gammazero · 2021-04-16T19:02:46Z

@aschmahmann If you merge PR #8076, then I can rebase and use CAR files to import downloaded migrations. I can also do that in a separate PR. In any case, once #8076 is in, import via CAR will be simple. Making that change to use CAR files will also enable PR #7658 to do the same.

aschmahmann · 2021-04-16T19:13:21Z

@gammazero yep, that's the idea. That PR seems close to being good, but has a bug causing it to fail sharness at the moment. In any event I wouldn't block on that PR here.

aschmahmann · 2021-04-19T14:37:23Z

cmd/ipfs/migration.go

+		case "":
+			// Ignore empty string


Because we state that if they leave the DownloadSources empty, that results in the default behavior. I thought it might be considered reasonable by some to think [""] means empty.
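
A minimal sketch of that rule, so that both an empty list and [""] fall back to the default behavior. The function name effectiveSources is hypothetical, and the default list here is an assumption taken from the "http,ipfs" default described for the earlier --download-migration flag.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultSources is an assumed default, mirroring the "http,ipfs" default
// mentioned for the earlier command-line flag.
var defaultSources = []string{"http", "ipfs"}

// effectiveSources drops empty entries from the configured list and falls
// back to the defaults when nothing usable remains, so [""] behaves like [].
func effectiveSources(configured []string) []string {
	var out []string
	for _, src := range configured {
		src = strings.TrimSpace(src)
		if src == "" {
			continue // ignore empty string
		}
		out = append(out, src)
	}
	if len(out) == 0 {
		return defaultSources
	}
	return out
}

func main() {
	fmt.Println(effectiveSources(nil))
	fmt.Println(effectiveSources([]string{""}))
	fmt.Println(effectiveSources([]string{"ipfs", ""}))
}
```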

aschmahmann · 2021-04-19T14:41:52Z

cmd/ipfs/migration_test.go

@@ -56,27 +71,52 @@ func TestGetMigrationFetcher(t *testing.T) {
 	if mf.Len() != 3 {
 		t.Fatal("expected3 fetchers in MultiFetcher")


Suggested change

t.Fatal("expected3 fetchers in MultiFetcher")

t.Fatal("expected 3 fetchers in MultiFetcher")

aschmahmann · 2021-04-19T14:51:00Z

repo/fsrepo/migrations/ipfsfetcher/ipfsfetcher.go

 	}

-	a, err := ma.NewMultiaddr(tempNodeTcpAddr)
+	addrs, err := ipfs.Swarm().LocalAddrs(ctx)


What's the idea behind where the background context is getting used and where the passed in context is getting used?

If this function doesn't complete then the temporary node is useless right (aside from the fact that if MDNS is working then everything is probably fine)?

If we don't get the addresses do we need to error + close the node?

The background context is used as a mechanism to shutdown the temporary node. Quote from go-ipfs/core/builder.go:

// Shut down the application if the lifetime context is canceled.
// NOTE: we _should_ stop the application by calling `Close()`
// on the process. But we currently manage everything with contexts.

So, for now, the background context is used for shutdown, in the absence of a node.Close() method, and the passed-in context is used for cancellation of the current function call.

True, if the function does not complete, then the node is useless. However, failure to get the local swarm address only means that the downloaded migrations cannot be fetched from the temp node.... So, right, it does not need to fail out here.

Done.

- Adding migrations is specified using the --keep-migrations flag. This may have one of the values: "keep", "pin", "discard". Default: "keep"
- Choose how to fetch migrations based on download specification given by the --download-migration flag. This can be "ipfs", "http", or one or more custom gateways. Default: "http,ipfs"
- Attempt to read existing config to get peers
- Fetcher interface has Close() to do any cleanup when done with a fetcher.
- Changes requested in PR review.

- When migrations are downloaded via IPFS, add the migration archives to IPFS by connecting to the temporary migration node and fetching the migrations.
- When migrations are downloaded via HTTP, add the migration archive files directly to IPFS.

gammazero requested review from mvdan and aschmahmann April 8, 2021 15:59

BigLep added this to the go-ipfs 0.9 milestone Apr 8, 2021

mvdan reviewed Apr 8, 2021

gammazero requested a review from mvdan April 8, 2021 23:40

gammazero force-pushed the feat/migration-ipfs-download branch from 600fd42 to bd7e480 Compare April 9, 2021 21:22

aschmahmann requested changes Apr 12, 2021

aschmahmann mentioned this pull request Apr 13, 2021

Release v0.9.0 #8058

Closed

71 tasks

gammazero requested a review from aschmahmann April 13, 2021 00:56

aschmahmann reviewed Apr 13, 2021

gammazero requested a review from aschmahmann April 14, 2021 02:20

gammazero force-pushed the feat/migration-ipfs-download branch 2 times, most recently from e3eba81 to 8a5dd4c Compare April 14, 2021 16:46

gammazero force-pushed the feat/migration-ipfs-download branch from 8a5dd4c to 4989b69 Compare April 15, 2021 19:12

aschmahmann reviewed Apr 16, 2021

gammazero requested a review from aschmahmann April 17, 2021 08:17

aschmahmann reviewed Apr 19, 2021

gammazero requested review from aschmahmann and removed request for mvdan April 19, 2021 18:53

gammazero force-pushed the feat/migration-ipfs-download branch 3 times, most recently from 4939e74 to 38d3c36 Compare May 4, 2021 00:05

lidel mentioned this pull request May 4, 2021

failed update: failed to find latest fs-repo-migrations: net/http: TLS handshake timeout ipfs/ipfs-desktop#1773

Closed

gammazero and others added 23 commits May 11, 2021 10:06

Disable listening for inbound connections

8a7a4c3

Read peers from CLI flag instead of from existing config file

f9da061

Reading from an existing config file, particularly before migration, may be problematic. Instead, let the user specify peers on the command line if they want to specify any to assist with downloading migrations.

fix test

1e16ba1

Build on go < 1.16

55f239b

fix missed error check

ac8aa16

Migration temp node listens on any port available

890bd7e

Configure migration from config and not from CLI flags

e5254d2

- Read "Migration" section of IPFS config file before migrations are
- Use default values for any items not specified in config
- Download migrations and save downloads according to config

check error return

163c003

Review changes

9768016

Use latest config

3a0be4c

Document Migration config

ee577fe

error message formatting

9727b4f

manually load the config when migrating

7d16a95

1. This will only decode the "migration" section of the config. Other sections may not be compatible.
2. This will avoid _caching_ the pre-migration config in the command context.

fix: factor the migration file reading code into a function

96bad6f

And close the file.

Move readMigrationConfig into migration.go

d40d0f1

- Add unit test for readMigrationConfig

Attempt to read Bootstrap and Peering from config

8d1dbfa

- Peer info is not provided by Migration config
- Do not fail, but use defaults, if Bootstrap or Peering config is not readable

Revert to previous multihash version

03454b7

Additional unit tests

bd6646f

Update documentation

3a4d85e

Better error messages

40d412a

Review changes

e37896d

gammazero force-pushed the feat/migration-ipfs-download branch from ddaffdd to e37896d Compare May 11, 2021 17:22

aschmahmann approved these changes May 12, 2021

aschmahmann merged commit c54cdaa into master May 12, 2021

gammazero deleted the feat/migration-ipfs-download branch May 12, 2021 17:10

This was referenced Aug 2, 2022

HTTP fetch of fs-migrations should use CAR #9159

Closed

Repo migration using IPFS #4247

Closed
