Allow manually remove invalid snapshots on restore #901

Merged 1 commit on Sep 4, 2022

ktock (Member) commented Aug 29, 2022

Related: #901

On restart, containerd-stargz-grpc fails to start if any snapshot cannot be restored.
This makes it impossible even to remove images and snapshots manually (e.g. by using ctr).

# ctr-remote i rpull --plain-http registry2:5000/ubuntu:20.04-esgz
(Kill "registry2:5000")
# ps -C containerd-stargz-grpc -opid | xargs -I{} kill {}
# containerd-stargz-grpc &
{"level":"info","msg":"preparing filesystem mount at mountpoint=/var/lib/containerd-stargz-grpc/snapshotter/snapshots/2/fs","time":"2022-08-29T16:16:55.765328918Z"}
{"error":"failed to restore remote snapshot: failed to prepare remote snapshot: sha256:6311216555c423c3b445877021293bf0285ab61850216d001a70bebd627743c4: failed to resolve layer: failed to resolve layer \"sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0\" from \"registry2:5000/ubuntu:20.04-esgz\": failed to resolve the blob: failed to resolve the source: cannot resolve layer: failed to redirect (host \"registry2:5000\", ref:\"registry2:5000/ubuntu:20.04-esgz\", digest:\"sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0\"): failed to request: GET https://registry2:5000/v2/ubuntu/blobs/sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0 giving up after 6 attempt(s): Get \"https://registry2:5000/v2/ubuntu/blobs/sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0\": dial tcp: lookup registry2: Temporary failure in name resolution: failed to redirect (host \"registry2:5000\", ref:\"registry2:5000/ubuntu:20.04-esgz\", digest:\"sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0\"): failed to request: GET http://registry2:5000/v2/ubuntu/blobs/sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0 giving up after 6 attempt(s): Get \"http://registry2:5000/v2/ubuntu/blobs/sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0\": dial tcp: lookup registry2: Temporary failure in name resolution: failed to resolve: failed to resolve target","level":"fatal","msg":"failed to create new snapshotter","time":"2022-08-29T16:16:58.635012814Z"}

This commit allows containerd-stargz-grpc to start even if there are unusable snapshots.
Unusable snapshots remain after restore (i.e. their mount.Mount will fail with an error), so the user needs to remove them manually (e.g. by using ctr).
The warning message shows the image name and the key of each unusable snapshot, so the user can use this information to remove the image manually.

Add the following config to config.toml:

[snapshotter]
allow_invalid_mounts_on_restart = true

# ctr-remote i rpull --plain-http registry2:5000/ubuntu:20.04-esgz
(Kill "registry2:5000")
# ps -C containerd-stargz-grpc -opid | xargs -I{} kill {}
# containerd-stargz-grpc &
{"level":"info","msg":"preparing filesystem mount at mountpoint=/var/lib/containerd-stargz-grpc/snapshotter/snapshots/2/fs","time":"2022-08-29T16:26:13.244775026Z"}
{"error":"failed to resolve layer: failed to resolve layer \"sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0\" from \"registry2:5000/ubuntu:20.04-esgz\": failed to resolve the blob: failed to resolve the source: cannot resolve layer: failed to redirect (host \"registry2:5000\", ref:\"registry2:5000/ubuntu:20.04-esgz\", digest:\"sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0\"): failed to request: GET https://registry2:5000/v2/ubuntu/blobs/sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0 giving up after 6 attempt(s): Get \"https://registry2:5000/v2/ubuntu/blobs/sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0\": dial tcp: lookup registry2: Temporary failure in name resolution: failed to redirect (host \"registry2:5000\", ref:\"registry2:5000/ubuntu:20.04-esgz\", digest:\"sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0\"): failed to request: GET http://registry2:5000/v2/ubuntu/blobs/sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0 giving up after 6 attempt(s): Get \"http://registry2:5000/v2/ubuntu/blobs/sha256:ea8aeca69d07706fb87e659f5303278f0a416188e69e14eaecc08e41f4cb6ca0\": dial tcp: lookup registry2: Temporary failure in name resolution: failed to resolve: failed to resolve target","level":"warning","msg":"failed to restore remote snapshot sha256:6311216555c423c3b445877021293bf0285ab61850216d001a70bebd627743c4; remove this snapshot manually","time":"2022-08-29T16:26:15.940504334Z"}
{"level":"info","msg":"preparing filesystem mount at mountpoint=/var/lib/containerd-stargz-grpc/snapshotter/snapshots/3/fs","time":"2022-08-29T16:26:15.940560695Z"}
{"error":"failed to resolve layer: failed to resolve layer \"sha256:61a9dd44cb77a78807152f401650d8a5233cf1cbc67c736e2c6d495704462076\" from \"registry2:5000/ubuntu:20.04-esgz\": failed to resolve the blob: failed to resolve the source: cannot resolve layer: failed to redirect (host \"registry2:5000\", ref:\"registry2:5000/ubuntu:20.04-esgz\", digest:\"sha256:61a9dd44cb77a78807152f401650d8a5233cf1cbc67c736e2c6d495704462076\"): failed to request: GET https://registry2:5000/v2/ubuntu/blobs/sha256:61a9dd44cb77a78807152f401650d8a5233cf1cbc67c736e2c6d495704462076 giving up after 6 attempt(s): Get \"https://registry2:5000/v2/ubuntu/blobs/sha256:61a9dd44cb77a78807152f401650d8a5233cf1cbc67c736e2c6d495704462076\": dial tcp: lookup registry2: Temporary failure in name resolution: failed to redirect (host \"registry2:5000\", ref:\"registry2:5000/ubuntu:20.04-esgz\", digest:\"sha256:61a9dd44cb77a78807152f401650d8a5233cf1cbc67c736e2c6d495704462076\"): failed to request: GET http://registry2:5000/v2/ubuntu/blobs/sha256:61a9dd44cb77a78807152f401650d8a5233cf1cbc67c736e2c6d495704462076 giving up after 6 attempt(s): Get \"http://registry2:5000/v2/ubuntu/blobs/sha256:61a9dd44cb77a78807152f401650d8a5233cf1cbc67c736e2c6d495704462076\": dial tcp: lookup registry2: Temporary failure in name resolution: failed to resolve: failed to resolve target","level":"warning","msg":"failed to restore remote snapshot sha256:ae017491d3ee79590f16c753ee3a0638aa3945a92efa507d37e62f0ee6c846fb; remove this snapshot manually","time":"2022-08-29T16:26:18.671965059Z"}
{"level":"info","msg":"preparing filesystem mount at mountpoint=/var/lib/containerd-stargz-grpc/snapshotter/snapshots/1/fs","time":"2022-08-29T16:26:18.672022051Z"}
{"error":"failed to resolve layer: failed to resolve layer \"sha256:93ef9da9c5f3db2ac7fd52cab0e63c7e6fc9a722aec43d1b936a130e979ab8e8\" from \"registry2:5000/ubuntu:20.04-esgz\": failed to resolve the blob: failed to resolve the source: cannot resolve layer: failed to redirect (host \"registry2:5000\", ref:\"registry2:5000/ubuntu:20.04-esgz\", digest:\"sha256:93ef9da9c5f3db2ac7fd52cab0e63c7e6fc9a722aec43d1b936a130e979ab8e8\"): failed to request: GET https://registry2:5000/v2/ubuntu/blobs/sha256:93ef9da9c5f3db2ac7fd52cab0e63c7e6fc9a722aec43d1b936a130e979ab8e8 giving up after 6 attempt(s): Get \"https://registry2:5000/v2/ubuntu/blobs/sha256:93ef9da9c5f3db2ac7fd52cab0e63c7e6fc9a722aec43d1b936a130e979ab8e8\": dial tcp: lookup registry2: Temporary failure in name resolution: failed to redirect (host \"registry2:5000\", ref:\"registry2:5000/ubuntu:20.04-esgz\", digest:\"sha256:93ef9da9c5f3db2ac7fd52cab0e63c7e6fc9a722aec43d1b936a130e979ab8e8\"): failed to request: GET http://registry2:5000/v2/ubuntu/blobs/sha256:93ef9da9c5f3db2ac7fd52cab0e63c7e6fc9a722aec43d1b936a130e979ab8e8 giving up after 6 attempt(s): Get \"http://registry2:5000/v2/ubuntu/blobs/sha256:93ef9da9c5f3db2ac7fd52cab0e63c7e6fc9a722aec43d1b936a130e979ab8e8\": dial tcp: lookup registry2: Temporary failure in name resolution: failed to resolve: failed to resolve target","level":"warning","msg":"failed to restore remote snapshot sha256:c6a49101bf086368b1f5cd9a1cbbf00a5669cbcb289a7ed024c813093323a419; remove this snapshot manually","time":"2022-08-29T16:26:21.170017745Z"}

In this case, containerd-stargz-grpc restarts, but the image registry2:5000/ubuntu:20.04-esgz isn't usable, as shown in the warning messages above.
Running this image produces the following error:

# ctr-remote snapshot --snapshotter=stargz ls
KEY                                                                     PARENT                                                                  KIND      
sha256:6311216555c423c3b445877021293bf0285ab61850216d001a70bebd627743c4 sha256:c6a49101bf086368b1f5cd9a1cbbf00a5669cbcb289a7ed024c813093323a419 Committed 
sha256:ae017491d3ee79590f16c753ee3a0638aa3945a92efa507d37e62f0ee6c846fb sha256:6311216555c423c3b445877021293bf0285ab61850216d001a70bebd627743c4 Committed 
sha256:c6a49101bf086368b1f5cd9a1cbbf00a5669cbcb289a7ed024c813093323a419                                                                         Committed 
# ctr-remote run --snapshotter=stargz --rm -t registry2:5000/ubuntu:20.04-esgz foo echo hi
{"error":"layer not registered","key":"sha256:ae017491d3ee79590f16c753ee3a0638aa3945a92efa507d37e62f0ee6c846fb","level":"warning","mount-point":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/1/fs","msg":"layer is unavailable","time":"2022-08-29T16:27:15.123297261Z"}
{"error":"layer not registered","key":"sha256:ae017491d3ee79590f16c753ee3a0638aa3945a92efa507d37e62f0ee6c846fb","level":"warning","mount-point":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/3/fs","msg":"layer is unavailable","time":"2022-08-29T16:27:15.123349491Z"}
{"error":"layer not registered","key":"sha256:ae017491d3ee79590f16c753ee3a0638aa3945a92efa507d37e62f0ee6c846fb","level":"warning","mount-point":"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/2/fs","msg":"layer is unavailable","time":"2022-08-29T16:27:15.123364359Z"}
ctr-remote: layer "4" unavailable: unavailable

So you need to manually remove it using ctr.

# ctr-remote i rm registry2:5000/ubuntu:20.04-esgz
registry2:5000/ubuntu:20.04-esgz
# ctr-remote snapshot --snapshotter=stargz ls
KEY PARENT KIND
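To find which snapshots need manual removal, the keys can be pulled out of the warning lines. The following is a rough Go sketch that assumes the JSON log format and the message wording shown in the logs above ("failed to restore remote snapshot <key>; remove this snapshot manually"); the wording may differ in other versions, and `logEntry` and `invalidSnapshotKeys` are hypothetical helper names, not part of stargz-snapshotter.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry holds the only fields of a JSON log line that this sketch inspects.
type logEntry struct {
	Level string `json:"level"`
	Msg   string `json:"msg"`
}

// Message wording copied from the warning lines in this PR description.
const (
	msgPrefix = "failed to restore remote snapshot "
	msgSuffix = "; remove this snapshot manually"
)

// invalidSnapshotKeys scans log output line by line and returns the snapshot
// keys named in restore warnings. Lines that are not JSON are skipped.
func invalidSnapshotKeys(logs string) []string {
	var keys []string
	sc := bufio.NewScanner(strings.NewReader(logs))
	// The "error" field can be very long, so raise the scanner's line limit.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		var e logEntry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue
		}
		if e.Level != "warning" ||
			!strings.HasPrefix(e.Msg, msgPrefix) ||
			!strings.HasSuffix(e.Msg, msgSuffix) {
			continue
		}
		key := strings.TrimSuffix(strings.TrimPrefix(e.Msg, msgPrefix), msgSuffix)
		keys = append(keys, key)
	}
	return keys
}

func main() {
	logs := `{"level":"info","msg":"preparing filesystem mount at mountpoint=/var/lib/containerd-stargz-grpc/snapshotter/snapshots/2/fs"}
{"error":"failed to resolve layer","level":"warning","msg":"failed to restore remote snapshot sha256:6311216555c423c3b445877021293bf0285ab61850216d001a70bebd627743c4; remove this snapshot manually"}`
	for _, k := range invalidSnapshotKeys(logs) {
		fmt.Println(k)
	}
}
```

The extracted keys match the KEY column of the snapshot listing, so they can be cross-checked before removing the image with ctr.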

In the future, we should have better support for restoring/restarting.

ktock (Member Author) commented Sep 1, 2022

depends on #902 to pass CI


// SnapshotterConfig is snapshotter-related config.
type SnapshotterConfig struct {
// NoRestoreInvalid doesn't restore invalid snapshots.
Member commented:

What does "restore" mean?

Member Author commented:

Fixed the patch to avoid "restore" and use a clearer expression.

// AllowInvalidMountOnRestart allows that there are snapshot mounts that cannot access to the
// data source when restarting the snapshotter.
// NOTE: User needs to manually remove the snapshots from containerd's metadata store using
// ctr (e.g. `ctr i rm`).
Member commented:

Do you mean ctr snapshot rm?

// data source when restarting the snapshotter.
// NOTE: User needs to manually remove the snapshots from containerd's metadata store using
// ctr (e.g. `ctr i rm`).
AllowInvalidMountOnRestart bool `toml:"allow_invalid_mount_on_restart"`
Member commented:

Suggested change
- AllowInvalidMountOnRestart bool `toml:"allow_invalid_mount_on_restart"`
+ AllowInvalidMountsOnRestart bool `toml:"allow_invalid_mounts_on_restart"`

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
2 participants