dregsy lets you sync container images between registries, public or private. Several sync tasks can be defined, as one-off or periodic tasks (see Configuration section). An image is synced by using a sync relay. Currently, this can be either Skopeo or a local Docker daemon. When using the latter, the image is first pulled from the source, then tagged for the destination, and finally pushed there. Skopeo in contrast, can directly transfer an image from source to destination, which makes it the preferred choice.
Sync tasks are defined in a YAML config file:
# relay type, either 'skopeo' or 'docker'
relay: skopeo
# whether to watch this config file and restart on change, defaults to false
watch: true
# relay config sections
skopeo:
# path to the skopeo binary; defaults to 'skopeo', in which case it needs to
# be in PATH
binary: skopeo
# directory under which to look for client certs & keys, as well as CA certs
# (see note below)
certs-dir: /etc/skopeo/certs.d
docker:
# Docker host to use as the relay
dockerhost: unix:///var/run/docker.sock
# Docker API version to use, defaults to 1.41
api-version: 1.41
# settings for image matching (see below)
lister:
# maximum number of repositories to list, set to -1 for no limit, defaults to 100
maxItems: 100
# for how long a repository list will be re-used before retrieving again;
# specify as a Go duration value ('s', 'm', or 'h'), set to -1 for not caching,
# defaults to 1h
cacheDuration: 1h
# list of sync tasks
tasks:
- name: task1 # required
# interval in seconds at which the task should be run; when omitted,
# the task is only run once at start-up
interval: 60
# determines whether for this task, more verbose output should be
# produced; defaults to false when omitted
verbose: true
# 'source' and 'target' are both required and describe the source and
# target registries for this task:
# - 'registry' points to the server; required
# - 'auth' contains the base64 encoded credentials for the registry
# in JSON form {"username": "...", "password": "..."}
# - 'auth-refresh' specifies an interval for automatic retrieval of
# credentials; only for AWS ECR (see below)
# - 'skip-tls-verify' determines whether to skip TLS verification for the
# registry server (only for 'skopeo', see note below); defaults to false
source:
registry: source-registry.acme.com
auth: eyJ1c2VybmFtZSI6ICJhbGV4IiwgInBhc3N3b3JkIjogInNlY3JldCJ9Cg==
target:
registry: dest-registry.acme.com
auth: eyJ1c2VybmFtZSI6ICJhbGV4IiwgInBhc3N3b3JkIjogImFsc29zZWNyZXQifQo=
skip-tls-verify: true
# 'mappings' is a list of 'from':'to' pairs that define mappings of image
# paths in the source registry to paths in the destination:
# - 'from' is required, while 'to' can be dropped if the path should remain
# the same as 'from'.
# - Regular expressions are supported in both fields (read on below for
# more details).
# - The tags being synced for a mapping can be limited by providing a 'tags'
# list. This list may contain semver and regular expressions filters
# (see below). When omitted, all image tags are synced.
# - With 'platform', the image to sync from a multi-platform source image
# can be selected (see below).
mappings:
- from: test/image
to: archive/test/image
tags: ['0.1.0', '0.1.1']
- from: test/another-image
platform: linux/arm64/v8
When syncing via a Docker relay, do not use the same Docker daemon for building local images (even better: don't use it for anything else but syncing). There is a risk that the reference to a locally built image clashes with the shorthand notation for a reference to an image on docker.io
. E.g. if you built a local image busybox
, then this would be indistinguishable from the shorthand busybox
pointing to docker.io/library/busybox
. One way to avoid this is to use registry.hub.docker.com
instead of docker.io
in references, which would never get shortened. If you're not syncing from/to docker.io
, then all of this is not a concern.
By setting watch: true
, you can make dregsy watch the config file. If it changes, dregsy will restart. If there is a task currently being synced, dregsy waits for it to complete. The restart is a full restart, so if your config contains one-off tasks, they will be run. If dregsy was started with a task filter via the run
option, this filter stays active. If the new config causes any validation errors, dregsy will stop. Note that the config file watch does not work if the file resides on an NFS, SMB, or FUSE file system (see the fsnotify package).
You can also trigger a restart by sending SIGHUP
to the dregsy process. This can be useful if you want more control over when a restart should occur, or you cannot use config file watch. The restart behavior is the same as outlined above for config file watch.
The mappings
section of a task can employ Go regular expressions for describing what images to sync, and how to change the destination path and name of an image. Details about how this works and examples can be found in this design document. Also keep in mind that regular expressions can be surprising at times, so it would be a good idea to try them out first in a Go playground. You may otherwise potentially sync large numbers of images, clogging your target registry, or running into rate limits. Feedback about this feature is encouraged!
The tags
list of a task can use semver and regular expression filters, so you can do something like this:
tags:
- 'semver: >=1.31.0 <1.31.9'
- 'regex: 1\.26\.[0-9]-(glibc|uclibc|musl)'
- '1.29.4'
- 'latest'
This syncs all tags describing versions equal to or larger than 1.31.0
, but lower than 1.31.9
, via the semver:
filter. The regex:
filter additionally syncs any 1.26.
x image with suffix -glibc
, -uclibc
, or -musl
. Finally, the verbatim tags 1.29.4
and latest
are also synced.
Note that the tags of an image need to conform to the semver specification 2.0.0 in order to be considered during filtering. The implementation uses the blang/semver lib. Have a look at their page or the GoDoc for more info on how to write semver filter expressions. Semver filtering handles tags starting with a v
prefix. It also tolerates suffixes, for example platform IDs which are often used in tags, as long as the tag starts with a full major.minor.patch semver. Semver filter expressions however must not use a v
prefix or any suffix.
Regex filters use standard Go regular expressions. When the first non-whitespace character after regex:
is !
, the filter will use inverted match. Keep in mind that when a regex contains a backslash, you need to place it inside single quotes to keep the YAML valid.
You can add multiple semver:
and regex:
filters under tags
. Note however that the filters are simply ORed, i.e. a tag is synced if it satisfies at least one of the items under tags
, be it semver, regex, or verbatim. So this is not a filter chain. Also, no sanity checks are done on the filters, so care must be taken to avoid competing or contradicting filters that select all or nothing at all.
Additionally it is possible to prune the resulting tag set with one or more keep:
filters. These are regular expressions, identical to regex:
filters (including inversion), but they get applied last, independent of where in the list they appear. If a tag in the filtered tag set does not match all of the keep:
filters, it is removed from the set. This helps in defining tag filters that would be hard to describe with only semver and regular expressions. Note however that keep:
filters do not apply to verbatim tags!
Here's an example for a source registry that attaches OS suffixes to their version tags, such as 2.1.4-buster
:
tags:
- 'latest'
- 'semver: >=2.0.0'
- 'keep: .+-(alpine|buster)'
This selects all releases starting with version 2.0.0
, but only for the -alpine
and -buster
suffixes. The latest
tag however is still included in the sync.
Limiting the Tag Count α feature
A special keep:
directive is keep: latest n
. This limits the set of tags to the latest n tags, and is enforced at the very end of tag set pruning, i.e. on the already pruned tag set. The latest tags are determined by sorting the tags as semvers in descending order and picking the first n. Any tags that are not legal semvers are kept, so the reduced set may actually contain more than n items. This allows to include certain verbatim tags such as testing
or qa
in addition to the latest n releases. However, if there are no legal semver tags in the set at all, the first n tags based on a descending string sort are picked.
Here is an example:
tags:
- 'glibc-tests'
- 'semver: >=1.34.0 <=1.36.0'
- 'keep: latest 5'
This selects all releases from version 1.34.0
through 1.36.0
, but limits them to the latest 5. The verbatim tag glibc-tests
is always included.
Keep the following in mind when using tag count limits:
-
Since for repositories using both semver and non-semver tags, the latter ones are always kept, use appropriate tag & pruning filters if there are many non-semver tags. This avoids overly large result sets, which could otherwise render tag count limiting useless.
-
For repositories with no semver tags, tag count limiting may not be suitable depending on what kind of tags are present. When sorted in string order, the tags need to represent their temporal order. Where for example arbitrary code names are used, this will not work.
-
If several
keep: latest
directives are specified in atags
list, the last one is used.
Verbatim tags in a tags
list may also contain image digests to uniquely identify the requested image. The format for verbatim tags with digests is [tag@]sha256:{digest value}
, i.e. the tag name can be dropped. As all verbatim tags, they can be mixed with tag filter expressions (see above). If a digest is present, the behavior is as follows:
-
When pulling from the source, the tag is dropped if present, and only the digest used. This is done to achieve consistent behavior between the Skopeo and Docker relays.
Background: Skopeo currently does not support both tag and digest in the same image reference and exits with an error. For Docker, the behavior depends on the version: up through version 1.13.1 and starting again with v20.10.20, Docker checks whether tag and digest match and throws an error if they don't. For versions in between, the tag is ignored.
-
When pushing to the target, the name if present is used and the digest dropped. Otherwise the digest is used.
If there is only a digest, the Docker relay auto-generates a tag of the format
dregsy-{digest value}
, since Docker does not support pushing by digest-only references. If tags of this format are not desired, specify tags for all digests in your sync config, which would then be used instead.
Here's an example:
tags:
- 'sha256:1d8a...'
- '1.36.0-uclibc@sha256:58f1...'
This syncs two distinct versions of an image, according to the given SHA256 sums. For the second digest in the list, tag 1.36.0-uclibc
is created in the target repository.
When the source image is a multi-platform image, the platform image adequate for the system on which dregsy runs is synced by default. Where this is not applicable, the desired platform can be specified via the platform
setting, separately for each mapping. To sync all available platform images, platform: all
can be used. Note however that this shorthand is only supported by the Skopeo relay.
To sync a selection of platform images from the same multi-platform source image, several mappings with according platform
settings can be defined. However, be careful not to map them into the same destination, i.e. use different to
settings. Otherwise, the synced platform images will "overwrite" each other, with only the last image synced being available from the target repository.
When connecting to source and target repository servers, TLS validation is performed to verify the identity of a server. If you're using self-signed certificates for a repo server, or a server's certificate cannot be validated with the CA bundle available on your system, you need to provide the required CA certs. The dregsy container image includes the CA bundle that comes with the Alpine base image. Also, if a repo server requires client authentication, i.e. mutual TLS, you need to provide an appropriate client key & cert pair.
How you do that for Docker is described here. The short version: create a folder under /etc/docker/certs.d
with the same name as the repo server's host name, e.g. source-registry.acme.com
, and place any required CA certs there as *.crt
(mind the extension). Client key & cert pairs go there as well, as *.key
and *.cert
.
Example:
/etc/docker/certs.d/
└── source-registry.acme.com
├── client.cert
├── client.key
└── ca.crt
When using the skopeo
relay, this is essentially the same, except that you specify the root folder with the skopeo
setting certs-dir
(defaults to /etc/skopeo/certs.d
). However, it's important to note the following differences:
-
When a repo server uses a non-standard port, the port number is included in image references when pulling and pushing. For TLS validation,
docker
will accordingly expect a{registry host name}:{port}
folder. Forskopeo
, this is not the case, i.e. the port number is dropped from the folder name. This was a conscious decision to avoid pain when running dregsy in Kubernetes and mounting certs & keys from secrets: mount paths must not contain:
. -
To skip TLS verification for a particular repo server when using the
docker
relay, you need to configure the Docker daemon accordingly. Withskopeo
, you can easily set this in any source or target definition with theskip-tls-verify
setting.
If a source (private registry only) or target (private & public) is an AWS ECR registry, you need to retrieve the auth
credentials via AWS CLI. They would however only be good for 12 hours, which is ok for one off tasks. For periodic tasks, or to avoid retrieving the credentials manually, you can specify an auth-refresh
interval as a Go Duration
, e.g. 10h
. If set, dregsy will initially and whenever the refresh interval has expired retrieve new access credentials. auth
can be omitted when auth-refresh
is set. Setting auth-refresh
for anything other than an AWS ECR registry will raise an error.
Note however that you either need to set environment variables AWS_ACCESS_KEY_ID
and AWS_SECRET_ACCESS_KEY
for the AWS account you want to use and a user with sufficient permissions. Or if you're running dregsy on an EC2 instance in your AWS account, the machine should have an appropriate instance profile. An according policy could look like this:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:CreateRepository"
],
"Resource": ["*"]
},
{
"Effect": "Allow",
"Action": [
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability",
"ecr:DescribeRepositories",
"ecr:PutImage",
"ecr:InitiateLayerUpload",
"ecr:UploadLayerPart",
"ecr:CompleteLayerUpload"
],
"Resource": "arn:aws:ecr:<your_region>:<your_account>:repository/*"
}
]
}
If a source or target is a Google Container Registry (GCR) or a Google Artifact Registry for containers, auth
may be omitted altogether. In this case either GOOGLE_APPLICATION_CREDENTIALS
variable must be set (which is supposed to contain a path to a JSON file with credentials for a GCP service account), or dregsy must be run on a GCE instance with an appropriate service account attached. registry
must be either specified as any of the GCR addresses (i.e. gcr.io
, us.gcr.io
, eu.gcr.io
, or asia.gcr.io
), or have the suffix -docker.pkg.dev
for artifact registry. The from
/to
mapping must include your GCP project name (i.e. your-project-123/your-image
). Note that GOOGLE_APPLICATION_CREDENTIALS
, if set, takes precedence even on a GCE instance.
If these mechanisms are not applicable in your use case, you can also authenticate with an OAuth2 token as described in the Artifact Registry Authentication -> Access token documentation. This token will then be used continuously without performing any authentication refreshes. In this case, set the content of auth
to the base64 encoded JSON credentials:
{
"username": "oauth2accesstoken",
"password": "<oauth2 token as described in the Artifact Registry documentation>"
}
If you want to use GCR or artifact registry as the source for a public image, you can deactivate authentication all together by setting auth
to none
.
Keeping the Config Dry & Secure
If you need to use the same configuration items in several places, for example when you want to sync the same image mapping from one source registry to several different destinations, you can use YAML anchors & aliases to avoid duplication. For example:
tasks:
- name: one
source: &source
registry: source.reg
target:
registry: foo
mappings: &mappings
- from: library/busybox
to: base/library/busybox
tags: ['a', 'b', 'c']
- ...
- name: two
source: *source
target:
registry: bar
mappings: *mappings
This sets the anchors source
and mappings
for the source registry and the desired mapping on their first occurrence, which are then referenced with *source
and *mappings
wherever else they are needed.
If you want to avoid including secrets such as registry passwords in the config, you can put ${...}
style variables in their place, and use for example envsubst
(either standard from package repos, or this one for more options) to substitute them during deployment, while secrets are present in the environment.
dregsy -config={path to config file} [-run={task name regexp}]
If there are any periodic sync tasks defined (see Configuration above), dregsy remains running indefinitely. Otherwise, it will return once all one-off tasks have been processed. With the -run
argument you can filter tasks. Only those tasks for which the task name matches the given regular expression will be run. Note that the regular expression performs a line match, so you don't need to place the expression in ^...$
to get an exact match. For example, -run=task-a
will only select task-a
, but not task-abc
.
Logging behavior can be changed with these environment variables:
variable | function | values |
---|---|---|
LOG_LEVEL |
log level; defaults to info |
fatal , error , warn , info , debug , trace |
LOG_FORMAT |
log format; gets automatically switched to JSON when dregsy is run without a TTY | json to force JSON log format, text to force text output |
LOG_FORCE_COLORS |
force colored log messages when running with a TTY | true , false |
LOG_METHODS |
include method names in log messages | true , false |
Pre-built binaries are provided in the release section of this project. Note however that currently, only the linux_amd64
flavor is tested upon release. All other binaries are untested. Feedback is welcome!
If you run dregsy natively on your system, with relay type docker
, the Docker daemon of your system will be used as the relay for all sync tasks, so all synced images will wind up in the Docker storage of that daemon.
You can use the dregsy image on Dockerhub for running dregsy containerized. There are two variants: one is based on Alpine, and suitable when you just want to run dregsy. The other variant is based on Ubuntu. It's somewhat larger, but may be better suited as a base when you want to extend the dregsy image. It's often easier to add things there than on Alpine, e.g. the AWS command line interface.
With each release, three tags get published: {version}-ubuntu
, {version}-alpine
, and {version}
, with the latter two referring to the same image. The same applies for latest
. The Skopeo versions contained in the two variants may not always be exactly the same, but should only differ in patch level.
The image includes the skopeo
binary, so all that's needed is:
docker run --rm -v {path to config file}:/config.yaml xelalex/dregsy
This will still use the local Docker daemon as the relay:
docker run --privileged --rm -v {path to config file}:/config.yaml -v /var/run/docker.sock:/var/run/docker.sock xelalex/dregsy
When you run a container registry inside your Kubernetes cluster as an image cache, dregsy can come in handy as an automated updater for that cache. The example config below uses the skopeo
relay:
relay: skopeo
tasks:
- name: task1
interval: 60
source:
registry: registry.acme.com
auth: eyJ1c2VybmFtZSI6ICJhbGV4IiwgInBhc3N3b3JkIjogInNlY3JldCJ9Cg==
target:
registry: registry.my-cluster
auth: eyJ1c2VybmFtZSI6ICJhbGV4IiwgInBhc3N3b3JkIjogImFsc29zZWNyZXQifQo=
mappings:
- from: test/image
to: archive/test/image
- from: test/another-image
To keep your registry auth tokens in the config file secure, we are creating a Kubernetes Secret instead of a ConfigMap:
kubectl create secret generic dregsy-config --from-file=./config.yaml
In addition, you will most likely want to mount client certs & keys, and CA certs from Kubernetes secrets into the pod for TLS validation to work. (The CA bundle from the official golang
image is already included in the dregsy image.)
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: kube-registry-updater
namespace: kube-system
labels:
k8s-app: kube-registry-updater
kubernetes.io/cluster-service: "true"
spec:
serviceName: kube-registry-updater
replicas: 1
template:
metadata:
labels:
k8s-app: kube-registry-updater
kubernetes.io/cluster-service: "true"
spec:
containers:
- name: dregsy
image: xelalex/dregsy
command: ['dregsy', '-config=/config/config.yaml']
resources:
requests:
cpu: 10m
memory: 32Mi
volumeMounts:
- name: dregsy-config
mountPath: /config
readOnly: true
volumes:
- name: dregsy-config
secret:
secretName: dregsy-config
The Makefile
has targets for building the binary and container image, and other stuff. Just run make
to get a list of the targets, and info about configuration items. Note that for consistency, building is done inside a Golang build container, so you will need Docker to build. dregsy's Docker image is based on Alpine, and installs Skopeo via apk
during the image build.
Tests are also started via the Makefile
. To run the tests, you will need to prepare the following:
-
Configure the Docker daemon: The tests run containerized, but need access to the local Docker daemon for testing the Docker relay. One way is to mount the
/var/run/docker.socks
socket into the container (theMakefile
takes care of that). However, thedocker
group on the host may not map onto the group of the user inside the testing container. The preferred way is therefore to let the Docker daemon listen on127.0.0.1:2375
. Since the testing container runs with host network, the tests can access this directly. Decide which setup to use and configure the Docker daemon accordingly. Additionally, set it to accept127.0.0.1:5000
as an insecure registry. -
An AWS account to test syncing with ECR: Create a technical user in that account. This user should have full ECR permissions, i.e. the
AmazonEC2ContainerRegistryFullAccess
policy attached, since it will delete the used repo after the tests are done. -
A Google Cloud account to test syncing with GCR and artifact registry: Create a project with the Container Registry and Artifact Registry APIs enabled. In that project, you need a service account with the roles Cloud Build Service Agent, Storage Object Admin, and Artifact Registry Repository Administrator enabled, since this service account also will need to delete the synced images again after the tests.
The details for above requirements are configured via a .makerc
file in the root of this project. Just run make
and check the Notes section in the help output. Here's an example:
# Docker config; to use the Unix socket, set to unix:///var/run/docker.sock
DREGSY_TEST_DOCKERHOST = tcp://127.0.0.1:2375
# ECR
DREGSY_TEST_ECR_REGISTRY = {account ID}.dkr.ecr.eu-central-1.amazonaws.com
DREGSY_TEST_ECR_REPO = dregsy/test
AWS_ACCESS_KEY_ID = {key ID}
AWS_SECRET_ACCESS_KEY = {access key}
# GCP
GCP_CREDENTIALS = {full path to access JSON of service account}
# GCR
DREGSY_TEST_GCR_HOST = eu.gcr.io
DREGSY_TEST_GCR_PROJECT = {your project}
DREGSY_TEST_GCR_IMAGE = dregsy/test
# GAR
DREGSY_TEST_GAR_HOST = europe-west3-docker.pkg.dev
DREGSY_TEST_GAR_PROJECT = {your project}
DREGSY_TEST_GAR_IMAGE = dregsy/test
You can select a particular test case with the -run
argument, which you can pass to go test
via the TEST_OPTS
environment variable. Note that the -run
argument is a regular expression. This for example would run only the Skopeo end-to-end tests, with verbose output and skipping the Ubuntu based dregsy image:
VERBOSE=y TEST_UBUNTU=n TEST_OPTS="-run TestE2ESkopeo.*" make tests