This repository has been archived by the owner on Feb 12, 2021. It is now read-only.

Quay Enterprise: add monitoring doc and small update to builder doc #1237

Open
wants to merge 2 commits into
base: master
2 changes: 2 additions & 0 deletions quay-enterprise/build-support.md
@@ -56,6 +56,8 @@ If Quay is setup to use a SSL certificate that is not globally trusted, for exam
docker run --restart on-failure -e SERVER=wss://myquayenterprise -v /path/to/ssl/rootCA.pem:/usr/local/share/ca-certificates/rootCA.pem -v /var/run/docker.sock:/var/run/docker.sock --entrypoint /bin/sh quay.io/coreos/quay-builder:v2.8.0 -c '/usr/sbin/update-ca-certificates && quay-builder'
```

The logs of the quay-builder container can provide insight into build-related issues. It can be useful to configure a log driver to aggregate and archive these logs. This can be done by passing a `--log-driver` flag in the `docker run` command or by configuring a logging driver via the Docker daemon.

### Setup GitHub build (optional)

If your organization plans to have builds be conducted via pushes to GitHub (or GitHub Enterprise), please continue
52 changes: 52 additions & 0 deletions quay-enterprise/monitoring-quay-enterprise.md
@@ -0,0 +1,52 @@
# Monitoring Quay Enterprise

It is essential to identify, measure and evaluate the performance of Quay Enterprise when running in production.

## Endpoints

There are two main web endpoints to monitor in Quay Enterprise:

`/health/endtoend`

`/health/instance` (on every container instance)

Both of these endpoints return JSON that describes the status of the various components of the registry. Monitor the `status_code` parameter: a non-200 value indicates an issue that should be addressed immediately, as it may impact registry functionality. The other services in this JSON report `true` or `false`, with `true` meaning the service is functioning and `false` indicating an issue with the service.
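As a rough illustration of the checks described above, the following Python sketch evaluates an already-parsed health response. The payload shape (`data.services` holding the per-service booleans) and the `evaluate_health` helper are assumptions made for this example; compare them against the JSON your own instance returns before relying on them.

```python
import json
from typing import List


def evaluate_health(payload: dict) -> List[str]:
    """Return a list of problems found in a health payload (empty if healthy)."""
    problems = []

    # A non-200 status_code signals an issue that needs attention.
    status = payload.get("status_code")
    if status != 200:
        problems.append("status_code is {!r}, expected 200".format(status))

    # ASSUMPTION: per-service booleans live under data.services.
    services = payload.get("data", {}).get("services", {})
    for name, healthy in sorted(services.items()):
        if healthy is not True:
            problems.append("service {!r} reports {!r}".format(name, healthy))

    return problems


# Hypothetical response body, for illustration only.
sample = json.loads(
    '{"status_code": 503, "data": {"services": {"auth": false, "database": true}}}'
)
for problem in evaluate_health(sample):
    print(problem)
```

Returning a list of problems rather than a single boolean lets an alert say which service failed, not only that something did.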

### Healthcheck Frequency

Once a minute or so on `/health/instance`

Once every 2 minutes on `/health/endtoend`

### Considerations

Pinging these endpoints opens connections to the various services utilized by the registry. This is essential to ensuring the availability of the registry but may cause issues if checks are improperly configured. For example, an improperly configured healthcheck, such as one with a very small interval, may cause database connection timeouts, exhaust connection limits, or cause storage engines to trigger automatic rate limiting.
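The suggested frequencies and the caution about overly small intervals can be sketched as a small scheduling helper. This is a minimal sketch, not part of Quay Enterprise: the names `CHECK_INTERVALS`, `validate_intervals` and `due_checks`, and the 30-second floor, are hypothetical choices made for this example.

```python
from typing import Dict, List

# Intervals in seconds, taken from the frequencies suggested above.
CHECK_INTERVALS = {"/health/instance": 60.0, "/health/endtoend": 120.0}

# ASSUMPTION: an arbitrary lower bound used to reject aggressive settings.
MIN_INTERVAL_SECONDS = 30.0


def validate_intervals(intervals: Dict[str, float],
                       floor: float = MIN_INTERVAL_SECONDS) -> None:
    """Raise ValueError if any endpoint would be polled more often than the floor."""
    for path, seconds in intervals.items():
        if seconds < floor:
            raise ValueError(
                "interval for {} is {}s, below the {}s floor".format(path, seconds, floor)
            )


def due_checks(now: float, last_run: Dict[str, float],
               intervals: Dict[str, float] = CHECK_INTERVALS) -> List[str]:
    """Return the endpoints whose interval has elapsed since their last check."""
    due = []
    for path, seconds in intervals.items():
        previous = last_run.get(path)
        if previous is None or now - previous >= seconds:
            due.append(path)
    return due


validate_intervals(CHECK_INTERVALS)
last_run = {"/health/instance": 0.0, "/health/endtoend": 0.0}
print(due_checks(60.0, last_run))
```

Keeping the schedule separate from the HTTP call makes it straightforward to confirm that a configuration change does not poll the endpoints more often than intended.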

## Database

Quay Enterprise does not support master-master database setups, so it is essential to monitor the performance and ensure the availability of the database used by the registry. Follow the upstream documentation of the database vendor or use a third-party monitoring tool such as Sysdig, Datadog, or Prometheus.

## Storage

Quay Enterprise supports a variety of storage engines, each with its own guidelines for monitoring. Setting up monitoring of these services is not within the scope of this document.

The JSON returned by the `/health/endtoend` endpoint includes a check of the connection between the registry and the storage engine. A value of `false` indicates that the storage engine is down or inaccessible to the registry.

## Redis

Redis is utilized by Quay Enterprise in a manner that allows toleration of failure. Browse the upstream Redis documentation for more information on this topic.

## Auth

The endpoints described above will return `"auth": false` if there are any issues with authentication services that may prevent users from logging in. This should be monitored closely if the registry is configured to use LDAP or another external auth service.

## Kubernetes Probes

The Quay Enterprise deployment manifest does not include any Liveness or Readiness probes. If probes are added, the advice in this document should be taken into account. Liveness probes should make use of HTTP GET requests on the endpoints described above.