-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add prometheus metrics for dockerregistry #12711
Add prometheus metrics for dockerregistry #12711
Conversation
cce6b7d
to
1cc1c07
Compare
@legionus I assume the metrics will only be visible for the cluster admin user, right? @jcantrill FYI (this might be interesting for metrics) |
@mfojtik No. Right now |
@legionus you probably don't want to expose these data to anonymous users. also if we are going to include per-repo metrics (num of pulls), you want to make that info available only for admin (or the user that has access to the repo?). |
Metrics about response time and registry performance should not be exposed to end users. They can and should be restricted to cluster-admins or another privileged ops kind of role. Where do you envision per-repository metrics going, what would you expose (and how), and how do you plan to restrict access? |
@deads2k I know it's ugly, but I don't know other way how to control access without giving any access to other resources. |
cc @mwringe |
The approach established for app metrics is probably what we should follow - something associated with the registry container definition that is a shared secret that can be accessed by the metrics endpoint. |
Prometheus recommends being careful on high cardinality metrics. A good read is here: https://prometheus.io/docs/practices/instrumentation/ Will look more |
Some metrics I would expect to be able to tell end users about:
|
So we agreed we are not going to expose any data to end-users, but this endpoint will be for operator kind of users. The auth mechanism will be shared secret I assume, so think endpoint won't use the auth middleware. |
1cc1c07
to
6a148e1
Compare
var registerMetrics sync.Once | ||
|
||
// Register all metrics. | ||
func Register() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no reason to make this function public nor to use sync.Once: it's called from init()
and should not be called manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually don't want to call things from init() unless we have to. That registers metrics globally and pollutes the namespace. We should only log metrics when we're going to use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton Not a problem. I call Register()
from pkg/cmd/dockerregistry.Execute()
if metrics enabled.
fca31a5
to
967600d
Compare
967600d
to
2cbfe94
Compare
@mfojtik @miminar @smarterclayton I added an additional section in the config.yml because this functionality is not specific to any particular middleware. I added the option to enable metrics collection and the option to specify a shared secret. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with the prometheus. I'll take a look what it is and return back.
Tests are welcome.
context.GetLogger(app).Infof("listening on %v", config.HTTP.Addr) | ||
if err := http.ListenAndServe(config.HTTP.Addr, handler); err != nil { | ||
if extraConfig.Metrics.Enabled { | ||
log.Debugf("configured metrics endpoint at \"/extensions/v2/metrics\"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context.GetLogger(app)
|
||
ctx := context.Background() | ||
ctx, err = configureLogging(ctx, config) | ||
ctx = server.WithConfiguration(ctx, extraConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it need to be set on the context? Can it be passed to RegisterMetricHandler
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved eye-to-eye. If we're going with the top-level openshift
section, this is the probably the only sensible way of passing the configuration to the registry/repository middleware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevertheless, I'd still want to avoid using the configuration from the context wherever possible. Let's use server.AccessControllerParams
to configure the auth handler and pass the other configuration options directly to any other object we initialize here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miminar Please no. Otherwise we have to add that to the repository middleware options as well. It already contains many parameters. I don't want to adв internal options to repository middleware.
Please take a look at my new changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to adв internal options to repository middleware.
@legionus It's fine to use the config object in repository
middleware since we don't have a better way to pass the config there. On the other hand, there's no sense to have auth module differentiate between parameters passed via configuration file, parameters loaded from environment variables and parameters created on the fly. Those are assumptions that may easily become obsolete. Moreover, it makes unit tests hard to follow.
I still think we should pre-process the config and set server.AccessControllerParams
accordingly. And do similar for all the other wrappers but repository
. I won't block the review because of this though.
@dmage What is your opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think passing variables through a context is a code smell, like a having global variables. It's very convenient to write this way (you can get data from anywhere), but it really complicates the code.
But in this case I don't care, I want to try to refactor the repository
struct, after which there can emerge a place to pass the config. But it's too scary to do that without the good test coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep it as is. We can further tune it in a follow-up while fixing #13568. Let's focus on the prometheus here.
return err | ||
} | ||
|
||
type blobWriter struct { | ||
distribution.BlobWriter | ||
|
||
repo *repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we avoiding passing the repository everywhere? This could be just the Named
.
if audit.LoggerExists(ctx) { | ||
audit.GetLogger(ctx).LogResult(err, "BlobStore.Delete") | ||
} | ||
|
||
return err | ||
} | ||
|
||
type blobWriter struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could this be renamed to auditBlobWriter
?
) | ||
|
||
// auditManifestService wraps a distribution.ManifestService to track operation result and | ||
// write it in the audit log. | ||
type auditManifestService struct { | ||
manifests distribution.ManifestService | ||
repo *repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the name please.
}, | ||
} | ||
} | ||
extensionsRouter := app.NewRoute().PathPrefix("/extensions/v2/").Subrouter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you turn this into a constant and use it on all the other places as well?
|
||
"github.com/docker/distribution/registry/handlers" | ||
|
||
gorillahandlers "github.com/gorilla/handlers" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could be squashed.
audit.GetLogger(ctx).Log("BlobStore.Stat") | ||
defer metrics.NewTimer(metrics.RegistryAPIRequests, []string{"blobstore.stat", b.repo.Named().Name()}).Stop() | ||
|
||
if audit.LoggerExists(ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have audit and metric wrappers separated.
@@ -35,3 +35,7 @@ middleware: | |||
blobrepositorycachettl: 10m | |||
storage: | |||
- name: openshift | |||
openshift: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upstream has a configuration section called reporting. This could be probably nested there - similarly to the auth
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved eye to eye. @legionus convinced me that this is a preferred place for our current and future configuration. As long as it's versioned properly. In the future, we'd like to move middleware
section here completely. The openshift
section can co-exist with the middleware
entries for now since they have separate areas of interest. The middleware
section gets out of hand already so I'd prefer to do the configuration refactoring sooner rather than later. I'll open an issue to track this.
err := b.store.Delete(ctx, dgst) | ||
audit.GetLogger(ctx).LogResult(err, "BlobStore.Delete") | ||
|
||
if audit.LoggerExists(ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two many ifs. I think separated metricBlobStore
would really make it more pleasant to the eye.
2cbfe94
to
8809b42
Compare
@miminar Most of your commens addressed. |
} | ||
|
||
// TODO add https scheme | ||
adminRouter := app.NewRoute().PathPrefix("/admin/").Subrouter() | ||
adminRouter := app.NewRoute().PathPrefix(api.AdminPrefix).Subrouter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing the terminating slash. Quoting PathPrefix
's GoDoc:
// Note that it does not treat slashes specially ("/foobar/" will be matched by
// the prefix "/foo") so you may want to use a trailing slash here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miminar Fixed in dockerregistry/server/api
}, | ||
} | ||
} | ||
extensionsRouter := app.NewRoute().PathPrefix(api.ExtensionsPrefix).Subrouter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add terminating slash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miminar Fixed in dockerregistry/server/api
"github.com/docker/distribution/digest" | ||
) | ||
|
||
// BlobStore wraps a distribution.BlobStore to collect statistic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
statistics
|
||
AdminPath = "/blobs/{digest:" + reference.DigestRegexp.String() + "}" | ||
SignaturesPath = "/{name:" + reference.NameRegexp.String() + "}/signatures/{digest:" + reference.DigestRegexp.String() + "}" | ||
MetricsPath = "/metrics" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The leading slashes should be probably moved to the prefixes above.
} | ||
|
||
func (m *metricTimer) Stop() { | ||
m.collector.WithLabelValues(m.labels...).Observe(float64(time.Since(m.startTime) / time.Microsecond)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WithLabelValues
can panic. Could GetMetricWithLabelValues
be used instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miminar and what I gona do in this function if err != nil
?
This function is used as a defer
and I have at this moment no other possibility as to panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about logging the error. If the probability of panicking is close to zero, this is ok.
gorillahandlers "github.com/gorilla/handlers" | ||
) | ||
|
||
func Dispatcher(ctx *handlers.Context, r *http.Request) http.Handler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GoDoc
Secret string `yaml:"secret"` | ||
} | ||
|
||
func Parse(rd io.Reader) (*configuration.Configuration, *Configuration, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Godoc
return nil, nil, err | ||
} | ||
|
||
return dockerConfig, &config.Openshift, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this encounters newer version than supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miminar I'm not sure I understand the question but I will try to answer. The docker developers always return the configuration.Configuration
structure. They can convert old config to this structure, but they can't return something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My last remark to the configuration concerns the version. We should define our own and panic if the configuration file defines any other.
|
||
p := configuration.NewParser("registry", []configuration.VersionedParseInfo{ | ||
{ | ||
Version: configuration.CurrentVersion, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be different from upstream version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miminar No. This is required to parse config file. Without it you will get Unsupport version error. Here, the entire upstream configuration is not important to us at all because we want to parse our own part of config file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miminar But you are right and here we must specify the current version of config file.
|
||
ctx := context.Background() | ||
ctx, err = configureLogging(ctx, config) | ||
ctx = server.WithConfiguration(ctx, extraConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to adв internal options to repository middleware.
@legionus It's fine to use the config object in repository
middleware since we don't have a better way to pass the config there. On the other hand, there's no sense to have auth module differentiate between parameters passed via configuration file, parameters loaded from environment variables and parameters created on the fly. Those are assumptions that may easily become obsolete. Moreover, it makes unit tests hard to follow.
I still think we should pre-process the config and set server.AccessControllerParams
accordingly. And do similar for all the other wrappers but repository
. I won't block the review because of this though.
@dmage What is your opinion?
52e4106
to
d116155
Compare
@smarterclayton since @mfojtik asked me to add this PR to the 3.6, please take a look again. |
0453916
to
998f7c8
Compare
[test] #13644 |
Will try and take a look tomorrow.
…On Wed, Apr 5, 2017 at 2:02 PM, Alexey Gladkov ***@***.***> wrote:
[test]
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12711 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p3ZyU_bCApFIqnbqIqBRQsHKtXUmks5rs4LfgaJpZM4LxbRp>
.
|
@smarterclayton my reason this should be in 3.6 is basically the ops team being able to monitor health of the registry and make scaling decisions. we don't have to make this perfect right now and we can always add more metrics to this as we go. |
@mfojtik @smarterclayton ping |
LGTM Will wait for Clayton to have final word. |
config := ConfigurationFrom(ctx) | ||
if config.Metrics.Enabled { | ||
return config.Metrics.Secret == token | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're already doing auth from the registry, we should use the SAR check instead of a secret. That allows us to then use the same policy the apiservers do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton In the beginning, we discussed this already and came to the decision that we would use the shared secret for metrics endpoint because router doing it in same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with that. If you are already doing auth via the master, we want metrics to use that behavior. Static secret is only for things that aren't integrated with master auth (which is a limited set, but registry is under)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, does the registry support cert auth (is it using the correct filter for that)? If not, I'm ok with static secret.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are already doing auth via the master, we want metrics to use that behavior. Static secret is only for things that aren't integrated with master auth (which is a limited set, but registry is under)
We have already discussed this. Registry metrics are not part of the master API. The metrics belongs to the registry server only. You propose to make some virtual object (registrymetrics
for example) and use SAR to check access to it ?
Actually, does the registry support cert auth (is it using the correct filter for that)? If not, I'm ok with static secret.
The registry does not have a ready solution for it. Additionally, cert authentication will work only if HTTPS enabled. It means that it will be impossible to restrict access to metrics in an insecure registry.
@mfojtik @smarterclayton I propose to leave the possibility of shared secret authentication and add cert authentication later (in 3.7) because we don't have time to add it for 3.6. ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm oking with it move to 3.7, make sure there is an issue tracking it.
Namespace: registryNamespace, | ||
Subsystem: registrySubsystem, | ||
Name: "request_duration_microseconds", | ||
Help: "Request latency summary in microseconds for each operation", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Standard Prometheus convention is for everythjng to be in seconds now. See the Prometheus docs - there's a section that recommends naming and pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton Changed it to seconds. But http_request_duration_microseconds
will remain in place. We have really old version of github.com/prometheus/client_golang/prometheus
in the vendor and InstrumentHandler
uses microseconds as unit, which is deprecated in new version and should be replaced by seconds.
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
998f7c8
to
021cf0c
Compare
I'm ok with shared secret for now. Just to confirm - it can be set via
environment variable using normal Docker registry configuration override
rules?
…On Wed, Apr 12, 2017 at 3:56 PM, Alexey Gladkov ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pkg/dockerregistry/server/auth.go
<#12711 (comment)>:
> @@ -535,3 +545,11 @@ func verifyPruneAccess(ctx context.Context, client client.SubjectAccessReviews)
}
return nil
}
+
+func isMetricsBearerToken(ctx context.Context, token string) bool {
+ config := ConfigurationFrom(ctx)
+ if config.Metrics.Enabled {
+ return config.Metrics.Secret == token
+ }
If you are already doing auth via the master, we want metrics to use that
behavior. Static secret is only for things that aren't integrated with
master auth (which is a limited set, but registry is under)
We have already discussed this
<#12711 (comment)>.
Registry metrics are not part of the master API. The metrics belongs to the
registry server only. You propose to make some virtual object (
registrymetrics for example) and use SAR to check access to it ?
Actually, does the registry support cert auth (is it using the correct
filter for that)? If not, I'm ok with static secret.
The registry does not have a ready solution for it. Additionally, cert
authentication will work only if HTTPS enabled. It means that it will be
impossible to restrict access to metrics in an insecure registry.
@mfojtik <https://github.com/mfojtik> @smarterclayton
<https://github.com/smarterclayton> I propose to leave the possibility of
shared secret authentication and add cert authentication later (in 3.7)
because we don't have time to add it for 3.6. ok?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12711 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p2elg5j-UacRy9nX0fikdWMjKbcgks5rvSx6gaJpZM4LxbRp>
.
|
@smarterclayton Yes. |
@smarterclayton [merge] -ing this as we need this for perf testing. i think it will be easy to add more metrics as we go. |
Installer failed [merge] |
Ansible Task Failed [merge] |
flake #13271 [merge] |
Installer failed [merge] |
[merge] |
AWS EC2 problems [merge] |
[merge] |
[merge]
…On Mon, Apr 24, 2017 at 6:45 AM, OpenShift Bot ***@***.***> wrote:
continuous-integration/openshift-jenkins/merge FAILURE (
https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/456/)
(Base Commit: 53a4d1b
<53a4d1b>
)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12711 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p6hzjkoyoiBSLGJvboZ10TCVD289ks5rzH1DgaJpZM4LxbRp>
.
|
[test] in meanwhile... if we can get the test green, we can pretest-merge this... |
Evaluated for origin test up to 021cf0c |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/903/) (Base Commit: d41996f) |
Evaluated for origin merge up to 021cf0c |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/461/) (Base Commit: 576f2c5) (Image: devenv-rhel7_6174) |
version: 1.0 | ||
metrics: | ||
enabled: false | ||
secret: TopSecretToken |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incredibly unsafe and needs to be fixed :)
We should not check in default secrets - instead this should be empty, and the user MUST set both secret and enabled in order to turn it on. That should be done by oc adm registry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be impossible for a user to accidentally turn this on and have the default secret in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, this is abad practice
@legionus FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More examples:
storage:
azure:
accountname: accountname
accountkey: base64encodedaccountkey
s3:
accesskey: awsaccesskey
secretkey: awssecretkey
swift:
username: username
password: password
oss:
accesskeyid: accesskeyid
accesskeysecret: accesskeysecret
health:
http:
headers:
Authorization: [Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==]
proxy:
username: username
password: password
@smarterclayton @mfojtik This config file is full of plaintext passwords even if we can fix this. We have already discussed the possibility of using secrets for this file and came to the conclusion that if this is done, then the entire configuration will have to be put there.
agree, this is abad practice
Don't tell me. In the beginning, I had not even considered this decision. You asked for a shared secret.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@legionus the problem is the fact that 'secret: TopSecretToken' is hardcoded and looks like nobody will change it by default, so we will end up with tons of clusters with this secret :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same problem :-) how about not having it enabled by default and update oc registry
to allow to set this up?
Example:
This is only one part of implemented metrics. Currently collected metrics about the process state, memstats, the garbage collector metrics.