Scraping service: harden security restrictions on instance files #558

Merged · 4 commits · Apr 26, 2021
11 changes: 10 additions & 1 deletion CHANGELOG.md
@@ -3,8 +3,17 @@ cross-compilation issue, but will return in v0.13.0.

# Main (unreleased)

BREAKING CHANGES: For security, the scraping service config API will reject
configs that read credentials from disk, preventing malicious users from
reading arbitrary files and sending their contents over the network. The old
behavior can be restored by enabling `dangerous_allow_reading_files` in the
scraping service config.

- [FEATURE] Added Automatic Logging feature for Tempo (@joe-elliott)

- [FEATURE] Disallow reading files from within scraping service configs by
default. (@rfratto)

- [BUGFIX] Ensure defaults are applied to undefined sections in config file.
This fixes a problem where integrations didn't work if `prometheus:` wasn't
configured. (@rfratto)
@@ -21,7 +30,7 @@ cross-compilation issue, but will return in v0.13.0.
- [BUGFIX] Validate that incoming scraped metrics do not have an empty label
set or a label set with duplicate labels, mirroring the behavior of
Prometheus. (@rfratto)

- [FEATURE] Tail-based sampling for tracing pipelines (@mapno)

# v0.13.1 (2021-04-09)
6 changes: 6 additions & 0 deletions docs/api.md
@@ -108,6 +108,12 @@
defined in the Configuration Reference. The name field of the configuration is
ignored and the name in the URL takes precedence. The request body must be
formatted as YAML.

**WARNING**: By default, all instance configurations that read credentials
from a file on disk will be rejected. This prevents malicious users from
reading arbitrary files as passwords and sending their contents to
attacker-controlled remote_write endpoints. To change this behavior, set
`dangerous_allow_reading_files` to true in the `scraping_service` block.
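For example, a pushed instance configuration like the following would be
rejected by default (an illustrative sketch; the job name and path are made
up, and `password_file` is one of the file-based fields the API checks):

```yaml
scrape_configs:
  - job_name: node
    basic_auth:
      username: agent
      # Rejected unless dangerous_allow_reading_files is enabled:
      password_file: /var/run/secrets/password
```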

Status code: 201 with a new config, 200 on updated config.
Response on success:

26 changes: 18 additions & 8 deletions docs/configuration-reference.md
@@ -243,9 +243,19 @@ agents distribute discovery and scrape load between nodes.
# reshard_interval). A timeout of 0 indicates no timeout.
[reshard_timeout: <duration> | default = "30s"]

# Configuration for the KV store to store configurations.
kvstore: <kvstore_config>

# When set, allows configs pushed to the KV store to specify configuration
# fields that can read secrets from files.
#
# This is disabled by default. When enabled, a malicious user can craft an
# instance config that reads arbitrary files on the machine the Agent runs
# on and sends their contents to a specifically crafted remote_write endpoint.
#
# If enabled, ensure that no untrusted users have access to the Agent API.
[dangerous_allow_reading_files: <boolean>]

# Configuration for how agents will cluster together.
lifecycler: <lifecycler_config>
```
@@ -2045,7 +2055,7 @@ remote_write:
[ sending_queue: <otlpexporter.sending_queue> ]
[ retry_on_failure: <otlpexporter.retry_on_failure> ]

# This processor writes a well-formatted log line to a Loki instance for each span, root, or process
# that passes through the Agent. This allows for automatically building a mechanism for trace
# discovery and building metrics from traces using Loki.
automatic_logging:
@@ -2063,7 +2073,7 @@ remote_write:
[ process_attributes: <string array> ]
# timeout on sending logs to Loki
[ timeout: <duration> | default = 1ms ]
overrides:
[ loki_tag: <string> | default = "tempo" ]
[ service_key: <string> | default = "svc" ]
[ span_name_key: <string> | default = "span" ]
Expand Down Expand Up @@ -2122,7 +2132,7 @@ tail_sampling:
decision_wait: [ <string> | default="5s" ]
# load_balancing configures load balancing of spans across multiple agents.
# It ensures that all spans of a trace are sampled in the same instance.
# It's not necessary when only one agent is receiving traces (e.g. single instance deployments).
load_balancing:
# resolver configures the resolution strategy for the involved backends
# It can be static, with a fixed list of hostnames, or DNS, with a hostname (and port) that will resolve to all IP addresses.
@@ -2141,22 +2151,22 @@ tail_sampling:
exporter:
# Controls whether compression is enabled.
[ compression: <string> | default = "gzip" | supported = "none", "gzip"]

# Controls whether or not TLS is required. See https://godoc.org/google.golang.org/grpc#WithInsecure
[ insecure: <boolean> | default = false ]

# Disable validation of the server certificate. Only used when insecure is set
# to false.
[ insecure_skip_verify: <bool> | default = false ]

# Sets the `Authorization` header on every trace push with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
[ username: <string> ]
[ password: <secret> ]
[ password_file: <string> ]

```

### integrations_config
27 changes: 26 additions & 1 deletion docs/migration-guide.md
@@ -5,7 +5,32 @@ releases and how to migrate to newer versions.

# v0.14.0

## Scraping Service security change

v0.14.0 changes the default behavior of the scraping service config management
API to reject all configuration files that read credentials from a file on disk.
This prevents malicious users from crafting an instance config file that reads
arbitrary files on disk and sends their contents to remote endpoints.

To revert to the old behavior, add `dangerous_allow_reading_files: true` in your
`scraping_service` config.

Example old config:

```yaml
prometheus:
scraping_service:
# ...
```

Example new config:

```yaml
prometheus:
scraping_service:
dangerous_allow_reading_files: true
# ...
```

## SigV4 config change

27 changes: 22 additions & 5 deletions pkg/prom/cluster/cluster.go
@@ -22,9 +22,9 @@ import (
type Cluster struct {
    mut sync.RWMutex

    log            log.Logger
    cfg            Config
    baseValidation ValidationFunc

    //
    // Internally, Cluster glues together four separate pieces of logic.
@@ -55,7 +55,7 @@ func New(
    l = log.With(l, "component", "cluster")

    var (
        c   = &Cluster{log: l, cfg: cfg, baseValidation: validate}
        err error
    )

@@ -74,7 +74,7 @@ func New(
    if err != nil {
        return nil, fmt.Errorf("failed to initialize configstore: %w", err)
    }
    c.storeAPI = configstore.NewAPI(l, c.store, c.storeValidate)
    reg.MustRegister(c.storeAPI)

    c.watcher, err = newConfigWatcher(l, cfg, c.store, im, c.node.Owns, validate)
@@ -87,6 +87,23 @@ func New(
    return c, nil
}

func (c *Cluster) storeValidate(cfg *instance.Config) error {
    c.mut.RLock()
    defer c.mut.RUnlock()

    if err := c.baseValidation(cfg); err != nil {
        return err
    }

    if c.cfg.DangerousAllowReadingFiles {
        return nil
    }

    // If configs aren't allowed to read files from disk, we need to make sure
    // no configs coming in from the API set files for passwords.
    return validateNofiles(cfg)
}
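A minimal sketch (in package `cluster`) of how this gate behaves, assuming
`ValidationFunc` has the signature `func(*instance.Config) error`; a
hypothetical test, not part of this PR:

```go
func TestStoreValidate_AllowsFilesWhenEnabled(t *testing.T) {
	c := &Cluster{
		cfg:            Config{DangerousAllowReadingFiles: true},
		baseValidation: func(*instance.Config) error { return nil },
	}
	// With the escape hatch enabled, validateNofiles is skipped entirely,
	// so even a config using file-based credentials would be accepted.
	if err := c.storeValidate(&instance.Config{}); err != nil {
		t.Fatalf("expected config to be accepted: %v", err)
	}
}
```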

// Reshard implements agentproto.ScrapingServiceServer, and syncs the state of
// configs with the configstore.
func (c *Cluster) Reshard(ctx context.Context, _ *agentproto.ReshardRequest) (*empty.Empty, error) {
2 changes: 2 additions & 0 deletions pkg/prom/cluster/config.go
@@ -21,6 +21,8 @@
type Config struct {
    KVStore    kv.Config             `yaml:"kvstore"`
    Lifecycler ring.LifecyclerConfig `yaml:"lifecycler"`

    DangerousAllowReadingFiles bool `yaml:"dangerous_allow_reading_files"`

    // TODO(rfratto): deprecate scraping_service_client in Agent and replace with this.
    Client client.Config `yaml:"-"`
}
135 changes: 135 additions & 0 deletions pkg/prom/cluster/validation.go
@@ -0,0 +1,135 @@
package cluster

import (
    "fmt"

    "github.com/grafana/agent/pkg/prom/instance"
    "github.com/prometheus/common/config"
    "github.com/prometheus/prometheus/discovery"
    "github.com/prometheus/prometheus/discovery/azure"
    "github.com/prometheus/prometheus/discovery/consul"
    "github.com/prometheus/prometheus/discovery/digitalocean"
    "github.com/prometheus/prometheus/discovery/dns"
    "github.com/prometheus/prometheus/discovery/dockerswarm"
    "github.com/prometheus/prometheus/discovery/ec2"
    "github.com/prometheus/prometheus/discovery/eureka"
    "github.com/prometheus/prometheus/discovery/file"
    "github.com/prometheus/prometheus/discovery/gce"
    "github.com/prometheus/prometheus/discovery/hetzner"
    "github.com/prometheus/prometheus/discovery/kubernetes"
    "github.com/prometheus/prometheus/discovery/marathon"
    "github.com/prometheus/prometheus/discovery/openstack"
    "github.com/prometheus/prometheus/discovery/scaleway"
    "github.com/prometheus/prometheus/discovery/triton"
    "github.com/prometheus/prometheus/discovery/zookeeper"
)

func validateNofiles(c *instance.Config) error {
    for i, rw := range c.RemoteWrite {
        if err := validateHTTPNoFiles(&rw.HTTPClientConfig); err != nil {
            return fmt.Errorf("failed to validate remote_write at index %d: %w", i, err)
        }
    }

    for i, sc := range c.ScrapeConfigs {
        if err := validateHTTPNoFiles(&sc.HTTPClientConfig); err != nil {
            return fmt.Errorf("failed to validate scrape_config at index %d: %w", i, err)
        }

        for j, disc := range sc.ServiceDiscoveryConfigs {
            if err := validateDiscoveryNoFiles(disc); err != nil {
                return fmt.Errorf("failed to validate service discovery at index %d within scrape_config at index %d: %w", j, i, err)
            }
        }
    }

    return nil
}

func validateHTTPNoFiles(cfg *config.HTTPClientConfig) error {
    checks := []struct {
        name  string
        check func() bool
    }{
        {"bearer_token_file", func() bool { return cfg.BearerTokenFile != "" }},
        {"password_file", func() bool { return cfg.BasicAuth != nil && cfg.BasicAuth.PasswordFile != "" }},
        {"credentials_file", func() bool { return cfg.Authorization != nil && cfg.Authorization.CredentialsFile != "" }},
        {"ca_file", func() bool { return cfg.TLSConfig.CAFile != "" }},
        {"cert_file", func() bool { return cfg.TLSConfig.CertFile != "" }},
        {"key_file", func() bool { return cfg.TLSConfig.KeyFile != "" }},
    }
    for _, check := range checks {
        if check.check() {
            return fmt.Errorf("%s must be empty unless dangerous_allow_reading_files is set", check.name)
        }
    }
    return nil
}
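A quick way to exercise these checks (a sketch in the same package;
`HTTPClientConfig` and `BasicAuth` come from the
`github.com/prometheus/common/config` package already imported above):

```go
func TestValidateHTTPNoFiles(t *testing.T) {
	// Inline credentials are fine: no file-based fields are set.
	if err := validateHTTPNoFiles(&config.HTTPClientConfig{}); err != nil {
		t.Fatalf("expected empty client config to pass: %v", err)
	}

	// Any *_file field causes rejection.
	cfg := &config.HTTPClientConfig{
		BasicAuth: &config.BasicAuth{PasswordFile: "/etc/secret"},
	}
	if err := validateHTTPNoFiles(cfg); err == nil {
		t.Fatal("expected password_file to be rejected")
	}
}
```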

func validateDiscoveryNoFiles(disc discovery.Config) error {
> **Review comment** (Contributor): I would consider using reflection, to avoid breakage when new discovery mechanisms are added
    switch d := disc.(type) {
    case discovery.StaticConfig:
        // no-op
    case *azure.SDConfig:
        // no-op
    case *consul.SDConfig:
        if err := validateHTTPNoFiles(&config.HTTPClientConfig{TLSConfig: d.TLSConfig}); err != nil {
            return err
        }
    case *digitalocean.SDConfig:
        if err := validateHTTPNoFiles(&d.HTTPClientConfig); err != nil {
            return err
        }
    case *dns.SDConfig:
        // no-op
    case *dockerswarm.SDConfig:
        if err := validateHTTPNoFiles(&d.HTTPClientConfig); err != nil {
            return err
        }
    case *ec2.SDConfig:
        // no-op
    case *eureka.SDConfig:
        if err := validateHTTPNoFiles(&d.HTTPClientConfig); err != nil {
            return err
        }
    case *file.SDConfig:
        // no-op
    case *gce.SDConfig:
        // no-op
    case *hetzner.SDConfig:
        if err := validateHTTPNoFiles(&d.HTTPClientConfig); err != nil {
            return err
        }
    case *kubernetes.SDConfig:
        if err := validateHTTPNoFiles(&d.HTTPClientConfig); err != nil {
            return err
        }
    case *marathon.SDConfig:
        if err := validateHTTPNoFiles(&d.HTTPClientConfig); err != nil {
            return err
        }
        if d.AuthTokenFile != "" {
            return fmt.Errorf("auth_token_file must be empty unless dangerous_allow_reading_files is set")
        }
    case *openstack.SDConfig:
        if err := validateHTTPNoFiles(&config.HTTPClientConfig{TLSConfig: d.TLSConfig}); err != nil {
            return err
        }
    case *scaleway.SDConfig:
        if err := validateHTTPNoFiles(&d.HTTPClientConfig); err != nil {
            return err
        }
    case *triton.SDConfig:
        if err := validateHTTPNoFiles(&config.HTTPClientConfig{TLSConfig: d.TLSConfig}); err != nil {
            return err
        }
    case *zookeeper.NerveSDConfig:
        // no-op
    case *zookeeper.ServersetSDConfig:
        // no-op
    default:
        return fmt.Errorf("unknown service discovery %s; rejecting config for safety. set dangerous_allow_reading_files to ignore", d.Name())
    }

    return nil
}
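The switch above must grow whenever Prometheus adds a discovery mechanism,
which is what the inline review comment's reflection suggestion would avoid. A
rough sketch of that idea (a hypothetical helper, not what was merged; it
would additionally need the `reflect` and `strings` imports):

```go
// validateNoFileFields walks an arbitrary discovery config with reflection
// and rejects any non-empty exported string field whose name ends in "File"
// (PasswordFile, CAFile, ...). Call as validateNoFileFields(reflect.ValueOf(disc)).
func validateNoFileFields(v reflect.Value) error {
	switch v.Kind() {
	case reflect.Ptr, reflect.Interface:
		if v.IsNil() {
			return nil
		}
		return validateNoFileFields(v.Elem())
	case reflect.Struct:
		for i := 0; i < v.NumField(); i++ {
			field := v.Type().Field(i)
			if field.PkgPath != "" { // skip unexported fields
				continue
			}
			fv := v.Field(i)
			if fv.Kind() == reflect.String && strings.HasSuffix(field.Name, "File") && fv.String() != "" {
				return fmt.Errorf("%s must be empty unless dangerous_allow_reading_files is set", field.Name)
			}
			if err := validateNoFileFields(fv); err != nil {
				return err
			}
		}
	case reflect.Slice, reflect.Array:
		for i := 0; i < v.Len(); i++ {
			if err := validateNoFileFields(v.Index(i)); err != nil {
				return err
			}
		}
	}
	return nil
}
```

There is a failure-mode tradeoff: the merged switch fails closed, rejecting
any discovery mechanism it does not recognize, while a reflection walk relies
on the "File" naming convention and would silently pass a file-reading field
that does not follow it.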