SPIFFE ID is not in the expected format #15668

Closed
ferhatvurucu opened this issue Dec 5, 2022 · 11 comments
Labels
theme/certificates: Related to creating, distributing, and rotating certificates in Consul
theme/consul-nomad: Consul & Nomad shared usability

Comments

@ferhatvurucu

ferhatvurucu commented Dec 5, 2022

Hi,

I am upgrading my Consul servers from 1.13.3 to 1.14.2 and facing an issue with the grpc_tls configuration. The port configuration has been changed as shown below; however, I still see error logs from agent.cache and agent.server.cert-manager. We were already running with gRPC TLS disabled, and we added the "grpc_tls": -1 setting with the new version.

I don't see any errors when I disable the connect feature in the configuration file.

Configuration

"ports": {
    "grpc": 8502,
    "grpc_tls": -1
  }

Journalctl logs

[WARN]  agent.cache: handling error in Cache.Notify: cache-type=connect-ca-leaf error="rpc error making call: SPIFFE ID is not in the expected format: spiffe://xxx.consul/agent/server/dc/dc1" index=0
[ERROR] agent.server.cert-manager: failed to handle cache update event: error="leaf cert watch returned an error: rpc error making call: SPIFFE ID is not in the expected format: spiffe://xxx.consul/agent/server/dc/dc1"

Consul info for Server

build:
	prerelease =
	revision = 0ba7a401
	version = 1.14.2
	version_metadata =
consul:
	acl = disabled
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = x.x.x.x:8300
	server = true

. . .

serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 12
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1059145
	members = 10
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 14555
	members = 4
	query_queue = 0
	query_time = 1

Operating system and Environment details

Ubuntu 22.04

@jkirschner-hashicorp
Contributor

Hi @ferhatvurucu,

I have a few follow-up questions that may help reveal what's happening here:

  • Can you share the server agent configuration with any sensitive details sanitized?
  • Are you using Consul for service mesh? Or just service discovery?
  • Are you using auto-encrypt or auto-config?

@ferhatvurucu
Author

Hi @jkirschner-hashicorp,

Thanks for the quick reply. It's just for service discovery at the moment. There is no auto-encrypt or auto-config. You may find the server agent configuration below.

{
  "advertise_addr": "x.x.x.x",
  "bind_addr": "x.x.x.x",
  "bootstrap_expect": 3,
  "client_addr": "0.0.0.0",
  "datacenter": "dc1",
  "node_name": "xxxx",
  "retry_join": [
    "provider=aws region=eu-west-1 tag_key=ServiceType tag_value=consul-server"
  ],
  "server": true,
  "encrypt": "xxxx",
  "autopilot": {
    "cleanup_dead_servers": true,
    "last_contact_threshold": "200ms",
    "max_trailing_logs": 250,
    "server_stabilization_time": "10s",
    "redundancy_zone_tag": "az",
    "disable_upgrade_migration": false,
    "upgrade_version_tag": ""
  },
  "ports": {
    "grpc": 8502,
    "grpc_tls": -1
  },
  "connect": {
    "enabled": true
  },
  "ui": true
}

@jkirschner-hashicorp
Contributor

My understanding is that you'd only need the grpc ports for:

  1. Consul client agent / dataplane proxy communication with Envoy proxies
  2. Cluster peering communication between Consul server agents in different datacenters

If you're only using Consul for service discovery, (1) shouldn't apply to you. Do you have a multi-datacenter Consul deployment? If so, do you know if it's using WAN federation or cluster peering to connect the multiple datacenters?

It's also possible that the grpc port isn't needed at all. Was there a set of docs / tutorials you followed that suggested you might need that port? I'm wondering if there's a small docs improvement to be made here.
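If it turns out you don't need gRPC at all, I believe setting both gRPC ports to -1 (the value Consul uses to disable a port) would turn them off entirely. An untested sketch, not a recommendation for your specific deployment:

  "ports": {
    "grpc": -1,
    "grpc_tls": -1
  }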

Was your ports config as of 1.13.3 set like the below?

  "ports": {
    "grpc": 8502
  },

@jkirschner-hashicorp
Contributor

Leaving some breadcrumbs for the future based on some initial digging into the code:

When tracking down what generates the SPIFFE ID related error message, I found that it attempts to match SPIFFE IDs against these regexes:

spiffeIDServiceRegexp = regexp.MustCompile(
	`^(?:/ap/([^/]+))?/ns/([^/]+)/dc/([^/]+)/svc/([^/]+)$`)
spiffeIDAgentRegexp = regexp.MustCompile(
	`^(?:/ap/([^/]+))?/agent/client/dc/([^/]+)/id/([^/]+)$`)
spiffeIDServerRegexp = regexp.MustCompile(
	`^/agent/server/dc/([^/]+)$`)
spiffeIDMeshGatewayRegexp = regexp.MustCompile(
	`^(?:/ap/([^/]+))?/gateway/mesh/dc/([^/]+)$`)

Per your error message, the SPIFFE ID being matched against is: xxx.consul/agent/server/dc/dc1.

That SPIFFE ID is closest to the format of spiffeIDServerRegexp, but fails to match because it doesn't start exactly with /agent; it instead has xxx.consul in front.


I have no particular experience with this area of the codebase, so I'm not sure what would cause a SPIFFE ID of xxx.consul/agent/server/dc/dc1 to be generated (and whether that's expected behavior). The above is just what I found digging through the code where that error message seems to be generated.

@jkirschner-hashicorp
Contributor

I've since seen indication that a SPIFFE ID in the form xxx.consul/agent/server/dc/dc1 is normal, so it's probable that my comments above are based on a misreading of the relevant code. I'll still leave the comments there in case they are relevant for future readers / investigation.
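If so, the likely misreading is that those regexes are applied only to the URI path, with the trust domain (the xxx.consul part) carried in the URI host rather than the path. A minimal standalone sketch, not Consul's actual code, illustrating that assumption:

package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// Same pattern as the spiffeIDServerRegexp quoted above.
var serverPathRegexp = regexp.MustCompile(`^/agent/server/dc/([^/]+)$`)

func main() {
	// Hypothetical illustration: parse the SPIFFE ID from the error message
	// and match only its path component against the server regex.
	id, err := url.Parse("spiffe://xxx.consul/agent/server/dc/dc1")
	if err != nil {
		panic(err)
	}
	fmt.Println(id.Host)                               // xxx.consul (the trust domain)
	fmt.Println(serverPathRegexp.MatchString(id.Path)) // true (path is /agent/server/dc/dc1)
}

Run as-is, this prints xxx.consul and true: under that assumption, the path component of the ID in the error message does match the server regex.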

@ferhatvurucu
Author

ferhatvurucu commented Dec 8, 2022

We are not actively using Consul Connect yet, but we plan to in the near future. Even though we disabled gRPC TLS, we still see the error message above. So with these settings, how can I enable Consul Connect while keeping TLS disabled for now?

It seems this is related to hashicorp/nomad#15360

@jkirschner-hashicorp
Contributor

Which Nomad version are you using? Per the Consul 1.14.x upgrade docs:

The changes to Consul service mesh in version 1.14 are incompatible with Nomad 1.4.2 and earlier. If you operate Consul service mesh using Nomad 1.4.2 or earlier, do not upgrade to Consul 1.14 until hashicorp/nomad#15266 is fixed.

@ferhatvurucu
Author

We upgraded to Nomad 1.4.3 first, then to Consul 1.14.2.

@jkirschner-hashicorp
Contributor

jkirschner-hashicorp commented Dec 8, 2022

Were you on Nomad 1.4.3 at the time you reported this issue? Or just upgraded now?

It sounds like the former, but wanted to double-check.

@ferhatvurucu
Author

We were already on Nomad 1.4.3.

@jkirschner-hashicorp added the theme/certificates label on Dec 21, 2022
@jkirschner-hashicorp added the theme/consul-nomad label on Jan 27, 2023
@0xbentang

I had the same error after upgrading; adding this to the agent config seems to fix it:

peering {
  enabled = false
}

Prior to Consul 1.14, cluster peering (like Consul Connect) was disabled by default. A breaking change was made in Consul 1.14:

Cluster Peering is enabled by default. Cluster peering and WAN federation can coexist, so there is no need to disable cluster peering to upgrade existing WAN federated datacenters. To disable cluster peering nonetheless, set peering.enabled to false.
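For agents configured with JSON, like the server config shared earlier in this thread, the equivalent setting should be (an untested sketch of the same idea):

  "peering": {
    "enabled": false
  }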
