
i/o deadline reached using blocking queries on consul 1.12.1 #13303

Closed
kevinschoonover opened this issue May 31, 2022 · 8 comments · Fixed by #13344
Labels
theme/api Relating to the HTTP API interface theme/consul-nomad Consul & Nomad shared usability theme/kv Issues related to the key value store type/bug Feature does not function as expected

Comments


kevinschoonover commented May 31, 2022

Overview of the Issue

Hey folks! I recently updated my cluster to Consul 1.12.1 and started noticing a lot of i/o deadline reached errors from vault-agent / nomad / consul-template. It seems like consul-template is performing the following request against a Consul client and getting an rpc error making call: i/o deadline reached after ~7 seconds:

> curl "http://localhost:8500/v1/kv/infrastructure/aws/vault_unseal_key_id?index=4594355&stale=&wait=60000ms"
rpc error making call: i/o deadline reached

When I downgrade to 1.12.0, this error goes away:

# curl "http://localhost:8500/v1/kv/infrastructure/aws/vault_unseal_key_id?index=4594355&stale=&wait=60000ms"
[{"LockIndex":0,"Key":"infrastructure/aws/vault_unseal_key_id","Flags":0,"Value":"<redacted>","CreateIndex":4594355,"ModifyIndex":4594355}]

If you perform this same request on the server directly, you also don't get the i/o deadline reached error.

Not sure if this is something I did wrong in my configuration, but I didn't see any related timeout settings.

Reproduction Steps

  1. Create consul server with 1.12.1 and consul client with 1.12.1
  2. Create a key
  3. Attempt to query that key with curl "http://localhost:8500/v1/kv/<key>?index=<latest index>&stale=&wait=60000ms", where localhost:8500 is the client's HTTP address. The same request against the server does not time out.

Consul info for both Client and Server

Client info
root@prod-do-sfo3-s-2vcpu-4gb-amd-vault-server-c970:/etc/consul.d# consul info
agent:
        check_monitors = 0
        check_ttls = 1
        checks = 5
        services = 5
build:
        prerelease =
        revision = e57f76ae
        version = 1.12.1
        version_metadata =
consul:
        acl = enabled
        known_servers = 3
        server = false
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 83
        max_procs = 2
        os = linux
        version = go1.18.1
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 491
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 33109
        members = 14
        query_queue = 0
        query_time = 31
Server info
# consul info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 3
	services = 3
build:
	prerelease =
	revision = e57f76ae
	version = 1.12.1
	version_metadata =
consul:
	acl = enabled
	bootstrap = false
	known_datacenters = 1
	leader = true
	leader_addr = 10.124.0.2:8300
	server = true
raft:
	applied_index = 4596490
	commit_index = 4596490
	fsm_pending = 0
	last_contact = 0
	last_log_index = 4596490
	last_log_term = 784466
	last_snapshot_index = 4594676
	last_snapshot_term = 784466
	latest_configuration = [{Suffrage:Voter ID:360d6788-3b2f-2599-267b-6a24bfa0ccdb Address:10.124.0.7:8300} {Suffrage:Voter ID:a1647a59-66d7-e8a0-4737-ba5c42bbdbbb Address:10.124.0.2:8300} {Suffrage:Voter ID:e863927a-d1c2-0952-dd3f-6ca864ce5aae Address:10.124.0.5:8300}]
	latest_configuration_index = 0
	num_peers = 2
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 784466
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 494
	max_procs = 2
	os = linux
	version = go1.18.1
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 491
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 33105
	members = 14
	query_queue = 0
	query_time = 31
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 515
	members = 3
	query_queue = 0
	query_time = 3

Operating system and Environment details

debian 11

Additional Information

Please let me know if you need any additional information!

@AlexeyDemidov

I had similar errors with nomad and consul-template after updating to consul 1.12.1 on Ubuntu 20.04 LTS.

May 26 12:36:47 app0 consul[3766612]: 2022-05-26T12:36:47.716+0100 [ERROR] agent.http: Request error: method=GET url=/v1/kv/app/admin_email?index=5914981&stale=&wait=60000ms from=127.0.0.1:37118 error="rpc error making call: i/o deadline reached"
May 26 12:36:47 app0 nomad[570]:     2022-05-26T12:36:47.757+0100 [WARN]  agent: (view) kv.block(app/admin_email): Unexpected response code: 500 (rpc error making call: i/o deadline reached) (retry attempt 1 after "250ms")

Running consul kv get app/admin_email on the same app0 host returns the app/admin_email value without an error, but with a few seconds of delay when requested for the first time. Downgrading the consul client on the app hosts to 1.12.0 fixed the errors; the delay is still present on the first request.

@LadyNamedLaura

Same issue here. I noticed that bumping rpc_hold_timeout to a value higher than our wait value seems to fix it, but that is definitely not what rpc_hold_timeout is documented to do.
This seems to have been introduced by #11500.

@Amier3 Amier3 added type/bug Feature does not function as expected theme/api Relating to the HTTP API interface labels May 31, 2022
@jkirschner-hashicorp jkirschner-hashicorp added theme/consul-nomad Consul & Nomad shared usability theme/kv Issues related to the key value store labels May 31, 2022
Contributor

Amier3 commented Jun 1, 2022

Hey @kevinschoonover

Thanks for the report, @SimonPe seems to be right about the PR that introduced this bug. We're actively investigating this and will keep this issue updated with what we find.

In the meantime, I'd suggest anyone who comes to this issue looking for a temporary fix try the workaround posted above, and react with 👍 if it worked for you.

EDIT: The team is cautioning that raising the rpc_hold_timeout value may affect performance, so please be mindful of that when implementing the workaround.

@AlexeyDemidov

Adding the following to consul.hcl didn't help in my case.

performance {
  rpc_hold_timeout = "30s"
}

@gudvardur

This worked for us. rpc_hold_timeout must be set higher than the wait time in the failing request:

> curl "http://localhost:8500/v1/kv/infrastructure/aws/vault_unseal_key_id?index=4594355&stale=&wait=60000ms"
rpc error making call: i/o deadline reached
performance {
  rpc_hold_timeout = "70s"
}

Contributor

Amier3 commented Jun 2, 2022

This issue is fixed by #13344.

@jkirschner-hashicorp
Contributor

This bug affects KV read operations in Consul 1.12.1.

It will be fixed in 1.12.2 (refer to PR #13344).

@jkirschner-hashicorp
Contributor

Consul 1.12.2 has been released with this bugfix: https://discuss.hashicorp.com/t/consul-1-12-2-released/40483

This issue was closed.