Atomic Compare-and-Swap with PrevNoExist: the operation failed, but the key was stored #5832

Closed
yorkart opened this issue Jul 1, 2016 · 10 comments

Comments

@yorkart

yorkart commented Jul 1, 2016

etcd: v2.3.6

There is a demo at https://github.com/yorkart/etcd-demo.
When it invokes atomic set and delete in a loop, the following errors appear after some time:

  • set error (but the value of the key has been stored)
    set key error: 105: Key already exists (/demo/a) [1533028]
  • delete error (but the key has indeed been deleted)
    delete key error: 100: Key not found (/demo/a) [1530634]
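
For readers unfamiliar with the two error codes, here is a minimal in-memory sketch of the semantics the demo relies on. It is not the etcd client API: `ToyStore`, `create`, `delete` and `run_loop` are hypothetical names, and only the codes 105 and 100 and their messages are taken from the report. On a store that behaves this way, a strictly sequential create/delete loop should never see either error, which is why the report is surprising.

```python
class StoreError(Exception):
    """Error carrying an etcd-v2-style numeric code (codes taken from the report)."""

    def __init__(self, code, message, key):
        super().__init__(f"{code}: {message} ({key})")
        self.code = code


class ToyStore:
    """Hypothetical in-memory stand-in for a key-value store; not the etcd API."""

    def __init__(self):
        self.data = {}

    def create(self, key, value):
        # Models a set with PrevNoExist: succeed only if the key is absent.
        if key in self.data:
            raise StoreError(105, "Key already exists", key)
        self.data[key] = value

    def delete(self, key):
        # Models a delete: fail if there is nothing to delete.
        if key not in self.data:
            raise StoreError(100, "Key not found", key)
        del self.data[key]


def run_loop(store, key, iterations):
    """Run create-then-delete sequentially and collect any error codes seen."""
    errors = []
    for i in range(iterations):
        for op in (lambda: store.create(key, str(i)), lambda: store.delete(key)):
            try:
                op()
            except StoreError as err:
                errors.append(err.code)
    return errors


if __name__ == "__main__":
    # With no lost responses and no retries, the list of errors stays empty.
    print(run_loop(ToyStore(), "/demo/a", 1000))
```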
@xiang90
Contributor

xiang90 commented Jul 1, 2016

Can you format your code? Also it would be helpful if you can provide the full code block.

@heyitsanthony
Contributor

heyitsanthony commented Jul 1, 2016

@yorkart I tried to reproduce this but no luck. What is opts? Are you sure that Delete isn't failing so that Set sometimes operates on a key that wasn't deleted?

@yorkart
Author

yorkart commented Jul 4, 2016

@xiang90 @heyitsanthony I have edited the issue and submitted the code. After running it a few times, you can see the two kinds of errors above.

@heyitsanthony
Contributor

@yorkart I still can't reproduce this bug after looping a few hundred times with that code. What do you mean by "run a few times"? Do the errors show up immediately when the program starts or does it start giving errors in the middle of a run?

@yorkart
Author

yorkart commented Jul 6, 2016

The errors start in the middle of a run.

In addition, the cluster log frequently shows:

failed to send out heartbeat on time (deadline exceeded for 646.014151ms)
server is likely overloaded

Heartbeat config:

ETCD_HEARTBEAT_INTERVAL=1000
ETCD_ELECTION_TIMEOUT=5000

My guess is that the timeout made the result of the operation inconsistent: the cluster processed the request, but a failure was returned to the client.

@heyitsanthony
Contributor

@yorkart that wouldn't lead to loss of consistency. Are the Set/Delete requests returning time-out errors? etcd shouldn't return success unless the request is committed. Can you please provide the full server log?

@yorkart
Author

yorkart commented Jul 6, 2016

@heyitsanthony There are no time-out errors. As in the demo code, set and delete are executed sequentially.
The only errors are "Key already exists" on set and "Key not found" on delete.

I have pushed the server logs:
etcd-144
etcd-147
etcd-148

@heyitsanthony
Contributor

@yorkart that is a very unhealthy cluster; it's doing a leader election several times a minute. I'll see if I can reproduce under similar conditions. Do you see the same behavior with 3.0?

heyitsanthony pushed a commit to heyitsanthony/etcd that referenced this issue Jul 7, 2016
Old behavior would retry set and delete even if there's an error. This
can lead to the client returning an error for deleting twice, instead
of returning an error for an indeterminate state.

Fixes etcd-io#5832
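
The failure mode described in this commit message can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the etcd client code: `FlakyStore`, `create_with_retry`, `create_once` and `IndeterminateError` are hypothetical names, and the lost response stands in for whatever happens to a request during a leader election. The first attempt is applied but its response is lost; the retry then fails with "Key already exists" even though the key was stored by the same client.

```python
class StoreError(Exception):
    """Error carrying an etcd-v2-style numeric code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


class IndeterminateError(Exception):
    """The request may or may not have been applied."""


class FlakyStore:
    """Hypothetical store that applies a request but can lose the response."""

    def __init__(self):
        self.data = {}
        self.drop_next_response = False

    def create(self, key, value):
        # Models a set with PrevNoExist.
        if key in self.data:
            raise StoreError(105, "Key already exists")
        self.data[key] = value  # the write is applied first...
        if self.drop_next_response:  # ...and only then is the response lost
            self.drop_next_response = False
            raise TimeoutError("response lost")


def create_with_retry(store, key, value, attempts=2):
    """Old behavior per the commit message: retry the request after an error."""
    for attempt in range(attempts):
        try:
            return store.create(key, value)
        except TimeoutError:
            if attempt == attempts - 1:
                raise


def create_once(store, key, value):
    """New behavior: no retry of a non-idempotent request; report uncertainty."""
    try:
        return store.create(key, value)
    except TimeoutError as err:
        raise IndeterminateError(str(err)) from err


if __name__ == "__main__":
    store = FlakyStore()
    store.drop_next_response = True
    try:
        create_with_retry(store, "/demo/a", "v")
    except StoreError as err:
        print("retrying client:", err, "| stored:", store.data)

    store = FlakyStore()
    store.drop_next_response = True
    try:
        create_once(store, "/demo/a", "v")
    except IndeterminateError as err:
        print("non-retrying client:", err, "| stored:", store.data)
```

In both runs the key ends up stored. The difference is what the caller is told: the retrying client reports a misleading conflict, while the non-retrying client reports that the outcome is unknown and leaves it to the caller to read the key back.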
@yorkart
Author

yorkart commented Jul 7, 2016

v3 is OK. Using the same test logic (v3 demo), the only error is a timeout:
rpc error: code = 13 desc = etcdserver: request timed out, possibly due to previous leader failure

At the same time, the cluster logs:

raft.node: 7ab00e9f791aa00a lost leader 17047805852fad33 at term 12678
raft.node: 7ab00e9f791aa00a elected leader 85ed6313fba477e3 at term 12678

In addition, while the demo is running, the cluster keeps printing:

apply entries took too long [164.283544ms for 1 entries]
avoid queries with large range/delete range!

#5871 - I have the same problem when using the v3 client

@heyitsanthony
Contributor

@yorkart your etcd cluster has very high latencies which is why it's triggering leader elections. You'll need to increase the ETCD_ELECTION_TIMEOUT and ETCD_HEARTBEAT_INTERVAL to stop the frequent leader elections. The apply entry warning is hardcoded to 10ms; we'll probably change that to a less aggressive value soon.
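
As a rough starting point for picking larger values, the helper below applies the rule of thumb from the etcd tuning guidance as I understand it: a heartbeat interval around the maximum average round-trip time between members, and an election timeout of at least ten times that. The function name `suggest_timeouts`, the `election_factor` parameter and the example RTT are assumptions for illustration. The rule is based on network latency only, so a server that is overloaded, as the logs above suggest, may need more headroom than it gives.

```python
import math


def suggest_timeouts(max_avg_rtt_ms, election_factor=10):
    """Suggest heartbeat/election settings (in ms) from a measured RTT.

    Hypothetical helper: the 1x RTT heartbeat and 10x election factor are
    rule-of-thumb assumptions, not values taken from this thread.
    """
    if max_avg_rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    heartbeat = math.ceil(max_avg_rtt_ms)
    election = heartbeat * election_factor
    return {
        "ETCD_HEARTBEAT_INTERVAL": heartbeat,
        "ETCD_ELECTION_TIMEOUT": election,
    }


if __name__ == "__main__":
    # 150 ms is an example measurement, not a figure from the issue.
    for name, value in suggest_timeouts(150).items():
        print(f"{name}={value}")
```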

gyuho pushed a commit that referenced this issue Jul 8, 2016
Old behavior would retry set and delete even if there's an error. This
can lead to the client returning an error for deleting twice, instead
of returning an error for an indeterminate state.

Fixes #5832