Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete found in fold operation (counterexample Cp8) #180

Open
slfritchie opened this issue Jul 15, 2014 · 4 comments · May be fixed by #181
Open

Delete found in fold operation (counterexample Cp8) #180

slfritchie opened this issue Jul 15, 2014 · 4 comments · May be fixed by #181
Assignees
Labels
Milestone

Comments

@slfritchie
Copy link
Contributor

Here's counterexample Cp8:

Cp8 = [[{set,{var,1},{call,bitcask_pulse,bc_open,[true,{true,{62,59,175},26}]}},{set,{var,2},{call,bitcask_pulse,puts,[{var,1},{2,3},<<188,81,240,255,122,185,227,8,104,196,56,142,250,34,8>>]}},{set,{var,6},{call,bitcask_pulse,bc_close,[{var,1}]}},{set,{var,10},{call,bitcask_pulse,fork,[[{init,{state,undefined,false,false,[]}},{set,{not_var,1},{not_call,bitcask_pulse,bc_open,[false,{true,{282,224,221},100}]}},{set,{not_var,2},{not_call,bitcask_pulse,fold,[{not_var,1}]}},{set,{not_var,3},{not_call,bitcask_pulse,fold,[{not_var,1}]}}]]}},{set,{var,12},{call,bitcask_pulse,bc_open,[true,{true,{270,229,202},55}]}},{set,{var,18},{call,bitcask_pulse,delete,[{var,12},2]}},{set,{var,19},{call,bitcask_pulse,merge,[{var,12}]}},{set,{var,20},{call,bitcask_pulse,fork_merge,[{var,12}]}},{set,{var,23},{call,bitcask_pulse,puts,[{var,12},{3,43},<<0,0,0,0>>]}},{set,{var,25},{call,bitcask_pulse,fold,[{var,12}]}},{set,{var,26},{call,bitcask_pulse,bc_close,[{var,12}]}},{set,{var,29},{call,bitcask_pulse,bc_open,[true,{true,{246,190,50},100}]}},{set,{var,35},{call,bitcask_pulse,delete,[{var,29},5]}},{set,{var,36},{call,bitcask_pulse,fold,[{var,29}]}}],{43245,66780,7696},[{events,[]}]].
[begin Now = {1405,406317,535184}, io:format("Now ~w ", [Now]), true = eqc:check(bitcask_pulse:prop_pulse(), [lists:nth(1,Cp8), Now, lists:nth(3,Cp8)]) end || _ <- lists:seq(1,100)].

That seed of {1405,406317,535184} is nice & deterministic on my Mac. YMMV, substitute now() in its place and run until you find something that fails very consistently.

The last three ops of the main thread are:

  {set,{var,29},
       {call,bitcask_pulse,bc_open,[true,{true,{246,190,50},100}]}},
  {set,{var,35},{call,bitcask_pulse,delete,[{var,29},5]}},
  {set,{var,36},{call,bitcask_pulse,fold,[{var,29}]}}],

The failure is key 5 should not be found by step 36's fold, but it is.

Bad:
[{0,410681,[]},
 {410681,infinity,
  [{bad,<<"pid_1">>,
        {fold,[{2,[not_found]},
               {3,[<<0,0,0,0>>]},
               {4,[<<0,0,0,0>>]},
               {5,[not_found]},
[...]
               {43,[<<0,0,0,0>>]}],
              [{3,<<0,0,0,0>>},
               {4,<<0,0,0,0>>},
               {5,<<0,0,0,0>>},
[...]

More research required. There's a possible race with a forked merge and/or with merge at open time via bitcask:make_merge_file().

@slfritchie
Copy link
Contributor Author

This isn't a merge race, it's a model problem checking with folds that truly freeze the keydir.

  1. Fork a fold that runs for a while.
  2. Do a lot of puts so that the keydir is frozen at epoch 47.
  3. Do some more puts while folding a couple more times. The epoch counter creeps up to 50.
  4. Do a delete. The thing that we're deleting is inside file 5. The keydir_remove deletes the key in the keydir at epoch 50.
  5. A new fold starts (by the same pid that did the delete in previous step, FWIW), bitcask:open_fold_files() starts executing. The keydir epoch is now 51, but because we're really frozen, the keydir's pending_start_epoch 47.
  6. After all fold files are open, yes indeed, we're using a folding epoch of 47.
  7. When folding, and we're folding file 5, we query the keydir for our key, and because the delete at epoch 51 isn't visible, the keydir says that file 5's entry is current. Thus the key appears in the fold results by pid P when P had deleted that key just moments earlier.

@engelsanchez
Copy link
Contributor

Good to hear it's a test model issue. I'm actually surprised we haven't seen more of these now that the keydir can freeze and cap the visibility of folds at hard to predict times without the EQC model accounting for it. shrug

@slfritchie
Copy link
Contributor Author

I used to have such a check and would disregard results that happened during a known-frozen test case. sigh I haven't decided if I want to resurrect it or not. The model already knows that multiple folds are happening, which is a prerequisite for a frozen keydir.

@slfritchie slfritchie self-assigned this Jul 16, 2014
@slfritchie slfritchie added the Bug label Jul 16, 2014
@slfritchie slfritchie added this to the 2.0 milestone Jul 16, 2014
@slfritchie
Copy link
Contributor Author

Ping ... milestone and code review are still pending.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants