Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reconciler: Fix flake in reconciler tests due to ID being unset #66

Merged
merged 1 commit into from
Nov 11, 2024

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Nov 7, 2024

The fix to JSON status marshalling had the unintended effect that now the reconciler test suite ran with the ID field being uninitialized, leading to the reconciler overwriting the status of a changed object. This resulted in small fraction of the reconciler test runs failing (e.g. object was marked pending, but instead of being re-reconciled the previous status was inserted).

This did not affect non-test code as status should always be constructed with StatusPending() etc.

Fix this by exporting the ID and filling it in the tests.

Fixes: 132edd6 ("reconciler: Fix Status JSON marshalling")

The fix to JSON status marshalling had the unintended effect that now
the reconciler test suite ran with the ID field being uninitialized,
leading to the reconciler overwriting the status of a changed object.
This resulted in small fraction of the reconciler test runs failing
(e.g. object was marked pending, but instead of being re-reconciled the
previous status was inserted).

This did not affect non-test code as status should always be constructed
with StatusPending() etc.

Fix this by exporting the ID and filling it in the tests.

Fixes: 132edd6 ("reconciler: Fix Status JSON marshalling")

Signed-off-by: Jussi Maki <jussi.maki@isovalent.com>
@joamaki joamaki requested a review from a team as a code owner November 7, 2024 15:29
@joamaki joamaki requested review from bimmlerd and removed request for a team November 7, 2024 15:29
@joamaki
Copy link
Contributor Author

joamaki commented Nov 7, 2024

Failure can be seen in https://github.com/cilium/statedb/actions/runs/11707677739/job/32607608514:

            > db show test-objects
            [stdout]
            ID   Faulty   StatusKind   StatusError
            1    false    Pending      
            2    false    Done         
            3    false    Done         
            > db cmp test-objects step1+3.table
        scripttest.go:230: FAIL: testdata/incremental.txtar:27: db cmp test-objects step1+3.table: table mismatch:
              ID  StatusKind StatusError
            - 1   Error      update fail
            + 1   Done
              2   Done       
              3   Done       

Easy to also hit by stressing go test ./reconciler.

So yeah, I did actually have a purpose for making the Status.UnmarshalJSON fill in the ID which 132edd6 reverted. ("There is no need for filling in the id field when unmarshalling Status as this is completely internal to the reconciler."... oops)

WDYT about solving it this way by just making the ID public (don't see any harm with that)? Or should we allow manipulating reconciler.Status via JSON in a way that doesn't expose this sort of issue?

Copy link

github-actions bot commented Nov 7, 2024

$ make test
go: downloading golang.org/x/sys v0.17.0
go: downloading golang.org/x/tools v0.17.0
go: downloading golang.org/x/exp v0.0.0-20240119083558-1b970713d09a
go: downloading github.com/fsnotify/fsnotify v1.7.0
go: downloading github.com/sagikazarmark/slog-shim v0.1.0
go: downloading github.com/spf13/afero v1.11.0
go: downloading github.com/spf13/cast v1.6.0
go: downloading github.com/subosito/gotenv v1.6.0
go: downloading github.com/hashicorp/hcl v1.0.0
go: downloading gopkg.in/ini.v1 v1.67.0
go: downloading github.com/magiconair/properties v1.8.7
go: downloading github.com/pelletier/go-toml/v2 v2.1.0
go: downloading golang.org/x/text v0.14.0
	github.com/cilium/statedb/reconciler/benchmark		coverage: 0.0% of statements
	github.com/cilium/statedb/reconciler/example		coverage: 0.0% of statements
ok  	github.com/cilium/statedb	26.143s	coverage: 82.9% of statements
ok  	github.com/cilium/statedb/index	0.004s	coverage: 28.7% of statements
ok  	github.com/cilium/statedb/internal	0.012s	coverage: 46.7% of statements
ok  	github.com/cilium/statedb/part	2.516s	coverage: 82.9% of statements
ok  	github.com/cilium/statedb/reconciler	0.270s	coverage: 89.7% of statements
-----
$ make bench
go test ./... -bench . -benchmem -test.run xxx
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkDB_WriteTxn_1-4                    	  449768	      2633 ns/op	    379782 objects/sec	    2816 B/op	      32 allocs/op
BenchmarkDB_WriteTxn_10-4                   	 1231192	       992.5 ns/op	   1007549 objects/sec	     743 B/op	      10 allocs/op
BenchmarkDB_WriteTxn_100-4                  	 1495872	       770.8 ns/op	   1297439 objects/sec	     600 B/op	       7 allocs/op
BenchmarkDB_WriteTxn_1000-4                 	 1479782	       797.7 ns/op	   1253592 objects/sec	     550 B/op	       7 allocs/op
BenchmarkDB_WriteTxn_100_SecondaryIndex-4   	  505599	      2300 ns/op	    434758 objects/sec	    1503 B/op	      37 allocs/op
BenchmarkDB_Modify-4                        	    1314	    922783 ns/op	   1083679 objects/sec	  774488 B/op	    8463 allocs/op
BenchmarkDB_GetInsert-4                     	    1209	   1007198 ns/op	    992855 objects/sec	  760882 B/op	    8465 allocs/op
BenchmarkDB_RandomInsert-4                  	    2424	    502094 ns/op	   1991657 objects/sec	  402249 B/op	    7098 allocs/op
BenchmarkDB_RandomReplace-4                 	     319	   3733807 ns/op	    267823 objects/sec	 2357438 B/op	   48572 allocs/op
BenchmarkDB_SequentialInsert-4              	    1537	    800190 ns/op	   1249704 objects/sec	  552202 B/op	    7290 allocs/op
BenchmarkDB_Changes_Baseline-4              	    1272	    931814 ns/op	   1073177 objects/sec	  553122 B/op	   10252 allocs/op
BenchmarkDB_Changes-4                       	     708	   1703703 ns/op	    586957 objects/sec	  993536 B/op	   14493 allocs/op
BenchmarkDB_RandomLookup-4                  	   21708	     54156 ns/op	  18465244 objects/sec	     160 B/op	       1 allocs/op
BenchmarkDB_SequentialLookup-4              	   27510	     43806 ns/op	  22828116 objects/sec	       0 B/op	       0 allocs/op
BenchmarkDB_Prefix_SecondaryIndex-4         	    7686	    145836 ns/op	   6857011 objects/sec	   93903 B/op	    1044 allocs/op
BenchmarkDB_FullIteration_All-4             	     895	   1318837 ns/op	  75824536 objects/sec	     480 B/op	      12 allocs/op
BenchmarkDB_FullIteration_Get-4             	     224	   5492446 ns/op	  18206859 objects/sec	       0 B/op	       0 allocs/op
BenchmarkDB_PropagationDelay-4              	  501723	      2338 ns/op	        20.00 50th_µs	        24.00 90th_µs	       100.0 99th_µs	    1589 B/op	      24 allocs/op
PASS
ok  	github.com/cilium/statedb	27.810s
PASS
ok  	github.com/cilium/statedb/index	0.004s
PASS
ok  	github.com/cilium/statedb/internal	0.003s
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb/part
cpu: AMD EPYC 7763 64-Core Processor                
Benchmark_Insert_RootOnlyWatch-4    	    8713	    133106 ns/op	   7512784 objects/sec	  104164 B/op	    2041 allocs/op
Benchmark_Insert-4                  	    5851	    182817 ns/op	   5469945 objects/sec	  219051 B/op	    3064 allocs/op
Benchmark_Modify-4                  	    8391	    142065 ns/op	   7039041 objects/sec	  212907 B/op	    1206 allocs/op
Benchmark_GetInsert-4               	    6796	    172197 ns/op	   5807323 objects/sec	  212914 B/op	    1205 allocs/op
Benchmark_Replace-4                 	26791344	        44.63 ns/op	  22404743 objects/sec	       0 B/op	       0 allocs/op
Benchmark_Replace_RootOnlyWatch-4   	27086860	        44.29 ns/op	  22577927 objects/sec	       0 B/op	       0 allocs/op
Benchmark_txn_1-4                   	 3049065	       394.5 ns/op	   2534967 objects/sec	     448 B/op	       7 allocs/op
Benchmark_txn_10-4                  	 7450203	       158.7 ns/op	   6303120 objects/sec	     154 B/op	       2 allocs/op
Benchmark_txn_100-4                 	 8420151	       142.7 ns/op	   7006885 objects/sec	     224 B/op	       2 allocs/op
Benchmark_txn_1000-4                	 7398694	       162.1 ns/op	   6169933 objects/sec	     216 B/op	       2 allocs/op
Benchmark_txn_delete_1-4            	 3170356	       376.4 ns/op	   2657064 objects/sec	     856 B/op	       6 allocs/op
Benchmark_txn_delete_10-4           	 8504860	       141.3 ns/op	   7074719 objects/sec	     132 B/op	       1 allocs/op
Benchmark_txn_delete_100-4          	10389570	       114.6 ns/op	   8727338 objects/sec	      60 B/op	       1 allocs/op
Benchmark_txn_delete_1000-4         	11157631	       106.9 ns/op	   9354735 objects/sec	      26 B/op	       1 allocs/op
Benchmark_Get-4                     	   39403	     30961 ns/op	  32298313 objects/sec	       0 B/op	       0 allocs/op
Benchmark_Iterate-4                 	  175426	      6934 ns/op	 144222874 objects/sec	      80 B/op	       3 allocs/op
Benchmark_Hashmap_Insert-4          	   15933	     75337 ns/op	  13273682 objects/sec	   86550 B/op	      64 allocs/op
Benchmark_Hashmap_Get_Uint64-4      	  153100	      7772 ns/op	 128661371 objects/sec	       0 B/op	       0 allocs/op
Benchmark_Hashmap_Get_Bytes-4       	  146179	      8108 ns/op	 123336034 objects/sec	       0 B/op	       0 allocs/op
Benchmark_Uint64Map_Random-4        	    1399	    851743 ns/op	   1174064 items/sec	 2701793 B/op	    9030 allocs/op
Benchmark_Uint64Map_Sequential-4    	    1476	    792795 ns/op	   1261361 items/sec	 2492407 B/op	    9749 allocs/op
PASS
ok  	github.com/cilium/statedb/part	29.104s
PASS
ok  	github.com/cilium/statedb/reconciler	0.005s
?   	github.com/cilium/statedb/reconciler/benchmark	[no test files]
?   	github.com/cilium/statedb/reconciler/example	[no test files]
go run ./reconciler/benchmark -quiet
1000000 objects reconciled in 2.91 seconds (batch size 1000)
Throughput 343569.76 objects per second
Allocated 6011325 objects, 424763kB bytes, 520384kB bytes still in use

Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, right. Ouch.

@joamaki joamaki merged commit cafbbe7 into main Nov 11, 2024
1 check passed
@joamaki joamaki deleted the pr/joamaki/fix-reconciler-test branch November 11, 2024 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants