high latencies when running tpc-c 2100 on the same cluster as previous schema change testing #36856
Comments
Here is an extract that narrows this debug.zip down to just the window of the latency spike:
There are a bunch of warnings from this assertion: cockroach/pkg/storage/engine/mvcc.go, lines 2247 to 2250 in 0c83360.
I feel like we've seen this before. The comment states that it should never misfire. Do you remember what our thinking was there, @nvanbenschoten? I'm hoping that this assertion misfires if the committed value is a deletion.
Then there's this:
The rest of the log looks remarkably quiet. You can tell the system is working (lots of goroutines, etc.), but it doesn't seem like the core is creaking.
I was able to reproduce this problem after creating and dropping two indexes and then quickly reducing the GC TTL. I grabbed @ajwerner and we took a look at the cluster. CPU was pegged above 80% on all nodes, and the cluster seemed to be busy doing RocksDB compactions. I reduced the load on the cluster to 1000 active warehouses and the problem immediately went away. See the dip at the end of these graphs, where latency drops dramatically:
@nvanbenschoten, to what extent do you think #36748 informs this issue?
I'm closing this as stale.
I ran a series of schema change tests on this cluster including stopping and starting workload. As I went to conduct another schema change test, I noticed that latencies were climbing (before I made a schema change).
I wonder if this was caused by the cluster entering a bad state after this work or by something else.
At the time of the test I had run
SET CLUSTER SETTING kv.bulk_io_write.addsstable_max_rate=.1;
and was running `roachprod run $CLUSTER:4 "./workload run tpcc --ramp=5m --warehouses=2500 --active-warehouses=2100 --duration=2h --split --scatter {pgurl:1-3}"`
Cockroach Version
v19.1.0-rc.2-40-g0c83360