roachtest: kv0/encrypt=true/nodes=3 failed #31577

Closed
cockroach-teamcity opened this issue Oct 18, 2018 · 3 comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/310a04983cda8ab8d67cd401814341b9b7f8ce79

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=kv0/encrypt=true/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=972825&tab=buildLog

The test failed on master:
	test.go:606,cluster.go:1098,kv.go:47,cluster.go:1420,errgroup.go:58: /home/agent/work/.go/bin/roachprod run teamcity-972825-kv0-encrypt-true-nodes-3:4 -- ./workload run kv --init --read-percent=0 --splits=1000 --histograms=logs/stats.json --concurrency=192 --duration=10m {pgurl:1-3} returned:
		stderr:
		
		stdout:
		: signal: killed
	test.go:606,cluster.go:1441,kv.go:50,kv.go:71: unexpected node event: 3: dead

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Oct 18, 2018
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Oct 18, 2018
@petermattis petermattis self-assigned this Oct 18, 2018
@petermattis
Collaborator

clock synchronization error: this node is more than 500ms away from at least half of the known nodes (0 of 1 are within the offset)
goroutine 15 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc4204dc700, 0xc4204dc720, 0x4024600, 0x10)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:997 +0xcf
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x4929c20, 0xc400000004, 0x40246e2, 0x10, 0xf9, 0xc420012120, 0x88)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:864 +0x8fe
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x3107160, 0xc4200a1890, 0x4, 0x2, 0x0, 0x0, 0xc420ccfc58, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:85 +0x2e5
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x3107160, 0xc4200a1890, 0x1, 0x4, 0x0, 0x0, 0xc420ccfc58, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatal(0x3107160, 0xc4200a1890, 0xc420ccfc58, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:181 +0x6c
github.com/cockroachdb/cockroach/pkg/server.NewServer.func1()
	/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:249 +0xb0
github.com/cockroachdb/cockroach/pkg/rpc.(*Context).runHeartbeat(0xc420396dc0, 0xc42091a500, 0xc4204f5b80, 0x13, 0xc420a362a0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:790 +0x667
github.com/cockroachdb/cockroach/pkg/rpc.(*Context).GRPCDial.func1.1.1(0x3107160, 0xc42075d8c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:668 +0x74
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc4202ec3d0, 0xc420584000, 0xc42075d890)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 +0xe9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:192 +0xad

@petermattis
Collaborator

@mberhault Is there additional ntp (or chrony) setup we should be doing on AWS?

@mberhault
Contributor

AWS recommends pointing chrony at the Amazon Time Sync Service: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html

In short, it consists of:

$ sudo apt install chrony

Set the following in /etc/chrony/chrony.conf (the path on Ubuntu; some distributions use /etc/chrony.conf):

server 169.254.169.123 prefer iburst

Then restart chrony:

$ sudo /etc/init.d/chrony restart

petermattis added a commit to cockroachdb/roachprod that referenced this issue Oct 19, 2018
rail added a commit to rail/cockroach that referenced this issue Mar 16, 2021
Fixes cockroachdb#62063

In cockroachdb#31577 we switched to `chrony` for AWS, but not for GCE. By default,
GCE clusters based on Ubuntu 16.04 use `ntp`.

This patch installs `chrony` (and automatically removes `ntp`) on GCE
and configures `chrony` to use Google's time server.

Release note: None
craig bot pushed a commit that referenced this issue Mar 22, 2021
62108: roachprod: Install and configure chrony on GCE clusters r=rail a=rail

Fixes #62063

In #31577 we switched to `chrony` for AWS, but not for GCE. By default,
GCE clusters based on Ubuntu 16.04 use `ntp`.

This patch installs `chrony` (and automatically removes `ntp`) on GCE
and configures `chrony` to use Google's time server.

Release note: None

Co-authored-by: Rail Aliiev <rail@iqchoice.com>
rail added a commit to rail/cockroach that referenced this issue Mar 29, 2021
tbg pushed a commit to tbg/cockroach that referenced this issue May 14, 2021