Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: cdc/poller/rangefeed=false failed #61176

Closed
cockroach-teamcity opened this issue Feb 26, 2021 · 8 comments · Fixed by #61662
Closed

roachtest: cdc/poller/rangefeed=false failed #61176

cockroach-teamcity opened this issue Feb 26, 2021 · 8 comments · Fixed by #61662
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

Comments

@cockroach-teamcity
Copy link
Member

(roachtest).cdc/poller/rangefeed=false failed on master@6601d827b814d4e85a1081b03bf2562d8ac2a4ab:

		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	cluster.go:2688,cdc.go:207,cdc.go:611,test_runner.go:767: monitor failure: unexpected node event: 3: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.cdcBasicTest
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:207
		  | main.registerCDC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:611
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 3: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2721381-1614322964-28-n4cpu16 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		3: dead
		1: 5139
		2: 4856
		Error: UNCLASSIFIED_PROBLEM: 3: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 3: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError

More

Artifacts: /cdc/poller/rangefeed=false

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Feb 26, 2021
@adityamaru
Copy link
Contributor

I see a panic:

I210226 08:11:07.749799 1 util/log/file_sync_buffer.go:238 â‹® [config]   file created at: 2021/02/26 08:11:07
I210226 08:11:07.749826 1 util/log/file_sync_buffer.go:238 ⋮ [config]   running on machine: ‹teamcity-2721381-1614322964-28-n4cpu16-0003›
I210226 08:11:07.749838 1 util/log/file_sync_buffer.go:238 â‹® [config]   binary: CockroachDB CCL v21.1.0-alpha.3-879-g6601d827b8-dirty (x86_64-unknown-linux-gnu, built 2021/02/26 06:46:17, go1.15.6)
I210226 08:11:07.749846 1 util/log/file_sync_buffer.go:238 ⋮ [config]   arguments: ‹[./cockroach start --insecure --store=path=/mnt/data1/cockroach --log-dir=/home/ubuntu/logs --cache=25% --max-sql-memory=25% --port=26257 --http-port=26258 --locality=cloud=gce,region=us-central1,zone=us-central1-b --join=104.154.162.0:26257 --advertise-host=10.128.0.169]›
I210226 08:11:07.749869 1 util/log/file_sync_buffer.go:238 ⋮ [config]   line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓
I210226 08:11:07.749432 1 util/log/flags.go:195  [-] 1  stderr capture started
panic: resolved timestamp regression, was 1614327531.833072282,1, recomputed as 1614327531.833072282,0 [recovered]
	panic: resolved timestamp regression, was 1614327531.833072282,1, recomputed as 1614327531.833072282,0

@adityamaru
Copy link
Contributor

@miretskiy assigning to you as CDC L2.

@cockroach-teamcity
Copy link
Member Author

(roachtest).cdc/poller/rangefeed=false failed on master@6de4313ec216161c79fe725fcc31fc87ef1804ea:

		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	cluster.go:2688,cdc.go:207,cdc.go:611,test_runner.go:767: monitor failure: unexpected node event: 3: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.cdcBasicTest
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:207
		  | main.registerCDC.func3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:611
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 3: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2738824-1614754482-29-n4cpu16 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		3: dead
		2: 4904
		1: 5049
		Error: UNCLASSIFIED_PROBLEM: 3: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 3: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError

More

Artifacts: /cdc/poller/rangefeed=false

See this test on roachdash
powered by pkg/cmd/internal/issues

@miretskiy
Copy link
Contributor

@tbg
Copy link
Member

tbg commented Mar 8, 2021

  • panic: resolved timestamp regression, was 1614758686.122209475,1, recomputed as 1614758686.122209475,0

@miretskiy
Copy link
Contributor

miretskiy commented Mar 8, 2021

@tbg @andreimatei Can you confirm if this is KV? Also, need to confirm if this is a release blocker.

@tbg
Copy link
Member

tbg commented Mar 8, 2021

Confirmed on both

@tbg tbg assigned andreimatei and unassigned miretskiy Mar 8, 2021
@andreimatei
Copy link
Contributor

In panic: resolved timestamp regression, was 1614758686.122209475,1, recomputed as 1614758686.122209475,0, notice that the diff is in the logical parts.
I don't know what's what, but this FloorPrev in the resolved ts code seems suspicious

txnTS := txn.timestamp.FloorPrev()

andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 8, 2021
Before this patch, the resolved timestamp computation code didn't handle
closed timestamps with logical parts properly. I believe we didn't use
to have such closed timestamps under "the old" closed timestamp
mechanism, and we also don't have with the "new one" [*]. Still,
the resolved timestamp should support them.

The trouble was when the resolved timestamp was tracking an intent at,
say, 11.0, with the closed timestamp at 10.2.
resolvedTimestamp.recompute() was doing FloorPrev(11)==10, and compared
that with the closed ts. It was freaking out when the FloorPrev was
going below the closed timestamp.
This patch rewords this logic to handle this case.

[*] Before cockroachdb#61559, the new closed ts code did sometimes close logical
timestamps - when we were trying to close above the lease expiration, we
were doing Backwards(leaseExpiration) (and leases with a logical
expiration are possible, although that seems separately dubious to me).
This case change in cockroachdb#61559, which added physical rounding to the
respective lease expiration.

Fixes cockroachdb#61176 - although I think the bug has gone back to hidden after cockroachdb#61559

Release note: None
Release justification: Sort of bug fix for new functionality. But more
importantly, this affords extra flexibility to 21.1 in dealing with
potential changes in 21.2.
andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 9, 2021
Before this patch, the resolved timestamp computation code didn't handle
closed timestamps with logical parts properly. I believe we didn't use
to have such closed timestamps under "the old" closed timestamp
mechanism, and we also don't have with the "new one" [*]. Still,
the resolved timestamp should support them.

The trouble was when the resolved timestamp was tracking an intent at,
say, 11.0, with the closed timestamp at 10.2.
resolvedTimestamp.recompute() was doing FloorPrev(11)==10, and compared
that with the closed ts. It was freaking out when the FloorPrev was
going below the closed timestamp.
This patch rewords this logic to handle this case.

[*] Before cockroachdb#61559, the new closed ts code did sometimes close logical
timestamps - when we were trying to close above the lease expiration, we
were doing Backwards(leaseExpiration) (and leases with a logical
expiration are possible, although that seems separately dubious to me).
This case change in cockroachdb#61559, which added physical rounding to the
respective lease expiration.

Fixes cockroachdb#61176 - although I think the bug has gone back to hidden after cockroachdb#61559

Release note: None
Release justification: Sort of bug fix for new functionality. But more
importantly, this affords extra flexibility to 21.1 in dealing with
potential changes in 21.2.
andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 11, 2021
Before this patch, the resolved timestamp computation code didn't handle
closed timestamps with logical parts properly. I believe we didn't use
to have such closed timestamps under "the old" closed timestamp
mechanism, and we also don't have with the "new one" [*]. Still,
the resolved timestamp should support them.

The trouble was when the resolved timestamp was tracking an intent at,
say, 11.0, with the closed timestamp at 10.2.
resolvedTimestamp.recompute() was doing FloorPrev(11)==10, and compared
that with the closed ts. It was freaking out when the FloorPrev was
going below the closed timestamp.
This patch rewords this logic to handle this case; we now truncate the
resolved timestamp to a physical timestamp.

[*] Before cockroachdb#61559, the new closed ts code did sometimes close logical
timestamps - when we were trying to close above the lease expiration, we
were doing Backwards(leaseExpiration) (and leases with a logical
expiration are possible, although that seems separately dubious to me).
This case change in cockroachdb#61559, which added physical rounding to the
respective lease expiration.

Fixes cockroachdb#61176 - although I think the bug has gone back to hidden after cockroachdb#61559

Release note: None
Release justification: Sort of bug fix for new functionality. But more
importantly, this affords extra flexibility to 21.1 in dealing with
potential changes in 21.2.
andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 11, 2021
Before this patch, the resolved timestamp computation code didn't handle
closed timestamps with logical parts properly. I believe we didn't use
to have such closed timestamps under "the old" closed timestamp
mechanism, and we also don't have with the "new one" [*]. Still,
the resolved timestamp should support them.

The trouble was when the resolved timestamp was tracking an intent at,
say, 11.0, with the closed timestamp at 10.2.
resolvedTimestamp.recompute() was doing FloorPrev(11)==10, and compared
that with the closed ts. It was freaking out when the FloorPrev was
going below the closed timestamp.
This patch rewords this logic to handle this case; we now truncate the
resolved timestamp to a physical timestamp.

[*] Before cockroachdb#61559, the new closed ts code did sometimes close logical
timestamps - when we were trying to close above the lease expiration, we
were doing Backwards(leaseExpiration) (and leases with a logical
expiration are possible, although that seems separately dubious to me).
This case change in cockroachdb#61559, which added physical rounding to the
respective lease expiration.

Fixes cockroachdb#61176 - although I think the bug has gone back to hidden after cockroachdb#61559

Release note: None
Release justification: Sort of bug fix for new functionality. But more
importantly, this affords extra flexibility to 21.1 in dealing with
potential changes in 21.2.
craig bot pushed a commit that referenced this issue Mar 11, 2021
60946: changefeedccl: Envelope schema name reflects schema config r=[stevendanna] a=HonoreDB

Fixes a bug where the Avro -envelope schema name, as opposed to subject,
 did not honor the schema prefix and full table name options.

Release note (bug fix): Envelope schema in avro registry now honors schema_prefix and full_table_name

61662: rangefeed: handle closed timestamps with logical parts r=andreimatei a=andreimatei

Before this patch, the resolved timestamp computation code didn't handle
closed timestamps with logical parts properly. I believe we didn't use
to have such closed timestamps under "the old" closed timestamp
mechanism, and we also don't have with the "new one" [*]. Still,
the resolved timestamp should support them.

The trouble was when the resolved timestamp was tracking an intent at,
say, 11.0, with the closed timestamp at 10.2.
resolvedTimestamp.recompute() was doing FloorPrev(11)==10, and compared
that with the closed ts. It was freaking out when the FloorPrev was
going below the closed timestamp.
This patch rewords this logic to handle this case.

[*] Before #61559, the new closed ts code did sometimes close logical
timestamps - when we were trying to close above the lease expiration, we
were doing Backwards(leaseExpiration) (and leases with a logical
expiration are possible, although that seems separately dubious to me).
This case change in #61559, which added physical rounding to the
respective lease expiration.

Fixes #61176 - although I think the bug has gone back to hidden after #61559

Release note: None
Release justification: Sort of bug fix for new functionality. But more
importantly, this affords extra flexibility to 21.1 in dealing with
potential changes in 21.2.

61790: colexec: propagate disk full error as expected r=yuzefovich a=yuzefovich

Previously, we would always propagate the errors emitted by the spilling
queues and disk queues as "internal" which resulted in errors being
annotated. However, "disk full" errors are expected to occur, so this
commit cleans that up. Additionally, it plumbs the propagation into the
spilling queue's `Enqueue` method itself to remove some of the
duplicated code.

Fixes: #61769.

Release note: None

Co-authored-by: Aaron Zinger <zinger@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig craig bot closed this as completed in c59b0d5 Mar 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants