Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

processor: fix error chan buffer full will block the whole server #1309

Merged
merged 5 commits into from
Jan 21, 2021

Conversation

amyangfei
Copy link
Contributor

@amyangfei amyangfei commented Jan 15, 2021

What problem does this PR solve?

Fix the channel block issue found in #1305

What is changed and how it works?

As for channel block issue, there are two solutions: increase the channel buffer size or change all send to error operation to the non-blocking way. This PR chooses the latter one for the following reasons

  • If we create a channel with buffer size N, we can't ensure that the errors that will be sent to the channel are no more than N. The block could still happen.
  • The error channel is used to receive the first happened error and stop the processor, as for the addition errors, add a log is enough, we don't lose any information.
  • The latter way must ensure all error senders send error in a non-blocking way, if not, it is a bug definitely. In fact current code has a mixture of the blocking and non-blocking send, we can unify the style.

Check List

Tests

  • Unit test
  • Integration test

Release note

  • Fix a bug that TiCDC server could hang when processing a new changefeed with invalid sort-engine parameter

@amyangfei amyangfei added the type/bugfix This PR fixes a bug. label Jan 15, 2021
@amyangfei amyangfei added this to the v4.0.11 milestone Jan 15, 2021
@amyangfei
Copy link
Contributor Author

/run-all-tests

@amyangfei amyangfei added status/ptal Could you please take a look? needs-cherry-pick-release-4.0 Should cherry pick this PR to release-4.0 branch. labels Jan 17, 2021
@zier-one
Copy link
Contributor

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Jan 19, 2021
@liuzix
Copy link
Contributor

liuzix commented Jan 19, 2021

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Jan 19, 2021
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Jan 19, 2021
@zier-one
Copy link
Contributor

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Jan 19, 2021
@ti-srebot
Copy link
Contributor

Your auto merge job has been accepted, waiting for:

  • 1247

@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot
Copy link
Contributor

@amyangfei merge failed.

@amyangfei
Copy link
Contributor Author

/merge

@ti-srebot
Copy link
Contributor

Your auto merge job has been accepted, waiting for:

  • 1247

@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot
Copy link
Contributor

@amyangfei merge failed.

@amyangfei
Copy link
Contributor Author

/merge

@ti-srebot
Copy link
Contributor

Your auto merge job has been accepted, waiting for:

  • 1247

@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot
Copy link
Contributor

@amyangfei merge failed.

@amyangfei
Copy link
Contributor Author

/merge

@ti-srebot
Copy link
Contributor

Your auto merge job has been accepted, waiting for:

  • 1247

@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot
Copy link
Contributor

@amyangfei merge failed.

@amyangfei
Copy link
Contributor Author

/merge

@ti-srebot
Copy link
Contributor

Your auto merge job has been accepted, waiting for:

  • 1247

@amyangfei
Copy link
Contributor Author

/merge

@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot
Copy link
Contributor

@amyangfei merge failed.

@amyangfei
Copy link
Contributor Author

amyangfei commented Jan 21, 2021

Fixed a panic introduced in #1247

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x4ab846d]

goroutine 324 [running]:
github.com/pingcap/ticdc/cdc.(*processor).addTable.func3()
	/home/apple/work/code/pingcap/ticdc/cdc/processor.go:886 +0x8d
github.com/pingcap/ticdc/cdc.(*processor).stop(0xc00023ac00, 0x5b02d20, 0xc0004d5cc0, 0x24, 0xc00004fc28)
	/home/apple/work/code/pingcap/ticdc/cdc/processor.go:1112 +0x573
github.com/pingcap/ticdc/cdc.(*Capture).handleTaskEvent(0xc0007cf900, 0x5b02d20, 0xc0004d5cc0, 0xc006b34600, 0x5a9b320, 0xc000549b60)
	/home/apple/work/code/pingcap/ticdc/cdc/capture.go:238 +0x485
github.com/pingcap/ticdc/cdc.(*Capture).Run(0xc0007cf900, 0x5b02d20, 0xc0004d5cc0, 0x0, 0x0)
	/home/apple/work/code/pingcap/ticdc/cdc/capture.go:175 +0x815
github.com/pingcap/ticdc/cdc.(*Server).run.func4(0xc0008a2f50, 0x38564f4f55575570)
	/home/apple/work/code/pingcap/ticdc/cdc/server.go:397 +0xab
golang.org/x/sync/errgroup.(*Group).Go.func1(0xc0006d65a0, 0xc000184420)
	/home/apple/.gvm/pkgsets/go1.15/global/pkg/mod/golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57 +0x95
created by golang.org/x/sync/errgroup.(*Group).Go
	/home/apple/.gvm/pkgsets/go1.15/global/pkg/mod/golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:54 +0x74

@amyangfei
Copy link
Contributor Author

/run-all-tests

1 similar comment
@amyangfei
Copy link
Contributor Author

/run-all-tests

@amyangfei
Copy link
Contributor Author

amyangfei commented Jan 21, 2021

/run-integration-tests

2 similar comments
@amyangfei
Copy link
Contributor Author

/run-integration-tests

@amyangfei
Copy link
Contributor Author

/run-integration-tests

@amyangfei
Copy link
Contributor Author

/merge

@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot ti-srebot merged commit 405c23a into pingcap:master Jan 21, 2021
ti-srebot pushed a commit to ti-srebot/ticdc that referenced this pull request Jan 21, 2021
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-4.0 in PR #1326

@amyangfei amyangfei deleted the refine-err-chan branch January 25, 2021 06:30
@amyangfei amyangfei added the priority/P0 The issue has P0 priority. label Mar 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-cherry-pick-release-4.0 Should cherry pick this PR to release-4.0 branch. priority/P0 The issue has P0 priority. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. status/ptal Could you please take a look? type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants