Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug][fn] Go functions do not reset idleTimer correctly. #20449

Closed
2 tasks done
flowchartsman opened this issue May 31, 2023 · 2 comments · Fixed by #20450 · May be fixed by flowchartsman/pulsar#5
Closed
2 tasks done

[Bug][fn] Go functions do not reset idleTimer correctly. #20449

flowchartsman opened this issue May 31, 2023 · 2 comments · Fixed by #20450 · May be fixed by flowchartsman/pulsar#5
Labels
type/bug The PR fixed a bug or issue reported a bug

Comments

@flowchartsman
Copy link
Contributor

flowchartsman commented May 31, 2023

Search before asking

  • I searched in the issues and found nothing similar.

Version

The code in instance.go which handles incoming messages and idle timeout basically boils down to this:

	idleDuration := getIdleTimeout(time.Millisecond * time.Duration(gi.context.instanceConf.killAfterIdleMs))
	idleTimer := time.NewTimer(idleDuration)
	defer idleTimer.Stop()

CLOSE:
	for {
		idleTimer.Reset(idleDuration)
		select {
		case cm := <-channel:
			// handle message
		case <-idleTimer.C:
			close(channel) // this is not great either, but that's a separate bug ;)
			break CLOSE
		}
	}

However, this does not account for the idle timer firing during work. If it does, then the reset will not have the desired effect, and the idle clause will fire erroneously on a subsequent loop, since there's now a timestamp on the channel.

Minimal reproduce step

Here's a simple repro:

package main

import "fmt"
import "time"

func main() {
	const idleDuration = 250*time.Millisecond
	work := make(chan int,2)
	work <- 1
	work <- 2
	t := time.NewTimer(idleDuration)
	EVENTLOOP:
	for {
		t.Reset(idleDuration)
		fmt.Println("waiting for new work")
		select {
		case w := <- work:
			fmt.Printf("doing work on %d\n", w)
			time.Sleep(500*time.Millisecond)
			fmt.Println("done with work")
		case <- t.C:
			fmt.Println("timed out")
			break EVENTLOOP
		}
	}
}

Run that a few times, and you'll see:

waiting for new work
doing work on 1
done with work
waiting for new work
timed out

This is exactly what would happen in a Go function, only it would then call log.Fatal

What did you expect to see?

n/a

What did you see instead?

n/a

Anything else?

The correct fix is to call Stop() on the timer and, if it returns false, drain the channel before resetting:

CLOSE:
	for {
		select {
		case cm := <-channel:
			// handle message
		case <-idleTimer.C:
			close(channel)
			break CLOSE
		}
		if !idleTimer.Stop() {
			<-idleTimer.C
		}
	}

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@flowchartsman flowchartsman added the type/bug The PR fixed a bug or issue reported a bug label May 31, 2023
flowchartsman added a commit to flowchartsman/pulsar that referenced this issue May 31, 2023
@lhotari
Copy link
Member

lhotari commented Jun 1, 2023

@flowchartsman Thanks for reporting and contributing the fix for this issue.

I recently happened to review some code in the Pulsar Go client and it contains similar code for handling the send timeout on the producer. This is the code: https://github.com/apache/pulsar-client-go/blob/2eb340511dbf0602f7c561a1d86e179098f4662b/pulsar/producer_partition.go#L910-L924
Does this have a similar issue?

@flowchartsman
Copy link
Contributor Author

flowchartsman commented Jun 1, 2023

@flowchartsman Thanks for reporting and contributing the fix for this issue.

I recently happened to review some code in the Pulsar Go client and it contains similar code for handling the send timeout on the producer. This is the code: https://github.com/apache/pulsar-client-go/blob/2eb340511dbf0602f7c561a1d86e179098f4662b/pulsar/producer_partition.go#L910-L924 Does this have a similar issue?

From a quick glance, I'd say this looks fine. The loop is a range over the channel, which guarantees that it will be read before Reset is called, and each use of Reset is immediately followed by a control flow statement that begins the next loop iteration, so none of the timeout is expended before waiting on it again. The timer is local to the function and doesn't appear to be passed anywhere. Finally, on the off chance that the timer fires after the loop is exited, the deferred Stop should ensure that the timer is no longer active, and since C is buffered, it should be GCed, whether or not there's a pending message.

This avoids the main problem we had here where the timer wasn't guaranteed to be read in each loop, meaning it could fire during work and be left until the next iteration.

michaeljmarshall pushed a commit that referenced this issue Jun 1, 2023
Co-authored-by: Andy Walker <andy@andy.dev>

Fix #20449

<!-- or this PR is one task of an issue -->

Master Issue: #20449

### Verifying this change

- [ ] Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage.

### Does this pull request potentially affect one of the following parts:

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

*If the box was checked, please highlight the changes*

- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [X] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: flowchartsman#5
michaeljmarshall pushed a commit that referenced this issue Jun 1, 2023
Co-authored-by: Andy Walker <andy@andy.dev>

Fix #20449

<!-- or this PR is one task of an issue -->

Master Issue: #20449

### Verifying this change

- [ ] Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage.

### Does this pull request potentially affect one of the following parts:

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

*If the box was checked, please highlight the changes*

- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [X] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: flowchartsman#5

(cherry picked from commit e05b890)
michaeljmarshall pushed a commit that referenced this issue Jun 1, 2023
Co-authored-by: Andy Walker <andy@andy.dev>

Fix #20449

<!-- or this PR is one task of an issue -->

Master Issue: #20449

### Verifying this change

- [ ] Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage.

### Does this pull request potentially affect one of the following parts:

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

*If the box was checked, please highlight the changes*

- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [X] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: flowchartsman#5

(cherry picked from commit e05b890)
michaeljmarshall pushed a commit that referenced this issue Jun 1, 2023
Co-authored-by: Andy Walker <andy@andy.dev>

Fix #20449

<!-- or this PR is one task of an issue -->

Master Issue: #20449

### Verifying this change

- [ ] Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage.

### Does this pull request potentially affect one of the following parts:

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

*If the box was checked, please highlight the changes*

- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [X] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: flowchartsman#5

(cherry picked from commit e05b890)
michaeljmarshall pushed a commit to datastax/pulsar that referenced this issue Jun 1, 2023
Co-authored-by: Andy Walker <andy@andy.dev>

Fix apache#20449

<!-- or this PR is one task of an issue -->

Master Issue: apache#20449

### Verifying this change

- [ ] Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage.

### Does this pull request potentially affect one of the following parts:

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

*If the box was checked, please highlight the changes*

- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [X] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: flowchartsman#5

(cherry picked from commit e05b890)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
2 participants