Panic in Postgres, Mongo Storage #199

Open
tooolbox opened this issue Nov 15, 2019 · 0 comments

tooolbox commented Nov 15, 2019

The following test will produce a panic in the Postgres and Mongo storage methods:

import (
	"testing"
	"time"

	"github.com/ajvb/kala/job"
	"github.com/stretchr/testify/assert"
)

func TestPersistEpsilon(t *testing.T) {
	db := NewDatabase(dsn) // placeholder for connecting to the applicable database
	defer db.Close()
	cache := job.NewMemoryJobCache(db)

	mockJob := job.GetMockRecurringJobWithSchedule(time.Now().Add(time.Second*1), "PT1H")
	mockJob.Epsilon = "PT1H"
	mockJob.Command = "asdf"
	mockJob.Retries = 2

	err := mockJob.Init(cache)
	if assert.NoError(t, err) {
		err := db.Save(mockJob)
		if assert.NoError(t, err) {
			retrieved, err := db.GetAll()
			if assert.NoError(t, err) {
				retrieved[0].Run(cache)
			}
		}
	}
}

The panic will look like:

goroutine 33 [running]:
github.com/ajvb/kala/utils/iso8601.(*Duration).ToDuration(0x0, 0x365d6390000dfcb0)
	/Users/me/git/kala/utils/iso8601/iso8601.go:129 +0x26
github.com/ajvb/kala/job.(*JobRunner).shouldRetry(0xc0000dff38, 0x6aaa84c04)
	/Users/me/git/kala/job/runner.go:180 +0x5d
github.com/ajvb/kala/job.(*JobRunner).Run(0xc0000dff38, 0x1598120, 0xc0000a2fc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/Users/me/git/kala/job/runner.go:68 +0x425
github.com/ajvb/kala/job.(*Job).Run(0xc00024e000, 0x1598120, 0xc0000a2fc0)
	/Users/me/git/kala/job/job.go:434 +0x10e
created by github.com/ajvb/kala/job.(*Job).Init
	/Users/me/git/kala/job/job.go:195 +0x535
FAIL	github.com/ajvb/kala/job/storage/mysql	28.677s

The real-world scenario where this occurs: create a job with a fairly long Epsilon, let it run once and fail, then restart the service. The panic happens on startup, when the cache re-loads the jobs from the database and starts firing them off.

The root cause is that InitDelayDuration() is never called on jobs as they are retrieved from the database. Without that call, any job with an Epsilon set will trigger this panic.
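To make the failure mode concrete, here is a minimal standalone sketch of the dereference the trace points at. The variable name is made up; the only real API referenced is iso8601.Duration.ToDuration, which appears in the trace above:

package main

import "github.com/ajvb/kala/utils/iso8601"

func main() {
	// A job unmarshaled straight from postgres/mongo never has its ISO-8601
	// strings re-parsed, so the pointer backing the epsilon duration stays nil.
	var epsilon *iso8601.Duration

	// This is effectively what shouldRetry() ends up doing, and it panics
	// with a nil pointer dereference, as in the stack trace above.
	epsilon.ToDuration()
}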

From reading the source, these storage drivers call InitDelayDuration() correctly:

  • boltdb
  • consul
  • redis

These do not:

  • mongo
  • postgres
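A possible fix would be to mirror what the boltdb/consul/redis drivers do and re-initialize each job after it is loaded. Rough sketch below; the package name and helper are hypothetical, and the InitDelayDuration(false) call is copied from the other drivers, so treat this as a sketch rather than a drop-in patch:

package storagefix // hypothetical package, for illustration only

import "github.com/ajvb/kala/job"

// initLoadedJobs re-parses the ISO-8601 duration fields on jobs that were
// just unmarshaled from the database, mirroring the InitDelayDuration(false)
// call the boltdb/consul/redis drivers make when loading jobs.
func initLoadedJobs(jobs []*job.Job) error {
	for _, j := range jobs {
		// Without this, fields like the parsed Epsilon duration stay nil
		// and the first retry check panics.
		if err := j.InitDelayDuration(false); err != nil {
			return err
		}
	}
	return nil
}

The mongo and postgres GetAll implementations could call something like this on the slice they return before handing the jobs back to the cache.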