Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API daemon crashing with invalid memory address or nil pointer dereference #582

Closed
lazyfrosch opened this issue Jun 8, 2017 · 10 comments
Closed
Labels

Comments

@lazyfrosch
Copy link

We previously had the issue #431

Please let me know if I can provide any details.

Detailed Description

Aptly crashed twice last week, without any real indication whats going on.

We are rebuilding a lot of snapshots over the night.

Jun  8 02:05:09 aptly aptly[16230]: Unable to reopen database, sleeping 10s
Jun  8 02:05:19 aptly aptly[16230]: Unable to reopen database, sleeping 10s
Jun  8 02:05:29 aptly aptly[16230]: Unable to reopen database, sleeping 10s
Jun  8 02:05:39 aptly aptly[16230]: Unable to reopen database, sleeping 10s
Jun  8 02:05:49 aptly aptly[16230]: Unable to reopen database, sleeping 10s
Jun  8 02:05:59 aptly aptly[16230]: Unable to reopen database, sleeping 10s
...
Jun  8 02:06:09 aptly aptly[16230]: panic: runtime error: invalid memory address or nil pointer dereference
Jun  8 02:06:09 aptly aptly[16230]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x168 pc=0x776545]
Jun  8 02:06:09 aptly aptly[16230]: goroutine 7 [running]:
Jun  8 02:06:09 aptly aptly[16230]: github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).isClosed(0x0, 0x7f70119934b0)
Jun  8 02:06:09 aptly aptly[16230]: /Users/smira/Documents/go/src/github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/leveldb/db_state.go:230 +0x5
Jun  8 02:06:09 aptly aptly[16230]: github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).ok(0x0, 0x7f70119934b0, 0x0)
Jun  8 02:06:09 aptly aptly[16230]: /Users/smira/Documents/go/src/github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/leveldb/db_state.go:235 +0x2b
Jun  8 02:06:09 aptly aptly[16230]: github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).NewIterator(0x0, 0x0, 0x0, 0x0, 0x0)
Jun  8 02:06:09 aptly aptly[16230]: /Users/smira/Documents/go/src/github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/leveldb/db.go:876 +0x47
Jun  8 02:06:09 aptly aptly[16230]: github.com/smira/aptly/database.(*levelDB).FetchByPrefix(0xc4200acae0, 0xc420fde088, 0x1, 0x8, 0x0, 0x0, 0x0)
Jun  8 02:06:09 aptly aptly[16230]: /Users/smira/Documents/go/src/github.com/smira/aptly/database/leveldb.go:172 +0xba
Jun  8 02:06:09 aptly aptly[16230]: github.com/smira/aptly/deb.NewRemoteRepoCollection(0xfdd6c0, 0xc4200acae0, 0xc4201595a8)
Jun  8 02:06:09 aptly aptly[16230]: /Users/smira/Documents/go/src/github.com/smira/aptly/deb/remote.go:657 +0xfa
Jun  8 02:06:09 aptly aptly[16230]: github.com/smira/aptly/deb.(*CollectionFactory).RemoteRepoCollection(0xc42000f9a0, 0x0)
Jun  8 02:06:09 aptly aptly[16230]: /Users/smira/Documents/go/src/github.com/smira/aptly/deb/collections.go:51 +0xa1
Jun  8 02:06:09 aptly aptly[16230]: github.com/smira/aptly/api.flushColections()
Jun  8 02:06:09 aptly aptly[16230]: /Users/smira/Documents/go/src/github.com/smira/aptly/api/api.go:34 +0x3f
Jun  8 02:06:09 aptly aptly[16230]: github.com/smira/aptly/api.acquireDatabase(0xc420011260, 0xc4200112c0)
Jun  8 02:06:09 aptly aptly[16230]: /Users/smira/Documents/go/src/github.com/smira/aptly/api/api.go:94 +0x80
Jun  8 02:06:09 aptly aptly[16230]: created by github.com/smira/aptly/api.Router
Jun  8 02:06:09 aptly aptly[16230]: /Users/smira/Documents/go/src/github.com/smira/aptly/api/router.go:26 +0x196
Jun  8 02:06:09 aptly systemd[1]: aptly-api.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jun  8 02:06:09 aptly systemd[1]: Unit aptly-api.service entered failed state.

The only API accesses at that file were:

10.10.0.178 - apiuser [08/Jun/2017:02:04:29 +0200] "DELETE /api/files/app-openSUSE-42.3-allarch-snapshot HTTP/1.1" 502 5513 "-" "curl/7.37.0"
10.10.0.178 - apiuser [08/Jun/2017:02:06:09 +0200] "POST /api/files/app-openSUSE-42.3-allarch-snapshot HTTP/1.1" 503 5361 "-" "curl/7.37.0"

We are uploading RPM files here, aptly is only used to upload the files obviously.

Looks like the DELETE crashed the daemon. Theoretically the directory mentioned should be there on deletion, but empty.

We use another script to handle the RPM repositories.

Your Environment

  • OS: Debian 8.8
  • Aptly: 1.0.0+95+g0d04189 (nightly installed yesterday)

Aptly API is running behind an Apache proxy, securing access.

@smira
Copy link
Contributor

smira commented Jun 8, 2017

Looks like once again same bug - DB instance is nil, so it wasn't successfully re-opened.

@lazyfrosch
Copy link
Author

@smira is there anything I can provide or test?

@lazyfrosch
Copy link
Author

lazyfrosch commented Jun 9, 2017

Logs suggest he is just dropping the DB connection then at some point.

But this time the daemon didn't crash.

REDACTED - this was caused by a accidential downgrade - my bad 😢

@lazyfrosch
Copy link
Author

Last comment was my error, we downgraded aptly from nightly and got the old error

@smira
Copy link
Contributor

smira commented Jun 9, 2017

@lazyfrosch just to confirm - something else was keeping DB locked (some other aptly instance?), so that API never managed to acquire the lock?

It looks like there was a bug in 1.0.0, which I tried to fix in nightly, but my fix wasn't correct

@lazyfrosch
Copy link
Author

It should not, we are using both, CLI and API.

CLI mainly from Puppet (to check repos) and some maintenance.

But noone of them should lock the DB permanently. From my cron logs, they should not have run at the same time then.

Could the API crash like that when a CLI command is running at the same time?

@smira
Copy link
Contributor

smira commented Jun 9, 2017

It shouldn't be the case, I was looking at:

Jun  8 02:05:09 aptly aptly[16230]: Unable to reopen database, sleeping 10s

Which sounds like opening database was hitting the filesystem lock.

The logic which does API close/re-open is certainly incorrect today, but I'm trying to figure out how exactly API serve got into that state

@lazyfrosch
Copy link
Author

I added some more logging to our background tools, and will monitor it over the next few days.

@lazyfrosch
Copy link
Author

Now I tracked it down so far as that the crashes happen during DB cleanup.

We currently run the cleanup every hour, what would you recommend here? I noticed a lot of DB files accumulating over time. (We are building daily snapshots and lots of packages)

Jun 13 07:00:46 icinga-aptly aptly-cleanup-snapshots: Running DB cleanup...
Jun 13 07:04:37 icinga-aptly aptly[2463]: Unable to reopen database, sleeping 10s
Jun 13 07:04:47 icinga-aptly aptly[2463]: [GIN] 2017/06/13 - 07:04:47 |#033[97;42m 200 #033[0m| 20.007648804s | 127.0.0.1:59173 |#033[9
7;41m  #033[0m DELETE  /api/files/icinga2-SUSE-11.4-allarch-snapshot
Jun 13 07:04:47 icinga-aptly aptly[2463]: panic: runtime error: invalid memory address or nil pointer dereference
Jun 13 07:04:47 icinga-aptly aptly[2463]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x168 pc=0x776545]
Jun 13 07:04:47 icinga-aptly aptly[2463]: goroutine 7 [running]:
Jun 13 07:04:47 icinga-aptly aptly[2463]: github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).isClosed(0x0, 0x7efe5
86844b0)
Jun 13 07:04:47 icinga-aptly aptly[2463]: /Users/smira/Documents/go/src/github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/level
db/db_state.go:230 +0x5
Jun 13 07:04:47 icinga-aptly aptly[2463]: github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).ok(0x0, 0x1500c420392
d90, 0x100)
Jun 13 07:04:47 icinga-aptly aptly[2463]: /Users/smira/Documents/go/src/github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/level
db/db_state.go:235 +0x2b
Jun 13 07:04:47 icinga-aptly aptly[2463]: github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/leveldb.(*DB).NewIterator(0x0, 0x0,
 0x0, 0x0, 0x0)
Jun 13 07:04:47 icinga-aptly aptly[2463]: /Users/smira/Documents/go/src/github.com/smira/aptly/vendor/github.com/syndtr/goleveldb/level
db/db.go:876 +0x47
Jun 13 07:04:47 icinga-aptly aptly[2463]: github.com/smira/aptly/database.(*levelDB).FetchByPrefix(0xc4200aeb40, 0xc420444070, 0x1, 0x8
, 0x0, 0x0, 0x0)
Jun 13 07:04:47 icinga-aptly aptly[2463]: /Users/smira/Documents/go/src/github.com/smira/aptly/database/leveldb.go:172 +0xba
Jun 13 07:04:47 icinga-aptly aptly[2463]: github.com/smira/aptly/deb.NewRemoteRepoCollection(0xfdd6c0, 0xc4200aeb40, 0xc42015b5c8)
Jun 13 07:04:47 icinga-aptly aptly[2463]: /Users/smira/Documents/go/src/github.com/smira/aptly/deb/remote.go:657 +0xfa
Jun 13 07:04:47 icinga-aptly aptly[2463]: github.com/smira/aptly/deb.(*CollectionFactory).RemoteRepoCollection(0xc4201b0000, 0x0)
Jun 13 07:04:47 icinga-aptly aptly[2463]: /Users/smira/Documents/go/src/github.com/smira/aptly/deb/collections.go:51 +0xa1
Jun 13 07:04:47 icinga-aptly aptly[2463]: github.com/smira/aptly/api.flushColections()
Jun 13 07:04:47 icinga-aptly aptly[2463]: /Users/smira/Documents/go/src/github.com/smira/aptly/api/api.go:34 +0x3f
Jun 13 07:04:47 icinga-aptly aptly[2463]: github.com/smira/aptly/api.acquireDatabase(0xc420011260, 0xc4200112c0)
Jun 13 07:04:47 icinga-aptly aptly[2463]: /Users/smira/Documents/go/src/github.com/smira/aptly/api/api.go:94 +0x80
Jun 13 07:04:47 icinga-aptly aptly[2463]: created by github.com/smira/aptly/api.Router
Jun 13 07:04:47 icinga-aptly aptly[2463]: /Users/smira/Documents/go/src/github.com/smira/aptly/api/router.go:26 +0x196
Jun 13 07:04:47 icinga-aptly systemd[1]: aptly-api.service: main process exited, code=exited, status=2/INVALIDARGUMENT

The second crash in that time is a bit different.

Jun 13 07:05:15 icinga-aptly aptly[13046]: panic: (*context.FatalError) (0xaf5b60,0xc420220000)
Jun 13 07:05:15 icinga-aptly aptly[13046]: goroutine 7 [running]:
Jun 13 07:05:15 icinga-aptly aptly[13046]: github.com/smira/aptly/context.Fatal(0xfd1340, 0xc4201c9210)
Jun 13 07:05:15 icinga-aptly aptly[13046]: /Users/smira/Documents/go/src/github.com/smira/aptly/context/context.go:64 +0xcc
Jun 13 07:05:15 icinga-aptly aptly[13046]: github.com/smira/aptly/context.(*AptlyContext).CollectionFactory(0xc4201ab600, 0x0)
Jun 13 07:05:15 icinga-aptly aptly[13046]: /Users/smira/Documents/go/src/github.com/smira/aptly/context/context.go:294 +0x19b
Jun 13 07:05:15 icinga-aptly aptly[13046]: github.com/smira/aptly/api.flushColections()
Jun 13 07:05:15 icinga-aptly aptly[13046]: /Users/smira/Documents/go/src/github.com/smira/aptly/api/api.go:34 +0x31
Jun 13 07:05:15 icinga-aptly aptly[13046]: github.com/smira/aptly/api.acquireDatabase(0xc420011260, 0xc4200112c0)
Jun 13 07:05:15 icinga-aptly aptly[13046]: /Users/smira/Documents/go/src/github.com/smira/aptly/api/api.go:94 +0x80
Jun 13 07:05:15 icinga-aptly aptly[13046]: created by github.com/smira/aptly/api.Router
Jun 13 07:05:15 icinga-aptly aptly[13046]: /Users/smira/Documents/go/src/github.com/smira/aptly/api/router.go:26 +0x196
Jun 13 07:05:15 icinga-aptly systemd[1]: aptly-api.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 13 07:05:37 icinga-aptly aptly-cleanup-snapshots: Cleanup finished...

@smira
Copy link
Contributor

smira commented Jun 13, 2017

Thanks, @lazyfrosch for detailed logs! I feel it's the same issue I was going to fix. I will try to prepare PR today or tomorrow, as I will be out one week till next Wed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants