Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fatal Error Due to Too Many Open Files on macOS #334

Closed
dbrisinda opened this issue Aug 26, 2020 · 8 comments
Closed

Fatal Error Due to Too Many Open Files on macOS #334

dbrisinda opened this issue Aug 26, 2020 · 8 comments
Labels
bug Something isn't working

Comments

@dbrisinda
Copy link

Describe the bug
Ran the Everest node after doing a git pull into the gecko repo from master, compiled, and ran it. It was running successfully for a while, I successfully set up my validator and staking, then after some time I was getting the following errors:

ERROR[08-25|18:46:21] /nat/nat.go#116: Renewing port mapping from external port 9651 to internal port 9651 failed with goupnp: error performing SOAP HTTP request: Post "http://192.168.1.254:5431/uuid:941b0148-a0f4-4f24-a549-f694da024585/WANIPConnection:1": dial tcp 192.168.1.254:5431: socket: too many open files

I left the node running overnight to see if it would recover, then the next day I get the following fatal error:

ERROR[08-26|01:01:30] /nat/nat.go#116: Renewing port mapping from external port 9651 to internal port 9651 failed with goupnp: error performing SOAP HTTP request: Post "http://192.168.1.254:5431/uuid:941b0148-a0f4-4f24-a549-f694da024585/WANIPConnection:1": dial tcp 192.168.1.254:5431: socket: too many open files
WARN [08-26|01:01:59]

/vms/platformvm/common_blocks.go#277: unable to commit vm's DB: open /Users/dalmazio/.gecko/db/everest/v0.6.1/000166.log: too many open files
FATAL[08-26|01:01:59]

/snow/networking/router/handler.go#254: forcing chain to shutdown due to: open /Users/dalmazio/.gecko/db/everest/v0.6.1/000166.log: too many open files
INFO [08-26|01:01:59]

/snow/engine/snowman/transitive.go#133: shutting down consensus engine
INFO [08-26|01:01:59]

/snow/networking/router/handler.go#468: finished shutting down chain

The error "too many open files" was strange, so I rebooted my mac to clear out file descriptors just in case. After restarting the node, it almost immediately failed fatally. The script below is the log output of the node with the fatal error:

INFO [08-26|13:24:48] /nat/nat.go#94: Mapped Protocol TCP Internal 9651 External 9651.
INFO [08-26|13:24:48] /node/node.go#654: Gecko version is: avalanche/0.6.4
INFO [08-26|13:24:48] /node/node.go#308: Set node's ID to D1ZQrv4asyZL9JqvFCScXjhB63tJhxjcP
INFO [08-26|13:24:48] /node/node.go#391: Initializing API server
INFO [08-26|13:24:48] /node/node.go#495: initializing keystore
INFO [08-26|13:24:48] /node/node.go#506: initializing keystore API
INFO [08-26|13:24:48] /api/server.go#120: adding route /ext/keystore
INFO [08-26|13:24:48] /node/node.go#522: initializing metrics API
INFO [08-26|13:24:48] /api/server.go#120: adding route /ext/metrics
INFO [08-26|13:24:48] /node/node.go#487: initializing SharedMemory
INFO [08-26|13:24:48] /api/server.go#120: adding route /ext/vm/rWhpuQPF1kb72esV2momhMuTYGkEb1oL29pt2EBXWmSy4kxnT
INFO [08-26|13:24:48] /api/server.go#120: adding route /ext/vm/jvYyfQTxGMJLuGWa55kdP2p2zSUYsQ5Raupu4TW34ZAUBAbtq
INFO [08-26|13:24:52] /api/server.go#120: adding route /ext/vm/sqjdyTKUSrQs1YmKDTUbdUhdstSdtRTGRbUn8sqK8B6pkZkz1
INFO [08-26|13:24:52] /api/server.go#120: adding route /ext/vm/sqjchUjzDqDfBPGjfQq2tXW1UCwZTyvzAWHsNzF2cb1eVHt6w
INFO [08-26|13:24:52] /node/node.go#537: skipping admin API initialization because it has been disabled
INFO [08-26|13:24:52] /node/node.go#553: initializing info API
INFO [08-26|13:24:52] /api/server.go#120: adding route /ext/info
INFO [08-26|13:24:52] /node/node.go#568: initializing Health API
INFO [08-26|13:24:52] /api/server.go#120: adding route /ext/health
INFO [08-26|13:24:52] /node/node.go#606: skipping ipc API initializaion because it has been disabled
INFO [08-26|13:24:52] /node/node.go#619: initializing aliases
INFO [08-26|13:24:52] /node/node.go#354: initializing chains
INFO [08-26|13:24:52] /chains/manager.go#231: creating chain:
ID: 11111111111111111111111111111111LpoYY
VMID:rWhpuQPF1kb72esV2momhMuTYGkEb1oL29pt2EBXWmSy4kxnT
INFO [08-26|13:24:52]

/vms/platformvm/vm.go#403: initializing Subnets
INFO [08-26|13:24:52]

/vms/platformvm/vm.go#389: initializing blockchains
INFO [08-26|13:24:52]

/vms/platformvm/vm.go#347: Initializing last accepted block as qP24YE9Qj7cHnk5dv1mdJVcS65jA5b5Q1xrpiqBMT7D63BLK6
INFO [08-26|13:24:52]

/snow/engine/snowman/transitive.go#58: initializing consensus engine
INFO [08-26|13:24:52] /api/server.go#104: adding route /ext/bc/11111111111111111111111111111111LpoYY
INFO [08-26|13:24:52] /api/server.go#53: API server listening on "127.0.0.1:9650"
INFO [08-26|13:24:52] /node/node.go#378: Connected to required bootstrap nodes. Starting Platform Chain...
INFO [08-26|13:24:52]

/snow/engine/common/bootstrapper.go#167: Bootstrapping started syncing with 1 vertices in the accepted frontier
INFO [08-26|13:24:57]

/snow/engine/snowman/bootstrap/bootstrapper.go#244: bootstrapping finished fetching 557 blocks. executing state transitions...
INFO [08-26|13:25:16]

/snow/engine/snowman/bootstrap/bootstrapper.go#285: executed 557 blocks
INFO [08-26|13:25:16]

/snow/engine/snowman/transitive.go#114: bootstrapping finished with GdNZkbPLREPXeLdSUiZLUtAg2FsYt7bs8recTavwNqcE4Ncrj as the last accepted block
INFO [08-26|13:25:16] /chains/manager.go#231: creating chain:
ID: jnUjZSRt16TcRnZzmh5aMhavwVHz3zBrSN8GfFMTQkzUnoBxC
VMID:jvYyfQTxGMJLuGWa55kdP2p2zSUYsQ5Raupu4TW34ZAUBAbtq
INFO [08-26|13:25:16] /snow/engine/avalanche/transitive.go#63: Initializing consensus engine
INFO [08-26|13:25:16] /api/server.go#104: adding route /ext/bc/jnUjZSRt16TcRnZzmh5aMhavwVHz3zBrSN8GfFMTQkzUnoBxC/pubsub
INFO [08-26|13:25:16] /api/server.go#104: adding route /ext/bc/jnUjZSRt16TcRnZzmh5aMhavwVHz3zBrSN8GfFMTQkzUnoBxC

FATAL[08-26|13:25:16] /utils/logging/log.go#250: Panicing due to:
open /Users/dalmazio/.gecko/logs/node/chain/jnUjZSRt16TcRnZzmh5aMhavwVHz3zBrSN8GfFMTQkzUnoBxC/http/0.log: too many open files
From:
goroutine 1499 [running]:
github.com/ava-labs/gecko/utils/logging.Stacktrace.String(0x1034400, 0xc002b59601, 0xb51bbd8)
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/stack.go:20 +0x71
fmt.(*pp).handleMethods(0xc0002ae1a0, 0x73, 0x6400000001)
/usr/local/Cellar/go/1.14.4/libexec/src/fmt/print.go:630 +0x302
fmt.(*pp).printArg(0xc0002ae1a0, 0x1996740, 0xc0003afcd0, 0x73)
/usr/local/Cellar/go/1.14.4/libexec/src/fmt/print.go:713 +0x1e4
fmt.(*pp).doPrintf(0xc0002ae1a0, 0x1aa62c1, 0x1c, 0xc002b59c38, 0x2, 0x2)
/usr/local/Cellar/go/1.14.4/libexec/src/fmt/print.go:1030 +0x15a
fmt.Sprintf(0x1aa62c1, 0x1c, 0xc002b59c38, 0x2, 0x2, 0xc001e66af0, 0x49)
/usr/local/Cellar/go/1.14.4/libexec/src/fmt/print.go:219 +0x66
github.com/ava-labs/gecko/utils/logging.(*Log).format(0xc000306d00, 0x1, 0x1aa62c1, 0x1c, 0xc002b59c38, 0x2, 0x2, 0x203000, 0xc0005a0000)
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/log.go:168 +0x11a
github.com/ava-labs/gecko/utils/logging.(*Log).log(0xc000306d00, 0x1, 0x1aa62c1, 0x1c, 0xc002b59c38, 0x2, 0x2)
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/log.go:139 +0x113
github.com/ava-labs/gecko/utils/logging.(*Log).Fatal(...)
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/log.go:183
github.com/ava-labs/gecko/utils/logging.(*Log).StopOnPanic(0xc000306d00)
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/log.go:250 +0xf4
panic(0x1974700, 0xc003480ff0)
/usr/local/Cellar/go/1.14.4/libexec/src/runtime/panic.go:969 +0x166
github.com/ava-labs/gecko/utils/logging.(*Log).run(0xc000306d00)
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/log.go:58 +0x74a
github.com/ava-labs/gecko/utils/logging.(*Log).RecoverAndPanic(0xc000306d00, 0xc0003ea820)
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/log.go:257 +0x4f
created by github.com/ava-labs/gecko/utils/logging.New
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/log.go:46 +0x18e

panic: open /Users/dalmazio/.gecko/logs/node/chain/jnUjZSRt16TcRnZzmh5aMhavwVHz3zBrSN8GfFMTQkzUnoBxC/http/0.log: too many open files [recovered]
panic: open /Users/dalmazio/.gecko/logs/node/chain/jnUjZSRt16TcRnZzmh5aMhavwVHz3zBrSN8GfFMTQkzUnoBxC/http/0.log: too many open files

goroutine 1499 [running]:
github.com/ava-labs/gecko/utils/logging.(*Log).StopOnPanic(0xc000306d00)
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/log.go:252 +0x11d
panic(0x1974700, 0xc003480ff0)
/usr/local/Cellar/go/1.14.4/libexec/src/runtime/panic.go:969 +0x166
github.com/ava-labs/gecko/utils/logging.(*Log).run(0xc000306d00)
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/log.go:58 +0x74a
github.com/ava-labs/gecko/utils/logging.(*Log).RecoverAndPanic(0xc000306d00, 0xc0003ea820)
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/log.go:257 +0x4f
created by github.com/ava-labs/gecko/utils/logging.New
/Users/dalmazio/go/src/github.com/ava-labs/gecko/utils/logging/log.go:46 +0x18e

To Reproduce
Simply pulled latest Everest, and compiled using ./scripts/build.sh then ran the node.

Expected behavior
Expected that the node would run uninterrupted.

Screenshots
none

Operating System
macOS 10.15.6
go v1.14.4 (via homebrew)

Additional context
none

By submitting this issue I agree to the Terms and Conditions of the Developer Accelerator Program.

@dbrisinda dbrisinda added the bug Something isn't working label Aug 26, 2020
@dbrisinda
Copy link
Author

I renamed the ~/.gecko/db/everest folder to everest-old, and rebuilt avalanche from scratch via ./scripts/build.sh. I get the following log output and fatal error:

INFO [08-26|14:09:56] /nat/nat.go#94: Mapped Protocol TCP Internal 9651 External 9651.
INFO [08-26|14:09:56] /node/node.go#654: Gecko version is: avalanche/0.6.4
INFO [08-26|14:09:56] /node/node.go#308: Set node's ID to D1ZQrv4asyZL9JqvFCScXjhB63tJhxjcP
INFO [08-26|14:09:56] /node/node.go#391: Initializing API server
INFO [08-26|14:09:56] /node/node.go#495: initializing keystore
INFO [08-26|14:09:56] /node/node.go#506: initializing keystore API
INFO [08-26|14:09:56] /api/server.go#120: adding route /ext/keystore
INFO [08-26|14:09:56] /node/node.go#522: initializing metrics API
INFO [08-26|14:09:56] /api/server.go#120: adding route /ext/metrics
INFO [08-26|14:09:56] /node/node.go#487: initializing SharedMemory
INFO [08-26|14:09:56] /api/server.go#120: adding route /ext/vm/rWhpuQPF1kb72esV2momhMuTYGkEb1oL29pt2EBXWmSy4kxnT
INFO [08-26|14:09:56] /api/server.go#120: adding route /ext/vm/jvYyfQTxGMJLuGWa55kdP2p2zSUYsQ5Raupu4TW34ZAUBAbtq
INFO [08-26|14:09:57] /api/server.go#120: adding route /ext/vm/sqjdyTKUSrQs1YmKDTUbdUhdstSdtRTGRbUn8sqK8B6pkZkz1
INFO [08-26|14:09:57] /api/server.go#120: adding route /ext/vm/sqjchUjzDqDfBPGjfQq2tXW1UCwZTyvzAWHsNzF2cb1eVHt6w
INFO [08-26|14:09:57] /node/node.go#537: skipping admin API initialization because it has been disabled
INFO [08-26|14:09:57] /node/node.go#553: initializing info API
INFO [08-26|14:09:57] /api/server.go#120: adding route /ext/info
INFO [08-26|14:09:57] /node/node.go#568: initializing Health API
INFO [08-26|14:09:57] /api/server.go#120: adding route /ext/health
INFO [08-26|14:09:57] /node/node.go#606: skipping ipc API initializaion because it has been disabled
INFO [08-26|14:09:57] /node/node.go#619: initializing aliases
INFO [08-26|14:09:57] /node/node.go#354: initializing chains
INFO [08-26|14:09:57] /chains/manager.go#231: creating chain:
ID: 11111111111111111111111111111111LpoYY
VMID:rWhpuQPF1kb72esV2momhMuTYGkEb1oL29pt2EBXWmSy4kxnT
INFO [08-26|14:09:57]

/vms/platformvm/vm.go#403: initializing Subnets
INFO [08-26|14:09:57]

/vms/platformvm/vm.go#389: initializing blockchains
INFO [08-26|14:09:57]

/vms/platformvm/vm.go#347: Initializing last accepted block as HHAZsK9nCVqMBr9ZT1u4tcJoV4KxUQxbtaGWxZj5Hi2UkrgUF
INFO [08-26|14:09:57]

/snow/engine/snowman/transitive.go#58: initializing consensus engine
INFO [08-26|14:09:57] /api/server.go#104: adding route /ext/bc/11111111111111111111111111111111LpoYY
INFO [08-26|14:09:57] /api/server.go#53: API server listening on "127.0.0.1:9650"
INFO [08-26|14:09:57] /node/node.go#378: Connected to required bootstrap nodes. Starting Platform Chain...
INFO [08-26|14:09:57]

/snow/engine/common/bootstrapper.go#167: Bootstrapping started syncing with 1 vertices in the accepted frontier
INFO [08-26|14:10:00]

/snow/engine/snowman/bootstrap/bootstrapper.go#211: fetched 2500 blocks
INFO [08-26|14:10:01]

/snow/engine/snowman/bootstrap/bootstrapper.go#211: fetched 5000 blocks
INFO [08-26|14:10:01]

/snow/engine/snowman/bootstrap/bootstrapper.go#211: fetched 7500 blocks
INFO [08-26|14:10:03]

/snow/engine/snowman/bootstrap/bootstrapper.go#211: fetched 10000 blocks
INFO [08-26|14:10:03]

/snow/engine/snowman/bootstrap/bootstrapper.go#244: bootstrapping finished fetching 10105 blocks. executing state transitions...
WARN [08-26|14:10:09]

/vms/platformvm/common_blocks.go#277: unable to commit vm's DB: open /Users/dalmazio/.gecko/db/everest/v0.6.1/000004.log: too many open files
FATAL[08-26|14:10:09]

/snow/networking/router/handler.go#254: forcing chain to shutdown due to: failed to accept block in bootstrapping: open /Users/dalmazio/.gecko/db/everest/v0.6.1/000004.log: too many open files
INFO [08-26|14:10:09]

/snow/engine/snowman/transitive.go#133: shutting down consensus engine
INFO [08-26|14:10:09]

/snow/networking/router/handler.go#468: finished shutting down chain
^CINFO [08-26|14:11:07] /node/node.go#726: shutting down the node
INFO [08-26|14:11:07] /node/node.go#732: node shut down successfully
INFO [08-26|14:11:07] /nat/nat.go#140: Unmapped all ports

@dbrisinda
Copy link
Author

Followup: downloaded and ran the latest release build instead of building directly. Exactly the same issue as above.

@dbrisinda
Copy link
Author

Followup: toasted my go/ava-labs installation, and downloaded the source, rebuilt, and run using go 1.14.7 and still getting a fatal error. Here is a portion of the log leading up to the fatal error:

INFO [08-26|18:07:46]

/snow/engine/common/bootstrapper.go#167: Bootstrapping started syncing with 1 vertices in the accepted frontier
INFO [08-26|18:07:48]

/snow/engine/snowman/bootstrap/bootstrapper.go#211: fetched 2500 blocks
INFO [08-26|18:07:49]

/snow/engine/snowman/bootstrap/bootstrapper.go#211: fetched 5000 blocks
INFO [08-26|18:07:50]

/snow/engine/snowman/bootstrap/bootstrapper.go#211: fetched 7500 blocks
INFO [08-26|18:07:51]

/snow/engine/snowman/bootstrap/bootstrapper.go#211: fetched 10000 blocks
INFO [08-26|18:07:51]

/snow/engine/snowman/bootstrap/bootstrapper.go#244: bootstrapping finished fetching 10271 blocks. executing state transitions...
INFO [08-26|18:08:02]

/snow/engine/snowman/bootstrap/bootstrapper.go#279: executed 2500 blocks
INFO [08-26|18:08:25]

/snow/engine/snowman/bootstrap/bootstrapper.go#279: executed 5000 blocks
INFO [08-26|18:09:02]

/snow/engine/snowman/bootstrap/bootstrapper.go#279: executed 7500 blocks
WARN [08-26|18:09:40]

/vms/platformvm/common_blocks.go#277: unable to commit vm's DB: open /Users/dalmazio/.gecko/db/everest/v0.6.1/000089.ldb: too many open files
FATAL[08-26|18:09:40]

/snow/networking/router/handler.go#254: forcing chain to shutdown due to: failed to accept block in bootstrapping: open /Users/dalmazio/.gecko/db/everest/v0.6.1/000089.ldb: too many open files
INFO [08-26|18:09:40]

/snow/engine/snowman/transitive.go#133: shutting down consensus engine
INFO [08-26|18:09:40]

/snow/networking/router/handler.go#468: finished shutting down chain
^CINFO [08-26|18:14:48] /node/node.go#726: shutting down the node
INFO [08-26|18:14:48] /node/node.go#732: node shut down successfully
INFO [08-26|18:14:48] /nat/nat.go#140: Unmapped all ports

@dbrisinda
Copy link
Author

I think I discovered the problem. If I check the maximum open files allowed per user process in macOS with the shell command: launchctl limit maxfiles I get:

maxfiles    256            unlimited      

Soft limit is 256, hard limit is not specified. This with the latest supplementary update of macOS 10.15.6.

Following the procedure to permanently increase the maximum open file descriptors outlined here and here.

After adding the example plist to the /Library/LaunchDameons folder and rebooting, I can now start the avalanche node without issues.

Is there some way for Ava Labs devs to code this into avalanche for macOS? The default limit of 256 maximum file descriptors is way too low, and I don't really understand what Apple is thinking.

@dbrisinda dbrisinda changed the title Fatal Error Running Everest Fatal Error Due to Too Many Open Files Aug 27, 2020
@dbrisinda dbrisinda changed the title Fatal Error Due to Too Many Open Files Fatal Error Due to Too Many Open Files on macOS Aug 27, 2020
@dbrisinda
Copy link
Author

Perhaps this will be helfpul. ConsenSys ethereum go implementation had similar issues and they included fixes for macOS. This introduces proper file handle limits for darwin based OSes (macOS/OSX), and a few other test fixes.

Consensys/quorum#892

@aaronbuchwald
Copy link
Collaborator

Thanks for bringing this up @dbrisinda ! I just created a PR to raise the file descriptor limit to at least a reasonable default value when the node starts up: https://github.com/ava-labs/gecko/pull/346/files

@dbrisinda
Copy link
Author

Awesome!

@tasinco
Copy link
Contributor

tasinco commented Oct 12, 2020

PR merged updated to most recent version of node.

@tasinco tasinco closed this as completed Oct 12, 2020
StephenButtolph added a commit that referenced this issue May 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants