Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concurrent map iteration and map write in debug logging #1599

Closed
Tracked by #1514
mrd0ll4r opened this issue Jun 15, 2022 · 7 comments · Fixed by #1629
Closed
Tracked by #1514

concurrent map iteration and map write in debug logging #1599

mrd0ll4r opened this issue Jun 15, 2022 · 7 comments · Fixed by #1629
Assignees
Labels
exp/beginner Can be confidently tackled by newcomers good first issue Good issue for new contributors kind/bug A bug in existing code (including security flaws)

Comments

@mrd0ll4r
Copy link
Contributor

mrd0ll4r commented Jun 15, 2022

go-ipfs v0.13.0, go-libp2p-swarm@v0.10.2
stock go-ipfs v0.13.0

Trying go-ipfs v0.13.0 currently. Compiled from v0.13.0 sources with Go 1.18.3 on Linux x86_64. Running on Linux x86_64.
Since the repository was archived/moved, I'm opening the issue here.

(Possibly) relevant changes to the stock IPFS config:

...
  "Swarm": {
    "AddrFilters": [
    ...
    ],
    "ConnMgr": {
      "GracePeriod": "0s",
      "HighWater": 0,
      "LowWater": 0,
      "Type": "none"
    },
    "EnableHolePunching": false,
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": true,
    "RelayClient": {},
    "RelayService": {
      "Enabled": false
    },
    "ResourceMgr": {
      "Enabled": false
    },
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
...

Invocation:

LIBP2P_SWARM_FD_LIMIT=100000 IPFS_LOGGING="debug" ipfs-v0.13.0 daemon
fatal error: concurrent map iteration and map write

goroutine 335 [running]:
runtime.throw({0x24627dc?, 0xc000078800?})
        /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc0057f2e40 sp=0xc0057f2e10 pc=0xce1d71
runtime.mapiternext(0xc003c4d698?)
        /usr/local/go/src/runtime/map.go:871 +0x4eb fp=0xc0057f2eb0 sp=0xc0057f2e40 pc=0xcb964b
reflect.mapiternext(0xd6cbaf?)
        /usr/local/go/src/runtime/map.go:1378 +0x19 fp=0xc0057f2ec8 sp=0xc0057f2eb0 pc=0xd0eb19
reflect.(*MapIter).Next(0xc003c4d680?)
        /usr/local/go/src/reflect/value.go:1787 +0x7e fp=0xc0057f2ef8 sp=0xc0057f2ec8 pc=0xd6cdde
internal/fmtsort.Sort({0x201c8a0?, 0xc0000c7578?, 0xc0000c7574?})
        /usr/local/go/src/internal/fmtsort/sort.go:63 +0x159 fp=0xc0057f2fb8 sp=0xc0057f2ef8 pc=0xd7fdb9
fmt.(*pp).printValue(0xc0000da1a0, {0x201c8a0?, 0xc0000c7578?, 0x57eeea4b8eef9?}, 0x76, 0x2)
        /usr/local/go/src/fmt/print.go:769 +0x445 fp=0xc0057f31a0 sp=0xc0057f2fb8 pc=0xda36c5
fmt.(*pp).printValue(0xc0000da1a0, {0x224f040?, 0xc0000c7500?, 0x6f3bcf7217db0?}, 0x76, 0x1)
        /usr/local/go/src/fmt/print.go:806 +0x124e fp=0xc0057f3388 sp=0xc0057f31a0 pc=0xda44ce
fmt.(*pp).printValue(0xc0000da1a0, {0x21d9340?, 0xc0036f4d08?, 0x0?}, 0x76, 0x0)
        /usr/local/go/src/fmt/print.go:876 +0xce5 fp=0xc0057f3570 sp=0xc0057f3388 pc=0xda3f65
fmt.(*pp).badVerb(0xc0000da1a0, 0x36f4d08?)
        /usr/local/go/src/fmt/print.go:343 +0x457 fp=0xc0057f3618 sp=0xc0057f3570 pc=0xda0a17
fmt.(*pp).fmtPointer(0xc0000da1a0, {0x21d9340?, 0xc0036f4d08?, 0x2677e3206c253?}, 0x73)
        /usr/local/go/src/fmt/print.go:532 +0x418 fp=0xc0057f36b8 sp=0xc0057f3618 pc=0xda1cd8
fmt.(*pp).printValue(0xc0000da1a0, {0x21d9340?, 0xc0036f4d08?, 0xc005e4fd30?}, 0x73, 0x2)
        /usr/local/go/src/fmt/print.go:882 +0xef7 fp=0xc0057f38a0 sp=0xc0057f36b8 pc=0xda4177
fmt.(*pp).printValue(0xc0000da1a0, {0x222af00?, 0xc0036f4cf0?, 0x22bb543?}, 0x73, 0x1)
        /usr/local/go/src/fmt/print.go:806 +0x124e fp=0xc0057f3a88 sp=0xc0057f38a0 pc=0xda44ce
fmt.(*pp).printValue(0xc0000da1a0, {0x21fce00?, 0xc0036f4cf0?, 0xc000078800?}, 0x73, 0x0)
        /usr/local/go/src/fmt/print.go:876 +0xce5 fp=0xc0057f3c70 sp=0xc0057f3a88 pc=0xda3f65
fmt.(*pp).printArg(0xc0000da1a0, {0x21fce00?, 0xc0036f4cf0}, 0x73)
        /usr/local/go/src/fmt/print.go:712 +0x74c fp=0xc0057f3d10 sp=0xc0057f3c70 pc=0xda31ec
fmt.(*pp).doPrintf(0xc0000da1a0, {0x2463538, 0x26}, {0xc0057f3f78?, 0x1, 0x1})
        /usr/local/go/src/fmt/print.go:1026 +0x288 fp=0xc0057f3e08 sp=0xc0057f3d10 pc=0xda59e8
fmt.Sprintf({0x2463538, 0x26}, {0xc0057f3f78, 0x1, 0x1})
        /usr/local/go/src/fmt/print.go:219 +0x59 fp=0xc0057f3e60 sp=0xc0057f3e08 pc=0xd9fe59
go.uber.org/zap.getMessage({0x2463538?, 0xc0057f3ea8?}, {0xc0057f3f78?, 0x1f?, 0xc0065d3620?})
        /go/pkg/mod/go.uber.org/zap@v1.21.0/sugar.go:238 +0x74 fp=0xc0057f3e98 sp=0xc0057f3e60 pc=0x1040f74
go.uber.org/zap.(*SugaredLogger).log(0xc0000cc920, 0xff, {0x2463538?, 0x26?}, {0xc0057f3f78?, 0x1?, 0x1?}, {0x0, 0x0, 0x0})
        /go/pkg/mod/go.uber.org/zap@v1.21.0/sugar.go:225 +0x9d fp=0xc0057f3ed8 sp=0xc0057f3e98 pc=0x1040e3d
go.uber.org/zap.(*SugaredLogger).Debugf(...)
        /go/pkg/mod/go.uber.org/zap@v1.21.0/sugar.go:133
github.com/libp2p/go-libp2p-swarm.(*Swarm).AddListenAddr.func2()
        /go/pkg/mod/github.com/libp2p/go-libp2p-swarm@v0.10.2/swarm_listen.go:103 +0x176 fp=0xc0057f3fe0 sp=0xc0057f3ed8 pc=0x1737896
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0057f3fe8 sp=0xc0057f3fe0 pc=0xd15f61
created by github.com/libp2p/go-libp2p-swarm.(*Swarm).AddListenAddr
        /go/pkg/mod/github.com/libp2p/go-libp2p-swarm@v0.10.2/swarm_listen.go:79 +0x25a

Messages logged from that line are rather unreadable in general:

2022-06-15T15:10:40.050+0200    DEBUG   swarm2  go-libp2p-swarm@v0.10.2/swarm_listen.go:103     swarm listener accepted connection: &{%!s(*quic.connection=&{[120 197 234 237] [] <nil> 4 1 4278190109 0xc001ee6160 0xc00041cb00 0xc001ed3d10 0xc0005bc3f0 0xc004001810 0xc0005bc380 0xc001d84cc0 0xc00041cec0 0xc00043bee0 0xc001efc060 0xc000678e80 0xc001745c20 0xc00041cd80 0xc000592ab0  0xc0008a3cb0 0xc000daf278 0xc002e0b140 0xc0005e0a90 0xc001f8d1c0 0xc001fadbd0 0xc003042f00 0xc001745c80 0xc001fc5aa0 {0 {0 0}} 0xc001745ce0 0xc00041cdc0 0xd42300 0xc00041cd00 0xd42300 [] [] <nil> 0xc001fc5980 <nil> true true false false true 30000000000 {13880820226829160137 5710759658 0x37e59e0} {13880820227495721005 6303578732 0x37e59e0} {13880820227496538652 6304396252 0x37e59e0} {0 0 <nil>} 0xc004329810 0xc001efc720 false 15000000000 <nil> c2dffd23d1eaa9b105359f3e63b21eb00e 0xc001745b60 0xc001efd440}) %!s(*libp2pquic.reuseConn=&{0xc0011043f0 {0 0} 68 {0 0 <nil>}}) %!s(*libp2pquic.transport=&{0xc00000fe00 ^@^A^R ^\
<C1>]<D9><F9><DD><D3>^V<8E>j�^Wtz^Y<8B><DE><ED><B2>c<B5><A3>
<92>@<F5>tp<A5><B7> 0xc00025ca80 0xc000247940 0xc0000364d0 0xc000036580 0xc00100ff80 0x38181e0 {0 0} map[] {0 0} map[0xc000464300:0xc000725290 0xc000464c00:0xc000651ef0 0xc001114600:0xc001d82b40 0xc001114900:0xc000973680 0xc001114c00:0xc002d82ea0 0xc001114f00:0xc000725a70 0xc001115200:0xc001d82fc0 0xc001115500:0xc000724d80 0xc001115800:0xc002d82090 0xc001810600:0xc0006510e0 0xc001810900:0xc001d7b4d0 0xc001810f00:0xc002ad0630 0xc001811200:0xc002ad1e60 0xc00185e000:0xc0026a2f30 0xc00185e900:0xc0026a3c20 0xc00185ec00:0xc001d82d80 0xc00185ef00:0xc002733050 0xc00185f200:0xc0023fac60 0xc00185f500:0xc001d83dd0 0xc00185fb00:0xc00222f320 0xc00223a000:0xc0023fa360 0xc00223a600:0xc002d82630 0xc00223a900:0xc0023fae10 0xc00223ac00:0xc0023fbd40 0xc00223af00:0xc003841680 0xc00223b200:0xc00055f320 0xc0028d6600:0xc003672e10 0xc0028d6900:0xc00222f050 0xc0028d6c00:0xc00308afc0 0xc0028d7200:0xc00343a360 0xc0028d7500:0xc002683560 0xc0028d7800:0xc0033bf710 0xc003052300:0xc0033be7e0 0xc003052900:0xc00358ee10 0xc003053200:0xc0038b06c0 0xc003053b00:0xc003f04ab0 0xc003534000:0xc003362900 0xc003534f00:0xc003ae6000 0xc003535200:0xc003e722d0 0xc003535500:0xc003dd87e0 0xc003535800:0xc003eb0c60 0xc003640000:0xc00358f680 0xc003640900:0xc003916360 0xc003640c00:0xc003a11b90 0xc003640f00:0xc000137a70 0xc003641200:0xc0026a3830 0xc003641500:0xc003b5f560 0xc003641800:0xc0006506c0 0xc003641b00:0xc004090630 0xc003860000:0xc003a106c0 0xc003860c00:0xc003b5e7e0 0xc003860f00:0xc003841b90 0xc003861500:0xc003baefc0 0xc003861800:0xc003d49950 0xc003861b00:0xc003e73a70 0xc003eb6300:0xc0043d2bd0 0xc003eb6600:0xc004312ea0 0xc003eb6900:0xc0027323f0 0xc003f16000:0xc0041083f0 0xc003f16600:0xc004277a70 0xc003f16900:0xc0027327e0 0xc003f16f00:0xc000725440 0xc004150000:0xc0041ac3f0 0xc004150300:0xc00428c510 0xc004150600:0xc00428c630 0xc004150900:0xc00428cf30 0xc004150f00:0xc0001b8120 0xc004151200:0xc002d823f0 0xc004151500:0xc002d827e0 0xc004151b00:0xc00055fc20]}) %!s(*network.nullScope=&{}) ^@^A^R ^\<C1>]<D9><F9><DD><D3>^V<8E>j�^Wtz^Y<8B><DE><ED><B2>c<B5><A3>
<92>@<F5>tp<A5><B7> %!s(*crypto.Ed25519PrivateKey=&{[xxx]}) %!s(*multiaddr.multiaddr=&{[4 0 0 0 0 145 2 15 161 204 3]}) ^@^A^R 9<DE><EF>@<B6>0<A5>_ESC^W<CF>Ҿ<B8>a<B4><A3><83><AF><98><EB><EA><C0>/r<91>?<A0>(t<83>^? %!s(*crypto.Ed25519PublicKey=&{[xxx}) %!s(*multiaddr.multiaddr=&{[4 18 220 26 81 145 2 117 49 204 3]})}

The map it logs grows over time.
This issue is reproducible for me. Just leave it running for a few minutes and it'll fail eventually.

@mrd0ll4r mrd0ll4r changed the title concurrent map iteration and map write concurrent map iteration and map write in debug logging Jun 15, 2022
@marten-seemann
Copy link
Contributor

We should probably not try to log a connection as a string: https://github.com/libp2p/go-libp2p-swarm/blob/2c34422a83f4f8740aafa7f37e28076f733388dd/swarm_listen.go#L103

@marten-seemann marten-seemann mentioned this issue Jun 18, 2022
41 tasks
@marten-seemann
Copy link
Contributor

@aschmahmann @guseggert Have you seen this in production? Given that concurrent map operations cause (unrecoverable) panics, I'd expect this to happen quite frequently. Is there desire on your side to cut a v0.13.x IPFS patch release? If so, we'd provide a v0.20.x patch release of libp2p.

@marten-seemann marten-seemann added the kind/bug A bug in existing code (including security flaws) label Jun 20, 2022
@marten-seemann
Copy link
Contributor

marten-seemann commented Jun 21, 2022

2022-06-20 update: This code was introduced in January 2018. We should fix this (logging something like local multiaddr <-> remote multiaddr). This will be release with the v0.21.0 release (#1514).
Given that this code has been around for many years, this doesn't warrant a patch release.

@mrd0ll4r, do you want to send us a PR? Note that the swarm now lives in this repo, in p2p/net/swarm.

@mrd0ll4r
Copy link
Contributor Author

I'm sorry, I don't quite have the time to work this out at the moment :(

@MarcoPolo MarcoPolo added good first issue Good issue for new contributors exp/beginner Can be confidently tackled by newcomers labels Jun 21, 2022
@schomatis
Copy link

@marten-seemann Could you expand a bit more on the recommended fix, please?

We should fix this (logging something like local multiaddr <-> remote multiaddr).

@marten-seemann
Copy link
Contributor

@schomatis It's about this line:

log.Debugf("swarm listener accepted connection: %s", c)

It's logging the entire connection struct (which contains a map, and therefore causes the concurrent map access). Instead, we should just log local multiaddr <-> remote multiaddr. This would literally be a one line fix.

@BigLep
Copy link
Contributor

BigLep commented Jul 1, 2022

2022-07-01 conversation: @marten-seemann is going to do the quick fix here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exp/beginner Can be confidently tackled by newcomers good first issue Good issue for new contributors kind/bug A bug in existing code (including security flaws)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants