
Limit the size of provider sets #316

Closed
Stebalien opened this issue Apr 10, 2019 · 28 comments
Labels
kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up

Comments

@Stebalien
Member

Currently, provider sets can grow quite large. Unfortunately, if a peer ID happens to lie close to some very popular content, that node will end up storing a ton of provider records for that content. This usually isn't much of an issue in terms of storage space but is definitely an issue in terms of performance.

See ipfs/kubo#5613 (comment).

cc @markg85

@Stebalien Stebalien added kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up labels Apr 10, 2019
@Stebalien
Member Author

@Kubuxu do you have time to land a fix?

@vyzo
Contributor

vyzo commented Apr 10, 2019

If the storage space isn't an issue, perhaps we need some smarter provider set management so that it's no longer a performance issue.

@Kubuxu
Member

Kubuxu commented Apr 10, 2019

@Stebalien can you share the performance trace?

@vyzo
Contributor

vyzo commented Apr 10, 2019

We probably want to optimize the garbage collector (and possibly the data structure), since that's what seems to be eating the CPU in the linked issue.

@raulk
Member

raulk commented Apr 10, 2019

This is a fundamental flaw in our design and we can’t solve it nicely without introducing some form of scoring.

@raulk
Member

raulk commented Apr 10, 2019

Involving @jhiesey and @anacrolix

@raulk
Member

raulk commented Apr 10, 2019

Alternatively, adopting sloppy hashing would alleviate the burden on the peers whose IDs lie strictly nearest to the key.

@vyzo
Contributor

vyzo commented Apr 10, 2019

So one thing we could do to immediately reduce the stress is to bin the provider records by (next) hour of expiry. So instead of having to iterate through N provider records, we would iterate through a fixed number of (24) bins.

edit: referring to GC efficiency, which seems to be the limiting factor, as it is a linear check of all provider records.
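
A minimal sketch of the binning idea, assuming hourly GC ticks and simplified string types (none of these names exist in go-libp2p-kad-dht):

```go
package providers

import "time"

// record is a simplified provider record: which peer provides which key.
type record struct {
	key      string // CID, as a string for brevity
	provider string // peer ID, as a string for brevity
}

// binnedProviders groups records by the hour in which they expire, so GC
// can drop whole bins instead of scanning every record.
type binnedProviders struct {
	bins map[int64][]record // unix hour -> records expiring in that hour
}

func newBinnedProviders() *binnedProviders {
	return &binnedProviders{bins: make(map[int64][]record)}
}

// add files a record under the hour at which its validity ends.
func (b *binnedProviders) add(key, provider string, validity time.Duration) {
	hour := time.Now().Add(validity).Truncate(time.Hour).Unix()
	b.bins[hour] = append(b.bins[hour], record{key: key, provider: provider})
}

// gc runs on every (hourly) tick and throws away expired bins outright:
// O(number of expired bins) rather than O(number of records).
func (b *binnedProviders) gc(now time.Time) {
	cutoff := now.Truncate(time.Hour).Unix()
	for hour := range b.bins {
		if hour < cutoff {
			delete(b.bins, hour)
		}
	}
}
```

Renewals and deduplication across bins are the open questions discussed in the comments below.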

@raulk
Member

raulk commented Apr 10, 2019

Nodes should throttle provider records, and providers should expand their radius upon getting throttled. That way, content is sloppily radiated across the network the more popular it gets, and it’s also findable more rapidly as you don’t need a perfect match any longer.

@raulk
Member

raulk commented Apr 10, 2019

This approach is similar to how topics will be QoSsed in discv5 in Ethereum. See this proposal: libp2p/specs#145

@vyzo
Contributor

vyzo commented Apr 10, 2019

Actually with binning we wouldn't even have to iterate on expiry, we could simply throw away the next bin on every tick (which is hourly).

@raulk
Member

raulk commented Apr 10, 2019

@vyzo for GC, have a look at the 2-cycle algorithm we implemented in the datastore-backed peerstore. It has lookahead and prune cycles, and it operates similarly to what you propose.
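
For reference, a rough, hypothetical sketch of the lookahead/prune shape described above (illustrative only, not the peerstore's actual code):

```go
package providers

import "time"

// twoCycleGC keeps a small index of soon-to-expire records so the frequent
// prune cycle doesn't have to scan everything.
type twoCycleGC struct {
	records   map[string]time.Time // record id -> expiry
	lookahead map[string]time.Time // records expiring within the lookahead window
	window    time.Duration
}

// lookaheadCycle occasionally scans all records and indexes only those that
// expire within the next window.
func (g *twoCycleGC) lookaheadCycle(now time.Time) {
	g.lookahead = make(map[string]time.Time, len(g.lookahead))
	for id, exp := range g.records {
		if exp.Before(now.Add(g.window)) {
			g.lookahead[id] = exp
		}
	}
}

// pruneCycle runs frequently but only visits the small lookahead index.
func (g *twoCycleGC) pruneCycle(now time.Time) {
	for id, exp := range g.lookahead {
		if exp.Before(now) {
			delete(g.records, id)
			delete(g.lookahead, id)
		}
	}
}
```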

How do you deal with record renewals with binning, where the record hasn't been pruned yet and still lives in a bin? Does looking it up become O(n) again?

@raulk
Member

raulk commented Apr 10, 2019

Would bins contain references or the physical record? I guess the former, in which case the above could be a problem. Can’t be the latter because access would be O(n).

@vyzo
Contributor

vyzo commented Apr 10, 2019

I think it is fine if we have a provider appear in multiple bins, but we'll have to deduplicate on the response.

@vyzo
Contributor

vyzo commented Apr 10, 2019

Never mind, I think the deduplication alone would turn the response into O(n), which is unacceptable.

@vyzo
Contributor

vyzo commented Apr 10, 2019

So if the size of the provider set is the issue, I am not sure that dropping new providers is the right solution.
I think we should probabilistically replace existing providers instead, so that the provider set keeps getting refreshed.
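
A minimal sketch of that replacement policy, assuming a hypothetical cap of 100 providers per key and simplified types (the real providerSet in go-libp2p-kad-dht looks different):

```go
package providers

import (
	"math/rand"
	"time"
)

// maxProviders is a hypothetical cap on how many providers we keep per key.
const maxProviders = 100

// providerSet is a simplified capped set: peer IDs as strings plus the time
// each provider was last seen.
type providerSet struct {
	providers []string
	set       map[string]time.Time
}

func newProviderSet() *providerSet {
	return &providerSet{set: make(map[string]time.Time)}
}

// addProvider refreshes an existing provider, appends while under the cap,
// and otherwise replaces a uniformly random existing provider, so the set
// keeps getting refreshed instead of silently dropping newcomers.
func (ps *providerSet) addProvider(p string, now time.Time) {
	if _, ok := ps.set[p]; ok {
		ps.set[p] = now
		return
	}
	if len(ps.providers) < maxProviders {
		ps.providers = append(ps.providers, p)
		ps.set[p] = now
		return
	}
	i := rand.Intn(len(ps.providers))
	delete(ps.set, ps.providers[i]) // evict a random existing provider
	ps.providers[i] = p
	ps.set[p] = now
}
```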

@raulk
Member

raulk commented Apr 10, 2019

The provider mechanism in our DHT is inspired by Coral, but it’s a half-baked, naïve implementation. Let’s implement it all the way with sloppy hashing (backwards compatible) and record rejection and backtracking. Although backtracking is not Byzantine resistant IIRC, we can make it so (keep going forward a few steps, disjoint paths, etc.)

@raulk
Member

raulk commented Apr 10, 2019

Opened a discussion in libp2p/specs to get this implemented: libp2p/specs#163

@Stebalien
Member Author

Stebalien commented Apr 10, 2019

When I say "space isn't an issue" I mean that that's not what's crashing the node. It's still an issue and there's no way we can meaningfully use more than say 100 provider records.

I can't share the dumps but I can share the results.

CPU Profile:

Showing nodes accounting for 23.31s, 86.27% of 27.02s total
Dropped 542 nodes (cum <= 0.14s)
Showing top 20 nodes out of 133
      flat  flat%   sum%        cum   cum%
    13.82s 51.15% 51.15%     14.76s 54.63%  runtime.mapiternext
     2.04s  7.55% 58.70%      2.04s  7.55%  math/big.addMulVVW
     1.09s  4.03% 62.73%      1.09s  4.03%  runtime.duffcopy
     1.01s  3.74% 66.47%      1.04s  3.85%  runtime.findObject
     0.56s  2.07% 68.54%      0.56s  2.07%  runtime._div64by32
     0.53s  1.96% 70.50%      1.73s  6.40%  runtime.scanobject
     0.50s  1.85% 72.35%      1.06s  3.92%  runtime.dodiv
     0.47s  1.74% 74.09%      3.64s 13.47%  time.Time.Sub
     0.42s  1.55% 75.65%      2.57s  9.51%  time.Time.Add
     0.41s  1.52% 77.17%      0.43s  1.59%  syscall.Syscall
     0.34s  1.26% 78.42%      0.87s  3.22%  runtime.int64div
     0.34s  1.26% 79.68%      0.87s  3.22%  runtime.int64mod
     0.31s  1.15% 80.83%      2.28s  8.44%  math/big.nat.montgomery
     0.31s  1.15% 81.98%      0.31s  1.15%  runtime.markBits.isMarked (inline)
     0.22s  0.81% 82.79%      0.22s  0.81%  runtime.add (inline)
     0.21s  0.78% 83.57%     19.15s 70.87%  gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers.(*ProviderManager).run
     0.19s   0.7% 84.27%      0.21s  0.78%  crypto/elliptic.p256ReduceDegree
     0.19s   0.7% 84.97%      0.36s  1.33%  runtime.(*bmap).overflow (inline)
     0.18s  0.67% 85.64%      0.24s  0.89%  time.Time.Equal
     0.17s  0.63% 86.27%      0.17s  0.63%  gx/ipfs/QmcTzQXRcU2vf8yX5EEboz1BSvWC7wWmeYAKVQmhp8WZYU/sha256-simd.blockGeneric

CPU Profile Code Listing:

Total: 27.02s
ROUTINE ======================== gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers.(*ProviderManager).getProvKeys.func1 in /home/steb/projects/go/src/github.com/ipfs/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers/providers.go
         0       10ms (flat, cum) 0.037% of Total
         .          .    232:			if err != nil {
         .          .    233:				log.Warning("error decoding base32 provider key: %s: %s", parts[2], err)
         .          .    234:				continue
         .          .    235:			}
         .          .    236:
         .       10ms    237:			c, err := cid.Cast(decoded)
         .          .    238:			if err != nil {
         .          .    239:				log.Warning("error casting key to cid from datastore key: %s", err)
         .          .    240:				continue
         .          .    241:			}
         .          .    242:
ROUTINE ======================== gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers.(*ProviderManager).run in /home/steb/projects/go/src/github.com/ipfs/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers/providers.go
     210ms     19.15s (flat, cum) 70.87% of Total
         .          .    270:				log.Error("Error loading provider keys: ", err)
         .          .    271:				continue
         .          .    272:			}
         .          .    273:			now := time.Now()
         .          .    274:			for {
         .       10ms    275:				k, ok := keys()
         .          .    276:				if !ok {
         .          .    277:					break
         .          .    278:				}
         .          .    279:
         .          .    280:				provs, err := pm.getProvSet(k)
         .          .    281:				if err != nil {
         .          .    282:					log.Error("error loading known provset: ", err)
         .          .    283:					continue
         .          .    284:				}
      40ms     15.24s    285:				for p, t := range provs.set {
     170ms      3.90s    286:					if now.Sub(t) > ProvideValidity {
         .          .    287:						delete(provs.set, p)
         .          .    288:					}
         .          .    289:				}
         .          .    290:				// have we run out of providers?
         .          .    291:				if len(provs.set) == 0 {
ROUTINE ======================== gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers.NewProviderManager.func1 in /home/steb/projects/go/src/github.com/ipfs/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers/providers.go
         0     19.15s (flat, cum) 70.87% of Total
         .          .     68:	}
         .          .     69:	pm.providers = cache
         .          .     70:
         .          .     71:	pm.proc = goprocessctx.WithContext(ctx)
         .          .     72:	pm.cleanupInterval = defaultCleanupInterval
         .     19.15s     73:	pm.proc.Go(func(p goprocess.Process) { pm.run() })
         .          .     74:
         .          .     75:	return pm
         .          .     76:}
         .          .     77:
         .          .     78:const providersKeyPrefix = "/providers/"

You'll notice the 70% CPU time...

@Stebalien
Member Author

So basically, we need a quick fix, then the right fix. That's why I suggested just dropping provider records, as they aren't useful in bulk anyway.

@anacrolix
Contributor

Just to confirm the issue: do I understand correctly from reading the code that the hourly cleanup deserializes every provider record entirely into memory from the provider datastore, checks all the timeouts, and inserts any changes back into the store?

It does seem strange that timeouts feature so prominently everywhere. Ultimately with enough traffic/churn, something has to give. I think timeouts help remove old/quiet data, but aren't sufficient.

The mapping looks like Cid->{Peer.ID,time.Time}, done manually on a KV store. Even the simplest structured data store, such as sqlite3, would do away with all the hassle here. Pruning would be as simple as delete from providers where datetime('now')>=expiry. Visibility into providers per Cid, and pruning large Cid provider sets, would be trivial. The hacky "trim on the fly" can be hand-waved away with select peer from providers where cid=? and expiry>=datetime('now'). Since storage is not the issue and performant queries are the real issue, moving that optimization into the store would improve on what we have now, like loading an entire set into a map because writing a proper data KV query iterator is a lot of work. I understand this might not be a viable short-term solution; if not, a separate issue to create better provider storage is an option.
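
A rough sketch of what that could look like in Go with database/sql (the mattn/go-sqlite3 driver, this schema, and storing expiry as unix seconds are assumptions for illustration, not anything that exists in the repo):

```go
package providers

import (
	"database/sql"
	"time"

	_ "github.com/mattn/go-sqlite3" // assumed cgo-based driver
)

// openProviderStore creates the kind of table suggested above:
// one row per (cid, peer) with an expiry, stored here as unix seconds.
func openProviderStore(path string) (*sql.DB, error) {
	db, err := sql.Open("sqlite3", path)
	if err != nil {
		return nil, err
	}
	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS providers (
		cid    TEXT    NOT NULL,
		peer   TEXT    NOT NULL,
		expiry INTEGER NOT NULL,
		PRIMARY KEY (cid, peer)
	)`)
	return db, err
}

// pruneExpired deletes every expired record in a single statement.
func pruneExpired(db *sql.DB, now time.Time) error {
	_, err := db.Exec(`DELETE FROM providers WHERE expiry <= ?`, now.Unix())
	return err
}

// liveProviders returns only unexpired providers for a key ("trim on the fly").
func liveProviders(db *sql.DB, c string, now time.Time) ([]string, error) {
	rows, err := db.Query(`SELECT peer FROM providers WHERE cid = ? AND expiry > ?`, c, now.Unix())
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var peers []string
	for rows.Next() {
		var p string
		if err := rows.Scan(&p); err != nil {
			return nil, err
		}
		peers = append(peers, p)
	}
	return peers, rows.Err()
}
```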

@Stebalien do you have numbers for this provider set? How many entries for example?

@Stebalien
Member Author

> @Stebalien do you have numbers for this provider set? How many entries for example?

Unfortunately, not. I just know that we're spending a lot of time iterating over that map.


In terms of sqlite, we've considered this in go-ipfs as well because using a kvstore for everything is kind of painful. Unfortunately, sqlite itself requires an external library (can we statically link it?) and CFFI calls (slow in Go). However, we should still look into this and, maybe, alternative databases.

> because writing a proper data KV query iterator is a lot of work.

A proper kv iterator would actually be pretty easy. I believe we're doing it this way due to how we're caching provider records in memory.

@bpot

bpot commented Apr 12, 2019

I don't know how big the impact is, but it looks like the cleanup code only deletes expired provider records from the datastore when all providers for a key have been removed.

If only some of the providers are removed, the in-memory/cached provider set is updated, but the expired entries aren't removed from the database.

It seems like this could at least be exacerbating the current issue.
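
A hedged sketch of the fix being described: when GC expires a provider, delete the persisted record as well, not just the cached one (the store interface and key layout here are simplifications, not the real go-datastore API):

```go
package providers

import "time"

// store abstracts the backing datastore; the real code uses go-datastore
// with keys like /providers/<key>/<peer>, which this sketch simplifies.
type store interface {
	Delete(key string) error
}

// gcProviders drops expired providers from both the in-memory set and the
// backing store, instead of only touching the store once the whole set is empty.
func gcProviders(set map[string]time.Time, keyPrefix string, ds store,
	now time.Time, validity time.Duration) error {
	for p, t := range set {
		if now.Sub(t) > validity {
			delete(set, p)
			// Also remove the persisted record so that a cache miss
			// doesn't resurrect it from disk later.
			if err := ds.Delete(keyPrefix + "/" + p); err != nil {
				return err
			}
		}
	}
	return nil
}
```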

@Stebalien
Member Author

Ah. So if the cache is full, we'll repeatedly restore this set from disk. That's bad.

Stebalien added a commit to ipfs/go-datastore that referenced this issue Apr 13, 2019
@Stebalien
Member Author

Hm. So... it looks like our current GC algorithm is n^2. Let's fix that.

@whyrusleeping
Contributor

complains

(pprof) top10
Showing nodes accounting for 223.71MB, 83.96% of 266.44MB total
Dropped 184 nodes (cum <= 1.33MB)
Showing top 10 nodes out of 112
      flat  flat%   sum%        cum   cum%
  117.02MB 43.92% 43.92%   117.02MB 43.92%  github.com/libp2p/go-libp2p-kad-dht/providers.(*providerSet).setVal
   62.50MB 23.46% 67.38%   180.05MB 67.57%  github.com/libp2p/go-libp2p-kad-dht/providers.loadProvSet
    9.03MB  3.39% 70.76%     9.53MB  3.58%  github.com/whyrusleeping/yamux.newSession
    8.71MB  3.27% 74.03%     8.71MB  3.27%  github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).Get
    5.41MB  2.03% 76.06%     7.41MB  2.78%  github.com/hashicorp/golang-lru/simplelru.(*LRU).Add
    4.53MB  1.70% 77.76%     4.53MB  1.70%  github.com/gogo/protobuf/io.(*varintWriter).WriteMsg
    4.50MB  1.69% 79.45%     4.50MB  1.69%  github.com/multiformats/go-multiaddr.NewMultiaddrBytes
    4.02MB  1.51% 80.96%     4.02MB  1.51%  bufio.NewWriterSize
       4MB  1.50% 82.46%        4MB  1.50%  github.com/syndtr/goleveldb/leveldb/memdb.New
       4MB  1.50% 83.96%        9MB  3.38%  github.com/libp2p/go-libp2p-peerstore/pb.(*AddrBookRecord).Unmarshal

@Stebalien
Member Author

Stebalien commented Apr 17, 2019

@whyrusleeping which version of what? Assuming go-ipfs 0.4.20, try running go get github.com/libp2p/go-libp2p-kad-dht@master and rebuilding (this fix didn't make it into the release).

@Stebalien
Member Author

We still need to limit the size of this set, but this issue was really about a bug where we weren't garbage collecting it properly.
