
Limit the size of provider sets #316

Closed
Stebalien opened this issue Apr 10, 2019 · 28 comments
Labels
kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up

Comments

@Stebalien
Member

Currently, provider sets can grow quite large. Unfortunately, if a peer ID happens to lie close to some very popular content, that node will end up storing a ton of provider records for that content. This usually isn't much of an issue in terms of storage space but is definitely an issue in terms of performance.

See ipfs/kubo#5613 (comment).

cc @markg85

@Stebalien Stebalien added kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up labels Apr 10, 2019
@Stebalien
Member Author

@Kubuxu do you have time to land a fix?

@vyzo
Contributor

vyzo commented Apr 10, 2019

If the storage space isn't an issue, perhaps we need some smarter provider set management so that it's no longer a performance issue.

@Kubuxu
Member

Kubuxu commented Apr 10, 2019

@Stebalien can you share the performance trace?

@vyzo
Contributor

vyzo commented Apr 10, 2019

We probably want to optimize the garbage collector (and possibly the data structure), since that's what seems to be eating the CPU in the linked issue.

@raulk
Member

raulk commented Apr 10, 2019

This is a fundamental flaw in our design and we can’t solve it nicely without introducing some form of scoring.

@raulk
Member

raulk commented Apr 10, 2019

Involving @jhiesey and @anacrolix

@raulk
Member

raulk commented Apr 10, 2019

Alternatively, adopting sloppy hashing would alleviate the burden on the peers whose IDs lie strictly nearest to the key.

@vyzo
Contributor

vyzo commented Apr 10, 2019

So one thing we could do to immediately reduce the stress is to bin the provider records by (next) hour of expiry. So instead of having to iterate through N provider records, we would iterate through a fixed number of (24) bins.

edit: referring to GC efficiency, which seems to be the limiting factor, as it is a linear check of all provider records.
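
A minimal sketch of the binning idea, assuming hourly GC ticks and simplified string types (none of these names exist in go-libp2p-kad-dht):

```go
package providers

import "time"

// record is a simplified provider record: which peer provides which key.
type record struct {
	key      string // CID, as a string for brevity
	provider string // peer ID, as a string for brevity
}

// binnedProviders groups records by the hour in which they expire, so GC
// can drop whole bins instead of scanning every record.
type binnedProviders struct {
	bins map[int64][]record // unix hour -> records expiring in that hour
}

func newBinnedProviders() *binnedProviders {
	return &binnedProviders{bins: make(map[int64][]record)}
}

// add files a record under the hour at which its validity ends.
func (b *binnedProviders) add(key, provider string, validity time.Duration) {
	hour := time.Now().Add(validity).Truncate(time.Hour).Unix()
	b.bins[hour] = append(b.bins[hour], record{key: key, provider: provider})
}

// gc runs on every (hourly) tick and throws away expired bins outright:
// O(number of expired bins) rather than O(number of records).
func (b *binnedProviders) gc(now time.Time) {
	cutoff := now.Truncate(time.Hour).Unix()
	for hour := range b.bins {
		if hour < cutoff {
			delete(b.bins, hour)
		}
	}
}
```

Renewals and deduplication across bins are the open questions discussed in the comments below.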

@raulk
Member

raulk commented Apr 10, 2019

Nodes should throttle provider records, and providers should expand their radius upon getting throttled. That way, content is sloppily radiated across the network the more popular it gets, and it’s also findable more rapidly as you don’t need a perfect match any longer.

@raulk
Member

raulk commented Apr 10, 2019

This approach is similar to how topics will be QoSsed in discv5 in Ethereum. See this proposal: libp2p/specs#145

@vyzo
Contributor

vyzo commented Apr 10, 2019

Actually with binning we wouldn't even have to iterate on expiry, we could simply throw away the next bin on every tick (which is hourly).

@raulk
Member

raulk commented Apr 10, 2019

@vyzo for GC, have a look at the 2-cycle algorithm we implemented in the datastore-backed peerstore. It has lookahead and prune cycles, and it operates similarly to what you propose.
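
For reference, a rough, hypothetical sketch of the lookahead/prune shape described above (illustrative only, not the peerstore's actual code):

```go
package providers

import "time"

// twoCycleGC keeps a small index of soon-to-expire records so the frequent
// prune cycle doesn't have to scan everything.
type twoCycleGC struct {
	records   map[string]time.Time // record id -> expiry
	lookahead map[string]time.Time // records expiring within the lookahead window
	window    time.Duration
}

// lookaheadCycle occasionally scans all records and indexes only those that
// expire within the next window.
func (g *twoCycleGC) lookaheadCycle(now time.Time) {
	g.lookahead = make(map[string]time.Time, len(g.lookahead))
	for id, exp := range g.records {
		if exp.Before(now.Add(g.window)) {
			g.lookahead[id] = exp
		}
	}
}

// pruneCycle runs frequently but only visits the small lookahead index.
func (g *twoCycleGC) pruneCycle(now time.Time) {
	for id, exp := range g.lookahead {
		if exp.Before(now) {
			delete(g.records, id)
			delete(g.lookahead, id)
		}
	}
}
```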

How do you deal with record renewals with binning, where the record hasn't been pruned yet and still lives in a bin? Does looking it up become O(n) again?

@raulk
Member

raulk commented Apr 10, 2019

Would bins contain references or the physical record? I guess the former, in which case the above could be a problem. Can’t be the latter because access would be O(n).

@vyzo
Contributor

vyzo commented Apr 10, 2019

I think it is fine if we have a provider appear in multiple bins, but we'll have to deduplicate on the response.

@vyzo
Contributor

vyzo commented Apr 10, 2019

Never mind, I think the deduplication alone would turn the response into O(n), which is unacceptable.

@vyzo
Contributor

vyzo commented Apr 10, 2019

So if the size of the provider set is the issue, I am not sure that dropping new providers is the right solution.
I think we should probabilistically replace existing providers instead, so that the provider set keeps getting refreshed.
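
A minimal sketch of that replacement policy, assuming a hypothetical cap of 100 providers per key and simplified types (the real providerSet in go-libp2p-kad-dht looks different):

```go
package providers

import (
	"math/rand"
	"time"
)

// maxProviders is a hypothetical cap on how many providers we keep per key.
const maxProviders = 100

// providerSet is a simplified capped set: peer IDs as strings plus the time
// each provider was last seen.
type providerSet struct {
	providers []string
	set       map[string]time.Time
}

func newProviderSet() *providerSet {
	return &providerSet{set: make(map[string]time.Time)}
}

// addProvider refreshes an existing provider, appends while under the cap,
// and otherwise replaces a uniformly random existing provider, so the set
// keeps getting refreshed instead of silently dropping newcomers.
func (ps *providerSet) addProvider(p string, now time.Time) {
	if _, ok := ps.set[p]; ok {
		ps.set[p] = now
		return
	}
	if len(ps.providers) < maxProviders {
		ps.providers = append(ps.providers, p)
		ps.set[p] = now
		return
	}
	i := rand.Intn(len(ps.providers))
	delete(ps.set, ps.providers[i]) // evict a random existing provider
	ps.providers[i] = p
	ps.set[p] = now
}
```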

@raulk
Member

raulk commented Apr 10, 2019

The provider mechanism in our DHT is inspired by Coral, but it’s a half-baked, naïve implementation. Let’s implement it all the way with sloppy hashing (backwards compatible) and record rejection and backtracking. Although backtracking is not Byzantine resistant IIRC, we can make it so (keep going forward a few steps, disjoint paths, etc.)

@raulk
Member

raulk commented Apr 10, 2019

Opened a discussion in libp2p/specs to get this implemented: libp2p/specs#163

@Stebalien
Member Author

Stebalien commented Apr 10, 2019

When I say "space isn't an issue" I mean that that's not what's crashing the node. It's still an issue and there's no way we can meaningfully use more than say 100 provider records.

I can't share the dumps but I can share the results.

CPU Profile:

Showing nodes accounting for 23.31s, 86.27% of 27.02s total
Dropped 542 nodes (cum <= 0.14s)
Showing top 20 nodes out of 133
      flat  flat%   sum%        cum   cum%
    13.82s 51.15% 51.15%     14.76s 54.63%  runtime.mapiternext
     2.04s  7.55% 58.70%      2.04s  7.55%  math/big.addMulVVW
     1.09s  4.03% 62.73%      1.09s  4.03%  runtime.duffcopy
     1.01s  3.74% 66.47%      1.04s  3.85%  runtime.findObject
     0.56s  2.07% 68.54%      0.56s  2.07%  runtime._div64by32
     0.53s  1.96% 70.50%      1.73s  6.40%  runtime.scanobject
     0.50s  1.85% 72.35%      1.06s  3.92%  runtime.dodiv
     0.47s  1.74% 74.09%      3.64s 13.47%  time.Time.Sub
     0.42s  1.55% 75.65%      2.57s  9.51%  time.Time.Add
     0.41s  1.52% 77.17%      0.43s  1.59%  syscall.Syscall
     0.34s  1.26% 78.42%      0.87s  3.22%  runtime.int64div
     0.34s  1.26% 79.68%      0.87s  3.22%  runtime.int64mod
     0.31s  1.15% 80.83%      2.28s  8.44%  math/big.nat.montgomery
     0.31s  1.15% 81.98%      0.31s  1.15%  runtime.markBits.isMarked (inline)
     0.22s  0.81% 82.79%      0.22s  0.81%  runtime.add (inline)
     0.21s  0.78% 83.57%     19.15s 70.87%  gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers.(*ProviderManager).run
     0.19s   0.7% 84.27%      0.21s  0.78%  crypto/elliptic.p256ReduceDegree
     0.19s   0.7% 84.97%      0.36s  1.33%  runtime.(*bmap).overflow (inline)
     0.18s  0.67% 85.64%      0.24s  0.89%  time.Time.Equal
     0.17s  0.63% 86.27%      0.17s  0.63%  gx/ipfs/QmcTzQXRcU2vf8yX5EEboz1BSvWC7wWmeYAKVQmhp8WZYU/sha256-simd.blockGeneric

CPU Profile Code Listing:

Total: 27.02s
ROUTINE ======================== gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers.(*ProviderManager).getProvKeys.func1 in /home/steb/projects/go/src/github.com/ipfs/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers/providers.go
         0       10ms (flat, cum) 0.037% of Total
         .          .    232:			if err != nil {
         .          .    233:				log.Warning("error decoding base32 provider key: %s: %s", parts[2], err)
         .          .    234:				continue
         .          .    235:			}
         .          .    236:
         .       10ms    237:			c, err := cid.Cast(decoded)
         .          .    238:			if err != nil {
         .          .    239:				log.Warning("error casting key to cid from datastore key: %s", err)
         .          .    240:				continue
         .          .    241:			}
         .          .    242:
ROUTINE ======================== gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers.(*ProviderManager).run in /home/steb/projects/go/src/github.com/ipfs/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers/providers.go
     210ms     19.15s (flat, cum) 70.87% of Total
         .          .    270:				log.Error("Error loading provider keys: ", err)
         .          .    271:				continue
         .          .    272:			}
         .          .    273:			now := time.Now()
         .          .    274:			for {
         .       10ms    275:				k, ok := keys()
         .          .    276:				if !ok {
         .          .    277:					break
         .          .    278:				}
         .          .    279:
         .          .    280:				provs, err := pm.getProvSet(k)
         .          .    281:				if err != nil {
         .          .    282:					log.Error("error loading known provset: ", err)
         .          .    283:					continue
         .          .    284:				}
      40ms     15.24s    285:				for p, t := range provs.set {
     170ms      3.90s    286:					if now.Sub(t) > ProvideValidity {
         .          .    287:						delete(provs.set, p)
         .          .    288:					}
         .          .    289:				}
         .          .    290:				// have we run out of providers?
         .          .    291:				if len(provs.set) == 0 {
ROUTINE ======================== gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers.NewProviderManager.func1 in /home/steb/projects/go/src/github.com/ipfs/distributions/dists/go-ipfs/gopath/src/gx/ipfs/QmdR6WN3TUEAVQ9KWE2UiFJikWTbUvgBJay6mjB4yUJebq/go-libp2p-kad-dht/providers/providers.go
         0     19.15s (flat, cum) 70.87% of Total
         .          .     68:	}
         .          .     69:	pm.providers = cache
         .          .     70:
         .          .     71:	pm.proc = goprocessctx.WithContext(ctx)
         .          .     72:	pm.cleanupInterval = defaultCleanupInterval
         .     19.15s     73:	pm.proc.Go(func(p goprocess.Process) { pm.run() })
         .          .     74:
         .          .     75:	return pm
         .          .     76:}
         .          .     77:
         .          .     78:const providersKeyPrefix = "/providers/"

You'll notice the 70% CPU time...

@Stebalien
Member Author

So basically, we need a quick fix, then the right fix. That's why I suggested just dropping provider records, as they aren't useful in bulk anyway.

@anacrolix
Contributor

Just to confirm the issue: do I understand correctly from reading the code that the hourly cleanup deserializes every provider record entirely into memory from the provider datastore, checks all the timeouts, and inserts any changes back into the store?

It does seem strange that timeouts feature so prominently everywhere. Ultimately with enough traffic/churn, something has to give. I think timeouts help remove old/quiet data, but aren't sufficient.

The mapping looks like Cid->{Peer.ID,time.Time}, done manually on a KV store. Even the simplest structured data store, such as sqlite3, would do away with all the hassle here. Pruning would be as simple as delete from providers where datetime('now')>=expiry. Visibility into providers per Cid, and pruning large Cid provider sets, would be trivial. The hacky "trim on the fly" can be hand-waved away with select peer from providers where cid=? and expiry>=datetime('now'). Since storage is not the issue and performant queries are the real issue, moving that optimization into the store would improve on what we have now, like loading an entire set into a map because writing a proper data KV query iterator is a lot of work. I understand this might not be a viable short-term solution; if not, a separate issue to create better provider storage is an option.
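
A rough sketch of what that could look like in Go with database/sql (the mattn/go-sqlite3 driver, this schema, and storing expiry as unix seconds are assumptions for illustration, not anything that exists in the repo):

```go
package providers

import (
	"database/sql"
	"time"

	_ "github.com/mattn/go-sqlite3" // assumed cgo-based driver
)

// openProviderStore creates the kind of table suggested above:
// one row per (cid, peer) with an expiry, stored here as unix seconds.
func openProviderStore(path string) (*sql.DB, error) {
	db, err := sql.Open("sqlite3", path)
	if err != nil {
		return nil, err
	}
	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS providers (
		cid    TEXT    NOT NULL,
		peer   TEXT    NOT NULL,
		expiry INTEGER NOT NULL,
		PRIMARY KEY (cid, peer)
	)`)
	return db, err
}

// pruneExpired deletes every expired record in a single statement.
func pruneExpired(db *sql.DB, now time.Time) error {
	_, err := db.Exec(`DELETE FROM providers WHERE expiry <= ?`, now.Unix())
	return err
}

// liveProviders returns only unexpired providers for a key ("trim on the fly").
func liveProviders(db *sql.DB, c string, now time.Time) ([]string, error) {
	rows, err := db.Query(`SELECT peer FROM providers WHERE cid = ? AND expiry > ?`, c, now.Unix())
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var peers []string
	for rows.Next() {
		var p string
		if err := rows.Scan(&p); err != nil {
			return nil, err
		}
		peers = append(peers, p)
	}
	return peers, rows.Err()
}
```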

@Stebalien do you have numbers for this provider set? How many entries for example?

@Stebalien
Member Author

> @Stebalien do you have numbers for this provider set? How many entries for example?

Unfortunately, not. I just know that we're spending a lot of time iterating over that map.


In terms of sqlite, we've considered this in go-ipfs as well because using a kvstore for everything is kind of painful. Unfortunately, sqlite itself requires an external library (can we statically link it?) and CFFI calls (slow in Go). However, we should still look into this and, maybe, alternative databases.

> because writing a proper data KV query iterator is a lot of work.

A proper kv iterator would actually be pretty easy. I believe we're doing it this way due to how we're caching provider records in memory.

@bpot

bpot commented Apr 12, 2019

I don't know how big the impact is, but it looks like the cleanup code only deletes expired provider records from the datastore when all providers for a key have been removed.

If only some of the providers are removed, the in-memory/cached provider set is updated, but the expired entries aren't removed from the database.

It seems like this could at least be exacerbating the current issue.
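
A hedged sketch of the fix being described: when GC expires a provider, delete the persisted record as well, not just the cached one (the store interface and key layout here are simplifications, not the real go-datastore API):

```go
package providers

import "time"

// store abstracts the backing datastore; the real code uses go-datastore
// with keys like /providers/<key>/<peer>, which this sketch simplifies.
type store interface {
	Delete(key string) error
}

// gcProviders drops expired providers from both the in-memory set and the
// backing store, instead of only touching the store once the whole set is empty.
func gcProviders(set map[string]time.Time, keyPrefix string, ds store,
	now time.Time, validity time.Duration) error {
	for p, t := range set {
		if now.Sub(t) > validity {
			delete(set, p)
			// Also remove the persisted record so that a cache miss
			// doesn't resurrect it from disk later.
			if err := ds.Delete(keyPrefix + "/" + p); err != nil {
				return err
			}
		}
	}
	return nil
}
```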

@Stebalien
Member Author

Ah. So if the cache is full, we'll repeatedly restore this set from disk. That's bad.

Stebalien added a commit to ipfs/go-datastore that referenced this issue Apr 13, 2019
@Stebalien
Member Author

Hm. So... it looks like our current GC algorithm is n^2. Let's fix that.

@whyrusleeping
Contributor

complains

(pprof) top10
Showing nodes accounting for 223.71MB, 83.96% of 266.44MB total
Dropped 184 nodes (cum <= 1.33MB)
Showing top 10 nodes out of 112
      flat  flat%   sum%        cum   cum%
  117.02MB 43.92% 43.92%   117.02MB 43.92%  github.com/libp2p/go-libp2p-kad-dht/providers.(*providerSet).setVal
   62.50MB 23.46% 67.38%   180.05MB 67.57%  github.com/libp2p/go-libp2p-kad-dht/providers.loadProvSet
    9.03MB  3.39% 70.76%     9.53MB  3.58%  github.com/whyrusleeping/yamux.newSession
    8.71MB  3.27% 74.03%     8.71MB  3.27%  github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).Get
    5.41MB  2.03% 76.06%     7.41MB  2.78%  github.com/hashicorp/golang-lru/simplelru.(*LRU).Add
    4.53MB  1.70% 77.76%     4.53MB  1.70%  github.com/gogo/protobuf/io.(*varintWriter).WriteMsg
    4.50MB  1.69% 79.45%     4.50MB  1.69%  github.com/multiformats/go-multiaddr.NewMultiaddrBytes
    4.02MB  1.51% 80.96%     4.02MB  1.51%  bufio.NewWriterSize
       4MB  1.50% 82.46%        4MB  1.50%  github.com/syndtr/goleveldb/leveldb/memdb.New
       4MB  1.50% 83.96%        9MB  3.38%  github.com/libp2p/go-libp2p-peerstore/pb.(*AddrBookRecord).Unmarshal

@Stebalien
Member Author

Stebalien commented Apr 17, 2019

@whyrusleeping which version of what? Assuming go-ipfs 0.4.20, try running go get github.com/libp2p/go-libp2p-kad-dht@master and rebuilding (this fix didn't make it into the release).

@Stebalien
Member Author

We still need to limit the size of this set, but this issue was really about a bug where we weren't garbage collecting it properly.
