refactor: improve peers performance #1262
Conversation
Tested it and works well!
Some comments inline to make things a little more comprehensible for the reader.
As a stretch goal, it'd be lovely to animate the addition of nodes to the map separately from discovering their location, so that we could show nodes continually popping in and fading out of the map all the time, rather than flashing changes on 3s intervals.
src/bundles/peer-locations.js (outdated diff)

    @@ -300,44 +148,65 @@ class PeerLocationResolver {
        ...opts.cache
      })

      this.peersToAddrs = {}
      this.failedAddrs = []
failedAddrs and peersToAddrs are both unbounded caches... if we need them, consider using an LRU cache with a max size.
I'm just storing them to avoid repeating the same lookup over and over after it fails. I've capped the size at 500, deleting the first 100 entries when it gets full. I don't think it's worth implementing a full LRU cache, but it follows the same principle.
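For illustration, the eviction described here could look roughly like this (a sketch based on the numbers in the comment above, not necessarily the PR's actual code):

    // Cap the list of failed addresses so repeated lookup failures
    // don't grow memory without bound.
    const MAX_SIZE = 500
    const EVICT_COUNT = 100

    function rememberFailedAddr (failedAddrs, addr) {
      if (failedAddrs.length >= MAX_SIZE) {
        // Drop the 100 oldest entries: cheaper than a real LRU,
        // but follows the same principle.
        failedAddrs.splice(0, EVICT_COUNT)
      }
      failedAddrs.push(addr)
    }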
@olizilla just made a few updates to fix and improve the things you've mentioned. I'll see what I can do about the animations in a different PR.
This is awesome, thanks so much for digging further into the root cause of the resource consumption. Is there a specific reason why three seconds is the default?
Uh, I said 3? I set it to 1. Not a specific reason, but it's much smoother now...
It was originally three and you changed it to one? I was thinking the other way... can we go to 5s or higher? ;) If we had a list of specific reasons why people need to know, it would be easier to reason about than picking arbitrary numbers. But as long as it's not burning laptop battery, 1s is fine for now!
src/bundles/peer-locations.js (outdated diff)

    const unique = new Set(allCoord.map(JSON.stringify))
    return Array.from(unique).map(JSON.parse)
    staleAfter: ms.seconds(1),
    retryAfter: ms.seconds(1),
It would be good to have the interval as a const at the top of the file, and documented.
Just added!
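A minimal sketch of what such a documented constant might look like (the constant name is an assumption; ms.seconds matches the helper visible in the diff above):

    import ms from 'milliseconds'

    // How often peer locations go stale and are re-fetched. 1s keeps
    // the map feeling live without flooding the HTTP API with requests.
    const UPDATE_EVERY = ms.seconds(1)

    // later, when configuring the cache:
    // staleAfter: UPDATE_EVERY,
    // retryAfter: UPDATE_EVERY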
Basically, the difference would only make us see the changes later and add new requests to the queue later. I don't believe it influences the amount of rendering that much, but I'm not sure. Either way, it's a simple change.
IIUC we are fetching info for 10 locations at the same time:

    // Max number of locations to retrieve concurrently
    opts.concurrency = opts.concurrency || 10

@hacdias do you remember if we picked this number for any particular reason?
Green light to merge?
@lidel the default is 10 but we are fetching 20! It was changed in #1071
Hm... I don't think this is capable of doing 20 🙃
To be specific, queue concurrency can be set to 20, but fetch calls to the HTTP API are throttled to a max of 4-6 at a time by the browser itself, and fetch calls over that limit are put in a "pending" state by the browser until existing ones finish.
When you load the Peers screen for the first time, the geoip cache gets populated fast because it's fetched from localhost, but you can observe the browser freezing requests into the "pending" state when network speed is artificially slowed down.
I propose changing the queue concurrency to 4 as a part of this PR: afaik there is no gain from anything higher than that, and by limiting it to 4 rather than 6 we give some breathing room for requests other than get. It would also help with #882.
+ two asks to simplify code below
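For illustration, a sketch of capping the queue concurrency as proposed, assuming a p-queue style queue (the queue implementation and the lookupGeoip helper are assumptions, not the bundle's actual code):

    import PQueue from 'p-queue'

    // At most 4 lookups run at once; the rest wait in the queue
    // instead of piling up as "pending" fetches in the browser.
    const queue = new PQueue({ concurrency: 4 })

    // Hypothetical stand-in for the real geoip lookup request.
    const lookupGeoip = async (addr) => {
      const res = await fetch(`/geoip?addr=${encodeURIComponent(addr)}`)
      return res.json()
    }

    const locate = (addr) => queue.add(() => lookupGeoip(addr))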
@lidel done!
Thanks!
I noticed a potential regression of the problem described in #887.
In short, geolocation of peers should be limited to the Peers screen. Geoip data should be downloaded only when the user is on the Peers screen, but right now it gets fetched even on the Status screen:
First load of Status screen in this PR (screenshot omitted)
First load of Status screen in Web UI v2.5.7 (screenshot omitted)
I did not dig into this further, but we need to address this before merging.
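One possible shape of such a gate, sketched under the assumption of a redux-bundler style route selector (all names here are illustrative, not the PR's actual fix):

    // Skip geoip work entirely unless the Peers screen is visible.
    // Assumes the store exposes a selectRouteInfo selector with the
    // current url, as redux-bundler route bundles typically do.
    function onPeersScreen (store) {
      const { url } = store.selectRouteInfo()
      return url.startsWith('/peers')
    }

    async function maybeFindLocations (store, resolver, peers) {
      if (!onPeersScreen(store)) return {}
      return resolver.findLocations(peers)
    }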
P.S. I am testing the screen with ~4k peers, so this may not be visible with a regular setup. FYI, my node's settings:

    "ConnMgr": {
      "GracePeriod": "3m",
      "HighWater": 5000,
      "LowWater": 4000,
      "Type": "basic"
    },
Both of those are now fixed @lidel!
Hm, for some reason the map and the location column are not updated now.
Can't reproduce on Firefox Dev Edition, nor on Chrome.
@lidel any update on this?
Ok, I was trying to figure out what is going wrong around 4000 peers.
Added a small mitigation in ab83def (longer brain dump below, in case it's useful).
Need to look into other things, but can circle back to this if needed.
Apart from two inline questions, lgtm.
Notes on performance with >4000 peers
CPU is still high (when run with thousands of peers): until the geoip cache is populated, the CPU load remains high at all times (only on the Peers tab).
Another thing is that it can take a long time between executions of findLocations. Sometimes 4 minutes, sometimes 30 seconds. Is this a slowdown due to CPU usage, or the effect of some trigger logic? Are we running it only when the peer list changes?
findLocations pauses around geoipCache.get
In my case, at >4000 peers, the initial call to findLocations (with an empty geoipCache) in Brave took 165442ms to finish (over 2 minutes!). That is why I did not see any dots on the map. Those appear later, after subsequent findLocations calls return values from cache, so it took over 2 minutes for me to see anything during the initial load.
If I replace geoipCache with an in-memory Map, the time to process 4k peers in findLocations drops to under 4 seconds. It seems that reading and writing concurrently to the cache backed by idb-keyval causes a periodic slowdown around await this.geoipCache.get(ipv4Addr) every 20-30 peers.
Processing a single peer should take around 1ms, but every 20-30th peer stalls for between 15 and over 20 seconds. At more than a thousand peers, this compounds into a significant delay.
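A sketch of the in-memory mitigation described above: put a Map in front of the idb-keyval backed cache so repeated reads skip IndexedDB (illustrative names, not the PR's code):

    // Read-through layer: check the Map first, fall back to the
    // slower idb-backed geoipCache only on a miss.
    const memCache = new Map()

    async function getCachedGeoip (geoipCache, ipv4Addr) {
      if (memCache.has(ipv4Addr)) return memCache.get(ipv4Addr)
      const value = await geoipCache.get(ipv4Addr) // slow path, hits idb
      if (value !== undefined) memCache.set(ipv4Addr, value)
      return value
    }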
On the UX side, I mitigated the slowness of the initial update with thousands of peers in ab83def:
This change improves performance of the initial load of the Peers screen when there are thousands of peers. During the first three times PeerLocationResolver.findLocations is called, we sort peers by latency and resolve geoip only for the closest subset. This ensures the top of the list in the UI gets updated fast while thousands of more distant peers get resolved later. Closes #1273.
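The commit's approach roughly amounts to something like this sketch (the subset size and the parseLatency helper are assumptions, not the actual code):

    // During the first few passes, resolve geoip only for the
    // lowest-latency peers so the visible top of the list updates fast.
    const FIRST_PASSES = 3
    const CLOSEST_SUBSET = 200 // assumed cap for the early passes

    // Hypothetical helper: turn a latency string like "42ms" into a number.
    const parseLatency = (str) => parseFloat(str) || Infinity

    function peersToResolve (peers, pass) {
      if (pass >= FIRST_PASSES) return peers
      return peers
        .slice() // operate on a copy, don't mutate the caller's list
        .sort((a, b) => parseLatency(a.latency) - parseLatency(b.latency))
        .slice(0, CLOSEST_SUBSET)
    }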
0e868bd to ab83def
Operate on a copy of peers and check if this.pass exists before executing pass-based optimizations
Co-Authored-By: Marcin Rataj <lidel@lidel.org>
Ok, I think this is a valuable improvement already. LGTM from me.
CPU performance in scenarios with thousands of peers can be tackled in separate PRs.
This refactors the peers locations bundle, fetching/updating new locations every one second, so any changes will only be seen after one second. This dramatically reduced the CPU usage on my end and still keeps the functionality. As stated by @olizilla in #1072.
@lidel @autonome, can you take a look at this one, please?
closes #1273