refactor: improve peers performance #1262
Conversation
Tested it and works well!
Some comments inline to make things a little more comprehensible for the reader.
As a stretch goal, it'd be lovely to animate the addition of nodes to the map separately from discovering their location, so that we could show nodes continually popping in and fading out of the map all the time, rather than flashing changes on 3s intervals.
src/bundles/peer-locations.js (outdated diff)

    @@ -300,44 +148,65 @@ class PeerLocationResolver {
        ...opts.cache
      })

      this.peersToAddrs = {}
      this.failedAddrs = []
failedAddrs and peersToAddrs are both unbounded caches... if we need them, consider using an LRU cache with a max size.
I'm just storing them to avoid repeating the same lookup over and over after it fails. I've capped the size at 500, deleting the first 100 entries when it gets full. I don't think it's worth implementing a full LRU cache, but it follows the same principle.
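For illustration, the eviction described here could look roughly like this (a sketch based on the numbers in the comment above, not necessarily the PR's actual code):

    // Cap the list of failed addresses so repeated lookup failures
    // don't grow memory without bound.
    const MAX_SIZE = 500
    const EVICT_COUNT = 100

    function rememberFailedAddr (failedAddrs, addr) {
      if (failedAddrs.length >= MAX_SIZE) {
        // Drop the 100 oldest entries: cheaper than a real LRU,
        // but follows the same principle.
        failedAddrs.splice(0, EVICT_COUNT)
      }
      failedAddrs.push(addr)
    }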
@olizilla just made a few updates to fix and improve the things you've mentioned. I'll see what I can do about the animations in a different PR.
This is awesome, thanks so much for digging further into the root cause of the resource consumption. Is there a specific reason why three seconds is the default?
Uh, I said 3? I set it to 1. Not a specific reason, but it's much smoother now...
It was originally three and you changed it to one? I was thinking the other way... can we go to 5s or higher? ;) If we had a list of specific reasons why people need to know, it would be easier to reason about than picking arbitrary numbers. But as long as it's not burning laptop battery, 1s is fine for now!
src/bundles/peer-locations.js (outdated diff)

    const unique = new Set(allCoord.map(JSON.stringify))
    return Array.from(unique).map(JSON.parse)
    staleAfter: ms.seconds(1),
    retryAfter: ms.seconds(1),
It would be good to have the interval as a const at the top of the file, and documented.
Just added!
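A minimal sketch of what such a documented constant might look like (the constant name is an assumption; ms.seconds matches the helper visible in the diff above):

    import ms from 'milliseconds'

    // How often peer locations go stale and are re-fetched. 1s keeps
    // the map feeling live without flooding the HTTP API with requests.
    const UPDATE_EVERY = ms.seconds(1)

    // later, when configuring the cache:
    // staleAfter: UPDATE_EVERY,
    // retryAfter: UPDATE_EVERY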
Basically, the difference would only make us see the changes later and add new requests to the queue later. I don't believe it influences the amount of rendering that much, but I'm not sure. Either way, it's a simple change.
IIUC we are fetching info for 10 locations at the same time:

    // Max number of locations to retrieve concurrently
    opts.concurrency = opts.concurrency || 10

@hacdias do you remember if we picked this number for any particular reason?
Green light to merge?
@lidel the default is 10 but we are fetching 20! It was changed in #1071
Hm... I don't think this is capable of doing 20 🙃
To be specific, queue concurrency can be set to 20, but fetch calls to the HTTP API are throttled to a max of 4-6 at a time by the browser itself, and fetch calls over that limit are put in a "pending" state by the browser until existing ones finish.
When you load the Peers screen for the first time, the geoip cache gets populated fast because it's fetched from localhost, but you can observe the browser freezing requests into the "pending" state when network speed is artificially slowed down.
I propose changing the queue concurrency to 4 as a part of this PR: afaik there is no gain from anything higher than that, and by limiting it to 4 rather than 6 we give some breathing room for requests other than get. It would also help with #882.
+ two asks to simplify code below
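For illustration, a sketch of capping the queue concurrency as proposed, assuming a p-queue style queue (the queue implementation and the lookupGeoip helper are assumptions, not the bundle's actual code):

    import PQueue from 'p-queue'

    // At most 4 lookups run at once; the rest wait in the queue
    // instead of piling up as "pending" fetches in the browser.
    const queue = new PQueue({ concurrency: 4 })

    // Hypothetical stand-in for the real geoip lookup request.
    const lookupGeoip = async (addr) => {
      const res = await fetch(`/geoip?addr=${encodeURIComponent(addr)}`)
      return res.json()
    }

    const locate = (addr) => queue.add(() => lookupGeoip(addr))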
@lidel done!
Thanks!
I noticed a potential regression of the problem described in #887.
In short, geolocation of peers should be limited to the Peers screen. Geoip data should be downloaded only when the user is on the Peers screen, but right now it gets fetched even on the Status screen:
First load of Status screen in this PR (screenshot omitted)
First load of Status screen in Web UI v2.5.7 (screenshot omitted)
I did not dig into this further, but we need to address this before merging.
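One possible shape of such a gate, sketched under the assumption of a redux-bundler style route selector (all names here are illustrative, not the PR's actual fix):

    // Skip geoip work entirely unless the Peers screen is visible.
    // Assumes the store exposes a selectRouteInfo selector with the
    // current url, as redux-bundler route bundles typically do.
    function onPeersScreen (store) {
      const { url } = store.selectRouteInfo()
      return url.startsWith('/peers')
    }

    async function maybeFindLocations (store, resolver, peers) {
      if (!onPeersScreen(store)) return {}
      return resolver.findLocations(peers)
    }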
P.S. I am testing the screen with ~4k peers, so this may not be visible with a regular setup. FYI, my node's settings:

    "ConnMgr": {
      "GracePeriod": "3m",
      "HighWater": 5000,
      "LowWater": 4000,
      "Type": "basic"
    },
Both of those are now fixed @lidel!
Hm, for some reason the map and the location column are not updated now.
Can't reproduce on Firefox Dev Edition, nor on Chrome.
@lidel any update on this?
Ok, I was trying to figure out what is going wrong around 4000 peers.
Added a small mitigation in ab83def (longer brain dump below, in case it's useful).
Need to look into other things, but can circle back to this if needed.
Apart from two inline questions, lgtm.
Notes on performance with >4000 peers
CPU is still high (when run with thousands of peers): until the geoip cache is populated, the CPU load remains high at all times (only on the Peers tab).
Another thing is that it can take a long time between executions of findLocations. Sometimes 4 minutes, sometimes 30 seconds. Is this a slowdown due to CPU usage, or the effect of some trigger logic? Are we running it only when the peer list changes?
findLocations pauses around geoipCache.get
In my case, at >4000 peers, the initial call to findLocations (with an empty geoipCache) in Brave took 165442ms to finish (over 2 minutes!). That is why I did not see any dots on the map. Those appear later, after subsequent findLocations calls return values from cache, so it took over 2 minutes for me to see anything during the initial load.
If I replace geoipCache with an in-memory Map, the time to process 4k peers in findLocations drops to under 4 seconds. It seems that reading and writing concurrently to the cache backed by idb-keyval causes a periodic slowdown around await this.geoipCache.get(ipv4Addr) every 20-30 peers.
Processing a single peer should take around 1ms, but every 20-30th peer stalls for between 15 and over 20 seconds. At more than a thousand peers, this compounds into a significant delay.
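A sketch of the in-memory mitigation described above: put a Map in front of the idb-keyval backed cache so repeated reads skip IndexedDB (illustrative names, not the PR's code):

    // Read-through layer: check the Map first, fall back to the
    // slower idb-backed geoipCache only on a miss.
    const memCache = new Map()

    async function getCachedGeoip (geoipCache, ipv4Addr) {
      if (memCache.has(ipv4Addr)) return memCache.get(ipv4Addr)
      const value = await geoipCache.get(ipv4Addr) // slow path, hits idb
      if (value !== undefined) memCache.set(ipv4Addr, value)
      return value
    }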
On the UX side, I mitigated the slowness of the initial update with thousands of peers in ab83def:
This change improves performance of the initial load of the Peers screen when there are thousands of peers. During the first three times PeerLocationResolver.findLocations is called, we sort peers by latency and resolve geoip only for the closest subset. This ensures the top of the list in the UI gets updated fast while thousands of more distant peers get resolved later. Closes #1273.
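The commit's approach roughly amounts to something like this sketch (the subset size and the parseLatency helper are assumptions, not the actual code):

    // During the first few passes, resolve geoip only for the
    // lowest-latency peers so the visible top of the list updates fast.
    const FIRST_PASSES = 3
    const CLOSEST_SUBSET = 200 // assumed cap for the early passes

    // Hypothetical helper: turn a latency string like "42ms" into a number.
    const parseLatency = (str) => parseFloat(str) || Infinity

    function peersToResolve (peers, pass) {
      if (pass >= FIRST_PASSES) return peers
      return peers
        .slice() // operate on a copy, don't mutate the caller's list
        .sort((a, b) => parseLatency(a.latency) - parseLatency(b.latency))
        .slice(0, CLOSEST_SUBSET)
    }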
0e868bd to ab83def
Operate on a copy of peers and check if this.pass exists before executing pass-based optimizations
Co-Authored-By: Marcin Rataj <lidel@lidel.org>
Ok, I think this is a valuable improvement already. LGTM from me.
CPU performance in scenarios with thousands of peers can be tackled in separate PRs.
This refactors the peers locations bundle, fetching/updating new locations every one second, so any changes will only be seen after one second. This dramatically reduced the CPU usage on my end and still keeps the functionality. As stated by @olizilla in #1072.
@lidel @autonome, can you take a look at this one, please?
closes #1273