Monitor performance of MpoolSelect #11233
Comments
I expanded the
The daemon logs indicate that creating the message chains takes up the most time when this happens:
There has not been any period of time with a large rise in mpool over the weekend, so I have not been able to check if this puts additional strain on MpoolSelect. Some additional extended logging when MpoolSelect was taking a lot of time:
And a second one:
Upon further time-logging of
During yesterday's standup, we discussed that it would be good to run the node with the refinements to the locking mechanism in #10865, together with the additional metrics, to see whether the PR helped with the spikes in MpoolSelect.
When running the node with additional time logging around GetActorAfter (0cf99f0), one can observe that when the MpoolSelect timing spikes, the time is being spent in TipSetState within the GetActorAfter function: lotus/chain/messagepool/provider.go, lines 107 to 121 in 37b8afd
Example:
Closing this issue, now that the initial monitoring phase has been completed:
Based on these findings, a follow-up issue has been added here: #11251, and work on optimizations has started.
The MpoolSelect method draws messages from the Lotus daemon's message pool, optimizing for maximum gas reward for the miner that might include those messages. Although it's rarely called by users other than the lotus-miner process, it is a very important method for the network as a whole.
The performance of this method can be somewhat variable, depending on (among other things):
We've historically been somewhat blind to the performance of this method, since it's pretty much only called by miners. This makes it hard to investigate reports that this method might be slow, or find ways to optimize it.
In order to do so, we should create a simple tool (perhaps in lotus-shed) that calls MpoolSelect every 30 seconds, and records how long it takes. It should also log a few other stats, such as:
We will then want to profile the method itself (if it does appear to take dangerously long in certain scenarios), and look for potential optimizations.