ConnectX: Review N*SQ 64B transmit performance mellanox (Rev 2) #1007
I am reading an absolutely fascinating book called Statistical Modeling: A Fresh Approach. It's about how to make models (functions) that succinctly account for the variations in data sets. The notion is that formal models can be used to capture details that would be too complex to present visually.

I am using this ConnectX-4 data set as a first modeling exercise. The idea is to make a model that accounts for how performance (`Mpps`) varies with the number of send queues (`SendQueues`).

Here is what the raw data looks like. (The points appear in groups because several different queue lengths are tested.) The exercise here is to make a series of models that try to account for the variation in `Mpps`.

### Constant

The first naive model supposes that `Mpps` is simply a constant:

```r
> m1 <- lm(Mpps ~ 1, data = d)
> coefficients(m1)
(Intercept)
   60.11094
```

This model predicts that performance will always be 60.1 Mpps. We can visualize the model as a line overlaid on the data:
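A minimal sketch of how such an overlay can be drawn with base R graphics, assuming the same data frame `d` and fitted model `m1` as above:

```r
# Sketch: overlay the constant model on the raw measurements.
# Assumes the data frame `d` (columns SendQueues, Mpps) and the model `m1` fitted above.
plot(Mpps ~ SendQueues, data = d, pch = 19,
     xlab = "Send queues", ylab = "Transmit rate (Mpps)")
abline(h = coefficients(m1)[1], col = "red")  # flat line at ~60.1 Mpps
```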
On the one hand this is useful information, but on the other hand it does not account for any of the variations in the measured values.

### Linear

The second model supposes that `Mpps` increases linearly with `SendQueues`:

```r
> m2 <- lm(Mpps ~ SendQueues, data = d)
> coefficients(m2)
(Intercept)  SendQueues
 49.6810481   0.8343916
```

This model predicts that baseline performance is 49.681 Mpps and then increases by 0.834 Mpps for each send queue. This may or may not be a step in the right direction, but it definitely does not fit the data very closely. The goodness of the fit can be quantified with a statistic called "R squared" that tells us what fraction of the variation in Mpps values is accounted for by the model. The answer here is 25%.

```r
> summary(m2)$r.squared
[1] 0.2506815
```
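To make the interpretation concrete, the model's predictions can be read off with `predict`; a small sketch using the fitted `m2` from above:

```r
# Sketch: predictions of the linear model m2 at a few queue counts.
# Each additional send queue adds the SendQueues coefficient (~0.834 Mpps)
# on top of the intercept (~49.68 Mpps).
predict(m2, newdata = data.frame(SendQueues = c(1, 4, 8)))
```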
### Segmented

The last model supposes that `Mpps` follows straight-line segments with different slopes over different ranges of `SendQueues`:

```r
> m3 <- segmented(lm(Mpps ~ SendQueues - 1, data = d), seg.Z = ~SendQueues)
```

The fit tells us that we have two lines with different slopes. Initially each send queue accounts for a 15.8 Mpps increase in performance. Later each send queue accounts for a slight decrease of 0.14 Mpps. The "break point" is after ~4 send queues.
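The break point and per-segment slopes can also be pulled directly out of the fitted object; a sketch using the `segmented` package's accessors (exact output layout may vary between package versions):

```r
# Sketch: inspect the segmented fit m3 from above.
library(segmented)  # provides segmented(), slope(), and confint() for break points
summary(m3)  # per-segment coefficients and the estimated break point
slope(m3)    # slopes of the two segments (~15.8 and ~-0.14 Mpps per queue)
confint(m3)  # confidence interval for the break point (~4 send queues)
```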
This model looks much more satisfying to me. This is quantified in the R-squared statistic, which says we are now accounting for 99.9% of the variation in `Mpps`.

### Summary

So there we have it: our best model of ConnectX-4 performance says that the first four send queues give you 15.8 Mpps each, and adding more beyond that has a slightly negative effect.

I believe that Mellanox claim the maximum packet rate for this card is around 90 Mpps. This leads me to think that there are some other factors that we could include in our tests - descriptor formats, etc. - that would produce values that don't fit this model. Future steps are to identify these factors, measure them, and update the model to account for them.

### Reflections

Just now it seems to me like statistical modeling is a promising approach to characterizing system performance. I would be very happy if network equipment like NICs were supplied with models like this to tell me what performance to expect in different configurations. I would be even happier if the constants in these models were derived directly from reproducible benchmarks on a CI system.

Maybe in the future Snabb applications could come with such models that tell you what performance to expect based on factors like software version, configuration, clock speed, number of cores, choice of NIC, etc.
I just want some clarification. This is on a single core, right? And that core is not saturated? So we can't expect a performance increase simply by using more cores, correct?
Also, very good job, both on the driver implementation and on the analysis. It's always very interesting to read material from writers enthusiastic about a subject. You clearly are and it shows. A very interesting read: interesting results and interesting ideas for the future.
@plajjan Thanks! This benchmark is an attempt to establish the performance characteristics of the NIC itself. The tests are always with a single CPU core driving all of the send queues. I have used an especially efficient special-case transmit routine ("packetblaster") to prevent a CPU bottleneck. I have also sanity-checked this by running the test at both 3.5 GHz and 2.0 GHz and not seeing a significant difference in the results. (It would be interesting to add multiple clock speeds into the experiment and then use the model to quantify any effect this may have.)
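A hypothetical sketch of how that could be expressed in the same modeling framework, assuming an extended data set `d2` with a `GHz` column (neither is part of the current benchmark data):

```r
# Sketch (hypothetical): quantify the effect of clock speed alongside send queues.
# `d2` and its `GHz` column are assumed, not part of the current data set.
m4 <- lm(Mpps ~ SendQueues * GHz, data = d2)  # interaction: per-queue slope may depend on clock
summary(m4)$r.squared
```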
I think that I should repeat this testing and modeling exercise adding a couple more factors into the tests:

- Clock speed
- Packet size
- More?

Informally it seems like clock speed does not have much effect (which would confirm that the test is not CPU-bound) and that packet size has a surprisingly large effect (e.g. a 200B packet gets only ~37 Mpps). Could be that the "packets/sec vs bytes/packet" curve is actually problematic in the sense of #1013. Have to model that to find out :-).
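A hypothetical sketch of folding packet size into the model, assuming an extended data set `d3` with a `PacketSize` column (not part of the current data set):

```r
# Sketch (hypothetical): add packet size as a second explanatory factor.
# `d3` and its `PacketSize` column are assumed; only SendQueues is varied today.
m5 <- lm(Mpps ~ SendQueues + PacketSize, data = d3)
summary(m5)$r.squared  # fraction of the variation the two factors account for
```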
This benchmark report for single-core transmit ("packetblaster") over multiple send queues supersedes #1006. It is based on a new Snabb version that performs better due to different DMA memory allocation.
These results look much clearer and simpler.
Summary: