Invalid units on network page (/network) #15070
Yes. We would receive garbage bits for some specific storage metrics.
Very unlikely. The bug was caused by misconfiguration of specific metrics and is very likely not present in the network metrics we use for this plot. Also, the garbage bits would be constant, which would result in a single enormous spike in the plot (since the plot shows the differences between samples, not the samples directly). There is a pronounced constant plateau here, which wouldn't be caused by that bug.
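To make the "differences between samples" argument concrete, here is a minimal sketch. `ratesFromSamples` is a hypothetical helper written for this illustration, not Cockpit or PCP code, and the sample values are made up. It shows that a constant garbage offset appearing at some sample yields exactly one huge rate, not a plateau.

```javascript
// Hypothetical helper, not part of Cockpit: convert cumulative counter
// samples into per-interval rates by differencing neighbouring samples.
function ratesFromSamples(samples, intervalSeconds) {
    const rates = [];
    for (let i = 1; i < samples.length; i++)
        rates.push((samples[i] - samples[i - 1]) / intervalSeconds);
    return rates;
}

// Made-up data: a counter growing by 100 bytes every second...
const clean = [0, 100, 200, 300, 400, 500];
// ...and the same counter with a constant garbage offset from the third sample on.
const GARBAGE = 1e12;
const corrupted = clean.map((v, i) => (i >= 2 ? v + GARBAGE : v));

console.log(ratesFromSamples(clean, 1));
console.log(ratesFromSamples(corrupted, 1));
```

Only the interval where the offset first appears is affected; every later difference cancels the offset out again.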
Is there a specific pattern of traffic that triggers this? Can you make it happen on purpose and then record a video? We have a whole new implementation for the graphs lined up: #14913. Are you able to try out that pull request? Otherwise it would probably make sense to wait a bit for it to be released and reach you (hopefully in two or three weeks or so). |
Ask and you shall receive. My architecture is as such:
Upstream internet comes in via Realtek card and gets split internally via the Qualcomm card. If my network is mostly idling (admittedly some video streams, VPN connections and other stuff) -- I get a nice graph: However, if I spike it with Like I said, the board is far too old (and running over 1Gbe / cat6) to support that throughput :) How can I tell if the Qualcomm card is just giving bogus data? From
and in Any thoughts for how to debug this further? |
(I'm happy to test an RPM build of the PR if someone has one, but I don't really want to figure out how to build cockpit :-) |
How do the numbers from iperf3 and iftop compare to Cockpit's numbers in the "reasonable" scenario? Maybe the whole Cockpit plot is just wrong by some factor. Cockpit gets its numbers from PCP, or from its own internal source if PCP is not available. If they come from PCP, there is a
Note that Cockpit shows the traffic in bits per second, while pmval will show it in bytes per second. If there is no PCP, I'll check here as well if all the numbers agree.
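When comparing the two tools by hand, the factor of 8 is easy to forget. A small sketch of the conversion follows; both function names and the 5% tolerance are assumptions made for this example, and the numbers are illustrative rather than measured.

```javascript
// Hypothetical helpers for comparing readings by hand; not Cockpit or PCP API.
// pmval reports bytes per second, Cockpit's graph shows bits per second.
function bytesPerSecToBitsPerSec(bytesPerSec) {
    return bytesPerSec * 8;
}

// True if the Cockpit reading is within `tolerance` (a fraction, assumed 5%)
// of the pmval reading after converting bytes to bits.
function ratesAgree(pmvalBytesPerSec, cockpitBitsPerSec, tolerance = 0.05) {
    const expected = bytesPerSecToBitsPerSec(pmvalBytesPerSec);
    return Math.abs(expected - cockpitBitsPerSec) <= tolerance * expected;
}

// Illustrative numbers: 125,000,000 bytes/s corresponds to 1 Gbps.
console.log(bytesPerSecToBitsPerSec(125e6) / 1e9, "Gbps");
console.log(ratesAgree(125e6, 1e9));
console.log(ratesAgree(125e6, 2e9));
```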
It all checks out here, running iperf3 between a VM and its host. I get a peak rate of about 10 Gbits/s according to iperf3 itself, and all of iftop, pmval, the Cockpit graphs and the numerical display of Cockpit (in the "Interfaces panel") agree. Hmm. |
Here they are: https://copr.fedorainfracloud.org/coprs/mvo/pr-14913/ |
@mvollmer -- I'm guessing it is an issue in the display logic. I get:
(spike at the end is The rest of the traffic lines up well in the chart, and |
I installed the new package. Same bug.
Interestingly, if I reload the page after spiking the traffic, I get the right units again: But the recorded traffic is off, by a factor of 2ish:
Oddly, this persists -- if you look at the screenshot above, both plateaus are using the same I closed the tab and re-opened it after installing the new package, though I honestly don't see any difference in the charts :) Oddly, if I watch the chart as traffic ramps up, I notice that the units were correct (up to around 800Mbps), but as soon as it shows 1200, it switches to Gbps. I'm guessing there's a subtle logic bug somewhere: 1200Mbps is 1.2Gbps, so the units switch over, even though it is wrong. So either the numerical axis labels need to be updated to be in Gbps, or the label still needs to show Mbps until we get into higher values of Gbps. |
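The first of the two options above (keep the label and the tick numbers in the same unit) can be sketched as follows. This is a minimal illustration under assumptions: `pickUnit` and `formatAxis` are hypothetical names, not the Cockpit or flot implementation, and the factor of 1000 matches the one used elsewhere in this thread.

```javascript
// Hypothetical stand-in for a unit picker, using SI steps of 1000.
const UNITS = ["bps", "Kbps", "Mbps", "Gbps", "Tbps"];

function pickUnit(maxBitsPerSec) {
    let index = 0;
    let scaled = maxBitsPerSec;
    while (scaled >= 1000 && index < UNITS.length - 1) {
        scaled /= 1000;
        index++;
    }
    return { name: UNITS[index], divisor: Math.pow(1000, index) };
}

// Decide the unit once from the largest tick, then reuse that same unit
// for the axis label and for every tick number.
function formatAxis(tickValuesBitsPerSec) {
    const unit = pickUnit(Math.max(...tickValuesBitsPerSec));
    return {
        unit: unit.name,
        ticks: tickValuesBitsPerSec.map(v => v / unit.divisor),
    };
}

console.log(formatAxis([0, 400e6, 800e6]));
console.log(formatAxis([0, 400e6, 800e6, 1200e6]));
```

With this shape, an axis topping out at 1200 Mbps is labelled Gbps and its ticks are scaled to match, so the label and the numbers cannot drift apart.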
Yes, this makes a lot of sense. (And that's why I was asking for a video. If it's not too much trouble, it would be nice to see it in action, but since you have seen it and described the effect, I don't think I could get any more details out of a video, actually...) I'll do some code reading here and mock up some artificial traffic ramps. |
You should be seeing version 235. You need to at least log out and log in, I guess. Executing And thanks a lot for all the effort on your side for figuring this out! I think we are getting closer!
I pulled the repo as of:
and went on with building & running instructions (thanks @mvollmer!). Some observations:
#1: Using inspector, I was able to see that the data sent from the server is correct (i.e., I'm seeing the expected traffic rate in bytes/second).
#2: When I logged calls into the interesting plot functions (
Log messages of plot function calls while calculating problematic graph
These are for the y-axis values:
These are for the y-axis unit labels:
#3: When I changed the y-axis calculation to use
Log messages of plot function calls while calculating problematic graph
These are for the y-axis values:
These are for the y-axis unit label:
The patch I made in both cases was fairly simple:
export function bits_per_sec_tick_unit(axis) {
    // Here, I made the change for datamax over data:
    // const max_value = axis.datamax ? axis.datamax : axis.max
    const ret = cockpit.format_bits_per_sec(axis.max * 8, 1000, true)[1];
    console.log("bits_per_sec_tick_unit", axis, ret);
    return ret;
}
export function format_bits_per_sec_tick_no_unit(val, axis) {
    const ret = cockpit.format_bits_per_sec(val * 8, bits_per_sec_tick_unit(axis), true)[0];
    console.log("format_bits_per_sec_tick_no_unit", val, axis, ret);
    return ret;
}
export function format_bits_per_sec_tick(val, axis) {
    const ret = cockpit.format_bits_per_sec(val * 8, 1000);
    console.log("format_bits_per_sec_tick", val, axis);
    return ret;
}
To me, this says that there's an issue with the plotting library: the same value needs to be used on both the unit label and the numerical values. Since your branch didn't help (sorry!) I used the current Fedora 32 cockpit version for the video. From the loaded page, I refreshed. Waited for graphs to load. Then kicked off Sending iperf3. Watched graph. Then kicked off receiving iperf3. cockpit-network-issue.mp4
Excellent, you nailed it down perfectly, as far as I can tell. I had assumed that yaxis.max would be the same when the labels are formatted and after the plot. But flot seems to change it, probably rounding it to a nice value. Using I don't think the new code has this bug, since it is all done under our control: cockpit/pkg/lib/cockpit-components-plot.jsx Line 130 in 2e1ca3b
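A toy model of the sequence described above, for anyone following along. Everything here is an assumption made for illustration: `unitFor` is a made-up two-unit picker, the `axis` object only mimics the one field under discussion, and the two values of `max` are invented. It is not flot's or Cockpit's actual behaviour.

```javascript
// Hypothetical two-unit picker; input is bytes per second, as in the plot data.
function unitFor(maxBytesPerSec) {
    return maxBytesPerSec * 8 >= 1e9 ? "Gbps" : "Mbps";
}

// Invented value: 110e6 bytes/s is 880 Mbps at the time the ticks are formatted.
const axis = { max: 110e6 };
const unitAtTickTime = unitFor(axis.max);

// Assumed rounding step by the plotting library: max becomes 150e6 bytes/s (1200 Mbps).
axis.max = 150e6;
const unitAtLabelTime = unitFor(axis.max);

// The tick numbers were scaled for one unit, the label names another.
console.log(unitAtTickTime, unitAtLabelTime, unitAtTickTime === unitAtLabelTime);
```

Reading the unit from the axis at two different moments is what lets the numbers and the label disagree; computing it once and passing it to both formatters avoids that.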
It might have other bugs, so I would really appreciate it if you could test it (once it is released). |
#14913 has been merged to master, so it will be in this week's release, 236. I'll close this issue, thanks a lot for the investigations!
Cockpit version: cockpit-234-1.fc32.x86_64 / cockpit-networkmanager-234-1.fc32.noarch / cockpit-pcp-234-1.fc32.x86_64
OS: Fedora 32
Page: Network
Units sometimes are incorrect on the network page. See screenshot:
I'm not really sure how to reproduce it, other than sending varying amounts of traffic. :-)
The relevant devices are:
There's no way I peaked anywhere close to 1000Gbps like the chart on the left implies. I'm fairly sure the board and the processor don't have that much total bandwidth. :-)