Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

windows: changed method for retrieving Windows network statistics in case of awsvpc network mode #3425

Merged
merged 2 commits into from
Oct 18, 2022

Conversation

rawahars
Copy link
Contributor

@rawahars rawahars commented Oct 10, 2022

Summary

Previously, we were running a Powershell method (Get-NetAdapterStatistics) to retrieve the network statistics when the task is using awsvpc network mode. However, this can cause CPU spikes on smaller instance types.

Another method to obtain the network statistics would be access them using Win32 API. This method is significantly more performant than using Powershell cmdlet. Therefore, we are switching the agent implementation to use native Win32 API for retrieving the same.

Implementation

When we initialise the stats module for a given task, we will query and store the interface LUID of the task ENI. During the subsequent calls to retrieve the network statistics, we will query the statistics specific to the interface using it's LUID.

In order to accomplish the same, we are introducing two new APIs in NetworkUtils interface would be the preferred way to access Windows networking APIs in the agent. The two APIs are-

  • ConvertInterfaceAliasToLUID: Converts the interface alias into the corresponding LUID.
    • Internally invokes ConvertInterfaceAliasToLuid Win32 API to convert interface alias into LUID.
  • GetMIBIfEntryFromLUID: Returns the MIB_IF_ROW2 for the interface with the given LUID.
    • Internally invokes GetIfEntry2Ex Win32 API to obtain the specific interface which has given LUID.

Reference:

Note: As part of this PR, refactoring has been done to improve the code quality of existing Windows workflow.

  • Removed Setter method from networkUtils package
  • Removed Setter method from watcher package
  • Introduced mocks for networkUtils package and used the same in watcher tests instead of object injection.

Testing

Tested the new agent using a custom AMI.
Tested it out on smaller instance types for any periodic spikes.

New tests cover the changes:
Yes

Description for the changelog

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@rawahars rawahars requested a review from a team as a code owner October 10, 2022 18:28
@rawahars rawahars requested review from jterry75 and a team October 10, 2022 18:29

// We would be using GetIfTable2Ex and FreeMibTable Win32 APIs from IP Helper.
moduleIPHelper := windows.NewLazySystemDLL("iphlpapi.dll")
procGetIfTable2Ex := moduleIPHelper.NewProc("GetIfTable2Ex")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we retrieving the entire table and not the specific interface row?
https://learn.microsoft.com/en-us/windows/win32/api/netioapi/nf-netioapi-getifentry2ex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah we have been searching for this particular method all along. Unfortunately, we were not aware about the same.
I'll make the changes to use GetIfEntry2Ex API.

cancel()
return nil, errors.Wrapf(err, "failed to initialise stats task container")
}
ifaceLUID[index] = ifEntry.InterfaceLUID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still dont quite understand. Cant we just use ConvertInterfaceIndexToLuid for this and then be done? We dont need to do an entire mib table and filter it right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the thing is that we do not store interface index for the task network adapters. There can be two cases here-

  • We query the interface index first using net.Interfaces and then use ConvertInterfaceIndexToLuid in the stats module itself. This leads to two Win32 calls. Now, net.Interfaces would in turn call GetAdaptersAddresses which fetches all the addresses associated with the adapters. I feel that a single call to GetIfTable2Ex might be more performant than these two calls.
  • We store the ENI's interface index along with Link name during the initial task attachment workflow. Now, the issue here is that if the interface index becomes stale for some reason say, instance reboot, we would not have any mechanism to identify or refresh the same.

@@ -87,6 +86,7 @@ func newTestWatcher(ctx context.Context, primaryMAC string, state dockerstate.Ta
notificationChannel := make(chan int)
err := eniMonitor.Start(notificationChannel)
if err != nil {
cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine but we dont even use ctx until the &ENIWatcher below .If we did not early alloc it we would avoid this case entirely

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically move ln 84 to 92

// GetMIBIfEntryFromDeviceName return the MIB_IF_ROW2 object for the queried device.
// This method would internally invoke the GetIfTable2Ex Windows API to get complete interface list
// from which the queried device would be filtered.
func (utils *networkUtils) GetMIBIfEntryFromDeviceName(device string) (*MibIfRow2, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at my other comment using the LUID. I think we can remove this entirely.

}

// getIfEntry2Ex invokes GetIfEntry2Ex Win32 API to retrieve the MIB_IF_ROW2 for the interface with given LUID.
func getIfEntry2Ex(ifRow *MibIfRow2) (*MibIfRow2, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we add this extra function just for unit testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we did :)

}

// TestGetMIBIfEntryFromLUID tests the GetMIBIfEntryFromLUID method.
func TestGetMIBIfEntryFromLUID(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we went to all the trouble here lets add the negative test if we return error from the call too.

@@ -65,6 +65,7 @@ func newWatcher(ctx context.Context,
notificationChannel := make(chan int)
err := eniMonitor.Start(notificationChannel)
if err != nil {
cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, move ln 62 to 72 and then we dont have to handle this case.

devices := make([]string, len(taskENIs))
ifaceLUID := make([]uint64, len(taskENIs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can have multiple eni's per task?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the case with trunk ENI I think.

@@ -108,14 +85,16 @@ func TestTaskStatsCollection(t *testing.T) {
TaskMetadata: &TaskMetadata{
TaskArn: taskId,
ContainerPID: containerPID,
DeviceName: []string{"Ethernet 3"},
DeviceName: []string{deviceName},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using DeviceName? Isnt the ACS api based on mac and then we find index? Once we find Index we can get LUID. And from there we are good to go for basically all win32 api's. Why use name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that is the current workflow which is shared with the Linux. We store the stats corresponding to each device name. Reference:

networkStats, err := taskStat.retrieveNetworkStatistics()

Regarding fetching interface index, I have mentioned the same in earlier comment.

Harsh Rawat added 2 commits October 15, 2022 01:32
Presently, we have setter methods present in the networkutils and watcher package instead of having mock methods for the same. This is inconsistent with rest of the agent package and goes against Go guidelines.
Currently, we are running a Powershell method (Get-NetAdapterStatistics) to retrieve the network statistics in case of awsvpc network mode. However, this can cause CPU spikes on smaller instance types. Therefore, we are switching the method to use native Win32 API for retrieving the same.

As part of this change, we have added two new APIs to existing NetworkUtils interface applicable for Windows.
- ConvertInterfaceAliasToLUID: Returns the LUID corresponding to the given interface alias.
- GetMIBIfEntryFromLUID: Returns the MIB_IF_ROW for the specified LUID.

Windows API reference:
https://learn.microsoft.com/en-us/windows/win32/api/netioapi/nf-netioapi-convertinterfacealiastoluid
https://learn.microsoft.com/en-us/windows/win32/api/netioapi/nf-netioapi-getifentry2ex
Copy link
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@singholt singholt merged commit 8680732 into aws:dev Oct 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants