Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Calico-Felix for Windows #1638

Merged
merged 8 commits into from
Dec 7, 2017
Merged

Conversation

nwoodmsft
Copy link
Contributor

@nwoodmsft nwoodmsft commented Nov 30, 2017

Description

This change enables Calico-Felix to be compiled and run on Windows (policy-only). There are three main work items being covered in this change:

  • Makefile is updated to support cross-compiling a Calico-Felix binary for Windows (bin/calico-felix.exe)
  • To enable compilation to succeed for Windows, there is some minor refactoring of functions/blocks (mainly from felix.go) into seperate source files with build constraints.
  • A Windows dataplane driver has been added. This provides the ability to program policies into the Windows dataplane via the Host Network Service (HNS).

Todos

  • Unit tests (full coverage)
  • Integration tests (delete as appropriate) In plan
  • Documentation (Windows deployment guidance is in-review)
  • Backport
  • Release note

Release Note

Enabled Calico-Felix to be compiled and run on Windows (policy-only)

@CLAassistant
Copy link

CLAassistant commented Nov 30, 2017

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@fasaxc fasaxc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nwoodmsft. The code generally looks in good shape and it's easy to follow. Most of the issues I spotted were to do with changes that have gone into master since you took your fork:

  • we now have a named port IP set type; for this PR, I propose that you add that to your list of match types to log and skip
  • we added health check support to Felix so it'd be good to plumb that through into the windows driver

I think I spotted a bug in the rule conversion when there are multiple IP sets to match.

The rest are suggested cleanups; I think it should be pretty trivial to avoid duplicating code in logutils for example. I'd also like to get to the bottom of the set.go issue but that doesn't need to block the merge if it's unreasonably tricky.

I wonder if we should update the README to mention the Windows support and make sure people understand what is and isn't supported.

case *proto.WorkloadEndpointRemove:
log.WithField("workloadEndpointId", msg.Id).Info("Processing WorkloadEndpointRemove")
m.pendingWlEpUpdates[*msg.Id] = nil
case *proto.IPSetDeltaUpdate:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might need to process IPSetUpdate here too. I think it might be possible in some corner cases for the calc graph to send an IPSetUpdate then a second IPSetUpdate rather than sending deltas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I hadn't seen a case (in testing) where updates came through as a IPSetUpdate instead of a IPSetDeltaUpdate, so I thought it was safe to have endpoint manager just react to the Deltas. If there are cases where IPSetUpdate might be generated instead, then we definitely need to handle this here. Added.

// endpoint id for a given endpoint ip address.
addressToEndpointId map[string]string
// read-write lock to protect access to the addressToEndpointId map during cache updates.
endpointLock sync.RWMutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this lock, unless I missed somethingOnUpdate and CompleteDeferredWork are both called from the same goroutine (the dataplane driver's main loop).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, will remove. I think there was a case in an earlier iteration of the code where we thought we needed this to be on the safe side, but it doesn't look like it's needed now.

activePolicyNames = append(activePolicyNames, workload.Tiers[0].EgressPolicies...)
} else {
if len(workload.ProfileIds) > 0 {
activePolicyNames = workload.ProfileIds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should namespace policy/profile names; you could get a profile and a policy with the same name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, i'll take a look.

var rules []*hns.ACLPolicy
if (len(policyNames) > 0) {
rules = m.policysetsDataplane.GetPolicySetRules(policyNames)
for _, rule := range rules {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there could be many rules here, might be worth wrapping this in a conditional: if log.GetLevel() >= log.DebugLevel {...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

switch msg := msg.(type) {
case *proto.ActivePolicyUpdate:
log.WithField("policyID", msg.Id).Info("Processing ActivePolicyUpdate")
m.policysetsDataplane.AddOrReplacePolicySet(msg.Id.Name, msg.Policy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you may need to namespace the profiles/policies by type to avoid clashes.

log.Debug("Reschedule kick received")
d.dataplaneNeedsSync = true
d.reschedC = nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can you port across the health reporting function. Think we added that after you took your fork. Should just be a matter of starting an extra ticker and calling HealthAggregator.Report(...). Felix now has a health endpoint for Kubernetes to poll.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1,3 +1,5 @@
// +build !windows
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can you pull out the shared function into a shared file? I think that'd be pretty easy to do for this module.

I think you could:

  • pull out a function configureSyslog that is implemented on Linux, but stubbed on Windows
  • pull out a function openLogFile that is implemented differently on each
  • share everything else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok will take a look

@@ -59,3 +59,5 @@ import:
version: 2e7ea655c10e4d4d73365f0f073b81b39cb08ee1
subpackages:
- net
- package: github.com/Microsoft/go-winio
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can you pin these to a version or a specific revision? We've found that glide can resolve floating dependencies differently depending on the state of its cache configuration so we've taken to pinning all our immediate dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do.

if len(pRule.NotSrcIpSetIds) > 0 || len(pRule.NotDstIpSetIds) > 0 {
return true
}
if pRule.NotProtocol != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

master has some new negated matches to consider for named ports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for now I've added NotSrcNamedPortIpSetIds and NotDstNamedPortIpSetIds to this list.


// msgStringer wraps an API message to customise how we stringify it. For example, it truncates
// the lists of members in the (potentially very large) IPSetsUpdate messages.
type msgStringer struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can you move this to a utility package (maybe put it in the proto package along with the protobuf definitions?) That'd let us share it between the two implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense to share this code. Done.

@fasaxc
Copy link
Member

fasaxc commented Dec 4, 2017

@nwoodmsft Please can you resolve conflicts? To resolve the glide.lock, it should just be a matter of merging glide.yaml and then re-running make update-vendor.

Resolving the glide conflict should fix the Typha pin issue picked up by Semaphore.

Copy link
Member

@fasaxc fasaxc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fasaxc fasaxc merged commit 661afe2 into projectcalico:master Dec 7, 2017
@fasaxc fasaxc added this to the Calico v3.0.0 milestone Dec 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants