Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPAM controller implementation #5

Merged
merged 2 commits into from
May 4, 2023

Conversation

ykulazhenkov
Copy link
Collaborator

Code is not covered by test and is not tested well, but main flows should work.

cmd/ipam-controller/app/options/options.go Outdated Show resolved Hide resolved
}
nodeRange := allocatedRange{
StartIP: startIP,
EndIP: ip.NextIPWithOffset(startIP, int64(pa.cfg.PerNodeBlockSize)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will NextIPWithOffset prevent EndIP to be broadcast IP?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also you can create the allocatedRange object after Line 115

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check to avoid allocation of broadcast ip

}
}
if len(startIP) == 0 {
startIP = ip.NextIP(ip.Network(pa.cfg.Subnet).IP)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a comment that this to skip the network ip address

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code path changed a bit to cover additional use-cases

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: firs -> first

return nodeAllocationInfo{}, fmt.Errorf("endIP is incorrect ip")
}

if !subnet.Contains(nodeAllocStart) || !subnet.Contains(nodeAllocEnd) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't validate that the gateway is in the subnet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

return nodeAllocationInfo{}, fmt.Errorf("invalid allocation allocators: start or end IP is out of the subnet")
}

if ip.Cmp(nodeAllocEnd, nodeAllocStart) <= 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this validation. It seem it will fail with the ip.Distance as negative value

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, but I think it is better to keep the check and return more accurate message

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets keep this as is.

if err := r.Client.List(ctx, nodeList); err != nil {
return ctrl.Result{}, err
}
for _, n := range nodeList.Items {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will trigger the node controller Reconcile to sync again all nodes right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is designed to be able to handle each node object individually, and it is able to handle multiple node objects concurrently.
The Node controller currently runs with a single worker to avoid possible concurrency-related issues. We can increase the count of workers later when we will have good enough test coverage.

pkg/pool/pool.go Show resolved Hide resolved
@moshe010
Copy link
Contributor

moshe010 commented May 3, 2023

The anonnotaion is not as defined in the doc
yours {"my-pool":{"name":"my-pool","subnet":"192.168.0.0/16","startIP":"192.168.0.26","endIP":"192.168.0.50","gateway":"192.168.0.1"}}
and in the doc it without the name key. Why do we need the name key anyway...

@moshe010
Copy link
Contributor

moshe010 commented May 3, 2023

The offset code is adding addition ip to the block
for example for this config

     "pools": {
         "my-pool": {
             "subnet": "192.168.0.0/29",
             "perNodeBlockSize": 2,
             "gateway": "192.168.0.1"
         }

each node get range of 3 ip
node 1: start="192.168.0.1" end="192.168.0.3"
node 2: start="192.168.0.4" end="192.168.0.6"

I0503 22:16:59.739984 628030 allocator.go:117] "allocator/pool=my-pool: range allocated" controller="node" controllerGroup="" controllerKind="Node" Node="worker-node01" namespace="" name="worker-node01" reconcileID=684b8ca2-d904-47fa-905c-8f735c12872a node="worker-node01" start="192.168.0.1" end="192.168.0.3"
I0503 22:16:59.766752 628030 node.go:104] "node object updated" controller="node" controllerGroup="" controllerKind="Node" Node="worker-node01" namespace="" name="worker-node01" reconcileID=684b8ca2-d904-47fa-905c-8f735c12872a
I0503 22:16:59.767120 628030 allocator.go:117] "allocator/pool=my-pool: range allocated" controller="node" controllerGroup="" controllerKind="Node" Node="worker-node02" namespace="" name="worker-node02" reconcileID=0d051c99-8076-48f4-8cc0-be72ce99af8c node="worker-node02" start="192.168.0.4" end="192.168.0.6"

@moshe010
Copy link
Contributor

moshe010 commented May 3, 2023

when the configmap get deleted I see this message [1] in a loop. restart the controller solves it.

[1] -
E0503 22:21:50.096217 628030 configmap.go:58] "failed to get ConfigMap object from the cache" err="ConfigMap "nvidia-k8s-ipam-config" not found" controller="configmap" controllerGroup="" controllerKind="ConfigMap" ConfigMap="default/nvidia-k8s-ipam-config" namespace="default" name="nvidia-k8s-ipam-config" reconcileID=221b2639-634e-45bc-aa01-474d33e84108

@moshe010
Copy link
Contributor

moshe010 commented May 3, 2023

with this config (blockSize is 4 but I mean 5 see +1 shift issue above)
{
"pools": {
"my-pool": {
"subnet": "192.168.0.0/28",
"perNodeBlockSize": 4,
"gateway": "192.168.0.1"
}
}
}

see below [1] "worker-node02" got 192.168.0.15 which is broadcast address (start="192.168.0.11" end="192.168.0.15")

[1]
571eb2f46d8 node="master-node" start="192.168.0.1" end="192.168.0.5"
I0503 22:27:54.316226 630910 node.go:104] "node object updated" controller="node" controllerGroup="" controllerKind="Node" Node="master-node" namespace="" name="master-node" reconcileID=bf39f805-1ae8-48e4-87ac-7571eb2f46d8
I0503 22:27:54.316576 630910 allocator.go:117] "allocator/pool=my-pool: range allocated" controller="node" controllerGroup="" controllerKind="Node" Node="worker-node01" namespace="" name="worker-node01" reconcileID=5ee200ff-adb6-41c5-8d0a-a9ad7c7f2824 node="worker-node01" start="192.168.0.6" end="192.168.0.10"
I0503 22:27:54.347169 630910 node.go:104] "node object updated" controller="node" controllerGroup="" controllerKind="Node" Node="worker-node01" namespace="" name="worker-node01" reconcileID=5ee200ff-adb6-41c5-8d0a-a9ad7c7f2824
I0503 22:27:54.347544 630910 allocator.go:117] "allocator/pool=my-pool: range allocated" controller="node" controllerGroup="" controllerKind="Node" Node="worker-node02" namespace="" name="worker-node02" reconcileID=f7c6dd88-31f1-4f54-ab6b-1f7b5a8708b9 node="worker-node02" start="192.168.0.11" end="192.168.0.15"

@moshe010
Copy link
Contributor

moshe010 commented May 4, 2023

when the configmap get deleted I see this message [1] in a loop. restart the controller solves it.

[1] - E0503 22:21:50.096217 628030 configmap.go:58] "failed to get ConfigMap object from the cache" err="ConfigMap "nvidia-k8s-ipam-config" not found" controller="configmap" controllerGroup="" controllerKind="ConfigMap" ConfigMap="default/nvidia-k8s-ipam-config" namespace="default" name="nvidia-k8s-ipam-config" reconcileID=221b2639-634e-45bc-aa01-474d33e84108

So when creating again the configmap the error stoped which is good. We can just open issue to see how to handle configmap delete as this is not critical

@moshe010
Copy link
Contributor

moshe010 commented May 4, 2023

we should clean annotation from nodes when configmap is invalid.
scenario:

  1. create valid config
  2. annotation on all nodes
  3. delete config
  4. create invalid config
  5. annotation should be removed (current code keep valid configmap annotation)

@adrianchiris
Copy link
Collaborator

adrianchiris commented May 4, 2023

we should clean annotation from nodes when configmap is invalid.

CNI relies on valid configuration for all CNI calls CMD Add and CMD Del

we can keep the "last valid" annotation on node
or we need to write a caching mechanism in CNI to save it from the add call.

we may need the latter eventually, but not sure if we need it now.

@ykulazhenkov
Copy link
Collaborator Author

PR re-based, testing is in progress

@ykulazhenkov ykulazhenkov force-pushed the controller branch 2 times, most recently from 5d56133 to 63c9ff4 Compare May 4, 2023 09:05
pkg/pool/pool.go Outdated Show resolved Hide resolved
pkg/pool/pool.go Outdated Show resolved Hide resolved
pkg/pool/pool.go Outdated Show resolved Hide resolved
pkg/pool/pool.go Outdated Show resolved Hide resolved
pkg/ipam-controller/config/config.go Show resolved Hide resolved
return ctrl.Result{}, err
}

if err := r.Client.Update(ctx, node); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use patch here ? to reduce chance of error ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error is not critical. In case of error we will restart the loop for specific node only. But in general maybe it is a good idea to use Patch, I will return back to it when work on test coverage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool thx, in the past we got folks complaining on errors in controller, if we can avoid them its better.
(will save time in the long term ...)

@ykulazhenkov
Copy link
Collaborator Author

we should clean annotation from nodes when configmap is invalid. scenario:

1. create valid config

2. annotation on all nodes

3. delete config

4. create invalid config

5. annotation should be removed (current code keep valid configmap annotation)

I updated the controller behavior. Now the controller explicitly handles ConfigMap removal - simply waits for ConfigMap recreation.

Current behavior is following: in case if there is no ConfigMap or if it contains invalid configuration then controller will simply keep all existing annotations for nodes and wait for ConfigMap update/creation. I think this is the safest behavior but we can change it later.

StartIP: startIP,
EndIP: ip.NextIPWithOffset(startIP, int64(pa.cfg.PerNodeBlockSize)-1),
}
if !pa.cfg.Subnet.Contains(nodeRange.EndIP) || ip.IsBroadcast(nodeRange.EndIP, pa.cfg.Subnet) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add check that start IP is in subnet ?

far fetched scenario where startIP has wrap around to 0 and EndIP is somehow within the subnet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it looks like the code may fail if the next IP become an invalid address. WIll address this in a separate PR

return err
}
// check if r.StartIP or r.EndIP match start or end IP of the "a"
if r.StartIP.Equal(a.StartIP) || r.StartIP.Equal(a.EndIP) ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: allocatedRange can have a method "Overlaps(other *allocatedRange) bool"

}
// check if r.StartIP or r.EndIP match start or end IP of the "a"
if r.StartIP.Equal(a.StartIP) || r.StartIP.Equal(a.EndIP) ||
r.EndIP.Equal(a.StartIP) || r.EndIP.Equal(a.EndIP) {
Copy link
Collaborator

@adrianchiris adrianchiris May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since block size is the same, then an overlap means the blocks are necessarily equal.
is there a case that this is not correct ?

if so, its enough to just compare StartIP which can technically act as an absolute "block index" in the subnet

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I changed this logic. Now we check that StartIP has correct offset and simply validate that there is no duplicates registered.

@ykulazhenkov ykulazhenkov force-pushed the controller branch 2 times, most recently from b236eae to b7c00ff Compare May 4, 2023 12:45
@ykulazhenkov ykulazhenkov changed the title [WIP] IPAM controller implementation IPAM controller implementation May 4, 2023
}
allocations := pa.getAllocationsAsSlice()
for _, a := range allocations {
if allocData.allocatedRange.StartIP.Equal(a.StartIP) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here on why this is enough ?

to the naive reader it many not be apparent.

float64(pa.cfg.PerNodeBlockSize)) != 0 {
return fmt.Errorf("invalid start IP offset")
}
if ip.Distance(allocData.StartIP, allocData.EndIP) != int64(pa.cfg.PerNodeBlockSize)-1 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like adding a Wrapper of ip.IPsInRange(start, end) which will just add one to distance result
will save adding -1 in different places. (which later on folks may forget to add )

if !allocData.Gateway.Equal(pa.cfg.Gateway) {
return fmt.Errorf("gateway mismatch")
}
if math.Mod(float64(ip.Distance(ip.Network(pa.cfg.Subnet).IP, allocData.StartIP))-1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 is because we skip the network address ? (e.g for 192.168.0.0/16 , 192.168.0.0 is not allocated)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simple example:

network: 0.0.0.0
block size 10

StartIPs wil be: 1, 11, 21, 31, 41 ...
Distance per StartIP from Network IP is same
subsracting 1 will yield 0 in mod 10 (block size)

so it seems to work :).

Copy link
Collaborator

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of comments.
mainly nits so up to u if you want to address them :)

feel free to self merge.

Signed-off-by: Yury Kulazhenkov <ykulazhenkov@nvidia.com>
This package is shared by components

Signed-off-by: Yury Kulazhenkov <ykulazhenkov@nvidia.com>
@ykulazhenkov ykulazhenkov merged commit 4c5bda4 into Mellanox:main May 4, 2023
@ykulazhenkov ykulazhenkov deleted the controller branch May 8, 2023 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants