Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that subnets are dissociated from routers in the ManageIQ inventory when their interfaces are removed on the OSP side #182

Merged

Conversation

mansam
Copy link
Contributor

@mansam mansam commented Dec 18, 2017

Defaults the network_router field on subnets to nil during refresh. If the interface linking subnet to router still exists on the OSP side, the association will be restored while updating ports.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1449260

# Always default this to nil so that if a interface that connects this
# to a router are removed, then the association is properly broken when
# the subnet is refreshed.
subnet.network_router = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just quick question @mansam, where is the code which connects subnet to a network router?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, this might not be the best. I think that targeted refresh would be then nullifying the relation (unless it always refreshes together with the interface)

Best would be to just define dependent => :nullify on the relation? That way if router is deleted, the relation disappears. But we probably can't do that on the network_port? Hm we could do custom disconnect method that would nullify it when port is being deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like we could do dependent => :nullify since the port is the thing that gets deleted, but the relationship that has to change is the subnet -> router relationship. I looked into using a custom destroy hook, but save_inventory_multi uses delete so the destroy hook never gets called.

Copy link
Contributor

@Ladas Ladas Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, save_inventory_multi is for old refresh (that doesn't have targeted refresh)

for having custom delete method you need to specify it like https://github.com/Ladas/manageiq/blob/0815e847bd63282129eee90869bf9bf321c0224d/app/models/manager_refresh/inventory_collection_default.rb#L7

then it expects the instance method on the model, https://github.com/Ladas/manageiq/blob/2a1d3e3a5941a74aafe65663547047b255031ba1/app/models/vm_or_template.rb#L648

it "removes subnet link to router if the interface is deleted" do
setup_mocked_collector

EmsRefresh.refresh(@ems)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will also need targeted refresh spec, checking refresh of the subnet is not nullifying the relation

@mansam mansam force-pushed the disconnect-subnets-from-routers-during-refresh branch from 4f7d63d to 3b2706e Compare January 9, 2018 16:21
def disconnect_port
# Some ports link subnets to routers, so
# sever that association if the port is removed
cloud_subnets.each do |subnet|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These might be the wrong subnets? I think we fill these in only for Vm's network ports?

We should look at belongs_to :device, :polymorphic => true , that relations for subnets is being fixed in https://github.com/ManageIQ/manageiq-providers-openstack/pull/188/files (could you verify it actually returns subnet here?)

Also we should check if we can conventionalize this by subnet.network_router = nil, only if the router is the router mentioned in the NetowrkPort (we might not store this). To avoid nullifying when the subnet was just connected somewhere else.

Hm, this approach is getting more complex than it should.

Copy link
Contributor

@Ladas Ladas Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should even remodel the relation, to not use subnet.network_router, but to just use the network_port and sending the query through it (i wound need to check how this affects other providers)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The router gets returned out of find_device_object correctly there. Both parser.network_ports and find_device_object use the network_port.fixed_ips to associate subnets so I was thinking those were the correct subnets. The tests pass correctly, at least. (though perhaps this catches unrelated subnets? I don't have a good mental model of all the circumstances in which subnets get attached to ports and routers.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right so there is NetworkPort which is interface of a VM (Vm is the device) and there is a NetworkPort that is interface of router. I think the network_port.cloud_subnets works only for Vm's interfaces.

Copy link
Contributor Author

@mansam mansam Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does appear to work in my test case for this PR, so that must not be the case that it only works on ports that are for VMs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, the spec looks correct. So it must work :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though a spec case where this might break is that the targeted refresh contains both removing of old interface and adding new one. That might cause CloudSubnet.first.network_router_id to be nil, while it should point to a new router. Could you add this spec scenario?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'll write another spec.

Copy link
Contributor Author

@mansam mansam Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrote a spec for the situation where the port is removed and a new one is created in the same refresh. network_router_id becomes nil as you suspected, but I don't understand why exactly.

I inspected what was happening when the custom delete is called, and it looks like at that time the replacement NetworkPort doesn't exist yet. So if the old NetworkPort is the only one in the database at the time, wouldn't the new NetworkPort have to be created afterwards and set the id again?


it "targeted refresh shouldn't remove subnet link to router unless the interface is deleted" do
setup_mocked_collector
::Settings.ems_refresh.openstack_network.allow_targeted_refresh = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, the ::Settings.ems_refresh.openstack_network.allow_targeted_refresh = false might bite us, since specs run in parallel in CI. So we should use the stub that is isolated to 1 spec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direct assignment does work, but you're right that there's risk of cross contamination since the tests are in parallel. I'll adjust to use the settings stub.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, probably do another cleanup PR, since it's in all specs

Copy link
Contributor

@Ladas Ladas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, Nice specs

@mansam mansam force-pushed the disconnect-subnets-from-routers-during-refresh branch from 2303d7a to 6755c7f Compare January 9, 2018 21:48
@mansam
Copy link
Contributor Author

mansam commented Jan 9, 2018

This comment got buried by github after I touched the code the review was on, so I am repeating it here:

@Ladas I wrote a spec for the situation where the port is removed and a new one is created in the same refresh. network_router_id becomes nil as you suspected, but I don't understand why exactly.

I inspected what was happening when the custom delete is called, and it looks like at that time the replacement NetworkPort doesn't exist yet. So if the old NetworkPort is the only one in the database at the time, wouldn't the new NetworkPort have to be created afterwards and set the id again?

The timing is confusing to me, and if this isn't possible to solve with a custom delete method then I'm pretty stumped on how to fix this bug.

@ManageIQ ManageIQ deleted a comment from miq-bot Jan 9, 2018
@Ladas
Copy link
Contributor

Ladas commented Jan 9, 2018

@mansam right, so this is because of the ordering in the graph. The subnet table is processed first and is storing the new relation to new router. After that, the network_ports are processed, where part of it we delete the network port and touch a subnet that was already saved correctly.

# Some ports link subnets to routers, so
# sever that association if the port is removed
cloud_subnets.each do |subnet|
subnet.network_router = nil
Copy link
Contributor

@Ladas Ladas Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to correctly cover the latest spec. We would need info on network_port, to what router it relates to (ems_ref). Then here, we could check that if it's still the same router, we can nullify it. If it's a different router we should not touch it.

But it's kind of clunky (and it will not work in parallel). I think in the end, we should look into how to change the modeling. So that 1 model doesn't have to touch relations of other models. So e.g. doing the actual relation through NetworkPort and deleting the subnet.network_router foreign key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subnet gets network_router set on it as part of parsing network ports. I take your comment about timing mean that the whole inventory is collected and parsed, and then subnets are saved before network ports, resulting in the property being cleared?

We can't compare whether the router is the same, because the user could delete and recreate the same router relationship. That would shrink the edge case though. I agree that the appropriate long term solution is to fix the DB modeling, but a fix that touches the DB is probably too high impact for this part of the release cycle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't compare whether the router is the same, because the user could delete and recreate the same router relationship

@mansam @aufi that sounds like very rare, race condition kind of situation that we can fix with the more elegant solution involving DB Schema change. Would the current simpler fix cover 99.99% of use cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, just comparing will probably get to 99% cases being correct. And the 1% will be fixed by next full refresh.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also just set the test as pending, the current state is like 95% correct, where the 5% will be fixed by full refresh. Setting/changing router interfaces is a very rare operation, comparing to others, so it should not be an issue.

@mansam mansam force-pushed the disconnect-subnets-from-routers-during-refresh branch from 6755c7f to 4d147e9 Compare January 12, 2018 17:14
@ManageIQ ManageIQ deleted a comment from miq-bot Jan 12, 2018
@mansam mansam force-pushed the disconnect-subnets-from-routers-during-refresh branch from 4d147e9 to 81dbf4e Compare January 12, 2018 17:25
@miq-bot
Copy link
Member

miq-bot commented Jan 12, 2018

Checked commits mansam/manageiq-providers-openstack@22febb8~...81dbf4e with ruby 2.3.3, rubocop 0.52.0, haml-lint 0.20.0, and yamllint 1.10.0
3 files checked, 0 offenses detected
Everything looks fine. 👍

@ManageIQ ManageIQ deleted a comment from miq-bot Jan 12, 2018
@Ladas
Copy link
Contributor

Ladas commented Jan 12, 2018

@mansam the CI failure looks unrelated

@aufi
Copy link
Member

aufi commented Jan 16, 2018

The code looks good to me, I'm afraid of failing tests (re-triggered test and still failing).

@Ladas
Copy link
Contributor

Ladas commented Jan 16, 2018

@aufi the failing tests are being fixed in ManageIQ/manageiq#16824, I think you can merge all PRs, that have only these 3 failures

@aufi
Copy link
Member

aufi commented Jan 16, 2018

@Ladas Ok, thanks for info. LGTM, merging!

@aufi aufi merged commit 5acca17 into ManageIQ:master Jan 16, 2018
@aufi aufi added this to the Sprint 78 Ending Jan 29, 2018 milestone Jan 16, 2018
simaishi pushed a commit that referenced this pull request Jan 16, 2018
…uring-refresh

Ensure that subnets are dissociated from routers in the ManageIQ inventory when their interfaces are removed on the OSP side
(cherry picked from commit 5acca17)

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1535155
@simaishi
Copy link
Contributor

Gaprindashvili backport details:

$ git log -1
commit fcb6a7bb6297e9893fab361aa7d243577ebdf320
Author: Marek Aufart <aufi.cz@gmail.com>
Date:   Tue Jan 16 17:04:38 2018 +0100

    Merge pull request #182 from mansam/disconnect-subnets-from-routers-during-refresh
    
    Ensure that subnets are dissociated from routers in the ManageIQ inventory when their interfaces are removed on the OSP side
    (cherry picked from commit 5acca17ce02c444fb77812b777ee74c92d7b4de2)
    
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1535155

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants