Redid handling of link down to ensure tear down of old path #583

Open
wants to merge 9 commits into base: master

Conversation

@Ktmi Ktmi commented Dec 4, 2024

Closes #517, #575

Summary

Replaces the procedure for handling of link down events to ensure that the old_path is torn down before fully committing changes to the database.

Local Tests

This still needs some tweaking, but the general shape of the solution is here. Will have to develop a test harness which can replicate the old issue reliably.

E2E Tests

Here's the latest E2E test report:

kytos-1  | Starting enhanced syslogd: rsyslogd.
kytos-1  | /etc/openvswitch/conf.db does not exist ... (warning).
kytos-1  | Creating empty database /etc/openvswitch/conf.db.
kytos-1  | Starting ovsdb-server.
kytos-1  | rsyslogd: error during config processing: omfile: chown for file '/var/log/syslog' failed: Operation not permitted [v8.2302.0 try https://www.rsyslog.com/e/2207 ]
kytos-1  | Configuring Open vSwitch system IDs.
kytos-1  | Starting ovs-vswitchd.
kytos-1  | Enabling remote OVSDB managers.
kytos-1  | There is no NAPPS_PATH specified. Default will be used.
kytos-1  | + '[' -z '' ']'
kytos-1  | + '[' -z '' ']'
kytos-1  | + echo 'There is no NAPPS_PATH specified. Default will be used.'
kytos-1  | + NAPPS_PATH=
kytos-1  | + sed -i 's/STATS_INTERVAL = 60/STATS_INTERVAL = 7/g' /var/lib/kytos/napps/kytos/of_core/settings.py
kytos-1  | + sed -i 's/CONSISTENCY_MIN_VERDICT_INTERVAL =.*/CONSISTENCY_MIN_VERDICT_INTERVAL = 60/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LINK_UP_TIMER = 10/LINK_UP_TIMER = 1/g' /var/lib/kytos/napps/kytos/topology/settings.py
kytos-1  | + sed -i 's/DEPLOY_EVCS_INTERVAL = 60/DEPLOY_EVCS_INTERVAL = 5/g' /var/lib/kytos/napps/kytos/mef_eline/settings.py
kytos-1  | + sed -i 's/LLDP_LOOP_ACTIONS = \["log"\]/LLDP_LOOP_ACTIONS = \["disable","log"\]/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/LLDP_IGNORED_LOOPS = {}/LLDP_IGNORED_LOOPS = {"00:00:00:00:00:00:00:01": \[\[4, 5\]\]}/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/CONSISTENCY_COOKIE_IGNORED_RANGE =.*/CONSISTENCY_COOKIE_IGNORED_RANGE = [(0xdd00000000000000, 0xdd00000000000009)]/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LIVENESS_DEAD_MULTIPLIER =.*/LIVENESS_DEAD_MULTIPLIER = 3/g' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + kytosd --help
kytos-1  | + sed -i s/WARNING/INFO/g /etc/kytos/logging.ini
kytos-1  | + test -z ''
kytos-1  | + TESTS=tests/
kytos-1  | + test -z ''
kytos-1  | + RERUNS=2
kytos-1  | + python3 scripts/wait_for_mongo.py
kytos-1  | Trying to run hello command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Ran 'hello' command on MongoDB successfully. It's ready!
kytos-1  | + python3 -m pytest tests/ --reruns 2 -r fEr
kytos-1  | ============================= test session starts ==============================
kytos-1  | platform linux -- Python 3.11.2, pytest-8.1.1, pluggy-1.5.0
kytos-1  | rootdir: /tests
kytos-1  | plugins: rerunfailures-13.0, timeout-2.2.0, anyio-4.3.0
kytos-1  | collected 279 items
kytos-1  | 
kytos-1  | tests/test_e2e_01_kytos_startup.py ..                                    [  0%]
kytos-1  | tests/test_e2e_05_topology.py ..................                         [  7%]
kytos-1  | tests/test_e2e_06_topology.py ....                                       [  8%]
kytos-1  | tests/test_e2e_10_mef_eline.py ..........ss.....x.........RRF........... [ 22%]
kytos-1  | .RRFRRF                                                                  [ 23%]
kytos-1  | tests/test_e2e_11_mef_eline.py ......                                    [ 25%]
kytos-1  | tests/test_e2e_12_mef_eline.py .....Xx.                                  [ 28%]
kytos-1  | tests/test_e2e_13_mef_eline.py ....Xs.s.....Xs.s.XXxX.xxxx..X........... [ 43%]
kytos-1  | .                                                                        [ 43%]
kytos-1  | tests/test_e2e_14_mef_eline.py xRRFRRFRRF                                [ 45%]
kytos-1  | tests/test_e2e_15_mef_eline.py .....                                     [ 46%]
kytos-1  | tests/test_e2e_16_mef_eline.py ..                                        [ 47%]
kytos-1  | tests/test_e2e_17_mef_eline.py ...                                       [ 48%]
kytos-1  | tests/test_e2e_20_flow_manager.py ..........................             [ 58%]
kytos-1  | tests/test_e2e_21_flow_manager.py ...                                    [ 59%]
kytos-1  | tests/test_e2e_22_flow_manager.py ...............                        [ 64%]
kytos-1  | tests/test_e2e_23_flow_manager.py ..............                         [ 69%]
kytos-1  | tests/test_e2e_30_of_lldp.py .R...                                       [ 70%]
kytos-1  | tests/test_e2e_31_of_lldp.py ...                                         [ 72%]
kytos-1  | tests/test_e2e_32_of_lldp.py ...                                         [ 73%]
kytos-1  | tests/test_e2e_40_sdntrace.py ................                           [ 78%]
kytos-1  | tests/test_e2e_41_kytos_auth.py ........                                 [ 81%]
kytos-1  | tests/test_e2e_42_sdntrace.py ..                                         [ 82%]
kytos-1  | tests/test_e2e_50_maintenance.py ............................            [ 92%]
kytos-1  | tests/test_e2e_60_of_multi_table.py .....                                [ 94%]
kytos-1  | tests/test_e2e_70_kytos_stats.py ........                                [ 97%]
kytos-1  | tests/test_e2e_80_pathfinder.py ss......                                 [100%]
kytos-1  | 
kytos-1  | =================================== FAILURES ===================================
...
kytos-1  | =========================== rerun test summary info ============================
kytos-1  | RERUN test_e2e_10_mef_eline.py::TestE2EMefEline::test_150_current_path_value_given_dynamic_backup_path_and_empty_primary_conditions
kytos-1  | RERUN test_e2e_10_mef_eline.py::TestE2EMefEline::test_150_current_path_value_given_dynamic_backup_path_and_empty_primary_conditions
kytos-1  | RERUN test_e2e_10_mef_eline.py::TestE2EMefEline::test_300_inter_evc_dynamic_uni_status
kytos-1  | RERUN test_e2e_10_mef_eline.py::TestE2EMefEline::test_300_inter_evc_dynamic_uni_status
kytos-1  | RERUN test_e2e_10_mef_eline.py::TestE2EMefEline::test_301_inter_evc_static_uni_status
kytos-1  | RERUN test_e2e_10_mef_eline.py::TestE2EMefEline::test_301_inter_evc_static_uni_status
kytos-1  | RERUN test_e2e_14_mef_eline.py::TestE2EMefEline::test_010_redeploy_avoid_vlan
kytos-1  | RERUN test_e2e_14_mef_eline.py::TestE2EMefEline::test_010_redeploy_avoid_vlan
kytos-1  | RERUN test_e2e_14_mef_eline.py::TestE2EMefEline::test_015_redeploy_avoid_primary_path
kytos-1  | RERUN test_e2e_14_mef_eline.py::TestE2EMefEline::test_015_redeploy_avoid_primary_path
kytos-1  | RERUN test_e2e_14_mef_eline.py::TestE2EMefEline::test_020_redeploy_avoid_vlan_static_path
kytos-1  | RERUN test_e2e_14_mef_eline.py::TestE2EMefEline::test_020_redeploy_avoid_vlan_static_path
kytos-1  | RERUN test_e2e_30_of_lldp.py::TestE2EOfLLDP::test_010_disable_of_lldp
kytos-1  | =========================== short test summary info ============================
kytos-1  | FAILED tests/test_e2e_10_mef_eline.py::TestE2EMefEline::test_150_current_path_value_given_dynamic_backup_path_and_empty_primary_conditions
kytos-1  | FAILED tests/test_e2e_10_mef_eline.py::TestE2EMefEline::test_300_inter_evc_dynamic_uni_status
kytos-1  | FAILED tests/test_e2e_10_mef_eline.py::TestE2EMefEline::test_301_inter_evc_static_uni_status
kytos-1  | FAILED tests/test_e2e_14_mef_eline.py::TestE2EMefEline::test_010_redeploy_avoid_vlan
kytos-1  | FAILED tests/test_e2e_14_mef_eline.py::TestE2EMefEline::test_015_redeploy_avoid_primary_path
kytos-1  | FAILED tests/test_e2e_14_mef_eline.py::TestE2EMefEline::test_020_redeploy_avoid_vlan_static_path
kytos-1  | = 6 failed, 250 passed, 8 skipped, 8 xfailed, 7 xpassed, 1369 warnings, 13 rerun in 11466.55s (3:11:06) =

kytos-1 exited with code 1

@Ktmi Ktmi requested a review from a team as a code owner December 4, 2024 16:48
Member

@viniarck viniarck left a comment

@Ktmi I read in the description that you're still working on it, and there are TODOs in the code. Overall, it's headed in the right direction. First and foremost, we definitely need a local way of reproducing this, either deterministically or with high likelihood, so we can assess with concrete data how it performs in terms of correctness and failover convergence (control plane and data plane switchover flow performance). For the e2e test, we may not need an exact reproduction, but one that closely simulates a double failure, plus other cases like failing both current_path and failover_path when they're not fully disjoint.

Other than that, let me know when you need a final review. I did a partial review covering some things that were a bit obvious; for other parts I won't assume their status or what happened, since I don't know if you'll still address them based on what we discussed.
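
For the non-disjoint case mentioned above, a test could first pick a link that both paths share and then take it down. The following is only a minimal sketch: it assumes paths are lists of link dicts carrying an `id` key, as in the mef_eline API responses, and `shared_link_ids` is a hypothetical helper that is not part of mef_eline.

```python
def shared_link_ids(current_path, failover_path):
    """Return IDs of links used by both paths, in current_path order."""
    failover_ids = {link["id"] for link in failover_path}
    return [link["id"] for link in current_path if link["id"] in failover_ids]


# Hypothetical paths; real ones would come from GET /mef_eline/v2/evc/<id>.
current_path = [{"id": "link-a"}, {"id": "link-b"}]
failover_path = [{"id": "link-c"}, {"id": "link-b"}]

# Taking down any of these links fails both paths at once.
print(shared_link_ids(current_path, failover_path))  # ['link-b']
```

An empty result means the two paths are fully disjoint, so a double failure would need two separate link down events.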

Ktmi commented Dec 5, 2024

Looking deeper into the code here, the old system wasn't ever making the vlans available when removing the old_path.

@Alopalao Alopalao self-requested a review December 5, 2024 21:01
viniarck commented Dec 6, 2024

> Looking deeper into the code here, the old system wasn't ever making the vlans available when removing the old_path.

Good finding @Ktmi. Looking at the git history, up until 2023.2.7 it was using evc.remove_path_flows(evc.old_path), which would send flow mods via a request and make the s_vlans available. On 2024.1.+ this regressed when it was refactored to use events for flow deletions, and making the vlans available was indeed left behind. I've also confirmed this newer issue on the master version:

intf_3 below had a failover_path over it, and intf_4 only the current_path. First, I simulated a link down on failover_path, so it got removed, but s_vlan 1 wasn't made available. intf_4 behaved as expected.

kytos $> intf_3.available_tags
Out[5]: {'vlan': [[2, 3798], [3800, 4095]]}

kytos $> intf_4.available_tags
Out[7]: {'vlan': [[1, 3798], [3800, 4095]]}

On 2023.2.X versions though we still had issues with old_path and how it was accessed, so in certain cases it could leak too, but in newer versions it regressed more.
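
To make the console output above easier to read, here is a small sketch of how the `available_tags` ranges can be checked and how releasing a tag would change them. Both `is_tag_available` and `release_tag` are hypothetical helpers written for illustration; they are not the topology or mef_eline implementation, and they assume ranges are inclusive `[start, end]` pairs as shown in the output.

```python
def is_tag_available(available_tags, tag_type, value):
    """Return True if value falls inside any inclusive [start, end] range."""
    return any(
        start <= value <= end for start, end in available_tags.get(tag_type, [])
    )


def release_tag(ranges, value):
    """Return new sorted ranges with value made available, merging neighbours."""
    merged = []
    for start, end in sorted(ranges + [[value, value]]):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


# Values copied from the console output above.
intf_3 = {"vlan": [[2, 3798], [3800, 4095]]}
intf_4 = {"vlan": [[1, 3798], [3800, 4095]]}

print(is_tag_available(intf_3, "vlan", 1))  # False: VLAN 1 is missing on intf_3
print(is_tag_available(intf_4, "vlan", 1))  # True: VLAN 1 is in intf_4's ranges

# Releasing VLAN 1 on intf_3 would give it the same ranges as intf_4.
print(release_tag(intf_3["vlan"], 1))  # [[1, 3798], [3800, 4095]]
```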

Ktmi commented Dec 9, 2024

For redeploys, it's looking like performing the redeploy process in bulk would be too difficult. However, I can get the undeploying of the EVC done, preparing it for a proper redeploy.

Ktmi commented Dec 13, 2024

Just finished writing some e2e tests for the consistency behaviour. You can find them at kytos-ng/kytos-end-to-end-tests#339. The current version of mef_eline fails all three of those tests. This PR only fails the last one, so I'll be hunting down the cause of that bug.

Ktmi commented Dec 13, 2024

Here are the E2E results with the new tests:

kytos-1  | Starting enhanced syslogd: rsyslogd.
kytos-1  | /etc/openvswitch/conf.db does not exist ... (warning).
kytos-1  | Creating empty database /etc/openvswitch/conf.db.
kytos-1  | Starting ovsdb-server.
kytos-1  | rsyslogd: error during config processing: omfile: chown for file '/var/log/syslog' failed: Operation not permitted [v8.2302.0 try https://www.rsyslog.com/e/2207 ]
kytos-1  | Configuring Open vSwitch system IDs.
kytos-1  | Starting ovs-vswitchd.
kytos-1  | Enabling remote OVSDB managers.
kytos-1  | + '[' -z '' ']'
kytos-1  | + '[' -z '' ']'
kytos-1  | + echo 'There is no NAPPS_PATH specified. Default will be used.'
kytos-1  | + NAPPS_PATH=
kytos-1  | There is no NAPPS_PATH specified. Default will be used.
kytos-1  | + sed -i 's/STATS_INTERVAL = 60/STATS_INTERVAL = 7/g' /var/lib/kytos/napps/kytos/of_core/settings.py
kytos-1  | + sed -i 's/CONSISTENCY_MIN_VERDICT_INTERVAL =.*/CONSISTENCY_MIN_VERDICT_INTERVAL = 60/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LINK_UP_TIMER = 10/LINK_UP_TIMER = 1/g' /var/lib/kytos/napps/kytos/topology/settings.py
kytos-1  | + sed -i 's/DEPLOY_EVCS_INTERVAL = 60/DEPLOY_EVCS_INTERVAL = 5/g' /var/lib/kytos/napps/kytos/mef_eline/settings.py
kytos-1  | + sed -i 's/LLDP_LOOP_ACTIONS = \["log"\]/LLDP_LOOP_ACTIONS = \["disable","log"\]/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/LLDP_IGNORED_LOOPS = {}/LLDP_IGNORED_LOOPS = {"00:00:00:00:00:00:00:01": \[\[4, 5\]\]}/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/CONSISTENCY_COOKIE_IGNORED_RANGE =.*/CONSISTENCY_COOKIE_IGNORED_RANGE = [(0xdd00000000000000, 0xdd00000000000009)]/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LIVENESS_DEAD_MULTIPLIER =.*/LIVENESS_DEAD_MULTIPLIER = 3/g' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + kytosd --help
kytos-1  | + sed -i s/WARNING/INFO/g /etc/kytos/logging.ini
kytos-1  | + test -z ''
kytos-1  | + TESTS=tests/
kytos-1  | + test -z ''
kytos-1  | + RERUNS=2
kytos-1  | + python3 scripts/wait_for_mongo.py
kytos-1  | Trying to run hello command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Ran 'hello' command on MongoDB successfully. It's ready!
kytos-1  | + python3 -m pytest tests/test_e2e_17_mef_eline.py
kytos-1  | ============================= test session starts ==============================
kytos-1  | platform linux -- Python 3.11.2, pytest-8.1.1, pluggy-1.5.0
kytos-1  | rootdir: /tests
kytos-1  | plugins: rerunfailures-13.0, timeout-2.2.0, anyio-4.3.0
kytos-1  | collected 3 items
kytos-1  | 
kytos-1  | tests/test_e2e_17_mef_eline.py ..F                                       [100%]
kytos-1  | 
kytos-1  | =================================== FAILURES ===================================
kytos-1  | ______________________ TestE2EMefEline.test_003_link_down ______________________
kytos-1  | 
kytos-1  | self = <tests.test_e2e_17_mef_eline.TestE2EMefEline object at 0x7f70779f8290>
kytos-1  | 
kytos-1  |     def test_003_link_down(self):
kytos-1  |         """Test link down behaviour on failover_path."""
kytos-1  |     
kytos-1  |         self.net.net.configLinkStatus("s1", "s6", "down")
kytos-1  |         self.net.net.configLinkStatus("s3", "s6", "down")
kytos-1  |         self.net.net.configLinkStatus("s5", "s6", "down")
kytos-1  |     
kytos-1  |         payload = {
kytos-1  |             "name": "Link Down Test",
kytos-1  |             "uni_a": {"interface_id": "00:00:00:00:00:00:00:01:1", "tag": {"tag_type": "vlan", "value": 100}},
kytos-1  |             "uni_z": {"interface_id": "00:00:00:00:00:00:00:05:1", "tag": {"tag_type": "vlan", "value": 100}},
kytos-1  |             "enabled": True,
kytos-1  |             "primary_constraints": {
kytos-1  |                 "mandatory_metrics": {
kytos-1  |                     "not_ownership": ["forbidden_link"],
kytos-1  |                 },
kytos-1  |             },
kytos-1  |             "dynamic_backup_path": True,
kytos-1  |         }
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.post(api_url, json=payload)
kytos-1  |     
kytos-1  |         assert response.status_code == 201, response.text
kytos-1  |     
kytos-1  |         data = response.json()
kytos-1  |         evc_id =  data["circuit_id"]
kytos-1  |     
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         assert data["current_path"]
kytos-1  |         assert data["failover_path"]
kytos-1  |     
kytos-1  |         # Collect service vlans
kytos-1  |     
kytos-1  |         vlan_allocations = defaultdict[str, list[int]](list)
kytos-1  |     
kytos-1  |         for link in data["failover_path"]:
kytos-1  |             s_vlan = link["metadata"]["s_vlan"]
kytos-1  |             for endpoint in (link["endpoint_a"], link["endpoint_b"]):
kytos-1  |                 vlan_allocations[endpoint["id"]].append(s_vlan)
kytos-1  |     
kytos-1  |         # Close a link that the failover path depends on
kytos-1  |     
kytos-1  |         link = data["failover_path"][1]
kytos-1  |         if link["id"] == LinkID("00:00:00:00:00:00:00:02:3", "00:00:00:00:00:00:00:03:2"):
kytos-1  |             self.net.net.configLinkStatus("s2", "s3", "down")
kytos-1  |         else:
kytos-1  |             self.net.net.configLinkStatus("s2", "s6", "down")
kytos-1  |     
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         # EVC should be enabled but not active
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         assert data["enabled"]
kytos-1  |         assert data["active"]
kytos-1  |     
kytos-1  |         assert data["current_path"]
kytos-1  |         assert not data["failover_path"]
kytos-1  |     
kytos-1  |         # Check that all the s_vlans have been freed
kytos-1  |     
kytos-1  |         api_url = f"{KYTOS_API}/topology/v3/interfaces/tag_ranges"
kytos-1  |     
kytos-1  |         response = requests.get(api_url)
kytos-1  |     
kytos-1  |         assert response.ok, response.text
kytos-1  |     
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         for interface, reserved_tags in vlan_allocations.items():
kytos-1  |             available_tags = data[interface]["available_tags"]
kytos-1  |             for reserved_tag in reserved_tags:
kytos-1  | >               assert any(
kytos-1  |                     reserved_tag["value"] >= range_start and reserved_tag["value"] <= range_end
kytos-1  |                     for (range_start, range_end) in available_tags[reserved_tag["tag_type"]]
kytos-1  |                 ), f"Vlan tag {reserved_tag} on interface {interface}, not released. Available tags: {available_tags}"
kytos-1  | E               AssertionError: Vlan tag {'tag_type': 'vlan', 'value': 2} on interface 00:00:00:00:00:00:00:01:2, not released. Available tags: {'vlan': [[3, 3798], [3800, 4095]]}
kytos-1  | E               assert False
kytos-1  | E                +  where False = any(<generator object TestE2EMefEline.test_003_link_down.<locals>.<genexpr> at 0x7f70779297e0>)
kytos-1  | 
kytos-1  | tests/test_e2e_17_mef_eline.py:394: AssertionError
kytos-1  | =============================== warnings summary ===============================
kytos-1  | test_e2e_17_mef_eline.py: 37 warnings
kytos-1  |   /usr/lib/python3/dist-packages/mininet/node.py:1121: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
kytos-1  |     return ( StrictVersion( cls.OVSVersion ) <
kytos-1  | 
kytos-1  | test_e2e_17_mef_eline.py: 37 warnings
kytos-1  |   /usr/lib/python3/dist-packages/mininet/node.py:1122: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
kytos-1  |     StrictVersion( '1.10' ) )
kytos-1  | 
kytos-1  | -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
kytos-1  | ------------------------------- start/stop times -------------------------------
kytos-1  | test_e2e_17_mef_eline.py::TestE2EMefEline::test_003_link_down: 2024-12-13,17:08:45.783771 - 2024-12-13,17:09:05.906983
kytos-1  | =========================== short test summary info ============================
kytos-1  | FAILED tests/test_e2e_17_mef_eline.py::TestE2EMefEline::test_003_link_down - ...
kytos-1  | ============= 1 failed, 2 passed, 74 warnings in 155.08s (0:02:35) =============

@Ktmi Ktmi force-pushed the fix/cleanup_old_path branch 2 times, most recently from 9cdd0d9 to ef0596e Compare December 18, 2024 20:35
Ktmi commented Dec 18, 2024

@viniarck This should be ready for a final review. Two things of note: one of the tests still needs a rewrite, as we added additional functionality to the link down handler that existing tests don't implement mocks for. Additionally, using the HTTP endpoint for sending the flow mods didn't work out: E2E tests started to fail with that, so I reverted that change.

Ktmi added 2 commits December 19, 2024 11:35
 - Ensure that entire operation is atomic, or as close to as reasonably possible
 - Ensure that the state of the EVC is properly progressed in link down scenarios
 - Ensure that vlans are cleared for removed paths.
@Ktmi Ktmi force-pushed the fix/cleanup_old_path branch from ef0596e to 1f46e55 Compare December 19, 2024 16:41
Ktmi commented Dec 19, 2024

Performed a rebase to make it easier to cherry pick commits.

@Alopalao

LGTM. I did some testing to see if the EVC locks are held, and it seems to be working as intended. Great reworking of old_path management.

@italovalcy

Hi David, I tested this PR using some of the scenarios we saw recently in production and all worked as expected. Very nicely done! LGTM!

@Ktmi Ktmi mentioned this pull request Jan 13, 2025
@viniarck viniarck self-requested a review January 21, 2025 14:47
Member

@viniarck viniarck left a comment

Great fix and overall improvement compared to what we used to have. Also, neat use of multi lock with nested ExitStacks.

It's almost there. I'm requesting changes since there's a major KytosEvent breaking change on telemetry_int in item 6) below, and the thread safety issue in item 3) is a bit concerning too: it'd be very unlikely to happen, but it can, so there's a risk involved that we need to make a decision on. Also, unit tests need to be covered.
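
For readers unfamiliar with the multi-lock pattern mentioned above, here is a minimal sketch of acquiring several locks through `contextlib.ExitStack`. It is not the code from this PR: `acquire_all` and the `evc_locks` dict are hypothetical names, and the PR's actual locking may differ.

```python
from contextlib import ExitStack
from threading import Lock


def acquire_all(locks):
    """Acquire every lock and return an ExitStack that releases them all.

    If acquiring any lock raises, the locks already entered are released
    by the inner stack before the exception propagates.
    """
    with ExitStack() as stack:
        for lock in locks:
            stack.enter_context(lock)
        # Hand ownership of the held locks to the caller.
        return stack.pop_all()


evc_locks = {"evc_a": Lock(), "evc_b": Lock()}  # hypothetical per-EVC locks

# Sorting gives a consistent acquisition order, which avoids lock-order deadlocks.
with acquire_all([evc_locks[key] for key in sorted(evc_locks)]):
    held = [lock.locked() for lock in evc_locks.values()]

released = [not lock.locked() for lock in evc_locks.values()]
print(held, released)  # [True, True] [True, True]
```

The `pop_all()` call is what makes the nesting useful: the inner stack guards the acquisition phase, and the returned stack guards the critical section.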

@@ -1899,6 +1899,9 @@ def test_handle_link_up(self):
assert evc_mock.handle_link_up.call_count == 2
evc_mock.handle_link_up.assert_called_with("abc")

@pytest.mark.xfail(
Member

  1. Unit tests: many of the new important conditionals haven't been covered yet, and the xfail-marked test shouldn't be marked as xfail and needs to be augmented too.

Author

Unit tests updated.

content={"evcs": evcs_with_failover + check_failover}
)
if clear_old_path_flows:
self.execute_clear_old_path(
Member

  1. send_flow_mods_event, which uses the topic handler kytos.flow_manager.flows.install|delete on flow_manager, doesn't guarantee execution order and might end up delayed. That can cause issues, especially if deletions and installations related to the same flow match are sent close to each other. Most of the time it'll work correctly, but in extreme cases, if the runtime gets overloaded as threads get scheduled and preempted, there's a chance that a late, out-of-order deletion happens, which might or might not cause issues (it would if the new available path's flows overlap with the deleted ones and end up using a recently released s_vlan with the same flow match):

Here's a simulated example:

Apply this patch on flow_manager to simulate a very late, extremely rare delayed deletion (it doesn't necessarily need 15+ seconds, but this makes the case more obvious while being easy to simulate and reproduce):

diff --git a/main.py b/main.py
index d4c0d03..f457bb6 100644
--- a/main.py
+++ b/main.py
@@ -656,6 +656,7 @@ class Main(KytosNApp):
         if event.name.endswith("install"):
             command = "add"
         elif event.name.endswith("delete"):
+            time.sleep(15)
             command = "delete"
         else:
  • I) Create a static EVC with just a primary_path, and wait for it to be active and fully deployed
  • II) Shut down one of the NNIs of the primary_path; this results in the undeploy case, deleting the flows via events
  • III) After 1 second or less, bring the same interface up again to trigger a new deployment (the link_up delay timer on topology will help a bit if using the same intf, since it already slows things down, but you can also imagine another case with another UNI with a fully dynamic EVC, where a new intf goes up while II is concurrently happening)

Here are the logs. Notice below that the deletion from step II) happens after the redeployment in III):

2025-01-21 08:44:47,971 - INFO [uvicorn.access] (MainThread) 127.0.0.1:57060 - "DELETE /api/kytos/flow_manager/v2/flows HTTP/1.1" 202
2025-01-21 08:44:47,985 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Send FlowMod from request command: add, force: False, dpids: ['00:00:00:00:00:00:00:02'], flows[0, 2]: [{'match': {'in_port': 2, 'dl_vlan': 1}, 'cookie': 12306031610466163777, 'actions': [{'action_type': 'set_vlan', 'vlan_id': 1}, {'action_type': 'output', 'port': 3}], 'owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}, {'match': {'in_port': 3, 'dl_vlan': 1}, 'cookie': 12306031610466163777, 'actions': [{'action_type': 'set_vlan', 'vlan_id': 1}, {'action_type': 'output', 'port': 2}], 'owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}]
2025-01-21 08:44:47,986 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Send FlowMod from request command: add, force: False, dpids: ['00:00:00:00:00:00:00:01'], flows[0, 2]: [{'match': {'in_port': 1}, 'cookie': 12306031610466163777, 'actions': [{'action_type': 'push_vlan', 'tag_type': 's'}, {'action_type': 'set_vlan', 'vlan_id': 1}, {'action_type': 'output', 'port': 3}], 'owner': 'mef_eline', 'table_group': 'epl', 'table_id': 0, 'priority': 10000}, {'match': {'in_port': 3, 'dl_vlan': 1}, 'cookie': 12306031610466163777, 'actions': [{'action_type': 'pop_vlan'}, {'action_type': 'output', 'port': 1}], 'owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}]
2025-01-21 08:44:47,986 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Send FlowMod from request command: add, force: False, dpids: ['00:00:00:00:00:00:00:03'], flows[0, 2]: [{'match': {'in_port': 1}, 'cookie': 12306031610466163777, 'actions': [{'action_type': 'push_vlan', 'tag_type': 's'}, {'action_type': 'set_vlan', 'vlan_id': 1}, {'action_type': 'output', 'port': 2}], 'owner': 'mef_eline', 'table_group': 'epl', 'table_id': 0, 'priority': 10000}, {'match': {'in_port': 2, 'dl_vlan': 1}, 'cookie': 12306031610466163777, 'actions': [{'action_type': 'pop_vlan'}, {'action_type': 'output', 'port': 1}], 'owner': 'mef_eline', 'table_group': 'evpl', 'table_id': 0, 'priority': 20000}]
2025-01-21 08:44:47,986 - INFO [kytos.napps.kytos/flow_manager] (AnyIO worker thread) Flows received summary: {switch: 00:00:00:00:00:00:00:02, flows_length: 2}, {switch: 00:00:00:00:00:00:00:01, flows_length: 2}, {switch: 00:00:00:00:00:00:00:03, flows_length: 2},  total_flows_length: 6
2025-01-21 08:44:48,000 - INFO [uvicorn.access] (MainThread) 127.0.0.1:57068 - "POST /api/kytos/flow_manager/v2/flows_by_switch/?force=False HTTP/1.1" 202
2025-01-21 08:44:48,006 - INFO [kytos.napps.kytos/mef_eline] (thread_pool_app_5) EVC(c7ce8cb0905c41, epl_static) was deployed.

kytos $> 2025-01-21 08:44:53,649 - INFO [kytos.napps.kytos/flow_manager] (thread_pool_app_4) Send FlowMod from KytosEvent command: delete, force: True, dpids: ['00:00:00:00:00:00:00:01'], flows[0, 1]: [{'cookie': 12306031610466163777, 'match': {'in_port': 3, 'dl_vlan': 1}, 'owner': 'mef_eline', 'cookie_mask': 18446744073709551615}]
2025-01-21 08:44:53,649 - INFO [kytos.napps.kytos/flow_manager] (thread_pool_app_4) Flows received summary:  switches:['00:00:00:00:00:00:00:01'], flows_by_switch:1,  total_flows_length: 1
2025-01-21 08:44:53,652 - INFO [kytos.napps.kytos/flow_manager] (thread_pool_app_11) Send FlowMod from KytosEvent command: delete, force: True, dpids: ['00:00:00:00:00:00:00:03'], flows[0, 1]: [{'cookie': 12306031610466163777, 'match': {'in_port': 2, 'dl_vlan': 1}, 'owner': 'mef_eline', 'cookie_mask': 18446744073709551615}]
2025-01-21 08:44:53,652 - INFO [kytos.napps.kytos/flow_manager] (thread_pool_app_11) Flows received summary:  switches:['00:00:00:00:00:00:00:03'], flows_by_switch:1,  total_flows_length: 1
2025-01-21 08:44:53,655 - INFO [kytos.napps.kytos/flow_manager] (thread_pool_app_14) Send FlowMod from KytosEvent command: delete, force: True, dpids: ['00:00:00:00:00:00:00:02'], flows[0, 2]: [{'cookie': 12306031610466163777, 'match': {'in_port': 2, 'dl_vlan': 1}, 'owner': 'mef_eline', 'cookie_mask': 18446744073709551615}, {'cookie': 12306031610466163777, 'match': {'in_port': 3, 'dl_vlan': 1}, 'owner': 'mef_eline', 'cookie_mask': 18446744073709551615}]
2025-01-21 08:44:53,655 - INFO [kytos.napps.kytos/flow_manager] (thread_pool_app_14) Flows received summary:  switches:['00:00:00:00:00:00:00:02'], flows_by_switch:2,  total_flows_length: 2
kytos $> 

Should we make it safer with a blocking request using the bulk flows endpoint that was discussed? Did you reassess it and decide not to use it? What's changed in that discussion?
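
The ordering hazard described in this comment can be illustrated without Kytos at all. The sketch below is a toy model under stated assumptions: `FakeSwitch` and `delete_via_event` are hypothetical stand-ins for a switch flow table and an event-dispatched deletion, and the 0.2 second delay plays the role of the `time.sleep(15)` patch. It does not use any flow_manager API.

```python
import threading
import time


class FakeSwitch:
    """Stand-in for a switch flow table; matches are plain hashable keys."""

    def __init__(self):
        self.flows = set()
        self._lock = threading.Lock()

    def install(self, match):
        with self._lock:
            self.flows.add(match)

    def delete(self, match):
        with self._lock:
            self.flows.discard(match)


def delete_via_event(switch, match, delay):
    """Fire-and-forget deletion that runs on another thread after a delay."""

    def worker():
        time.sleep(delay)  # models a late, preempted event handler
        switch.delete(match)

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


match = ("in_port=3", "dl_vlan=1")

# Event-style: the deletion is still pending when the redeploy installs
# a flow with the same match (for example, a reused s_vlan).
sw = FakeSwitch()
sw.install(match)
pending = delete_via_event(sw, match, delay=0.2)
sw.install(match)
pending.join()
event_result = match in sw.flows

# Blocking-style: the deletion completes before the redeploy starts.
sw = FakeSwitch()
sw.install(match)
sw.delete(match)
sw.install(match)
blocking_result = match in sw.flows

print(event_result, blocking_result)
```

In this toy model the event-style run should end with the redeployed flow missing, while the blocking run keeps it, which is the argument for doing the deletion inside a request-response cycle.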

Author

Trying the E2E tests with the bulk flows resulted in several regressions in E2E tests, so I decided to revert that change.

Member

@viniarck viniarck Jan 21, 2025

Do you recall if you had the chance to narrow down the underlying cause? Was it a bug in flow_manager that deleted more or fewer flows than expected? That endpoint is supposed to be equivalent, except that it does all the ops in a single request, within the request-response cycle. E2E coverage for that endpoint isn't too thorough, though we do have some coverage.

Another pro is that when we allow direct NApp to NApp func calls, we can get rid of the request overhead too, assuming no other unexpected bugs like the ones you mentioned.

Author

I reran the tests; here is the E2E result log.

kytos-1  | Starting enhanced syslogd: rsyslogd.
kytos-1  | /etc/openvswitch/conf.db does not exist ... (warning).
kytos-1  | Creating empty database /etc/openvswitch/conf.db.
kytos-1  | Starting ovsdb-server.
kytos-1  | rsyslogd: error during config processing: omfile: chown for file '/var/log/syslog' failed: Operation not permitted [v8.2302.0 try https://www.rsyslog.com/e/2207 ]
kytos-1  | Configuring Open vSwitch system IDs.
kytos-1  | Starting ovs-vswitchd.
kytos-1  | Enabling remote OVSDB managers.
kytos-1  | + '[' -z '' ']'
kytos-1  | + '[' -z '' ']'
kytos-1  | + echo 'There is no NAPPS_PATH specified. Default will be used.'
kytos-1  | + NAPPS_PATH=
kytos-1  | + sed -i 's/STATS_INTERVAL = 60/STATS_INTERVAL = 7/g' /var/lib/kytos/napps/kytos/of_core/settings.py
kytos-1  | There is no NAPPS_PATH specified. Default will be used.
kytos-1  | + sed -i 's/CONSISTENCY_MIN_VERDICT_INTERVAL =.*/CONSISTENCY_MIN_VERDICT_INTERVAL = 60/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LINK_UP_TIMER = 10/LINK_UP_TIMER = 1/g' /var/lib/kytos/napps/kytos/topology/settings.py
kytos-1  | + sed -i 's/DEPLOY_EVCS_INTERVAL = 60/DEPLOY_EVCS_INTERVAL = 5/g' /var/lib/kytos/napps/kytos/mef_eline/settings.py
kytos-1  | + sed -i 's/LLDP_LOOP_ACTIONS = \["log"\]/LLDP_LOOP_ACTIONS = \["disable","log"\]/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/LLDP_IGNORED_LOOPS = {}/LLDP_IGNORED_LOOPS = {"00:00:00:00:00:00:00:01": \[\[4, 5\]\]}/' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + sed -i 's/CONSISTENCY_COOKIE_IGNORED_RANGE =.*/CONSISTENCY_COOKIE_IGNORED_RANGE = [(0xdd00000000000000, 0xdd00000000000009)]/g' /var/lib/kytos/napps/kytos/flow_manager/settings.py
kytos-1  | + sed -i 's/LIVENESS_DEAD_MULTIPLIER =.*/LIVENESS_DEAD_MULTIPLIER = 3/g' /var/lib/kytos/napps/kytos/of_lldp/settings.py
kytos-1  | + kytosd --help
kytos-1  | + sed -i s/WARNING/INFO/g /etc/kytos/logging.ini
kytos-1  | + test -z ''
kytos-1  | + TESTS=tests/
kytos-1  | + test -z ''
kytos-1  | + RERUNS=2
kytos-1  | + python3 scripts/wait_for_mongo.py
kytos-1  | Trying to run hello command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Trying to run 'hello' command on MongoDB...
kytos-1  | Ran 'hello' command on MongoDB successfully. It's ready!
kytos-1  | + python3 -m pytest tests/test_e2e_16_mef_eline.py tests/test_e2e_17_mef_eline.py
kytos-1  | ============================= test session starts ==============================
kytos-1  | platform linux -- Python 3.11.2, pytest-8.1.1, pluggy-1.5.0
kytos-1  | rootdir: /tests
kytos-1  | plugins: rerunfailures-13.0, timeout-2.2.0, anyio-4.3.0
kytos-1  | collected 6 items
kytos-1  | 
kytos-1  | tests/test_e2e_16_mef_eline.py .F                                        [ 33%]
kytos-1  | tests/test_e2e_17_mef_eline.py FFFF                                      [100%]
kytos-1  | 
kytos-1  | =================================== FAILURES ===================================
kytos-1  | _________________ TestE2EMefEline.test_002_delete_evc_old_path _________________
kytos-1  | 
kytos-1  | self = <tests.test_e2e_16_mef_eline.TestE2EMefEline object at 0x7fc04f200dd0>
kytos-1  | 
kytos-1  |     def test_002_delete_evc_old_path(self):
kytos-1  |         """Test create an EVC then disable one of its failover_path interface"""
kytos-1  |         evc_id = self.create_evc(
kytos-1  |             uni_a="00:00:00:00:00:00:00:01:1",
kytos-1  |             uni_z="00:00:00:00:00:00:00:02:1",
kytos-1  |             vlan_id=100
kytos-1  |         )
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |         assert data["failover_path"]
kytos-1  |         assert (data["failover_path"][0]["endpoint_a"]["id"] ==
kytos-1  |                 "00:00:00:00:00:00:00:01:4")
kytos-1  |         assert (data["failover_path"][0]["endpoint_b"]["id"] ==
kytos-1  |                 "00:00:00:00:00:00:00:03:3")
kytos-1  |         assert (data["failover_path"][1]["endpoint_a"]["id"] ==
kytos-1  |                 "00:00:00:00:00:00:00:02:3")
kytos-1  |         assert (data["failover_path"][1]["endpoint_b"]["id"] ==
kytos-1  |                 "00:00:00:00:00:00:00:03:2")
kytos-1  |     
kytos-1  |     
kytos-1  |         s1, s2, s3 = self.net.net.get('s1', 's2', 's3')
kytos-1  |         flows_s1 = s1.dpctl('dump-flows')
kytos-1  |         flows_s2 = s2.dpctl('dump-flows')
kytos-1  |         flows_s3 = s3.dpctl('dump-flows')
kytos-1  |     
kytos-1  |         assert len(flows_s1.split('\r\n ')) == 6, flows_s1
kytos-1  |         assert len(flows_s2.split('\r\n ')) == 6, flows_s2
kytos-1  |         assert len(flows_s3.split('\r\n ')) == 5, flows_s3
kytos-1  |     
kytos-1  |         url = f"{KYTOS_API}/topology/v3/interfaces/00:00:00:00:00:00:00:03:3/disable"
kytos-1  |         response = requests.post(url, headers={"Content-type": "application/json"})
kytos-1  |         assert response.status_code == 200, response.text
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |         assert not data["failover_path"]
kytos-1  |     
kytos-1  |         flows_s1 = s1.dpctl('dump-flows')
kytos-1  |         flows_s2 = s2.dpctl('dump-flows')
kytos-1  |         flows_s3 = s3.dpctl('dump-flows')
kytos-1  | >       assert len(flows_s1.split('\r\n ')) == 4, flows_s1
kytos-1  | E       AssertionError:  cookie=0xab00000000000001, duration=31.207s, table=0, n_packets=17, n_bytes=714, priority=50000,dl_vlan=3799,dl_type=0x88cc actions=CONTROLLER:65535
kytos-1  | E          cookie=0xac00000000000001, duration=29.955s, table=0, n_packets=0, n_bytes=0, priority=50000,dl_src=ee:ee:ee:ee:ee:02 actions=CONTROLLER:65535
kytos-1  | E          cookie=0xaac2190c4044e641, duration=20.044s, table=0, n_packets=0, n_bytes=0, priority=20000,in_port="s1-eth1",dl_vlan=100 actions=mod_vlan_vid:1,output:"s1-eth3"
kytos-1  | E          cookie=0xaac2190c4044e641, duration=20.043s, table=0, n_packets=0, n_bytes=0, priority=20000,in_port="s1-eth3",dl_vlan=1 actions=strip_vlan,output:"s1-eth1"
kytos-1  | E          cookie=0xaac2190c4044e641, duration=19.985s, table=0, n_packets=0, n_bytes=0, priority=20000,in_port="s1-eth4",dl_vlan=1 actions=strip_vlan,output:"s1-eth1"
kytos-1  | E         
kytos-1  | E       assert 5 == 4
kytos-1  | E        +  where 5 = len([' cookie=0xab00000000000001, duration=31.207s, table=0, n_packets=17, n_bytes=714, priority=50000,dl_vlan=3799,dl_typ..., table=0, n_packets=0, n_bytes=0, priority=20000,in_port="s1-eth4",dl_vlan=1 actions=strip_vlan,output:"s1-eth1"\r\n'])
kytos-1  | E        +    where [' cookie=0xab00000000000001, duration=31.207s, table=0, n_packets=17, n_bytes=714, priority=50000,dl_vlan=3799,dl_typ..., table=0, n_packets=0, n_bytes=0, priority=20000,in_port="s1-eth4",dl_vlan=1 actions=strip_vlan,output:"s1-eth1"\r\n'] = <built-in method split of str object at 0x24a8d10>('\r\n ')
kytos-1  | E        +      where <built-in method split of str object at 0x24a8d10> = ' cookie=0xab00000000000001, duration=31.207s, table=0, n_packets=17, n_bytes=714, priority=50000,dl_vlan=3799,dl_type...s, table=0, n_packets=0, n_bytes=0, priority=20000,in_port="s1-eth4",dl_vlan=1 actions=strip_vlan,output:"s1-eth1"\r\n'.split
kytos-1  | 
kytos-1  | tests/test_e2e_16_mef_eline.py:234: AssertionError
kytos-1  | ______________________ TestE2EMefEline.test_001_link_down ______________________
kytos-1  | 
kytos-1  | self = <tests.test_e2e_17_mef_eline.TestE2EMefEline object at 0x7fc04f202c90>
kytos-1  | 
kytos-1  |     def test_001_link_down(self):
kytos-1  |         """Test link down behaviour."""
kytos-1  |     
kytos-1  |         self.net.net.configLinkStatus("s1", "s6", "down")
kytos-1  |         self.net.net.configLinkStatus("s5", "s6", "down")
kytos-1  |     
kytos-1  |         payload = {
kytos-1  |             "name": "Link Down Test",
kytos-1  |             "uni_a": {"interface_id": "00:00:00:00:00:00:00:01:1", "tag": {"tag_type": "vlan", "value": 100}},
kytos-1  |             "uni_z": {"interface_id": "00:00:00:00:00:00:00:05:1", "tag": {"tag_type": "vlan", "value": 100}},
kytos-1  |             "enabled": True,
kytos-1  |             "primary_constraints": {
kytos-1  |                 "mandatory_metrics": {
kytos-1  |                     "not_ownership": ["forbidden_link"],
kytos-1  |                 },
kytos-1  |             },
kytos-1  |             "dynamic_backup_path": True,
kytos-1  |         }
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.post(api_url, json=payload)
kytos-1  |     
kytos-1  |         assert response.status_code == 201, response.text
kytos-1  |     
kytos-1  |         data = response.json()
kytos-1  |         evc_id =  data["circuit_id"]
kytos-1  |     
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         assert data["current_path"]
kytos-1  |         assert data["failover_path"]
kytos-1  |     
kytos-1  |         # Collect service vlans
kytos-1  |     
kytos-1  |         vlan_allocations = defaultdict[str, list[int]](list)
kytos-1  |     
kytos-1  |         for link in data["current_path"]:
kytos-1  |             s_vlan = link["metadata"]["s_vlan"]
kytos-1  |             for endpoint in (link["endpoint_a"], link["endpoint_b"]):
kytos-1  |                 vlan_allocations[endpoint["id"]].append(s_vlan)
kytos-1  |     
kytos-1  |         for link in data["failover_path"]:
kytos-1  |             s_vlan = link["metadata"]["s_vlan"]
kytos-1  |             for endpoint in (link["endpoint_a"], link["endpoint_b"]):
kytos-1  |                 vlan_allocations[endpoint["id"]].append(s_vlan)
kytos-1  |     
kytos-1  |     
kytos-1  |         # Close a link that both the current and failover path depend on
kytos-1  |     
kytos-1  |         self.net.net.configLinkStatus("s1", "s2", "down")
kytos-1  |     
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         # EVC should be enabled but not active
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         assert data["enabled"]
kytos-1  | >       assert not data["active"]
kytos-1  | E       assert not True
kytos-1  | 
kytos-1  | tests/test_e2e_17_mef_eline.py:205: AssertionError
kytos-1  | ______________________ TestE2EMefEline.test_002_link_down ______________________
kytos-1  | 
kytos-1  | self = <tests.test_e2e_17_mef_eline.TestE2EMefEline object at 0x7fc04f2032d0>
kytos-1  | 
kytos-1  |     def test_002_link_down(self):
kytos-1  |         """Test link down behaviour on current_path."""
kytos-1  |     
kytos-1  |         self.net.net.configLinkStatus("s1", "s6", "down")
kytos-1  |         self.net.net.configLinkStatus("s3", "s6", "down")
kytos-1  |         self.net.net.configLinkStatus("s5", "s6", "down")
kytos-1  |     
kytos-1  |         payload = {
kytos-1  |             "name": "Link Down Test",
kytos-1  |             "uni_a": {"interface_id": "00:00:00:00:00:00:00:01:1", "tag": {"tag_type": "vlan", "value": 100}},
kytos-1  |             "uni_z": {"interface_id": "00:00:00:00:00:00:00:05:1", "tag": {"tag_type": "vlan", "value": 100}},
kytos-1  |             "enabled": True,
kytos-1  |             "primary_constraints": {
kytos-1  |                 "mandatory_metrics": {
kytos-1  |                     "not_ownership": ["forbidden_link"],
kytos-1  |                 },
kytos-1  |             },
kytos-1  |             "dynamic_backup_path": True,
kytos-1  |         }
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.post(api_url, json=payload)
kytos-1  |     
kytos-1  |         assert response.status_code == 201, response.text
kytos-1  |     
kytos-1  |         data = response.json()
kytos-1  |         evc_id =  data["circuit_id"]
kytos-1  |     
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         assert data["current_path"]
kytos-1  |         assert data["failover_path"]
kytos-1  |     
kytos-1  |         # Collect service vlans
kytos-1  |     
kytos-1  |         vlan_allocations = defaultdict[str, list[int]](list)
kytos-1  |     
kytos-1  |         for link in data["current_path"]:
kytos-1  |             s_vlan = link["metadata"]["s_vlan"]
kytos-1  |             for endpoint in (link["endpoint_a"], link["endpoint_b"]):
kytos-1  |                 vlan_allocations[endpoint["id"]].append(s_vlan)
kytos-1  |     
kytos-1  |         # Close a link that the current path depends on
kytos-1  |     
kytos-1  |         link = data["current_path"][1]
kytos-1  |         if link["id"] == LinkID("00:00:00:00:00:00:00:02:3", "00:00:00:00:00:00:00:03:2"):
kytos-1  |             self.net.net.configLinkStatus("s2", "s3", "down")
kytos-1  |         else:
kytos-1  |             self.net.net.configLinkStatus("s2", "s6", "down")
kytos-1  |     
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         # EVC should be enabled but not active
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         assert data["enabled"]
kytos-1  |         assert data["active"]
kytos-1  |     
kytos-1  |         assert data["current_path"]
kytos-1  |         assert not data["failover_path"]
kytos-1  |     
kytos-1  |         # Check that all the s_vlans have been freed
kytos-1  |     
kytos-1  |         api_url = f"{KYTOS_API}/topology/v3/interfaces/tag_ranges"
kytos-1  |     
kytos-1  |         response = requests.get(api_url)
kytos-1  |     
kytos-1  |         assert response.ok, response.text
kytos-1  |     
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         for interface, reserved_tags in vlan_allocations.items():
kytos-1  |             available_tags = data[interface]["available_tags"]
kytos-1  |             for reserved_tag in reserved_tags:
kytos-1  | >               assert any(
kytos-1  |                     reserved_tag["value"] >= range_start and reserved_tag["value"] <= range_end
kytos-1  |                     for (range_start, range_end) in available_tags[reserved_tag["tag_type"]]
kytos-1  |                 ), f"Vlan tag {reserved_tag} on interface {interface}, not released. Available tags: {available_tags}"
kytos-1  | E               AssertionError: Vlan tag {'tag_type': 'vlan', 'value': 1} on interface 00:00:00:00:00:00:00:01:2, not released. Available tags: {'vlan': [[3, 3798], [3800, 4095]]}
kytos-1  | E               assert False
kytos-1  | E                +  where False = any(<generator object TestE2EMefEline.test_002_link_down.<locals>.<genexpr> at 0x7fc04e76db60>)
kytos-1  | 
kytos-1  | tests/test_e2e_17_mef_eline.py:309: AssertionError
kytos-1  | ______________________ TestE2EMefEline.test_003_link_down ______________________
kytos-1  | 
kytos-1  | self = <tests.test_e2e_17_mef_eline.TestE2EMefEline object at 0x7fc04f203910>
kytos-1  | 
kytos-1  |     def test_003_link_down(self):
kytos-1  |         """Test link down behaviour on failover_path."""
kytos-1  |     
kytos-1  |         self.net.net.configLinkStatus("s1", "s6", "down")
kytos-1  |         self.net.net.configLinkStatus("s3", "s6", "down")
kytos-1  |         self.net.net.configLinkStatus("s5", "s6", "down")
kytos-1  |     
kytos-1  |         payload = {
kytos-1  |             "name": "Link Down Test",
kytos-1  |             "uni_a": {"interface_id": "00:00:00:00:00:00:00:01:1", "tag": {"tag_type": "vlan", "value": 100}},
kytos-1  |             "uni_z": {"interface_id": "00:00:00:00:00:00:00:05:1", "tag": {"tag_type": "vlan", "value": 100}},
kytos-1  |             "enabled": True,
kytos-1  |             "primary_constraints": {
kytos-1  |                 "mandatory_metrics": {
kytos-1  |                     "not_ownership": ["forbidden_link"],
kytos-1  |                 },
kytos-1  |             },
kytos-1  |             "dynamic_backup_path": True,
kytos-1  |         }
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.post(api_url, json=payload)
kytos-1  |     
kytos-1  |         assert response.status_code == 201, response.text
kytos-1  |     
kytos-1  |         data = response.json()
kytos-1  |         evc_id =  data["circuit_id"]
kytos-1  |     
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         assert data["current_path"]
kytos-1  |         assert data["failover_path"]
kytos-1  |     
kytos-1  |         # Collect service vlans
kytos-1  |     
kytos-1  |         vlan_allocations = defaultdict[str, list[int]](list)
kytos-1  |     
kytos-1  |         for link in data["failover_path"]:
kytos-1  |             s_vlan = link["metadata"]["s_vlan"]
kytos-1  |             for endpoint in (link["endpoint_a"], link["endpoint_b"]):
kytos-1  |                 vlan_allocations[endpoint["id"]].append(s_vlan)
kytos-1  |     
kytos-1  |         # Close a link that the failover path depends on
kytos-1  |     
kytos-1  |         link = data["failover_path"][1]
kytos-1  |         if link["id"] == LinkID("00:00:00:00:00:00:00:02:3", "00:00:00:00:00:00:00:03:2"):
kytos-1  |             self.net.net.configLinkStatus("s2", "s3", "down")
kytos-1  |         else:
kytos-1  |             self.net.net.configLinkStatus("s2", "s6", "down")
kytos-1  |     
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         # EVC should be enabled but not active
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         assert data["enabled"]
kytos-1  |         assert data["active"]
kytos-1  |     
kytos-1  |         assert data["current_path"]
kytos-1  |         assert not data["failover_path"]
kytos-1  |     
kytos-1  |         # Check that all the s_vlans have been freed
kytos-1  |     
kytos-1  |         api_url = f"{KYTOS_API}/topology/v3/interfaces/tag_ranges"
kytos-1  |     
kytos-1  |         response = requests.get(api_url)
kytos-1  |     
kytos-1  |         assert response.ok, response.text
kytos-1  |     
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         for interface, reserved_tags in vlan_allocations.items():
kytos-1  |             available_tags = data[interface]["available_tags"]
kytos-1  |             for reserved_tag in reserved_tags:
kytos-1  | >               assert any(
kytos-1  |                     reserved_tag["value"] >= range_start and reserved_tag["value"] <= range_end
kytos-1  |                     for (range_start, range_end) in available_tags[reserved_tag["tag_type"]]
kytos-1  |                 ), f"Vlan tag {reserved_tag} on interface {interface}, not released. Available tags: {available_tags}"
kytos-1  | E               AssertionError: Vlan tag {'tag_type': 'vlan', 'value': 2} on interface 00:00:00:00:00:00:00:01:2, not released. Available tags: {'vlan': [[3, 3798], [3800, 4095]]}
kytos-1  | E               assert False
kytos-1  | E                +  where False = any(<generator object TestE2EMefEline.test_003_link_down.<locals>.<genexpr> at 0x7fc04f1573e0>)
kytos-1  | 
kytos-1  | tests/test_e2e_17_mef_eline.py:394: AssertionError
kytos-1  | ______________________ TestE2EMefEline.test_004_link_down ______________________
kytos-1  | 
kytos-1  | self = <tests.test_e2e_17_mef_eline.TestE2EMefEline object at 0x7fc04f203f90>
kytos-1  | 
kytos-1  |     def test_004_link_down(self):
kytos-1  |         """Test multiple simultaneous link down behaviour."""
kytos-1  |     
kytos-1  |         self.net.net.configLinkStatus("s1", "s6", "down")
kytos-1  |         self.net.net.configLinkStatus("s3", "s6", "down")
kytos-1  |         self.net.net.configLinkStatus("s5", "s6", "down")
kytos-1  |     
kytos-1  |         payload = {
kytos-1  |             "name": "Link Down Test",
kytos-1  |             "uni_a": {"interface_id": "00:00:00:00:00:00:00:01:1", "tag": {"tag_type": "vlan", "value": 100}},
kytos-1  |             "uni_z": {"interface_id": "00:00:00:00:00:00:00:05:1", "tag": {"tag_type": "vlan", "value": 100}},
kytos-1  |             "enabled": True,
kytos-1  |             "primary_constraints": {
kytos-1  |                 "mandatory_metrics": {
kytos-1  |                     "not_ownership": ["forbidden_link"],
kytos-1  |                 },
kytos-1  |             },
kytos-1  |             "dynamic_backup_path": True,
kytos-1  |         }
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.post(api_url, json=payload)
kytos-1  |     
kytos-1  |         assert response.status_code == 201, response.text
kytos-1  |     
kytos-1  |         data = response.json()
kytos-1  |         evc_id =  data["circuit_id"]
kytos-1  |     
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         assert data["current_path"]
kytos-1  |         assert data["failover_path"]
kytos-1  |     
kytos-1  |         # Collect service vlans
kytos-1  |     
kytos-1  |         vlan_allocations = defaultdict[str, list[int]](list)
kytos-1  |     
kytos-1  |         for link in data["current_path"]:
kytos-1  |             s_vlan = link["metadata"]["s_vlan"]
kytos-1  |             for endpoint in (link["endpoint_a"], link["endpoint_b"]):
kytos-1  |                 vlan_allocations[endpoint["id"]].append(s_vlan)
kytos-1  |     
kytos-1  |         for link in data["failover_path"]:
kytos-1  |             s_vlan = link["metadata"]["s_vlan"]
kytos-1  |             for endpoint in (link["endpoint_a"], link["endpoint_b"]):
kytos-1  |                 vlan_allocations[endpoint["id"]].append(s_vlan)
kytos-1  |     
kytos-1  |     
kytos-1  |         # Close a link that both the current and failover path depend on
kytos-1  |     
kytos-1  |         self.net.net.configLinkStatus("s2", "s3", "down")
kytos-1  |         self.net.net.configLinkStatus("s2", "s6", "down")
kytos-1  |     
kytos-1  |         time.sleep(10)
kytos-1  |     
kytos-1  |         # EVC should be enabled but not active
kytos-1  |     
kytos-1  |         api_url = KYTOS_API + "/mef_eline/v2/evc/"
kytos-1  |         response = requests.get(api_url + evc_id)
kytos-1  |         data = response.json()
kytos-1  |     
kytos-1  |         assert data["enabled"]
kytos-1  | >       assert not data["active"]
kytos-1  | E       assert not True
kytos-1  | 
kytos-1  | tests/test_e2e_17_mef_eline.py:464: AssertionError
kytos-1  | =============================== warnings summary ===============================
kytos-1  | test_e2e_16_mef_eline.py: 17 warnings
kytos-1  | test_e2e_17_mef_eline.py: 37 warnings
kytos-1  |   /usr/lib/python3/dist-packages/mininet/node.py:1121: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
kytos-1  |     return ( StrictVersion( cls.OVSVersion ) <
kytos-1  | 
kytos-1  | test_e2e_16_mef_eline.py: 17 warnings
kytos-1  | test_e2e_17_mef_eline.py: 37 warnings
kytos-1  |   /usr/lib/python3/dist-packages/mininet/node.py:1122: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
kytos-1  |     StrictVersion( '1.10' ) )
kytos-1  | 
kytos-1  | -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
kytos-1  | ------------------------------- start/stop times -------------------------------
kytos-1  | test_e2e_16_mef_eline.py::TestE2EMefEline::test_002_delete_evc_old_path: 2025-01-24,19:25:29.652753 - 2025-01-24,19:25:49.772797
kytos-1  | test_e2e_17_mef_eline.py::TestE2EMefEline::test_001_link_down: 2025-01-24,19:26:40.119954 - 2025-01-24,19:27:00.223267
kytos-1  | test_e2e_17_mef_eline.py::TestE2EMefEline::test_002_link_down: 2025-01-24,19:27:20.015428 - 2025-01-24,19:27:40.127689
kytos-1  | test_e2e_17_mef_eline.py::TestE2EMefEline::test_003_link_down: 2025-01-24,19:27:59.938697 - 2025-01-24,19:28:20.048221
kytos-1  | test_e2e_17_mef_eline.py::TestE2EMefEline::test_004_link_down: 2025-01-24,19:28:39.752939 - 2025-01-24,19:28:59.873844
kytos-1  | =========================== short test summary info ============================
kytos-1  | FAILED tests/test_e2e_16_mef_eline.py::TestE2EMefEline::test_002_delete_evc_old_path
kytos-1  | FAILED tests/test_e2e_17_mef_eline.py::TestE2EMefEline::test_001_link_down - ...
kytos-1  | FAILED tests/test_e2e_17_mef_eline.py::TestE2EMefEline::test_002_link_down - ...
kytos-1  | FAILED tests/test_e2e_17_mef_eline.py::TestE2EMefEline::test_003_link_down - ...
kytos-1  | FAILED tests/test_e2e_17_mef_eline.py::TestE2EMefEline::test_004_link_down - ...
kytos-1  | ============ 5 failed, 1 passed, 108 warnings in 304.45s (0:05:04) =============

kytos-1 exited with code 1

However, I don't know how to get the logs from kytos itself while running e2e tests. If you can provide some instructions on how to do so, I could provide additional context.

Author

Also here's the patch file that's being applied to mef_eline.

diff --git a/main.py b/main.py
index 0b7a1a3..cd6edf7 100755
--- a/main.py
+++ b/main.py
@@ -34,7 +34,7 @@ from napps.kytos.mef_eline.utils import (aemit_event, check_disabled_component,
                                          emit_event, get_vlan_tags_and_masks,
                                          map_evc_event_content,
                                          merge_flow_dicts, prepare_delete_flow,
-                                         send_flow_mods_event)
+                                         send_flow_mods_http)
 
 
 # pylint: disable=too-many-public-methods
@@ -848,7 +848,7 @@ class Main(KytosNApp):
             self.controller, "failover_link_down",
             content=deepcopy(event_contents)
         )
-        send_flow_mods_event(
+        send_flow_mods_http(
             self.controller,
             install_flows,
             "install"
@@ -877,7 +877,7 @@ class Main(KytosNApp):
         delete_flows
     ):
         """Process changes needed to commit clearing the old path"""
-        send_flow_mods_event(
+        send_flow_mods_http(
             self.controller,
             delete_flows,
             'delete'
@@ -911,7 +911,7 @@ class Main(KytosNApp):
 
     def execute_undeploy(self, evcs: list[EVC], remove_flows):
         """Process changes needed to commit an undeploy"""
-        send_flow_mods_event(
+        send_flow_mods_http(
             self.controller,
             remove_flows,
             'delete'
diff --git a/utils.py b/utils.py
index 8ae4e97..bfe23ee 100644
--- a/utils.py
+++ b/utils.py
@@ -4,8 +4,10 @@ from typing import Union
 from kytos.core.common import EntityStatus
 from kytos.core.events import KytosEvent
 from kytos.core.interface import UNI, Interface, TAGRange
-from napps.kytos.mef_eline.exceptions import DisabledSwitch
+from napps.kytos.mef_eline import settings
+from napps.kytos.mef_eline.exceptions import DisabledSwitch, FlowModException
 
+import httpx
 
 def map_evc_event_content(evc, **kwargs) -> dict:
     """Returns a set of values from evc to be used for content"""
@@ -178,6 +180,35 @@ def send_flow_mods_event(
             },
         )
 
+def send_flow_mods_http(
+    controller,
+    flow_dict: dict[str, list],
+    action: str, force=True
+):
+    """
+    Send a flow_mod list to a specific switch.
+
+    Args:
+        dpid(str): The target of flows (i.e. Switch.id).
+        flow_mods(dict): Python dictionary with flow_mods.
+        command(str): By default is 'flows'. To remove a flow is 'remove'.
+        force(bool): True to send via consistency check in case of errors.
+        by_switch(bool): True to send to 'flows_by_switch' request instead.
+    """
+    endpoint = f"{settings.MANAGER_URL}/flows_by_switch/?force={force}"
+
+    try:
+        if action == "install":
+            res = httpx.post(endpoint, json=flow_dict, timeout=30)
+        elif action == "delete":
+            res = httpx.request(
+                "DELETE", endpoint, json=flow_dict, timeout=30
+            )
+    except httpx.RequestError as err:
+        raise FlowModException(str(err)) from err
+    if res.is_server_error or res.status_code >= 400:
+        raise FlowModException(res.text)
+
 
 def prepare_delete_flow(evc_flows: dict[str, list[dict]]):
     """Create flow mods suited for flow deletion."""

Member

@viniarck viniarck Jan 26, 2025

@Ktmi, let's get to the bottom of it. Also, by reading the patch and the temporary refactoring, it looks like it's passing an invalid payload to the bulk endpoint {dpid: [flows]} instead of {dpid: {flows: [flows]}}. Can you double check this?
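
A minimal sketch of the payload difference described above, assuming the bulk endpoint expects each dpid to map to an object with a `flows` key, as stated in this comment. `to_bulk_payload` is a hypothetical helper name, not part of mef_eline or flow_manager:

```python
def to_bulk_payload(flow_dict):
    """Wrap each switch's flow list as {dpid: {"flows": [...]}}.

    Hypothetical helper: the target shape is taken from the review comment,
    not from the flow_manager documentation.
    """
    return {dpid: {"flows": list(flows)} for dpid, flows in flow_dict.items()}


# The shape the patch was sending: {dpid: [flows]}
flat = {"00:00:00:00:00:00:00:01": [{"cookie": 1, "match": {"in_port": 2}}]}

# The shape the bulk endpoint is said to expect: {dpid: {"flows": [flows]}}
wrapped = to_bulk_payload(flat)
```

Converting in one place, just before the request is sent, would keep the callers in main.py unchanged.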

However, I don't know how to get the logs from kytos itself while running e2e tests. If you can provide some instructions on how to do so, I could provide additional context.

The quickest way is to read from kytos.log; you have to add the file handler in the root logging.ini config. Ultimately, you need to correlate the log timestamps with each test's start and end times; that's how we typically debug e2e tests both on CI and locally.

If you're running kytosd in a Docker container (which seems to be the case), then the easiest route, after making sure the file handler is set, is to tail -f /var/log/kytos.log in another terminal window/panel (or add another command in the shell script to copy the file before the container exits).

If you're running e2e without kytosd in a container it's easier, since kytos.log won't be removed automatically, so it's easier to read after the test fails.
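
The timestamp correlation described above could be sketched roughly as follows. This assumes kytos.log lines start with a timestamp like `2025-01-21 09:17:15,069` (the format of the log line quoted later in this thread), that the window comes from the pytest "start/stop times" report, and that both clocks are in the same timezone. `lines_in_window` is a hypothetical helper, not part of kytos or the e2e suite:

```python
from datetime import datetime

LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"     # e.g. 2025-01-21 09:17:15,069
PYTEST_TS_FORMAT = "%Y-%m-%d,%H:%M:%S.%f"  # e.g. 2025-01-24,19:25:29.652753


def parse_log_timestamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:23], LOG_TS_FORMAT)
    except ValueError:
        return None


def lines_in_window(lines, start, end):
    """Yield log lines emitted between a test's start and stop timestamps."""
    begin = datetime.strptime(start, PYTEST_TS_FORMAT)
    finish = datetime.strptime(end, PYTEST_TS_FORMAT)
    keep = False
    for line in lines:
        stamp = parse_log_timestamp(line)
        if stamp is not None:
            # A new timestamped record decides whether we are in the window.
            keep = begin <= stamp <= finish
        # Lines without a timestamp (e.g. tracebacks) follow the previous record.
        if keep:
            yield line
```

For example, passing the lines of kytos.log together with the start and stop times reported for test_002_delete_evc_old_path would yield only the records logged while that test ran.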

Author

Oh, that fixed it. Thanks.

event_contents[evc.id]["clear_old_path"] =\
map_evc_event_content(
evc,
removed_flows=deepcopy(del_flows)
Member

  1. There was a major breaking regression in the event content of kytos/mef_eline.failover_old_path; telemetry_int expects current_path to be set:
2025-01-21 09:17:15,069 - ERROR [kytos.core.helpers] (MainThread) alisten_to handler: <function Main.on_failover_old_path at 0x7f50505ea660>, args: (<Main(telemetry_int, stopped 139978646464192)>, KytosEvent('kytos/mef_eline.failover_old_path', {'18966143a60449': {'removed_flows': {'00:00:00:00:00:00:00:01': [{'cookie': 12256711730379752521, 'match': {'in_port': 2, 'dl_vlan': 1}, 'owner': 'mef_eline', 'cookie_mask': 18446744073709551615}], '00:00:00:00:00:00:00:06': [{'cookie': 12256711730379752521, 'match': {'in_port': 5, 'dl_vlan': 1}, 'owner': 'mef_eline', 'cookie_mask': 18446744073709551615}], '00:00:00:00:00:00:00:02': [{'cookie': 12256711730379752521, 'match': {'in_port': 2, 'dl_vlan': 1}, 'owner': 'mef_eline', 'cookie_mask': 18446744073709551615}, {'cookie': 12256711730379752521, 'match': {'in_port': 1, 'dl_vlan': 1}, 'owner': 'mef_eline', 'cookie_mask': 18446744073709551615}]}, 'evc_id': '18966143a60449', 'id': '18966143a60449', 'name': 'inter_evpl_2222', 'metadata': {'telemetry_request': {}, 'telemetry': {'enabled': True, 'status': 'UP', 'status_reason': [], 'status_updated_at': '2025-01-21T14:11:40'}}, 'active': True, 'enabled': True, 'uni_a': {'interface_id': '00:00:00:00:00:00:00:01:15', 'tag': {'tag_type': 'vlan', 'value': 2222}}, 'uni_z': {'interface_id': '00:00:00:00:00:00:00:06:22', 'tag': {'tag_type': 'vlan', 'value': 2222}}}}, 0)) traceback: Traceback (most recent call last):,   File "/home/viniarck/repos/kytos/kytos/core/helpers.py", line 231, in handler_context,     result = await handler(*args),              ^^^^^^^^^^^^^^^^^^^^,   File "/home/viniarck/repos/napps/napps/kytos/telemetry_int/main.py", line 544, in on_failover_old_path,     await self.int_manager.handle_failover_flows(,   File "/home/viniarck/repos/napps/napps/kytos/telemetry_int/managers/int.py", line 627, in handle_failover_flows,     self.flow_builder.build_failover_old_flows(to_remove, old_flows),   File "/home/viniarck/repos/napps/napps/kytos/telemetry_int/managers/flow_builder.py", line 65, in build_failover_old_flows,     for link in evc["current_path"]:,                 ~~~^^^^^^^^^^^^^^^^, KeyError: 'current_path',

The patch below fixes it. Make sure to later try out, on the INT lab with telemetry_int, the failover-related events and also link_down without failover; since we don't have e2e tests for this, we need to test it manually. If you need help with request payloads, let me know (basically you just have to configure the external loops and, when creating the EVC, set "metadata": { "telemetry_request": {}}; there's a check box in the UI that does this too):

diff --git a/main.py b/main.py
index 0fa495c..2623421 100755
--- a/main.py
+++ b/main.py
@@ -1008,6 +1008,7 @@ class Main(KytosNApp):
                     event_contents[evc.id]["clear_old_path"] =\
                         map_evc_event_content(
                             evc,
+                            current_path=evc.current_path.as_dict(),
                             removed_flows=deepcopy(del_flows)
                     )
                 elif evc.id in flow_modifications:
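
A regression like this could be caught before it reaches telemetry_int with a small check on the event content. The sketch below is only illustrative: `EXPECTED_KEYS` and `missing_event_keys` are hypothetical names, and the key list is inferred from the traceback and patch above rather than from a documented event schema.

```python
# Keys a kytos/mef_eline.failover_old_path consumer appears to read, per the
# traceback above (assumption: not an exhaustive or official schema).
EXPECTED_KEYS = ("current_path", "removed_flows")


def missing_event_keys(event_content, expected=EXPECTED_KEYS):
    """Return {evc_id: [missing keys]} for EVC entries lacking expected keys."""
    problems = {}
    for evc_id, content in event_content.items():
        missing = [key for key in expected if key not in content]
        if missing:
            problems[evc_id] = missing
    return problems


# Content shaped like the failing event: removed_flows set, current_path absent.
broken = {"18966143a60449": {"removed_flows": {}, "evc_id": "18966143a60449"}}
# Content shaped like the event after the patch above.
fixed = {"18966143a60449": {"removed_flows": {}, "current_path": []}}
```

A unit test could call `missing_event_keys` on the content built for each event and assert that the result is empty.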

Author

Added back in; will try to test in the INT lab later.
