Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup persistent firewall rules on service restart #2154

Merged
merged 20 commits into from
Feb 5, 2021

Conversation

larohra
Copy link
Contributor

@larohra larohra commented Feb 1, 2021

Description

This PR ensures that the firewall rules for the agent are persisted over system restarts. This PR does the following -

  • Ensures Firewall rules are persisted over restarts by either using firewalld.service or creating a new unit file for doing the same
  • If firewalld is available on the box, prioritize that because it already has provisions for making the rules persist over reboots.
  • If not, create a new custom agent-network-setup.service which would always setup firewall rules on reboots before network of the VM is even established.

Issue #


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • Except for special cases involving multiple contributors, the PR is started from a fork of the main repository, not a branch.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made and Travis.CI is passing.

Quality of Code and Contribution Guidelines

@codecov
Copy link

codecov bot commented Feb 1, 2021

Codecov Report

Merging #2154 (a06c17e) into develop (ac6f5a9) will increase coverage by 0.33%.
The diff coverage is 77.77%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2154      +/-   ##
===========================================
+ Coverage    70.48%   70.82%   +0.33%     
===========================================
  Files           95       96       +1     
  Lines        13104    13367     +263     
  Branches      1862     1876      +14     
===========================================
+ Hits          9237     9467     +230     
- Misses        3453     3487      +34     
+ Partials       414      413       -1     
Impacted Files Coverage Δ
azurelinuxagent/agent.py 54.14% <47.29%> (+5.19%) ⬆️
azurelinuxagent/common/osutil/arch.py 65.62% <66.66%> (-1.05%) ⬇️
azurelinuxagent/common/osutil/clearlinux.py 58.20% <66.66%> (+0.83%) ⬆️
azurelinuxagent/common/osutil/coreos.py 67.39% <66.66%> (-0.06%) ⬇️
azurelinuxagent/common/osutil/freebsd.py 44.09% <66.66%> (+0.03%) ⬆️
azurelinuxagent/common/osutil/iosxe.py 43.90% <66.66%> (+1.79%) ⬆️
azurelinuxagent/common/osutil/mariner.py 54.28% <66.66%> (+2.56%) ⬆️
azurelinuxagent/common/osutil/openbsd.py 23.93% <66.66%> (+0.55%) ⬆️
azurelinuxagent/common/osutil/redhat.py 57.31% <66.66%> (+0.35%) ⬆️
azurelinuxagent/common/osutil/suse.py 63.79% <66.66%> (+0.15%) ⬆️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac6f5a9...a06c17e. Read the comment docs.

setup.py Show resolved Hide resolved
Copy link
Member

@narrieta narrieta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some minor comments, but LGTM

@larohra larohra requested a review from narrieta February 2, 2021 18:32
# Check if firewall-rules have already been enabled
# This function would also return False if the dest-ip is changed. So no need to check separately for that
try:
AddFirewallRules.check_firewalld_rule_applied(self._dst_ip, self._uid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this check work? raises both if the rules are not enabled or it if fails for some other reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the service is working properly, it would fail if the firewall rules for the specified data (IP, UID) are not applied.

The check to see if firewalld.service is enabled/running on the box is different, its done before this check - _is_firewall_service_running()

If the firewalld.service is not working properly for external reasons, then it would fail too but that would also mostly fail the initial check, which would make us go the other route and setup our custom service.

I hope that answers your question, if not we can sync offline about this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are logging as verbose, so we won't see the messages under normal runs. seems that we would like to know about some of those, e.g. "If the service is working properly, it would fail if the firewall rules for the specified data (IP, UID) are not applied."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In such a case it would still print a generic message like - "Firewall rules not added yet, adding them now using firewalld.service" or "Service: {0} not enabled. Adding it now".

If you want more data then make it verbose. Initially I added the metadata of these commands as info too (stdout/stderr) but that definitely has a potential of being a bit confusing to the untrained eye so figured I'd make it verbose. Do you have a simpler alternative in mind?

Copy link
Member

@narrieta narrieta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor comments, but LGTM

@narrieta
Copy link
Member

narrieta commented Feb 5, 2021

There are a couple of INFO messages in the log that look like errors and may be misleading. Can we rephrase them?

>>>>>>>>>>>> 2021-02-05T15:06:39.031001Z INFO ExtHandler ExtHandler firewall-cmd --state command returned error/not running: [Errno 2] No such file or directory: 'firewall-cmd': 'firewall-cmd'**
2021-02-05T15:06:39.031472Z INFO ExtHandler ExtHandler Firewalld service not running/unavailable, trying to set up walinuxagent-network-setup.service
>>>>>>>>>>>2021-02-05T15:06:39.130154Z INFO ExtHandler ExtHandler walinuxagent-network-setup.service not enabled. Command: systemctl is-enabled walinuxagent-network-setup.service, ExitCode: 1
Stdout:
Stderr: Failed to get unit file state for walinuxagent-network-setup.service: No such file or directory

2021-02-05T15:06:39.371742Z INFO ExtHandler ExtHandler Successfully added and enabled the walinuxagent-network-setup.service
2021-02-05T15:06:39.372280Z INFO ExtHandler ExtHandler Drop-in file /lib/systemd/system/walinuxagent-network-setup.service.d/10-environment.conf successfully updated


[Service]
Type=oneshot
Environment="EGG={egg_path}" "DST_IP={wire_ip}" "UID={user_id}" "WAIT={wait}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably not parameterize the uid

Copy link
Contributor Author

@larohra larohra Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mainly added the UID to make it easier to add more users later on (if ever needed). Can remove it though not a big deal


add_event(
op=WALAEventOperation.PersistFirewallRules,
is_success=cmd_success,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this is the status of the journalctl command; we should get the status of the service

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually the logs of the service (the same logs that you see when we do systemctl status ). It would get the verbose logs of the service but would probably not get the status of it (active, failed, etc). I was initially considering that but then I figured it would be more helpful to get the complete logs than the status. Do you think status makes more sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw this is an example of how journalctl logs look like -

2021-02-05T06:57:44.194001Z INFO ExtHandler ExtHandler Logs from the walinuxagent-network-setup.service since system boot: -- 
Logs begin at Fri 2021-02-05 06:57:36 UTC, end at Fri 2021-02-05 06:57:44 UTC. --
Feb 05 06:57:38 edpwz5814j python3[441]: Setting up firewall for the WALinux Agent with args: {'wait': '-w', 'uid': '0', 'dst_ip': '168.63.129.16'}
Feb 05 06:57:38 edpwz5814j systemd[1]: Started Setup network rules for WALinuxAgent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the is_success we send in telemetry. Probably we do not care about failures executing journalctl, but we do care if the firewall rules were not created

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh good point, let me also query the service to see if it was in a failed state or not and send the is-success based on that

@narrieta narrieta merged commit 8087dbf into Azure:develop Feb 5, 2021
@larohra larohra deleted the network-service-file branch February 5, 2021 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants