Check agent log for errors; install test libraries #2787

Merged (6 commits) on Mar 22, 2023

Conversation

narrieta
Member

Added a check for errors in the agent's log after all test suites have been executed.

Installed the test agent under ~/bin/azurelinuxagent; now tests can use the agent as a library (e.g. query the goal state, get the distro info, use systemd utilities, etc.). Added that location to PYTHONPATH.

self._log.info(
"Test suite parameters: [test_suites: %s] [skip_setup: %s] [collect_logs: %s]",
[t.name for t in self.context.test_suites], self.context.skip_setup, self.context.collect_logs) # pylint: disable=E1133
"Runbook variables:\n%s",
Member Author

Several variables were added but not logged; now I'm logging the entire set.
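As a rough illustration of logging the whole set in one call, here is a minimal sketch. The helper name `format_variables` and the sample variable values are hypothetical; only the "Runbook variables:\n%s" message format comes from the diff above.

```python
import logging
from typing import Any, Dict


def format_variables(variables: Dict[str, Any]) -> str:
    """Return one indented 'name: value' line per variable, sorted by name."""
    return "\n".join(f"    {name}: {value}" for name, value in sorted(variables.items()))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log = logging.getLogger("runbook-sketch")
    # Hypothetical values; the real runbook defines its own set of variables.
    variables = {"skip_setup": False, "collect_logs": "failed", "test_suites": "example_suite"}
    log.info("Runbook variables:\n%s", format_variables(variables))
```

Logging the full dictionary means a newly added variable shows up in the log without a matching change to the log statement.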

@@ -265,23 +269,27 @@ def _setup_node(self) -> None:
self._log.info("Resource Group: %s", self.context.vm.resource_group)
self._log.info("")

self._install_tools_on_node()
Member Author

I merged _install_agent_on_node and _install_tools_on_node into _setup_node and added code to install the test libraries and agent

@@ -72,6 +72,14 @@ else
fi
echo "Service name: $service_name"

#
Member Author

There was a race condition between get-agent-path and the agent service restart a few lines below (get-agent-path uses the agent's pid file in its logic), so I moved the initialization here, before we stop and restart the service.

#
echo "Verifying agent installation..."

PYTHONPATH=$(get-agent-pythonpath)
Member Author

Now this is done in ~/bin/agent-dev so that all tests can benefit from this.

Also, we now set the path to the test agent instead of the agent installed in the image. The test agent is the current code, while the installed agent varies widely across the distros we test, which makes using it considerably more difficult.

# 2023-03-15T20:47:56.688981Z INFO ExtHandler ExtHandler [CGW] The agent's process is not within a memory cgroup
#
{
'message': r"\[CGW\]\s*(The (CPU|memory) cgroup controller is not mounted)|(The agent's process is not within a (CPU|memory) cgroup)",
Member Author

I'm following up on an issue with this rule. I'll post an update to this PR once I fix it.
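The thread does not say what the issue with the rule was. One property of a pattern written this way is worth knowing in any case: the top-level `|` splits the whole expression, so the `\[CGW\]\s*` prefix applies only to the first alternative. The sketch below compares the pattern from the diff with a variant that wraps both alternatives in one group. The `GROUPED` pattern and the prefix-less sample line are illustrations, not the fix that was eventually made.

```python
import re

# Pattern as it appears in the diff: the '|' is at the top level.
ORIGINAL = (r"\[CGW\]\s*(The (CPU|memory) cgroup controller is not mounted)"
            r"|(The agent's process is not within a (CPU|memory) cgroup)")

# Hypothetical variant: both alternatives share the [CGW] prefix.
GROUPED = (r"\[CGW\]\s*((The (CPU|memory) cgroup controller is not mounted)"
           r"|(The agent's process is not within a (CPU|memory) cgroup))")

# Log line quoted in the diff.
WITH_PREFIX = ("2023-03-15T20:47:56.688981Z INFO ExtHandler ExtHandler [CGW] "
               "The agent's process is not within a memory cgroup")
# Made-up line without the [CGW] marker, used only to show the difference.
WITHOUT_PREFIX = "The agent's process is not within a memory cgroup"


def matches(pattern: str, line: str) -> bool:
    """True if the pattern is found anywhere in the line."""
    return re.search(pattern, line) is not None


if __name__ == "__main__":
    for name, pattern in (("original", ORIGINAL), ("grouped", GROUPED)):
        print(name, matches(pattern, WITH_PREFIX), matches(pattern, WITHOUT_PREFIX))
```

Both patterns match the quoted log line; they differ only on text that lacks the `[CGW]` marker.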

#
ignore_rules = [
#
# NOTE: This list was taken from the older agent tests and needs to be cleaned up. Feel free to un-comment rules as new tests are added.
Member Author

Note this
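For readers unfamiliar with this kind of check, here is a minimal sketch of how a list of ignore rules could be applied to parsed log records. It assumes each rule is a dictionary with a `message` regex (as in the diff) and each record has `when`, `level` and `text` fields. The `LogRecord` type, the function names, the choice of ERROR and WARNING as the levels to report, and the sample rule and records are all hypothetical and not taken from the PR.

```python
import re
from typing import Dict, List, NamedTuple


class LogRecord(NamedTuple):
    when: str
    level: str
    text: str


def is_ignored(record: LogRecord, ignore_rules: List[Dict[str, str]]) -> bool:
    """True if any rule's 'message' regex is found in the record's text."""
    return any(re.search(rule["message"], record.text) is not None for rule in ignore_rules)


def get_errors(records: List[LogRecord], ignore_rules: List[Dict[str, str]]) -> List[LogRecord]:
    """Return ERROR/WARNING records that no ignore rule matches (the levels are an assumption)."""
    return [
        r for r in records
        if r.level in ("ERROR", "WARNING") and not is_ignored(r, ignore_rules)
    ]


if __name__ == "__main__":
    # Hypothetical rule and records, for illustration only.
    rules = [{"message": r"Example transient failure \d+"}]
    records = [
        LogRecord("2023-03-15T20:47:56Z", "INFO", "Agent started"),
        LogRecord("2023-03-15T20:47:57Z", "WARNING", "Example transient failure 42"),
        LogRecord("2023-03-15T20:47:58Z", "ERROR", "Unexpected failure"),
    ]
    for record in get_errors(records, rules):
        print(record.when, record.level, record.text)
```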

@@ -52,10 +57,19 @@ def generate_ssh_key(private_key_file: Path):
def get_architecture(self):
return self.run_command("uname -m").rstrip()

def copy(self, source: Path, target: Path, remote_source: bool = False, remote_target: bool = False, recursive: bool = False):
Member Author

This interface was a little hard to use, so I split this method into copy_to_node and copy_from_node
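To show why two direction-specific methods are easier to call than one method with `remote_source`/`remote_target` flags, here is a rough sketch. The class `SshClientSketch`, its constructor arguments and the exact method signatures are assumptions; the PR's actual implementation is not shown in this thread. The methods only build an `scp` command line and do not run it.

```python
from pathlib import Path
from typing import List


class SshClientSketch:
    """Hypothetical stand-in for the test framework's SSH client."""

    def __init__(self, ip_address: str, username: str, private_key_file: Path):
        self._ip_address = ip_address
        self._username = username
        self._private_key_file = private_key_file

    def copy_to_node(self, local_path: Path, remote_path: Path, recursive: bool = False) -> List[str]:
        """Build the scp command that copies a local path to the node."""
        return self._scp_command(str(local_path), self._remote(remote_path), recursive)

    def copy_from_node(self, remote_path: Path, local_path: Path, recursive: bool = False) -> List[str]:
        """Build the scp command that copies a path on the node to the local machine."""
        return self._scp_command(self._remote(remote_path), str(local_path), recursive)

    def _remote(self, path: Path) -> str:
        return f"{self._username}@{self._ip_address}:{path}"

    def _scp_command(self, source: str, target: str, recursive: bool) -> List[str]:
        command = ["scp", "-o", "StrictHostKeyChecking=no", "-i", str(self._private_key_file)]
        if recursive:
            command.append("-r")
        command.extend([source, target])
        return command


if __name__ == "__main__":
    # Example address, user and paths are made up.
    client = SshClientSketch("10.0.0.4", "testuser", Path("/home/user/.ssh/id_rsa"))
    print(" ".join(client.copy_to_node(Path("/tmp/tools"), Path("~/bin"), recursive=True)))
    print(" ".join(client.copy_from_node(Path("/var/log/waagent.log"), Path("/tmp/logs"))))
```

With this shape the direction is stated by the method name, so a caller cannot pass an inconsistent combination of flags.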

# the test modules we copied to ~/bin.
#
#
echo "Creating ~/bin/agent-env to set PATH and PYTHONPATH"
Member Author

@narrieta narrieta Mar 17, 2023

I moved this to its own file (agent-env) instead of adding it directly to .bash_profile. It turns out that in some distros (e.g. Mariner) .bash_profile references undefined variables, and that was adding some really weird error messages to our logs.

I'm still adding agent-env to .bash_profile to help with interactive debugging sessions. If you ssh to a test machine, keep in mind that PATH and PYTHONPATH are set up with the same values as in the test runs (e.g. 'python3' will invoke PyPy instead of the Python installed on the image).

If you need to use the Python installed in the image, you can find it with the which command:

$ which -a python3
/home/nam/bin/python3
/usr/bin/python3
/bin/python3
$ 
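If a test needs to locate the image's Python programmatically rather than from an interactive shell, the same lookup can be approximated in Python. This is only a sketch of what `which -a` does (walk PATH in order and list every executable match); the function name `which_all` is hypothetical and is not part of the PR.

```python
import os
from pathlib import Path
from typing import List, Optional


def which_all(name: str, path: Optional[str] = None) -> List[str]:
    """Return every executable called `name` found on the search path, in PATH order."""
    search_path = os.environ.get("PATH", "") if path is None else path
    found = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = Path(directory) / name
        if candidate.is_file() and os.access(candidate, os.X_OK):
            found.append(str(candidate))
    return found


if __name__ == "__main__":
    # The first entry is what 'python3' resolves to; later entries are shadowed by it.
    for location in which_all("python3"):
        print(location)
```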

# Copy the test tools
tools_path = self.context.test_source_directory/"orchestrator"/"scripts"
tools_target_path = Path("~/bin")
self._log.info("Copying %s to %s:%s", Path("~/bin/pypy3.7.tar.bz2"), self.context.node.name, tools_target_path)
Contributor

Here we copy the tools; can we change the log message accordingly?

Member Author

thanks! fixed

record.text = dictionary["text"]
record.when = dictionary["when"]
record.level = dictionary["level"]
record.level = dictionary["level"]
Contributor

duplicate

Member Author

thanks! fixed

maddieford
maddieford previously approved these changes Mar 20, 2023

codecov bot commented Mar 21, 2023

Codecov Report

Merging #2787 (5099203) into develop (e9b51d7) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           develop    #2787   +/-   ##
========================================
  Coverage    71.99%   71.99%           
========================================
  Files          104      104           
  Lines        15857    15857           
  Branches      2273     2273           
========================================
  Hits         11417    11417           
  Misses        3915     3915           
  Partials       525      525           

@narrieta narrieta merged commit 1640510 into Azure:develop Mar 22, 2023
@narrieta narrieta deleted the agent-log branch March 22, 2023 17:13