Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aperf update #65

Merged
merged 10 commits into from
May 15, 2023
Merged

Aperf update #65

merged 10 commits into from
May 15, 2023

Conversation

janaknat
Copy link
Contributor

@janaknat janaknat commented May 5, 2023

Description of changes:

  • Move HTML files from src/bin/html_files/ to src/html_files.
  • Remove the use of HTTP to get the data.
    • We are now using local files which gets produced by aperf.
    • This removes the need to run a local web server.
  • Move from 2 binaries (aperf-collector, aperf-visualizer) to 1 binary (aperf).
    • There are now 2 sub-commands: record, report.
    • aperf record => The same functionality as aperf-collector.
      • The data is collected in the directory as before. By default, a tar.gz of the directory is now produced.
    • aperf report => The same functionality as aperf-visualizer.
      • aperf report can take the directory containing the data or a tar.gz of the directory as produced by aperf record.
      • A report is now generated which contains all the files needed to visualize the data with a web browser.
  • Remove the use of environment variables to control debug information of aperf.
    • Previously, one had to export APERF_LOG_LEVEL=<debug|trace>.
    • Now, use aperf <command> -v for debug information and aperf <command> -vv for trace information.

Attached the output of aperf report.

aperf_report.tar.gz

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

This also includes:
* Remove the use of HTTP to get the data.
* The data is now fed from local files which will be generated at run
  time by the report generator.
* Use setTimeout to prevent the main UI thread from locking up when
  rendering the graphs.
All data types are allowed only 2 types of visualization calls. "keys"
and "values".
Previously, aperf had one binary for each part of the process, collector
and visualizer. Move to a single binary 'aperf' with subcommands.

Usage:
* aperf record -r <run-name>
  Records the performance data. Similar to aperf-collector.
* aperf report -r <data location>
  Generates a report based on the data collected. Similar to
  aperf-visualizer.

* aperf record, by default, will collect the data and generate a tar.gz of
  the directory.
* aperf report, by default, will generate static HTML files along with a
  tar.gz of the directory. It will also contain a tar of the raw data.
Copy link
Contributor

@wash-amzn wash-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this should not be doing any tarballing or un-tarballing.

README.md Outdated
```
./aperf-visualizer -r <COLLECTOR_DIRECTORY> -p <PORT_NUMBER>
./aperf report -r <COLLECTOR_DIRECTORY> -p <PORT_NUMBER>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

port number? ;)

README.md Outdated
@@ -87,20 +82,18 @@ To see a step-by-step example, please see our example [here](./EXAMPLE.md)
`-r, --run-name` run name (name of the run for organization purposes, creates directory of the same name, default of aperf_[timestamp])


`./aperf-visualizer -h`
`./aperf report -h`

**Visualizer Flags:**

`-v, --version` version of APerf visualizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message says -v is verbose, this say it's version info. This also isn't specific to the report subcommand.

Fortunately there's also the -v and -vv non-subcommand-specific options to document in the same way.

src/bin/aperf.rs Outdated
match verbose {
1 => level = LevelFilter::Debug,
2 => level = LevelFilter::Trace,
_ => level = LevelFilter::Info,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean -vvv is info? That would be unfortunate.

@@ -180,6 +180,13 @@ macro_rules! processed_data {
)*
}
}
pub fn get_calls(&mut self) -> Result<Vec<String>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to stop thinking of these as calls, since it's not done back to a running HTTP server anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we still 'call' the visualizer code.

@@ -33,6 +33,7 @@ impl CollectData for InterruptDataRaw {
self.time = TimeEnum::DateTime(Utc::now());
self.data = String::new();
self.data = std::fs::read_to_string("/proc/interrupts")?;
trace!("{:#?}", self.data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know trace is detailed, but this looks excessive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is consistent trace behavior across other data types I'm fine with it, but that doesn't appear to be the case (at least it's not being added to the others as part of this diff).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell this is still not appropriate. If I'm wrong, say something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After running this branch locally, I see that trace mode is pretty intense, so I guess this fits in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. The trace functionality is a last resort. It print EVERYTHING.

src/report.rs Outdated
Comment on lines 33 to 39
pub fn is_dir(dir: String) -> Result<bool> {
let file_type = fs::metadata(dir.clone())?.file_type();
if file_type.is_dir() {
return Ok(true);
}
return Ok(false);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path has a is_dir function for you https://doc.rust-lang.org/std/path/struct.Path.html#method.is_dir

You have to construct a Path anyway so there's no need for this function.

src/report.rs Outdated
Comment on lines 54 to 57
fs::copy(&archive_name, archive_dst)?;

/* Delete temp archive */
fs::remove_file(&archive_name)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just move it instead of copy/delete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. That's simpler.

src/report.rs Outdated
fs::remove_file(&archive_name)?;
return Ok(());
}
if infer::get_from_path(&dir)?.unwrap().mime_type() == "application/gzip" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable naming leads to confusion. How would a directory (it's location is stored in a variable named dir, afterall) possibly have a mime type of application/gzip?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rename it to run.

src/report.rs Outdated
form_and_copy_archive(dir.clone())?;
}
/* Generate base HTML, JS files */
let _ico_file = File::create("aperf_report/ico")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is that for? ico? Typo related to favicon.ico?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is for the favicon.ico. Typo. I'll remove the _.

src/report.rs Outdated
}
/* Generate aperf_report.tar.gz */
info!("Generating aperf_report.tar.gz");
let tar_gz = File::create("aperf_report.tar.gz")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No way can the output filename always be the same thing. No way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at how perf record always generates perf.data. Something similar. We could append all the runs which were used to generate the report.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind copying perf behavior, but you have to do it completely. Both on the collection and on the reporting side, that includes renaming an existing file out of the way (I don't know the full extent of perf's behavior here, I believe it only does a single rename. Note that the renames aren't possible with directories, so that's another complication (having to delete something in the way)).

* Update README to remove references to port number.
* Print more info messages during record.
src/report.rs Outdated
Comment on lines 44 to 57
/* Copy archive to aperf_report */
let archive_dst = format!("aperf_report/data/archive/{}", archive_name);
fs::copy(&archive_name, archive_dst)?;

/* Delete temp archive */
fs::remove_file(&archive_name)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now instead of copy and delete, you're just copying, so leaving the temporary archive around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Say you use aperf to record the performance data. $PWD contains run_data/ and run_data.tar.gz. If you then do aperf report -r run_data/ , then I should not delete run_data.tar.gz that was generated by aperf record.

src/report.rs Outdated
@@ -107,36 +96,50 @@ pub fn report(report: &Report) -> Result<()> {
dir_paths.push(path.to_str().unwrap().to_string());
}

/* Generate report name */
let mut report_name = "aperf_report".to_string();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really should be trying to use Path objects as much as possible and not repeatedly constructing paths as strings

@@ -33,6 +33,7 @@ impl CollectData for InterruptDataRaw {
self.time = TimeEnum::DateTime(Utc::now());
self.data = String::new();
self.data = std::fs::read_to_string("/proc/interrupts")?;
trace!("{:#?}", self.data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is consistent trace behavior across other data types I'm fine with it, but that doesn't appear to be the case (at least it's not being added to the others as part of this diff).

info!("To see debug messages export APERF_LOG_LEVEL=[debug|trace]");

let args = Args::parse();
pub fn record(record: &Record) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still very echo-y in here

src/report.rs Outdated
}
/* Generate aperf_report.tar.gz */
info!("Generating aperf_report.tar.gz");
let tar_gz = File::create("aperf_report.tar.gz")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind copying perf behavior, but you have to do it completely. Both on the collection and on the reporting side, that includes renaming an existing file out of the way (I don't know the full extent of perf's behavior here, I believe it only does a single rename. Note that the renames aren't possible with directories, so that's another complication (having to delete something in the way)).

README.md Outdated
`aperf-collector` collects performance data and stores them in a series of files. These files are then viewed using `aperf-visualizer` either on the same machine the performance data was collected on or a remote machine running `aperf-visualizer`.

To visualize the data using `aperf-visualizer` download the directory created by `aperf-collector` and load the data with `aperf-visualizer`.
`aperf record` records performance data and stores them in a series of files. A report is then generated with `aperf report` and viewed in any system with a web browser.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/and viewed in any/and can be viewed in any/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah.

Also, move to use PathBuf when forming paths in the report. Updated the
README and EXAMPLE with the new option.

Example:
aperf report -r aarch64_run -r x86_run -n compare_runs

This generates compare_runs/ with the report and a compare_runs.tar.gz.

If a report name is not given, aperf defaults to concatinating the run
names to form a report name.

aperf report -r aarch64_run -r x86_run

This generates aperf_report_aarch64_run_x86_run/ and a
aperf_report_aarch64_run_x86_run.tar.gz
EXAMPLE.md Outdated
```

The APerf recorder also produces a tarball of the run data. To generate a report with the tarball generated by the recorder us the following command:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/us/use/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:(

EXAMPLE.md Outdated
The APerf recorder also produces a tarball of the run data. To generate a report with the tarball generated by the recorder us the following command:

```
./aperf-v0.1.6-alpha-aarch64/aperf report --run-directory c7g_performance_run_1.tar.gz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--run- directory c7g_performance_run_1 .tar.gz

:/ I guess the tarball becomes a directory, so it's kind of not wrong.

--run would be more appropriate if multiple forms of input are accepted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. We can change the arg name.

EXAMPLE.md Outdated
```

The APerf recorder also produces a tarball of the run data. To generate a report with the tarball generated by the recorder us the following command:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the behavior of the recorder should be getting described in the visualization section. This part should merely say you can supply either a run's directory or tarball

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

src/report.rs Outdated
let directory = get_dir(dir.to_string())?;
let path = Path::new(&directory);
if dir_stems.contains(&path.file_stem().unwrap().to_str().unwrap().to_string()) {
error!("Cannot process two directories with the same name");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily directories

src/report.rs Outdated Show resolved Hide resolved
src/report.rs Outdated

/* Create a temp archive */
let archive_name = format!("{}.tar.gz", &dir_stem);
let tar_gz = fs::File::create(&archive_name)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I supply a run that's like /tmp/foo-bar, isn't this going to create (and then leave behind) foo-bar.tar.gz in PWD?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

fs::copy(&archive_name, archive_dst)?;
return Ok(());
}
if infer::get_from_path(&loc)?.unwrap().mime_type() == "application/gzip" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: should be an 'else if'

src/report.rs Outdated
let tar_gz = File::open(&dir)?;
let tar = GzDecoder::new(tar_gz);
let mut archive = tar::Archive::new(tar);
archive.unpack(".")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the tarball contains absolute paths in it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And won't this similarly leave things around, like if I supply /tmp/foo-bar.tar.gz this will leave a foo-bar directory in PWD

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard tar command has protections in it to reduce the impact of a malicious tarball (https://unix.stackexchange.com/a/276962/398490), but even if we had that (which I don't know if tar::Archive does), there's still the threat of extracting a bunch of garbage files all over the place and leaving a mess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tarballs format is expected to be the same as what the recorder produces.

The run data can either be fed as a directory or as a tarball generated
by aperf record. Change arg name from --run-directory to --run.
If the directory or archive fed to aperf report does not contain valid
run data, fail gracefully. If the total number of fails is equal to the
total number of visualizers then the run data is invalid. If only some
of the visualizers fail during init, it could be that some data was not
collected. This will help in being backwards compatible.
src/lib.rs Outdated
Comment on lines 77 to 78
#[error("Invalid run name")]
InvalidRecordName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid record name? What's the thinking behind the "record" part here, versus calling it InvalidRunName, which would align with InvalidRunData above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. That is not needed. Holdover from when I was considering limiting what can be a run name. I'll remove.

pub fn form_and_copy_archive(loc: String, report_name: &PathBuf) -> Result<()> {
if Path::new(&loc).is_dir() {
let dir_stem = Path::new(&loc).file_stem().unwrap().to_str().unwrap().to_string();

/* Create a temp archive */
let archive_name = format!("{}.tar.gz", &dir_stem);
let tar_gz = fs::File::create(&archive_name)?;
let archive_path = format!("{}/{}", APERF_TMP, archive_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this temp file not get cleaned up? Why not just write directly to the destination file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Towards the end, the directory gets deleted.

An aperf_tmp is used for all intermediate steps aperf performs when
forming the report.
@janaknat janaknat merged commit 6be61ee into main May 15, 2023
@janaknat janaknat deleted the aperf_update branch July 7, 2023 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants