
Add support for collecting, visualizing per-process data #66

Merged 5 commits into main on Jul 7, 2023
Conversation

janaknat (Contributor)

Shows only the top 15 processes recorded during the collection period.

Also, update the way we detect a run that lacks data of a particular type. Previously, if one run was missing a recorded data type, the visualization was left empty for all runs. With this update, only the run without the data is left without a visualization.

Attached a tarball of the aperf report which exercises the changes.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
comp.tar.gz

janaknat added 2 commits May 31, 2023 17:44
With a boolean, if one run doesn't have the data, no data is visualized for any of the runs.
@wash-amzn (Contributor) left a comment:

Threads can have a different name from their parent process. This happens quite a bit for server applications, so aggregating by name rather than by PIDs and explicit parent/child relationships could really be a problem here.

};
let name = pstat.comm;
let pid = pstat.pid;
let time_ticks = pstat.utime as i64 + pstat.stime as i64 + pstat.cutime + pstat.cstime +
Contributor:

Are those actually signed values?

Contributor Author:

Some were signed, some were unsigned. Changed all to unsigned. Not sure how you can have negative CPU usage.

Comment on lines +121 to +140
let mut total_time: u64 = 1;
if let TimeEnum::TimeDiff(v) = values.last().unwrap().time - values[0].time {
if v > 0 {
total_time = v;
}
}
Contributor:

If I collect with period = 0, there's only one data point, and letting total_time default to 1 is wrong.

Contributor Author:

Changed code so that we can't give a time of 0 for the collection interval or collection period.
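The fix described above can be sketched as a guard that refuses a zero-length collection window rather than silently substituting 1 (a hypothetical helper, not the PR's actual code):

```rust
/// Hypothetical sketch: compute the collection window from the first and
/// last sample timestamps (in seconds), rejecting a zero-length interval
/// (e.g. period = 0, a single data point) instead of defaulting to 1.
fn elapsed_seconds(first: u64, last: u64) -> Result<u64, String> {
    let diff = last.saturating_sub(first);
    if diff == 0 {
        Err("collection interval must be non-zero".to_string())
    } else {
        Ok(diff)
    }
}
```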

Comment on lines 170 to 174
for (name, entry) in &process_map {
process_list.insert(name.to_string(), entry.total_cpu_time / total_time);
}
let mut ordered_process_list = Vec::from_iter(process_list);
ordered_process_list.sort_by(|&(_, a), &(_, b)| b.cmp(&a));
Contributor:

Why put everything in a b-tree if you're just going to then pull everything out and re-sort it by value (discarding the work the b-tree did to sort by key)?
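The reviewer's point can be illustrated by sorting the pairs directly; `top_by_cpu` and the bare `u64` value are stand-ins for the PR's actual structures:

```rust
use std::collections::HashMap;

/// Illustrative alternative: collect (name, cpu_time) pairs into a Vec and
/// sort by value directly, skipping the intermediate key-ordered BTreeMap.
fn top_by_cpu(process_map: &HashMap<String, u64>) -> Vec<(String, u64)> {
    let mut list: Vec<(String, u64)> = process_map
        .iter()
        .map(|(name, cpu)| (name.clone(), *cpu))
        .collect();
    // Descending by CPU time.
    list.sort_by(|a, b| b.1.cmp(&a.1));
    list
}
```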

let mut process_entry = ProcessEntry {
name: entry.name.clone(),
prev_sample: sample.clone(),
collection_time: TimeEnum::TimeDiff(total_time),
Contributor:

This is a fixed value being attached to every ProcessEntry. If all of them have the same value, why store it in them at all?

Contributor Author:

Removed it. The collection_time is present only once now.

Comment on lines 181 to 187
let pe = process_map.get(name);
for sample in &pe.unwrap().samples {
if sample.cpu_time != 0 {
not_zero = true;
break;
}
}
Contributor:

Can't you just use the total_cpu_time you already have there (instead of looking for at least one non-zero sample)?

Contributor Author:

Yeah. Used the total_cpu_time.

}
}
if not_zero {
let name_split: Vec<&str> = name.split("-").collect();
Contributor:

The process name can contain a -, so you need to re-jigger this one way or another not to break in that case.

Contributor Author:

Changed it to ';'.

Contributor:

The process name can likely contain a semicolon as well

Contributor Author:

If I use serialization like serde, the CPU usage increases by 0.5%, so that is not an option. It is unlikely that process names will contain a semicolon; however, I can change the separator to "=>".
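One delimiter-free alternative (a sketch, not what the PR does) is to key the aggregation map by a (name, pid) tuple, so no character in the process name can collide with a separator:

```rust
use std::collections::HashMap;

/// Sketch: aggregate CPU ticks keyed by (name, pid) instead of a
/// "name<sep>pid" string, so names containing '-' or ';' are safe.
fn aggregate_ticks(samples: &[(&str, i32, u64)]) -> HashMap<(String, i32), u64> {
    let mut cpu_by_key: HashMap<(String, i32), u64> = HashMap::new();
    for (name, pid, ticks) in samples {
        *cpu_by_key.entry((name.to_string(), *pid)).or_insert(0) += *ticks;
    }
    cpu_by_key
}
```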

Comment on lines 205 to 209
end_entry.entries.push(pe.unwrap().clone());
end_map.insert(actual_name.to_string(), end_entry);
count = count + 1;
},
Contributor:

The actual logic for which processes are considered in the top 15 is pretty complicated to explain. I think it's the 15 processes (keyed by name, not pid), ordered by each one's top CPU-using thread (not all threads combined, where a thread is defined as being in the same process or in a child process with the same name).

I doubt I got it right. The point is that this is really complicated to explain to someone, so it's going to be an ongoing head-scratcher for users and anyone who has to answer users' questions.

let value_zero = values[0].clone();
let time_zero = value_zero.time;
let mut end_map: HashMap<String, EndEntry> = HashMap::new();
let end_values: Vec<EndEntry>;
Contributor:

I finally figured out what "end" really is, but the troubling thing is that even now giving it a name is tough. It's not the total aggregate per process (since we stop at 16 threads). Regardless, it needs a different name.

@wash-amzn (Contributor) left a comment:

The processes tab would be a good place to have things like number of processes created, context switches, etc.

let pid = pstat.pid;
let time_ticks = pstat.utime + pstat.stime + pstat.cutime as u64 + pstat.cstime as u64 +
pstat.guest_time.unwrap_or_default() + pstat.cguest_time.unwrap_or_default() as u64;
if time_ticks / procfs::ticks_per_second()? as u64 <= 0 {
Contributor:

I don't think you want to be constantly calling ticks_per_second; AFAICT it winds up making a sysconf system call each time (and then again a couple of lines down).

Contributor Author:

Okay. Rust and static global. Eek.
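A minimal sketch of the static-global approach in safe Rust, using `std::sync::OnceLock` so the sysconf-backed lookup runs at most once; `query_ticks_per_second` is a placeholder for the real `procfs::ticks_per_second()`:

```rust
use std::sync::OnceLock;

/// Placeholder for the real sysconf(_SC_CLK_TCK)-backed lookup.
fn query_ticks_per_second() -> u64 {
    100
}

/// Cached accessor: the underlying query runs at most once per process;
/// later calls return the memoized value without a system call.
fn ticks_per_second() -> u64 {
    static TICKS: OnceLock<u64> = OnceLock::new();
    *TICKS.get_or_init(query_ticks_per_second)
}
```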

}
/* Order the processes by Total CPU Time per collection time */
let mut ordered_process_list: Vec<ProcessEntry> = process_map.clone().into_values().collect();
ordered_process_list.sort_by(|a, b| (b.total_cpu_time / total_time).cmp(&(a.total_cpu_time / total_time)));
Contributor:

Dividing both numbers by total_time doesn't change the ordering.

Contributor Author:

Yeah.

if entry.total_cpu_time != 0 {
match end_map.get_mut(&entry.name) {
Some(e) => {
if e.total_threads > 16 {
Contributor:

You aren't actually counting threads; it's the number of different processes with the same name that you're aggregating.

Contributor Author:

Gah. Changing it to total_processes.

if count >= 15 {
break;
}
Contributor:

Couldn't you aggregate everything else into an "Other" entry, so then at least the sum of everything would match the system's overall cpu usage? It would avoid hiding a bunch of usage when there's a long tail of low-cpu usage processes that together use a significant amount.
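The suggestion above could look roughly like this (illustrative types; the PR's real entries carry more than a name and a total):

```rust
/// Sketch of an "Other" bucket: keep the top `n` entries by CPU time and
/// fold the tail into one aggregate, so the parts still sum to the whole
/// even with a long tail of low-usage processes.
fn top_n_with_other(mut list: Vec<(String, u64)>, n: usize) -> Vec<(String, u64)> {
    list.sort_by(|a, b| b.1.cmp(&a.1));
    if list.len() > n {
        let tail: u64 = list[n..].iter().map(|(_, cpu)| cpu).sum();
        list.truncate(n);
        list.push(("Other".to_string(), tail));
    }
    list
}
```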

Contributor Author:

We could. That has a high probability of confusing the user though.

Contributor:

I assure you the current behavior is going to be more confusing.

Contributor Author:

I would prefer to keep the current functionality as is. I'd increase the process count per graph to 32 and see how it fares.

if entry.total_cpu_time != 0 {
match end_map.get_mut(&entry.name) {
Some(e) => {
if e.total_threads > 16 {
Contributor:

Why stop at aggregating together 16?

Contributor Author:

We have to stop somewhere. For a Linux kernel build, there are thousands of sub-processes. The choices are 16, 32, or 64 (pushing it).

Contributor:

What are you actually saving? You've already iterated through everything once, so it's O(2n) vs O(n)? Not even that: the pass above is linear in the collection time and the number of processes; this one is linear in just the number of processes.

Contributor Author:

We cannot have thousands of entries in the graph. Also, the size of the report would become huge if we measure long-running, fork-bombing processes.

Contributor:

This case isn't about changing the size of any data structure, only aggregating together CPU time (more additions).

Regardless, I'll re-read all of this and see if I'm misunderstanding things.

Contributor:

I think I see what you were talking about on output size. However, there's a better way to do this and get the correct result.

Right now you're aggregating by pid, sorting those aggregates, and then partially (depending on who you ask partially would mean incorrectly) re-aggregating based on process name. Just aggregate on process name in the first place, then the sort will give you the actual top N that you want without a second partial aggregation that will have cases of giving the wrong results.

Contributor:

Doing a single aggregation isn't quite as easy as simply dropping the pid part of the key, you'll need to have a second-level map in each entry to keep the last, per-pid, samples, in order to compute diffs and add time to the process-by-name's aggregate
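A sketch of that shape, with hypothetical types standing in for the PR's structures: group by name at the top level, and keep a per-pid sub-map of the last raw counter so each new sample can still be diffed against the right baseline:

```rust
use std::collections::HashMap;

/// Per-name aggregate with a nested per-pid map of last-seen tick counts.
struct NameAggregate {
    total_cpu_time: u64,
    last_ticks_by_pid: HashMap<i32, u64>,
}

/// Fold one (name, pid, cumulative ticks) sample into the by-name map.
/// The first sample for a pid establishes its baseline and adds no time.
fn record(map: &mut HashMap<String, NameAggregate>, name: &str, pid: i32, ticks: u64) {
    let agg = map.entry(name.to_string()).or_insert_with(|| NameAggregate {
        total_cpu_time: 0,
        last_ticks_by_pid: HashMap::new(),
    });
    let prev = agg.last_ticks_by_pid.insert(pid, ticks).unwrap_or(ticks);
    agg.total_cpu_time += ticks.saturating_sub(prev);
}
```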

ordered_process_list.sort_by(|a, b| (b.total_cpu_time).cmp(&(a.total_cpu_time)));

/*
* Create 1 entry per process name. Add similarly named processes to it.
Contributor:

Not really "similarly named", it's an exact match grouping.

@janaknat (Contributor Author):

3 Linux kernel builds which fork off a bunch of make sub-processes. Plotly was able to handle it without issues. The run was for 1 hour.
aperf_report_fullbuild4.tar.gz

@janaknat janaknat force-pushed the processes branch 3 times, most recently from b6f296f to 2c00e6b Compare July 5, 2023 16:18
janaknat and others added 3 commits July 5, 2023 19:08
action-rs/toolchain already includes cargo. Don't use another actions-rs for cargo commands.
@janaknat janaknat merged commit ca998e1 into main Jul 7, 2023
@janaknat janaknat deleted the processes branch July 31, 2023 20:30