Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute power only in start - stop window #378

Merged
merged 1 commit into from
Sep 19, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 26 additions & 9 deletions mlperf_logging/result_summarizer/result_summarizer.py
Original file line number Diff line number Diff line change
Expand Up @@ -516,28 +516,45 @@ def _compute_total_power(benchmark_folder, result_file, time_to_train, ruleset):

def _compute_power_node(loglines, time_to_train):
prev_timestamp = 0
max_timestamp = 0
last_timestamp = 0
power_start = 0
power_stop = 0
agg_power = 0
conversion_eff = 1.0
power_stop_found = False
power_start_found = False
loglines.sort(key=lambda x: x.timestamp)
for logline in loglines:
if logline.key == "power_measurement_start":
power_start = logline.timestamp
prev_timestamp = logline.timestamp
if logline.key == "power_reading":
agg_power += (logline.value['value'] * (logline.timestamp - prev_timestamp))
power_start_found = True
Copy link
Contributor

@xyhuang xyhuang Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the power-start line is not found in the log, should we raise an error or log a warning? i guess it might cause an inaccurate calculation if we assumed 0 start time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should rase an error. Just updated the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# If power start is not found, raise an error
if not power_start_found:
raise ValueError("Power start timestamp not found")

if (
(logline.key == "power_reading")
and (not power_stop_found)
and (power_start_found)
):
agg_power += logline.value["value"] * (logline.timestamp - prev_timestamp)
prev_timestamp = logline.timestamp
max_timestamp = max(max_timestamp, logline.timestamp)
last_timestamp = max(last_timestamp, logline.timestamp)
if logline.key == "power_measurement_stop":
power_stop = logline.timestamp
break
power_stop_found = True
if logline.key == "conversion_eff":
conversion_eff = logline.value['value']

conversion_eff = logline.value["value"]

# If power start is not found, raise an error
if not power_start_found:
raise ValueError("Power start timestamp not found")
# If power stop is not found, set it to the first power reading
if not power_stop_found:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it expected that submission may not have the power-stop line? if not, maybe we should raise an error or log a warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not expected. But it it happens, we can take the last power measurement as the stop. I updated it to log a warning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# If power stop is not found, set it to the first power reading
if not power_stop_found:
power_stop = last_timestamp
print("WARNING: Power stop not found, taking last_timestamp as the power measurement stop")

power_stop = last_timestamp
print("WARNING: Power stop not found, taking last_timestamp as the power measurement stop")

# Compute the result, convert ms to s
power_stop = max(power_stop, max_timestamp)
result = conversion_eff * agg_power * time_to_train / (power_stop - power_start) / 1000
result = (
conversion_eff * agg_power * time_to_train / (power_stop - power_start) / 1000
)
return result


Expand Down
Loading