Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libcontainer: intelrdt: add user-friendly diagnostics for Intel RDT operation errors #1913

Merged
merged 1 commit into from
Oct 25, 2018

Conversation

xiaochenshen
Copy link
Contributor

@xiaochenshen xiaochenshen commented Oct 17, 2018

Linux kernel v4.15 introduces better diagnostics for Intel RDT operation
errors. If any error returns when making new directories or writing to
any of the control file in resctrl filesystem, reading file
/sys/fs/resctrl/info/last_cmd_status could provide more information that
can be conveyed in the error returns from file operations.

Some examples:

  echo "L3:0=f3;1=ff" > /sys/fs/resctrl/container_id/schemata
  -bash: echo: write error: Invalid argument
  cat /sys/fs/resctrl/info/last_cmd_status
  mask f3 has non-consecutive 1-bits

  echo "MB:0=0;1=110" > /sys/fs/resctrl/container_id/schemata
  -bash: echo: write error: Invalid argument
  cat /sys/fs/resctrl/info/last_cmd_status
  MB value 0 out of range [10,100]

  cd /sys/fs/resctrl
  mkdir 1 2 3 4 5 6 7 8
  mkdir: cannot create directory '8': No space left on device
  cat /sys/fs/resctrl/info/last_cmd_status
  out of CLOSIDs

See 'last_cmd_status' for more details in kernel documentation:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

In runc, we could append the diagnostics information to the error
message of Intel RDT operation errors to provide more user-friendly
information.

Signed-off-by: Xiaochen Shen xiaochen.shen@intel.com

@xiaochenshen xiaochenshen force-pushed the rdt-add-diagnostics branch 4 times, most recently from 339bcf2 to 6c307f8 Compare October 18, 2018 16:02
…peration errors

Linux kernel v4.15 introduces better diagnostics for Intel RDT operation
errors. If any error returns when making new directories or writing to
any of the control file in resctrl filesystem, reading file
/sys/fs/resctrl/info/last_cmd_status could provide more information that
can be conveyed in the error returns from file operations.

Some examples:
  echo "L3:0=f3;1=ff" > /sys/fs/resctrl/container_id/schemata
  -bash: echo: write error: Invalid argument
  cat /sys/fs/resctrl/info/last_cmd_status
  mask f3 has non-consecutive 1-bits

  echo "MB:0=0;1=110" > /sys/fs/resctrl/container_id/schemata
  -bash: echo: write error: Invalid argument
  cat /sys/fs/resctrl/info/last_cmd_status
  MB value 0 out of range [10,100]

  cd /sys/fs/resctrl
  mkdir 1 2 3 4 5 6 7 8
  mkdir: cannot create directory '8': No space left on device
  cat /sys/fs/resctrl/info/last_cmd_status
  out of CLOSIDs

See 'last_cmd_status' for more details in kernel documentation:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

In runc, we could append the diagnostics information to the error
message of Intel RDT operation errors to provide more user-friendly
information.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
@xiaochenshen
Copy link
Contributor Author

@crosbymichael @hqhq @cyphar @mrunalp @rjnagal @vmarmol @dqminh

This PR is a minor enhancement for Intel RDT operation error handling, it will provide more user-friendly diagnostics information in runc.

Could you help code review and comment at your convenience?
Thank you in advance!

lastCmdStatus, err := getIntelRdtParamString(path, "last_cmd_status")
if err != nil {
return "", err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the system doesn't support "last_cmd_status", is it gonna fall back to use the error we got in getIntelRdtParamString? Doesn't sound quite right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hqhq

Thank you for kind code review!

This change could handle the case when system doesn't support "last_cmd_status":

Firstly, getLastCmdStatus() returns the error which we get from getIntelRdtParamString() to the caller NewLastCmdError().

In NewLastCmdError(), parameter 'err' is the original error. 'err1' is the error returns from getLastCmdStatus().

  1. err1 != nil indicates that we don't support "last_cmd_status". We just return original 'err' directly.
  2. Only when err1 == nil, which indicates that we support "last_cmd_status", we will call overloaded function Error() to append extra diagnostics information to original 'err'.

See more details in NewLastCmdError():

func NewLastCmdError(err error) error {
	lastCmdStatus, err1 := getLastCmdStatus()
	if err1 == nil {
		return &LastCmdError{
			LastCmdStatus: lastCmdStatus,
			Err:           err,
		}
	}
	return err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I misread that, thanks for explanation.

@caniszczyk
Copy link
Contributor

RFC @opencontainers/runc-maintainers

1 similar comment
@caniszczyk
Copy link
Contributor

RFC @opencontainers/runc-maintainers

@xiaochenshen
Copy link
Contributor Author

xiaochenshen commented Oct 22, 2018

Sorry for some duplicate comments here. It looks like something wrong with github due to the incident: https://blog.github.com/2018-10-21-october21-incident-report/

@hqhq
Copy link
Contributor

hqhq commented Oct 22, 2018

LGTM

Approved with PullApprove

2 similar comments
@crosbymichael
Copy link
Member

crosbymichael commented Oct 25, 2018

LGTM

Approved with PullApprove

@caniszczyk
Copy link
Contributor

caniszczyk commented Oct 25, 2018

LGTM

Approved with PullApprove

@caniszczyk caniszczyk merged commit f3ce822 into opencontainers:master Oct 25, 2018
@crosbymichael
Copy link
Member

pull approve is a little off recently

@caniszczyk
Copy link
Contributor

caniszczyk commented Oct 25, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants