system power states #24228

Closed
wentongwu opened this issue Apr 9, 2020 · 24 comments
Labels
area: Power Management RFC Request For Comments: want input from the community


@wentongwu
Contributor

wentongwu commented Apr 9, 2020

Currently Zephyr classifies power states into two categories, sleep state and deep sleep state, based on whether the CPU loses execution context during the power state transition. Each category has sub-states (SLEEP_1, SLEEP_2, SLEEP_3, DEEP_SLEEP_1, DEEP_SLEEP_2, DEEP_SLEEP_3) which are classified only by residency duration. But defining power states based only on whether the CPU loses execution context is not enough, and there are no technical rules for classifying the sub-states. This was intended to give vendors and users more flexibility in power management, but the unclear and confusing definitions have caused a lot of noise for a long time.

The ACPI specification defines clear power states which have already been adopted by other OSes (AFAIK Linux and Windows), so I suggest Zephyr also base its power state definitions on the ACPI spec. The states that can be supported by Zephyr are listed below.

Runtime active state
During runtime active state, the system is awake and running. In simple terms, the system is in a full running state.

Runtime idle state
Runtime idle is a system sleep state in which all of the cores enter the deepest possible idle state and wait for interrupts. There are no requirements on the devices; they are left in whatever state they are in.

Suspend to idle state
The system goes through a normal platform suspend where it puts all of the cores in the deepest possible idle state and puts peripherals into low-power states (possibly lower-power than those available in the runtime active state). No operating state is lost (the CPU core retains power and does not lose execution context), so the system can easily go back to where it left off.

Standby state
In addition to putting peripherals into low-power states, which is done for suspend to idle too, all non-boot CPUs are powered off. This should save more energy than suspend to idle, but the resume latency will generally be greater than for that state. On a uniprocessor system it should be the same state as suspend to idle.

Suspend to ram state
This state offers significant energy savings by powering off as much of the system as possible, while memory is placed into self-refresh mode to retain its contents. The state of devices and CPUs is saved and held in memory, and it may require some boot-strapping code in ROM to resume the system from it.

Suspend to disk state
This state gets the greatest power savings by powering off as much of the system as possible, including the memory. The contents of memory are written to disk/flash; on resume they are read back into memory with the help of boot-strapping code, which restores the system to the same point of execution where it entered suspend to disk.

Soft off state
This state consumes a minimal amount of power and requires a large latency in order to return to the runtime active state. The contents of the system (CPU and memory) will not be preserved, so the system will be restarted if woken by any wakeup source.

The implementation of this RFC will be the start of the PM overhaul.
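To make the differences between the states above easier to compare, here is a rough sketch that encodes them as a table of properties. All names (`sys_state`, `sys_state_traits`, `sys_state_needs_bootstrap`) are hypothetical and only for illustration; the flag values are my reading of the descriptions above, not a settled definition. For standby, "CPU context kept" refers to the boot CPU only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not an existing Zephyr API. */
enum sys_state {
	SYS_RUNTIME_ACTIVE,
	SYS_RUNTIME_IDLE,
	SYS_SUSPEND_TO_IDLE,
	SYS_STANDBY,
	SYS_SUSPEND_TO_RAM,
	SYS_SUSPEND_TO_DISK,
	SYS_SOFT_OFF,
	SYS_STATE_COUNT
};

struct sys_state_traits {
	bool cpu_context_kept;  /* (boot) core keeps power and execution context */
	bool devices_low_power; /* the state itself puts peripherals into low power */
	bool nonboot_cpus_off;  /* non-boot CPUs are powered off */
	bool ram_kept;          /* memory contents survive (powered or self-refresh) */
	bool resumes_in_place;  /* wake-up continues where the system left off */
};

/* Flag values are an interpretation of the state descriptions in this RFC. */
static const struct sys_state_traits sys_traits[SYS_STATE_COUNT] = {
	[SYS_RUNTIME_ACTIVE]  = { true,  false, false, true,  true  },
	[SYS_RUNTIME_IDLE]    = { true,  false, false, true,  true  },
	[SYS_SUSPEND_TO_IDLE] = { true,  true,  false, true,  true  },
	[SYS_STANDBY]         = { true,  true,  true,  true,  true  },
	[SYS_SUSPEND_TO_RAM]  = { false, true,  true,  true,  true  },
	[SYS_SUSPEND_TO_DISK] = { false, true,  true,  false, true  },
	[SYS_SOFT_OFF]        = { false, true,  true,  false, false },
};

/* A state needs boot-strapping code on resume when execution continues
 * in place although the CPU itself lost its context.
 */
static bool sys_state_needs_bootstrap(enum sys_state s)
{
	assert((unsigned int)s < SYS_STATE_COUNT);
	return sys_traits[s].resumes_in_place && !sys_traits[s].cpu_context_kept;
}
```

With this encoding, suspend to RAM and suspend to disk are the only states that need boot-strapping code, and soft off is the only one that does not resume in place.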

@wentongwu wentongwu added the RFC Request For Comments: want input from the community label Apr 9, 2020
@wentongwu wentongwu self-assigned this Apr 9, 2020
@wentongwu
Contributor Author

wentongwu commented Apr 10, 2020

@pabigot
Collaborator

pabigot commented Apr 10, 2020

Where are these ACPI states documented?

Looking at the 6.3eB specification section 2.2 I see G0 is Working, G1 is Sleeping. Figure 16-74 in section 16.1 shows four (S1-S4) substates within G1, though S5 is also mentioned. It seems to break down to:

  • G0 Working
  • G1/S1 low wake-latency sleeping...
  • G1/S2 S1+system memory can be lost
  • G1/S3 S2+more power domains shut down
  • G1/S4 S3+all devices powered off
  • G2/S5 Soft Off

These don't match to what you describe. Which is fine, I just want to see a design that's clearly informed by an existing specification/architecture that we can go to to resolve questions.

@sslupsky
Contributor

I've interpreted the sleep states for sam0 as "idle" versus "standby" modes documented in the samd21 datasheet. "Idle" might translate to "Runtime idle" and "standby" might translate to "Suspend to idle"

@wentongwu
Contributor Author

Where are these ACPI states documented?

reference ACPI spec you mentioned and also linux doc

@pabigot
Collaborator

pabigot commented Apr 13, 2020

reference ACPI spec you mentioned and also linux doc

OK, I see some of it in the Linux documentation. Specifically I see that on ACPI systems these Linux system-wide states might map to ACPI states:

  • Standby => G1/S1
  • Suspend-to-RAM => G1/S3
  • Hibernation (aka Suspend-to-Disk) => G1/S4 or G2/S5

But you're mixing terms from two domains. In Linux Working state is a power management scheme for non-sleep states: the states should be:

  • "Runtime Active" when all system components are active; and
  • "Runtime Idle" when "all [system components] are inactive".

The other power management scheme System-wide provides the suspend-to-idle, standby, suspend-to-RAM, and hibernation states.

So the Linux architecture splits the working and sleep states into two different schemes. This seems like a good idea; but how does it map to Zephyr?

I don't see the terms you provided used in the ACPI reference, and they are not one-to-one with Linux terms. We can say that the terms in this RFC are related to terms used by Linux, but not ACPI. And because they're only related and from different Linux management schemes they are not part of a clearly defined existing power management architecture.

Which (again) doesn't mean we can't use them, but it does mean this is nowhere near enough description of a power management architecture to be used as a basis for implementation.

Why aren't we starting with the existing architecture, terminology, and API from Linux, and describing what Zephyr will do in terms of how it's different?

@wentongwu
Contributor Author

wentongwu commented Apr 15, 2020

  1. For device dependencies, per Refining Zephyr's Device Driver Model - Take II #22941, @tbursztyka has already started to work on that, but no clear APIs are exposed to other subsystems so far. I assume the dependency info will also be used in the device initialization process, during which the PM-side device_pm_add API will be called to construct the device list used by the PM subsystem. The device suspend and resume sequence will be based on that list rather than on a more complicated data structure (tree or graph), to save size and reduce runtime latency. If the DTS code for device dependencies can provide such a list, the PM side won't maintain its own.

  2. For device runtime PM, the children info will also be recorded in struct device based on the device dependency API; a parent can only be suspended if all of its children are suspended. This part needs more thought and some details are still open, for example whether the implementation will be based on the current device runtime infrastructure or on the onoff service which I'm reviewing, and in which context the parent suspend will be invoked once its last child is suspended.

  3. The statement for the runtime idle state has been updated: it places no requirements on the devices and leaves them where they are, because device runtime PM may already have turned them off.

  4. When system-wide power management works together with device runtime PM, the synchronization between them should be considered carefully. For example, when system-wide PM is in progress, device PM should be paused and follow the system-wide PM logic; and as "Resuming from suspend should check device usage count in device idle PM" #22391 (suggested by @vanti) indicates, the resume should consider the device's usage count.

  5. The PM subsystem will consist of pm policy, pm core, pm platform, device pm and device runtime pm. The pm policy makes the decision about the next system state based on the constraints, system idle time, etc. The pm platform layer will expose the supported power states and do the actual power operations for each platform, e.g. wfi/wfe, saving CPU state and power gating the CPU core, syncing between cache and RAM, RAM retention/shutdown, etc. The device pm layer will clock/power gate devices. The pm core will expose APIs to the pm manager, where the pm logic will run, and do the switching and locking, etc. The infrastructure of pm policy, pm core, pm platform and device pm is very straightforward; we should think about each layer's structures and APIs carefully, and I'm still considering them. As mentioned above, the infrastructure of device runtime PM is still under consideration.
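The parent/child rule from item 2 could be sketched with a plain usage count, where every active child holds one claim on its parent. This is only an illustration under that assumption: `struct pm_dev`, `pm_dev_claim` and `pm_dev_release` are hypothetical names, the sketch is single-threaded, and it ignores the open question of which context the parent suspend runs in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record; not the real Zephyr struct device. */
struct pm_dev {
	const char *name;
	struct pm_dev *parent; /* NULL for a root device */
	unsigned int usage;    /* direct users plus active children */
	bool active;           /* true while the device is resumed */
};

/* Resume the device on first use; the parent is resumed before the child. */
static void pm_dev_claim(struct pm_dev *dev)
{
	if (dev->usage++ == 0) {
		if (dev->parent != NULL) {
			pm_dev_claim(dev->parent);
		}
		dev->active = true; /* a real driver would ungate clock/power here */
	}
}

/* Suspend the device when its last user is gone, then drop the claim it
 * held on its parent, so the parent suspends only after all children have.
 */
static void pm_dev_release(struct pm_dev *dev)
{
	assert(dev->usage > 0); /* unbalanced release is a caller bug */
	if (--dev->usage == 0) {
		dev->active = false; /* a real driver would gate clock/power here */
		if (dev->parent != NULL) {
			pm_dev_release(dev->parent);
		}
	}
}
```

For example, with two sensors on one bus, releasing the first sensor leaves the bus active; only releasing the second one lets the bus suspend.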

@pabigot
Collaborator

pabigot commented Apr 15, 2020

Please change "Working state" in the list of states to "Runtime active state" to be consistent with "Runtime idle state" and the way the terms are used in Linux.

This also helps make clear that there are two general states: runtime, and sleep. The remaining five states are sleep states.

It would be nice if all the power-related state names matched their Linux inspiration.

@wentongwu
Contributor Author

wentongwu commented Apr 23, 2020

PM state

typedef enum pm_state {
	PM_STATE_RUNTIME_ACTIVE = 0,
	PM_STATE_RUNTIME_IDLE,
	PM_STATE_SUSPEND_TO_IDLE,
	PM_STATE_STANDBY,
	PM_STATE_SUSPEND_TO_RAM,
	PM_STATE_SUSPEND_TO_DISK,
	PM_STATE_SOFT_OFF,
	PM_STATE_MAX
} pm_state_t;

PM policy structure and API

struct pm_policy_api {
	void (*init)(void);
	pm_state_t (*next_state)(struct pm_policy *policy);
	int (*set_constraint)(struct pm_policy *policy, pm_state_t state);
	int (*release_constraint)(struct pm_policy *policy, pm_state_t state);
};

struct pm_policy {
	const char *name; /* set by PM_POLICY_DEFINE, used by pm_policy_get() */
	bool supported_states[PM_STATE_MAX];
	struct pm_policy_api *api;
};

#define PM_POLICY_DEFINE(_name, _rt_active, _rt_idle, _suspend_to_idle, _standby, _suspend_to_ram, _suspend_to_disk, _soft_off, _api) \
	static const Z_STRUCT_SECTION_ITERABLE(pm_policy, _name) = \
	{ \
		.name = STRINGIFY(_name), \
		.supported_states = {_rt_active, _rt_idle, _suspend_to_idle, _standby, _suspend_to_ram, _suspend_to_disk, _soft_off}, \
		.api = &_api \
	}

static inline void pm_policy_init(struct pm_policy *policy)
{
	if (policy && policy->api && policy->api->init) {
		policy->api->init();
	}
}

static inline pm_state_t pm_policy_next_state(struct pm_policy *policy)
{
	if (policy && policy->api && policy->api->next_state) {
		return policy->api->next_state(policy);
	}
	/* Assumed fallback: stay active when no policy is available. */
	return PM_STATE_RUNTIME_ACTIVE;
}

static inline int pm_policy_set_constraint(struct pm_policy *policy, pm_state_t state)
{
	if (policy && policy->api && policy->api->set_constraint) {
		return policy->api->set_constraint(policy, state);
	}
	/* Assumed error code for a missing policy or callback. */
	return -EINVAL;
}

static inline int pm_policy_release_constraint(struct pm_policy *policy, pm_state_t state)
{
	if (policy && policy->api && policy->api->release_constraint) {
		return policy->api->release_constraint(policy, state);
	}
	/* Assumed error code for a missing policy or callback. */
	return -EINVAL;
}

/* Look up a registered policy by name; returns NULL if none matches. */
static inline struct pm_policy *pm_policy_get(const char *name)
{
	Z_STRUCT_SECTION_FOREACH(pm_policy, policy) {
		if (strcmp(policy->name, name) == 0) {
			return policy;
		}
	}
	return NULL;
}
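To show how the constraint calls might interact with next-state selection, here is a standalone sketch of one possible policy. It does not use the Zephyr section macros, and everything except the `pm_state` enum names is hypothetical: the `residency_policy` struct, the per-state constraint counters, the minimum residency thresholds, and passing the expected idle time as a parameter are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum pm_state {
	PM_STATE_RUNTIME_ACTIVE = 0,
	PM_STATE_RUNTIME_IDLE,
	PM_STATE_SUSPEND_TO_IDLE,
	PM_STATE_STANDBY,
	PM_STATE_SUSPEND_TO_RAM,
	PM_STATE_SUSPEND_TO_DISK,
	PM_STATE_SOFT_OFF,
	PM_STATE_MAX
} pm_state_t;

/* Hypothetical policy data: one constraint counter per state. */
struct residency_policy {
	bool supported[PM_STATE_MAX];
	unsigned int constraints[PM_STATE_MAX];      /* > 0 forbids the state */
	unsigned int min_residency_ms[PM_STATE_MAX]; /* assumed thresholds */
};

/* Forbid entering 'state' until the constraint is released again. */
static int policy_set_constraint(struct residency_policy *p, pm_state_t state)
{
	if (p == NULL || (unsigned int)state >= PM_STATE_MAX) {
		return -EINVAL;
	}
	p->constraints[state]++;
	return 0;
}

static int policy_release_constraint(struct residency_policy *p, pm_state_t state)
{
	if (p == NULL || (unsigned int)state >= PM_STATE_MAX ||
	    p->constraints[state] == 0) {
		return -EINVAL; /* unbalanced release */
	}
	p->constraints[state]--;
	return 0;
}

/* Pick the deepest supported, unconstrained state whose minimum residency
 * fits into the expected idle time; fall back to runtime active.
 */
static pm_state_t policy_next_state(const struct residency_policy *p,
				    unsigned int idle_ms)
{
	assert(p != NULL);
	for (int s = PM_STATE_MAX - 1; s > PM_STATE_RUNTIME_ACTIVE; s--) {
		if (p->supported[s] && p->constraints[s] == 0 &&
		    idle_ms >= p->min_residency_ms[s]) {
			return (pm_state_t)s;
		}
	}
	return PM_STATE_RUNTIME_ACTIVE;
}
```

An application waiting for a reply from an external device could set a constraint on the deeper states before the request and release it afterwards, so the policy would only pick a shallower state in between.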

@pabigot
Collaborator

pabigot commented Apr 23, 2020

@wentongwu Could you put that into a draft PR that has the API in a header (with documentation) so we can review it? I have comments, but can't provide them here.

@wentongwu
Contributor Author

@wentongwu Could you put that into a draft PR that has the API in a header (with documentation) so we can review it? I have comments, but can't provide them here.

sure, will do.

@wentongwu
Contributor Author

wentongwu commented May 11, 2020

PM core
The PM core will use the pm policy APIs to get the pm state the system is going to enter, and it should be called with interrupts locked. According to the chosen next system pm state, it will suspend the devices (if needed) and then the platform, in sequence, through the defined interfaces.

For some power states, device suspend will take two steps. The first is the device prepare stage. It will be executed with the scheduler locked (k_sched_lock) to allow some synchronization, if needed, with the connected slave/master. The well-prepared devices will be linked into dev_prepared_list, which is the foundation of the next step. A device can't reject the power state switch at this stage, because the pm policy layer already provides the constraint API. However, wake-up interrupts may arrive during this stage, so a global wake-up count will be defined to record whether a wake-up happened; it will be checked at the beginning of the next step (if a wake-up happened, the ongoing suspend will be stopped). The second step will clock/power gate the devices on dev_prepared_list with IRQs locked.

After that, platform suspend will happen through the defined struct platform_suspend_ops APIs.

Hold on, while typing this comment I had another idea: run-time device pm would be always on (maybe with no Kconfig option) to take care of devices' pm, and it would provide an API to report device states that the pm policy layer can use to help decide the next system state, because, as the state definitions above indicate, some of them need devices to be suspended. The system would then do the necessary operations for the decided pm state, following the definitions above and the platform implementation. That gives devices more flexibility to control their own pm state, but may save less power compared with the method above. @pabigot what are your thoughts about this?
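The two-step device suspend with the wake-up count check might look roughly like the sketch below. It is a simplified, single-threaded model: `struct sdev`, `devices_suspend`, `wakeup_event` and the `during_prepare` hook (a stand-in for an interrupt arriving during the prepare stage) are hypothetical names, and the scheduler and IRQ locking are only indicated in comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Hypothetical device record for this sketch. */
struct sdev {
	const char *name;
	bool prepared;  /* passed the prepare stage */
	bool suspended; /* clock/power gated */
};

/* Global wake-up count; a wake-up ISR would increment it. */
static volatile unsigned int wakeup_count;

static void wakeup_event(void)
{
	wakeup_count++;
}

/* Returns 0 when all devices were suspended, -EBUSY when a wake-up was
 * seen between the two steps and the suspend was stopped.
 */
static int devices_suspend(struct sdev *devs, size_t n,
			   void (*during_prepare)(void))
{
	struct sdev *prepared[MAX_DEVS]; /* plays the role of dev_prepared_list */
	size_t n_prepared = 0;
	unsigned int snapshot = wakeup_count;

	assert(devs != NULL);
	if (n > MAX_DEVS) {
		return -EINVAL;
	}

	/* Step 1: prepare stage (scheduler locked in the proposal). */
	for (size_t i = 0; i < n; i++) {
		devs[i].prepared = true;
		prepared[n_prepared++] = &devs[i];
	}
	if (during_prepare != NULL) {
		during_prepare(); /* simulates an interrupt during step 1 */
	}

	/* Step 2 (IRQs locked in the proposal): check the wake-up count first. */
	if (wakeup_count != snapshot) {
		for (size_t i = 0; i < n_prepared; i++) {
			prepared[i]->prepared = false; /* undo the prepare stage */
		}
		return -EBUSY;
	}
	for (size_t i = 0; i < n_prepared; i++) {
		prepared[i]->suspended = true; /* clock/power gate the device */
	}
	return 0;
}
```

Passing `wakeup_event` as the hook models a wake-up interrupt during the prepare stage: the call then returns -EBUSY and no device is gated.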

@pabigot
Collaborator

pabigot commented May 11, 2020

We cannot assume that devices can transition power levels synchronously from within an interrupt. Rather, the allowed system power state must be affected by the states the devices are in.

This does suggest that device power management must always be on and devices should automatically transition to the lowest power state consistent with application needs.

I think the idea that system power management should control device power management is workable only with a well-defined model of application needs that can constrain system power management. It is not in general acceptable for the system to say "I'm going to sleep, everybody shut down" if the application is waiting for a response from an external device that will be lost if the system sleeps. We don't have such a model.

So I still feel we're going too deep without agreeing on and documenting the general design principles and goals. That includes an architectural vision, core concepts, and (abstract) data structures including dependencies and constraints: what they are, how they're represented, and how they affect transitions between system power states. So far the only thing described in any detail is the static power states.

For example the concept of an interrupt occurring during a power level transition and so blocking completion of that transition must have been addressed before. How is it handled in the TI and other power management architectures?

@wentongwu
Contributor Author

wentongwu commented May 11, 2020

I think the idea that system power management should control device power management is workable only with a well-defined model of application needs that can constrain system power management. It is not in general acceptable for the system to say "I'm going to sleep, everybody shut down" if the application is waiting for a response from an external device that will be lost if the system sleeps. We don't have such a model.

See the pm policy API: it defines a constraint API to constrain system power management, and the state that the pm core follows comes from the pm policy layer. OK, I will provide more documentation about that.

struct pm_policy_api {
	void (*init)(void);
	pm_state_t (*next_state)(struct pm_policy *policy);
	int (*set_constraint)(struct pm_policy *policy, pm_state_t state);
	int (*release_constraint)(struct pm_policy *policy, pm_state_t state);
};

We cannot assume that devices can transition power levels synchronously from within an interrupt.

Sure, the power transition will not happen in an interrupt. I mean that the sync between the device driver and the device firmware will be the first step: the device driver will start the sync message, which runs in idle thread context, and the response (ack) will be the interrupt itself; if the interrupt is received, the device will be put on the dev_prepared_list. The second step will do the actual power transition in idle thread context based on the dev_prepared_list. There may be limitations for the sync if the response needs a read, which is why I suggested the other idea above.

Rather, the allowed system power state must be affected by the states the devices are in.

But we should carefully consider the case where a device rejects the suggested power state; if that happens, we will only go into run-time idle. It has the same effect as letting device states affect the ongoing system pm state.

So I still feel we're going too deep without agreeing on and documenting the general design principles and goals.

yes, so we have to discuss, and I will document more.

@pabigot
Collaborator

pabigot commented May 11, 2020

We need the API in a (draft?) PR as requested so we can see the whole thing. Please document its behavior as part of the initial PR. If that PR exists please link to it here (and link back here from it).

wentongwu added a commit to wentongwu/zephyr that referenced this issue Jul 20, 2020
In order to reduce the overall system power consumption, we should
suspend the devices which are idle or not being used while system is
active or running. Currently there is device idle power management
framework which intends to do that. But the implementation seems can
only do get/put one after one and can't handle the concurrency, for
example if multiple threads request for DEVICE_PM_ACTIVE_STATE
concurrently, there is possibility polling the signal without reset
and signal contention among multiple threads. And the disable function
doesn't consider the ongoing transition. Further it doesn't consider
the device dependency. So decide rework the implementation and rename
it device runtime power management following the definition in zephyrproject-rtos#24228.

The API rt_dpm_claim is trying to resume the device and protect any
hardware transfer after this call by increasing the usage count. When
there is concurrency with another claim or release, this API will pend
the current thread to the wait_q until previous transition finished.

The API rt_dpm_release is to release previous claim, forbid unexpected
release. And no hardware operation depends on release, so release has
asynchronous version. After the release, the parents of that device
will also be considered automatically.

And it can be decided by individual device to support device runtime
power management or not by the API rt_dpm_enable/rt_dpm_disable. And
also it's the device driver instead of this framework that decide how
to define device not in use, it means device driver decide where to
put rt_dpm_claim/rt_dpm_release, for example we can put them around
transfer function for i2c, but for net device, maybe we can only put
them around open/close or other similar place.

Signed-off-by: Wentong Wu <wentong.wu@intel.com>
@wentongwu
Contributor Author

@pabigot @nashif @vanti I attach the API we have so far here: wentongwu@598eab9. Please share comments so we don't diverge from what anyone has in mind. The platform pm API and device pm API are still in progress; after that we will settle the implementation of the pm core (or pm manager).

@vanti
Collaborator

vanti commented Jul 23, 2020

@wentongwu Can we relax the definition of the "Suspend to ram state" a bit to include states such as the standby mode on TI CC1352, where almost everything is power-gated, with the exception of some minimal CPU logic that is required for waking up from the same point without restarting? I think the word 'everything' is a bit strong here.

@pabigot
Collaborator

pabigot commented Jul 24, 2020

Just capturing some information about existing states. The ones at the top are Zephyr; I think the names are good, and the other references can show what we mean by those states.

The draft API pointed to above should incorporate documentation that explains more clearly what's meant by those states (the Linux System State link has the most detail).

System power management states:

| Zephyr | ACPI | Linux |
|---|---|---|
| Runtime Active | S0 | Runtime active |
| Runtime Idle | S0ix | Runtime idle |
| Suspend to idle | S1 | Suspend-to-idle (S2, S2Idle) |
| Standby | S2? | Standby |
| Suspend to RAM | S2 or S4 | Suspend-to-RAM (STR, S2RAM) |
| Suspend to Disk | S4 | Hibernation |
| Soft Off | S5 | |

Device States

| Zephyr | ACPI | Linux |
|---|---|---|
| TBD | D0 (Fully On) | ?? |
| TBD | D1, D2 | ?? |
| TBD | D3 (Hot, Cold) | ?? |

@wentongwu
Contributor Author

@wentongwu Can we relax the definition of the "Suspend to ram state" a bit to include states such as the standby mode on TI CC1352, where almost everything is power-gated, with the exception of some minimal CPU logic that is required for waking up from the same point without restarting? I think the word 'everything' is a bit strong here.

@vanti updated. Thanks

@wentongwu
Contributor Author

| Zephyr | ACPI | Linux |
|---|---|---|
| normal (PM_ACTIVE_STATE) | D0 (Fully On) | ?? |
| clock gate (PM_SUSPEND_STATE) | D1, D2 | ?? |
| power gate (PM_OFF_STATE) | D3 (Hot, Cold) | ?? |

@pabigot @vanti how about we use the current definition for device pm state?

@wentongwu
Contributor Author

Another question: once all of the code is ready, which platform is the best one for us to test on?

@wentongwu
Contributor Author

wentongwu commented Jul 29, 2020

| Zephyr | ACPI | Linux |
|---|---|---|
| normal (PM_ACTIVE_STATE) | D0 (Fully On) | ?? |
| clock gate (PM_SUSPEND_STATE) | D1, D2 | ?? |
| power gate (PM_OFF_STATE) | D3 (Hot, Cold) | ?? |

@pabigot @vanti how about we use the current definition for device pm state?

Maybe not clock gate: some devices can work on different clocks, so maybe the defined device pm API should pass them down as a parameter.

@vanti
Collaborator

vanti commented Jul 29, 2020

| Zephyr | ACPI | Linux |
|---|---|---|
| normal (PM_ACTIVE_STATE) | D0 (Fully On) | ?? |
| clock gate (PM_SUSPEND_STATE) | D1, D2 | ?? |
| power gate (PM_OFF_STATE) | D3 (Hot, Cold) | ?? |

@pabigot @vanti how about we use the current definition for device pm state?

I think active/suspend/off sound fine to me. On devices that only have active/off state, would suspend map to off?

another problem, if all of the code get ready, which one is the best platform we do the test?

The infrastructure should be tested on multiple platforms in my opinion, to make sure it is flexible enough.

@pabigot
Collaborator

pabigot commented Jul 29, 2020

I don't like "power gate" and "clock gate" since those are (AIUI) technologies that produce a savings in power, not low-power states. I've also not seen any non-MCU/CPU devices (e.g. I2C or SPI ICs) that document their low power modes in those terms.

Remember there are four states to capture. Would it be active/suspend1/suspend2/off? That's getting vague.

So I'm leaning towards D0, D1, D2, D3. Then, for consistency, I've gotta change my position on system and go for S0, S0ix, S1, S2, S3, S4, S5. My motivation for the existing Linux-based names was that it's more clear what those states mean, but that could be addressed by clear documentation (which might even be better, as we can go into detail without getting wrapped up in what's implied by "suspend to RAM"). (Though S0ix might become S0i, to indicate "idling in S0" rather than something to do with Intel-specific stuff.)

Given where we are, I believe a one-to-one correspondence to an existing architecture like ACPI has the best chance for meeting cross-platform needs. I would prefer a different functional-based architecture for the device power states but I don't think that's going anywhere.

@nashif
Member

nashif commented Feb 22, 2021

this is now done.

@nashif nashif closed this as completed Feb 22, 2021