system power states #24228

wentongwu · 2020-04-09T11:11:58Z

Currently Zephyr classifies power states into two categories, sleep state and deep sleep state, based on whether the CPU loses execution context during the power state transition. Each category has sub-states (SLEEP_1, SLEEP_2, SLEEP_3, DEEP_SLEEP_1, DEEP_SLEEP_2, DEEP_SLEEP_3) that are distinguished only by residency duration. Defining power states solely by whether the CPU loses execution context is not enough, and there are no technical rules for classifying the sub-states. The intent was to give vendors and users more flexibility in power management, but the unclear and confusing definitions have caused a lot of noise for a long time.

The ACPI specification defines clear power states that have already been adopted by other OSes (AFAIK Linux and Windows), so I suggest Zephyr also define its power states based on the ACPI spec. The power states that Zephyr can support are listed below.

~~Working state~~ Runtime active state
In the runtime active state, the system is awake and fully running.

Runtime idle state
Runtime idle is a system sleep state in which all of the cores enter the deepest possible idle state and wait for interrupts~~, but all the devices are awake and in normal state~~. There are no requirements on the devices; they are left in whatever state they are in.

Suspend to idle state
The system goes through a normal platform suspend in which it puts all of the cores into the deepest possible idle state and puts peripherals into low-power states (possibly lower-power than those available in the ~~working state~~ runtime active state). No operating state is lost (the CPU core retains power and does not lose execution context), so the system can easily resume where it left off.

Standby state
In addition to putting peripherals into low-power states, as suspend to idle also does, all non-boot CPUs are powered off. This should save more energy than suspend to idle, but the resume latency will generally be greater. On a uniprocessor system it should be the same as the suspend to idle state.

Suspend to ram state
This state offers significant energy savings by powering off as much of the system as possible ~~as everything in the system is power gated~~, with memory placed into self-refresh mode to retain its contents. The state of devices and CPUs is saved and held in memory, and some bootstrapping code in ROM may be required to resume the system from it.

Suspend to disk state
This state gets the greatest power savings by powering off as much of the system as possible, including the memory. The contents of memory are written to disk/flash; on resume they are read back into memory with the help of bootstrapping code, which restores the system to the point of execution at which it entered suspend to disk.

Soft off state
This state consumes a minimal amount of power and has a large latency to return to the runtime active state ~~the Working state~~. The contents of the system (CPU and memory) are not preserved, so the system will restart when woken by any wake-up source.
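
As a rough illustration of the definitions above, the states can be written down as a table of properties. This is only a sketch: the `sketch_` names and the chosen property fields are hypothetical and not part of this RFC, and the values are read directly from the descriptions above.

```c
#include <stdbool.h>

/* Hypothetical identifiers for illustration; not part of the RFC. */
enum sketch_state {
	SKETCH_RUNTIME_ACTIVE,
	SKETCH_RUNTIME_IDLE,
	SKETCH_SUSPEND_TO_IDLE,
	SKETCH_STANDBY,
	SKETCH_SUSPEND_TO_RAM,
	SKETCH_SUSPEND_TO_DISK,
	SKETCH_SOFT_OFF,
	SKETCH_STATE_COUNT
};

struct sketch_state_info {
	bool devices_suspended;     /* peripherals are put into low-power states */
	bool non_boot_cpus_off;     /* secondary cores are powered off */
	bool boot_cpu_context_kept; /* boot CPU keeps power and execution context */
	bool ram_retained;          /* memory contents survive the state */
	bool resumes;               /* wake-up continues from the suspend point */
};

static const struct sketch_state_info sketch_states[SKETCH_STATE_COUNT] = {
	/*                           devs   cpus   ctx    ram    resumes */
	[SKETCH_RUNTIME_ACTIVE]  = { false, false, true,  true,  true  },
	[SKETCH_RUNTIME_IDLE]    = { false, false, true,  true,  true  },
	[SKETCH_SUSPEND_TO_IDLE] = { true,  false, true,  true,  true  },
	[SKETCH_STANDBY]         = { true,  true,  true,  true,  true  },
	[SKETCH_SUSPEND_TO_RAM]  = { true,  true,  false, true,  true  },
	[SKETCH_SUSPEND_TO_DISK] = { true,  true,  false, false, true  },
	[SKETCH_SOFT_OFF]        = { true,  true,  false, false, false },
};

/* True when a wake-up from this state restarts the system. */
static bool sketch_wakeup_restarts(enum sketch_state s)
{
	return !sketch_states[s].resumes;
}
```

On a uniprocessor system the `non_boot_cpus_off` column carries no information, which is why standby and suspend to idle collapse into the same state there.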

The implementation of this RFC will be the start of the PM overhaul.

wentongwu · 2020-04-10T00:50:03Z

@pabigot @vanti @erwango @mnkp @fulong82

pabigot · 2020-04-10T11:58:13Z

Where are these ACPI states documented?

Looking at the 6.3eB specification section 2.2 I see G0 is Working, G1 is Sleeping. Figure 16-74 in section 16.1 shows four (S1-S4) substates within G1, though S5 is also mentioned. It seems to break down to:

- G0 Working
- G1/S1 low wake-latency sleeping...
- G1/S2 S1+system memory can be lost
- G1/S3 S2+more power domains shut down
- G1/S4 S3+all devices powered off
- G2/S5 Soft Off

These don't match to what you describe. Which is fine, I just want to see a design that's clearly informed by an existing specification/architecture that we can go to to resolve questions.

sslupsky · 2020-04-11T22:18:10Z

I've interpreted the sleep states for sam0 as "idle" versus "standby" modes documented in the samd21 datasheet. "Idle" might translate to "Runtime idle" and "standby" might translate to "Suspend to idle"

wentongwu · 2020-04-13T01:26:39Z

> Where are these ACPI states documented?

See the ACPI spec you mentioned and also the Linux documentation.

pabigot · 2020-04-13T11:00:47Z

> reference ACPI spec you mentioned and also linux doc

OK, I see some of it in the Linux documentation. Specifically I see that on ACPI systems these Linux system-wide states might map to ACPI states:

- Standby => G1/S1
- Suspend-to-RAM => G1/S3
- Hibernation (aka Suspend-to-Disk) => G1/S4 or G2/S5

But you're mixing terms from two domains. In Linux, Working-state is a power management scheme for non-sleep states; the states should be:

- "Runtime Active" when all system components are active; and
- "Runtime Idle" when "all [system components] are inactive".

The other power management scheme, System-wide, provides the suspend-to-idle, standby, suspend-to-RAM, and hibernation states.

So the Linux architecture splits the working and sleep states into two different schemes. This seems like a good idea; but how does it map to Zephyr?

I don't see the terms you provided used in the ACPI reference, and they are not one-to-one with Linux terms. We can say that the terms in this RFC are related to terms used by Linux, but not ACPI. And because they're only related and from different Linux management schemes they are not part of a clearly defined existing power management architecture.

Which (again) doesn't mean we can't use them, but it does mean this is nowhere near enough description of a power management architecture to be used as a basis for implementation.

Why aren't we starting with the existing architecture, terminology, and API from Linux, and describing what Zephyr will do in terms of how it's different?

wentongwu · 2020-04-15T08:58:28Z

For the device dependency: per "Refining Zephyr's Device Driver Model - Take II" #22941, @tbursztyka has already started work on that, but no clear APIs are exposed to other subsystems so far. I assume the dependency info will also be used in the device initialization process, during which the device_pm_add API on the PM side will be called to construct the device list used by the PM subsystem. The device suspend and resume sequence will be based on that list rather than on a more complicated data structure (tree or graph), to save size and reduce runtime latency. If the DTS code for device dependencies can provide such a list, the PM side would not need to maintain its own.

For device runtime PM, the children info will also be recorded in struct device based on the device dependency API; a parent can only be suspended if all of its children are suspended. This part needs more thought, and some questions have no details yet: for example, whether the implementation will be based on the current device runtime infrastructure or on the onoff service, which I'm reviewing, and in which context the parent suspend will be invoked once its last child is suspended.

The statement for the runtime idle state has been updated: it places no requirements on the devices and just leaves them where they are, because device runtime PM may already have turned them off.

When system-wide power management works together with device runtime PM, the synchronization between them must be considered carefully. For example, while system-wide PM is in progress, device PM should be paused and follow the system-wide PM logic; and, as "Resuming from suspend should check device usage count in device idle PM" #22391 (suggested by @vanti) indicates, resume should take the device usage count into account.

The PM subsystem will consist of pm policy, pm core, pm platform, device pm and device runtime pm. The pm policy decides the next system state based on the constraints, the system idle time, etc. The pm platform layer will ~~expose the supported power states and~~ do the actual power operations for each platform, e.g. wfi/wfe, saving CPU state and power gating the CPU core, syncing between cache and RAM, RAM retention/shutdown, etc. The device pm layer will clock/power gate the devices. The pm core will expose APIs to the pm manager, where the pm logic will run and do the switching, locking, etc. The infrastructure of pm policy, pm core, pm platform and device pm is very straightforward; we should think about each layer's structures and APIs carefully, and I'm still considering them. As mentioned above, the infrastructure of device runtime PM is still under consideration.
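
To make the "parent can only be suspended if all of its children are suspended" rule concrete, here is a minimal sketch. The `sketch_dev` record and both functions are hypothetical; the real `struct device` layout and the dependency API are not settled in this thread. The sketch deliberately does not suspend the parent automatically when its last child suspends, since the context for that is still an open question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record, for illustration only. */
struct sketch_dev {
	struct sketch_dev *parent;
	int active_children;	/* children that are currently not suspended */
	bool suspended;
};

/* Resume a device; its parent must be resumed first. */
static void sketch_dev_resume(struct sketch_dev *dev)
{
	if (!dev->suspended) {
		return;
	}
	if (dev->parent != NULL) {
		sketch_dev_resume(dev->parent);
		dev->parent->active_children++;
	}
	dev->suspended = false;
}

/* Suspend a device; returns false while it still has active children. */
static bool sketch_dev_suspend(struct sketch_dev *dev)
{
	if (dev->suspended) {
		return true;
	}
	if (dev->active_children > 0) {
		return false;
	}
	dev->suspended = true;
	if (dev->parent != NULL) {
		dev->parent->active_children--;
	}
	return true;
}
```

A per-parent counter like this avoids walking a tree or graph at suspend time, in the spirit of the flat device list described above.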

pabigot · 2020-04-15T11:09:23Z

Please change "Working state" in the list of states to "Runtime active state" to be consistent with "Runtime idle state" and the way the terms are used in Linux.

This also helps make clear that there are two general states: runtime, and sleep. The remaining five states are sleep states.

It would be nice if all the power-related state names matched their Linux inspiration.

wentongwu · 2020-04-23T02:10:40Z

PM state

```c
typedef enum pm_state {
	PM_STATE_RUNTIME_ACTIVE = 0,
	PM_STATE_RUNTIME_IDLE,
	PM_STATE_SUSPEND_TO_IDLE,
	PM_STATE_STANDBY,
	PM_STATE_SUSPEND_TO_RAM,
	PM_STATE_SUSPEND_TO_DISK,
	PM_STATE_SOFT_OFF,
	PM_STATE_MAX
} pm_state_t;
```

PM policy structure and API

```c
#include <stdbool.h>

struct pm_policy;	/* forward declaration, used by the callbacks below */

struct pm_policy_api {
	void (*init)(void);
	/* Decide the next system state. */
	pm_state_t (*next_state)(struct pm_policy *policy);
	/* Add or drop a constraint on the given state. */
	int (*set_constraint)(struct pm_policy *policy, pm_state_t state);
	int (*release_constraint)(struct pm_policy *policy, pm_state_t state);
};

struct pm_policy {
	const char *name;	/* set by PM_POLICY_DEFINE, looked up by pm_policy_get() */
	bool supported_states[PM_STATE_MAX];
	struct pm_policy_api *policy_api;
};

#define PM_POLICY_DEFINE(_name, _rt_active, _rt_idle, _suspend_to_idle, _standby, _suspend_to_ram, _suspend_to_disk, _soft_off, _api) \
	static const Z_STRUCT_SECTION_ITERABLE(pm_policy, _name) = \
	{ \
		.name = STRINGIFY(_name), \
		.supported_states = {_rt_active, _rt_idle, _suspend_to_idle, _standby, _suspend_to_ram, _suspend_to_disk, _soft_off}, \
		.policy_api = &_api \
	}
```

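```c
#include <errno.h>
#include <string.h>

static inline void pm_policy_init(struct pm_policy *policy)
{
	if (policy && policy->policy_api) {
		policy->policy_api->init();
	}
}

static inline pm_state_t pm_policy_next_state(struct pm_policy *policy)
{
	if (policy && policy->policy_api) {
		return policy->policy_api->next_state(policy);
	}

	/* No policy available: stay in the running state (assumed default). */
	return PM_STATE_RUNTIME_ACTIVE;
}

static inline int pm_policy_set_constraint(struct pm_policy *policy, pm_state_t state)
{
	if (policy && policy->policy_api) {
		return policy->policy_api->set_constraint(policy, state);
	}

	return -EINVAL;	/* assumed error code for a missing policy */
}

static inline int pm_policy_release_constraint(struct pm_policy *policy, pm_state_t state)
{
	if (policy && policy->policy_api) {
		return policy->policy_api->release_constraint(policy, state);
	}

	return -EINVAL;	/* assumed error code for a missing policy */
}

/* Find a registered policy by name; returns NULL if there is none. */
static inline struct pm_policy *pm_policy_get(const char *name)
{
	Z_STRUCT_SECTION_FOREACH(pm_policy, policy) {
		if (strcmp(policy->name, name) == 0) {
			return policy;
		}
	}

	return NULL;
}
```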
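
For discussion, here is one way a policy behind this API might behave. It is a standalone sketch, not the proposed implementation: it assumes that a constraint on a state means "do not enter this state", keeps one counter per state, and picks the deepest unconstrained state. All `sk_` names are hypothetical, and the idle-time input that a real policy would use is reduced to a `deepest_allowed` parameter.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for pm_state_t so the sketch compiles on its own. */
enum sk_state {
	SK_RUNTIME_ACTIVE,
	SK_RUNTIME_IDLE,
	SK_SUSPEND_TO_IDLE,
	SK_STANDBY,
	SK_SUSPEND_TO_RAM,
	SK_SUSPEND_TO_DISK,
	SK_SOFT_OFF,
	SK_STATE_MAX
};

/* Outstanding constraints per state; a constrained state is not entered. */
static int sk_constraints[SK_STATE_MAX];

/* Runtime active can never be constrained away, so it is not a valid target. */
static bool sk_constrainable(enum sk_state s)
{
	return s > SK_RUNTIME_ACTIVE && s < SK_STATE_MAX;
}

static int sk_set_constraint(enum sk_state s)
{
	if (!sk_constrainable(s)) {
		return -EINVAL;
	}
	sk_constraints[s]++;
	return 0;
}

static int sk_release_constraint(enum sk_state s)
{
	if (!sk_constrainable(s) || sk_constraints[s] == 0) {
		return -EINVAL;	/* unbalanced release */
	}
	sk_constraints[s]--;
	return 0;
}

/* Deepest unconstrained state that is not deeper than deepest_allowed. */
static enum sk_state sk_next_state(enum sk_state deepest_allowed)
{
	int s = (int)deepest_allowed;

	if (s >= (int)SK_STATE_MAX) {
		s = (int)SK_STATE_MAX - 1;
	}
	for (; s > (int)SK_RUNTIME_ACTIVE; s--) {
		if (sk_constraints[s] == 0) {
			return (enum sk_state)s;
		}
	}
	return SK_RUNTIME_ACTIVE;
}
```

With constraints held on suspend to RAM and standby, a request that would allow suspend to RAM falls back to suspend to idle; releasing the standby constraint makes standby the result again.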

pabigot · 2020-04-23T10:39:05Z

@wentongwu Could you put that into a draft PR that has the API in a header (with documentation) so we can review it? I have comments, but can't provide them here.

wentongwu · 2020-05-08T01:46:35Z

> @wentongwu Could you put that into a draft PR that has the API in a header (with documentation) so we can review it? I have comments, but can't provide them here.

sure, will do.

wentongwu · 2020-05-11T04:12:08Z

PM core
The PM core will use the pm policy APIs to get the pm state the system is going to enter, and it should be called with interrupts locked. According to the chosen next system pm state, it will suspend the devices (if needed) and then the platform, in sequence, through the defined interfaces.

For some power states, device suspend will take two steps. The first is the device prepare stage, which is executed with the scheduler locked (k_sched_lock) to allow any needed sync with the connected slave/master. The prepared devices are linked into dev_prepared_list, which is the foundation for the next step. A device can't reject the power state switch at this stage, because the pm policy layer already provides the constraint API. Wake-up interrupts may occur during this stage, so a global wake-up count will be defined to record whether a wake-up happened; it is checked at the beginning of the next step, and if a wake-up happened the ongoing suspend is stopped. In the second step, the devices are clock/power gated based on dev_prepared_list with IRQs locked.

After that, platform suspend will happen through the APIs defined in struct platform_suspend_ops.

Hold on, while typing this comment I had another idea: run-time device pm will always be on (maybe with no Kconfig option) to take care of the devices' pm, and it will provide an API that reports device states, which the pm policy layer can use to help decide the next system state, because, as the state definitions above indicate, some of them need devices to be suspended. The system will then do the necessary operations for the chosen pm state, following the definitions above and the platform implementation. That gives each device more flexibility to control its own pm state, but may save less power than the method above. @pabigot what are your thoughts about this?
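
The two-step device suspend with a wake-up count could be sketched as follows. Everything here is hypothetical and simplified: locking (k_sched_lock, IRQ lock) is only noted in comments, the wake-up count is a plain counter that an ISR would increment, and suspending in reverse list order is an assumption rather than something the RFC specifies.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_DEVS 8

struct sketch_device {
	const char *name;
	bool prepared;
	bool suspended;
};

/* Stand-in for the global wake-up count; an ISR would increment it. */
static volatile unsigned int sketch_wakeup_count;

/* Stand-in for dev_prepared_list. */
static struct sketch_device *sketch_prepared[SKETCH_MAX_DEVS];
static size_t sketch_prepared_len;

/* Step 1 (scheduler locked in the RFC): prepare devices, build the list. */
static unsigned int sketch_prepare(struct sketch_device *devs, size_t n)
{
	unsigned int snapshot = sketch_wakeup_count;

	sketch_prepared_len = 0;
	for (size_t i = 0; i < n && sketch_prepared_len < SKETCH_MAX_DEVS; i++) {
		devs[i].prepared = true;
		sketch_prepared[sketch_prepared_len++] = &devs[i];
	}
	return snapshot;
}

/* Step 2 (IRQs locked in the RFC): abort on wake-up, else gate the devices. */
static bool sketch_suspend(unsigned int snapshot)
{
	if (sketch_wakeup_count != snapshot) {
		/* A wake-up happened during prepare: undo and stop the suspend. */
		for (size_t i = 0; i < sketch_prepared_len; i++) {
			sketch_prepared[i]->prepared = false;
		}
		sketch_prepared_len = 0;
		return false;
	}
	for (size_t i = sketch_prepared_len; i > 0; i--) {
		sketch_prepared[i - 1]->suspended = true;
	}
	return true;
}
```

Taking a snapshot of the counter before preparing and comparing it afterwards lets the second step detect any wake-up that arrived in between without needing to know which interrupt it was.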

pabigot · 2020-05-11T12:50:20Z

We cannot assume that devices can transition power levels synchronously from within an interrupt. Rather, the allowed system power state must be affected by the states the devices are in.

This does suggest that device power management must always be on and devices should automatically transition to the lowest power state consistent with application needs.
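
One possible reading of "the allowed system power state must be affected by the states the devices are in" is that each device reports the deepest system state it can currently tolerate, and the system takes the shallowest of those. The sketch below assumes exactly that; the `sd_` names and the `deepest_ok` field are hypothetical and not something agreed in this thread.

```c
#include <stddef.h>

/* Hypothetical system states, ordered from shallowest to deepest. */
enum sd_state {
	SD_RUNTIME_ACTIVE,
	SD_RUNTIME_IDLE,
	SD_SUSPEND_TO_IDLE,
	SD_STANDBY,
	SD_SUSPEND_TO_RAM,
	SD_SUSPEND_TO_DISK,
	SD_SOFT_OFF
};

struct sd_dev_status {
	const char *name;
	enum sd_state deepest_ok;	/* deepest system state tolerated right now */
};

/* Clamp the wanted system state to what every device currently allows. */
static enum sd_state sd_allowed_state(const struct sd_dev_status *devs,
				      size_t n, enum sd_state wanted)
{
	enum sd_state allowed = wanted;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].deepest_ok < allowed) {
			allowed = devs[i].deepest_ok;
		}
	}
	return allowed;
}
```

A device waiting for a response from external hardware would report runtime idle as its limit, so the system could not drop below that until the transfer completes.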

I think the idea that system power management should control device power management is workable only with a well-defined model of application needs that can constrain system power management. It is not in general acceptable for the system to say "I'm going to sleep, everybody shut down" if the application is waiting for a response from an external device that will be lost if the system sleeps. We don't have such a model.

So I still feel we're going too deep without agreeing on and documenting the general design principles and goals. That includes an architectural vision, core concepts, and (abstract) data structures including dependencies and constraints: what they are, how they're represented, and how they affect transitions between system power states. So far the only thing described in any detail is the static power states.

For example the concept of an interrupt occurring during a power level transition and so blocking completion of that transition must have been addressed before. How is it handled in the TI and other power management architectures?

wentongwu · 2020-05-11T14:16:51Z

> I think the idea that system power management should control device power management is workable only with a well-defined model of application needs that can constrain system power management. It is not in general acceptable for the system to say "I'm going to sleep, everybody shut down" if the application is waiting for a response from an external device that will be lost if the system sleeps. We don't have such a model.

See the pm policy API: it defines a constraint API to constrain system power management, and the state the pm core follows comes from the pm policy layer. OK, I will provide more documentation about that.

```c
struct pm_policy_api {
	void (*init)(void);
	pm_state_t (*next_state)(struct pm_policy *policy);
	int (*set_constraint)(struct pm_policy *policy, pm_state_t state);
	int (*release_constraint)(struct pm_policy *policy, pm_state_t state);
};
```

> We cannot assume that devices can transition power levels synchronously from within an interrupt.

Sure, the power transition will not happen in an interrupt. I mean the sync between device driver and device firmware will be the first step: the device driver will send the sync message, running in idle thread context, and the response (ack) will be the interrupt itself; when the interrupt is received, the device is put on dev_prepared_list. The second step does the actual power transition in idle thread context based on dev_prepared_list. There may be limitations on the sync if the response needs a read, hence the other idea suggested above.

> Rather, the allowed system power state must be affected by the states the devices are in.

But we should consider the case where a device rejects the suggested power state; if that happens, we will only go into runtime idle. That has the same effect as letting device states affect the ongoing system pm state.

> So I still feel we're going too deep without agreeing on and documenting the general design principles and goals.

yes, so we have to discuss, and I will document more.

pabigot · 2020-05-11T16:01:29Z

We need the API in a (draft?) PR as requested so we can see the whole thing. Please document its behavior as part of the initial PR. If that PR exists please link to it here (and link back here from it).

In order to reduce overall system power consumption, we should suspend devices that are idle or not in use while the system is active or running. Currently there is a device idle power management framework that intends to do that. But the implementation can only do get/put one after another and can't handle concurrency: for example, if multiple threads request DEVICE_PM_ACTIVE_STATE concurrently, the signal may be polled without being reset, and there may be signal contention among the threads. The disable function doesn't consider an ongoing transition. Further, it doesn't consider device dependencies. So rework the implementation and rename it device runtime power management, following the definition in zephyrproject-rtos#24228.

The API rt_dpm_claim tries to resume the device and protects any hardware transfer after the call by increasing the usage count. When there is concurrency with another claim or release, this API pends the current thread on the wait_q until the previous transition has finished. The API rt_dpm_release releases a previous claim and forbids an unexpected release. No hardware operation depends on release, so release has an asynchronous version. After the release, the parents of that device are also considered automatically.

Each device can decide whether to support device runtime power management through the rt_dpm_enable/rt_dpm_disable API. It is also the device driver, not this framework, that decides what "device not in use" means, i.e. where to put rt_dpm_claim/rt_dpm_release: for example, around the transfer function for I2C, but for a net device perhaps only around open/close or a similar place.

Signed-off-by: Wentong Wu <wentong.wu@intel.com>
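
The usage-count behaviour described in this commit message can be illustrated with a single-threaded sketch. The real signatures of rt_dpm_claim and rt_dpm_release are not shown in this thread, so the `sketch_` functions, the struct, and the error codes below are assumptions; the pending on wait_q, the asynchronous release, and the parent handling are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device runtime PM record. */
struct sketch_rt_dev {
	bool enabled;		/* runtime PM enabled for this device */
	int usage_count;	/* outstanding claims */
	bool active;		/* hardware is resumed */
};

/* Resume the device on the first claim and count the user. */
static int sketch_claim(struct sketch_rt_dev *dev)
{
	if (!dev->enabled) {
		return -ENOTSUP;
	}
	if (dev->usage_count++ == 0) {
		dev->active = true;
	}
	return 0;
}

/* Drop one claim; suspend the device when the last user is gone. */
static int sketch_release(struct sketch_rt_dev *dev)
{
	if (!dev->enabled) {
		return -ENOTSUP;
	}
	if (dev->usage_count == 0) {
		return -EINVAL;	/* unexpected release without a claim */
	}
	if (--dev->usage_count == 0) {
		dev->active = false;
	}
	return 0;
}
```

A driver would wrap a transfer in a claim/release pair, so the device stays active for as long as any transfer is in flight and is suspended once the last one releases it.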

wentongwu · 2020-07-23T03:08:09Z

@pabigot @nashif @vanti I attach the API we have so far here: wentongwu@598eab9. Please share comments so we don't diverge from what anyone has in mind. The platform pm API and device pm API are still in progress; after that we will settle the implementation of the pm core (or pm manager).

vanti · 2020-07-23T19:00:20Z

@wentongwu Can we relax the definition of the "Suspend to ram state" a bit to include states such as the standby mode on TI CC1352, where almost everything is power-gated, with the exception of some minimal CPU logic that is required for waking up from the same point without restarting? I think the word 'everything' is a bit strong here.

pabigot · 2020-07-24T15:05:45Z

Just capturing some information about existing states. The ones at the top are Zephyr; I think the names are good, and the other references can show what we mean by those states.

The draft API pointed to above should incorporate documentation that explains more clearly what's meant by those states (the Linux System State link has the most detail).

System power management states:

| Zephyr | ACPI | Linux |
| --- | --- | --- |
| Runtime Active | S0 | Runtime active |
| Runtime Idle | S0ix | Runtime idle |
| Suspend to idle | S1 | Suspend-to-idle (S2, S2Idle) |
| Standby | S2? | Standby |
| Suspend to RAM | S2 or S4 | Suspend-to-RAM (STR, S2RAM) |
| Suspend to Disk | S4 | Hibernation |
| Soft Off | S5 | |

Device States

| Zephyr | ACPI | Linux |
| --- | --- | --- |
| TBD | D0 (Fully On) | ?? |
| TBD | D1, D2 | ?? |
| TBD | D3 (Hot, Cold) | ?? |

wentongwu · 2020-07-28T07:20:04Z

> @wentongwu Can we relax the definition of the "Suspend to ram state" a bit to include states such as the standby mode on TI CC1352, where almost everything is power-gated, with the exception of some minimal CPU logic that is required for waking up from the same point without restarting? I think the word 'everything' is a bit strong here.

@vanti updated. Thanks

wentongwu · 2020-07-28T12:41:51Z

| Zephyr | ACPI | Linux |
| --- | --- | --- |
| normal (PM_ACTIVE_STATE) | D0 (Fully On) | ?? |
| clock gate (PM_SUSPEND_STATE) | D1, D2 | ?? |
| power gate (PM_OFF_STATE) | D3 (Hot, Cold) | ?? |

@pabigot @vanti how about we use the current definition for device pm state?
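
Read as code, the proposed mapping folds the four ACPI device states into three. The sketch below is only an illustration of the table: the `SKETCH_` enumerators mirror the names used in the table and are not taken from an actual Zephyr header, and the helper function is hypothetical.

```c
/* Hypothetical three-state device model mirroring the table above. */
enum sketch_dev_pm_state {
	SKETCH_PM_ACTIVE_STATE,		/* normal */
	SKETCH_PM_SUSPEND_STATE,	/* "clock gate" in the table */
	SKETCH_PM_OFF_STATE		/* "power gate" in the table */
};

/* Map an ACPI D-state number (0..3) to the model; -1 if out of range. */
static int sketch_from_acpi_d(int d)
{
	switch (d) {
	case 0:
		return SKETCH_PM_ACTIVE_STATE;
	case 1:
	case 2:
		/* D1 and D2 are not distinguished in the three-state model. */
		return SKETCH_PM_SUSPEND_STATE;
	case 3:
		return SKETCH_PM_OFF_STATE;
	default:
		return -1;
	}
}
```

The mapping is not invertible: once D1 and D2 share a state, a driver cannot ask for one or the other through this model.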

wentongwu · 2020-07-29T01:12:25Z

Another problem: once all of the code is ready, which platform is best for testing?

wentongwu · 2020-07-29T01:23:42Z

> | Zephyr | ACPI | Linux |
> | --- | --- | --- |
> | normal (PM_ACTIVE_STATE) | D0 (Fully On) | ?? |
> | clock gate (PM_SUSPEND_STATE) | D1, D2 | ?? |
> | power gate (PM_OFF_STATE) | D3 (Hot, Cold) | ?? |
>
> @pabigot @vanti how about we use the current definition for device pm state?

Maybe not clock gate: some devices can work on different clocks, so maybe the device pm API should pass the clocks down as a parameter.

vanti · 2020-07-29T03:19:34Z

> | Zephyr | ACPI | Linux |
> | --- | --- | --- |
> | normal (PM_ACTIVE_STATE) | D0 (Fully On) | ?? |
> | clock gate (PM_SUSPEND_STATE) | D1, D2 | ?? |
> | power gate (PM_OFF_STATE) | D3 (Hot, Cold) | ?? |
>
> @pabigot @vanti how about we use the current definition for device pm state?

I think active/suspend/off sound fine to me. On devices that only have active/off state, would suspend map to off?

> another problem, if all of the code get ready, which one is the best platform we do the test?

The infrastructure should be tested on multiple platforms in my opinion, to make sure it is flexible enough.

pabigot · 2020-07-29T09:57:05Z

I don't like "power gate" and "clock gate" since those are (AIUI) technologies that produce a savings in power, not low-power states. I've also not seen any non-MCU/CPU devices (e.g. I2C or SPI ICs) that document their low power modes in those terms.

Remember there are four states to capture. Would it be active/suspend1/suspend2/off? That's getting vague.

So I'm leaning towards D0, D1, D2, D3. Then, for consistency, I've gotta change my position on system and go for S0, S0ix, S1, S2, S3, S4, S5. My motivation for the existing Linux-based names was that it's more clear what those states mean, but that could be addressed by clear documentation (which might even be better, as we can go into detail without getting wrapped up in what's implied by "suspend to RAM"). (Though S0ix might become S0i to indicate "idling in S0" rather than something to do with Intel-specific stuff.)

Given where we are, I believe a one-to-one correspondence to an existing architecture like ACPI has the best chance for meeting cross-platform needs. I would prefer a different functional-based architecture for the device power states but I don't think that's going anywhere.

nashif · 2021-02-22T15:07:48Z

this is now done.

wentongwu added the RFC Request For Comments: want input from the community label Apr 9, 2020

wentongwu self-assigned this Apr 9, 2020

wentongwu added the area: Power Management label Apr 9, 2020

pabigot mentioned this issue Apr 10, 2020

device power management irq lock #24230

Open

pabigot mentioned this issue May 27, 2020

device_pm: clarify and document usage #24653

Closed

wentongwu mentioned this issue Jul 2, 2020

subsys: power: refactor device runtime power management #26366

Closed

pabigot mentioned this issue Jul 9, 2020

Power Management Infrastructure #14307

Closed

12 tasks

pabigot mentioned this issue Aug 24, 2020

OS Pwr Manager doesn't put nrf52 into LPS_1 #12025

Closed

ceolin mentioned this issue Oct 26, 2020

pm: Add power management states definition #29505

Merged

pabigot mentioned this issue Jan 7, 2021

Mapping between existing and new system power management states #31162

Closed

5 tasks

nashif assigned ceolin and unassigned wentongwu Feb 22, 2021

nashif closed this as completed Feb 22, 2021
