periph_timer: systematic proportional error in timer_set #10545

jia200x · 2018-12-04T17:15:57Z

Description

While debugging LoRaWAN I discovered there is a systematic error while using timer_set. When setting a timer to T units of time, it actually gets triggered in T + X/100*T (with X a board specific drift).
This issue is at least present on samr21-xpro (0.48% drift) and b-l072z-lrwan1 (1% drift).

This is causing some problems in LoRaWAN. For the Join Request procedure, the first reception window should be opened 5 seconds right after there's a TX done event. In b-l072z-lrwan1 the RX window is opened 47 ms later than expected. This explains why this compensation works ok on this board but produces problems in others (@danpetry I know you were having issues with this).

Steps to reproduce the issue

I ran the following test (samr21-xpro, 32bit timer) which produces timer cycles (with timeout increasing linearly, 200ms step) and measured the LED pin with a logic analyzer:

#include <stdio.h>
#include "xtimer.h"
#include "timex.h"
#include "periph/timer.h"
#include "random.h"

#include "msg.h"
#include "thread.h"

#define T 200000
#define TIMER TIMER_DEV(1)
#define TIMER_MAX 0xffffffff
#define TIMER_OFFSET 0xffff

#define LED_OFF LED0_OFF
#define LED_TOGGLE LED0_TOGGLE

unsigned int i=T;

static void cb(void *arg, int chan)
{
    (void) arg;
    (void) chan;
    LED_TOGGLE;
    if (i+T < TIMER_MAX)
        timer_set(TIMER, 0, i);
    i += T;
}

int main(void)
{
    // Init timer with 1 MHz
    timer_init(TIMER, 1000000ul, cb, NULL);
    LED_OFF;
    timer_set(TIMER, 0, TIMER_OFFSET);

    return 0;
}

Expected results

The actual timeout timer for the n cycle (e.g how long does the nth cycle take) in milliseconds would be:

t'[n] = TIMER_OVERHEAD + n*200

Here TIMER_OVERHEAD is the intrinsic delay of timer_set (ISR + callback served). Since TIMER_OVERHEAD should be constant:

t''[n] = diff(t'[n]) = 200

So for a real measurement, I would expect diff(t[n]) to be centered in 200 and have some values around.

Actual results

I got the result of the logic analyzer here. It contains the timestamp when there was a flank. Then, I ran the following using python's pandas to extract the min, avg and max value of the sample's diff(t[n]):

import pandas as pd

data = pd.read_csv("results/untitled.csv")
data = data["Time[s]"][3:]
p_data = data.diff().diff()*1000 #Get derivative of cycles timeout timer in ms
p_data.min(), p_data.mean(), p_data.max()

Got (200.57575000000583, 200.9604469696972, 201.59066666666803)
So, diff(t[n]) is slightly to the right. There's a deviation of ((200.9604469696972 - 200)/200 * 100) = 0.48%, as mentioned before.

TL;DR

This code toggles the LED every 10 seconds:

#include <stdio.h>
#include "timex.h"
#include "periph/timer.h"

#define T 10000000
#define TIMER TIMER_DEV(1)
#define TIMER_MAX 0xffffffff
#define TIMER_OFFSET 0xffff

#define LED_OFF LED0_OFF
#define LED_TOGGLE LED0_TOGGLE

unsigned int i=T;

static void cb(void *arg, int chan)
{
    (void) arg;
    (void) chan;
    LED_TOGGLE;
    if (i+T < TIMER_MAX)
        timer_set(TIMER, 0, T);
    //i += T;
}

int main(void)
{
    //int interval = 0;
    timer_init(TIMER, 1000000ul, cb, NULL);
    LED_OFF;
    timer_set(TIMER, 0, T);

    return 0;
}

Here is the output of the logic analyzer when I ran this snippet on samr21-xpro. Then, I calculated the length between LED toggles:

     Time[s]   Channel 0
2  10.051050         1.0
3  10.050970        -1.0
4  10.050887         1.0
5  10.050525        -1.0
6  10.050817         1.0

Greater than 10 (avg: 10.05085). Drift here is:

((10.05085-10)/10) * 100= 0.51 %

Sort of consistent with the value reported before for samr21-xpro

The text was updated successfully, but these errors were encountered:

jia200x · 2018-12-04T17:16:47Z

Can someone confirm if this issue is present? @kaspar030 @MichelRottleuthner

jia200x · 2018-12-04T17:17:37Z

(didn't put it in the description, but obviously this issue also affects xtimer)

jia200x · 2018-12-04T18:30:07Z

this might be also related to #10352

jnohlgard · 2018-12-04T19:40:16Z

Did you try the application tests/bench_timer?
It performs the same kind of test but using another hardware timer on the same CPU as the reference. It produces some statistics about the error distribution as well.

jia200x · 2018-12-04T19:56:04Z

Did you try the application tests/bench_timer?

IIRC I got similar results with both TIMER_DEV(0) and TIMER_DEV(1). So, the issue might be invisible to that test :/

I will confirm this tomorrow (and of course, run tests/bench_timer)

kaspar030 · 2018-12-04T22:28:13Z

Got (200.57575000000583, 200.9604469696972, 201.59066666666803)
So, diff(t[n]) is slightly to the right. There's a deviation of ((200.9604469696972 - 200)/200 * 100) = 0.48%, as mentioned before.

But that deviation is (more or less) constant, right? When printing the differences I get these:

 [200.9545     200.97325    200.95366667 200.853      201.03866667
 200.89083333 201.15025    200.77225    201.13441667 200.84225
 201.08283333 200.77816667 200.9925     200.96558333 201.056
 200.95925    200.84258333 200.8585     201.59066667 201.1005
 200.76441667 200.57575   ]

Note that the maximum is at pos n-3, the minimum is the last value.
So the actual measured elapsed time differences are off from min=0.575 to max=1.59, but it is not increasing.

When setting a timer to T units of time, it actually gets triggered in T + X/100*T

According to your data, it is more like T + 0.048*200ms.

Anyways, a milisecond spread when using periph timer is not good at all, and ~10k usec TIMER_OVERHEAD (0.048 * 200ms) is very weird.

Could you, with the oscilioscope, measure the time it takes for just plainly setting the gpios?

LED0_TOGGLE;
LED0_TOGGLE;
LED0_TOGGLE;

Maybe the GPIO is not as fast as it should be.

kaspar030 · 2018-12-04T22:29:33Z

Could you, with the oscilioscope, measure the time it takes for just plainly setting the gpios?

Maybe a couple thousand times in a loop, so we can get a baseline on what accuracy the measurement has?

kaspar030 · 2018-12-04T22:35:14Z

t[n] = TIMER_OVERHEAD + n*200

BTW, your code actually sets relative times (timer_set(i)), thus the expected time for each step is

t[n] = t[n-1] + TIMER_OVERHEAD + n*200.

jia200x · 2018-12-05T09:33:07Z

But that deviation is (more or less) constant, right? When printing the differences I get these:

Yes, the deviation is more or less constant. For samr21-xpro is 0.48%. For b-l072z-lrwan1 around 1%.

Could you, with the oscilioscope, measure the time it takes for just plainly setting the gpios?

I already did that and IIRC it was for sure less than 10 us. I will measure again.

Anyways, a milisecond spread when using periph timer is not good at all, and ~10k usec TIMER_OVERHEAD (0.048 * 200ms) is very weird.

With TIMER_OVERHEAD I meant the ISR + callback overhead. That should be always constant.

BTW, your code actually sets relative times (timer_set(i)), thus the expected time for each step is

The description and the variable of the formula is misleading. I will fix it.
Basically this is how long does the nth cycle take (first cycle 200ms, then 400ms, 600ms, 800ms, ...)
In reality each cycle should be slightly longer due to the TIMER_OVERHEAD (as described above), but this one should be constant (the time to receive and serve the interrupt shouldn't be dependent on the timeout value). Then, t[n+1] - t[n] should be constant to 200ms. A slight drift of 0.48% in this value will produce a drift of 0.48% on every call to timer_set :(

MichelRottleuthner · 2018-12-05T09:33:23Z

The clock configuration of this board looks suspicious regarding this issue.

From boards/samr21-xpro/include/periph_conf.h it looks like there are three possible settings for the clock config:

CLOCK_USE_PLL = 1
CLOCK_USE_XOSC32_DFLL = 1
CLOCK_USE_XOSC32_DFLL = CLOCK_USE_PLL = 0

Then looking at cpu/samd21/cpu.c and cpu/samd21/periph/timer.c gives the following clock sources for the timers:

OSC8M->[GCLK1]->timer{0,1}
OSC8M->[GCLK1]->timer{0,1}
OSC8M->[GCLK0]->timer{0,1}

So to me this looks like independent of the actually available clocks the timer always uses OSC8M.
According to datasheet page 976 the accuracy of this oscillator is bad even when using flash calibration values.
(0.75% to 2% depending on condtions + temperature and supply effects).

jia200x · 2018-12-05T09:36:10Z

(fixed formula to make it clearer)

kaspar030 · 2018-12-05T10:26:56Z

In reality each cycle should be slightly longer due to the TIMER_OVERHEAD (as described above), but this one should be constant (the time to receive and serve the interrupt shouldn't be dependent on the timeout value). Then, t[n+1] - t[n] should be constant to 200ms. A slight drift of 0.48% in this value will produce a drift of 0.48% on every call to timer_set :(

To what do you think the error is proportional? 0.48% of what? If the length of each cycle increases, but the error is pretty constant at 0.48% of 200ms, I'd rather assume that the error is constant. It is certainly not in proportion to the cycle length, otherwise the error would increase.

What happens if you use 400ms, 800ms difference between cycles? If the error is proportional to the difference, it would stay close to 0.48% of 400ms, or 0.48% of 800ms.

So to me this looks like independent of the actually available clocks the timer always uses OSC8M.
According to datasheet page 976 the accuracy of this oscillator is bad even when using flash calibration values.
(0.75% to 2% depending on condtions + temperature and supply effects).

That seems like a very probable explanation.

jia200x · 2018-12-05T10:43:15Z

To what do you think the error is proportional? 0.48% of what? If the length of each cycle increases, but the error is pretty constant at 0.48% of 200ms, I'd rather assume that the error is constant. It is certainly not in proportion to the cycle length, otherwise the error would increase.

The error is proportional to the length of the timeout window (a.k.a the difference between the callback and the moment were timer_set is called).
As shown in the TL;DR section, for 10 seconds it gives an error of 0.5%. Thus, is not constant.

In fact:
diff(t[n]) = t[n+1] - t[n] = 200 + 0.0048*200 => t[n] = 200*n + 0.0048*200*n = 200*n*(1+0.0048)
If the expected time is e[n] = 200*n, then the drift with the actual value is:
t[n] / e[n] = 0.48%

So it's really important diff(t[n]) is centered in 200 ms (in practice there are always drifts because of clock jitter, but they somehow cancel if it's centered)

kaspar030 · 2018-12-05T11:54:00Z

The error is proportional to the length of the timeout window (a.k.a the difference between the callback and the moment were timer_set is called).

Look again at the data:

[200.9545     200.97325    200.95366667 200.853      201.03866667
 200.89083333 201.15025    200.77225    201.13441667 200.84225
 201.08283333 200.77816667 200.9925     200.96558333 201.056
 200.95925    200.84258333 200.8585     201.59066667 201.1005
 200.76441667 200.57575   ]

There's 0.48% of 200 ms average deviation, with a standard deviation of 0.19. But it is not increasing with increasing interval length. It is not 0.48% of 200*n. The data says, for intervals 200*(1......n), the differences between two n, which should be 200ms, are too high by 1 ms, on average. So actually, the differences are (more or less) centered at 200ms. Or maybe I don't get the data.

As shown in the TL;DR section, for 10 seconds it gives an error of 0.5%. Thus, is not constant.

True that. Too bad you didn't run the "increasing 200ms" capture so it arrives at 10s.

Could you try reading the timer in a loop, toggling the led at specific intervals?

e.g.,

uint32_t n = 0
while(1) {
 n++;
if ((n & 0xffff) == 0) {
LED0_TOGGLE;
}

so we get some data on how accurate the timer actually is?

jia200x · 2018-12-05T14:25:51Z

Look again at the data:

Note this is the data of the second discrete derivative of the timestamps.
With the latest description this means:

t[n] = Time where there was a flank
t'[n] = diff(t[n]) = Duration of nth cycle (expected to be  [200, 400, 600, 800...] )
t''[n] = diff(t'[n]) = Time difference between cycle n+1 and n (expected to be [200, 200, 200, 200, ...])

The cycle time t'[n] is equal to TIMER_OVERHEAD (probably in the order of microseconds) plus 200*n.

With this data:

[200.9545     200.97325    200.95366667 200.853      201.03866667
 200.89083333 201.15025    200.77225    201.13441667 200.84225
 201.08283333 200.77816667 200.9925     200.96558333 201.056
 200.95925    200.84258333 200.8585     201.59066667 201.1005
 200.76441667 200.57575   ]

The mean of the differences between cycles (t''[n]) is 200.96. It means the cycle time t'[n] = TIMER_OVERHEAD + 200.96*n.

If TIMER_OVERHEAD is not 0 (which was discarded in #10545#issuecomment) the deviation against the expected time is:

t'[n] / e'[n] = (TIMER_OVERHEAD + 200.96*n)/ (TIMER_OVERHEAD + 200*n) = 1 + 0.96*n/(TIMER_OVERHEAD+200*n)

For low values of TIMER_OVERHEAD or high values of n this tends to 1+0.96/200 = 1.0048 drift (0.48%)

Or maybe I don't get the data.

Sorry, I wasn't clear with the presentation of the data :(

Anyway, I will do more measurements and report the results back

jia200x · 2018-12-05T14:33:45Z

For 10 seconds (n=50000) the expected time with this 0.48% drift is approx:

200.96*50000 = 10.048 seconds (pretty close to what I measured before with the 10 seconds test)

kfessel · 2018-12-07T12:09:33Z

Please tell me that I am wrong but:
It seems to me that the b-l072z-lrwan1 board is configured to use the stm internal HSI (internal RC oscillator) which is callibrated in factory to 1% @ 25°C -> this is not good enaugh for the LoRa join Procedure.
Please tell me that I am wrong.

jia200x · 2018-12-07T12:33:04Z

It seems to me that the b-l072z-lrwan1 board is configured to use the stm internal HSI (internal RC oscillator) which is callibrated in factory to 1% @ 25°C -> this is not good enaugh for the LoRa join Procedure.
Please tell me that I am wrong.

Well, this could explain why we get 1% of drift in the b-l072z-lrwan1 board

kfessel · 2018-12-07T12:36:29Z

And also why the thing that should not have worked (50 ms offset) worked for you and not for me

kfessel · 2018-12-07T12:40:47Z

but when we fix it we also have to take a look on power management for the board since the external oci is powerd by an configurable jumper which may be powerd by a pin (PA12)

kaspar030 · 2018-12-07T13:33:55Z

The mean of the differences between cycles (t''[n]) is 200.96. It means the cycle time t'[n] = TIMER_OVERHEAD + 200.96*n.

Ok, now I'm with you. I wasn't getting that the 200.96 add up. Thanks for clarifying!

Why didn't you just state that the timer runs 0.48% too fast?

jia200x · 2018-12-07T13:57:04Z

When setting a timer to T units of time, it actually gets triggered in T + X/100*T (with X a board specific drift).
This issue is at least present on samr21-xpro (0.48% drift) and b-l072z-lrwan1 (1% drift).

You mean 0.48% slower? (Expected 5s => measured 5.047)

That's what I tried to say with:

While debugging LoRaWAN I discovered there is a systematic error while using timer_set. When setting a timer to T units of time, it actually gets triggered in T + X/100*T (with X a board specific drift).
This issue is at least present on samr21-xpro (0.48% drift) and b-l072z-lrwan1 (1% drift).

jia200x · 2018-12-07T14:03:24Z

Anyway, both boards have inaccurate clock sources for the timers. That indeed explains these drifts...

stale · 2019-08-10T02:09:16Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

smlng · 2019-09-10T06:37:27Z

I think that relates to the discussion we had during the summit regarding the fact that timer_set should disable IRQs while doing timer_read and timer_set_absolute. Hence, reopen

MichelRottleuthner · 2019-09-10T07:20:27Z

Actually I think it is only because of using an inaccurate clock source as I described in one of the comments above. But maybe we should open a separate issue for the periph timer thing.

jia200x assigned smlng Dec 4, 2018

jia200x added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: timers Area: timer subsystems labels Dec 4, 2018

jia200x mentioned this issue Dec 7, 2018

semtech-loramac: fix wrong behavior with RX windows #9618

Closed

stale bot added the State: stale State: The issue / PR has no activity for >185 days label Aug 10, 2019

stale bot closed this as completed Sep 10, 2019

smlng reopened this Sep 10, 2019

stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Sep 10, 2019

miri64 added this to the Release 2020.07 milestone Jul 3, 2020

miri64 modified the milestones: Release 2020.07, Release 2020.10 Jul 24, 2020

MrKevinWeiss modified the milestones: Release 2020.10, Release 2021.07 Jun 22, 2021

MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

periph_timer: systematic proportional error in timer_set #10545

periph_timer: systematic proportional error in timer_set #10545

jia200x commented Dec 4, 2018 •

edited

Loading

jia200x commented Dec 4, 2018

jia200x commented Dec 4, 2018

jia200x commented Dec 4, 2018

jnohlgard commented Dec 4, 2018

jia200x commented Dec 4, 2018

kaspar030 commented Dec 4, 2018

kaspar030 commented Dec 4, 2018

kaspar030 commented Dec 4, 2018

jia200x commented Dec 5, 2018

MichelRottleuthner commented Dec 5, 2018

jia200x commented Dec 5, 2018

kaspar030 commented Dec 5, 2018

jia200x commented Dec 5, 2018 •

edited

Loading

kaspar030 commented Dec 5, 2018

jia200x commented Dec 5, 2018

jia200x commented Dec 5, 2018

kfessel commented Dec 7, 2018 •

edited

Loading

jia200x commented Dec 7, 2018

kfessel commented Dec 7, 2018

kfessel commented Dec 7, 2018

kaspar030 commented Dec 7, 2018

jia200x commented Dec 7, 2018

jia200x commented Dec 7, 2018

stale bot commented Aug 10, 2019

smlng commented Sep 10, 2019

MichelRottleuthner commented Sep 10, 2019

periph_timer: systematic proportional error in timer_set #10545

periph_timer: systematic proportional error in timer_set #10545

Comments

jia200x commented Dec 4, 2018 • edited Loading

Description

Steps to reproduce the issue

Expected results

Actual results

TL;DR

jia200x commented Dec 4, 2018

jia200x commented Dec 4, 2018

jia200x commented Dec 4, 2018

jnohlgard commented Dec 4, 2018

jia200x commented Dec 4, 2018

kaspar030 commented Dec 4, 2018

kaspar030 commented Dec 4, 2018

kaspar030 commented Dec 4, 2018

jia200x commented Dec 5, 2018

MichelRottleuthner commented Dec 5, 2018

jia200x commented Dec 5, 2018

kaspar030 commented Dec 5, 2018

jia200x commented Dec 5, 2018 • edited Loading

kaspar030 commented Dec 5, 2018

jia200x commented Dec 5, 2018

jia200x commented Dec 5, 2018

kfessel commented Dec 7, 2018 • edited Loading

jia200x commented Dec 7, 2018

kfessel commented Dec 7, 2018

kfessel commented Dec 7, 2018

kaspar030 commented Dec 7, 2018

jia200x commented Dec 7, 2018

jia200x commented Dec 7, 2018

stale bot commented Aug 10, 2019

smlng commented Sep 10, 2019

MichelRottleuthner commented Sep 10, 2019

jia200x commented Dec 4, 2018 •

edited

Loading

jia200x commented Dec 5, 2018 •

edited

Loading

kfessel commented Dec 7, 2018 •

edited

Loading