-
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calculating kurtosis from histogram #10
Comments
Thanks for your question, as I am in the middle of a project it might take a few days to come back to it. |
1024 bins means you need 4K RAM to hold all bins if you use the Histogram class. Some googling showed there is quite some math involved.
(from the last link, not Arduino code) #ifndef RUNNINGSTATS_H
#define RUNNINGSTATS_H
class RunningStats
{
public:
RunningStats();
void Clear();
void Push(double x);
long long NumDataValues() const;
double Mean() const;
double Variance() const;
double StandardDeviation() const;
double Skewness() const;
double Kurtosis() const;
friend RunningStats operator+(const RunningStats a, const RunningStats b);
RunningStats& operator+=(const RunningStats &rhs);
private:
long long n;
double M1, M2, M3, M4;
};
#endif And here is the implementation file RunningStats.cpp. #include "RunningStats.h"
#include <cmath>
#include <vector>
RunningStats::RunningStats()
{
Clear();
}
void RunningStats::Clear()
{
n = 0;
M1 = M2 = M3 = M4 = 0.0;
}
void RunningStats::Push(double x)
{
double delta, delta_n, delta_n2, term1;
long long n1 = n;
n++;
delta = x - M1;
delta_n = delta / n;
delta_n2 = delta_n * delta_n;
term1 = delta * delta_n * n1;
M1 += delta_n;
M4 += term1 * delta_n2 * (n*n - 3*n + 3) + 6 * delta_n2 * M2 - 4 * delta_n * M3;
M3 += term1 * delta_n * (n - 2) - 3 * delta_n * M2;
M2 += term1;
}
long long RunningStats::NumDataValues() const
{
return n;
}
double RunningStats::Mean() const
{
return M1;
}
double RunningStats::Variance() const
{
return M2/(n-1.0);
}
double RunningStats::StandardDeviation() const
{
return sqrt( Variance() );
}
double RunningStats::Skewness() const
{
return sqrt(double(n)) * M3/ pow(M2, 1.5);
}
double RunningStats::Kurtosis() const
{
return double(n)*M4 / (M2*M2) - 3.0;
}
RunningStats operator+(const RunningStats a, const RunningStats b)
{
RunningStats combined;
combined.n = a.n + b.n;
double delta = b.M1 - a.M1;
double delta2 = delta*delta;
double delta3 = delta*delta2;
double delta4 = delta2*delta2;
combined.M1 = (a.n*a.M1 + b.n*b.M1) / combined.n;
combined.M2 = a.M2 + b.M2 +
delta2 * a.n * b.n / combined.n;
combined.M3 = a.M3 + b.M3 +
delta3 * a.n * b.n * (a.n - b.n)/(combined.n*combined.n);
combined.M3 += 3.0*delta * (a.n*b.M2 - b.n*a.M2) / combined.n;
combined.M4 = a.M4 + b.M4 + delta4*a.n*b.n * (a.n*a.n - a.n*b.n + b.n*b.n) /
(combined.n*combined.n*combined.n);
combined.M4 += 6.0*delta2 * (a.n*a.n*b.M2 + b.n*b.n*a.M2)/(combined.n*combined.n) +
4.0*delta*(a.n*b.M3 - b.n*a.M3) / combined.n;
return combined;
}
RunningStats& RunningStats::operator+=(const RunningStats& rhs)
{
RunningStats combined = *this + rhs;
*this = combined;
return *this;
} |
Found some time, Made an experimental sketch + library based upon the above code. |
Slightly faster skew() as a division is replaced by multiplication (to be verified). double skew()
{
return sqrt(double(_count)) * M3 * pow(M2, -1.5);
}; |
@RobTillaart that is very kind of you to jump into this and put together that library! Today my my code does something of this kind: RAM consumption is not of an issue as I am using a Teensy 4.1 with 1MB of internal RAM and 16MB of PSRAM, I assume that in order to setup the same kind of architecture, I would need two histogram classes? |
(Not a statistician, never worked with the kurtosis/skewness until today, so all disclaimers apply) Thinking out loud, Using two buffers sounds logical however swapping the buffers means (1) either you clear and start all over again or (2) you continue with the contents of that histogram.
What you should do is to use three histogram objects(x,Y,Z), in pseudo code task_filler()
{
forever()
{
value = adc();
if (flag == x) X[adc]++;
else Y[adc]++;
}
}
task_analyzer()
{
Kurtosis K;
forever()
{
K.reset();
// add the Y histo to the Z histo and add the new Z value to K.
for (idx = 0; idx < max; idx++)
{
if (flag == x) Z[idx] += Y[idx]; // while x is being filled we analyze Y
else Z[idx] += X[idx]; // and vice versa
K.add(Z[idx]);
}
print(K.count + "\t");
print(K.kurtosis());
// clear process historgram, "swap" X and Y, and start again.
if (flag == x)
{
Y.clear();
flag = y;
}
else
{
X.clear();
flag = x;
}
} 250 KHz samples every 100 ms means that you could have 25000 samples in one bin. |
Created a new library - https://github.com/RobTillaart/Kurtosis (under development) The calculation of the kurtosis of a histogram is more work as you need to add every entry, e.g. if bin 13 has a count of 123, you have to add 123 times the value bin 13 represents. I can implement an add(value, times) but the only gain is that you have no function call overhead. Might still be worth it So it will easier to have one histogram and add all measurements to the Kurtosis object directly after making the measurement. |
Released the version 0.1.0 of the kurtosis library today. Furthermore in the new 0.1.1 develop branch I have added add(double value, uint8_t times). This add(double value, uint8_t times).allows to add a single value multiple times in one call. (need to find time) |
@RobTillaart that's amazing, thanks!! Could you help me put together a simple example of a 1024 bin histogram (I saw you need to declare the bin list in the class, and it can't be set after - so how would I do over 1k bins without looping over b[] to assign a value to each bin?) Once I have something basic in front of me, I can scale it up for the double or triple buffers etc |
@david-res This morning I did a second test - code below so you can confirm - in which I filled an array with random values. Then I filled a 2nd kurtosis object with add(value, times) from the histogram. The metrics failed to be calculated. Reason is that when you start with e.g. 10x value 4 the average is 4 but the variance etc is all zero. Apparently the math is such that when adding the variances it stays zero. This seems to be also affecting the other metrics. TODO: check if adding order affects metrics. There might be a (time consuming) trick to handle this, I will dive into it right away //
// FILE: kurtosis_histogram.ino
// AUTHOR: Rob Tillaart
// DATE: 2024-05-21
// PURPOSE: demo determination of skewness and kurtosis of histogram
// URL: https://github.com/RobTillaart/Kurtosis
#include "Kurtosis.h"
Kurtosis K1;
Kurtosis K2;
// simplified histogram
uint16_t hist[100];
void setup()
{
Serial.begin(115200);
Serial.println(__FILE__);
Serial.print("KURTOSIS_LIB_VERSION: ");
Serial.println(KURTOSIS_LIB_VERSION);
Serial.println();
K1.reset();
// fill the histogram with 10000 measurements
for (int i = 0; i < 10000; i++)
{
int x = random(100); // simulate measurement
hist[x]++;
K1.add(x);
}
// print statistics
Serial.println("K1");
Serial.print("COUNT:\t");
Serial.println(K1.count());
Serial.print("MEAN:\t");
Serial.println(K1.mean());
Serial.print("VAR:\t");
Serial.println(K1.variance());
Serial.print("STDDEV:\t");
Serial.println(K1.stddev());
Serial.print("SKEW:\t");
Serial.println(K1.skewness());
Serial.print("KURT:\t");
Serial.println(K1.kurtosis());
Serial.println();
K2.reset();
for (int i = 0; i < 100; i++)
{
K2.add(i, hist[i]);
}
// print statistics
Serial.println("K2");
Serial.print("COUNT:\t");
Serial.println(K2.count());
Serial.print("MEAN:\t");
Serial.println(K2.mean());
Serial.print("VAR:\t");
Serial.println(K2.variance());
Serial.print("STDDEV:\t");
Serial.println(K2.stddev());
Serial.print("SKEW:\t");
Serial.println(K2.skewness());
Serial.print("KURT:\t");
Serial.println(K2.kurtosis());
Serial.println();
}
void loop()
{
}
// -- END OF FILE -- output
only skewness seems to be correct |
Trick failed, 😒, learned (again) that adding distributions is different from adding values. This means that the add(value, times) will be removed from the 0.1.1 API. Sorry I could not solve your question. For me the investigations will stop for now as I am out of ideas and time. |
@RobTillaart so today what I do is I fill a buffer(histogram) with a count increment based on the ADC value as each bin represents an ADC value, something like this: uint32_t histogramBuffer[1024] = {0};
void adc0_isr(void) {
uint16_t measurement = adc->adc0->analogReadContinuous();
// Increment the corresponding histogram bin
histogramBuffer[measurement] += 1;
}
//called after switching histogram buffers
void processData(){
for(int i=0;i>1024;i++){ //Loop over each bin
uint32_t count = histogramBuffer[i]; //get bin count value
for(int j=0;j<count;j++){
K1.add(i);
}
}
Serial.print("KURT:\t");
Serial.println(K1.kurtosis());
} My only concern is the Kurtosis is not aware of bins that have a value of 0 - would that not put off the calculation? |
Use an if statement to filter them If ("bin not empty") k.add(bin); |
The for loop is doing that already. If count eq 0 then the loop won't run. What do you think? |
I see now zero bins are handled correctly. |
Good catch, I missed that for int j. But what about my comment regarding the mean calculation? I think I could increment a variable each time the bin value eq 0. Then call K1.add(0) x variable count. |
Simple example Bins with value 0 should therefor not be added as zero. |
K1.add(0) |
Right right, that makes total sense now. |
@RobTillaart I got the Kurtosis class working, thanks! I have another question - how can I create a pointer array to two Histogram objects? This is so that I can swap between them every 100ms in my existing intervalTimer interrupt. |
Tomorrow I might have some tome |
I ended up creating two histograms object: H1 and H2 I them created a class pointer array: it compiles, but I have not run it yet |
Looks like nice pointers which will work. It is easier to have an array of histograms and adjust the (byte size) index one may write to.. this index is either 0 or 1 and can only have one value, so atomic by design (note it should be byte sized). |
Depending on board swapping two pointers can be made atomic enough. You have one process that writes to a H1 and a process that reads H2. When the reading process is done it clears H2 and changes the pointer if the write process to pint to H2. (Both pointers point to H2 now however the read process is dine). So it is possible if the reader is in control of the swapping. |
//
// FILE: hist_pointer.ino
// AUTHOR: Rob Tillaart
// PURPOSE: demo histogram pointer
// URL: https://github.com/RobTillaart/Histogram
#include "histogram.h"
// boundaries does not need to be equally distributed.
float a[] = {
0, 50, 125, 200, 250, 350, 500
};
float b[] = {
0, 100, 200, 300, 325, 350, 375, 400
};
Histogram histA(7, a);
Histogram histB(8, b);
Histogram *phist;
void setup()
{
Serial.begin(115200);
Serial.println(__FILE__);
Serial.print("HISTOGRAM_LIB_VERSION: ");
Serial.println(HISTOGRAM_LIB_VERSION);
Serial.println();
phist = &histA;
}
void loop()
{
phist->clear();
for (int i = 0; i < 10000; i++)
{
int x = random(600) - 50;
phist->add(x);
}
Serial.print(phist->count());
for (uint16_t i = 0; i < phist->size(); i++)
{
Serial.print("\t");
Serial.print(phist->frequency(i), 2);
}
Serial.println();
// swap
if (phist == &histA) phist = &histB;
else phist = &histA;
}
// -- END OF FILE -- compiles and runs. Sample output shows the different length of the 2 histograms.
will update the library with the example soon. |
Created develop branch with pointer and array example + reference to new kurtosis library. Think the original issue of kurtosis math is solved now. |
- add examples **hist_pointer.ino** and **hist_array.ino** - issue #10 spin off library https://github.com/RobTillaart/Kurtosis - minor edits
@david-res |
Note to self: (thinking out loud modus) So we do not add C times V but V +-delta, which averages to V but could represent the original values better. As said before, no statistician, however the idea would recreate the possible spread of the original values. |
Cheers @RobTillaart! |
Please open a new issue if it is a histogram issue. FYI added a standalone kurtosis and skewness function to https://github.com/RobTillaart/kurtosis develop branch. |
How would you suggest that I calculate the Kurtosis of the histogram?
Ideally I would have at least 1024 bin (10 bit adc resolution)
Would you be able to provide such an example?
The text was updated successfully, but these errors were encountered: