Complex penalties #131

multimeric · 2021-03-04T15:15:09Z

multimeric
Mar 4, 2021

Moved from an issue

I note that in the paper you discuss different penalties in section 6.1. However, from looking through the library, it seems that ruptures only supports a fixed linear penalty (i.e. Beta). Am I right to assume that it doesn't work with more complex penalties such as AIC?

Further, if I wanted to implement a method where you would normally calculate a p-value for splitting (i.e. a likelihood ratio test following a chi-squared distribution), is the idea that we just have the error() method return the test statistic without testing for significance (i.e. the raw likelihood ratio), and that the penalty constant implies a p-value? I suppose this makes segmentation fast and flexible, but highly dependent on the choice of penalty?

deepcharles · 2021-03-08T08:19:39Z

deepcharles
Mar 8, 2021
Maintainer

Hello Michael,

AIC is also a linear penalty (in the context of change point detection), so you could also use it (you only need to set the beta accordingly). However, you are right to assume that ruptures can only deal with linear penalties. In the case of general penalty formulas, there is no efficient way to find the best segmentation. If you have a specific case in mind that you think might be worth integrating into ruptures, I would be glad to hear it.

As for your second question, the .error(start, end) method only returns the cost on a given sub-signal signal[start:end]. It is not always equal to the likelihood because constant terms were sometimes discarded (because they did not change the segmentation). Nevertheless, the procedure you propose could be implemented in ruptures. Something like:

bkps = algo.predict(n_bkps=2)
# compute p-value for bkps
...
bkps = algo.predict(n_bkps=3)
# compute p-value for bkps
...

If you have a reference that describes such a procedure, I could help more.
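
A rough sketch of the "detect first, test afterwards" idea, in plain Python and under strong assumptions: Gaussian noise with a known standard deviation `sigma`, a change in mean only, and a split location treated as if it had been fixed in advance. The helper names `l2_cost` and `naive_split_pvalue` are hypothetical and not part of ruptures. Because the breakpoints are in fact chosen by the algorithm, the resulting p-value is optimistic and should be read as an illustration rather than a valid test.

```python
import math


def l2_cost(segment):
    """Sum of squared deviations from the segment mean."""
    mean = sum(segment) / len(segment)
    return sum((y - mean) ** 2 for y in segment)


def naive_split_pvalue(signal, start, bkp, end, sigma):
    """Naive p-value for a mean change at `bkp` inside signal[start:end].

    Assumption: Gaussian noise with known standard deviation `sigma`.
    For a split location fixed in advance, the statistic follows a
    chi-squared distribution with 1 degree of freedom under "no change".
    """
    merged = l2_cost(signal[start:end])
    split = l2_cost(signal[start:bkp]) + l2_cost(signal[bkp:end])
    stat = (merged - split) / sigma**2
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Toy signal (made up for illustration), candidate change at index 4.
signal = [0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 1.0, 2.0]
stat, p_value = naive_split_pvalue(signal, start=0, bkp=4, end=8, sigma=1.0)
print(stat, round(p_value, 4))  # 2.0 0.1573
```

In practice `start`, `bkp` and `end` would be taken from consecutive entries of the list returned by `algo.predict(n_bkps=...)`.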

On a side note, we are currently adding examples (which are basically notebooks) to the documentation. If you have an interesting procedure, we would be glad to add it. Just let us know.

Cheers

6 replies

multimeric Mar 24, 2021
Author

Can you elaborate on how the AIC would be implemented? I can see how the breakpoint penalty would factor in, but if we can't properly calculate the likelihood then how is it AIC?

For my second question, are you proposing that we first calculate the breakpoints, and then test their significance? I think this would make sense.

deepcharles Mar 24, 2021
Maintainer

> For my second question, are you proposing that we first calculate the breakpoints, and then test their significance? I think this would make sense.

Yes. See for instance this paper:

Hyun, S., Lin, K. Z., G'Sell, M., & Tibshirani, R. J. (2018). Post‐selection inference for changepoint detection algorithms with application to copy number variation data. Biometrics. (http://www.stat.cmu.edu/~ryantibs/papers/binseginf.pdf)

> Can you elaborate on how the AIC would be implemented?

AIC is defined as follows (lower is better):

$$ AIC = 2k-2\ln \hat{L} $$

where $\hat{L}$ is the maximum likelihood and $k$ is the number of parameters.

You first need to define a log-likelihood, for instance Gaussian, and plug in the maximum likelihood estimators of the mean and/or variance:

$$ \ln \hat{L} = \sum_{t=1}^T \ln f(y_t | \hat{\mu}, \hat{\sigma}) $$

where $f(\cdot | \mu, \sigma)$ is the probability density of $\mathcal{N}(\mu, \sigma^2)$, and $\hat{\mu}$ and $\hat{\sigma}$ are the empirical mean (computed on each segment) and standard deviation.
The number $k$ of parameters is equal to $K + 1 + 1$ (the $K + 1$ means, one per segment, plus the variance), where $K$ is the number of change points.

Note that since we only want the argmin of the AIC, additive constants that do not depend on the segmentation can be removed. In the end, minimizing the AIC amounts to running PELT with a specific penalty value, as with the BIC criterion (see this issue).

multimeric Mar 24, 2021
Author

Yes, I understand the definition of AIC and how it uses the maximum likelihood. What I'm asking is how this can be implemented in ruptures, because the only way to provide a penalty is seemingly as a scalar.

deepcharles Mar 24, 2021
Maintainer

In the univariate Gaussian case, AIC (after simplifications) amounts to minimizing $\sum_k c(y_{t_k..t_{k+1}}) + \beta K$ where $K$ is the number of changes, $\beta = 2\sigma^2$ and $c(y_{a..b})$ is the L2 cost on the sub-signal between indexes $a$ and $b$.
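
For reference, a sketch of the simplification, assuming that $\sigma$ is treated as known (fixed at its estimate rather than re-estimated for each segmentation) and that the mean is fitted separately on each of the $K + 1$ segments. With $T$ the number of samples, the Gaussian log-likelihood gives

$$ -2\ln \hat{L} = \frac{1}{\sigma^2} \sum_k c(y_{t_k..t_{k+1}}) + T \ln(2\pi\sigma^2) $$

so that, with $k = K + 2$ parameters,

$$ AIC = 2(K + 2) + \frac{1}{\sigma^2} \sum_k c(y_{t_k..t_{k+1}}) + T \ln(2\pi\sigma^2) $$

Multiplying by $\sigma^2$ and dropping the terms that do not depend on the segmentation leaves

$$ \sum_k c(y_{t_k..t_{k+1}}) + 2\sigma^2 K $$

which is the penalized criterion above with $\beta = 2\sigma^2$.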

So you could use PELT as follows:

import ruptures as rpt

# assume you have a univariate signal in the variable `signal`, with estimated noise standard deviation `sigma`

aic_pen = 2*sigma*sigma  # AIC penalty: beta = 2 * sigma^2

algo = rpt.Pelt(model="l2", min_size=2, jump=1).fit(signal)
my_bkps = algo.predict(pen=aic_pen)

print(my_bkps)
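
To make the criterion concrete, here is a minimal pure-Python sketch (no ruptures involved) that evaluates the penalized L2 cost and finds its minimizer by exhaustive search on a tiny toy signal. The helper names, the toy data and the value of `sigma` are assumptions made for illustration. The search is exponential in the signal length, so it only shows what is being minimized and is not how PELT works internally. Unlike the list returned by ruptures, `bkps` here holds the change indexes only, without the final index.

```python
from itertools import combinations


def l2_cost(segment):
    """Sum of squared deviations from the segment mean (the L2 cost)."""
    mean = sum(segment) / len(segment)
    return sum((y - mean) ** 2 for y in segment)


def penalized_cost(signal, bkps, beta):
    """Sum of L2 costs over the segments plus beta times the number of changes."""
    edges = [0, *bkps, len(signal)]
    fit = sum(l2_cost(signal[a:b]) for a, b in zip(edges[:-1], edges[1:]))
    return fit + beta * len(bkps)


def best_segmentation(signal, beta, min_size=2):
    """Exhaustive search over all segmentations (only viable for tiny signals)."""
    n = len(signal)
    best = ([], penalized_cost(signal, [], beta))
    for k in range(1, n // min_size):
        for bkps in combinations(range(min_size, n - min_size + 1), k):
            edges = [0, *bkps, n]
            # Skip segmentations with a segment shorter than min_size.
            if any(b - a < min_size for a, b in zip(edges[:-1], edges[1:])):
                continue
            cost = penalized_cost(signal, list(bkps), beta)
            if cost < best[1]:
                best = (list(bkps), cost)
    return best


signal = [0.1, -0.2, 0.0, 0.1, 5.0, 5.2, 4.9, 5.1]  # toy data, one mean shift
sigma = 0.15  # assumed noise standard deviation
bkps, cost = best_segmentation(signal, beta=2 * sigma**2)
print(bkps, round(cost, 4))  # [4] 0.155
```

Splitting either half further would lower the fit term by only 0.01, which is less than the penalty of 0.045 per extra change, so the single change at index 4 is retained.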

Hope this helps

multimeric Mar 24, 2021
Author

Thank you, that answers my question.
