Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect periodogram with multiple terms #28

Open
mwcraig opened this issue Apr 4, 2016 · 5 comments
Open

Incorrect periodogram with multiple terms #28

mwcraig opened this issue Apr 4, 2016 · 5 comments

Comments

@mwcraig
Copy link

mwcraig commented Apr 4, 2016

The code snippet below leads to a periodogram with incorrrect values (less than zero, greater than one). An image showing the periodogram is a little lower down and the data used in the example is here.

from __future__ import print_function, division

from astropy.table import Table

from gatspy.periodic import LombScargleFast, LombScargle

import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

# use seaborn for plot styles
import seaborn; seaborn.set()

s = Table.read('tres_10c.csv')
t = s['BJD_TDB']
mag = s['Source-Sky_C10']
dmag = s['mag_error']

model = LombScargle(Nterms=5).fit(t, mag, dmag)
periods, power = model.periodogram_auto(nyquist_factor=100, oversampling=5)

gatspy-copy2

@jakevdp
Copy link
Member

jakevdp commented Apr 4, 2016

Thanks - there have been other issues reporting similar problems. I suspect it's due to some sort of floating-point roundoff issue that pops up at the particular frequency. A 5-term periodogram involves the simultaneous solution of 11 linear equations, so it's not out-of-the-question that such an issue would arise.

@jakevdp
Copy link
Member

jakevdp commented Apr 4, 2016

Looking at the data, I suspect it's because you have so many tightly-clustered observations separated by a relatively long time, but I'm not sure what you might do to fix this.

@mwcraig
Copy link
Author

mwcraig commented Apr 7, 2016

As it happens it actually picks out the correct period as one of those found using find_best_periods if it is restricted to right range. Given the sampling, that is remarkable! The real fix is more nights of data (being reduced).

@jakevdp
Copy link
Member

jakevdp commented Apr 7, 2016

Some extra intuition for why this is happening: the periodogram is P = 1 - χ² / χ²ref where χ² is the minimum chi-square about the best multiterm periodic model, and χ²ref is the minimum chi-square around the best constant model (i.e. the weighted mean). Theoretically, you should always have 0 ≤ χ² ≤ χ²ref, because the periodic model has more model complexity and thus should never do worse than a flat line. If numerical instabilities lead to a suboptimal fit, however, you may converge to a model that is worse than the weighted mean, which leads to the situation you're seeing.

@mwcraig
Copy link
Author

mwcraig commented Apr 7, 2016

Thanks for the explanation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants