Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in PyCBC reproducible noise: Discontinuities present over frame boundaries #4772

Open
spxiwh opened this issue Jun 5, 2024 · 9 comments

Comments

@spxiwh
Copy link
Contributor

spxiwh commented Jun 5, 2024

Details from @ArthurTolley:

We are trying to use pycbc_condition_strain to make a long stretch of simulated data with short frame files (to test PyCBC Live like operation). This saw a lot of glitches, Arthur produced the following minimal example:

fake_strain=aLIGODesignSensitivityT1800044
ifo=H1
duration=32
out_path="./${ifo}-SIMULATED_STRAIN-{start}-{duration}.gwf"

for chunk in {1..2}; do
    gps_start_time=$((1200000000+(${chunk}-1)*64))
    gps_end_time=$((1200000000+${chunk}*64))

    pycbc_condition_strain \
        --fake-strain ${fake_strain} \
        --fake-strain-seed 1234 \
        --output-strain-file ${out_path} \
        --gps-start-time ${gps_start_time} \
        --gps-end-time ${gps_end_time} \
        --sample-rate 2048 \
        --frame-duration ${duration} \
        --channel-name $ifo:SIMULATED_STRAIN \

done

which produces two data chunks with a discontinuous boundary (Q-scan over boundary):

image

I can't see anything wrong here. Arthur says if the PSD is changed to aLIGOMidLowSensitivityP1200087 the data is good, but we shouldn't have "bad" and "good" PSDs in this context.

@spxiwh
Copy link
Contributor Author

spxiwh commented Jun 5, 2024

Plotting code to make omega scan:

import matplotlib.pyplot as plt
import h5py
import numpy
from pycbc.frame import read_frame
import glob
from pycbc.filter import resample_to_delta_t

# Change data dir, change figure save location

# Find the data of the time you want to plot
H1_data_dir = './'

time = int(1200000064)
previous_frame = time - 32
next_frame = time + 32

# Read in the frames at the boundary
h1_frames = [H1_data_dir + f'H1-SIMULATED_STRAIN-{previous_frame}-32.gwf',
             H1_data_dir + f'H1-SIMULATED_STRAIN-{time}-32.gwf',
             H1_data_dir + f'H1-SIMULATED_STRAIN-{next_frame}-32.gwf']

print(f" Reading in frames")
h1_data = read_frame(h1_frames, 'H1:SIMULATED_STRAIN', start_time=time-16, end_time = time+16)
# h1_data = resample_to_delta_t(h1_data, 1/512)

fig, ax = plt.subplots(figsize=(12,10))
ax.set_rasterized(True)

# Plot the data
print(" Whitening and qtransforming data")
t, f, p = h1_data.whiten(4, 4).qtransform(.001,
                                        logfsteps=100,
                                        qrange = [40.0, 45.0],
                                        frange = [20, 100],)
print(" Plotting qscan underneath")
ax.pcolormesh(t, f, p**0.5, vmin=1.0, vmax=5.0, shading = 'auto')
ax.set_yscale('log')

ymin, ymax = 20, 100
xmin, xmax = ax.get_xlim()
ax.set_ylim(ymin, ymax)
ax.set_xlim(xmin, xmax)

ax.set_ylabel('Frequency [Hz]')
ax.set_xlabel('Time [s]')
plt.show()
plt.savefig(f'{time}.png')
plt.close()
print(" Done")

@ahnitz
Copy link
Member

ahnitz commented Jun 5, 2024

@spxiwh The hint from @ArthurTolley I think explains the issue potentially. Try the following and I think it will look "ok".

fake_strain=aLIGODesignSensitivityT1800044
ifo=H1
duration=32
out_path="./${ifo}-SIMULATED_STRAIN-{start}-{duration}.gwf"

for chunk in {1..2}; do
    gps_start_time=$((1200000000+(${chunk}-1)*64))
    gps_end_time=$((1200000000+${chunk}*64))

    pycbc_condition_strain \
        --fake-strain ${fake_strain} \
        --fake-strain-seed 1234 \
        --fake-strain-flow  10 \
        --output-strain-file ${out_path} \
        --gps-start-time ${gps_start_time} \
        --gps-end-time ${gps_end_time} \
        --sample-rate 2048 \
        --frame-duration ${duration} \
        --channel-name $ifo:SIMULATED_STRAIN

done

I suspect that the PSD you are using goes to somewhat interesting behavior (e.g. which result in numerical instability near DC) if you allow it to go to zero frequency, so you need the option to set the flow for the noise generation.

@ahnitz
Copy link
Member

ahnitz commented Jun 5, 2024

Figure_1

@ArthurTolley
Copy link
Contributor

Thanks @ahnitz, annoyingly I used --low-frequency-cutoff 10 which we initially thought was the cause of the problem but this argument seems to do nothing for our use case of pycbc_condition_strain. If I used the correct argument to begin with I wouldn't have seen this issue at all!

@titodalcanton
Copy link
Contributor

Phew, the documentation page for condition_strain does mention --fake-strain-flow 10 (though it does not explain why it is important, nor what is really happening behind the scenes).

@ArthurTolley
Copy link
Contributor

Serves me right for using an old example (from pycbc/examples/live/run.sh) instead of going directly to the documentation 😅

@GarethCabournDavies
Copy link
Contributor

Unless there are objections, I will re-name the issue to be that the low-frequency-cutoff options in pycbc_condition_strain need explaining better in the documentation

@titodalcanton
Copy link
Contributor

We should probably add --fake-strain-flow to the PyCBC Live example as well.

@spxiwh
Copy link
Contributor Author

spxiwh commented Jun 11, 2024

I still believe this is a bug, and not a documentation problem. It may be an unavoidable and understandable bug, but it is dangerous, and we probably need a sanity check in the reproducible noise module to check for things like the PSD covering too large a dynamic range and/or certain values going out of a valid range.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants