Adding a check to verify autos are real-only #1110
Codecov Report
@@ Coverage Diff @@
## main #1110 +/- ##
=======================================
Coverage 99.83% 99.83%
=======================================
Files 31 31
Lines 15339 15370 +31
=======================================
+ Hits 15313 15344 +31
Misses 26 26
Continue to review full report at Codecov.
I like the approach here and it's very timely! The biggest question I have is do we want to default to making the autos be real? I worry that non-real autos is a big enough problem that we should maybe default to erroring rather than warning. |
So don't have particularly strong feelings about warn vs error, especially given that the remedy (calling However, given that the old |
@jsdillon This PR is adding a check for non-real autos, which is an issue that comes up from time to time and generally means there's a bug in upstream code (sometimes including the correlator). This new check seems to be breaking some tests on hera_cal, presumably because you have some old data files with these issues. It is possible those problems originated with pyuvdata: we recently realized that we sometimes had numerical precision issues in the old phasing code that caused the autos to have very small (< one part in 10^10) imaginary parts. If that's the source, I apologize! In this new checking code, there is an option to just set the imaginary parts of the autos to zero, which you might want to use, depending on the source of the problems in your files.
This is going to be tricky to fix in In the meantime, this PR needs a pyuvdata/pyuvdata/uvdata/uvh5.py Line 1471 in 403956f
Ok, I've done some digging and opened a draft PR which fixes 9 of the 12 test failures when running under this branch: HERA-Team/hera_cal#749 The rest are in |
Thanks @jsdillon! Sorry to make work for you. |
Apologies for the delay, but I think we have a PR that fixes the hera_cal issue. Does anyone here want to review it? |
@jsdillon -- FWIW, I reran the above hera_cal checks post-merge of the aforementioned PR, and it seems like there are three tests that are still failing. On first glance though, it seems like the failing tests themselves are actually plugging random complex data into |
Some thoughts on this issue: While valid auto-correlations are always real, they are represented on the computer with complex numbers. Radio astronomers routinely perform completely valid complex arithmetic on auto-correlations which, in every case, introduces small non-real components from numerical precision errors. Examples include applying calibration solutions, filling in flagged channels, converting from Jy to K sr, etc. By requiring that the imaginary part of the auto-correlations be exactly zero, pyuvdata requires that any external pipeline performing arithmetic on visibilities manually set the imaginary part of the auto-correlations to zero after each operation, even when this is not actually necessary. To avoid lots of downstream manual zeroing of the autos' imaginary parts, would it make sense to define the real-autos requirement as the imaginary part of the autos being below some fraction of the real part, rather than exactly equal to zero?
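To make the proposal concrete, here is a minimal NumPy sketch of a relative-tolerance test on the autos. The function name `autos_are_real` and the default `rel_tol=1e-10` are assumptions chosen for illustration (the value simply echoes the "one part in 10^10" figure mentioned earlier in the thread); this is not pyuvdata's implementation.

```python
import numpy as np


def autos_are_real(auto_data, rel_tol=1e-10):
    """Return True if every |imag| is at most rel_tol * |real|.

    Hypothetical helper: both the name and the default tolerance are
    placeholders, not part of pyuvdata.
    """
    auto_data = np.asarray(auto_data)
    return bool(
        np.all(np.abs(auto_data.imag) <= rel_tol * np.abs(auto_data.real))
    )


autos = np.array([1.0 + 0.0j, 250.0 + 1e-12j, 3.5 - 2e-13j])
print(autos_are_real(autos))                    # rounding-level residue passes
print(autos_are_real(autos, rel_tol=0.0))       # exact-zero requirement fails
print(autos_are_real(np.array([1.0 + 0.5j])))   # a genuinely complex "auto" fails
```

Setting `rel_tol=0.0` recovers the strict behaviour, so a single parameter could in principle cover both the exact and the tolerant variants.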
I think that's probably a reasonable approach. I do think we need a check on this as this has been used to identify bugs in both correlators and downstream processing on multiple occasions now. The question then becomes what should the tolerance be? |
Maybe we can figure out what that should be through Monte Carlo simulations of some arithmetic operations?
If you want to do that it'd be interesting input. I think it should also be somewhat physically motivated in terms of how much phase angle it is. In some other parts of pyuvdata we use 1 milliarcsecond (~4.85e-09 radian), but those are usually for physical angles not phase... |
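As a quick check on the number quoted above, the conversion from one milliarcsecond to radians can be written out directly. The last line also shows that, for an angle this small, a phase of φ corresponds to an imaginary-to-real ratio of tan φ ≈ φ. The variable names are illustrative only.

```python
import math

# 1 milliarcsecond = 1/3600 degree / 1000, converted to radians.
MAS_IN_RAD = math.radians(1.0 / 3600.0 / 1000.0)
print(f"{MAS_IN_RAD:.3e}")  # 4.848e-09

# For a visibility with phase phi, imag/real = tan(phi), which is
# indistinguishable from phi itself at this scale.
ratio = math.tan(MAS_IN_RAD)
print(math.isclose(ratio, MAS_IN_RAD, rel_tol=1e-12))  # True
```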
The other option is to just pick a value that we know will never be scientifically relevant. |
I guess the problem with the phase argument is that it is related to the absolute value of the visibility, which we don't want here. A value that won't be scientifically relevant was where I was leading with the "physically motivated" comment before. The problem is that pyuvdata is used over a wide range of instruments. Errors that are not scientifically relevant for HERA often are for the SMA. If you want to do a little Monte Carlo for the kinds of operations you're worried about, I think that would be helpful input.
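A small Monte Carlo of the kind suggested here might look like the sketch below. It assumes a single, simplified operation (applying a random complex gain and its conjugate to real-valued autos in two separate multiplications) and double-precision data; the sample size, value ranges and seed are arbitrary choices, and a real study would need to cover the other operations and precisions mentioned in the thread.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 100_000

# Real-valued "autos" stored as complex numbers, plus random complex gains.
autos = rng.uniform(1.0, 1e3, n).astype(complex)
gains = rng.normal(size=n) + 1j * rng.normal(size=n)

# Apply the gain and its conjugate one factor at a time, as a pipeline might.
calibrated = (gains * autos) * np.conj(gains)

# Size of the spurious imaginary part relative to the real part.
ratio = np.abs(calibrated.imag) / np.abs(calibrated.real)
print(f"max |imag|/|real| = {ratio.max():.2e}")
```

For this one operation the residual should stay near double-precision rounding level, many orders of magnitude below the tolerances discussed above; single-precision data or longer chains of operations would need their own runs.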
Hi @aewallwi -- the choice of checking only that the imaginary component was zero was based on three observations/concerns:
The only dog I have in this fight is the first item above - I'd just like it to be fast. But I'm happy to make the code a bit more tolerant, if that's what folks desire. Though FWIW, I don't think the tolerance issue is what's causing the current fails on the hera_cal pipeline. |
Just another example of non-zero imaginary components of auto-correlations... xGPU computes(-ed?) auto correlations the same as cross correlations because it is (was?) faster to do them the same than to have a separate real-only computation. When configured to use floating point math, the GPUs used in the early days of xGPU had a fused-multiply-add (FMA) instruction but not a corresponding fused-multiply-subtract operation, so some terms of the imaginary component of auto correlations that cancel out in theory did not cancel out completely in practice due to the difference in precision of the fused-multiply-add vs the non-fused-multiply-subtract. Not sure if that's still the case with today's GPUs (and most folks use xGPU's integer math now which is not afflicted this way), but thought I'd add it here for historical context/posterity. |
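The effect described above can be reproduced in pure Python by emulating a fused multiply-add with exact rational arithmetic. This is only an illustration of the rounding mechanism: it uses double precision, a single sample, and one assumed ordering of the two terms, and it is not a reconstruction of xGPU's actual kernel. The helper names are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(x, y, acc):
    """Emulate FMA: form x*y + acc exactly, then round once to a float."""
    return float(Fraction(x) * Fraction(y) + Fraction(acc))


def multiply_then_subtract(x, y, acc):
    """Non-fused: the product x*y is rounded before the subtraction."""
    return acc - x * y


v_re, v_im = 0.1, 0.3  # real and imaginary parts of one voltage sample

# The imaginary part of v * conj(v) is v_im*v_re - v_re*v_im: zero in theory.
acc = multiply_then_subtract(v_re, v_im, 0.0)  # subtracts the rounded product
acc = fused_multiply_add(v_im, v_re, acc)      # adds the exact product back
print(acc)           # tiny but nonzero: the rounding error of v_re*v_im

both_unfused = v_im * v_re - v_re * v_im
print(both_unfused)  # exactly 0.0 when both products are rounded the same way
```

The residual is the rounding error of a single product, which is why it is tiny relative to the real part yet still fails an exact-zero test.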
I agree that we should prioritize speed. I think the |
This looks good and I think the discussion has converged.
A PR a day keeps the doctor away...
Description
- A new `UVData` method called `_fix_autos`, which will force the auto-correlation data in `data_array` to be real-only.
- A new check in `UVData.check`, which will see whether or not auto-correlation data have non-zero imaginary components. If they are detected, depending on the arguments supplied, `check` can either raise an error or attempt to fix the auto-correlations (via a call to `_fix_autos`).
Motivation and Context
As originally motivated in #1024, and demonstrated in #1102, checking that the autos are real-only is a good way to help verify the health of any given `UVData` object, to make sure that something weird hasn't happened in the handling of the data set. However, since this check potentially inspects lots of values in `data_array`, it can be somewhat computationally expensive to run, which seems somewhat antithetical to how `UVData.check` should work -- quoting @bhazelton here from a comment in #1024:
To that end, I've set up this new functionality with the following defaults:
- `check` will raise a warning, and will subsequently fix the issue by making those values real-only (using `np.abs` on the autocorrelation data).
- `check` will raise an error.
I think the above is a reasonable balance between ease-of-use, speed, and ensuring the integrity of `UVData` objects. FWIW, the addition of `_fix_autos` was motivated in part by the fact that some of the test files do have non-zero imaginary components in their auto-correlation data -- something that I believe is a hangover from the old `phase` method (something I noticed while working on #979).
Closes #1024
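The warn-and-fix versus raise behaviour described above can be sketched on plain NumPy arrays as follows. This is a hypothetical stand-in, not the code in this PR: the function name `check_autos`, the `fix_autos` flag, and the two-dimensional `(baseline-time, frequency)` data layout are all assumptions, and the sketch treats every row with matching antennas as an auto, ignoring the distinction between auto- and cross-polarizations.

```python
import warnings

import numpy as np


def check_autos(data, ant_1, ant_2, fix_autos=True):
    """Hypothetical sketch of the auto-correlation check described above."""
    is_auto = ant_1 == ant_2  # rows where both antennas are the same
    if not np.any(data[is_auto].imag != 0):
        return data  # autos are already real-only
    if not fix_autos:
        raise ValueError("Auto-correlations have non-zero imaginary parts.")
    warnings.warn("Fixing auto-correlations to be real-only.")
    fixed = data.copy()
    # np.abs returns the magnitude, which is stored with zero imaginary part.
    fixed[is_auto] = np.abs(fixed[is_auto])
    return fixed


ant_1 = np.array([0, 0, 1])
ant_2 = np.array([0, 1, 1])
data = np.array([[1.0 + 1e-12j], [2.0 + 3.0j], [4.0 + 0.0j]])

fixed = check_autos(data, ant_1, ant_2)  # warns, then returns a fixed copy
print(fixed[:, 0])
```

The boolean mask is what keeps the cross-correlation row (antennas 0 and 1) untouched; only the rows flagged as autos are rewritten.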
Types of changes
Checklist:
New feature checklist: