Non cubic #289
Conversation
…or non-cubic boxes
@BradGreig this is great. I've looked through the code and it all seems to check out. I think it's a very nice way of handling the feature. I would like to write a test for it, though I'm not sure exactly what the best test to write is. I mean, we can run a cubic box and another with NON_CUBIC_FACTOR set, and compare them.

In terms of whether this is useful, I actually think it might be for one-point statistics. @dprelogo and I and a student have recently realized that a database of lightcones has biased global signals, most likely due to periodicity in the lightcones arising from the coeval box size. Now, of course, we could just make the whole box bigger, but I think elongating along the LOS should help significantly without increasing the cost by N^3. Of course, the power spectrum still only makes sense on scales smaller than the smallest dimension, but I think other statistics should be reasonably OK.
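For concreteness, a minimal sketch of that comparison (the redshift, box size, and NON_CUBIC_FACTOR value are illustrative, not from the thread):

```python
import py21cmfast as p21c

# Cubic reference box.
cubic = p21c.run_coeval(
    redshift=8.0,
    user_params={"HII_DIM": 100, "BOX_LEN": 150.0},
    random_seed=1,
)

# Same transverse size, but elongated along the line of sight by the new
# NON_CUBIC_FACTOR -- the cost grows ~linearly with the factor, not as N^3.
elongated = p21c.run_coeval(
    redshift=8.0,
    user_params={"HII_DIM": 100, "BOX_LEN": 150.0, "NON_CUBIC_FACTOR": 2},
    random_seed=1,
)
```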
Codecov Report

Additional details and impacted files:

```diff
@@ Coverage Diff @@
## master #289 +/- ##
==========================================
+ Coverage 86.49% 86.60% +0.10%
==========================================
Files 12 12
Lines 2807 2814 +7
==========================================
+ Hits 2428 2437 +9
+ Misses 379 377 -2
```
☔ View full report in Codecov by Sentry.
Hey @steven-murray, with respect to the statistics, I'm not convinced the one-point statistics would be correct. Well, they should be mostly correct, but they may not get the outliers/tails of the distribution right, primarily because the large-scale modes will be absent. That should probably be tested, though.

As for the tests, I suspect the format should be the same as most tests for astrophysical features: set default features and check that the power spectrum is correct. I do not think we should be checking against results from a cubic box. I think we should do more rigorous checking that it is sensible to make such an approximation; once we deem it sufficient, then it is a feature that is always allowed, and thus gets a standard test like any specific feature. This will also require double-checking the alternative scenario (i.e. larger transverse dimensions than line-of-sight), which should also be allowed. Obviously for the lightcone this would be bad, but it could be helpful for coeval-box studies.
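A rough sketch of what such a feature test might look like (the reference-data file and tolerance are placeholders, and `get_power` from the `powerbox` package is just one way to compute the spectrum -- this is not the repository's actual test):

```python
import numpy as np
import py21cmfast as p21c
from powerbox import get_power


def test_non_cubic_power_spectrum():
    # Run with default astro parameters plus the new NON_CUBIC_FACTOR.
    coeval = p21c.run_coeval(
        redshift=8.0,
        user_params={"HII_DIM": 50, "BOX_LEN": 75.0, "NON_CUBIC_FACTOR": 2},
        random_seed=12,
    )
    # The LOS axis is longer, so pass per-axis box lengths.
    power, k = get_power(coeval.brightness_temp, boxlength=(75.0, 75.0, 150.0))
    # Compare against stored reference values (hypothetical file).
    reference = np.load("tests/data/non_cubic_power.npy")
    np.testing.assert_allclose(power, reference, rtol=1e-4)
```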
Hey @steven-murray, @BradGreig. My name is Haina, and I tested this code under the guidance of Steven. We used non-cubic-factor coefficients to produce more continuous lightcone images, but we wanted to know whether the average power spectrum and correlation coefficients would have large variations and errors, statistically, for the same number of cells and resolution in a given dimension, so we made 50 samples in a limited amount of time for statistical purposes.

I don't know much about the background, so I'm paraphrasing Steven's answer: "We wanted to check, for power spectra obtained in different 8 MHz chunks of the lightcone, firstly whether using the non-cubic factor leads to biases in the estimated power, and secondly whether it helps reduce the artificial correlation between the power spectra taken in different chunks. The latter is a good reason to consider using these non-cubic factors because, for example, in HERA we are now doing 21cmMC inference with data at z=8 and z=10, and there is the possibility that the simulations have an artificial correlation between the power spectra in these two redshift chunks due to the repeating boxes. The key point from our tests, discussed below, is that using the non-cubic factor (greater than unity) does not bias the estimated (mean) power spectrum in any given chunk, but it does bring the correlations between chunks closer to the 'truth' (defined as that obtained with the largest cubic box)." (Steven)

In our tests, we made 50 realizations each for six (HII_DIM, NON_CUBIC_FACTOR) combinations. Since 100, 200, and 400 were selected as HII_DIM, we compared the mean power spectrum and correlation coefficients of the first four chunks.

The following is the code we used to generate the boxes:

```python
import argparse
import os
import time
from pathlib import Path

import py21cmfast as p21c

init_time = time.time()
print(f"21cmFAST version is {p21c.__version__}")
# Parse the run configuration from the command line.
parser = argparse.ArgumentParser(description="Generate one non-cubic lightcone.")
parser.add_argument("random_seed", type=int)
parser.add_argument("HII_DIM", type=int)
parser.add_argument("NON_CUBIC_FACTOR", type=float)
args = parser.parse_args()

here = os.path.dirname(os.path.abspath(__file__))
seed = args.random_seed
HII_DIM = args.HII_DIM
NON_CUBIC_FACTOR = args.NON_CUBIC_FACTOR

# One output folder per (seed, HII_DIM, NON_CUBIC_FACTOR) combination.
folder = "SEED_%s_DIM_%s_NCF_%s" % (seed, HII_DIM, NON_CUBIC_FACTOR)
outdir = os.path.join(here, "no_cubic_cache", str(seed), folder)
Path(outdir).mkdir(parents=True, exist_ok=True)
lightcone = p21c.run_lightcone(
    redshift=6.0,
    max_redshift=20.0,
    user_params={
        "HII_DIM": HII_DIM,
        "BOX_LEN": HII_DIM * 1.5,
        "DIM": 2 * HII_DIM,
        "NON_CUBIC_FACTOR": NON_CUBIC_FACTOR,
        "N_THREADS": 4,
        "FAST_FCOLL_TABLE": True,
    },
    flag_options={"USE_TS_FLUCT": True},
    lightcone_quantities=("brightness_temp", "density", "xH_box"),
    global_quantities=("brightness_temp", "density", "xH_box"),
    direc=outdir,
    random_seed=seed,
    write=False,
)
filename="lightcone_%s.h5" %folder
lightcone.save(here+"/no_cubic_cache/%s/%s/%s" %(seed,folder,filename))
final_time=time.time()
with open(here+"/no_cubic_cache/%s/%s/" %(seed,folder)+filename.replace(".h5",".txt"),'w') as f:
f.write("time taken: %s" %(final_time-init_time) ) |
Hi @HainaHuang729 and @steven-murray, can I please ask for more clarification on what is being tested in the above case?

The reason I ask is that in the above test, it appears the 21-cm PS is not being computed in equivalently sized volumes. I think that to truly understand any correlations you will need to have the same resolution, to ensure no other sources of variation are present. Perhaps a 400_1, 400_2 and 400_4 (or maybe it will need to be lowered to a base of 200 for computational reasons). Then, when comparing different chunks, the differences will be purely from different volume sizes.
Hi @BradGreig. @HainaHuang729 is in the process of updating the above comment to provide some more context and explanation. I can say that in all cases the resolution is the same, but the box size varies (either as a full cube or just along the lightcone dimension).
@BradGreig, so @HainaHuang729 has shown that using …
Hey @steven-murray, when you get to looking at this again (given you have the setup on hand), it will probably be worth pulling in the master branch to utilise the updated density field. It should be a relatively straightforward merge. If not, I can look at doing it for you.
Thanks @BradGreig, yup, I think I can handle that.
Hey @steven-murray, where does this PR sit in your current priority list? Honestly, I can't recall any of the details, so I don't know the state of this or what the next steps are...
Hey @BradGreig, I'll try to get to it today. It would be good to get a few of these old PRs in, to have a clean slate as it were.
Hi @BradGreig, this is now passing all tests, and I think it is ready to be merged. Let me try to summarize what we found:
With all this, I'm happy to say I think this "works".
Hi @steven-murray, thanks for the provided updates and checks (and also @HainaHuang729 for the work). Before signing off on this I want to take another detailed look at the code to see if I have missed anything. For example, I do note there are a couple of TODO's marked for myself (although on a quick glance they seem straightforward to implement).

Going over the updates and the previous checks, one thing that remains unclear to me is what is considered the "truth" for the comparisons of different choices of HII_DIM and NON_CUBIC_FACTOR. Does that make sense?

Hopefully I can get to this in the next day or so.
Thanks @BradGreig -- yes, I noticed those TODO's as well. Probably best to fix them up before merging, I guess.

In terms of the comparison, I'm fairly sure we used 100^3 subsets of the 400^3 boxes to get the statistics, but I will try to check. I found @HainaHuang729's code and simulation data, so I can have a better look.
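As an aside, a minimal numpy sketch of that carving-up (array contents and names are purely illustrative):

```python
import numpy as np

big = np.random.normal(size=(400, 400, 400))  # stand-in for a 400^3 field
n = 100

# Carve the big box into non-overlapping 100^3 subvolumes.
subboxes = [
    big[i : i + n, j : j + n, k : k + n]
    for i in range(0, 400, n)
    for j in range(0, 400, n)
    for k in range(0, 400, n)
]
assert len(subboxes) == 4 ** 3  # 64 matched-volume samples
```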
…to functions (though not sure why they needed updating...)
Hey @steven-murray, I've gone and updated the code to remove those remaining TODO's. I also fixed up a couple of other minor things that I noticed when double-checking a few other things.

Something that I've only just realised, and that is a bit weird, is that … On Monday I might need to revert to an earlier commit to try and figure out what changed to cause the difference.
@BradGreig that is strange indeed -- probably something we should check out. I am re-running @HainaHuang729's analysis currently, so will report with more on that soonish.
I've re-run a large part of the analysis, and can confirm now that I always use chunks of 100^3.

When I take the mean of the power spectra, I do it over realizations AND perpendicular chunks (if they exist). For example, for the lightcone made from …

When getting the covariance, I take it over ONLY the realizations, then I average the covariance over any perpendicular chunks.

So, here's the mean plot:

Basically, everything is pretty much within one standard deviation of the biggest sim, except for the smallest sims at the highest redshifts.

Here are the correlation maps:
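A minimal numpy sketch of that averaging scheme, assuming the measured spectra are stacked as (realization, perpendicular chunk, LOS chunk, k-bin); all shapes and names here are illustrative:

```python
import numpy as np

ps = np.random.rand(50, 4, 8, 20)  # stand-in for measured power spectra

# Mean: average over realizations AND perpendicular chunks.
mean_ps = ps.mean(axis=(0, 1))  # shape: (n_los_chunks, n_k)

# Chunk-chunk correlation at one k-bin: correlate over realizations ONLY,
# then average the matrices over the perpendicular chunks.
ik = 10
corr = np.mean(
    [np.corrcoef(ps[:, p, :, ik].T) for p in range(ps.shape[1])], axis=0
)  # shape: (n_los_chunks, n_los_chunks)
```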
Hey @steven-murray, I located the issue and fixed it. I've updated the test data (effectively reverting back to the old data) and all tests now appear to be passing. I guess one thing to note about this PR is that it is API-breaking, so I'm not sure how you want to proceed with its merge. I'll finally implement #221 once a decision has been made on this, though.

As for the figures, can I just clarify one thing (because at first I was thinking there was an issue, but I think I figured out what I should be looking at)? For example, on the bottom row we have (DIM, NCF) = (100, 4), (200, 2), (400, 1). Therefore, (400, 1) is effectively the "truth", and you want the (100, 4) and (200, 2) cases to be identical to the (400, 1) case. Is that correct? And presumably the (100, 4) case appears to have more correlations as it has lower-number statistics (as the 200 and 400 boxes are carved up and averaged over). Then in the row above, you effectively reduce NCF (at least left and middle), which increases the frequency of repeating structure, thus demonstrating further how it is behaving.
Maybe someone else should review this, given I was responsible for all the changes? I approved in case we don't care!
Hi @BradGreig, thanks for making the fixes. Yes, your interpretation of the plots is correct... I think it's quite a nice way to illustrate the behavior. We can work on it a bit more for a potential paper as well. I've reviewed the code changes you made in the last four commits, and they all seem reasonable, so I'm going to go ahead and merge.
Actually, before I finally merge... what is the breaking change you're thinking of? To me, we just added an additional option with a default, so all old code should still work?
Hey @steven-murray, yeah, for some reason I was thinking it would break because it wouldn't be able to locate the new variable. But of course, given it just defaults to 1, it shouldn't actually cause any issues.
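In other words (an illustrative snippet, not from the thread), pre-existing call sites that never mention the new parameter keep producing cubic boxes:

```python
import py21cmfast as p21c

# NON_CUBIC_FACTOR is omitted, so it takes its default of 1 (a cubic box)
# and old scripts behave exactly as before.
coeval = p21c.run_coeval(
    redshift=8.0,
    user_params={"HII_DIM": 100, "BOX_LEN": 150.0},
)
```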
This is @BradGreig's branch. Making it a PR because I think it's awesome, and we can discuss details here.
Fixes #275