Non cubic #289

steven-murray · 2022-06-02T22:10:03Z

This is @BradGreig's branch. Making it a PR because I think it's awesome, and we can discuss details here.

Fixes #275

…boxes

…or non-cubic boxes

steven-murray · 2022-06-02T22:17:52Z

@BradGreig this is great. I've looked through the code and it all seems to check out. I think it's a very nice way of handling the feature.

I would like to write a test for it... not sure exactly the best test to write. I mean, we can run a box and another with NON_CUBIC_FACTOR=2, then compare power spectra, or mean properties. But we shouldn't expect the exact same thing, so statistically testing it might be difficult. I presume using the same seed won't help. We could also test the power spectrum of one half of the elongated box versus the other half (and the other cubic box). That might be the only reasonable way to do it.

In terms of whether this is useful, I actually think it might be for one-point statistics. @dprelogo and I and a student have recently realized that a database of lightcones has biased global signals most likely due to periodicity in the lightcones due to coeval box size. Now, of course, we could just make the whole box bigger, but I think elongating along the LOS should help significantly without increasing the cost by N^3. Of course the power spectrum still only makes sense on scales smaller than the smallest dimension. But I think other statistics should be reasonably ok.

codecov · 2022-06-02T22:56:11Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.10 🎉

Comparison is base (0ce7c92) 86.49% compared to head (8f054ac) 86.60%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #289      +/-   ##
==========================================
+ Coverage   86.49%   86.60%   +0.10%     
==========================================
  Files          12       12              
  Lines        2807     2814       +7     
==========================================
+ Hits         2428     2437       +9     
+ Misses        379      377       -2

Impacted Files	Coverage Δ
src/py21cmfast/inputs.py	`91.62% <100.00%> (+1.31%)`	⬆️
src/py21cmfast/outputs.py	`92.12% <100.00%> (ø)`

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

BradGreig · 2022-06-06T02:39:49Z

Hey @steven-murray, with respect to the statistics, I'm not convinced the one-point statistics would be correct. Well, they should be mostly correct, but potentially it may not get the outliers/tails of the distribution correct. Primarily because the large-scale modes will be absent. Probably should be tested though.

As for the tests, I suspect its format should be the same as most tests for astrophysical features. That is, set default features and check that the power spectrum is correct. I do not think we should be checking against results from a cubic box. I think we should do more rigorous checking that it is sensible to make such an approximation. Once we deem it is sufficient, then it is a feature that is always allowed, thus a standard test of a specific feature.

This will also require double checking for the alternative scenario (i.e. larger transverse dimensions than line-of-sight) which should also be allowed. Obviously for the light-cone this will be bad, but could be helpful for co-eval box studies.

HainaHuang729 · 2022-07-31T22:20:58Z

Hey @steven-murray , @BradGreig My name is haina, and I tested this code under the guidance of steven. We used non-cubic-factor coefficients to produce more continuous light cone images, but we wanted to know whether the average power spectrum and correlation coefficients would have large variations and errors for the same number of cells and resolution in a given dimension statistically, so we made 50 samples in a limited amount of time for statistical purposes.

I don't know much about the background so I'm paraphrasing steven's answer "We wanted to check whether, for power spectra obtained in different 8MHz chunks of the lightcone, firstly whether using the non-cubic factor leads to biases in the estimated power, and second whether it helps reduce the artificial correlation between the power spectra taken in different chunks. The latter is a good reason to consider using these non-cubic factors, because for example in HERA now we are doing 21cmMC inference with data at z=8 and z=10, and there is the possibility that the simulations have an artificial correlation between the power spectra in these two redshift chunks due to the repeating boxes. The key point from our tests, discussed below, is that using the non-cubic factor (greater than unity) does not bias the estimated (mean) power spectrum in any given chunk, but it does bring the correlations between chunks closer to the 'truth' (defined as that obtained with the largest cubic box)."(steven)

In our tests, we made 50 live cases each for six (HIIDIM_NCF: 100_1, 100_2, 100_4, 200_1, 200_2 and 400_0.5) different simulations configurations: for each simulation, we used a unit size of 1.5 Mpc, but we used a different overall box size and shape. Our "real" box was a cube box with HII_DIM=400^3. We are able to compare 100_4, and 200_2 with the ''truth'' (400_1) because the former has the same dimensions along the line of sight (but not in the horizontal direction). Similarly, we can compare 100_2, 200_1, and 400_0.5. Also, we can compare the differences between boxes of the same cube at different sizes like 100_1, 200_1 and 400_1

Because BOX_LEN is changed together with HII_DIM, the resolution of the box does not change. The only thing that changes is the size of the box itself. The number of chunks generated by cubic of different lengths is different, the number of chunks of 100_1 is used as the sampling number for all cases.

Since 100, 200, and 400 were selected as HIIDIM, we compared the mean power spectrum and correlation coefficients of the first four chunks.eight_mhz = 150.0 ncells_in_eight_mhz = int(eight_mhz/ (lc.user_params.BOX_LEN / lc.user_params.HII_DIM)) From the following code, we can see that BOX_LEN / HII_DIM = 1.5. So the number of cells per chunk is 100, so for HIIDIM equal to 400，4 chunks in the same cubic and for 100 is its same cubic iterated four times, so we compare the mean power spectrum between the first chunk and the next three consecutive chunks and the correlation coefficient to the first chunk. The first two sets of data show a clear three-cluster classification. The correlation coefficients can be divided into three groups, 100_1, 100_2,200_1,400_0.5 and 100_4,200_2,400_1

The following code is the code we use to generate the boxes

from fileinput import filename
import py21cmfast as p21c   
import os
import time
import argparse
init_time=time.time()
print(f"21cmFAST version is {p21c.__version__}")
from pathlib import Path
parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('random_seed', type=int)
parser.add_argument('HII_DIM', type=int)
parser.add_argument('NON_CUBIC_FACTOR', type=float)

args = parser.parse_args()
here=os.path.dirname(os.path.abspath(__file__))
seed=args.random_seed
HII_DIM=args.HII_DIM
NON_CUBIC_FACTOR=args.NON_CUBIC_FACTOR
folder="SEED_%s_DIM_%s_NCF_%s" %(seed,HII_DIM,NON_CUBIC_FACTOR)
Path("/home/hhuan155/no_cubic/no_cubic_cache/%s/%s" %(seed,folder)).mkdir(parents=True, exist_ok=True)
lightcone = p21c.run_lightcone(
    redshift = 6.0,
    max_redshift = 20.0,
    user_params = {"HII_DIM":HII_DIM, "BOX_LEN":HII_DIM*1.5,"DIM":2*HII_DIM,"NON_CUBIC_FACTOR":NON_CUBIC_FACTOR,"N_THREADS":4,"FAST_FCOLL_TABLE":True},
    flag_options= {"USE_TS_FLUCT":True},
    lightcone_quantities=("brightness_temp", 'density','xH_box'),
    global_quantities=("brightness_temp", 'density', 'xH_box'),
    direc=here+'/no_cubic_cache/%s/%s' %(seed,folder),
    random_seed=seed,
    write=False
)
filename="lightcone_%s.h5" %folder
lightcone.save(here+"/no_cubic_cache/%s/%s/%s" %(seed,folder,filename))
final_time=time.time()
with open(here+"/no_cubic_cache/%s/%s/" %(seed,folder)+filename.replace(".h5",".txt"),'w') as f:
    f.write("time taken: %s" %(final_time-init_time) )

BradGreig · 2022-08-02T01:00:39Z

Hi @HainaHuang729 and @steven-murray, can I please ask for more clarification on what is being tested for in the above case?

The reason I ask for clarification is that in the above test, it appears the 21-cm PS is not being computed in equivalently sized volumes. That is, HII_DIM is being varied, but is BOX_LEN being kept the same? If so, then the resolution is being changed between the different cases which will alter the fractional variations between different 21-cm PS (your Figures 1 and 2). However, there might be a reason for this that is not obvious to me, hence asking for clarification.

I think to truly understand any correlations you will need to have the same resolution to ensure no other sources of variation are present. Perhaps a 400_1, 400_2 and 400_4 (or maybe it'll need to be lowered to a base of 200 for computational reasons). Then, when comparing different chunks the differences will be purely from different volume sizes.

steven-murray · 2022-08-02T14:32:11Z

Hi @BradGreig. @HainaHuang729 is in the process of updating the above comment to provide some more context and explanation. I can say that in all cases the resolution is the same, but the box size varies (either as a full cube or just along the lightcone dimension).

steven-murray · 2022-09-20T21:34:19Z

@BradGreig so @HainaHuang729 has shown that using NON_CUBIC_FACTOR>1 does not bias the expected power spectrum, but it does lead to better covariance properties for different lightcone chunks within the same lightcone. Does this satisfy you in terms of being ready to merge? Are there other tests we should be doing?

BradGreig · 2022-10-28T00:18:43Z

Hey @steven-murray, when you get to looking at this again (given you have the setup on hand) it will probably be worth pulling in the master branch to utilise the updated density field. It should be a relatively straight-forward merge. If not, I can look at doing it for you.

steven-murray · 2022-10-31T15:40:50Z

Thanks @BradGreig, yup I think I can handle that.

BradGreig · 2023-05-04T02:46:54Z

Hey @steven-murray, where does this PR sit in your current priority list? Honestly I can't recall any of the details for knowing the state of this and what the next steps are...

steven-murray · 2023-05-04T15:06:20Z

Hey @BradGreig, I'll try to get to it today. It would be good to get a few of these old PRs in, to have a clean slate as it were.

steven-murray · 2023-05-09T19:41:40Z

Hi @BradGreig, this is now passing all tests, and I think is ready to be merged. Let me try to summarize what we found:

We simulated 100 realizations each of lightcones of various sizes. Each "size" is labelled by HIIDIM_NCF, i.e. 100_1.0 has HII_DIM=100 and NON_CUBIC_FACTOR=1.0. Each of them was done as a lightcone (i.e. repeating over the line-of-sight direction as usual). We used HII_DIM={100, 200, 400} and NCF={1.0, 2.0, 4.0} to build a set of simulations that fit into each other. Note that the BOX_LEN in each case was set so that the cell size remained constant (i.e. HII_DIM=400 is really a bigger box).
Here's a pictorial representation of the above description:
The first aim was to ensure that the power spectrum in a particular chunk at a particular redshift of all the lightcones remains essentially the same (i.e. we're not getting biased results for the PS by changing NCF). This is shown roughly by the left- and middle-column plots that @HainaHuang729 posted above (I think he made some better plots than this eventually, but I don't know where they are). To ~within sample variance, the power spectrum remains the same across different box sizes and non-cubic factors.
The secondary aim was to show that this might actually be really useful when making lightcones, because it would reduce artificial long-range correlations over the line-of-sight due to the repeating structures. This is shown quite well by this plot:
. Here, each pixel in the heatmap is a "chunk" of the lightcone (100^3 pixels ~= 8 MHz width) at increasing redshift (left to right and top to bottom). The color is the correlation coefficient between the PS at k=0.1 (I think, I can't remember which k it's at and don't have access to the original notebook that @HainaHuang729 made) in different chunks. So, for example in the right-most plot, you see the results for the 400_1.0 simulations, where you expect repetition after 4 chunks, and that's what you see. The left-most shows the repetition for 100_1.0, while the middle shows what happens when you extend that to 100_4.0 -- it looks a LOT closer to the nominally "correct" plot in the right panel.

With all this, I'm happy to say I think this "works".

BradGreig · 2023-05-10T00:08:13Z

Hi @steven-murray thanks for the provided updates and checks (and also @HainaHuang729 for the work). Before signing off on this I want to take another detailed look at the code to see if I have missed anything. For example, I do note there are a couple of TODO's marked for myself (although on a quick glance they seem straightforward to implement).

Going over the updates and the previous checks, one thing that remains unclear to me is what is considered the "truth" for the comparisons of different choices of NON_CUBIC_FACTOR. The largest BOX_LEN is considered the truth, but do you compare the 21-cm PS from the full box, or for smaller pieces of the largest box? For example, if the truth is a 400 Mpc box and you compare against a 200 Mpc box for example (with NON_CUBIC_FACTOR=2, so LOS distance is 400 Mpc) what are you actually comparing. Do you take 1/4 of the 400 Mpc box (i.e. 200 x 200 x 400 Mpc) to compare to the 200 x 200 x 400 Mpc chunk from the NON_CUBIC_FACTOR=2? Or are you comparing something else? It's a subtle point, but I think you want to ensure the statistics (of k-modes per bin) remain the same to some degree.

Does that make sense?

Hopefully I can get to this in the next day or so.

steven-murray · 2023-05-10T16:46:55Z

Thanks @BradGreig -- yes I noticed those TODO's as well. Probably best to fix them up before merging, I guess.

In terms of comparison, I'm fairly sure we used 100^3 subsets of the 400^3 boxes to get the statistics, but I will try to check. I found @HainaHuang729's code and simulation data, so I can have a better look.

…to functions (though not sure why they needed updating...)

BradGreig · 2023-05-12T08:25:46Z

Hey @steven-murray, I've gone and updated the code to remove those remaining TODO's. I also fixed up a couple of other minor things that I noticed when double checking a few other things.

Something that I've only just realised that is a bit weird was that Halo_Field tests were failing. So I just updated the test data for those and pushed assuming it had to do with changes I made. However, thinking about this further, I don't know why the Halo_Field tests started failing (given they worked prior to my changes). The changes I made should not have affected anything since NON_CUBIC_FACTOR=1.0 in all of these tests.

On Monday I might need to revert to an earlier commit to try and figure out what changed to cause the difference.

steven-murray · 2023-05-12T16:12:29Z

@BradGreig that is strange indeed -- probably something we should check out.

I am re-running @HainaHuang729's analysis currently, so will report with more on that soonish.

steven-murray · 2023-05-12T23:36:10Z

I've re-run a large part of the analysis, and confirm now that I always use chunks of 100^3 cells to compute the power spectra (on the LoS axis this corresponds to about 8 MHz bandwidth, which I think is a useful size, and also it corresponds to the smallest basic full simulation size).

When I take the mean of the power spectra, I do it over realizations AND perpendicular chunks (if they exist). For example, for the lightcone made from HII_DIM=400 coevals, there are 16 perpendicular cubes in any line-of-sight chunk, so I average over those.

When getting the covariance, I take it over ONLY the realizations, then I average the covariance over any perpendicular chunks.

So, here's the mean plot:

Basically, everything is pretty much within one standard deviation of the biggest sim, except for the smallest sims at the highest redshifts.

Here are the correlation maps:

steven-murray · 2023-05-13T00:20:57Z

Here's a schematic of how the lightcone chunks are split up:

BradGreig · 2023-05-16T05:43:58Z

Hey @steven-murray, I located the issue and fixed it. I've updated the test data (effectively reverting back to the old data) and all tests now appear to be passing.

I guess one thing to note about this PR is that it is API breaking. So I'm not sure how you want to proceed with it's merge. I'll finally implement #221 once a decision has been made on this though.

As for the figures, can I just clarify one thing (because at first I was thinking there was an issue, but I think I figured out what I should be looking at). For example, on the bottom row we have DIM, NCF = (100,4), (200,2), (400,1). Therefore, (400,1) is effectively the "truth" and you want the (100,4) and (200,2) to be identical to the (400,1) case. Is that correct? And presumably the (100,4) case appears to have more correlations as it has lower number statistics (as the 200 and 400 are carved up and averaged over).

Then in the row above, you effectively reduce NCF (at least left and middle) which increases the frequency of repeating structure. Thus demonstrating further how it is behaving.

BradGreig

Maybe someone else should review this given I was responsible for all the changes? I approved in the case we don't care!

steven-murray · 2023-05-16T16:45:14Z

Hi @BradGreig, thanks for making the fixes.

Yes, your interpretation is of the plots is correct... I think it's quite a nice way to illustrate the behavior. We can work on it a bit more for a potential paper as well.

I've reviewed the code changes you made in the last four commits, and they all seem reasonable, so I'm going to go ahead and merge.

steven-murray · 2023-05-16T16:46:08Z

Actually, before I finally merge... what is the breaking change you're thinking of? To me, we just added an additional option with a default, so all old code should still work?

BradGreig · 2023-05-16T22:50:57Z

Hey @steven-murray, yeah, for some reason I was thinking it would break because it wouldn't be able to locate the new variable. But of course, given it just defaults to 1, it shouldn't actually cause any issues

BradGreig and others added 10 commits March 30, 2022 11:48

Add in variable NON_CUBIC_FACTOR plus some checks to allow non-cubic …

ccb330d

…boxes

Add in new variables and update macros

8b68d07

Update initial conditions, plus some relevant functions and outputs f…

f14ca75

…or non-cubic boxes

Update ionisation box to have non-cubic boxes

9946de9

Update the perturbed field (and haloes) quantities for non-cubic boxes

64838ef

Partial update to find haloes. Needs more work for identifying haloes

1386e97

Add non-cubic to spin temperature

5f1d733

Add non-cubic functionality to brightness temperature box

dc23520

Update header to add new variable from UserParams

1cdb935

fix: merge master

9f1e29c

steven-murray requested a review from BradGreig June 2, 2022 22:10

steven-murray added 6 commits May 5, 2023 10:26

merge master

e8ffd76

fix: stray merge error

1b630e9

test: coverage for non_cubic_factor

64bcedd

test: fix test invocation

b505186

test: allow higher abs tolerance in coeval_vs_low_level

14da932

test: reduce required precision on equivalence between coeval/low-level

40fa3f3

BradGreig added 7 commits May 12, 2023 14:10

Remove first TODO with respect to identifying haloes

82dd26c

Remove TODO with painting ionisation map with sphere method

f1702a3

Missing factors probably from merge

69f4acc

Update summary functions for non-cubic boxes

d3202c5

NON_CUBIC factor is a float not an int

caca97b

Missing line-of-sight factor of HII_D_PARA

3bfb4ba

Halo field tests were failing, so updated the data following changes …

386605a

…to functions (though not sure why they needed updating...)

BradGreig added 3 commits May 16, 2023 11:36

Mathematical operations may require these quantities to be signed

62309ad

Changing the data back following fix in the previous commit

14a419a

Weirdly the test data did not get correctly updated. Now is

9611d41

The variable ct should be private

8f054ac

BradGreig approved these changes May 16, 2023

View reviewed changes

steven-murray merged commit f188b47 into master May 16, 2023

steven-murray deleted the non-cubic branch May 16, 2023 23:52

steven-murray mentioned this pull request May 17, 2023

Release v3.3.0 #327

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non cubic #289

Non cubic #289

steven-murray commented Jun 2, 2022

steven-murray commented Jun 2, 2022

codecov bot commented Jun 2, 2022 •

edited

Loading

BradGreig commented Jun 6, 2022

HainaHuang729 commented Jul 31, 2022 •

edited

Loading

BradGreig commented Aug 2, 2022

steven-murray commented Aug 2, 2022

steven-murray commented Sep 20, 2022

BradGreig commented Oct 28, 2022

steven-murray commented Oct 31, 2022

BradGreig commented May 4, 2023

steven-murray commented May 4, 2023

steven-murray commented May 9, 2023

BradGreig commented May 10, 2023 •

edited

Loading

steven-murray commented May 10, 2023

BradGreig commented May 12, 2023

steven-murray commented May 12, 2023

steven-murray commented May 12, 2023

steven-murray commented May 13, 2023

BradGreig commented May 16, 2023

BradGreig left a comment •

edited

Loading

steven-murray commented May 16, 2023

steven-murray commented May 16, 2023

BradGreig commented May 16, 2023

Non cubic #289

Non cubic #289

Conversation

steven-murray commented Jun 2, 2022

steven-murray commented Jun 2, 2022

codecov bot commented Jun 2, 2022 • edited Loading

Codecov Report

BradGreig commented Jun 6, 2022

HainaHuang729 commented Jul 31, 2022 • edited Loading

BradGreig commented Aug 2, 2022

steven-murray commented Aug 2, 2022

steven-murray commented Sep 20, 2022

BradGreig commented Oct 28, 2022

steven-murray commented Oct 31, 2022

BradGreig commented May 4, 2023

steven-murray commented May 4, 2023

steven-murray commented May 9, 2023

BradGreig commented May 10, 2023 • edited Loading

steven-murray commented May 10, 2023

BradGreig commented May 12, 2023

steven-murray commented May 12, 2023

steven-murray commented May 12, 2023

steven-murray commented May 13, 2023

BradGreig commented May 16, 2023

BradGreig left a comment • edited Loading

Choose a reason for hiding this comment

steven-murray commented May 16, 2023

steven-murray commented May 16, 2023

BradGreig commented May 16, 2023

codecov bot commented Jun 2, 2022 •

edited

Loading

HainaHuang729 commented Jul 31, 2022 •

edited

Loading

BradGreig commented May 10, 2023 •

edited

Loading

BradGreig left a comment •

edited

Loading