Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regression in CPU LB thermalization. #3847

Merged
merged 8 commits into from
Sep 30, 2020

Conversation

pkreissl
Copy link
Contributor

@pkreissl pkreissl commented Aug 3, 2020

Fixes #3804 , fixes #3772

The issue was actually a regression that happened with the switch to philox in commit
f3cc4ba. Random numbers formerly part of the interval (-0.5,0.5] were replaced by random numers in (0,1].
With this fix the lb_pressure_tensor_acf.py runs successfully for both CPU and GPU. However, as @KaiSzuttor correctly mentioned in PR #3831 the CPU part of the test takes a while to execute (on my machine, single core the whole test takes 136 s). I could try to make that faster which, however, would require tweaking the tolerance limits tol_node and tol_global. @RudolfWeeber , what was your reasoning behind the chosen limits? Or are they semi-arbitrary choices?

Further, this PR corrects the comparison of the off-diagonal elements avg_ij vs. avg_ji in the test.

@pkreissl pkreissl force-pushed the cpu_lb_thermalization branch from 9d45249 to 07f633e Compare August 3, 2020 15:00
@pkreissl pkreissl force-pushed the cpu_lb_thermalization branch from 07f633e to ad9a70f Compare August 3, 2020 15:07
@jngrad
Copy link
Member

jngrad commented Aug 3, 2020

This looks like a candidate for a 4.1.4 release...

@mkuron
Copy link
Member

mkuron commented Aug 3, 2020

What are the physical consequences of this error? Are all our thermalized fluid simulations of the past 1.5 years wrong?

@KaiSzuttor
Copy link
Member

the CPU version has probably not been used by many

@KaiSzuttor
Copy link
Member

looking at this function the issue is the non trivial RNG code. There is a level of abstraction missing.

@RudolfWeeber
Copy link
Contributor

@pkreissl, can you please check whether this also solves #3772.
I picked the tolerance limits empirically. Please keep the limits for the gpu tests and run the cpu test shorter with wider margins.
The test may be faster for the same accuracy for a bigger fluid at less steps (due to ghost comm).

This definitely warrants a 4.1.4 release.

@pkreissl pkreissl force-pushed the cpu_lb_thermalization branch from f722423 to 11e2862 Compare August 3, 2020 17:59
@KaiSzuttor
Copy link
Member

@pkreissl, can you please check whether this also solves #3772.
I picked the tolerance limits empirically. Please keep the limits for the gpu tests and run the cpu test shorter with wider margins.
The test may be faster for the same accuracy for a bigger fluid at less steps (due to ghost comm).

This definitely warrants a 4.1.4 release.

while I see that it might be tempting to think that a test with empirical or arbitrary tolerance is better than no test, we should invest some time in investigating if the tolerance limit can be deduced from some known facts abou the algorithm. A "false green" can harm when it comes to test driven development. Also with the upcoming change to Walberla we should be confident about the tolerance in our testsuite.

@RudolfWeeber
Copy link
Contributor

RudolfWeeber commented Aug 3, 2020 via email

@KaiSzuttor
Copy link
Member

Since <sigma_ij>^2 can probably be found in Ulf’s thesis

I thought @pkreissl already checked and was not sure if he found it or not

@pkreissl
Copy link
Contributor Author

pkreissl commented Aug 4, 2020

Since <sigma_ij>^2 can probably be found in Ulf’s thesis

I thought @pkreissl already checked and was not sure if he found it or not

The respective correlations for D3Q19 are given in eq. (4.51) of Ulfs dissertation -- however, they depend on the eigenvalues. I guess, those can be obtained via eqs. (4.29)/(4.30). But as I already mentioned, I have not yet had in-depth contact with the document (and its notation), so I am not yet overly confident, here. Comments from people with higher LB expertise are welcome (@mkuron , @KaiSzuttor ?).

@KaiSzuttor
Copy link
Member

if I'd know what we are looking for I'd let you know 😃

@pkreissl
Copy link
Contributor Author

pkreissl commented Aug 4, 2020

if I'd know what we are looking for I'd let you know smiley

The expected variance of the stress tensor element combinations in terms of LB input parameters... :)

@mkuron
Copy link
Member

mkuron commented Aug 4, 2020

My knowledge about thermalization is contained in https://i10git.cs.fau.de/pycodegen/lbmpy/-/blob/master/lbmpy/fluctuatinglb.py, which was directly translated form Ulf's thesis.

@KaiSzuttor
Copy link
Member

My knowledge about thermalization is contained in https://i10git.cs.fau.de/pycodegen/lbmpy/-/blob/master/lbmpy/fluctuatinglb.py, which was directly translated form Ulf's thesis.

where are the unit tests. That's the interesting part...

@KaiSzuttor
Copy link
Member

as far as I can see the unit tests in walberla also have "empirical" accuracy limits. So this does not help. @mkuron could you please have a look at the highly cited thesis and check if we can extract values for the accuracy limits?

@KaiSzuttor
Copy link
Member

KaiSzuttor commented Aug 11, 2020

Correct me if I'm wrong:
according to Dünweg et al. Eq. 62 the local stress should fluctuate with the following covariance matrix:

  • for diagonal elements:
    image

  • for off-diagonal elements:
    image

@jngrad
Copy link
Member

jngrad commented Aug 18, 2020

@pkreissl if there is really no way to write a faster/non-statistical test, 722bc52 shows how to disable it from coverage

@RudolfWeeber
Copy link
Contributor

In LbmPy, stresss tensor fluctuagions can be matched against the expressions
immediately above the cited one:


    res_pressure = dh.gather_array("pressure").reshape((-1,3,3))
    
    c_s = np.sqrt(1/3) # speed of sound
    
    # average of pressure tensor
    p_av_expected =np.diag([rho_0*c_s**2]*3) 
    np.testing.assert_allclose(np.mean(res_pressure,axis=0), p_av_expected, atol=c_s**2/500)
    
    # Pressure tensor fluctuations
    agrid = 1
    dt = 1
    dim = 3

    mu = temperature *dt**2 /(c_s**2 * agrid**(dim+2))

    # relaxation rates are dfined differently in Walberla
    gamma_shear = 1- omega_shear
    gamma_bulk = 1 - omega_bulk
    
    p_var_diag_expected = 6 *mu *rho_0 *c_s**4 * (1-gamma_bulk**2)
    p_var_off_diag_expected = (1-gamma_shear**2) * mu * rho_0 * c_s**4

    p_var_expected = np.zeros((3,3))
    for k in range(3): 
        for l in range(3):
            if k==l: p_var_expected[k,l] = p_var_diag_expected
            else: p_var_expected[k,l] = p_var_off_diag_expected
    np.testing.assert_allclose(np.var(res_pressure,axis=0),p_var_expected,rtol=0.03)

The same should be possible in Espresso. There, the relaxation rates match
the definition in the paper, afaik.

@KaiSzuttor
Copy link
Member

p_var_diag_expected = 6 *mu *rho_0 *c_s**4 * (1-gamma_bulk**2) this differs from the variance I read and cited above. Can you please check?

@pkreissl
Copy link
Contributor Author

p_var_diag_expected = 6 *mu *rho_0 *c_s**4 * (1-gamma_bulk**2) this differs from the variance I read and cited above. Can you please check?

with p_var_diag_expected == <R_{alpha,alpha},R_{beta,beta}>, fluctuations in the trace, this is actually exactly eq. (61) in the Phys. Rev. E you cited...

@KaiSzuttor
Copy link
Member

ah I was not able to parse the tensor of rank 4 in my head and missed Eq. 61. You are probably right.

@KaiSzuttor
Copy link
Member

so we should at least add this test to the walberla PR

@pkreissl
Copy link
Contributor Author

Looking into that right now...

@RudolfWeeber
Copy link
Contributor

Turns out that the test in lbmpy only works for the relaxation rates I chose randomly. Lottery win...

In fact, the stress tensor fluctuations <sigma_{ij}^2> do not depend on the relaxation rates at all.
Re-reading in the paper, I get the impression that the equations refer to the fluctuations ADDED to the stress modes in the thermalization step, not to the RESULTING stress from combining thermalization and relaxation.

In fact, the expression in the paper is used in LbmPy when adding the fluctuations in fluctuation_amplitude_from_temperature()
https://i10git.cs.fau.de/pycodegen/lbmpy/-/blob/master/lbmpy/fluctuatinglb.py

@mkuron
Copy link
Member

mkuron commented Aug 25, 2020

the equations refer to the fluctuations ADDED to the stress modes in the thermalization step

Correct, these are the equations for $\sigma^r$. He also gives some equations without superscript (3.13, 4.48, 4.51) that appear to relate to the superscripted ones via a relaxation rate-dependent factor.

@pkreissl pkreissl force-pushed the cpu_lb_thermalization branch from 6627b24 to 0726a56 Compare August 26, 2020 14:40
@RudolfWeeber
Copy link
Contributor

  • Please keep the average stress test for the CPU. That can run with much fewer steps.
    You'll probably have to move the sampling to a separate function which gets called upon test setup. Then you can split the average test and the acf test into separate funcitons. The acf test then goes into the GPU class.

In MD units, c_s**2 * factor is not a suitable tolerance for the off-diagonal elements of the pressure tensor, since c_s**2 ~=20k in MD units.

@pkreissl pkreissl force-pushed the cpu_lb_thermalization branch from 21b65f2 to e42940b Compare September 15, 2020 14:24
@pkreissl pkreissl force-pushed the cpu_lb_thermalization branch from e42940b to 868818a Compare September 15, 2020 14:26
@pkreissl
Copy link
Contributor Author

With re-enabled acf test for GPU the test takes ~27 s on the local machine. I excluded the test from the builds with coverage, still clang-sanitizer CI-job times out after 900 s. osx also fails, but with the domain_decomposition test which should not be influenced by this PR.

@jngrad
Copy link
Member

jngrad commented Sep 15, 2020

The osx job will disappear once kodiak merges this PR into the python branch. It doesn't seem to have failed in the latest commit. Regarding the sanitizer job, you could selectively disable the test with jngrad/espresso@1c16301ab508c2, but this is not a clean solution. The proposal for a long-term solution for these statistical tests (#3883) hasn't received any feedback from the core team yet.

@pkreissl
Copy link
Contributor Author

So, I've added references to the relevant equations in scientific publications.
For the equilibrium stess, we have eq. (19) in Ladd et. al, 2001:

\mathbf{\Pi}_\mathrm{eq} = \rho c_\mathrm{s}^2\mathbf{1} + \rho \mathbf{u}\mathbf{u}

For me, that would mean:

\mathbf{\Pi}_\mathrm{eq} = \rho c_\mathrm{s}^2\mathbf{1} + (2)/(V) (1)/(2) m u^2 = \rho c_\mathrm{s}^2\mathbf{1} + 2 k_\mathrm{B}T / V

In the test, however, the stress matches the above expression without the factor 2 in the \mathbf{u}\mathbf{u}-term (with factor it would be off)!
Am I missing something, here, or do we have another bug in the core?

@pkreissl
Copy link
Contributor Author

Ok from an ideal gas, we know that on average the kinetic energy is

(1)/(2) m u^2 = (3)/(2) k_\mathrm{B} T,

and by equipartition that every dof has (1)/(2) k_\mathrm{B} T, which resolves the "issue" above.

@jngrad
Copy link
Member

jngrad commented Sep 22, 2020

@pkreissl should it be documented in the LB code? Doxygen supports LaTeX, see e.g. bonded_interactions.dox for the syntax.

@pkreissl
Copy link
Contributor Author

@pkreissl should it be documented in the LB code? Doxygen supports LaTeX, see e.g. bonded_interactions.dox for the syntax.

You mean the expected stress tensor value? Can do that, but I don't think that this directly occurs in the core. There just the thermalization fluctuations are added to the modes and the tensor evaluated. If you still find that useful, sure, I can add it as a comment to lb_calc_pressure_tensor().

@jngrad
Copy link
Member

jngrad commented Sep 22, 2020

You mean the expected stress tensor value? Can do that, but I don't think that this directly occurs in the core.

nevermind, I mixed up the python test we looked into this morning and the source code :) maybe it would be helpful to write a short comment about the kinetic energy formula and equipartition, so that future developers know why there is no factor 2 in there

@jngrad jngrad added the automerge Merge with kodiak label Sep 30, 2020
@kodiakhq kodiakhq bot merged commit 50fad4c into espressomd:python Sep 30, 2020
jngrad pushed a commit to jngrad/espresso that referenced this pull request Oct 13, 2020
Fixes espressomd#3804 , fixes espressomd#3772

The issue was actually a regression that happened with the switch to philox in commit
[f3cc4ba](espressomd@f3cc4ba). Random numbers formerly part of the interval (-0.5,0.5] were replaced by random numers in (0,1].
With this fix the `lb_pressure_tensor_acf.py` runs successfully for both CPU and GPU. However, as @KaiSzuttor correctly mentioned in PR espressomd#3831 the CPU part of the test takes a while to execute (on my machine, single core the whole test takes 136 s). I could try to make that faster which, however, would require tweaking the tolerance limits `tol_node` and `tol_global`. @RudolfWeeber , what was your reasoning behind the chosen limits? Or are they semi-arbitrary choices?

Further, this PR corrects the comparison of the off-diagonal elements `avg_ij` vs. `avg_ji` in the test.
@pkreissl pkreissl deleted the cpu_lb_thermalization branch December 15, 2020 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Thermalization of CPU LB broken Thermalized CPU LB produces unphysical currents
5 participants