Fix regression in CPU LB thermalization. #3847

pkreissl · 2020-08-03T14:57:22Z

Fixes #3804 , fixes #3772

The issue was actually a regression that happened with the switch to philox in commit
f3cc4ba. Random numbers formerly part of the interval (-0.5,0.5] were replaced by random numers in (0,1].
With this fix the lb_pressure_tensor_acf.py runs successfully for both CPU and GPU. However, as @KaiSzuttor correctly mentioned in PR #3831 the CPU part of the test takes a while to execute (on my machine, single core the whole test takes 136 s). I could try to make that faster which, however, would require tweaking the tolerance limits tol_node and tol_global. @RudolfWeeber , what was your reasoning behind the chosen limits? Or are they semi-arbitrary choices?

Further, this PR corrects the comparison of the off-diagonal elements avg_ij vs. avg_ji in the test.

jngrad · 2020-08-03T15:13:52Z

This looks like a candidate for a 4.1.4 release...

mkuron · 2020-08-03T15:28:14Z

What are the physical consequences of this error? Are all our thermalized fluid simulations of the past 1.5 years wrong?

KaiSzuttor · 2020-08-03T15:35:48Z

the CPU version has probably not been used by many

KaiSzuttor · 2020-08-03T16:21:24Z

looking at this function the issue is the non trivial RNG code. There is a level of abstraction missing.

RudolfWeeber · 2020-08-03T16:22:12Z

@pkreissl, can you please check whether this also solves #3772.
I picked the tolerance limits empirically. Please keep the limits for the gpu tests and run the cpu test shorter with wider margins.
The test may be faster for the same accuracy for a bigger fluid at less steps (due to ghost comm).

This definitely warrants a 4.1.4 release.

KaiSzuttor · 2020-08-03T18:35:57Z

@pkreissl, can you please check whether this also solves #3772.
I picked the tolerance limits empirically. Please keep the limits for the gpu tests and run the cpu test shorter with wider margins.
The test may be faster for the same accuracy for a bigger fluid at less steps (due to ghost comm).

This definitely warrants a 4.1.4 release.

while I see that it might be tempting to think that a test with empirical or arbitrary tolerance is better than no test, we should invest some time in investigating if the tolerance limit can be deduced from some known facts abou the algorithm. A "false green" can harm when it comes to test driven development. Also with the upcoming change to Walberla we should be confident about the tolerance in our testsuite.

RudolfWeeber · 2020-08-03T20:21:56Z

For <sigma_ij> =0, that should be possible. Since <sigma_ij>^2 can probably be found in Ulf’s thesis and one can assume that a Gaussian distribution is present. For the ACF, the situation is too complicated, as a numerical integration and a least square fit is involved. The semi-empirical tolerance was 10% deviation from prescribed viscosity. Based on the work that Patrick and I did on evaluating viscosity from stress acf, that’s probably as good as it gets. More generally, we might have to introduce some long-running tests that run in a separate CI task. Running some of them 10x as long as we do now is the only way to be absolutely sure that they converge to the correct value. The need for that became apparent with the Walberla branch. E.g., slight errors in ghost layer handling caused momentum changes that only became apparent in long simulations. The error was of a nature I hadn’t considered when writing unit tests.

KaiSzuttor · 2020-08-04T08:54:34Z

Since <sigma_ij>^2 can probably be found in Ulf’s thesis

I thought @pkreissl already checked and was not sure if he found it or not

pkreissl · 2020-08-04T09:59:23Z

Since <sigma_ij>^2 can probably be found in Ulf’s thesis

I thought @pkreissl already checked and was not sure if he found it or not

The respective correlations for D3Q19 are given in eq. (4.51) of Ulfs dissertation -- however, they depend on the eigenvalues. I guess, those can be obtained via eqs. (4.29)/(4.30). But as I already mentioned, I have not yet had in-depth contact with the document (and its notation), so I am not yet overly confident, here. Comments from people with higher LB expertise are welcome (@mkuron , @KaiSzuttor ?).

KaiSzuttor · 2020-08-04T10:07:43Z

if I'd know what we are looking for I'd let you know 😃

pkreissl · 2020-08-04T10:52:19Z

if I'd know what we are looking for I'd let you know smiley

The expected variance of the stress tensor element combinations in terms of LB input parameters... :)

mkuron · 2020-08-04T11:20:30Z

My knowledge about thermalization is contained in https://i10git.cs.fau.de/pycodegen/lbmpy/-/blob/master/lbmpy/fluctuatinglb.py, which was directly translated form Ulf's thesis.

KaiSzuttor · 2020-08-04T12:41:44Z

My knowledge about thermalization is contained in https://i10git.cs.fau.de/pycodegen/lbmpy/-/blob/master/lbmpy/fluctuatinglb.py, which was directly translated form Ulf's thesis.

where are the unit tests. That's the interesting part...

KaiSzuttor · 2020-08-05T09:24:37Z

as far as I can see the unit tests in walberla also have "empirical" accuracy limits. So this does not help. @mkuron could you please have a look at the highly cited thesis and check if we can extract values for the accuracy limits?

KaiSzuttor · 2020-08-11T13:38:35Z

Correct me if I'm wrong:
according to Dünweg et al. Eq. 62 the local stress should fluctuate with the following covariance matrix:

for diagonal elements:
for off-diagonal elements:

jngrad · 2020-08-18T08:52:15Z

@pkreissl if there is really no way to write a faster/non-statistical test, 722bc52 shows how to disable it from coverage

RudolfWeeber · 2020-08-21T10:46:26Z

In LbmPy, stresss tensor fluctuagions can be matched against the expressions
immediately above the cited one:


    res_pressure = dh.gather_array("pressure").reshape((-1,3,3))
    
    c_s = np.sqrt(1/3) # speed of sound
    
    # average of pressure tensor
    p_av_expected =np.diag([rho_0*c_s**2]*3) 
    np.testing.assert_allclose(np.mean(res_pressure,axis=0), p_av_expected, atol=c_s**2/500)
    
    # Pressure tensor fluctuations
    agrid = 1
    dt = 1
    dim = 3

    mu = temperature *dt**2 /(c_s**2 * agrid**(dim+2))

    # relaxation rates are dfined differently in Walberla
    gamma_shear = 1- omega_shear
    gamma_bulk = 1 - omega_bulk
    
    p_var_diag_expected = 6 *mu *rho_0 *c_s**4 * (1-gamma_bulk**2)
    p_var_off_diag_expected = (1-gamma_shear**2) * mu * rho_0 * c_s**4

    p_var_expected = np.zeros((3,3))
    for k in range(3): 
        for l in range(3):
            if k==l: p_var_expected[k,l] = p_var_diag_expected
            else: p_var_expected[k,l] = p_var_off_diag_expected
    np.testing.assert_allclose(np.var(res_pressure,axis=0),p_var_expected,rtol=0.03)

The same should be possible in Espresso. There, the relaxation rates match
the definition in the paper, afaik.

KaiSzuttor · 2020-08-21T10:52:13Z

p_var_diag_expected = 6 *mu *rho_0 *c_s**4 * (1-gamma_bulk**2) this differs from the variance I read and cited above. Can you please check?

pkreissl · 2020-08-21T11:01:48Z

p_var_diag_expected = 6 *mu *rho_0 *c_s**4 * (1-gamma_bulk**2) this differs from the variance I read and cited above. Can you please check?

with p_var_diag_expected == <R_{alpha,alpha},R_{beta,beta}>, fluctuations in the trace, this is actually exactly eq. (61) in the Phys. Rev. E you cited...

KaiSzuttor · 2020-08-21T11:10:25Z

ah I was not able to parse the tensor of rank 4 in my head and missed Eq. 61. You are probably right.

KaiSzuttor · 2020-08-21T11:10:56Z

so we should at least add this test to the walberla PR

pkreissl · 2020-08-21T11:11:51Z

Looking into that right now...

RudolfWeeber · 2020-08-24T20:33:38Z

Turns out that the test in lbmpy only works for the relaxation rates I chose randomly. Lottery win...

In fact, the stress tensor fluctuations <sigma_{ij}^2> do not depend on the relaxation rates at all.
Re-reading in the paper, I get the impression that the equations refer to the fluctuations ADDED to the stress modes in the thermalization step, not to the RESULTING stress from combining thermalization and relaxation.

In fact, the expression in the paper is used in LbmPy when adding the fluctuations in fluctuation_amplitude_from_temperature()
https://i10git.cs.fau.de/pycodegen/lbmpy/-/blob/master/lbmpy/fluctuatinglb.py

mkuron · 2020-08-25T07:06:20Z

the equations refer to the fluctuations ADDED to the stress modes in the thermalization step

Correct, these are the equations for $\sigma^r$. He also gives some equations without superscript (3.13, 4.48, 4.51) that appear to relate to the superscripted ones via a relaxation rate-dependent factor.

RudolfWeeber · 2020-08-26T16:16:37Z

Please keep the average stress test for the CPU. That can run with much fewer steps.
You'll probably have to move the sampling to a separate function which gets called upon test setup. Then you can split the average test and the acf test into separate funcitons. The acf test then goes into the GPU class.

In MD units, c_s**2 * factor is not a suitable tolerance for the off-diagonal elements of the pressure tensor, since c_s**2 ~=20k in MD units.

pkreissl · 2020-09-15T17:12:40Z

With re-enabled acf test for GPU the test takes ~27 s on the local machine. I excluded the test from the builds with coverage, still clang-sanitizer CI-job times out after 900 s. osx also fails, but with the domain_decomposition test which should not be influenced by this PR.

jngrad · 2020-09-15T18:02:39Z

The osx job will disappear once kodiak merges this PR into the python branch. It doesn't seem to have failed in the latest commit. Regarding the sanitizer job, you could selectively disable the test with jngrad/espresso@1c16301ab508c2, but this is not a clean solution. The proposal for a long-term solution for these statistical tests (#3883) hasn't received any feedback from the core team yet.

testsuite/python/lb_pressure_tensor.py

pkreissl · 2020-09-22T10:26:18Z

So, I've added references to the relevant equations in scientific publications.
For the equilibrium stess, we have eq. (19) in Ladd et. al, 2001:

$\mathbf{\Pi}_\mathrm{eq} = \rho c_\mathrm{s}^2\mathbf{1} + \rho \mathbf{u}\mathbf{u}$

For me, that would mean:

$\mathbf{\Pi}_\mathrm{eq} = \rho c_\mathrm{s}^2\mathbf{1} + (2)/(V) (1)/(2) m u^2 = \rho c_\mathrm{s}^2\mathbf{1} + 2 k_\mathrm{B}T / V$

In the test, however, the stress matches the above expression without the factor 2 in the $\mathbf{u}\mathbf{u}$ -term (with factor it would be off)!
Am I missing something, here, or do we have another bug in the core?

pkreissl · 2020-09-22T11:17:46Z

Ok from an ideal gas, we know that on average the kinetic energy is

$(1)/(2) m u^2 = (3)/(2) k_\mathrm{B} T$ ,

and by equipartition that every dof has $(1)/(2) k_\mathrm{B} T$ , which resolves the "issue" above.

jngrad · 2020-09-22T11:57:34Z

@pkreissl should it be documented in the LB code? Doxygen supports LaTeX, see e.g. bonded_interactions.dox for the syntax.

pkreissl · 2020-09-22T13:43:08Z

@pkreissl should it be documented in the LB code? Doxygen supports LaTeX, see e.g. bonded_interactions.dox for the syntax.

You mean the expected stress tensor value? Can do that, but I don't think that this directly occurs in the core. There just the thermalization fluctuations are added to the modes and the tensor evaluated. If you still find that useful, sure, I can add it as a comment to lb_calc_pressure_tensor().

jngrad · 2020-09-22T14:23:36Z

You mean the expected stress tensor value? Can do that, but I don't think that this directly occurs in the core.

nevermind, I mixed up the python test we looked into this morning and the source code :) maybe it would be helpful to write a short comment about the kinetic energy formula and equipartition, so that future developers know why there is no factor 2 in there

@KaiSzuttor

Fixes espressomd#3804 , fixes espressomd#3772 The issue was actually a regression that happened with the switch to philox in commit [f3cc4ba](espressomd@f3cc4ba). Random numbers formerly part of the interval (-0.5,0.5] were replaced by random numers in (0,1]. With this fix the `lb_pressure_tensor_acf.py` runs successfully for both CPU and GPU. However, as @KaiSzuttor correctly mentioned in PR espressomd#3831 the CPU part of the test takes a while to execute (on my machine, single core the whole test takes 136 s). I could try to make that faster which, however, would require tweaking the tolerance limits `tol_node` and `tol_global`. @RudolfWeeber , what was your reasoning behind the chosen limits? Or are they semi-arbitrary choices? Further, this PR corrects the comparison of the off-diagonal elements `avg_ij` vs. `avg_ji` in the test.

pkreissl force-pushed the cpu_lb_thermalization branch from 9d45249 to 07f633e Compare August 3, 2020 15:00

Fix regression in CPU LB thermalization.

ad9a70f

pkreissl force-pushed the cpu_lb_thermalization branch from 07f633e to ad9a70f Compare August 3, 2020 15:07

Make lb_pressure_tensor_acf test faster.

11e2862

pkreissl force-pushed the cpu_lb_thermalization branch from f722423 to 11e2862 Compare August 3, 2020 17:59

jngrad added this to the Espresso 4.1.4 milestone Aug 18, 2020

jngrad added BugFix Testcase labels Aug 18, 2020

pkreissl force-pushed the cpu_lb_thermalization branch from 6627b24 to 0726a56 Compare August 26, 2020 14:40

Make test buxfix-ready.

bd25df9

pkreissl force-pushed the cpu_lb_thermalization branch from 0726a56 to bd25df9 Compare August 26, 2020 14:43

pkreissl mentioned this pull request Aug 26, 2020

Replace pressure_tensor_acf test with test of variances #3876

Closed

pkreissl force-pushed the cpu_lb_thermalization branch from 21b65f2 to e42940b Compare September 15, 2020 14:24

Disable test with coverage; check GK viscosity for GPU LB.

868818a

pkreissl force-pushed the cpu_lb_thermalization branch from e42940b to 868818a Compare September 15, 2020 14:26

Disable test with sanitizers

1c16301

KaiSzuttor reviewed Sep 16, 2020

View reviewed changes

testsuite/python/lb_pressure_tensor.py Show resolved Hide resolved

Add references to relevant papers.

dd97af4

More detailed comment on expected stress tensor.

87d267c

RudolfWeeber approved these changes Sep 30, 2020

View reviewed changes

jngrad added the automerge Merge with kodiak label Sep 30, 2020

Merge branch 'python' into cpu_lb_thermalization

bb36259

kodiakhq bot merged commit 50fad4c into espressomd:python Sep 30, 2020

pkreissl deleted the cpu_lb_thermalization branch December 15, 2020 14:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix regression in CPU LB thermalization. #3847

Fix regression in CPU LB thermalization. #3847

pkreissl commented Aug 3, 2020 •

edited

Loading

jngrad commented Aug 3, 2020

mkuron commented Aug 3, 2020

KaiSzuttor commented Aug 3, 2020

KaiSzuttor commented Aug 3, 2020

RudolfWeeber commented Aug 3, 2020

KaiSzuttor commented Aug 3, 2020

RudolfWeeber commented Aug 3, 2020 via email

KaiSzuttor commented Aug 4, 2020

pkreissl commented Aug 4, 2020

KaiSzuttor commented Aug 4, 2020

pkreissl commented Aug 4, 2020

mkuron commented Aug 4, 2020

KaiSzuttor commented Aug 4, 2020

KaiSzuttor commented Aug 5, 2020

KaiSzuttor commented Aug 11, 2020 •

edited

Loading

jngrad commented Aug 18, 2020

RudolfWeeber commented Aug 21, 2020

KaiSzuttor commented Aug 21, 2020

pkreissl commented Aug 21, 2020

KaiSzuttor commented Aug 21, 2020

KaiSzuttor commented Aug 21, 2020

pkreissl commented Aug 21, 2020

RudolfWeeber commented Aug 24, 2020

mkuron commented Aug 25, 2020

RudolfWeeber commented Aug 26, 2020

pkreissl commented Sep 15, 2020

jngrad commented Sep 15, 2020

pkreissl commented Sep 22, 2020

pkreissl commented Sep 22, 2020

jngrad commented Sep 22, 2020

pkreissl commented Sep 22, 2020

jngrad commented Sep 22, 2020

Fix regression in CPU LB thermalization. #3847

Fix regression in CPU LB thermalization. #3847

Conversation

pkreissl commented Aug 3, 2020 • edited Loading

jngrad commented Aug 3, 2020

mkuron commented Aug 3, 2020

KaiSzuttor commented Aug 3, 2020

KaiSzuttor commented Aug 3, 2020

RudolfWeeber commented Aug 3, 2020

KaiSzuttor commented Aug 3, 2020

RudolfWeeber commented Aug 3, 2020 via email

KaiSzuttor commented Aug 4, 2020

pkreissl commented Aug 4, 2020

KaiSzuttor commented Aug 4, 2020

pkreissl commented Aug 4, 2020

mkuron commented Aug 4, 2020

KaiSzuttor commented Aug 4, 2020

KaiSzuttor commented Aug 5, 2020

KaiSzuttor commented Aug 11, 2020 • edited Loading

jngrad commented Aug 18, 2020

RudolfWeeber commented Aug 21, 2020

KaiSzuttor commented Aug 21, 2020

pkreissl commented Aug 21, 2020

KaiSzuttor commented Aug 21, 2020

KaiSzuttor commented Aug 21, 2020

pkreissl commented Aug 21, 2020

RudolfWeeber commented Aug 24, 2020

mkuron commented Aug 25, 2020

RudolfWeeber commented Aug 26, 2020

pkreissl commented Sep 15, 2020

jngrad commented Sep 15, 2020

pkreissl commented Sep 22, 2020

pkreissl commented Sep 22, 2020

jngrad commented Sep 22, 2020

pkreissl commented Sep 22, 2020

jngrad commented Sep 22, 2020

pkreissl commented Aug 3, 2020 •

edited

Loading

KaiSzuttor commented Aug 11, 2020 •

edited

Loading