Refine concurrency logic #5063

adamfarley · 2024-02-14T12:35:21Z

This change ensures that we identify the correct memory size when running tests in containers, by setting CGMEM to be the number of memory bytes in all circumstances.

We could remove the "-lt" logic altogether, but we shouldn't because sometimes the cgroup limits are set far beyond the machine's maximum memory, for reasons we do not yet understand.

We need to make sure we're not using so many cores that the tests are starved of memory. Signed-off-by: Adam Farley <adfarley@redhat.com>

Instead of what it does right now, which is to compare bytes with kilobytes, and incorrectly assume that the kilobytes number is always smaller. Signed-off-by: Adam Farley <adfarley@redhat.com>

adamfarley · 2024-02-14T12:39:07Z

Linked to #5012

May not close it. Further review required. Should improve matters, though.

adamfarley · 2024-02-14T12:41:17Z

Testing here: https://ci.adoptium.net/job/Grinder/8831/console

adamfarley · 2024-02-14T12:48:34Z

Yep, that worked. Now sets concurrency to a more sensible level. Ready to merge.

jerboaa

/sys/fs/cgroup/memory/memory.limit_in_bytes is inherently cgv1 (legacy) specific. This won't work on cgroup v2 systems. FYI.

sxa · 2024-02-14T13:10:57Z

Ref Severin's comment above, /sys/fs/cgroup/memory.max appears to be the right thing for V2 so we ought to change that (either in this PR or a separate one)
Noting that this is a follow on to other fixes in this area under #4933 and #5035

andrew-m-leonard

lgtm

sxa

@adamfarley Because I'm paranoid about these things can you fire this off on a subset of diverse machines similar to the table at the top of #4933 to check that this won't regress anything please?

Suggest one of the PLCT RISC-V machines (4 core, 16Gb), test-sxa-armv7l-ubuntu2004-odroid-2 (8 core, 2Gb), something running on an Ubuntu 2004 host and something on Ubuntu 22.04 as a minimum.

adamfarley · 2024-02-14T13:28:23Z

Ref Severin's comment above, /sys/fs/cgroup/memory.max appears to be the right thing for V2 so we ought to change that (either in this PR or a separate one) Noting that this is a follow on to other fixes in this area under #4933 and #5035

Okie dokie. Will fix.

adamfarley · 2024-02-14T13:28:46Z

@adamfarley Because I'm paranoid about these things can you fire this off on a subset of diverse machines similar to the table at the top of #4933 to check that this won't regress anything please?

Suggest one of the PLCT RISC-V machines (4 core, 16Gb), test-sxa-armv7l-ubuntu2004-odroid-2 (8 core, 2Gb), something running on an Ubuntu 2004 host and something on Ubuntu 22.04 as a minimum.

Gotcha. Will run with the updated fix.

adamfarley · 2024-02-14T14:40:55Z

Ok, given that we (a) want to tolerate both memory.max AND memory.etcetcbytes, and (b) that memory.max can be both present, readable and empty, here's a proposal for the new script:

KMEMMB=`awk '/^MemTotal:/{print int($$2/1024)}' /proc/meminfo`; 

if [[ -r /sys/fs/cgroup/memory.max && $(cat /sys/fs/cgroup/memory.max) =~ ^[0-9]+$ ]]; 
    then CGMEM=`cat /sys/fs/cgroup/memory.max`; 
elif [[ -r /sys/fs/cgroup/memory/memory.limit_in_bytes && $(cat /sys/fs/cgroup/memory/memory.limit_in_bytes) =~ ^[0-9]+$ ]]; 
    then CGMEM=`cat /sys/fs/cgroup/memory/memory.limit_in_bytes`; 
else 
    CGMEM=`expr $${KMEMMB} \* 1024 \* 1024`; 
fi; 

CGMEMMB=`expr $${CGMEM} / 1024 / 1024`; 

if [ "$${KMEMMB}" -lt "$${CGMEMMB}" ]; then 
    echo "$${KMEMMB}"; else echo "$${CGMEMMB}"; 
fi

I added formatting for an easier review.

What do you think?

adamfarley · 2024-02-14T14:45:27Z

P.S. If I ever complain I have too much spare time, remind me of this makefile. Turning it into a bash script would remove the need to put all of this script into a single line.

In cgroup v1 we used memory.limit_in_bytes to store the maximum memory allocated to the container. In v2, we use memory.max. This change allows us to check for both, includes a meminfo check for non-containers, and adds a few debug comments so we can be sure this new code is working. My plan is to run tests on a diverse set of machines, and to remove the debug statements prior to merging. Signed-off-by: Adam Farley <adfarley@redhat.com>

adamfarley · 2024-02-14T14:58:39Z

Test run: https://ci.adoptium.net/job/Grinder/8832/

Will check across other machines once I know the new code works.

jerboaa · 2024-02-14T15:19:23Z

Ok, given that we (a) want to tolerate both memory.max AND memory.etcetcbytes, and (b) that memory.max can be both present, readable and empty, here's a proposal for the new script:

KMEMMB=`awk '/^MemTotal:/{print int($$2/1024)}' /proc/meminfo`; 

if [[ -r /sys/fs/cgroup/memory.max && $(cat /sys/fs/cgroup/memory.max) =~ ^[0-9]+$ ]]; 
    then CGMEM=`cat /sys/fs/cgroup/memory.max`; 
elif [[ -r /sys/fs/cgroup/memory/memory.limit_in_bytes && $(cat /sys/fs/cgroup/memory/memory.limit_in_bytes) =~ ^[0-9]+$ ]]; 
    then CGMEM=`cat /sys/fs/cgroup/memory/memory.limit_in_bytes`; 
else 
    CGMEM=`expr $${KMEMMB} \* 1024 \* 1024`; 
fi; 

CGMEMMB=`expr $${CGMEM} / 1024 / 1024`; 

if [ "$${KMEMMB}" -lt "$${CGMEMMB}" ]; then 
    echo "$${KMEMMB}"; else echo "$${CGMEMMB}"; 
fi

I added formatting for an easier review.

What do you think?

You should test on the FS type on the cgroup root. In pseudo code:

if [ "$(stat -f --format '%T' /sys/fs/cgroup)_" == "cgroup2fs_" ]; then
   #cgroup v2 branch
else
   #cgroup v1 branch
fi

adamfarley · 2024-02-14T16:21:11Z

Ok, had a ponder and came up with this:

KMEMMB=`awk '/^MemTotal:/{print int($$2/1024)}' /proc/meminfo`; 

if [[ -f "/.dockerenv" ]]; then
    CGRPV=`stat -f --format '%T' /sys/fs/cgroup`;
    if [[ "_$$CGRPV" == "_cgroup2fs" ]]; then
        CGMEM=`cat /sys/fs/cgroup/memory.max 2>1`; 
    elif [[ "_$$CGRPV" == "_tmpfs" ]]; then 
        CGMEM=`cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>1`; 
    fi; 
fi;

if [[ ! $$(CGMEM) =~ ^[0-9]+$$ ]]; then
    CGMEM=`expr $${KMEMMB} \* 1024 \* 1024`; 
fi; 

CGMEMMB=`expr $${CGMEM} / 1024 / 1024`; 

if [ "$${KMEMMB}" -lt "$${CGMEMMB}" ]; then 
    echo "$${KMEMMB}"; else echo "$${CGMEMMB}"; 
fi

This way we:

Check if we're in a container before running the stat command, which not all OS' have.
Can handle things if the file is missing, unreadable, or empty.
Don't use cgroups contents if we're not in a container (some OS' have that folder even outside of any containers).

Added some debug code (so we know we're following the right code path) and ran to test it here: https://ci.adoptium.net/job/Grinder/8836/

To ensure we're looking in the correct file for the maximum memory size. Also to handle permissions issues, empty files, etc. Signed-off-by: Adam Farley <adfarley@redhat.com>

Signed-off-by: Adam Farley <adfarley@redhat.com>

jerboaa · 2024-02-14T16:57:53Z

if [[ -f "/.dockerenv" ]]; then

Means you assume docker as the runtime. Fails once moved to podman or something else, FWIW. Not sure if that's an issue worth addressing at this point.

sxa · 2024-02-14T18:17:57Z

You should test on the FS type on the cgroup root. In pseudo code:

Do you think that's going to trap additional situations compared to just checking for the presence of memory.max? Given where this code is (on one line for now) my preference would probably be to just check one or the other given that this is already quite complex for one line unless we need both checks.

if [[ -r /sys/fs/cgroup/memory.max && $(cat /sys/fs/cgroup/memory.max) =~ ^[0-9]+$ ]]; 
    then CGMEM=`cat /sys/fs/cgroup/memory.max`;

We should avoid mixing $(...) and the backquote syntax in this. I also don't personally like using bash regex is bash but I guess that's just personal preference and that ship has sailed elsewhere in our code.

if [[ -f "/.dockerenv" ]]; then

Means you assume docker as the runtime. Fails once moved to podman or something else

Is that check even necessary?

adamfarley · 2024-02-15T15:19:30Z

Taking into consideration Stewart and Severin's points, I'm testing this:

KMEMMB=`awk '/^MemTotal:/{print int($$2/1024)}' /proc/meminfo`; 

if [[ -r /sys/fs/cgroup/memory.max ]]; then
    CGMEM=`cat /sys/fs/cgroup/memory.max 2>1`; 
else
    CGMEM=`cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>1`; 
fi; 

if [[ ! $$(CGMEM) =~ ^[0-9]+$$ ]]; then
    CGMEM=`expr $${KMEMMB} \* 1024 \* 1024`; 
fi; 

CGMEMMB=`expr $${CGMEM} / 1024 / 1024`; 

if [ "$${KMEMMB}" -lt "$${CGMEMMB}" ]; then 
    echo "$${KMEMMB}"; else echo "$${CGMEMMB}"; 
fi

This removes the cgroup version check, but also removes the need for it (as we now have error-handling).

One fewer man-page check (for stat) and one fewer "if" layer, to make debugging easier.

Test run: https://ci.adoptium.net/job/Grinder/8847/

Makes it easier to maintain, and more resilient to failure. I've also added a commented, formatted copy of the script to help people read it in the future. Signed-off-by: Adam Farley <adfarley@redhat.com>

Signed-off-by: Adam Farley <adfarley@redhat.com>

adamfarley · 2024-02-15T15:56:43Z

Also, I added the formatted copy of the script into a big comment, to make maintenance easier in the future.

Signed-off-by: Adam Farley <adfarley@redhat.com>

adamfarley · 2024-02-16T16:20:01Z

Ok, typo fixed and code works. All debug comments removed, and tested here:

https://ci.adoptium.net/job/Grinder/8850/console

Requesting merge now.

adamfarley added 2 commits February 5, 2024 13:03

Fix for unchecked concurrency

cda0149

We need to make sure we're not using so many cores that the tests are starved of memory. Signed-off-by: Adam Farley <adfarley@redhat.com>

Adjusting memory size math to compare bytes with bytes

61a56c4

Instead of what it does right now, which is to compare bytes with kilobytes, and incorrectly assume that the kilobytes number is always smaller. Signed-off-by: Adam Farley <adfarley@redhat.com>

This was referenced Feb 14, 2024

JDK21 sanity.openjdk failures on x86-64_linux on some machines #5012

Closed

Static Docker containers need to be restarted with --cpuset-cpus="0-3" adoptium/infrastructure#3375

Closed

sxa mentioned this pull request Feb 14, 2024

Fix concurrency on non-cgroups linux #4792

Closed

jerboaa reviewed Feb 14, 2024

View reviewed changes

andrew-m-leonard approved these changes Feb 14, 2024

View reviewed changes

sxa reviewed Feb 14, 2024

View reviewed changes

adamfarley added 2 commits February 14, 2024 16:50

Include docker check and cgroup version check

7b6dc17

To ensure we're looking in the correct file for the maximum memory size. Also to handle permissions issues, empty files, etc. Signed-off-by: Adam Farley <adfarley@redhat.com>

Exporting debug message

65ec46d

Signed-off-by: Adam Farley <adfarley@redhat.com>

karianna approved these changes Feb 14, 2024

View reviewed changes

adamfarley added 2 commits February 15, 2024 15:48

Switching back to file existance checks for simplicity

ef689c4

Makes it easier to maintain, and more resilient to failure. I've also added a commented, formatted copy of the script to help people read it in the future. Signed-off-by: Adam Farley <adfarley@redhat.com>

Correcting comment symbol. Typo.

2ec2b02

Signed-off-by: Adam Farley <adfarley@redhat.com>

jerboaa approved these changes Feb 15, 2024

View reviewed changes

sxa approved these changes Feb 15, 2024

View reviewed changes

Switching to curley braces to access variable value

923e67e

Signed-off-by: Adam Farley <adfarley@redhat.com>

andrew-m-leonard merged commit 787c3a5 into adoptium:master Feb 16, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refine concurrency logic #5063

Refine concurrency logic #5063

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

jerboaa left a comment

sxa commented Feb 14, 2024

andrew-m-leonard left a comment

sxa left a comment

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

jerboaa commented Feb 14, 2024

adamfarley commented Feb 14, 2024 •

edited

Loading

jerboaa commented Feb 14, 2024

sxa commented Feb 14, 2024 •

edited

Loading

adamfarley commented Feb 15, 2024 •

edited

Loading

adamfarley commented Feb 15, 2024

adamfarley commented Feb 16, 2024 •

edited

Loading

Refine concurrency logic #5063

Refine concurrency logic #5063

Conversation

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

jerboaa left a comment

Choose a reason for hiding this comment

sxa commented Feb 14, 2024

andrew-m-leonard left a comment

Choose a reason for hiding this comment

sxa left a comment

Choose a reason for hiding this comment

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

adamfarley commented Feb 14, 2024

jerboaa commented Feb 14, 2024

adamfarley commented Feb 14, 2024 • edited Loading

jerboaa commented Feb 14, 2024

sxa commented Feb 14, 2024 • edited Loading

adamfarley commented Feb 15, 2024 • edited Loading

adamfarley commented Feb 15, 2024

adamfarley commented Feb 16, 2024 • edited Loading

adamfarley commented Feb 14, 2024 •

edited

Loading

sxa commented Feb 14, 2024 •

edited

Loading

adamfarley commented Feb 15, 2024 •

edited

Loading

adamfarley commented Feb 16, 2024 •

edited

Loading