Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WITH DOCKER is not executed in parallel in latest v0.6.28 (but it was in version v0.6.25) #2377

Open
jrodrigv opened this issue Nov 8, 2022 · 14 comments · Fixed by #3406
Open
Assignees
Labels
type:bug Something isn't working

Comments

@jrodrigv
Copy link

jrodrigv commented Nov 8, 2022

Hi,

I have some targets meant for running tests projects in parallel that are using WITH DOCKER to run a Rabbimq service before running the tests. Using v0.6.25 I can see how the test-executor instances are running fully in parallel loading the rabbitmq tar files in parallel etc. However when using v0.6.28 this is not the case anymore and the test-executor instances are running sequentially instead. Is there a way to prevent this from happening?

parallel-testing:
    FROM +build
    WAIT
        FOR dir IN $(ls Tests/*Tests/*.csproj)
            COPY (+test-executor/TestResults --PROJECT=./$dir) ./TestResults
        END
        SAVE ARTIFACT ./TestResults testresults AS LOCAL earthly-artifacts/testresults
    END
   test-executor:
    FROM +build
    ARG PROJECT
    ARG ID
    COPY docker-compose-unit-tests-rabbitmq.yml .
    COPY coverlet.runsettings .

      WITH DOCKER \
     --compose docker-compose-unit-tests-rabbitmq.yml \
     --service rabbitmq
          RUN dotnet test ${PROJECT} --blame-hang-timeout 5m --no-build --collect:"XPlat Code Coverage" -p:CollectCoverage=true --settings coverlet.runsettings --logger trx --configuration $buildConfiguration --results-directory ./TestResults
      END
  
    SAVE ARTIFACT ./TestResults TestResults
@alexcb alexcb added the type:bug Something isn't working label Nov 8, 2022
@vladaionescu
Copy link
Member

Hi @jrodrigv - what's your VERSION declaration at the top of the Eartfhile?

@jrodrigv
Copy link
Author

jrodrigv commented Nov 8, 2022

Hi @vladaionescu - In both cases I was using:

VERSION --wait-block 0.6

@vladaionescu
Copy link
Member

I suspect it has something to do with either WAIT or this new piece of functionality that we released in v0.6.26:

Loading Docker images as part of WITH DOCKER is now faster through the use of an embedded registry in Buildkit. This functionality was previously hidden (VERSION --use-registry-for-with-docker) and was only auto-enabled for Earthly Satellite users. It is now enabled by default for all builds

Can you try the following:

  • On earthly v0.6.29 enable VERSION --wait-block --no-use-registry-for-with-docker 0.6 and see if the issue goes away.
  • On earthly v0.6.29, remove --wait-block (just use VERSION 0.6 with no other flags) and see if that makes the issue go away (you might have to change your build a little bit to remove WAIT for this test).

It's a way to narrow down to what the cause of this might be.

@jrodrigv
Copy link
Author

jrodrigv commented Nov 9, 2022

Hi @vladaionescu .

I have tested the VERSION --wait-block --no-use-registry-for-with-docker 0.6 and it is working fine as before in v0.6.25.

Thanks

@vladaionescu
Copy link
Member

I was hoping you would be able to try the second suggestion too and see if that influences things too. The --no-use-registry-for-with-docker flag is not something we plan to support long-term - it might go away at some point without warning.

@jrodrigv
Copy link
Author

jrodrigv commented Nov 10, 2022

Hi @vladaionescu ,

I have tried the 2nd suggestion as well using just the VERSION 0.6 without any other flags and removing the WAIT

In this scenario, all the +test-executor targets are executed sequentially following the same pattern (screenshot attached)
image

Not sure which could be the root cause but looking at the logs it looks like each test-executor is doing a final image transfer of the docker compose service.
output | [ ] 0% transferring registry.code.com/connectivity/common-library/rabbitmq:management

And then the next test-executor is pulling the previous transferred image from the embedded registry? Creating a kind of artificial dependency between the test-executor instances.

+test-executor | Loading images from BuildKit via embedded registry...
+test-executor | Pulling 172.30.0.1:8371/sess-ofbf05t00k2o4z7js3zfss7q4:img-0 and retagging as registry.code.com/connectivity/common-library/rabbitmq:management
+test-executor | img-0: Pulling from sess-ofbf05t00k2o4z7js3zfss7q4

However, going back to VERSION --wait-block --no-use-registry-for-with-docker 0.6 we can see how the test-executor targets are executed in parallel:

image

@jrodrigv
Copy link
Author

Hi @vladaionescu

I have tested this issue on 0.7 I can reproduce it. I hace created a Earthfile here that can be used to test it.

https://github.com/jrodrigv/EarthlySamples/blob/test/Earthfile

@sesgoe
Copy link

sesgoe commented Mar 1, 2023

Seeing the same issue here! Version 0.6.30.

Also re-parallelizes for me if I use VERSION --no-use-registry-for-with-docker 0.6 in Earthfile.

@idelvall idelvall self-assigned this Oct 18, 2023
vladaionescu added a commit that referenced this issue Oct 23, 2023
…rent args (#3406)

Fixes #2377

The previous `VisitedCollection` implementation attempts to wait for
ongoing targets with the same name to complete before comparing the
inputs of the target. This has the advantage of being more precise with
which ARGs actually influence the outcome of the targets, ignoring
overriding ARGs that end up being unused within those targets. But it
also has the disadvantage that all targets with the same name (but
different overriding args) execute sequentially.

This new implementation simplifies this greatly, by computing the target
input hash upfront based on all overriding args, without knowing if
there are any args that will end up being unused. The result is that we
are able to run all targets with the same name but different args in
parallel. But it also has the disadvantage that in some cases we would
create duplicated LLBs for some targets when certain overriding args are
different but they are unused. Buildkit will generally de-duplicate the
LLB in these cases, although it's possible that there might be edge
cases if the LLB construction is not consistent.

---------

Co-authored-by: nacho <idelvall@brutusin.org>
Co-authored-by: Vlad A. Ionescu <vladaionescu@users.noreply.github.com>
@jrodrigv
Copy link
Author

jrodrigv commented Oct 24, 2023 via email

@vladaionescu
Copy link
Member

vladaionescu commented Oct 24, 2023

Great - please note that this is behind a feature flag. To enable it, you need to use VERSION --use-visited-upfront-hash-collection 0.7. This just went out in Earthly v0.7.21.

@jrodrigv
Copy link
Author

Hi @vladaionescu , I have been doing some testing with the flag using v0.7.21 using the example I provided and It does not seem to work in parallel but still sequentially. I have recorded a video as evidence.

earthly-parallel-withdocker.mp4

@vladaionescu
Copy link
Member

Hi @jrodrigv - Many thanks for the video - it showcases the issue clearly. It seems that I fixed a slightly different situation than your setup. They're closely related though.

It's possible that we either don't yet support the FOR loop, or the series of COPY commands as a way to parallelize. I'll take a look into this. The old --no-use-registry-for-with-docker should still work as a workaround in the meantime.

@jrodrigv
Copy link
Author

Hi @vladaionescu today I was doing some testing with v0.8.11 and I found something interesting that I'd like to share with you.

It seems the problem is caused when using FOR + WITH DOCKER when the targets are being executed sequentially rather that in parallel, however N explicit BUILD commands are executed in parallel successfully.

VERSION  0.8

FROM earthly/dind:alpine

# this executes in parallel
test-parallel:
    BUILD +test-executor  --PROJECT=1
    BUILD +test-executor  --PROJECT=2
    BUILD +test-executor  --PROJECT=3
    BUILD +test-executor  --PROJECT=4
    BUILD +test-executor  --PROJECT=5

# this sequentially
test-parallel-for:
     BUILD +parallel

parallel:
    FOR num IN $(seq 5)
        BUILD +test-executor --PROJECT=$num
    END

test-executor:
    ARG PROJECT

    WITH DOCKER \
        --pull hello-world
        RUN for i in {1..5}; do sleep 1; echo "hello $PROJECT"; done
    END

@alexcb
Copy link
Contributor

alexcb commented May 21, 2024

Great find @jrodrigv; I have a potential implementation to support this under #4138; however there's some duplicate output that's causing some issues that needs to be addressed before it's ready to be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Something isn't working
Projects
Status: Todo
Development

Successfully merging a pull request may close this issue.

5 participants