
Elm compilation is incredibly slow on CI platforms #1473

Closed
obmarg opened this issue Sep 1, 2016 · 48 comments

Comments

@obmarg

obmarg commented Sep 1, 2016

Projects that compile in seconds on my local machine take an unreasonably long time when run on CI.

This issue produced a stopgap fix here #1473 (comment). Use that for now.

Additional Details

I've put together an example project using a sample from the elm-guide.

On my local machine this takes 2.6 seconds to build. The Travis CI build here takes 234 seconds to do the same build. My dev machine may be slightly better than the CI machines in question, but certainly not enough better to explain this difference in build times.

I've seen this behaviour on both Travis CI and CircleCI, and it only seems to get worse with larger projects. Another project of mine (a few hundred lines of Elm, nothing major) struggles to build within 10 minutes.

I see there's a workaround for this here: https://8thlight.com/blog/rob-looby/2016/04/07/caching-elm-builds-on-travis-ci.html

@process-bot

Thanks for the issue! Make sure it satisfies this checklist. My human colleagues will appreciate it!

Here is what to expect next, and if anyone wants to comment, keep these things in mind.

@evancz
Member

evancz commented Sep 2, 2016

@obmarg, can you figure out if it's slow to download packages or to actually build things. I'd expect it to be the former, and it'd be great to know for sure.

I think @OvermindDL1 is talking about something else, so I'm getting rid of those comments. If you have an SSCCE of your thing, and it's not fixed by elm-lang/elm-make@46ec85c then open a separate issue on an appropriate repo.

@obmarg
Author

obmarg commented Sep 2, 2016

@evancz elm-package install takes 0.5 seconds, elm-make takes 234 seconds. I assume elm-make doesn't do any downloading if elm-package install has already been run?

@evancz
Member

evancz commented Sep 2, 2016

That sounds right, but if you are doing any unusual caching of elm-stuff/, I'm not 100% certain; for example, if you cache exact-dependencies.json but nothing else. I'm not sure.

Basically, if you want the compiler to go faster under odd conditions, I need as much detailed information about what's going wrong as possible. Maybe they are throttling processes? Maybe they report they have multiple cores, but it's actually one? I have no idea without you telling me.

@evancz
Member

evancz commented Sep 2, 2016

Another question to ask, are you running elm-package install --yes or without --yes? I think it's important to get a more precise diagnosis before "something" can be fixed.

@obmarg
Author

obmarg commented Sep 2, 2016

I agree, a precise diagnosis would be a great idea. The commands that are being run are:

$ elm-package install -y
Packages configured successfully!
$ elm make Main.elm

There's nothing odd being done in between these commands as far as I'm aware.

I would like to explain what the odd conditions causing this are, but I'm not too sure myself. I've been using these CI services for years, and this is the first time I've run into a serious performance issue like this.

Is there any way to enable more logging in the elm compiler, or anything else that would help diagnose?

@obmarg
Author

obmarg commented Sep 2, 2016

It could be the case that Travis and CircleCI are reporting far more cores than are actually usable. I just checked /proc/cpuinfo in both environments, and they list 32 cores. The Travis documentation specifically says you'll have 2 cores for your builds. I can't find any documentation for CircleCI, but I'm pretty sure I don't have exclusive access to all 32 of those cores.

@evancz
Member

evancz commented Sep 2, 2016

This line tells elm-make to look up how many cores there are, so we can use them all. Then this file will just spawn a bunch of light-weight threads and trust Haskell to schedule them nicely. I can imagine if Haskell is being told it has 32 cores, but it only has 2, that things could be getting goofy.

Is there some way to try to make sure that that line is reporting two? Or trick it into reporting two and seeing if that resolves things?

I have some logging stuff for myself, but not a public flag yet. So you could get this information if you build from source. It breaks down how much time is spent in different parts of elm-make. It may just show it's all in the compiler though, so I'd do this kind of thing as a backup because building from source can take a while and be tricky.
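
A quick way to check what the runtime is being told, sketched here assuming a Linux CI machine with GNU coreutils: getconf _NPROCESSORS_ONLN prints the online-CPU count that sysconf reports, while nproc prints the count allowed by the process's CPU affinity mask. If the first number is far above what the CI provider documents, the runtime is probably being offered cores it cannot really use. Neither command accounts for CPU quotas applied through cgroups, so both can still overcount inside a container. The helper name report_cpu_counts is made up for this sketch.

```shell
#!/usr/bin/env bash
# Print the CPU count as reported by sysconf and as limited by this
# process's CPU affinity mask (assumes Linux with GNU coreutils).
report_cpu_counts() {
  local online affinity
  online=$(getconf _NPROCESSORS_ONLN)  # sysconf(_SC_NPROCESSORS_ONLN)
  affinity=$(nproc)                    # honours the affinity mask (and OMP_NUM_THREADS)
  echo "sysconf online CPUs: $online"
  echo "affinity-limited CPUs: $affinity"
}

report_cpu_counts
```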

@obmarg
Author

obmarg commented Sep 3, 2016

I've been doing a bit of work to try and confirm that this false number of CPUs is actually causing this problem. I had a look into how getNumProcessors works, and discovered libsysconfcpus, which lets you override the number of CPUs reported by sysconf (which getNumProcessors uses under the hood).

I then built & ran that on my CI environment:

$ rm -R elm-stuff/build-artifacts/*
$ time sysconfcpus -n 1  elm-make
Success! Compiled 47 modules.

real    0m2.215s
user    0m2.195s
sys     0m0.024s
$ rm -R elm-stuff/build-artifacts/*
$ time elm-make
Success! Compiled 47 modules.

real    9m21.660s
user    15m38.880s
sys     2m47.578s

So it does look like the CPU count detection is the problem. Seems like a command line option (or similar) might be a reasonable idea?

For anyone trying to use libsysconfcpus themselves, I ran into a couple of compiler issues. My fixed version is here.

@evancz
Member

evancz commented Sep 3, 2016

Awesome @obmarg, shared that trick with NoRedInk, I think it'll help them too!

Folks raised the idea of having a --jobs flag on elm-make, but it has problems. There are two ways to restrict the number of "jobs". One is to override this line which is a bad idea for every single user unless you run into this exact CI problem. Another is to manage a thread-pool in this file which I think is pretty pointless if Haskell thinks there are 32 cores and has its own stuff to manage. The root problem could be a bad interaction between Haskell's GC and our threads, so this route may not actually solve the problem.

I recommend folks having this problem use @obmarg's trick for now. I'd like to talk to more people who are seeing this problem in practice to figure out a solution that does not allow bad outcomes in any cases, so no need for PRs at the moment. Code is always the easy part.

@evancz
Member

evancz commented Sep 3, 2016

I think a flag like --max-cores that conditionally overrides this line may have the right naming to make sure it is used appropriately.

So you can say elm-make --max-cores=2 any time you want, but it is pretty clear that this is not something you want under normal circumstances. It also means you may set --max-cores=4 on a machine that actually only has two, and two will win.

@obmarg, do you like that approach? Can you think of ways to make sure anyone using CI knows to use that? Maybe we should just have official CI recipes for testing?

I will talk to NRI people about this next week and get their feedback as well.
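
The --max-cores rule described above (the flag is only a ceiling, so the smaller of the requested and detected counts wins) can be sketched as a tiny shell function. This illustrates the proposal only; it is not code from elm-make, and effective_cores is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of the proposed --max-cores rule: the requested value is only an
# upper bound, so the effective count is min(requested, detected).
# effective_cores is a hypothetical helper, not part of elm-make.
effective_cores() {
  local requested="$1" detected="$2"
  if [ -z "$requested" ]; then
    echo "$detected"      # no flag given: use every detected core
  elif [ "$requested" -lt "$detected" ]; then
    echo "$requested"     # the flag can lower the count...
  else
    echo "$detected"      # ...but never raise it
  fi
}

effective_cores 2 32   # CI box that claims 32 cores: prints 2
effective_cores 4 2    # --max-cores=4 on a two-core machine: prints 2
effective_cores "" 8   # flag omitted: prints 8
```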

@obmarg
Author

obmarg commented Sep 5, 2016

--max-cores seems like a reasonable name to me. I can think of a couple of situations where you might want to limit the number of cores you're running on, but none where you'd want to explicitly increase it.

Official CI recipes could also be useful, though there are a bunch of different ways to integrate Elm into your build system. A recipe for running elm-make on CI might not help someone who uses brunch or webpack to run elm-make, for example. Still, it could at least be a place to explain the issue that people could refer to.

I don't know whether you'd be interested in adding a compiler warning for this? Though it's probably quite hard to get right...

@jvoigtlaender
Contributor

jvoigtlaender commented Sep 8, 2016

FWIW, here is a concrete Travis recipe I arrived at that does work around this issue:

cache:
  directories:
    - sysconfcpus
install:
  - |
    if [ ! -d sysconfcpus/bin ];
    then
      git clone https://github.com/obmarg/libsysconfcpus.git; 
      cd libsysconfcpus;
      ./configure --prefix=$TRAVIS_BUILD_DIR/sysconfcpus;
      make && make install;
      cd ..;
    fi

and then wherever there is a call to elm-make or elm-test, prefix that by $TRAVIS_BUILD_DIR/sysconfcpus/bin/sysconfcpus -n 2.
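
Rather than editing every call site by hand, the prefix can live in one helper. This is a minimal sketch, assuming sysconfcpus was installed under $TRAVIS_BUILD_DIR/sysconfcpus by the recipe above. with_limited_cpus, SYSCONFCPUS and CI_CORES are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper that prefixes a command with sysconfcpus.
# SYSCONFCPUS defaults to the install prefix used in the Travis recipe;
# CI_CORES defaults to 2, matching the recipe's "-n 2".
with_limited_cpus() {
  local wrapper="${SYSCONFCPUS:-$TRAVIS_BUILD_DIR/sysconfcpus/bin/sysconfcpus}"
  "$wrapper" -n "${CI_CORES:-2}" "$@"
}

# Usage in a CI script (not run here):
#   with_limited_cpus elm-make Main.elm
#   with_limited_cpus elm-test
```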

As a thing to note about the --max-cores option, I don't think adding support for it to elm-make alone will bring immediate benefit to many people. People call elm-test in their CI scripts, and the elm-test executable then calls out to elm-make. So there would probably have to be coordination with https://github.com/rtfeldman/node-test-runner so that people can pass an option like --max-cores to elm-test, which would then pass it on to elm-make.

@fredcy

fredcy commented Sep 9, 2016

This workaround cuts my elm-package + elm-make time in Travis CI from almost 10 minutes down to 5 seconds. Nice work. Thank you.

@BrianHicks
Contributor

Would an environment variable make sense for this? ELM_MAKE_MAX_CORES=2 (modulo bikeshedding the name) would be available to the compiler regardless of wrapper scripts or tools, and every CI provider has first-class support for setting those vars.
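
An environment variable sidesteps the wrapper problem because exported variables are inherited by every child process, so a tool like elm-test would not need to forward anything. The sketch below shows that mechanism. ELM_MAKE_MAX_CORES is only the name floated in this comment (elm-make never read it), and FAKE_ELM_MAKE / fake_elm_test are stand-ins, not the real tools.

```shell
#!/usr/bin/env bash
# Stand-in for the compiler, run as a separate process: it reads the
# hypothetical ELM_MAKE_MAX_CORES variable and falls back to the online
# CPU count when the variable is unset or empty.
FAKE_ELM_MAKE='echo "using ${ELM_MAKE_MAX_CORES:-$(getconf _NPROCESSORS_ONLN)} cores"'

# Stand-in for a wrapper such as elm-test: it forwards no flags and simply
# starts the "compiler" as a child process.
fake_elm_test() {
  sh -c "$FAKE_ELM_MAKE"
}

# Exported once (e.g. in the CI provider's settings), visible to every child.
export ELM_MAKE_MAX_CORES=2
fake_elm_test   # prints "using 2 cores"
```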

@dynajoe

dynajoe commented Sep 15, 2016

This looks like a promising work-around. My team is presently building our elm modules in a Docker container. I will try this out and report back. If anyone has already done this (with Docker) please respond with your results and possibly save us some time. 😄

@agrafix

agrafix commented Sep 15, 2016

Btw: sysconfcpus -n 1 also worked very well to speed my builds on CircleCI from 24 minutes to just 9 seconds!

@rtfeldman
Member

For those using npm install -g elm to obtain elm-make, I expanded on @jvoigtlaender's amazing workaround to replace elm-make with a script that prepends $TRAVIS_BUILD_DIR/sysconfcpus/bin/sysconfcpus -n 2 https://github.com/rtfeldman/node-elm-compiler/blob/master/.travis.yml#L37-L39

Basically this is a drop-in replacement that makes elm-make "just work" for the tests themselves. 😸

@mgold
Contributor

mgold commented Sep 30, 2016

@rtfeldman Is this something that can help elm-test CI builds as well? Can you write a ready-to-use .travis.yml for that?

@rtfeldman
Member

@mgold PR: elm-community/elm-test#70

@francesco-bracchi

@evancz AFAIK, the line from the comment #1473 (comment) could be removed, since the default value of GHC.Conc.numCapabilities should be the number of processors, or can be controlled via the runtime options +RTS -N[x] -RTS

@jvoigtlaender
Contributor

@francesco-bracchi, I think you are wrong. Simply leaving that line out will change how the compiler behaves. Namely, it will not use concurrency anymore then. See https://downloads.haskell.org/~ghc/master/users-guide/using-concurrent.html.

@danielcompton

danielcompton commented Aug 6, 2018

I've run into this same class of problems when writing Clojure + Java 8, running on CircleCI. The machine had oodles of RAM, and my JVM thought it could take more of it than it was allowed to use. Manually setting memory limits fixed the issue. The root cause from my perspective is that the system (JVM/elm-make) is not correctly interpreting the hints that the environment is giving it about what resources are available to it.

Java 9 and 10 have improvements for running under Docker containers. In Java 10, the JVM can inspect the container it is running in to see what constraints apply.

In theory, it seems like it would be possible for the Elm compiler + associated machinery to take a similar approach. Automatically detecting the number of CPU cores available would fix this without requiring any configuration from users. Something like nproc looks like one approach you could take for detecting the number of allowed CPUs to use.

Apologies if I've misunderstood the issue, I didn't really see anyone directly suggesting that elm should detect how many CPU cores it is actually allowed to use.

@rtfeldman
Member

We've looked into that approach. As it turns out, Haskell's concurrency library only knows how to detect "number of physical cores," not "number of available cores." Node.js is the same way.

Rust's num_cpus crate knows how to detect both. It's possible we could introduce some Rust FFI to Elm's compiler (which is written in Haskell) just to accurately get that one number, but it's not clear that's the best path. 😄
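
For illustration only, here is roughly what "available cores" can mean on Linux beyond the affinity mask. A cgroup CPU quota (for example, the one docker run --cpus sets) limits CPU time without hiding any cores from sysconf or nproc. This is a hypothetical sketch, not how num_cpus or the Elm compiler work. It assumes the usual cgroup v2 (cpu.max) or v1 (cpu.cfs_quota_us / cpu.cfs_period_us) file layout under /sys/fs/cgroup, and the function names are made up.

```shell
#!/usr/bin/env bash
# Derive a whole-core limit from the cgroup CFS quota. Prints nothing when
# no quota is set. The cgroup root can be overridden (useful for testing).
cgroup_cpu_limit() {
  local root="${1:-/sys/fs/cgroup}" quota period
  if [ -r "$root/cpu.max" ]; then                  # cgroup v2: "<quota> <period>"
    read -r quota period < "$root/cpu.max"
  elif [ -r "$root/cpu/cpu.cfs_quota_us" ]; then   # cgroup v1
    quota=$(cat "$root/cpu/cpu.cfs_quota_us")
    period=$(cat "$root/cpu/cpu.cfs_period_us")
  else
    return 0
  fi
  # "max" (v2) or -1 (v1) means no quota.
  if [ "$quota" = max ] || [ "$quota" -le 0 ]; then
    return 0
  fi
  echo $(( (quota + period - 1) / period ))        # round up to whole cores
}

# Usable cores: the smaller of the affinity-limited count and the quota.
available_cores() {
  local cores limit
  cores=$(nproc)
  limit=$(cgroup_cpu_limit "$@")
  if [ -n "$limit" ] && [ "$limit" -lt "$cores" ]; then
    cores=$limit
  fi
  echo "$cores"
}
```

Rounding a fractional quota (say 1.5 CPUs) up rather than down is a judgement call: it keeps at least one capability and errs toward slight oversubscription.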

davcamer added a commit to davcamer/elm-protobuf that referenced this issue Aug 12, 2018
As described in this elm compiler issue:
elm/compiler#1473
@harrysarson

May I ask if this issue is in any way impacted by Elm 0.19?

@turboMaCk

I'm not sure. Anyway, in my opinion, people still blame the wrong thing. The problem is not necessarily the detection of CPU cores. The issue is that, no matter what, compilation gets slower as the number of threads increases. An environment in which more threads are used only makes the symptoms more noticeable; CPU detection isn't an issue by itself. Maybe it's a secondary issue that makes sense to fix once the primary one is fixed: the compiler gets slower as the number of threads increases, even when the hardware resources are available.

@rtfeldman
Member

Yeah - they are separate issues; fixing one but not the other would not solve the problem completely.

@turboMaCk

My personal opinion is that:

  • we have a workaround that is already commonly known... it's not nice, but it's not that bad (stressing the word known)
  • this is not a simple problem to solve (neither of the two issues is simple), so we need to be patient
  • fixing it partially might be worse than ignoring it. It would cause different odd behavior which, unlike this, wouldn't be known to folks and would create more confusion and more threads like this one...
  • even if this still exists in the same form in 0.19 (which I don't know yet, but others might), a single thread is fast enough. So it's "just an inconvenience" of setting this up on CI, or on a desktop if you have a machine with 8+ cores.

As an affected user, I'm happy I don't have to find a new workaround every month after some bad patch for this is released.

ericbaranowski pushed a commit to kulado/wealthmind that referenced this issue Sep 14, 2018
elm/compiler#1473

Signed-off-by: Elliot Murphy <elliot@elliotmurphy.com>
@davcamer

davcamer commented Oct 5, 2018

On elm 0.19 a test suite of 5 tests that runs in less than 2 seconds with sysconfcpus -n 1 on a Concourse set up, has been running for 3 hours 55 minutes on the same Concourse set up without.

If more specific information would somehow help address this issue, I would be happy to provide it.

@rtfeldman
Member

@davcamer This is an elm-test on Linux issue, unrelated to the compiler!

See rtfeldman/node-test-runner#295

fidel added a commit to RailsEventStore/rails_event_store that referenced this issue Jan 3, 2019
@evancz
Member

evancz commented Feb 15, 2019

With 0.19 folks are able to say things like:

elm make src/Main.elm --optimize +RTS -N4

The things after +RTS are flags to the Haskell runtime, so you can tell it how many cores to use, tweak GC options, etc. There is also a script for TravisCI these days that should account for the root issue.
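
On 0.19 the same cap can be applied without sysconfcpus by passing the core count through +RTS. This is a minimal sketch, assuming the elm binary accepts +RTS -N as described above. run_elm_make, ELM_BIN and CI_CORES are hypothetical names. The nproc fallback still overcounts under cgroup CPU quotas, so setting CI_CORES explicitly in the CI settings is the safer choice.

```shell
#!/usr/bin/env bash
# Run "elm make" with the Haskell runtime capped at CI_CORES cores,
# falling back to nproc. ELM_BIN lets you substitute another binary.
run_elm_make() {
  local cores="${CI_CORES:-$(nproc)}"
  "${ELM_BIN:-elm}" make "$@" +RTS "-N$cores" -RTS
}

# Usage in CI (not run here):
#   CI_CORES=2 run_elm_make src/Main.elm --optimize
```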

@rtfeldman also documented the root problem in GHC that led to this here.

Given that there are workarounds in Elm, and the root issue is in GHC, I think it makes sense to close this issue. If folks are still having problems, please open a new issue explaining your particular scenario, with an SSCCE if possible!

@mloughran

Noting that elm-test-rs solved this issue (elm-test slowness on CircleCI) for me, without resorting to sysconfcpus.

@turboMaCk

I believe this was fixed in Elm 0.19.1.

@harrysarson

harrysarson commented Mar 12, 2021

It has been fixed in elm 0.19.1!

(As cool a project as elm-test-rs is it doesn't do anything special with respect to invoking the elm compiler: if you don't need sysconfcpus for elm-test-rs, you don't need it for elm-test either).

@mloughran

I do apologise, and thank you for correcting me @turboMaCk and @harrysarson. I jumped to the wrong conclusion, and can confirm that elm compilation is not the issue, and neither does sysconfcpus make a speck of difference.

[I don't know if it's to be expected that elm-test is ~20x slower than elm-test-rs on CircleCi, but this is not the place for that question to be addressed!]

@harrysarson

[I don't know if it's to be expected that elm-test is ~20x slower than elm-test-rs on CircleCi, but this is not the place for that question to be addressed!]

I would be interested to hear more about this! If you have the time, ping me (@harrysarson) on the elm-test Slack or open an issue at https://github.com/rtfeldman/node-test-runner.
