
Replacing AppVeyor with another CI for windows #568

Closed
Neiko2002 opened this issue May 30, 2018 · 32 comments

@Neiko2002
Member

I was wondering if it makes sense to swap AppVeyor for another CI host for Windows builds. Any suitable candidate should be tested with one of the biggest javacpp-presets (e.g. MXNet or TensorFlow). We could also stay with AppVeyor and just build TensorFlow with another CI.

Here is a small list of CI service providers and some information about them:
https://github.com/bytedeco/javacpp-presets/wiki/Continuous-Integration-(CI)

@saudet
Member

saudet commented May 30, 2018

Sure, but build times for TensorFlow are longer than 2 hours. I don't know of any service that supports that, do you?

@Neiko2002
Member Author

Not yet, but I might try Shippable to build the TensorFlow libs using cppbuild.sh, in order to see if it's useful for us. This ticket is just for exchanging experiences with various CI service providers.
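
For reference, such a test run would presumably use the top-level build script of javacpp-presets along these lines (a hedged sketch; the exact platform name and options may differ):

bash cppbuild.sh -platform windows-x86_64 install tensorflow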

@saudet
Member

saudet commented May 31, 2018

It looks like Shippable also has a limit of 2 hours, just like AppVeyor:
http://docs.shippable.com/ci/custom-timeouts/

@vb216 Would you have any suggestions?

@vb216
Member

vb216 commented May 31, 2018 via email

@saudet
Member

saudet commented May 31, 2018

We might be able to get 4 hours if we start paying for it? Maybe, we'd have to ask...

@vb216
Member

vb216 commented May 31, 2018 via email

@saudet
Member

saudet commented May 31, 2018

Sure, that'd be great! Thanks

@Neiko2002
Member Author

Neiko2002 commented May 31, 2018

Compiling the CUDA kernels takes a long time on Windows. Just the tf_core_gpu_kernels.vcxproj project of TensorFlow takes over 2h on my machine, and creating the static libs (tensorflow_static.vcxproj) afterwards needs another hour. There are of course many dependency projects which are included in the aforementioned projects. We could build those separately and store their results beforehand to reduce the compilation time of the bigger projects, as sketched below.
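
As an illustration, individual projects inside the CMake-generated solution could be built and their outputs cached with something like this (a hedged sketch; project and solution names follow the TensorFlow CMake build, but the exact paths depend on cppbuild.sh):

msbuild tensorflow.sln /t:tf_core_gpu_kernels /p:Configuration=Release /m:2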

@vb216 If you ask Shippable, you might also want to ask AppVeyor.

@vb216
Member

vb216 commented May 31, 2018

Ah, I meant asking AppVeyor - they've been pretty kind in the past, increasing from their default 60 min build to about 110 min nowadays. They already replied too, suggesting https://www.appveyor.com/docs/build-environment/#private-build-cloud as an option. We would need to be on their premium service, but they give a 50% discount for open source projects, which seems pretty reasonable. I guess there's also the cost of the cloud instance to add on top of that. It looks like two build engines, so overall time would be quicker as well.

The only other suggestion I had was going back in the Jenkins sort of direction, as it seems like most cloud build providers will have some sort of time limit. But that seems a step backwards from where we are right now.

@saudet
Member

saudet commented May 31, 2018

Yeah, it would be great if we could continue using cloud services like that.

I have no problems with paying a small fee, but any ideas why those places can't provide builds longer than 2 hours even to paying customers? I feel that managing a private build cloud wouldn't give us much over Jenkins...

@vb216
Member

vb216 commented Jun 1, 2018 via email

@saudet
Member

saudet commented Jun 11, 2018

Status update: It sounds like with a Premium plan AppVeyor should be able to provide us with long enough build times.

@Neiko2002 Have you been able to find anything else?

@Neiko2002
Member Author

@saudet The list is quite complete I think, but only for free services. Back then I didn't know you were also considering paid ones; there are many more of those.

@saudet
Member

saudet commented Jun 18, 2018

@Neiko2002 In your opinion, what would be the best ones?

@saudet
Member

saudet commented Aug 8, 2018

@Neiko2002 @vb216 With Python support (pull #596) and CUDA it would take about 7 hours to build on AppVeyor. Also, I still haven't been able to figure out why it insists on reporting an exit code of 259, so partial builds don't appear practical either. In any case, AppVeyor wasn't designed for long builds like that, so is our only option here to do it manually with Jenkins or something?

@wumo If you have any ideas as well, please let us know!

@saudet
Member

saudet commented Aug 8, 2018

Microsoft-hosted build agents look like a potential solution:

  • Can run jobs for up to 6 hours (30 minutes on the free tier).
  • Currently utilizing Microsoft Azure general purpose virtual machine sizes (Standard_DS2_v2 and Standard_DS3_v2)

https://docs.microsoft.com/en-us/vsts/pipelines/agents/hosted?view=vsts

Anyone interested in giving this a try? I'll at least be testing the build on a Standard_DS3_v2 to make sure it finishes in under 6 hours.

(They even have support for Linux and Mac! But Travis CI works well for those platforms, so I'm not thinking about changing anything there...)

@Neiko2002
Member Author

Sounds interesting. They provide multiple CPU cores + SSD-based temp storage. I've just started building TensorFlow again to check #596 and will be trying out small incremental builds to see how long the different parts need to compile.

@saudet
Member

saudet commented Aug 9, 2018

@Neiko2002 Thanks! Oops, I think I've already done the work for more incremental builds, see the update.

@saudet
Member

saudet commented Aug 9, 2018

I've tested the build on a Standard DS3 v2 (4 vcpus, 14 GB memory) instance on Azure with Windows 2012 R2, Visual Studio 2015, and CUDA 9.2. It took about 3 hours for the core (including Python) and additionally just over 1 hour for CUDA, so it's looking very good. We'll need to figure out how to set up VSTS and integrate it with GitHub, but this guide doesn't seem too terse:
https://docs.microsoft.com/en-us/vsts/pipelines/build/ci-build-github
It looks like it costs $40 per month:
https://visualstudio.microsoft.com/team-services/pricing/
But once we figure out how to set all that up, Skymind will be paying for it, so no worries.

@saudet
Member

saudet commented Aug 9, 2018

It looks like we're more likely to get a Standard DS2 v2 (2 vcpus, 7 GB memory) though:
https://stackoverflow.com/questions/51725187/vsts-microsoft-hosted-agent-virtual-machine-size
Let's see how that fares...

@saudet
Member

saudet commented Aug 10, 2018

Hum, it takes a bit more than 6.5 hours to build with CUDA (or about 5 hours without) on a Standard DS2 v2 (2 vcpus, 7 GB memory), not cool...

@saudet
Member

saudet commented Aug 11, 2018

@Neiko2002 Could you try and see if there wouldn't be a way to split the build into even more increments? Ideally splitting the non-CUDA core build into roughly 3 parts of less than 2 hours each on 2 Xeon cores.

@Neiko2002
Member Author

Neiko2002 commented Aug 14, 2018

@saudet I just tried the different modules that are needed to build the tensorflow_static lib. Every project is compiled one after the other, starting from the bottom. tf_c needs 27 minutes, but most of that comes from tf_core_lib, which alone needs 25 minutes. The only strange behavior is tf_tools_transform_graph_lib: it only contains a few files but recompiles the tf_core_kernels project, which is why it takes 1h 32min to build (1h 25min of that from tf_core_kernels).

[Figure: build times of the different TensorFlow modules]

I was setting maxcpucount in the cppbuild.sh to 2, but this option does not work as one would expect:
https://msdn.microsoft.com/en-us/library/bb385193.aspx?f=255&MSPPError=-2147217396

/maxcpucount: the MSBuild.exe tool can build multiple projects at the same time
/MP: compiler (cl.exe) option can build multiple compilation units at the same time

CMake activates MultiProcessorCompilation (/MP) inside the *.vcxproj files. By default it uses all available CPU cores. It might be possible to set the corresponding processor count via the CL_MPCount parameter:
https://github.com/Microsoft/checkedc-clang/wiki/Parallel-builds-of-clang-on-Windows
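
A hedged sketch of what limiting both levels of parallelism could look like (project name is illustrative; CL_MPCount only takes effect when /MP is enabled in the project):

msbuild tensorflow_static.vcxproj /p:Configuration=Release /maxcpucount:1 /p:CL_MPCount=2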

I will check and create a PR if this works. But until then, all the timings in my figure above were measured on a multi-core processor.

@Neiko2002
Member Author

Neiko2002 commented Aug 15, 2018

Compiling tf_core_kernels with just two cores using the method described in #599 resulted in 3h 37min. Reading the build log, it seems the compiler does some unnecessary steps:

Source compilation required: input E:\G\JP\TENSORFLOW\CPPBUILD\WINDOWS-X86_64-GPU\BUILD\TF_CORE_LIB.DIR\RELEASE\MUTEX.OBJ is newer than output E:\G\JP\TENSORFLOW\CPPBUILD\WINDOWS-X86_64-GPU\BUILD\TF_CORE_LIB.DIR\RELEASE\TF_CORE_LIB.LIB.

After the message above, tf_core_lib.lib gets re-created even though no compilation occurs; there is no reason to rebuild an already existing lib. Checking the modification dates afterwards shows identical timestamps:
"mutex.obj" 11:27:08
"tf_core_lib.lib" 11:27:08

Looking even closer with wmic we can see a difference, but does MSBuild use such fine-grained timestamps? Timestamp resolution differs per file system (100 ns for NTFS).
"mutex.obj" 20180815112708.291985+120
"tf_core_lib.lib" 20180815112708.759880+120

Following mutex.obj, we find that it was created a few lines earlier, and this forces more libraries to be recreated afterwards.

E:\G\jp\tensorflow\cppbuild\windows-x86_64-gpu\tensorflow-1.10.0-rc1\tensorflow\core\platform\default\mutex.cc will be compiled as E:\G\JP\TENSORFLOW\CPPBUILD\WINDOWS-X86_64-GPU\BUILD\EXTERNAL\NSYNC\PUBLIC\NSYNC_CV.H was modified at 15/08/2018 11:27:04.
Outputs for E:\G\JP\TENSORFLOW\CPPBUILD\WINDOWS-X86_64-GPU\TENSORFLOW-1.10.0-RC1\TENSORFLOW\CORE\PLATFORM\DEFAULT\MUTEX.CC:
E:\G\JP\TENSORFLOW\CPPBUILD\WINDOWS-X86_64-GPU\BUILD\TF_CORE_LIB.DIR\RELEASE\MUTEX.OBJ

The file NSYNC_CV.H was copied into the BUILD\EXTERNAL\NSYNC\PUBLIC\ directory by nsync_copy_headers_to_destination.vcxproj via its PreBuildEvent:

C:\msys64\mingw64\bin\cmake.exe -E copy_directory E:/G/jp/tensorflow/cppbuild/windows-x86_64-gpu/build/nsync/install/include/ E:/G/jp/tensorflow/cppbuild/windows-x86_64-gpu/build/external/nsync/public/

The problem is that copy_directory always overwrites the content of the target directory and therefore changes the last-modified timestamps:
https://bravenewmethod.com/2017/06/18/update_directory-command-for-cmake/
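
A hedged workaround sketch (not what the generated project currently does): copy each header with copy_if_different instead, so files whose content has not changed keep their timestamps. Assuming the flat nsync include directory from the PreBuildEvent above:

for f in E:/G/jp/tensorflow/cppbuild/windows-x86_64-gpu/build/nsync/install/include/*.h; do
  C:/msys64/mingw64/bin/cmake.exe -E copy_if_different "$f" E:/G/jp/tensorflow/cppbuild/windows-x86_64-gpu/build/external/nsync/public/
done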

The nsync_copy_headers_to_destination project is a dependency of many projects and gets executed multiple times in my build process. One of the dependent projects is tf_core_lib itself, so tf_core_lib.lib is guaranteed to be recreated on every run.

And this is just one example of why the build takes so long.

PS: We could use /p:BuildProjectReferences=false if we can ensure all referenced projects exist (see the sketch below):
https://msdn.microsoft.com/en-us/library/bb629394.aspx?f=255&MSPPError=-2147217396
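
A hedged sketch of such an invocation (project path illustrative, assuming its referenced projects were already built in an earlier step):

msbuild tf_core_kernels.vcxproj /p:Configuration=Release /p:BuildProjectReferences=false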

@Neiko2002
Member Author

Neiko2002 commented Aug 15, 2018

Compiling tf_core_kernels twice in a row and disabling the project references (/p:BuildProjectReferences=false) for the second run reduces the build time on 2 CPU cores from 3h 37min to 3h 17min. To lower it further we would need to divide the project (tf_core_kernels.vcxproj): in the first half only some of the files get compiled, and the remaining files plus linking everything are done in the second half.
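
One possible shape for such a split, as a hedged sketch only (it assumes the SelectedFiles property of the VC++ targets can be used to compile a subset of sources, and that the object files survive between the two CI jobs; the file list is a placeholder):

# job 1: compile only part of the sources, no linking
msbuild tf_core_kernels.vcxproj /t:ClCompile /p:Configuration=Release /p:SelectedFiles="first_half_of_the_cc_files"
# job 2: compile the rest and link, without rebuilding project references
msbuild tf_core_kernels.vcxproj /p:Configuration=Release /p:BuildProjectReferences=false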

@saudet
Member

saudet commented Aug 16, 2018

Sounds good! Does this mean that we can build tensorflow_static in less than 4 hours? If so, that might be enough.

@saudet
Member

saudet commented Aug 16, 2018

It still doesn't seem to build under 4 hours here on AppVeyor with 2 cores though:
https://ci.appveyor.com/project/Bytedeco/javacpp-presets/build/612

@Neiko2002
Member Author

No, just changing the CL_MPCount flag does not help much. But I will prepare a PR with a partial TensorFlow build. The first part builds most of the vcxproj projects (incl. the Python API and GPU kernels) in around 2h. The second part creates the missing tf_core_kernels and tensorflow_static in 3h 20min in my tests (using only two cores with CL_MPCount).

@saudet
Member

saudet commented Aug 21, 2018

Some more uncoolness: it looks like we can't get administrator rights on Microsoft-hosted agents for VSTS: https://github.com/IvanBoyko/vsts-install-MSI/issues/3#issuecomment-342798108
https://mohitgoyal.co/2017/08/18/install-powershell-modules-on-hosted-agent-in-vsts-visual-studio-team-services/
Makes it very hard to get anything done...

@saudet
Member

saudet commented Oct 4, 2018

@vb216 found an interesting thread on TensorFlow's repo: tensorflow/tensorflow#10521

It looks like building without __forceinline for Eigen in conv_ops works around the slow build issue (see the sketch below). I'll be testing that, and since we're building with MKL-DNN, we might not even incur any performance hit.
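
For reference, the workaround discussed in that thread amounts to defining EIGEN_STRONG_INLINE as plain inline for MSVC; a hedged sketch of how it could be passed to the CMake build (placement of the flag in cppbuild.sh may differ):

# define EIGEN_STRONG_INLINE as inline so MSVC does not force-inline Eigen code in conv_ops
cmake -DCMAKE_CXX_FLAGS="/DEIGEN_STRONG_INLINE=inline" <other options> <source dir>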

@saudet
Member

saudet commented Oct 16, 2018

In the end, it looks like the best option available out there is still AppVeyor. With the workaround above, we're able to build on 4-core VMs in about 3:30 hours for everything, and about 1:45 hours without CUDA. They also added support for Linux recently and plan to support Mac as well, so this is looking promising. Still, thanks for your time on this @Neiko2002! Much appreciated :)

@saudet saudet closed this as completed Oct 16, 2018
@Neiko2002
Member Author

With Linux and Mac support they would be the first to cover all major operating systems. This could reduce future workload for the project.
