Replacing AppVeyor with another CI for windows #568
Sure, but build times for TensorFlow are longer than 2 hours. I don't know of any service that supports that, do you? |
Not yet, but I might try Shippable to build the TensorFlow libs using the cppbuild.sh, in order to see if it's useful for us. This ticket is just for exchanging experiences with various CI service providers. |
It looks like Shippable also has a limit of 2 hours, just like AppVeyor: @vb216 Would you have any suggestions? |
Do you know how long it would need to complete? They might be able to increase it a bit more, either free or commercially?
Alternatively, can it be broken down into two smaller tasks? AppVeyor were working on the ability to share artifacts between builds, so if you could compile half under the time limit, share to the next build task, and resume the build there, that might work? I think AppVeyor had fixed IP addresses on their build nodes too, so potentially you could whitelist that IP on an AWS S3 store and push prebuilt work there too (if cache sharing locally still doesn't work).
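(A rough sketch of how that S3 handoff between two build tasks could look. The bucket name, archive name, and the BUILD_STAGE variable are all hypothetical, and it assumes the AWS CLI is available on the AppVeyor image.)

```bash
#!/bin/bash
# Hypothetical split-build handoff: stage 1 compiles part of the tree and
# uploads it, stage 2 downloads it and resumes. All names below are made up.
set -e

BUCKET=s3://example-javacpp-ci-cache
ARCHIVE=tensorflow-windows-x86_64-partial.tar.gz

if [ "$BUILD_STAGE" = "1" ]; then
    # ... build roughly half of the projects here, under the 2 h limit ...
    tar czf "$ARCHIVE" tensorflow/cppbuild
    aws s3 cp "$ARCHIVE" "$BUCKET/$ARCHIVE"
else
    # second build task: restore the partial tree and keep going
    aws s3 cp "$BUCKET/$ARCHIVE" "$ARCHIVE"
    tar xzf "$ARCHIVE"
    # ... build the remaining projects ...
fi
```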
|
We might be able to get 4 hours if we start paying for it? Maybe, we'd have to ask... |
Want me to drop them a line and cc you?
|
Sure, that'd be great! Thanks |
Compiling the CUDA kernels takes a long time on Windows. Just the tf_core_gpu_kernels.vcxproj project of TensorFlow takes over 2 hours on my machine. Creating the static libs (tensorflow_static.vcxproj) afterwards needs another hour. There are of course many dependency projects which are included in the aforementioned projects. We could run those separately and store their results beforehand, to reduce the compilation time of the bigger projects. @vb216 If you ask Shippable, you might also want to ask AppVeyor. |
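(A minimal sketch of what that split could look like. Only tf_core_gpu_kernels.vcxproj and tensorflow_static.vcxproj are taken from the comment above; the build directory and the dependency project name are placeholders, since the exact layout depends on TensorFlow's CMake output.)

```bash
# Illustrative only: build the cheap dependency projects first so their
# outputs can be cached, then build the expensive targets in a later CI job.
cd /path/to/tensorflow/cmake-build   # wherever the CMake-generated projects live

# a dependency project whose output could be stored beforehand (placeholder name)
msbuild.exe some_dependency.vcxproj -p:Configuration=Release -m

# the projects reported above to take 2+ hours and about 1 hour respectively
msbuild.exe tf_core_gpu_kernels.vcxproj -p:Configuration=Release
msbuild.exe tensorflow_static.vcxproj -p:Configuration=Release
```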
Ah I meant asking Appveyor - they've been pretty kind in the past increasing from their default 60 min build to about 110 mins nowadays. They replied back already too, suggesting this https://www.appveyor.com/docs/build-environment/#private-build-cloud as an option - would need to be on their premium service but they discount at 50% for open source projects. Seems pretty reasonable. I guess maybe there's the cost of the cloud instance to add in on top of that. Looks like two build engines so overall time would be quicker as well. Only other suggestion I had was going back in the Jenkins sort of direction, as seems like most cloud build providers will have some sort of time limit. But, that seems a step backwards from where it's at right now. |
Yeah, it would be great if we could continue using cloud services like that. I have no problems with paying a small fee, but any ideas why those places can't provide builds longer than 2 hours even to paying customers? I feel that managing a private build cloud wouldn't give us much over Jenkins... |
Not sure why, but it does seem a common time limit. I guess they're getting billed for the time they spin up VM instances, so maybe they take a view of the worst-case cost incurred when users hit that upper time limit? Plus it prevents never-ending jobs costing them a lot just because they've hung.
And you're right, it does drive up the management effort when none of us has a ton of spare time to work on this. A slight benefit over Jenkins is that it sounds like they provide some software drop to go on your cloud at least.
|
Status update: It sounds like with a Premium plan AppVeyor should be able to provide us with long enough build times. @Neiko2002 Have you been able to find anything else? |
@saudet The list is quite complete, I think, but only for free services. Back then I didn't know you were also considering paid ones. There are many more of those. |
@Neiko2002 In your opinion, what would be the best ones? |
@Neiko2002 @vb216 With Python support (pull #596) and CUDA it would take about 7 hours to build on AppVeyor. Also, I still haven't been able to figure out why it insists on reporting an exit code of 259, so partial builds don't appear practical either. In any case, AppVeyor wasn't designed for long builds like that, so is our only option here to do it manually with Jenkins or something? @wumo If you have any ideas as well, please let us know! |
Microsoft-hosted build agents look like a potential solution:
https://docs.microsoft.com/en-us/vsts/pipelines/agents/hosted?view=vsts Anyone interested in giving this a try? I'll be testing the build on a Standard_DS3_v2 at least, to make sure it finishes in less than 6 hours. (They even have support for Linux and Mac! But Travis CI works well for those platforms, so I'm not thinking about changing anything there...) |
Sounds interesting. They provide multiple CPU cores + SSD-based temp storage. I've just started building TensorFlow again to check #596 and will be trying out small incremental builds to see how long the different parts need to compile. |
@Neiko2002 Thanks! Oops, I think I've already done the work for more incremental builds, see the update. |
I've tested the build on a Standard DS3 v2 (4 vcpus, 14 GB memory) instance on Azure with Windows 2012 R2, Visual Studio 2015, and CUDA 9.2. It took about 3 hours for the core (including Python) and additionally just over 1 hour for CUDA, so it's looking very good. We'll need to figure out how to set up VSTS and integrate it with GitHub, but this guide doesn't seem too terse: |
It looks like we're more likely to get a Standard DS2 v2 (2 vcpus, 7 GB memory) though: |
Hum, it takes a bit more than 6.5 hours to build with CUDA (or about 5 hours without) on a Standard DS2 v2 (2 vcpus, 7 GB memory), not cool... |
@Neiko2002 Could you try and see if there wouldn't be a way to split the build into even more increments? Ideally splitting the non-CUDA core build into more or less 3 parts of less than 2 hours each on 2 Xeon cores. |
@saudet I just tried different modules which are needed to build the
I was setting the maxcpucount in the cppbuild.sh to 2, but this option does not work as one would expect:
Cmake activated I will check and create a PR if this works. But until then, all the timings in my figure above are computed on a multi-core processor.
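(For reference, a sketch of the maxcpucount behaviour mentioned above. The switch names are standard MSBuild/Visual C++ options, but the solution name is a placeholder, and CL_MPCount only takes effect when multi-processor compilation is enabled in the projects.)

```bash
# -m / -maxcpucount only sets how many *projects* MSBuild builds in parallel;
# the number of cl.exe processes inside one project is governed by the
# compiler's /MP flag instead (exposed, if I remember right, as the
# CL_MPCount property). So "-m:2" alone will not cap a single big project
# like tf_core_gpu_kernels at two cores.

# at most 2 projects in parallel, each still free to use every core
msbuild.exe tensorflow.sln -m:2 -p:Configuration=Release

# additionally cap each project to 2 parallel compiler processes
msbuild.exe tensorflow.sln -m:2 -p:CL_MPCount=2 -p:Configuration=Release
```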
Compiling
After the message above the Looking even closer with wmic we can see a difference, but does msbuild use such fine-grained timestamps? There are different timestamp resolutions for every file system (100 ns for NTFS). Following the
The file
The problem is The And this is just one example why the construction takes so long. PS: We could use |
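(A sketch of the wmic check mentioned a few lines up, for anyone who wants to compare the sub-second timestamps themselves. The path is just an example; WQL needs the backslashes doubled.)

```bash
# Query a file's full CIM timestamp (down to fractions of a second) to see
# the resolution differences that a plain directory listing hides.
wmic datafile where 'name="C:\\some\\build\\dir\\example.obj"' get LastModified
```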
Compiling |
Sounds good! Does this mean that we can build |
It still doesn't seem to build under 4 hours here on AppVeyor with 2 cores though: |
No just changing the |
Some more uncoolness, it looks like we can't get administrator rights on Microsoft-hosted agents for VSTS: https://github.com/IvanBoyko/vsts-install-MSI/issues/3#issuecomment-342798108 |
@vb216 found an interesting thread on TensorFlow's repo: tensorflow/tensorflow#10521 It looks like building without |
In the end, it looks like the best option available out there is still AppVeyor. With the workaround above, we're able to build on 4-core VMs in about 3:30 hours for everything, and about 1:45 hours without CUDA. They also added support for Linux recently and plan to support Mac as well, so this is looking promising. Still, thanks for your time on this @Neiko2002! Much appreciated :) |
With Linux and Mac, they will be the first to support all major OSes. This could reduce future workload for the project. |
I was wondering if it makes sense to replace AppVeyor with another CI host for Windows builds. Any suitable candidate should be tested with one of the biggest javacpp-presets (e.g. mxnet or tensorflow). We can also stay with AppVeyor and just build tensorflow with another CI.
Here is a small list of CI service providers and some information about them:
https://github.com/bytedeco/javacpp-presets/wiki/Continuous-Integration-(CI)
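(For anyone evaluating a candidate service, a rough sketch of the commands it would need to survive, following the usual javacpp-presets build steps; the platform and module are just examples.)

```bash
git clone https://github.com/bytedeco/javacpp-presets
cd javacpp-presets

# the native build is the step that blows past the 2 hour limit on most hosted CIs
bash cppbuild.sh -platform windows-x86_64 install tensorflow

# then package the presets with Maven
mvn install --projects .,tensorflow -Djavacpp.platform=windows-x86_64
```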