Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HUDL - SPIKE - Adds output caching to cake runtime #1

Closed
wants to merge 6 commits into from

Conversation

codeimpossible
Copy link

@codeimpossible codeimpossible commented Jul 19, 2019

This updates the Roslyn script generator to emit the generated code to a DLL (taking some of the code from cake-build#2099 as inspiration). The DLL is stored in .cache/ underneath the directory the Cake.dll resides in.

If the cached assembly exists it will be loaded and executed, unless the --recompile flag is sent as an arg into cake.

Results

I ran the new cake version against hudl-microservicetemplate and recorded the execution times of a fresh run and a cached run. FYI the paths still show cake 0.33.0 but that is just because I replaced the cake files there with the built ones from this branch (see "How do I test this?" below).

Fresh run

$> dotnet ./.marvel/dev/tasks/tools/cake0.33.0/cake.coreclr/0.33.0/Cake.dll build.cake --target=help --verbosity=Verbose

Task                           Duration
---------------------------------------------------
Hudl Command Line Tools - Help 00:00:00.0358184
Teardown                       00:00:00.6338775
---------------------------------------------------
Total:                         00:00:00.6696959


$> took
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 8
Milliseconds      : 716
Ticks             : 87164761
TotalDays         : 0.000100885140046296
TotalHours        : 0.00242124336111111
TotalMinutes      : 0.145274601666667
TotalSeconds      : 8.7164761
TotalMilliseconds : 8716.4761

Cached run

$> dotnet ./.marvel/dev/tasks/tools/cake0.33.0/cake.coreclr/0.33.0/Cake.dll build.cake --target=help --verbosity=Verbose

Task                           Duration
---------------------------------------------------
Hudl Command Line Tools - Help 00:00:00.0690991
Teardown                       00:00:00.6805018
---------------------------------------------------
Total:                         00:00:00.7496009

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 548
Ticks             : 15487039
TotalDays         : 1.79248136574074E-05
TotalHours        : 0.000430195527777778
TotalMinutes      : 0.0258117316666667
TotalSeconds      : 1.5487039
TotalMilliseconds : 1548.7039

How do I test this?

  • Clone the repo locally, then run ./build.ps1 from the root
  • Take the contents of the ./artifacts/v0.35.0-HUDL-SpikeAddAss0001/Cake-bin-coreclr-v0.35.0-HUDL-SpikeAddAss0001.zip file and replace the contents of cake.coreclr in your marvel projects tools directory. Then you can run dotnet <path/to/cake.dll> <path/to/build.cake> --target=<TARGET> --verbosity=Diagnostic --cache_enabled=true to run the new cake version with the verbose logs turned on for more context.

Updates to cache invalidation

The cache is now invalidated by cake script changes. This adds ~2s to the execution time in my testing (because the analyzer has to run and hash all the script files). Still, reducing pre-script "build time" from 10s to 3s is still pretty impressive.

I updated the code so that the cache can be enabled via the built-in cake configuration, either with CAKE_CACHE_ENABLED=true and CAKE_CACHE_FORCE_RECOMPILE=true environment variables, --cache_enabled=true --cache_force_recompile=true args, or by adding the following to the cake.config:

[Cache]
Enabled = true
Force_Recompile = true

Default values for both of these are false.

@codeimpossible codeimpossible added the enhancement New feature or request label Jul 19, 2019
@jamiegs
Copy link

jamiegs commented Jul 19, 2019

It seems like with this if you make code changes to the cake files, you would need to run it --recompile is their any quick ways we could determine if the code has changed and recompile it on next run without --recompile?

@codeimpossible
Copy link
Author

yeah i thought about that and i figured that for us at least that would only impact folks developing on the scripts, end users wouldn't really care or would rarely need to recompile IMO.

We could compute a hash of the input file contents and use that as part of the path to the compiled assembly. That might add some complexity to the code since we'd have to compute the hash in one section and pass it down into the roslyn generator.

I'm not sure we'd need to do this for Hudls use, but it's something that seems necessary before we submit back to cake proper.

@jamiegs
Copy link

jamiegs commented Jul 19, 2019

I found this RFC posted in their repo about assembly caching. You might want to check it to see if there's anything that would need to be different to match it. I'm guessing they wouldn't accept the change if it doesn't follow the RFC.

@codeimpossible
Copy link
Author

codeimpossible commented Jul 19, 2019

Yeah i did see the RFC -> https://github.com/cake-build/rfcs/pull/2/files#diff-9b01d3179a039b60e4fcdb5e273d48c2 and read through it, I think we're following it pretty closely already, except for the recompile issue you mentioned and the fact that they would want compilation to be opt-in rather than default like it is in our case here. Those two changes could be made pretty easily,

  • add a --allow-cache option to enable the saving of the cached dll.
  • use a hash of the input file contents as part of the directory path to the cached dll.
  • determine if we can load reference dlls in the same order they are referenced.

@jamiegs
Copy link

jamiegs commented Jul 19, 2019

I'm not sure we'd need to do this for Hudls use, but it's something that seems necessary before we submit back to cake proper.

Well, one thing I could see happening is:

  • Dev runs run-backend, cake assemblies get cached.
  • Marmalade release new version of run-backend to support a breaking change with locations of files.
  • Sometime later Dev gets latest master
  • Dev runs run-backend, it still uses cached version, which is broken because it doesn't have change.

@codeimpossible
Copy link
Author

Oh, yeah thats a good case. One other thing is this comment on the RFC:

https://github.com/cake-build/rfcs/pull/2/files#r229862533

Caching compilation is not alone enough. We must also cache registered tools resolved via #tool directive. Addins (and their dependencies) must also be loaded into app domain / load context in same order as the first time when resolved via #addin directive.

We're doing some of this now, all the DLLs that are references in the compilation are stored with the cached dll, but we're not loading them in a particular order, that hasn't seemed to matter in our usage, but if it did i could see that being complicated, and probably requiring us to rename the files or find some other way to enforce their ordering.

@@ -30,6 +30,10 @@ internal sealed class RoslynScriptSession : IScriptSession

public HashSet<string> Namespaces { get; }

public bool SupportsCachedExecution => true;

public bool IsCacheValid => System.IO.File.Exists(GetCachedAssemblyPath()) && !_options.ForceRecompile;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like very aggressive caching. Unless I'm missing something, if a newer version of Cake isn't installed, updates to the dev scripts will get ignored unless the explicit force recompile option is passed. How do we ensure updates take effect on a dev's machine?

Could we time how long the analysis step takes? It feels like a hash of the result from this could be a more robust stage to cache at.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah @jamiegs was just mentioning this. I think we can compute a hash of the files much like we do in CICD for output caching. I think the analyzer does take some time to run, getting a benchmark of it would be a good next step.

cache is invalidated by file content changes
@codeimpossible
Copy link
Author

The cache is now invalidated by cake script changes. This adds ~2s to the execution time in my testing (because the analyzer has to run and hash all the script files). Still, reducing pre-script "build time" from 10s to 3s is still pretty impressive.

I updated the code so that the cache can be enabled via the built-in cake configuration, either with CAKE_CACHE_ENABLED=true and CAKE_CACHE_FORCE_RECOMPILE=true environment variables, --cache_enabled=true --cache_force_recompile=true args, or by adding the following to the cake.config:

[Cache]
Enabled = true
Force_Recompile = true

Default values for both of these are false.

@chrisgilbert
Copy link

This is great work Jared! I'm going to give it a try today. I'm curious if just checking file modification times against the time the assembly was built would be good enough to invalidate the cache. That should be less expensive than hashing.

@damtur
Copy link

damtur commented Jul 22, 2019

I've run into 3 problems

  1. global.json contains specific SDK version (2.1.701). Which makes it hard to work with. Maybe consider removing it? For gitBash it was just failing.

  2. After I've removed version from global.json trying this in GitBash for windows:

damian.turczynski@DTURCZYNSKI17L /c/hudl/git/cake develop
$ ./build.sh
Installing .NET CLI...
dotnet_install: Error: OS name could not be detected: UName = MINGW64_NT-10.0
Installing Cake 0.34.1...
You can invoke the tool using the following command: dotnet-cake
Tool 'cake.tool' (version '0.34.1') was successfully installed.
A fatal error occurred, the required library hostfxr.dll could not be found.
If this is a self-contained application, that library should exist in [C:\hudl\git\cake\tools\.store\cake.tool\0.34.1\cake.tool\0.34.1\tools\netcoreapp2.1\any\].
If this is a framework-dependent application, install the runtime in the default location [C:\Program Files\dotnet] or use the DOTNET_ROOT environment variable to specify the runtime location.
  1. Trying from PowerShell
  • First it installed dotnet 2.1.701
  • Then:
The tool package could not be restored.
Tool 'gitversion.tool' failed to install. This failure may have been caused by:

* You are attempting to install a preview release and did not use the --version option to specify the version.
* A package by this name was found, but it was not a .NET Core tool.
* The required NuGet feed cannot be accessed, perhaps because of an Internet connection problem.
* You mistyped the name of the tool.
dotnet exited with 1
Could not find any relevant files for tool 'GitVersion.Tool'. Perhaps you need an include parameter?
Error: Failed to install tool 'GitVersion.Tool'.

And all of that was on develop... :/

EDIT:
After installing GitVersion.Tool manually: (dotnet tool install --global GitVersion.Tool --version 4.0.1-beta1-58)

I'm getting error
This repository indicated that all its packages are repository signed; however, this package is unsigned

EDIT 2:
After removing line
Sources = new [] { "https://api.nuget.org/v3/index.json" }, from build.cake it started to work. I think my Hudl Nuget settings interfered with this. I'm now compiling the branched version.

@damtur
Copy link

damtur commented Jul 22, 2019

After a successful build of Cake-bin-coreclr-v0.35.0-HUDL-SpikeAddAss0001 and putting it into hudl-microservicetemplate and unzipping it to the old 0.33.0 folder I don't see any speed improvements.

As per the discussion above, I think checking the file modification date should be enough.

@codeimpossible
Copy link
Author

The hashing portion of the code doesnt add much to the execution time since cake already compiles a list containing the lines for each script file. I can benchmark checking the most recent modification date against the dll date

Copy link

@lewislabs lewislabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this and I do have an issue where AWSDK.Core failes to load, when using the cached script.dll. However this caching approach is definitely worth pursuing further. It feels much nicer to get quick feedback when the tool is running.

@codeimpossible
Copy link
Author

codeimpossible commented Jul 22, 2019

From chatting with the team this morning it seems there might be a few more kinks to work out in the PR before we finalize:

@codeimpossible
Copy link
Author

This is looking like it's in a good place. I'm going to close this and open a new PR against the parent project, i can't see another way to do this within the GitHub ui.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants