Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memoization function #1489

Merged
1 commit merged into from
Dec 12, 2018
Merged

Memoization function #1489

1 commit merged into from
Dec 12, 2018

Conversation

rudihorn
Copy link

@rudihorn rudihorn commented Oct 23, 2018

This is referencing the computation model specified in #1280. The goal has been to implement a memoization function which takes a computation of the form 'a -> 'b Fiber.t and returns a computation 'a -> 'b Fiber.t which additionally performs:

  • Dependency tracking
  • Output caching with early cut-off
  • Cycle detection

Currently the memoization function has been implemented and is able to perform these three functions. An example of how to use the memoization functions is shown in test/unit-tests/memoize.mlt. Below is an example of how one could use the system to implement a Fibonacci sequence which is automatically cached between computations.

let mfib = 
  let mfib = CRef.deferred () in
  let compfib x =
    let mfib = CRef.get mfib in
    counter := !counter + 1;
    if x <= 1 then
      Fiber.return x
    else
      mfib (x - 1)
      >>= (fun r1 ->
        mfib (x - 2)
        >>| fun r2 -> r1 + r2) in
  memoization iimemcached "fib" int_input_spec int_output_spec compfib |> CRef.set mfib;
  CRef.get mfib;;

There has been a little bit of progress to change build_system.ml so that the actual build process is a little more abstracted to make it easier to drop in the memoization function and remove the current caching etc. This bit is still a little bit of a mess however, and in essence it just breaks existing caching (for now).

Feedback welcome.

@rudihorn rudihorn requested a review from a user October 23, 2018 09:24
src/memoization.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved

type 'a input_spec = {
serialize : 'a -> ser_input;
print : 'a -> string;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using print : 'a -> Sexp.t seems better here, so that we leave the formatting to the caller

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently I'm only using serialize, but this should probably be changed to also include the "human readable" form. I'm wondering if it is better to store this as a lazy function (which would prevent the input from being garbage collected) or as the printed input (which would require computation of print for every function).

src/memoization.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved
@rudihorn rudihorn force-pushed the memoization branch 2 times, most recently from 9fd28cf to 6683fcb Compare November 1, 2018 15:41
runtestmem.sh Outdated Show resolved Hide resolved
src/build_system.ml Outdated Show resolved Hide resolved
src/build_system.ml Outdated Show resolved Hide resolved
| Some file -> (* build *) build_file_spec file >>| ignore)
) |> Fiber.with_error_handler ~on_error

let _prepare_file t path =
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dead code, let's delete it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code may be nice to register as computation at a later stage, even if it isn't used currently in the code. It essentially builds a files dependencies without building the file itself.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Could you add a comment? Otherwise, future readers will wonder why it's here.

@@ -448,6 +385,12 @@ type t =
; hook : hook -> unit
; (* Package files are part of *)
packages : Package.Name.t Path.Table.t
; load_dir_cache : Path.Set.t Memoize.t
(* memoized functions *)
; prepare_rule_def : (Internal_rule.t * Static_deps.t * (unit, Action.t) Build.t, Action.t * Deps.t) CRef.comp
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about putting Static_deps.t inside Internal_rule.t?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now fixed this, but it does mean that rule.static_deps (which is defined as lazy) is evaluated twice, once in prepare_rule and once in build_rule. Otherwise one would need to memoize this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, evaluating it twice doesn't seem good. We should keep it separate then

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to introduce a Once module, which similarly to lazy only evaluates something when Once.get is called, but that also caches the result so Once.get only causes the expression to be evaluated a single time.

src/memoization.mli Outdated Show resolved Hide resolved
src/memoization.mli Outdated Show resolved Hide resolved
src/memoization.mli Outdated Show resolved Hide resolved
src/memoization.mli Outdated Show resolved Hide resolved
src/memoization.mli Outdated Show resolved Hide resolved
src/stdune/id.mli Outdated Show resolved Hide resolved
src/stdune/id.mli Outdated Show resolved Hide resolved
src/stdune/id.mli Outdated Show resolved Hide resolved
src/stdune/id.ml Outdated Show resolved Hide resolved
src/stdune/id.ml Outdated Show resolved Hide resolved
src/stdune/id.mli Outdated Show resolved Hide resolved
src/stdune/id.mli Show resolved Hide resolved
src/stdune/once.mli Outdated Show resolved Hide resolved
@rudihorn rudihorn force-pushed the memoization branch 2 times, most recently from 8526b2e to 734fcdf Compare November 8, 2018 18:19
@rudihorn rudihorn mentioned this pull request Nov 9, 2018
@rgrinberg
Copy link
Member

@rudihorn Once this is ready to be merged, could you remove the WIP tag? I'd like to give this a read when it's ready. To make sure I understand the basics.

@rudihorn rudihorn changed the title [WIP] Memoization function Memoization function Nov 12, 2018
src/memoization.ml Outdated Show resolved Hide resolved
src/stdune/list.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved
src/memoization.ml Outdated Show resolved Hide resolved
src/memoization_cached.ml Outdated Show resolved Hide resolved
src/build_system.ml Outdated Show resolved Hide resolved
@ghost
Copy link

ghost commented Nov 26, 2018

I did a bit more work on this PR and I now get very similar timings between the base and tip of this PR. For null builds of the platform, there is now only a 0.1s slowdown, which is likely due to the new cycle detection algorithm. This is fine given that the one in master is broken. We can now rebase and merge this PR.

@ghost ghost mentioned this pull request Nov 27, 2018
@ghost ghost force-pushed the memoization branch 2 times, most recently from 6058c84 to 9251fb7 Compare November 27, 2018 12:05
@ghost
Copy link

ghost commented Nov 27, 2018

@rudihorn there is an issue with the signature, could you update the email in author field of the commit and force-push?

@rgrinberg @emillon could you have another look as I changed a couple of things (and @rudihorn as well if you want).

Copy link
Collaborator

@emillon emillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have a couple minor comments but I look forward to having this merged 👍

src/dag/dag.ml Outdated Show resolved Hide resolved
src/dag/dag.ml Outdated Show resolved Hide resolved
src/dag/dag.ml Outdated Show resolved Hide resolved
src/dag/dag.ml Outdated Show resolved Hide resolved
src/dag/dag_intf.ml Outdated Show resolved Hide resolved
src/memo/memo.mli Outdated Show resolved Hide resolved
src/memo/memo.mli Outdated Show resolved Hide resolved
src/memo/memo_intf.ml Outdated Show resolved Hide resolved
src/static_deps.mli Show resolved Hide resolved
src/stdune/fdecl.ml Show resolved Hide resolved
src/top_closure.ml Outdated Show resolved Hide resolved
src/build_system.ml Outdated Show resolved Hide resolved
src/build_system.ml Outdated Show resolved Hide resolved
@rgrinberg
Copy link
Member

Bootstrap seems to be broken now.

@rgrinberg
Copy link
Member

@diml feel free to merge after fixing bootstrap. I'd still like to study some parts of this PR, but I don't think we should delay merging. Rebasing these pr's over and over is a waste of time.

@ghost
Copy link

ghost commented Nov 28, 2018

Ok, following the slack conversation, I'll wait for 1.6 and then merge.

@ghost ghost force-pushed the memoization branch 4 times, most recently from 5bc4698 to 3c675b8 Compare December 12, 2018 09:45
Introduce a general system for memoizing functions. The core of Dune
is refactored on top of this system. The new system will allow to
share not only the result of external commands between builds but also
the result of internal computations. The system automatically keeps
tracks of the effects performed by computations in order to detect
when it is safe to reuse previous results.

The intents of this change is to make incremental builds faster,
especially in polling mode. The new system is more powerful as well
and will allow us to use the result of commands for computing rules.

This patch also re-implement cycle detection as we realized that the
previous algorithm was broken and the problem showed up more often in
the new system. The new algorithm was taken from the following paper:

  Michael A. Bender, Jeremy T. Fineman, Seth Gilbert, and Robert
  E. Tarjan. 2015. A New Approach to Incremental Cycle Detection and
  Related Problems. ACM Trans. Algorithms 12, 2, Article 14 (December
  2015), 22 pages. DOI: https://doi.org/10.1145/2756553 *)

Signed-off-by: Rudi Horn <dyn-git@rudi-horn.de>
Signed-off-by: Jeremie Dimino <jeremie@dimino.org>
@ghost ghost merged commit a687489 into ocaml:master Dec 12, 2018
rgrinberg added a commit to rgrinberg/opam-repository that referenced this pull request Feb 12, 2019
CHANGES:

- Second step of the deprecation of jbuilder: the `jbuilder` binary
  now emits a warning on every startup and both `jbuilder` and `dune`
  emit warnings when encountering `jbuild` files (ocaml/dune#1752, @diml)

- Change the layout of build artifacts inside _build. The new layout enables
  optimizations that depend on the presence of `.cmx` files of private modules
  (ocaml/dune#1676, @bobot)

- Fix merlin handling of private module visibility (ocaml/dune#1653 @bobot)

- unstable-fmt: use boxes to wrap some lists (ocaml/dune#1608, fix ocaml/dune#1153, @emillon,
  thanks to @rgrinberg)

- skip directories when looking up programs in the PATH (ocaml/dune#1628, fixes
  ocaml/dune#1616, @diml)

- Use `lsof` on macOS to implement `--stats` (ocaml/dune#1636, fixes ocaml/dune#1634, @xclerc)

- Generate `dune-package` files for every package. These files are installed and
  read instead of `META` files whenever they are available (ocaml/dune#1329, @rgrinberg)

- Fix preprocessing for libraries with `(include_subdirs ..)` (ocaml/dune#1624, fix ocaml/dune#1626,
  @nojb, @rgrinberg)

- Do not generate targets for archive that don't match the `modes` field.
  (ocaml/dune#1632, fix ocaml/dune#1617, @rgrinberg)

- When executing actions, open files lazily and close them as soon as
  possible in order to reduce the maximum number of file descriptors
  opened by Dune (ocaml/dune#1635, ocaml/dune#1643, fixes ocaml/dune#1633, @jonludlam, @rgrinberg,
  @diml)

- Reimplement the core of Dune using a new generic memoization system
  (ocaml/dune#1489, @rudihorn, @diml)

- Replace the broken cycle detection algorithm by a state of the art
  one from [this paper](https://doi.org/10.1145/2756553) (ocaml/dune#1489,
  @rudihorn)

- Get the correct environment node for multi project workspaces (ocaml/dune#1648,
  @rgrinberg)

- Add `dune compute` to call internal memoized functions (ocaml/dune#1528,
  @rudihorn, @diml)

- Add `--trace-file` option to trace dune internals (ocaml/dune#1639, fix ocaml/dune#1180, @emillon)

- Add `--no-print-directory` (borrowed from GNU make) to suppress
  `Entering directory` messages. (ocaml/dune#1668, @dra27)

- Remove `--stats` and track fd usage in `--trace-file` (ocaml/dune#1667, @emillon)

- Add virtual libraries feature and enable it by default (ocaml/dune#1430 fixes ocaml/dune#921,
  @rgrinberg)

- Fix handling of Control+C in watch mode (ocaml/dune#1678, fixes ocaml/dune#1671, @diml)

- Look for jsoo runtime in the same dir as the `js_of_ocaml` binary
  when the ocamlfind package is not available (ocaml/dune#1467, @nojb)

- Make the `seq` package available for OCaml >= 4.07 (ocaml/dune#1714, @rgrinberg)

- Add locations to error messages where a rule fails to generate targets and
  rules that require files outside the build/source directory. (ocaml/dune#1708, fixes
  ocaml/dune#848, @rgrinberg)

- Let `Configurator` handle `sizeof` (in addition to negative numbers).
  (ocaml/dune#1726, fixes ocaml/dune#1723, @Chris00)

- Fix an issue causing menhir generated parsers to fail to build in
  some cases. The fix is to systematically use `-short-paths` when
  calling `ocamlc -i` (ocaml/dune#1743, fix ocaml/dune#1504, @diml)

- Never raise when printing located errors. The code that would print the
  location excerpts was prone to raising. (ocaml/dune#1744, fix ocaml/dune#1736, @rgrinberg)

- Add a `dune upgrade` command for upgrading jbuilder projects to Dune
  (ocaml/dune#1749, @diml)

- When automatically creating a `dune-project` file, insert the
  detected name in it (ocaml/dune#1749, @diml)

- Add `(implicit_transitive_deps <bool>)` mode to dune projects. When this mode
  is turned off, transitive dependencies are not accessible. Only listed
  dependencies are directly accessible. (ocaml/dune#1734, ocaml/dune#430, @rgrinberg, @hnrgrgr)

- Add `toplevel` stanza. This stanza is used to define toplevels with libraries
  already preloaded. (ocaml/dune#1713, @rgrinberg)

- Generate `.merlin` files that account for normal preprocessors defined using a
  subset of the `action` language. (ocaml/dune#1768, @rgrinberg)

- Emit `(orig_src_dir <path>)` metadata in `dune-package` for dune packages
  built with `--store-orig-source-dir` command line flag (also controlled by
  `DUNE_STORE_ORIG_SOURCE_DIR` env variable). This is later used to generate
  `.merlin` with `S`-directives pointed to original source locations and thus
  allowing merlin to see those. (ocaml/dune#1750, @andreypopp)

- Improve the behavior of `dune promote` when the files to be promoted have been
  deleted. (ocaml/dune#1775, fixes ocaml/dune#1772, @diml)

- unstable-fmt: preserve comments (ocaml/dune#1766, @emillon)

- Pass flags correctly when using `staged_pps` (ocaml/dune#1779, fixes ocaml/dune#1774, @diml)

- Fix an issue with the use of `(mode promote)` in the menhir
  stanza. It was previously causing intermediate *mock* files to be
  promoted (ocaml/dune#1783, fixes ocaml/dune#1781, @diml)

- unstable-fmt: ignore files using OCaml syntax (ocaml/dune#1784, @emillon)

- Configurator: Add `which` function to replace the `which` command line utility
  in a cross platform way. (ocaml/dune#1773, fixes ocaml/dune#1705, @Chris00)

- Make configurator append paths to `$PKG_CONFIG_PATH` on macOS. Previously it
  was prepending paths and thus `$PKG_CONFIG_PATH` set by users could have been
  overridden by homebrew installed libraries (ocaml/dune#1785, @andreypopp)

- Disallow c/cxx sources that share an object file in the same stubs archive.
  This means that `foo.c` and `foo.cpp` can no longer exist in the same library.
  (ocaml/dune#1788, @rgrinberg)

- Forbid use of `%{targets}` (or `${@}` in jbuild files) inside
  preprocessing actions
  (ocaml/dune#1812, fixes ocaml/dune#1811, @diml)

- Add `DUNE_PROFILE` environment variable to easily set the profile. (ocaml/dune#1806,
  @rgrinberg)

- Deprecate the undocumented `(no_keep_locs)` field. It was only
  necessary until virtual libraries were supported (ocaml/dune#1822, fix ocaml/dune#1816,
  @diml)

- Rename `unstable-fmt` to `format-dune-file` and remove its `--inplace` option.
  (ocaml/dune#1821, @emillon).

- Autoformatting: `(using fmt 1.1)` will also format dune files (ocaml/dune#1821, @emillon).

- Autoformatting: record dependencies on `.ocamlformat-ignore` files (ocaml/dune#1824,
  fixes ocaml/dune#1793, @emillon)
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants