
Virtual libraries #921

Closed
ghost opened this issue Jun 28, 2018 · 33 comments · Fixed by ocaml/opam-repository#13444
Assignees
Labels
accepted (accepted proposals), proposal (RFCs that are awaiting discussion to be accepted or rejected)

Comments

@ghost

ghost commented Jun 28, 2018

Virtual libraries are part of the library variants proposal. They are much simpler to implement than the full proposal, yet remain compatible with it and cover its most useful part, so we should start with them.

User facing features

The user may write (virtual_modules a b c) in the stanza of a library named foo. This means that modules a, b and c must have a .mli but no .ml, and will be implemented by another library. A library with at least one virtual module is called a virtual library.

Another library may have (implements foo) in its stanza. This library must provide .ml files for a, b and c but no .mli files. Additionally, it must not have .ml or .mli files for any non-virtual modules of foo. Such a library is called an implementation of foo. An implementation can have a public_name but must not have a name field, since its name must be the same as that of the library it implements.
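As a concrete sketch of the two stanzas above (library names are hypothetical):

```
; dune file of the virtual library
(library
 (name foo)
 (virtual_modules a b c))

; dune file of an implementation, in another directory
(library
 (public_name foo-unix)
 (implements foo))
```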

For every virtual library an executable depends on, it must depend on exactly one implementation.
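An executable would then select one implementation explicitly in its libraries field (names hypothetical):

```
(executable
 (name main)
 ; foo is virtual; foo-unix is the single implementation we link against
 (libraries foo foo-unix))
```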

One open question is how to refer to implementations when they don't have a public_name. One possibility is to allow an implementation_name field in implementations, or a local_names field in all libraries. The latter seems better.

Implementation

virtual_modules can be implemented the same way as modules_without_implementations. Additionally for a virtual library we must not produce any archive file (.cma, .a, ...). The list of virtual modules must be kept around in Lib.t and in the generated META file.

When compiling an implementation, we should copy all the artifacts of the virtual library over and do the rest as usual.

When computing the transitive closure for an executable, we must proceed as follows:

  1. compute the transitive closure of libraries (as before)
  2. check that for every virtual library in the list we have exactly one corresponding implementation
  3. recompute the transitive closure, but this time implicitly replace each dependency on a virtual library with a dependency on the corresponding implementation
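The three steps could be sketched roughly as follows (OCaml-flavoured pseudocode; transitive_closure, find_impls and virtual_libs are hypothetical helpers, not dune's actual API):

```ocaml
(* Pass 1: ordinary transitive closure of the executable's dependencies. *)
let closure = transitive_closure direct_deps in
(* Check: every virtual library in the closure has exactly one
   implementation present in the closure. *)
let impls = find_impls closure in
List.iter (virtual_libs closure) ~f:(fun v ->
  match Map.find impls v with
  | Some _impl -> ()
  | None -> error "no implementation for virtual library %s" (Lib.name v));
(* Pass 2: recompute the closure, substituting each virtual library by
   its chosen implementation so it lands in the correct link order. *)
transitive_closure direct_deps
  ~subst:(fun lib -> Option.value (Map.find impls lib) ~default:lib)
```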
@rgrinberg
Member

When compiling an implementation, we should copy all the artifacts of the virtual library over and do the rest as usual.

Why is this copying step necessary? We can't just use the include paths of the virtual libraries?

One possibility is to allow an implmenentation_name field in implementations or a local_names field in all libraries. The latter seems better.

What about private_name? Seems more symmetric and possibly useful for executables as well.

@ghost
Author

ghost commented Jul 18, 2018

Why is this copying step necessary? We can't just use the include paths of the virtual libraries?

That should be ok, yes

What about private_name? Seems more symmetric and possibly useful for executables as well.

Well name is already private, so the distinction between the two might not be clear.

@rgrinberg
Member

Then reusing name shouldn't be so bad. I realize that this may be confusing if the implementation is (wrapped true), but that's simply because we don't allow inconsistency between private names and the alias module.

@ghost
Author

ghost commented Jul 18, 2018

Actually, the only thing we need to ensure is that the prefix we use for object files is the name of the virtual library, but the implementation could have its own toplevel module. So reusing name seems fine, yes.

@samoht
Member

samoht commented Jul 18, 2018

Looking forward to using this one :-)

The current workaround is causing a bit of confusion already (see here) so it would be nice to have a proper solution which scales well.

@rgrinberg rgrinberg self-assigned this Jul 21, 2018
@rgrinberg
Member

rgrinberg commented Jul 21, 2018

As a simplification for v1, let's tentatively disallow implementations from being virtual themselves. There is probably no conceptual issue with creating partial implementations, so we'll eventually lift this restriction. But for now this keeps things simple.

@ghost
Author

ghost commented Jul 30, 2018

That seems fine

@rgrinberg
Member

rgrinberg commented Aug 5, 2018

There's a bit of a technical problem with saving module information in Lib.t. As a reminder, this is necessary to verify that implementing libraries actually possess implementations for all virtual modules.

The issue is as follows: we'd basically like to instantiate a Lib.Info.t from Jbuild.Library.t * Dir_contents.Library_modules.t (Library_modules.t would be augmented with virtual modules), but obtaining Library_modules.t requires us to have a Super_context.t. This makes sense because to get the evaluated modules for a library we must evaluate some rules.

I see two potential workarounds for this:

  • The simplest one is to pass the unevaluated list of virtual modules. It's really enough to have just the static list of virtual modules here, and it doesn't matter that we haven't yet checked that these virtual modules don't overlap with intf-only modules, for example (at least at this stage).

  • Refactor Dir_contents to be independent of Super_context. What we'd essentially need is to have some sort of a new "Great_context" type that lies in between Context.t and Super_context.t. This Great_context would be enough to setup rules (such as those in Simple_rules), but it would be oblivious to libraries. Unfortunately, this would require a massive amount of effort, so I'd like to hear some opinions about this issue before tackling it.

@ghost
Author

ghost commented Aug 6, 2018

The first method seems fine.

BTW, we are starting to have a lot of contexts and it's hard to invent new names. The part of Super_context.t that would need to be split out are the library and scope databases. Instead of computing them eagerly in Super_context.create, Super_context.t could have a univ_map and these would be computed in scope.ml. We would just have to invert the dependency between super_context and scope.

@rgrinberg
Member

The first method seems fine.

Actually, it has a bit of an issue that I overlooked, as the list of virtual_modules could contain modules that don't yet exist. The check for such fake modules is done when the OSL.t is evaluated, so there seems to be no way to avoid it. What we can do instead is to store this for every Lib.t:

type virtual_modules =
  | Unevaluated (* internal libs *)
  | Evaluated of Module.Name.t list (* external libs *)

And whenever we encounter (implements internal_lib) in gen_rules, we simply use Dir_contents.get sctx ~dir |> Dir_contents.modules_of_library ~name:"internal_lib" to get the virtual modules out. Not elegant, but serviceable.

BTW, we are starting to have a lot of contexts and it's hard to invent new names. The part of Super_context.t that would need to be split out are the library and scope databases. Instead of computing them eagerly in Super_context.create, Super_context.t could have a univ_map and these would be computed in scope.ml. We would just have to invert the dependency between super_context and scope.

I agree that the name I've proposed is not really serious, but I do think that there might be value in having an abstraction for creating/inspecting rules that is independent of the concept of libraries (as the existence of things like simple_rules seems to suggest).

Do you see other use cases for this univ map in the future? Perhaps plugins? It seems like using a univ map just to break a dependency cycle is overkill and makes the code more "weakly typed". If anything, I'd prefer to make fields type parameters just to break dependency cycles.

@ghost
Author

ghost commented Aug 6, 2018

The type virtual_modules seems fine.

Do you see other use cases for this univ map in the future? perhaps plugins? It seems like using a univ map just to break a dependency cycle is overkill and makes the code more "weakly typed". If anything, I'd prefer to make fields type parameters just to break dependency cycles.

It would allow thinning the Super_context.t type, and then we could add more shared computation easily. It wouldn't cause a runtime error; the databases would be created the first time one tries to access them. In this particular case, we wouldn't win much by using phantom types.

@rgrinberg
Member

Btw, I think it is also necessary for implementors of virtual libraries to include the name of the library they implement in findlib/our own format.

@rgrinberg
Member

Additionally for a virtual library we must not produce any archive file (.cma, .a, ...)

Curious as to why? Is this just not possible?

When computing the transitive closure for an executable, we must proceed as follows:

So I take it that we don't allow direct dependence on implementations.

By the way, the algorithm you've listed is done in 2 passes. Is this just to keep things simple? I don't see any technical reason why we can't do this in 1 pass. We just accumulate a map of virtual libs => implementations and make sure we don't introduce 2 implementations for the same library as we calculate the closure.

In any case, I'll try implementing the simple algorithm that you've listed first.

@ghost
Author

ghost commented Aug 9, 2018

Btw, I think it is also necessary for implementors of virtual libraries to include the name of the library they implement in findlib/our own format.

Agreed

Additionally for a virtual library we must not produce any archive file (.cma, .a, ...)

Curious as to why? Is this just not possible?

Yes, I believe so. It's because when you do the final link, all the compilation units need to be sorted in dependency order. If we create a .cma with holes, we can't do that.

So I take it that we don't allow direct dependence on implementations.

I wrote that a while ago; initially I thought that we could. In fact, until we have variants, I don't see how we can use this feature without explicitly specifying the implementations we are using.

By the way, the algorithm you've listed is done in 2 passes. Is this just to keep things simple? I don't see any technical reason why we can't do this in 1 pass. We just accumulate a map of virtual libs => implementations and make sure we don't introduce 2 implementations for the same library as we calculate the closure.

That's not enough. When a library X depends on a virtual library V, then the implementation I of V must come before X in the final list. However, I might not be in the list of dependencies written by the user, we might only discover it while computing the transitive closure. While doing this, when we reach X we might not know yet who is implementing V. There might be a way to make it work with a single pass, but using 2 passes seems a lot simpler.

@rgrinberg
Member

I wrote that a while ago, but I initially thought that we could. In fact, until we have variants, I don't see how we can use this feature without specifying explicitly the implementations we are using.

Oops, I meant that we don't allow libraries to depend on particular implementations. Only executables are allowed to depend on implementations. Thinking about this again, I'm not really sure why we need this restriction, however.

That's not enough. When a library X depends on a virtual library V, then the implementation I of V must come before X in the final list. However, I might not be in the list of dependencies written by the user, we might only discover it while computing the transitive closure. While doing this, when we reach X we might not know yet who is implementing V. There might be a way to make it work with a single pass, but using 2 passes seems a lot simpler.

I see. Doing it in 1 pass might require us to sort the list of dependencies in the closure anyway, so it's not going to be 1 pass after all.

@rgrinberg
Member

When compiling an implementation, we should copy all the artifacts of the virtual library over and do the rest as usual.

I've been trying to implement this bit and I actually think that the copying would simplify a lot of things. Our code isn't really meant to handle situations where modules are spread over more than one object directory, and I'm leaning against delaying this feature until that problem is addressed. Part of the reason why I'd like to delay such a refactoring is that I'd like to take private modules into account. So I think we should indeed keep things simple and just copy things.

@ghost
Author

ghost commented Aug 13, 2018

Agreed. We can also use symlinks to avoid the copy.

@rgrinberg
Member

In our original scheme, we didn't really consider implementing external virtual libraries, but I think it shouldn't be too hard to support this as well. All we'd really need is to save the dep graph of the virtual libs so that we have the linking order when we build the cmxa in the implementation.

@diml will that work?

@ghost
Author

ghost commented Aug 16, 2018

Yep, that should be enough. We should indeed allow it so that someone can provide an implementation outside of the package itself.

@rgrinberg
Member

Actually, this might not be enough. We're currently not installing .o files, and I think those might be necessary to create the .cmxa/.a files when implementing external libraries. I suppose we can just install the .o files for virtual libraries. Perhaps we can also consider installing them for all libraries to keep things consistent.

@rgrinberg
Member

rgrinberg commented Aug 16, 2018

Here's another issue I'm encountering. This time it's for implementing virtual libraries that are internal. The issue is that we need to access the virtual library's dependency graph, but there's no easy way to do so. There are 2 options:

  • One is to save the dep graph (actually, its arrow) in some table keyed by the virtual libraries. But this will require being careful about loading the rules for virtual libraries before the implementations. This doesn't seem very easy.

  • The second approach is to express the dependency on the dep graph of the virtual library using a normal file dependency. We'd have to serialize the dep graph into a file and then have the implementations load the dep graph from it.

I also considered simply recalculating the graph from scratch in the implementations, since this is possible for internal libraries. But this also requires a hefty amount of boilerplate, and would make implementing external and internal libraries too different for my tastes.

@rgrinberg
Member

For the case where we serialize/deserialize the dep graph of external virtual libraries, we need to decide on the format to use. I think marshalling for installed artifacts is a bad idea, so we should just include it in the installed dune files as a sexp.
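For illustration, the serialized graph in the installed dune file could take a shape like this (the modules_dep_graph field name and layout are hypothetical; only the idea is shown):

```
(library
 (name foo)
 (virtual_modules a b c)
 (modules_dep_graph
  ((a ())        ; a depends on no other module
   (b (a))       ; b depends on a
   (c (a b)))))  ; c depends on a and b
```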

@rgrinberg
Member

Here's another point to consider: virtual libraries without any virtual modules. What's the point? Well, we can customize link flags for libraries in the implementation.

Btw, I've discussed with @avsm the possibility of customizing implementations of virtual libraries just by changing compilation flags. I don't really think that this is possible as we can't recompile external virtual libraries. Implementations allow only link time customizations.

@rgrinberg
Member

Here's another issue I'm encountering. This time it's for implementing virtual libraries that are internal. The issue is that we need to access the virtual libraries dependency graph, but there's no easy way to do so. There are 2 options:

The same problem arises when trying to serialize the dependency graph into the installed dune file. I think that the 2nd approach I've listed above is the easiest.

@ghost
Author

ghost commented Aug 20, 2018

Serializing the dep graph seems best to me as well.

@rgrinberg
Member

Okay, I'll try to make that work. And what about installing .o files? I'm leaning towards just installing them for virtual libraries.

@ghost
Author

ghost commented Aug 20, 2018

That seems right

@rgrinberg
Member

When compiling implementation modules, we need a way to avoid generating .cmi files, since those will be copied from the virtual library. I see no easier way of accomplishing this than adding a field to Module.t. When this field is set, we'd turn on the force_read_cmi behavior. The other alternative is to have a flag on Module_compilation itself to force_read_cmi for all modules. I prefer the former approach because storing such options at the module level makes more sense with other features such as private modules: in the future, private modules in implementations would of course have their own .mli files.
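The Module.t change could look something like this (a sketch only; the real record has more fields):

```ocaml
type t =
  { name : Name.t
  ; impl : Path.t option (* the .ml, if any *)
  ; intf : Path.t option (* the .mli, if any *)
  ; force_read_cmi : bool
    (* when true, compile against the .cmi inherited from the
       virtual library instead of generating a fresh one *)
  }
```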

@rgrinberg
Member

Here's another bit that needs to be saved for external libraries: the module alias name (if it exists). Implementations must share the same module alias names as the virtual libraries they implement. This information isn't available for external libraries. Well, in theory we could probably piece this together by looking at archive names, but that seems pretty fragile. So I suggest that we include the lib.name for virtual libraries in the dune file.

Btw, we should disallow the wrapped field for implementations. The value of this field should always be the same as the virtual library's.
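Putting both points together, the installed dune file for a virtual library would then record something like this (field names hypothetical):

```
(library
 (name foo)
 (wrapped true)           ; implementations must use the same value
 (main_module_name Foo)   ; module alias name shared with implementations
 (virtual_modules a b c))
```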

@ghost
Author

ghost commented Aug 27, 2018

Since in the future, private modules in implementations would of course have their .mli files.

Why is that?

So I suggest that we include the lib.name for virtual libraries in the dune file.

Seems good

Btw, we should disallow the wrapped field for implementations. The value of this field should always be the same as the virtual library's.

Agreed

@rgrinberg
Member

Why is that?

I just assume that it will be useful for complex implementations to have their own private modules to organize things internally. Such private modules aren't part of the interface of the virtual library, so they could have their own mli files.

Longer term, I'd even expect that we'd guarantee that no two private modules belonging to different libraries/virtual libraries/implementations could have colliding names.

By the way, this also raises the question of whether an implementation would be able to access the virtual library's private modules. I think it's simpler to disallow this for now.

@ghost
Author

ghost commented Aug 28, 2018

Yeah, I agree that we should allow implementations to have private modules; however, I thought you suggested that such modules should be required to have a .mli file.

By the way, this also raises the question of whether an implementation would be able to access the virtual library's private modules. I think it's simpler to disallow this for now.

That seems right, especially for the case where the implementation is in another repository.

@rgrinberg
Member

Now present in master.

rgrinberg added a commit to rgrinberg/opam-repository that referenced this issue Feb 12, 2019
CHANGES:

- Second step of the deprecation of jbuilder: the `jbuilder` binary
  now emits a warning on every startup and both `jbuilder` and `dune`
  emit warnings when encountering `jbuild` files (ocaml/dune#1752, @diml)

- Change the layout of build artifacts inside _build. The new layout enables
  optimizations that depend on the presence of `.cmx` files of private modules
  (ocaml/dune#1676, @bobot)

- Fix merlin handling of private module visibility (ocaml/dune#1653 @bobot)

- unstable-fmt: use boxes to wrap some lists (ocaml/dune#1608, fix ocaml/dune#1153, @emillon,
  thanks to @rgrinberg)

- skip directories when looking up programs in the PATH (ocaml/dune#1628, fixes
  ocaml/dune#1616, @diml)

- Use `lsof` on macOS to implement `--stats` (ocaml/dune#1636, fixes ocaml/dune#1634, @xclerc)

- Generate `dune-package` files for every package. These files are installed and
  read instead of `META` files whenever they are available (ocaml/dune#1329, @rgrinberg)

- Fix preprocessing for libraries with `(include_subdirs ..)` (ocaml/dune#1624, fix ocaml/dune#1626,
  @nojb, @rgrinberg)

- Do not generate targets for archive that don't match the `modes` field.
  (ocaml/dune#1632, fix ocaml/dune#1617, @rgrinberg)

- When executing actions, open files lazily and close them as soon as
  possible in order to reduce the maximum number of file descriptors
  opened by Dune (ocaml/dune#1635, ocaml/dune#1643, fixes ocaml/dune#1633, @jonludlam, @rgrinberg,
  @diml)

- Reimplement the core of Dune using a new generic memoization system
  (ocaml/dune#1489, @rudihorn, @diml)

- Replace the broken cycle detection algorithm by a state of the art
  one from [this paper](https://doi.org/10.1145/2756553) (ocaml/dune#1489,
  @rudihorn)

- Get the correct environment node for multi project workspaces (ocaml/dune#1648,
  @rgrinberg)

- Add `dune compute` to call internal memoized functions (ocaml/dune#1528,
  @rudihorn, @diml)

- Add `--trace-file` option to trace dune internals (ocaml/dune#1639, fix ocaml/dune#1180, @emillon)

- Add `--no-print-directory` (borrowed from GNU make) to suppress
  `Entering directory` messages. (ocaml/dune#1668, @dra27)

- Remove `--stats` and track fd usage in `--trace-file` (ocaml/dune#1667, @emillon)

- Add virtual libraries feature and enable it by default (ocaml/dune#1430 fixes ocaml/dune#921,
  @rgrinberg)

- Fix handling of Control+C in watch mode (ocaml/dune#1678, fixes ocaml/dune#1671, @diml)

- Look for jsoo runtime in the same dir as the `js_of_ocaml` binary
  when the ocamlfind package is not available (ocaml/dune#1467, @nojb)

- Make the `seq` package available for OCaml >= 4.07 (ocaml/dune#1714, @rgrinberg)

- Add locations to error messages where a rule fails to generate targets and
  rules that require files outside the build/source directory. (ocaml/dune#1708, fixes
  ocaml/dune#848, @rgrinberg)

- Let `Configurator` handle `sizeof` (in addition to negative numbers).
  (ocaml/dune#1726, fixes ocaml/dune#1723, @Chris00)

- Fix an issue causing menhir generated parsers to fail to build in
  some cases. The fix is to systematically use `-short-paths` when
  calling `ocamlc -i` (ocaml/dune#1743, fix ocaml/dune#1504, @diml)

- Never raise when printing located errors. The code that would print the
  location excerpts was prone to raising. (ocaml/dune#1744, fix ocaml/dune#1736, @rgrinberg)

- Add a `dune upgrade` command for upgrading jbuilder projects to Dune
  (ocaml/dune#1749, @diml)

- When automatically creating a `dune-project` file, insert the
  detected name in it (ocaml/dune#1749, @diml)

- Add `(implicit_transitive_deps <bool>)` mode to dune projects. When this mode
  is turned off, transitive dependencies are not accessible. Only listed
  dependencies are directly accessible. (ocaml/dune#1734, ocaml/dune#430, @rgrinberg, @hnrgrgr)

- Add `toplevel` stanza. This stanza is used to define toplevels with libraries
  already preloaded. (ocaml/dune#1713, @rgrinberg)

- Generate `.merlin` files that account for normal preprocessors defined using a
  subset of the `action` language. (ocaml/dune#1768, @rgrinberg)

- Emit `(orig_src_dir <path>)` metadata in `dune-package` for dune packages
  built with `--store-orig-source-dir` command line flag (also controlled by
  `DUNE_STORE_ORIG_SOURCE_DIR` env variable). This is later used to generate
  `.merlin` with `S`-directives pointed to original source locations and thus
  allowing merlin to see those. (ocaml/dune#1750, @andreypopp)

- Improve the behavior of `dune promote` when the files to be promoted have been
  deleted. (ocaml/dune#1775, fixes ocaml/dune#1772, @diml)

- unstable-fmt: preserve comments (ocaml/dune#1766, @emillon)

- Pass flags correctly when using `staged_pps` (ocaml/dune#1779, fixes ocaml/dune#1774, @diml)

- Fix an issue with the use of `(mode promote)` in the menhir
  stanza. It was previously causing intermediate *mock* files to be
  promoted (ocaml/dune#1783, fixes ocaml/dune#1781, @diml)

- unstable-fmt: ignore files using OCaml syntax (ocaml/dune#1784, @emillon)

- Configurator: Add `which` function to replace the `which` command line utility
  in a cross platform way. (ocaml/dune#1773, fixes ocaml/dune#1705, @Chris00)

- Make configurator append paths to `$PKG_CONFIG_PATH` on macOS. Previously it
  was prepending paths and thus `$PKG_CONFIG_PATH` set by users could have been
  overridden by homebrew installed libraries (ocaml/dune#1785, @andreypopp)

- Disallow c/cxx sources that share an object file in the same stubs archive.
  This means that `foo.c` and `foo.cpp` can no longer exist in the same library.
  (ocaml/dune#1788, @rgrinberg)

- Forbid use of `%{targets}` (or `${@}` in jbuild files) inside
  preprocessing actions
  (ocaml/dune#1812, fixes ocaml/dune#1811, @diml)

- Add `DUNE_PROFILE` environment variable to easily set the profile. (ocaml/dune#1806,
  @rgrinberg)

- Deprecate the undocumented `(no_keep_locs)` field. It was only
  necessary until virtual libraries were supported (ocaml/dune#1822, fix ocaml/dune#1816,
  @diml)

- Rename `unstable-fmt` to `format-dune-file` and remove its `--inplace` option.
  (ocaml/dune#1821, @emillon).

- Autoformatting: `(using fmt 1.1)` will also format dune files (ocaml/dune#1821, @emillon).

- Autoformatting: record dependencies on `.ocamlformat-ignore` files (ocaml/dune#1824,
  fixes ocaml/dune#1793, @emillon)