Relocatable ocamlfind #72

Merged

Leonidas-from-XIV
Contributor

This PR mirrors the work on making the compiler relocatable. It adds a new configure option: instead of hardcoding the absolute path, ocamlfind resolves $PREFIX to the installation location using various means (readlink, reading environment variables, etc.).

@gridbugs

gridbugs commented Mar 5, 2024

I did some playing around with this branch while investigating ocaml/dune#8931. I'm trying to build https://github.com/ocaml/Zarith directly (without dune) after installing ocamlfind from this branch without passing the new flag to ./configure and I get a suspicious error running ocamlfind install (from zarith's make install):

ocamlfind: Config file not found - neither @CONFIGFILE nor the directory @CONFIGFILE.d

That's suspicious to me because I thought that @CONFIGFILE was a placeholder for a value chosen at ocamlfind's compile time so it shouldn't make it into the binary.

My experimental setup is to clone this branch and run:

./configure \
    -bindir  /tmp/ocamlfind/bin \
    -mandir  /tmp/ocamlfind/man \
    -sitelib /tmp/ocamlfind/lib \
    -config /tmp/ocamlfind/lib/findlib.conf \
    ;

make all
make install

export PATH=/tmp/ocamlfind/bin:$PATH

...and then to run ./configure && make && make install from a checkout of zarith.

I haven't tried out the new configure option yet but just wanted to raise this as it looks like a regression. The above experiment works for me when I repeat it with the tip of the master branch of ocamlfind.

@Leonidas-from-XIV
Contributor Author

@gridbugs That's a good point: I missed the final @ of @CONFIGFILE@, which indeed breaks the code. I've pushed a fix.

@gridbugs

gridbugs commented Mar 7, 2024

Trying this out on macos and it looks like it's trying to read the (non-existent) /proc/self/exe:

$ dune build
...
ocamlc -c oneshot_webserver.mli
ocamlfind ocamlopt -package unix -c oneshot_webserver.ml
Uncaught exception: Unix.Unix_error(Unix.ENOENT, "readlink", "/proc/self/exe")
make: *** [oneshot_webserver.cmx] Error 3
-> required by _build/_private/default/.pkg/oneshot-webserver/target

(this was while trying to build https://github.com/gridbugs/oneshot-webserver-app/)

@Leonidas-from-XIV
Contributor Author

Indeed, I probably need to copy more of how the relocatable compiler determines the executable name. It looks like on NextStep I can use _NSGetExecutablePath for this; however, in the context of ocamlfind I need to add an extra stub for it.

Maybe that's actually better, then I can replace Unix.readlink with a C stub, so I can probably avoid the dependency on Unix.

@gridbugs

Since your latest changes I can now use this on macos and linux to build packages with dune's package management features. One issue I run into now is that using the ./configure arguments from ocamlfind's opam file now gives this error when linking ocamlfind:

ocamlc -I +compiler-libs  -o ocamlfind -g findlib.cma unix.cma \
           -I +unix -I +dynlink ocaml_args.cmo frontend.cmo
make[1]: Leaving directory '/home/s/src/oneshot-webserver-app/_build/.sandbox/ba495fce988e4e0fef6e323f48198258/_private/default/.pkg/ocamlfind/source/src/findlib'
File "_none_", line 1:
Error: Error while linking findlib.cma(Fl_bindings):
       The external function `fl_executable_path' is not available
make[1]: *** [Makefile:55: ocamlfind] Error 2
make: *** [Makefile:14: all] Error 2
-> required by _build/_private/default/.pkg/ocamlfind/target/cookie

The problem goes away if you pass -custom to ocamlc, which can be achieved by not passing -no-custom to ocamlfind's configure script.

@gridbugs

I tried using this to build the topkg package and ran into some issues which I describe here: ocaml/dune#10271.

In summary:

  • the generated "topfind" file currently contains paths inside the temporary build sandbox which has been deleted by the time that topkg's build script attempts to open "topfind" with the #use directive
  • the bytecode versions of the findlib libraries can't be loaded into the bytecode interpreter due to the inclusion of C stubs. The "topfind" file attempts to load "findlib.cma" into the bytecode interpreter while building topkg, which no longer works (I confirmed that removing the stubs fixes this).

More details in the linked issue.

@Leonidas-from-XIV
Contributor Author

I've removed the C parts, since I don't think there's a way to determine the location from OCaml on platforms other than Linux without at least some C code (e.g. both macOS and Windows).

Instead I made it look up OPAM_SWITCH_PREFIX first, which is set by OPAM and could be set by Dune when it builds non-Dune packages.
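
A minimal sketch of such an environment lookup, for illustration only. The helper name prefix_from_env and the getenv parameter are hypothetical and not taken from the PR; the only variable named in this thread is OPAM_SWITCH_PREFIX, so that is the only candidate listed here.

```ocaml
(* Hypothetical helper, not the code from the PR.
   [getenv] is a parameter (defaulting to Sys.getenv_opt) so the lookup
   can be exercised without touching the real environment. *)
let prefix_from_env ?(getenv = Sys.getenv_opt) vars =
  List.find_map
    (fun var ->
      match getenv var with
      | Some dir when dir <> "" -> Some dir (* first non-empty value wins *)
      | _ -> None)
    vars

(* Only OPAM_SWITCH_PREFIX is mentioned in the discussion; further
   variables would be appended here in priority order. *)
let candidates = [ "OPAM_SWITCH_PREFIX" ]

(* Returns None when no candidate variable is set. *)
let install_prefix () = prefix_from_env candidates
```

Returning an option rather than raising leaves it to the caller to decide what happens when no prefix can be determined.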

The topfind problem is indeed an issue, as I am not sure #use takes a variable and without that we can't compute the right path. Maybe this could be worked around by replacing topfind with a wrapper that generates a topfind in a temporary location and then loads that instead.

@@ -179,6 +180,9 @@ depend: *.ml *.mli fl_meta.ml fl_metascanner.ml findlib_config.ml topfind.ml
.mli.cmi:
	$(OCAMLC) $(OPAQUE) $(OCAMLC_FLAGS) $(OCAMLFIND_OCAMLFLAGS) -c $<

.c.o:
	$(OCAMLC) $(OPAQUE) $(OCAMLC_FLAGS) $(OCAMLFIND_OCAMLFLAGS) -c $<

Contributor

I guess this can go away after removing the C files again?

let len = String.length prefix in
match String.starts_with ~prefix path with
| false -> None
| true -> Some (String.sub path len (String.length path - len))
Contributor

@gerdstolpmann gerdstolpmann Mar 22, 2024

any special handling of / needed? E.g.

  • prefix="$PREFIX" and path="$PREFIXgoeson": returns goeson
  • prefix="$PREFIX" and path="$PREFIX/d3/foo": returns /d3/foo (absolute!)

Contributor Author

Not in this function, because it should indeed return that absolute path (I've renamed it since). Returning the absolute part should not be a problem, given that it is later only used in a Filename.concat with the computed prefix.

However, with that concern in mind: if the path was $PREFIX/d3/foo and we couldn't compute a Findlib.location, it would wrongly add /d3/foo to the search path. I've changed it to (silently) discard the path in such a case. The alternative would be to expand $PREFIX/d3/foo to a relative d3/foo, but that behavior seems a bit surprising and unexpected.
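
The behavior described in this reply could look roughly like the sketch below. All names (placeholder, strip_placeholder, relocate, search_path) are hypothetical and this is not the code from the PR; it only illustrates "strip the placeholder, concatenate with the computed location, and drop the entry when no location is known".

```ocaml
(* Illustrative sketch with hypothetical names; requires OCaml >= 4.13
   for String.starts_with. *)
let placeholder = "$PREFIX"

(* Return the part of [path] after the placeholder, if it starts with it.
   For "$PREFIX/d3/foo" this is "/d3/foo", which looks absolute. *)
let strip_placeholder path =
  if String.starts_with ~prefix:placeholder path then
    let len = String.length placeholder in
    Some (String.sub path len (String.length path - len))
  else None

(* [location] is the computed install prefix, if one could be determined. *)
let relocate ~location path =
  match strip_placeholder path, location with
  | None, _ -> Some path (* ordinary path: keep unchanged *)
  | Some rest, Some loc -> Some (Filename.concat loc rest)
  | Some _, None -> None (* placeholder cannot be resolved: discard *)

(* Apply [relocate] to every entry, silently dropping unresolvable ones. *)
let search_path ~location paths = List.filter_map (relocate ~location) paths
```

With no location available, a list such as ["$PREFIX/d3/foo"; "/usr/lib"] is reduced to ["/usr/lib"] instead of picking up a stray /d3/foo entry.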

@gerdstolpmann
Contributor

OK, this seems to be the best PR so far to add some relocatability. As far as I understand, the PR needs a special version of OCaml to work properly, right? Can we test for this in the configure script, and only allow relocatable paths when this version is available?

Regarding the #use topfind problem: instead of #directory you can also call Topdirs.dir_directory as a regular function. So there is a way to compute the install location. You could copy the code from findlib_config.ml into the topfind script when it is generated.
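
As a rough illustration of this suggestion: the point is that a directory computed at run time can be handed to an ordinary function, whereas the #directory directive only accepts a literal. In the sketch below, add_dir stands in for Topdirs.dir_directory (which is only available inside the toplevel), and the lib/findlib layout under the prefix is an assumption made for the example, not something stated in this thread.

```ocaml
(* Hypothetical sketch; [findlib_dir] and [register] are illustrative names.
   The "lib/findlib" layout below the prefix is an assumption. *)
let findlib_dir ~prefix =
  List.fold_left Filename.concat prefix [ "lib"; "findlib" ]

(* [add_dir] is the function that registers a directory with the toplevel.
   In a real toplevel script this would be Topdirs.dir_directory; it is a
   parameter here so the sketch also compiles outside the toplevel. *)
let register ~add_dir ~prefix = add_dir (findlib_dir ~prefix)
```

Inside a generated topfind script, the call would then be something like register ~add_dir:Topdirs.dir_directory ~prefix, with prefix coming from whatever detection logic is in use.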

@Leonidas-from-XIV
Contributor Author

For the relocation it shouldn't need any specific compiler; the compiler patches are for relocating the compiler itself. I'm developing this on a normal OCaml 3.08.4 switch. Originally I thought I could use the same approach as the relocatable compiler patches do, but thanks to @gridbugs's testing it turned out that I had to scrap the idea, and the environment variables set by OPAM and Dune are probably a better option.

Excellent hint wrt. Topdirs.dir_directory! This has allowed me to port the code for the detection to topfind. It's a bit ugly as there is much duplicated code and it uses cat to stitch it together for the rd0 and rd1 variants.

I thought about unifying rd0 and rd1, which would simplify the preprocessing, since the only difference seemed to be remove_directory (which could be handled in a similar way to how PPXOPT_BEGIN/END is implemented), but it seems that rd0 also doesn't handle cmxs, so I am unsure whether this is a deliberate choice or just accidental divergence.

@Leonidas-from-XIV
Contributor Author

Hi @gerdstolpmann, I think this is ready as far as we are concerned. @gridbugs and I tested it on macOS and Linux as well as in Dune. I've just force-pushed to skip over all the intermediate steps in finding a proper solution, merging in the commits that fixed issues discovered on the way.

I took some extra care that #use "topfind" wouldn't trigger additional - : unit = () messages and now both variants with and without directories are generated from the same file to improve maintainability.

Could you take a look again and tell me if there's anything missing?

@gerdstolpmann
Contributor

I think it is good now. There is a conflict now, however; can you fix it?

This unifies both templates to use only one single topfind template so
every change happens in both "older" and "newer" topfind files
@Leonidas-from-XIV
Contributor Author

@gerdstolpmann I've rebased upon the current master branch.

@Leonidas-from-XIV Leonidas-from-XIV mentioned this pull request Oct 21, 2024
@jonahbeckford

Relying on opam and dune environment variables seems a bit hacky and brittle. And it limits the usefulness to one package manager and one build system. (Sorry for the churn!)

Isn't the real relocation problem the absolute paths inside findlib.conf? And the solution for that is making sure that configuration files can be portable through dune-like path variables?

For example, I have a Windows findlib.conf:

destdir="Y:\\source\\scoutapps\\#s\\site-lib"
path="C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\lib;C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\share\\DkSDKCoder_Us\\DkDev\\src;C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\share\\DkSDKCoder_Us\\DkFs\\src;C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\share\\DkSDKCoder_Us\\DkNet\\src;C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\share\\DkSDKCoder_Us\\DkStdRestApis\\src;Y:\\source\\scoutapps\\#s\\site-lib;C:\\Users\\beckf\\AppData\\Local\\DkCoder\\site\\96697830-7d79-49f5-8d79-2066101f0ef9"
stdlib="C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\lib\\ocaml"
ldconf="Y:\\source\\scoutapps\\#s\\ld.conf"

In the findlib example it can be made partially portable with a prefix variable that is always relative to findlib.conf, and dirsep and pathsep variables:

destdir="%{prefix}%{dirsep}#s%{dirsep}site-lib"
path="C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\lib%{pathsep}C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\share\\DkSDKCoder_Us\\DkDev\\src%{pathsep}C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\share\\DkSDKCoder_Us\\DkFs\\src%{pathsep}C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\share\\DkSDKCoder_Us\\DkNet\\src%{pathsep}C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\share\\DkSDKCoder_Us\\DkStdRestApis\\src%{pathsep}%{prefix}%{dirsep}#s%{dirsep}site-lib%{pathsep}C:\\Users\\beckf\\AppData\\Local\\DkCoder\\site\\96697830-7d79-49f5-8d79-2066101f0ef9"
stdlib="C:\\Users\\beckf\\AppData\\Local\\Programs\\DkCoder\\coder\\h\\Env\\lib\\ocaml"
ldconf="%{prefix}%{dirsep}#s%{dirsep}ld.conf"

Then it would be the responsibility of ./configure to write a portable findlib.conf with some commands like sed s#$(dirname $PWD)#%{prefix}/..#g. As long as there are no paths outside of the findlib.conf tree (plus one parent directory to allow for lib/findlib.conf opam conventional placement), the findlib configuration will be fully portable.
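
To make the proposal concrete, here is a small sketch of how such variables could be expanded when findlib.conf is read. This is purely illustrative: findlib does not implement this syntax in this thread, and the function names (replace_all, expand) are hypothetical.

```ocaml
(* Replace every occurrence of the non-empty string [sub] in [s] with [by]. *)
let replace_all ~sub ~by s =
  let n = String.length sub and len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i >= len then ()
    else if i + n <= len && String.sub s i n = sub then begin
      Buffer.add_string buf by;
      go (i + n)
    end
    else begin
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

(* Expand the three proposed variables in a single configuration value.
   [prefix] would be the directory derived from the location of
   findlib.conf. Note that a prefix which itself contains the text
   "%{dirsep}" or "%{pathsep}" would be expanded again; a real
   implementation would need a single pass. *)
let expand ~prefix ~dirsep ~pathsep value =
  value
  |> replace_all ~sub:"%{prefix}" ~by:prefix
  |> replace_all ~sub:"%{dirsep}" ~by:dirsep
  |> replace_all ~sub:"%{pathsep}" ~by:pathsep
```

For example, expanding "%{prefix}%{dirsep}#s%{dirsep}site-lib" with prefix "/opt/x" and dirsep "/" yields "/opt/x/#s/site-lib".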

@gerdstolpmann
Contributor

@jonahbeckford At some point findlib needs to know the prefix, sooner or later. The first version of the PR tried to infer the prefix from the path of the running executable, but for the documented reasons we can't do this. Hence the environment variables. You can move the point in time when the prefix is needed, but you can't avoid it.

@gerdstolpmann gerdstolpmann merged commit df3430d into ocaml:master Nov 6, 2024
@Leonidas-from-XIV Leonidas-from-XIV deleted the relocatable-relative-path branch November 6, 2024 13:45
@dra27
Member

dra27 commented Nov 8, 2024

I'm puzzled (but it can be addressed in a subsequent release) as to why this ended up with C stubs, rather than just using Sys.executable_name (added in OCaml 3.05, and in practice an absolute path to the running executable on virtually every platform). What have I missed?

@Leonidas-from-XIV
Contributor Author

@dra27 I don't see any C stubs in the version that was merged?

@dra27
Member

dra27 commented Nov 8, 2024

No, I mean in the original version that got discarded for this less good version relying on specific environment variables?

@gerdstolpmann
Contributor

Well, there are a couple of downsides of Sys.executable_name:

  • While the function normally works on Linux, macOS and Windows, it falls back to Sys.argv.(0) for any other OS, and you get some brittleness (argv.(0) is not always properly set, and there is an implicit dependency on PATH or on the current working directory). But even on the mentioned OSes there are some edge cases where you might run into problems, in particular when the executable is not in the visible part of the file system (e.g. deleted, or hidden via inaccessible mounts).
  • This method doesn't work for dynamic loading, because in that case the running executable is not ocamlfind or the toploop ocaml, but an arbitrary command that could be installed anywhere

In particular the latter means that any executable-lookup method is not a good solid basis for figuring out the install location. But this doesn't mean such a method couldn't complement it. I've mainly merged this PR because this way it is easy to test, and I'm sure there will be some good suggestions for further improvement.
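
For reference, an executable-based lookup of the kind being discussed could be sketched as follows. The function name prefix_of_executable is hypothetical, and the sketch assumes the conventional layout in which the binary lives in a bin directory directly below the prefix. It refuses to guess when the name is relative, which is the fallback case described in the first bullet.

```ocaml
(* Hypothetical sketch: derive an install prefix from the path of the
   running executable, assuming a <prefix>/bin/<exe> layout. *)
let prefix_of_executable exe =
  if Filename.is_relative exe then None (* e.g. a bare argv.(0) *)
  else
    let bindir = Filename.dirname exe in
    if Filename.basename bindir = "bin" then Some (Filename.dirname bindir)
    else None (* unexpected layout: do not guess *)

(* Applied to the running program. As noted in the second bullet, inside
   a dynamically loaded library this names the host program, not
   ocamlfind, so the result may be unrelated to the findlib install. *)
let guessed_prefix () = prefix_of_executable Sys.executable_name
```

Such a function could serve as a complement to the environment variables rather than a replacement for them.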

@dra27
Member

dra27 commented Nov 11, 2024

Indeed - I'm aware the library part needs additional work (it's the same thing with compiler-libs in Relocatable OCaml). Granted, Sys.executable_name is missing the required stubs for BSD (I intend to add them), but apart from that, I don't think there's a platform on which you'd be running where Sys.executable_name in practice isn't actually the absolute path to where the executable was genuinely loaded from. Those failure modes aren't relevant - all we need is the absolute path to where the executable was started from, which is determined by the OCaml runtime (both bytecode and native) incredibly early on in main. If the user has started ocamlfind (from an installation prefix) and then unmounted it, we're already in a lot of trouble!
