Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local repo with include files and libs #43

Closed
wants to merge 1 commit into from
Closed

Conversation

cypof
Copy link
Contributor

@cypof cypof commented Oct 13, 2015

Prototype of the repository we discussed. It is located in ~/.javacpp and has the same structure as Maven's ~/.m2 folder. The build process now packages include files, so that libraries are self-sufficient. E.g. it is not necessary to build native versions of a library's dependencies, and inter-project references to cppbuild/../include and lib can be removed. They are now discovered from the pom definition and fetched to the repo. I only tested on Opencv and Caffe on Linux-64.

@saudet
Copy link
Member

saudet commented Oct 15, 2015

Looks like a great idea, thanks! I guess we can assume that users who have write access to /tmp can also write to their home directory. How does this interact with Loader.load()? I'm not sure I understand how calls to System.load() get changed...

@cypof
Copy link
Contributor Author

cypof commented Oct 15, 2015

For now I've only changed the build phase, once we get more comfortable with it we could also use it at runtime, and remove the shutdown hook etc.

@saudet
Copy link
Member

saudet commented Oct 16, 2015

Well, it can't replace the current mechanism entirely. A lot of users don't use Maven, or anything like that. We need a way to detect that at runtime...

@cypof
Copy link
Contributor Author

cypof commented Nov 5, 2015

New attempt. I could not find a reliable way to read the pom or header file in a jar, so the maven coordinates are now embedded in the generated code of each class. This way the repo can be used at runtime without maven. For legacy jars without coordinates, the repo path is generated using a hash of the jar. It still needs work, but I think it heading in the right direction.

@saudet
Copy link
Member

saudet commented Nov 7, 2015

Looking better, yes, thanks! :) For the coordinates, we can easily include any information we want in the manifest of the JAR file and get that at runtime using the standard API without anything fancy. That's what I do to store and display the version number here:
https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/tools/Builder.java#L699
That would be an appropriate place to put some kind of coordinates. But when that's missing or writing to non-temp drive doesn't work, I still feel we need a nice fallback mechanism. Using the temp directory works well enough, so I'd keep it around. You feel differently?

@cypof
Copy link
Contributor Author

cypof commented Nov 7, 2015

Having the coordinates in the class also makes them available in case the jars are combined using mvn-assembly or when classes are loaded directly without a jar. This might not actually be necessary, e.g. for our spark app instead of using mvn assembly we could add each jar separately. And when classes are loaded directly the coordinates might not be useful. I'm not sure about the various use cases yet, so I went for generated code, but you should go for whatever you are comfortable with.

For the tmp folder, I like the hash trick because it ensures we have the same mechanism and perf for all jars, e.g. libs are expanded only only once, and I am really happy to get rid of the shutdown hook. The perf thing is not a big deal usually but we have more and more tools that require instant starts and stops, like utilities that we run using the scala command line tool.

@saudet
Copy link
Member

saudet commented Nov 8, 2015

It's not about what I'm comfortable with, but about what the actual needs of users are. I'm not convinced that we can always write in the user's home... Can we really assume that? (The shutdown hook is only useful on Windows, BTW.) Or maybe your idea is that, if we can't write to the home, in the worst case, we can always expand in the temporary directory and leave it there, not delete, is this the idea? I guess that would make sense.

So, about the coordinates, I'm sure we can do better than generated code. I wonder what SWT does for that, do you know?

@cypof
Copy link
Contributor Author

cypof commented Nov 8, 2015

I'm not sure what is the best for the coordinates. For our app we should be able to switch to regular jars, so it would work in all cases. If you are already reading the manifest anyway you probably want to extend that. About the home folder, deamons might be configured to write to something like /var/cache or $SPARK_HOME. As long as there is a clear error message saying home cannot be written to, and where to set the config, I think it's largely worth it.

@saudet
Copy link
Member

saudet commented Nov 8, 2015

Let's say we figure this out, there might be issues making the native platform load the librairies from arbitrary directories. We can use rpath at build time on some platforms... Have you made tests for that?

@cypof
Copy link
Contributor Author

cypof commented Nov 8, 2015

It seems to work so far, but I have only done a few runs with the shared libs. More tomorrow if I have time.

@saudet
Copy link
Member

saudet commented Nov 10, 2015

What about the META-INF/maven/${groupId}/${artifactId}/pom.properties resource file? Unlike the manifest, it's unique and already contains exactly the information we want. Plus it can be retrieved by the standard Java API without any additional dependencies. Or was that something you didn't find reliable enough? When does it fail?

@cypof
Copy link
Contributor Author

cypof commented Nov 10, 2015

It fails if jars are combined using maven assembly, but we won't be using that anymore. Does this provide the version also?

@saudet
Copy link
Member

saudet commented Nov 10, 2015

Yes, it contains the version. AFAIK, that's the main reason it's there. And the assembly plugin doesn't remove it or anything, for example:
http://www.mkyong.com/maven/create-a-fat-jar-file-maven-assembly-plugin/

@cypof
Copy link
Contributor Author

cypof commented Nov 13, 2015

I should have a bit of time to update it this weekend, do you see anything else blocking for this PR?

@saudet
Copy link
Member

saudet commented Nov 14, 2015

Let's see, about the coordinates, thinking about this some more, it might not be a bad idea to keep the generated code mostly as is, and not rely on pom.properties or whatever. It would allow users to make versioned builds even without Maven, which is good. We never know what users need. The only thing I would change though is to put Coordinates as an annotation to avoid polluting further the namespace of native APIs with the "tools" package and a "coordinates" field. We can even create our own instances of annotations BTW:
http://stackoverflow.com/questions/16299717/how-to-create-an-instance-of-an-annotation

Other than that, the include paths are already part of the properties. You're already doing that for linkPaths, so I don't believe we need to add a separate String includePath; field?

Also, I'm wondering, do we really need that Eclipse dependency? Is it something we can't do with only the standard Maven plugins?

And we shouldn't remove existing extractResources() and other public APIs that can be used outside of the current loading system...

I wonder if we can get a hash for the class itself? Rather than the JAR file, which might not even exist, like on Android, although we won't extract files on Android, but anyway I believe that would be more reliable.

About bundling links, since those aren't supported by JAR files, have you tried recreating them from the @Properties during the extraction phase instead?

I think that would be about it, but obviously I'll be testing this some more before merging it :) Thanks a lot!! Looks great

@saudet
Copy link
Member

saudet commented Nov 14, 2015

For the coordinates, we could even go further, and add a field to @Properties to let users set the coordinates there. If missing, under a Maven build, it could load them from the pom.properties. If anything, it would make things more consistent from the point of view of a user as part of @Properties :) What do you think?

@saudet
Copy link
Member

saudet commented Nov 15, 2015

Hum, to be able to load pom.properties, we already need to know the groupId and artifactId. That's not going to work. Well, let's go with Coordinates as an annotation on the (generated) top level class. Sounds good?

@cypof
Copy link
Contributor Author

cypof commented Nov 15, 2015

Works for me, thanks for looking at this.

@saudet
Copy link
Member

saudet commented Nov 16, 2015

Cool :) But I forgot, there's still issues with rpath. Right now, you're including the absolute path to the .javacpp folder, and that gets passed to the rpath option of the linker, but that's like "/home/cypof/.javacpp" on your machine, and "/home/saudet/.javacpp" on mine, so I don't think that's going to work. When I investigated that the last time, I concluded that we had specify at build time relative paths using $ORIGIN on Linux and @loader_path on Mac OS X, but since all libraries were extracted in the same directory, we didn't have to do anything more than plug that in the compiler options. (On Windows, DLLs found in the same directory can load each other by default.) But now that the library files are getting extracted here and there, unless I'm mistaken, that's going to be a bit more complicated. The linkpaths that you pass to the builder will have to be relative to the library in the repository. So, say we have these files:

~/.javacpp/repository/org/bytedeco/javacpp-presets/opencv/3.0.0-1.1/linux-x86_64/libopencv_core.so
~/.javacpp/repository/org/bytedeco/javacpp-presets/caffe/master-1.1/linux-x86_64/libcaffe.so

Then libcaffe.so will have to be linked with -rpath $ORIGIN/../../../opencv/3.0.0-1.1/linux-x86_64.

That also makes like difficult for the hashing scheme. Can we guarantee to get the same hash at build time than at runtime? If for some reason the hash changes, then the libraries won't load. Besides, this is only for "backward compatibility", for classes missing the @Coordinates annotation. Like I explain above, unless I'm wrong, we need to rebuild using additional -rpath options anyway, so... I would instead recommend to leave alone the current loading mechanism that puts libraries all in the same directory, and let's not worry about that ;) Comments?

@cypof
Copy link
Contributor Author

cypof commented Nov 17, 2015

If libraries are built with path relative the current folder it should work. All dependencies are specified in the preset, so they will be loaded in the right order, and all at once for a given project. E.g. for caffe, opencv will be loaded first, then the ones in caffe's platform jar, all loaded before caffe. In that jar, the references either depend on opencv, that will already be loaded in the process, or on other libraries that will have been unzipped in the same folder. What am I missing?

@saudet
Copy link
Member

saudet commented Nov 17, 2015

Most, if not all implementations of Java, don't use dlopen() with the RTLD_GLOBAL flag or the equivalent on OS X and what not. To get more control over the loading process, we'd need to use potentially non-portable APIs and undefined behavior w.r.t. System.load() ... It's doable, but is it worth the hassle?

@cypof
Copy link
Contributor Author

cypof commented Nov 17, 2015

Wait, I'm missing something. What's the difference between the current loading system that puts all the libs in the same folder, and the proposed one that groups them by potential dependencies? You mean e.g. caffe would not be able to see opencv's symbols because they are not in the same folder? Even if opencv was loaded before?

@cypof
Copy link
Contributor Author

cypof commented Nov 17, 2015

I don't fundamentally mind keeping the current system, but I would like to be sure I understand why.

@cypof
Copy link
Contributor Author

cypof commented Nov 17, 2015

About the others questions, the includePath is a new field, but it's only copied from BuildMojo. It's needed for extracting the headers, I'm not sure how to do it otherwise.

The eclipse thing is already a dependency of the Maven plugin, that's what they use internally for resultion etc. so I guess it's fine to use that too. I could not find how to use the maven api to do direct resolution.

I will put back to extractResources method, might be useful.

@saudet
Copy link
Member

saudet commented Nov 17, 2015

To make a long story short, when we call something like the following on Unix-like systems,

System.load("/home/saudet/.javacpp/repository/org/bytedeco/javacpp-presets/opencv/3.0.0-1.1/linux-x86_64/libopencv_core.so");
System.load("/home/saudet/.javacpp/repository/org/bytedeco/javacpp-presets/caffe/master-1.1/linux-x86_64/libcaffe.so");

the symbols found in libopencv_core.so will probably not be visible to libcaffe.so, unless the former happens to be in the rpath of the later, or in the system library path. (Fortunately, this is not usually the case of operating systems that do not have rpath, such as Android and Windows: The symbols do become globally visible by default.)

We want to copy header files from just some directories, right, so what about just copying only the files found in the "platform.include" property instead? We need to list most of the files there for Parser and Generator anyway, and it would offer a more precise way to include exactly just what we want.

I didn't notice about Maven depending Aether, but you're right, so that's alright. Thanks! :)
https://wiki.eclipse.org/Aether/Using_Aether_in_Maven_Plugins

@cypof
Copy link
Contributor Author

cypof commented Nov 17, 2015

OK, that's unfortunate for the loader part. Let me look at what I can remove from the PR in that case. For the include, we want to also be able to build C++ code against the repo, so we need the full set. For our library for example the maven build phase also builds a couple hpp and cpp files that extends Caffe in ways that would be painful to do in Java.

@saudet
Copy link
Member

saudet commented Nov 18, 2015

Your code for the loader is fine, I also feel it's a step in the right direction, no problem there. We just have to think about the rpath issue, or some other way of dealing with this limitation. I neglected to mention, but rpath gets inherited from the loading libraries. In the example above, we don't need to modify libcaffe.so in any way. It inherits the rpath from libjnicaffe.so, so I think the easiest way to fix this is to modify Builder to support relative link paths a bit better.

For the header files, I get you simply want to include all the files, but what if some other users want only a subset? How would we selectively remove some files? I understand that you specifically might not care about that, but I'm sure others would for some other libraries. I think some sort of support for wildcards somewhere should satisfy both needs... Let me think about that...

@saudet
Copy link
Member

saudet commented Nov 22, 2015

I've been checking out TensorFlow and released some (preliminary) presets for that. The new Bazel build system can't install librairies and header files neatly under one directory as we have been used to with autoconf, cmake, etc, and TensorFlow has its header files scattered over multiple directories, so if we should probably have something that works with that. Ideas?

@saudet
Copy link
Member

saudet commented Nov 23, 2015

For the rpath issue, I guess the easiest thing to do would be to add a "platform.preloadpath.prefix" property and use that for relative rpaths (and keep using "platform.linkpath.prefix2" for absolute rpaths since that's what are passed with "platform.linkpath" for the linker). So for OS X, "platform.preloadpath.prefix=-Wl,-rpath,@loader_path/" and on Linux, "platform.preloadpath.prefix=-Wl,-rpath,$ORIGIN/". Then we'd have the Builder add the desired "platform.preloadpath" as relative paths w.r.t to the path of the library once extracted. What do you think?

@cypof
Copy link
Contributor Author

cypof commented Nov 23, 2015

For Bazel, I'm not sure. Listing headers from all include paths might be enough, adding wildcard to that would allow the customization would were mentioning. For rpath, I agree that the final repository relative path should work. Another option would be to have a default /lib folder at the root of the repo containing symlinks to the current lib version like an actual linux distrib would do, but that doesn't allow multi-versioning.

@saudet
Copy link
Member

saudet commented Nov 27, 2015

For the headers files, I've just thought of something interesting I think. We can think a bit larger, and consider them as "resources". There's various other kinds of resources we might want to bundle, like trained models for classifiers and what not. So, what about adding a "platform.resourcepath" that we could set via a "resourcePath" Mojo property for ease of use in the pom.xml file, and set it to say "${basedir}/cppbuild/${platform}/" by default, and along with a @Platform(resource="include"), it would copy the "include" directory, or whatever path we put there, so we could essentially copy anything we want in the JAR file generated by JavaCPP. And then everything would get extracted in a standard place in the "repository" such that setting @Platform(inherit=..., includepath="include") in a dependent library, Builder would find it for the compiler. How does that sound? We wouldn't need to think about wildcard support for now either.

@cypof
Copy link
Contributor Author

cypof commented Nov 29, 2015

That seems great, more flexible and it doesn't remove any functionality needed for includes. Sorry I havn't been able to finish the PR, I'm super busy on other things right now.

@saudet
Copy link
Member

saudet commented May 7, 2017

This has now been basically implemented for arbitrary resources, because we need more than include files for native build systems (pkg-config files, CMake config files, dummy .lib files for Visual Studio, etc). Thanks a lot for all the great ideas! If you see anything to improve, please open another PR.

@saudet saudet closed this May 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants