-
Notifications
You must be signed in to change notification settings - Fork 588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Local repo with include files and libs #43
Conversation
Looks like a great idea, thanks! I guess we can assume that users who have write access to /tmp can also write to their home directory. How does this interact with Loader.load()? I'm not sure I understand how calls to System.load() get changed... |
For now I've only changed the build phase, once we get more comfortable with it we could also use it at runtime, and remove the shutdown hook etc. |
Well, it can't replace the current mechanism entirely. A lot of users don't use Maven, or anything like that. We need a way to detect that at runtime... |
New attempt. I could not find a reliable way to read the pom or header file in a jar, so the maven coordinates are now embedded in the generated code of each class. This way the repo can be used at runtime without maven. For legacy jars without coordinates, the repo path is generated using a hash of the jar. It still needs work, but I think it heading in the right direction. |
Looking better, yes, thanks! :) For the coordinates, we can easily include any information we want in the manifest of the JAR file and get that at runtime using the standard API without anything fancy. That's what I do to store and display the version number here: |
Having the coordinates in the class also makes them available in case the jars are combined using mvn-assembly or when classes are loaded directly without a jar. This might not actually be necessary, e.g. for our spark app instead of using mvn assembly we could add each jar separately. And when classes are loaded directly the coordinates might not be useful. I'm not sure about the various use cases yet, so I went for generated code, but you should go for whatever you are comfortable with. For the tmp folder, I like the hash trick because it ensures we have the same mechanism and perf for all jars, e.g. libs are expanded only only once, and I am really happy to get rid of the shutdown hook. The perf thing is not a big deal usually but we have more and more tools that require instant starts and stops, like utilities that we run using the scala command line tool. |
It's not about what I'm comfortable with, but about what the actual needs of users are. I'm not convinced that we can always write in the user's home... Can we really assume that? (The shutdown hook is only useful on Windows, BTW.) Or maybe your idea is that, if we can't write to the home, in the worst case, we can always expand in the temporary directory and leave it there, not delete, is this the idea? I guess that would make sense. So, about the coordinates, I'm sure we can do better than generated code. I wonder what SWT does for that, do you know? |
I'm not sure what is the best for the coordinates. For our app we should be able to switch to regular jars, so it would work in all cases. If you are already reading the manifest anyway you probably want to extend that. About the home folder, deamons might be configured to write to something like /var/cache or $SPARK_HOME. As long as there is a clear error message saying home cannot be written to, and where to set the config, I think it's largely worth it. |
Let's say we figure this out, there might be issues making the native platform load the librairies from arbitrary directories. We can use rpath at build time on some platforms... Have you made tests for that? |
It seems to work so far, but I have only done a few runs with the shared libs. More tomorrow if I have time. |
What about the |
It fails if jars are combined using maven assembly, but we won't be using that anymore. Does this provide the version also? |
Yes, it contains the version. AFAIK, that's the main reason it's there. And the assembly plugin doesn't remove it or anything, for example: |
I should have a bit of time to update it this weekend, do you see anything else blocking for this PR? |
Let's see, about the coordinates, thinking about this some more, it might not be a bad idea to keep the generated code mostly as is, and not rely on Other than that, the include paths are already part of the Also, I'm wondering, do we really need that Eclipse dependency? Is it something we can't do with only the standard Maven plugins? And we shouldn't remove existing I wonder if we can get a hash for the class itself? Rather than the JAR file, which might not even exist, like on Android, although we won't extract files on Android, but anyway I believe that would be more reliable. About bundling links, since those aren't supported by JAR files, have you tried recreating them from the I think that would be about it, but obviously I'll be testing this some more before merging it :) Thanks a lot!! Looks great |
For the coordinates, we could even go further, and add a field to |
Hum, to be able to load |
Works for me, thanks for looking at this. |
Cool :) But I forgot, there's still issues with rpath. Right now, you're including the absolute path to the
Then That also makes like difficult for the hashing scheme. Can we guarantee to get the same hash at build time than at runtime? If for some reason the hash changes, then the libraries won't load. Besides, this is only for "backward compatibility", for classes missing the |
If libraries are built with path relative the current folder it should work. All dependencies are specified in the preset, so they will be loaded in the right order, and all at once for a given project. E.g. for caffe, opencv will be loaded first, then the ones in caffe's platform jar, all loaded before caffe. In that jar, the references either depend on opencv, that will already be loaded in the process, or on other libraries that will have been unzipped in the same folder. What am I missing? |
Most, if not all implementations of Java, don't use dlopen() with the RTLD_GLOBAL flag or the equivalent on OS X and what not. To get more control over the loading process, we'd need to use potentially non-portable APIs and undefined behavior w.r.t. System.load() ... It's doable, but is it worth the hassle? |
Wait, I'm missing something. What's the difference between the current loading system that puts all the libs in the same folder, and the proposed one that groups them by potential dependencies? You mean e.g. caffe would not be able to see opencv's symbols because they are not in the same folder? Even if opencv was loaded before? |
I don't fundamentally mind keeping the current system, but I would like to be sure I understand why. |
About the others questions, the includePath is a new field, but it's only copied from BuildMojo. It's needed for extracting the headers, I'm not sure how to do it otherwise. The eclipse thing is already a dependency of the Maven plugin, that's what they use internally for resultion etc. so I guess it's fine to use that too. I could not find how to use the maven api to do direct resolution. I will put back to extractResources method, might be useful. |
To make a long story short, when we call something like the following on Unix-like systems, System.load("/home/saudet/.javacpp/repository/org/bytedeco/javacpp-presets/opencv/3.0.0-1.1/linux-x86_64/libopencv_core.so");
System.load("/home/saudet/.javacpp/repository/org/bytedeco/javacpp-presets/caffe/master-1.1/linux-x86_64/libcaffe.so"); the symbols found in We want to copy header files from just some directories, right, so what about just copying only the files found in the "platform.include" property instead? We need to list most of the files there for I didn't notice about Maven depending Aether, but you're right, so that's alright. Thanks! :) |
OK, that's unfortunate for the loader part. Let me look at what I can remove from the PR in that case. For the include, we want to also be able to build C++ code against the repo, so we need the full set. For our library for example the maven build phase also builds a couple hpp and cpp files that extends Caffe in ways that would be painful to do in Java. |
Your code for the loader is fine, I also feel it's a step in the right direction, no problem there. We just have to think about the rpath issue, or some other way of dealing with this limitation. I neglected to mention, but rpath gets inherited from the loading libraries. In the example above, we don't need to modify For the header files, I get you simply want to include all the files, but what if some other users want only a subset? How would we selectively remove some files? I understand that you specifically might not care about that, but I'm sure others would for some other libraries. I think some sort of support for wildcards somewhere should satisfy both needs... Let me think about that... |
I've been checking out TensorFlow and released some (preliminary) presets for that. The new Bazel build system can't install librairies and header files neatly under one directory as we have been used to with autoconf, cmake, etc, and TensorFlow has its header files scattered over multiple directories, so if we should probably have something that works with that. Ideas? |
For the rpath issue, I guess the easiest thing to do would be to add a "platform.preloadpath.prefix" property and use that for relative rpaths (and keep using "platform.linkpath.prefix2" for absolute rpaths since that's what are passed with "platform.linkpath" for the linker). So for OS X, "platform.preloadpath.prefix=-Wl,-rpath,@loader_path/" and on Linux, "platform.preloadpath.prefix=-Wl,-rpath,$ORIGIN/". Then we'd have the Builder add the desired "platform.preloadpath" as relative paths w.r.t to the path of the library once extracted. What do you think? |
For Bazel, I'm not sure. Listing headers from all include paths might be enough, adding wildcard to that would allow the customization would were mentioning. For rpath, I agree that the final repository relative path should work. Another option would be to have a default /lib folder at the root of the repo containing symlinks to the current lib version like an actual linux distrib would do, but that doesn't allow multi-versioning. |
For the headers files, I've just thought of something interesting I think. We can think a bit larger, and consider them as "resources". There's various other kinds of resources we might want to bundle, like trained models for classifiers and what not. So, what about adding a "platform.resourcepath" that we could set via a "resourcePath" Mojo property for ease of use in the |
That seems great, more flexible and it doesn't remove any functionality needed for includes. Sorry I havn't been able to finish the PR, I'm super busy on other things right now. |
…, and used as `include` or `link` paths (pull #43)
This has now been basically implemented for arbitrary resources, because we need more than include files for native build systems (pkg-config files, CMake config files, dummy .lib files for Visual Studio, etc). Thanks a lot for all the great ideas! If you see anything to improve, please open another PR. |
Prototype of the repository we discussed. It is located in ~/.javacpp and has the same structure as Maven's ~/.m2 folder. The build process now packages include files, so that libraries are self-sufficient. E.g. it is not necessary to build native versions of a library's dependencies, and inter-project references to cppbuild/../include and lib can be removed. They are now discovered from the pom definition and fetched to the repo. I only tested on Opencv and Caffe on Linux-64.