[Loader] improve performance of resource extraction and library search #512

HannesWell · 2021-08-20T17:32:54Z

With this PR I suggest changes to reduce the runtime of resource extraction and library search performed by the javacpp.Loader.

I benchmarked two variants of embeddedpython.Python: a stripped variant that uses only javacpp and cpython (but not numpy), and the 'regular' variant with updated javacpp dependencies.
The benchmark is just the following program, whose runtime is absolutely dominated by the initialization of embeddedpython.Python:

long start = System.currentTimeMillis();
Python.put("a", 5);
Python.put("b", 3);
Python.exec("v = a/b");
Object x = Python.get("v");
System.out.println(x);
System.out.println("Took " + (System.currentTimeMillis() - start) + "ms");

On my Windows 10 computer, with all files already cached by a previous run, this program takes 2000 ± 20 ms for the stripped version and 3630 ± 20 ms for the regular version on the current javacpp master.

With the suggested changes, the stripped version takes around 800 ± 20 ms and the regular version around 1400 ± 20 ms.
So this improves the runtime of the initialization by more than a factor of two.
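
As a quick check of the "more than a factor of two" claim, the ratios can be computed directly from the timings quoted above. This is only a small arithmetic sketch; the class and method names are made up for illustration and are not part of javacpp.

```java
final class SpeedupFactor {
    // Ratio of the runtime before the change to the runtime after it;
    // a value above 1 means the new code is faster.
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    public static void main(String[] args) {
        // Timings quoted in this comment (cached run, Windows 10).
        System.out.println("stripped: " + speedup(2000, 800));
        System.out.println("regular:  " + speedup(3630, 1400));
    }
}
```

Both ratios come out at roughly 2.5 to 2.6, ignoring the ±20 ms measurement spread.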

HannesWell · 2021-08-20T17:35:29Z

src/main/java/org/bytedeco/javacpp/Loader.java

                        } else if (!cacheDirectory || !file.exists() || file.length() != entrySize
-                                || file.lastModified() != entryTimestamp || !file.equals(file.getCanonicalFile())) {


I'm not sure what the comparison with the canonical file is meant to achieve, but the file is not canonicalized in the called extractResource() method. Furthermore, canonicalization does not (or at least should not) change the effective target the file points to, so the metadata checked before it should not change because of it.

Symbolic links get created and we need to resolve them to prevent creating cycles.

Can you please explain how this prevents cycles, or is there an example anywhere? I don't understand it yet.
I'm asking because avoiding the canonicalization had a significant impact on the runtime improvement (around one third of the runtime reduction).
So if we can avoid this call, or at least have a cheaper pre-check (e.g. canCreateSymbolicLink==true? or Files.isSymbolicLink()==true, I'm not sure if this is much faster), the Loader should become faster.
But of course the logic has to be correct in any case.

There can be symbolic links anywhere in the path, not just the last file in the path.

I'm pretty sure we can disable those checks when org.bytedeco.javacpp.cachedir.nosubdir is set though, and I think that's what you're already using, right? If so, I'd be happy to skip all that when that flag is set. How does that sound?
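
To make the "symbolic links anywhere in the path" point concrete, here is a minimal sketch of what a per-component pre-check could look like. It is not code from Loader.java: the class and method names are hypothetical, it assumes the file lies below a known base directory (for example the cache directory), and whether it is actually cheaper than getCanonicalFile() has not been measured here, since every Files.isSymbolicLink() call also touches the file system.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SymlinkPreCheck {
    /**
     * Returns true if any path component of {@code file} below {@code base}
     * is a symbolic link. Components of {@code base} itself are not inspected.
     * Assumes {@code file} is located under {@code base}.
     */
    static boolean hasSymbolicLinkBelow(Path base, Path file) {
        Path current = base;
        // Walk from the base directory down to the file, one component at a time.
        for (Path part : base.relativize(file)) {
            current = current.resolve(part);
            if (Files.isSymbolicLink(current)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical cache layout, used only to show the call.
        Path base = Paths.get(System.getProperty("java.io.tmpdir"));
        Path file = base.resolve("cache").resolve("lib").resolve("example.dll");
        System.out.println(hasSymbolicLinkBelow(base, file));
    }
}
```

Checking only the last component with Files.isSymbolicLink() would miss a link higher up in the path, which is why the sketch walks every component.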

HannesWell · 2021-08-20T17:42:47Z

src/main/java/org/bytedeco/javacpp/Loader.java

@@ -1247,7 +1251,7 @@ public static String load(Class cls, Properties properties, boolean pathsFirst,
                    foundLibraries.put(preload, urls = findLibrary(cls, p, preload, pathsFirst));
                }
                String filename = null;
-                if (oldUrls == null || urls.length > 0) {
+                if (oldUrls == null && urls.length > 0) {


I'm not sure whether the 'or' was intentional, but before this change the following loadLibrary method was called in three cases:

1. oldUrls==null and urls.length>0, i.e. the library in question was not searched before and is found now.

2. oldUrls==null and urls.length==0, i.e. the library in question was not searched before and is not found now.

3. oldUrls!=null and urls.length>0, i.e. the library in question was searched before and was found.

For me it does not make sense to attempt to load the library when it was not found (case 2) or was already loaded (case 3). Or did I overlook something?

If a previously found library failed to load, we should try to keep finding other versions that might be able to load.
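
The difference between the two conditions can be laid out as a small truth table. The sketch below models only the boolean expression from the diff, not the surrounding Loader logic; `searchedBefore` stands for `oldUrls != null` and `foundNow` for `urls.length > 0`, and the class and method names are made up for illustration.

```java
final class LoadConditionTable {
    // Condition before the proposed change: oldUrls == null || urls.length > 0
    static boolean original(boolean searchedBefore, boolean foundNow) {
        return !searchedBefore || foundNow;
    }

    // Condition proposed in the diff: oldUrls == null && urls.length > 0
    static boolean proposed(boolean searchedBefore, boolean foundNow) {
        return !searchedBefore && foundNow;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        System.out.println("searchedBefore foundNow original proposed");
        for (boolean searched : values) {
            for (boolean found : values) {
                System.out.printf("%-14b %-8b %-8b %b%n", searched, found,
                        original(searched, found), proposed(searched, found));
            }
        }
    }
}
```

The two conditions disagree in exactly two rows: not searched before and not found now (case 2), and searched before and found (case 3). Case 3 is the one the reply above wants to keep, so that another version can still be tried when a previously found library failed to load.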

HannesWell · 2021-08-20T17:46:48Z

Maybe this is also useful regarding issue #452.
The most significant improvements from this PR were achieved by avoiding calls to File methods.

saudet · 2021-08-20T22:53:19Z

Maybe this is also useful regarding issue #452.
The most significant improvements from this PR were achieved by avoiding calls to File methods.

If you know of a more efficient way without changing the logic, please try to do that! Thanks

saudet · 2021-12-02T23:42:17Z

So, is there anything in particular that is preventing you from making progress with those pull requests?

HannesWell · 2021-12-03T10:19:20Z

So, is there anything in particular that is preventing you from making progress with those pull requests?

It is just my limited time and shifted priorities, but I definitely plan to complete this PR (as well as my other open PRs!).
I hope/expect to have some free time around Christmas and New Year to complete this one and the others.

saudet · 2021-12-04T01:06:09Z

BTW, concerning symbolic links on Windows, we could definitely disable them and assume by default that they are never supported. They are not available to users by default, and bugs such as https://bugs.openjdk.java.net/browse/JDK-8003887 seem like they are never going to get fixed, yet @devjeonghwan found that they can still cause problems. So it doesn't sound like we should be relying on symbolic links for anything on Windows. Users who really need them should still be able to enable their use in JavaCPP with some system property.
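
A minimal sketch of such an opt-in could look like the following. The property name `example.loader.symlinks`, the class, and the method are all hypothetical and are not taken from JavaCPP; the "enabled by default on other platforms" behavior is also an assumption, since the comment above only talks about Windows.

```java
import java.util.Locale;

final class SymlinkPolicy {
    // Hypothetical property name, used only for this illustration.
    static final String PROPERTY = "example.loader.symlinks";

    /**
     * Decides whether symbolic links should be used. An explicit property
     * value wins; without one, links are off on Windows and (as an
     * assumption of this sketch) on everywhere else.
     */
    static boolean useSymbolicLinks(String osName, String propertyValue) {
        if (propertyValue != null) {
            return Boolean.parseBoolean(propertyValue);
        }
        return !osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    public static void main(String[] args) {
        System.out.println(useSymbolicLinks(
                System.getProperty("os.name"), System.getProperty(PROPERTY)));
    }
}
```

With this shape, a Windows user who needs links would pass `-Dexample.loader.symlinks=true` on the command line, and everyone else on Windows skips the link-related checks entirely.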

saudet · 2022-01-24T07:17:15Z

Based on suggestions in this pull request, I've updated a couple of things in commit e8b5734. With these modifications, it takes a bit less than 1200 ms on my Windows 10 machine to execute the following, when everything is already cached, extracted, and all:

        Py_Initialize(org.bytedeco.scipy.presets.scipy.cachePackages());
        org.bytedeco.numpy.global.numpy._import_array();

I haven't tried to do something about the unnecessary calls to loadLibrary(), but it wouldn't even gain us 100 ms, so I'm reluctant to change the logic there just for that. If you see other places where we can gain more, please update this pull request! Thanks

@yukoba Does your javacpp-embedded-python incur any additional overhead that we should know about?

HannesWell · 2022-01-24T19:45:48Z

Thanks for implementing the changes, and sorry for the delay. It seems my estimate was off, but I hope to be able to check for possible updates to this and my other PRs/issues in the next few weeks.

[Loader] improve performance of resource extraction and library search

6dba8d2

saudet added a commit that referenced this pull request Jan 24, 2022

* Speed up Loader on Windows when there are no symbolic links or l…

e8b5734

…ibrary versions (pull #512)

saudet mentioned this pull request Jan 30, 2022

Long loading time Loader.load(org.bytedeco.opencv.opencv_java.class); bytedeco/javacv#1638

Closed
