IllegalArgumentException after using dir(jpype.JPackage("mypackage")) and non-ascii paths. #1194

Closed
tachyonicClock opened this issue Jun 16, 2024 · 3 comments · Fixed by #1195 or #1226

Comments

@tachyonicClock
Contributor

It appears that dir(jpype.JPackage("mypackage")) does not handle non-ASCII characters anywhere in its package path. This method is critical for tools like stubgenj that loop over packages using dir.

Consider the following working and broken cases:

├── bad_workdir_à
│   ├── bad_example.py
│   ├── MyClass.java
│   ├── mypackage
│   │   └── MyClass.class
│   └── mypackage.jar
└── good_workdir
    ├── good_example.py
    ├── MyClass.java
    ├── mypackage
    │   └── MyClass.class
    └── mypackage.jar

Both are identical, except for the name of the working directory. The good_example.py and bad_example.py scripts are:

import jpype
jpype.addClassPath("mypackage.jar")
jpype.startJVM(jpype.getDefaultJVMPath())
print(dir(jpype.JPackage("mypackage")))

The output of bad_workdir_à/bad_example.py is:

Traceback (most recent call last):
  File "org.jpype.pkg.JPypePackage.java", line -1, in org.jpype.pkg.JPypePackage.getContents
Exception: Java Exception

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/antonlee/github.com/tachyonicClock/A/CapyMOA/jpype_bug/bad_workdir_à/bad_example.py", line 4, in <module>
    print(dir(jpype.JPackage("mypackage")))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
java.lang.java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Bad escape

The output of good_workdir/good_example.py is:

['MyClass']

I would expect the output of bad_workdir_à/bad_example.py to be the same as good_workdir/good_example.py. I've attached the files I used to reproduce this bug.

jpype_bug.zip

@Thrameos
Contributor

To debug the problem, the first step is to get a full stack trace.

import jpype

try:
    jpype.addClassPath("mypackage.jar")
    jpype.startJVM(jpype.getDefaultJVMPath())
    print(dir(jpype.JPackage("mypackage")))
except Exception as ex:
    print(ex.stacktrace())

This yields:

java.lang.IllegalArgumentException: Bad escape
        at java.base/sun.nio.fs.UnixUriUtils.fromUri(UnixUriUtils.java:88)
        at java.base/sun.nio.fs.UnixFileSystemProvider.getPath(UnixFileSystemProvider.java:102)
        at java.base/java.nio.file.Path.of(Path.java:203)
        at java.base/java.nio.file.Paths.get(Paths.java:98)
        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.uriToPath(ZipFileSystemProvider.java:76)
        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:151)
        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getPath(ZipFileSystemProvider.java:142)
        at java.base/java.nio.file.Path.of(Path.java:208)
        at java.base/java.nio.file.Paths.get(Paths.java:98)
        at org.jpype.pkg.JPypePackageManager.getPath(Unknown Source)
        at org.jpype.pkg.JPypePackage.getContents(Unknown Source)

So the last place before it goes into Java library land is org.jpype.pkg.JPypePackageManager.getPath.

The next step would be to build a custom version of org.jpype.jar with appropriate instrumentation to see what the issue is. There are two possibilities.

  1. The string is already malformed before it reaches Java, in which case we need to call the proper Python to Java string encoder first.
  2. The string reaching Java is not malformed, but Java itself can't handle this particular string pattern in the underlying library. (Very difficult to fix.)

Let's start by looking upstream to see if we can find the corresponding context of the call. Thus we need to find the entry point. Grepping for "getContents", we find its method ID created at ./native/common/jp_context.cpp: m_Package_GetContentsID = frame.GetMethodID(packageClass, "getContents",. It is then used in the function...

./native/common/jp_javaframe.cpp-jarray JPJavaFrame::getPackageContents(jobject pkg)
./native/common/jp_javaframe.cpp-{
./native/common/jp_javaframe.cpp-       jvalue v;
./native/common/jp_javaframe.cpp-       JAVA_RETURN(auto, "JPJavaFrame::getPackageContents",
./native/common/jp_javaframe.cpp:                       (jarray) CallObjectMethodA(pkg, m_Context->m_Package_GetContentsID, &v));
./native/common/jp_javaframe.cpp-}

As we can see here there is absolutely no string handling taking place. The input and output are not strings so no encoding conversion is expected. (This is most unfortunate as that means the issue is with Java and not JPype directly.)

The other relevant point is getting the Package name through to Java.

./native/common/jp_javaframe.cpp:jobject JPJavaFrame::getPackage(const string& str)
./native/common/jp_javaframe.cpp-{
./native/common/jp_javaframe.cpp-       jvalue v;
./native/common/jp_javaframe.cpp-       v.l = fromStringUTF8(str);  <== properly goes through C++ string to Java encoding
./native/common/jp_javaframe.cpp:       JAVA_RETURN(jobject, "JPJavaFrame::getPackage",
./native/common/jp_javaframe.cpp-                       CallObjectMethodA(m_Context->m_JavaContext.get(), m_Context->m_Context_GetPackageID, &v));
./native/common/jp_javaframe.cpp-}

Again this points to a downstream library problem in Java.

So next we instrument getPath in the jar file with print statements:

  /**
   * Convert a URI into a path.
   *
   * This has special magic methods to deal with jar file systems.
   *
   * @param uri is the location of the resource.
   * @return the path to the uri resource.
   */
  static Path getPath(URI uri)
  {
    System.out.println("GetPath uri="+uri);
    try
    {
      return Paths.get(uri);
    } catch (java.nio.file.FileSystemNotFoundException ex)
    {
        System.out.println("Not found");
    }

    if (uri.getScheme().equals("jar"))
    {
      try
      {
        System.out.println("Open file system");
        // Limit the number of filesystems open at any one time
        fs.add(jfsp.newFileSystem(uri, env));
        if (fs.size() > 8)
          fs.removeFirst().close();
        return Paths.get(uri);
      } catch (IOException ex)
      {
      }
    }
    throw new FileSystemNotFoundException("Unknown filesystem for " + uri);
  }

We then recompile with ant -f project/jpype_java/build.xml jar.

Now the next part is going to get a bit tricky, because if we just place org.jpype.jar in the working directory and run it, it will likely get ignored if the bad path is the issue. So we may need to copy it over the installed JPype copy and try it there. But let's give the simple approach a try first and hope it works...

cp project/jpype_java/dist/org.jpype.jar bug/bad_workdir_à/
cp project/jpype_java/dist/org.jpype.jar bug/good_workdir/

We make a quick modification to place org.jpype.jar on the classpath so we can use our modified copy. For the good example we get...

GetPath uri=jar:file:/mnt/c/Users/nelson85/Documents/devel/open/jpype/bug/good_workdir/mypackage.jar!/mypackage
Not found
Open file system
GetPath uri=jar:file:/mnt/c/Users/nelson85/Documents/devel/open/jpype/bug/good_workdir/mypackage.jar!/mypackage
GetPath uri=jar:file:///mnt/c/Users/nelson85/Documents/devel/open/jpype/bug/good_workdir/mypackage.jar!/mypackage/MyClass.class
['MyClass']

For the bad example we get...

GetPath uri=jar:file:/mnt/c/Users/nelson85/Documents/devel/open/jpype/bug/bad_workdir_%c3%a0/mypackage.jar!/mypackage
Not found
Open file system
GetPath uri=jar:file:/mnt/c/Users/nelson85/Documents/devel/open/jpype/bug/bad_workdir_%c3%a0/mypackage.jar!/mypackage
GetPath uri=jar:file:///mnt/c/Users/nelson85/Documents/devel/open/jpype/bug/bad_workdir_à/mypackage.jar!/mypackage/MyClass.class
java.lang.IllegalArgumentException: Bad escape
        at java.base/sun.nio.fs.UnixUriUtils.fromUri(UnixUriUtils.java:88)
        at java.base/sun.nio.fs.UnixFileSystemProvider.getPath(UnixFileSystemProvider.java:102)
        at java.base/java.nio.file.Path.of(Path.java:203)
        at java.base/java.nio.file.Paths.get(Paths.java:98)
        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.uriToPath(ZipFileSystemProvider.java:76)
        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:151)
        at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getPath(ZipFileSystemProvider.java:142)
        at java.base/java.nio.file.Path.of(Path.java:208)
        at java.base/java.nio.file.Paths.get(Paths.java:98)
        at org.jpype.pkg.JPypePackageManager.getPath(JPypePackageManager.java:108)
        at org.jpype.pkg.JPypePackage.getContents(JPypePackage.java:119)

So here is the issue: while looking up the path, getContents passes along a URI that is not properly encoded. The first two URIs in the output escape à as %c3%a0, but the URI for MyClass.class contains the raw character. Thus the problem must be in getContents, which calls getPath. Tracking down from that point, the problem first appears in collectContents, where it calls toURI.
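The difference between the first and last GetPath lines of the failing run can be checked mechanically. The following is a minimal Python sketch, not part of JPype; the helper name `unescaped_non_ascii` is hypothetical. It lists the characters in a URI string that fall outside ASCII and so would normally be expected as percent-escaped UTF-8 bytes. The two URIs are copied from the instrumented output above.

```python
def unescaped_non_ascii(uri: str) -> list:
    """Return the characters in `uri` that are not ASCII.

    A well-formed URI string is expected to carry such characters as
    percent-escaped UTF-8 bytes (for example %c3%a0 for U+00E0).
    """
    return [ch for ch in uri if ord(ch) > 127]


# URIs copied from the instrumented output of the failing run.
# "\u00e0" is the letter a with a grave accent from the directory name.
encoded = ("jar:file:/mnt/c/Users/nelson85/Documents/devel/open/jpype/bug/"
           "bad_workdir_%c3%a0/mypackage.jar!/mypackage")
raw = ("jar:file:///mnt/c/Users/nelson85/Documents/devel/open/jpype/bug/"
       "bad_workdir_\u00e0/mypackage.jar!/mypackage/MyClass.class")

print(unescaped_non_ascii(encoded))  # []
print(unescaped_non_ascii(raw))      # ['\u00e0'], shown as the accented letter
```

Only the URI built for MyClass.class is flagged, which matches the point at which the stack trace appears.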

The code reads...

  // Java 8 windows bug https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8131067
  private static URI toURI(Path path)
  {
    URI uri = path.toUri();
    if (uri.getScheme().equals("jar") && uri.toString().contains("%2520"))
      uri = URI.create("jar:" + uri.getRawSchemeSpecificPart().replaceAll("%25", "%"));
    return uri;
  }

So basically we are just calling path.toUri(), which gives us the badly encoded string. JPype is just the innocent bystander: we took a path, called the toUri method on it, then passed the result back to the Java file system to be resolved, and got a failure.
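For contrast, the `%2520` check in toURI targets a different problem: a character that has been percent-encoded twice. A small Python sketch of that pattern, using a hypothetical directory name containing a space, suggests why that workaround does not apply to a raw à, which is under-encoded rather than over-encoded. This only illustrates the string pattern; it is not how Java produces the URI.

```python
from urllib.parse import quote

# Hypothetical directory name with a space (not taken from the report).
once = quote("my dir")    # the space becomes %20
twice = quote(once)       # the "%" itself is escaped, giving %2520

# The same repair that toURI applies: collapse "%25" back to "%".
repaired = twice.replace("%25", "%")

print(once)      # my%20dir
print(twice)     # my%2520dir
print(repaired)  # my%20dir
```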

At this point we would need to find a patch to address the flaw in Java. Unfortunately, this is the extent of my time available right now. Hopefully I can take a deeper look later. My first attempt would be to just blatantly replace à with %c3%a0 to see if it works. We would then need to find the proper encoder to call on the URI (being mindful that in other cases it may already be encoded properly), which would make this a nightmare to test properly.
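The "replace à with %c3%a0" idea, together with the caveat about input that is already encoded, can be sketched in Python as follows. This is only an illustration of the encoding logic under the assumption that UTF-8 is the right byte encoding; the helper name `escape_non_ascii` is hypothetical, and a real fix would have to live on the Java side. There, `java.net.URI.toASCIIString()` performs a comparable escaping of non-ASCII characters, though whether it is the right fix here is untested.

```python
from urllib.parse import quote


def escape_non_ascii(uri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8, leaving ASCII untouched.

    Existing escapes such as %c3%a0 or %20 consist only of ASCII
    characters, so they pass through unchanged. Applying the function
    twice gives the same result as applying it once.
    """
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="")
        for ch in uri
    )


# "\u00e0" is the accented letter from the failing directory name.
print(escape_non_ascii("bad_workdir_\u00e0/mypackage.jar"))
# bad_workdir_%C3%A0/mypackage.jar

# Input that is already escaped is left alone.
print(escape_non_ascii("bad_workdir_%c3%a0/mypackage.jar"))
# bad_workdir_%c3%a0/mypackage.jar
```

`quote` emits uppercase hex digits, while the log shows lowercase `%c3%a0`. Percent-escape hex digits are case-insensitive, so both forms denote the same bytes.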

Hopefully this is helpful in working out a solution.

@Thrameos
Contributor

BTW thanks for the detailed bug report. Though I don't have an immediate solution, I was able to use the report to quickly isolate the source of the issue, which will hopefully make a fix possible.

@tachyonicClock
Contributor Author

@Thrameos thank you, this is hugely helpful. I'll let you know if I figure anything more out.
