Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenDAL Java Bindings #1572

Closed
Xuanwo opened this issue Mar 12, 2023 · 18 comments · Fixed by #2291
Closed

OpenDAL Java Bindings #1572

Xuanwo opened this issue Mar 12, 2023 · 18 comments · Fixed by #2291
Assignees

Comments

@Xuanwo
Copy link
Member

Xuanwo commented Mar 12, 2023

No description provided.

@kidylee
Copy link
Contributor

kidylee commented Mar 16, 2023

Could I contribute to this issue?

@tisonkun
Copy link
Member

tisonkun commented Mar 16, 2023

@kidylee I'd suggest you briefly outline your ideas first.

@Xuanwo
Copy link
Member Author

Xuanwo commented Mar 16, 2023

Do we need to build the C-binding before starting on the Java binding?

@tisonkun
Copy link
Member

@Xuanwo Using jni-rs, we can build Java bindings without a full functional C binding. But JNI is not always the best solution.

@kidylee
Copy link
Contributor

kidylee commented Mar 16, 2023

@kidylee I'd suggest you briefly outline your ideas first.

Thanks for your advice. As @tisonkun said, we don't need a C-binding for Java, a cdylib is good to go.

A few candidates I'm looking into:

  • raw jni with jni-rs
  • j4rs - looks like this is a promising approach.
  • flapigen - alike PyO3 macros style, but it generates C-API wrapper first.

@tisonkun
Copy link
Member

FYI - if we have a C binding, using JNA can avoid JNI glue code. But it still uses the JNI technology underneath.

@kidylee
Copy link
Contributor

kidylee commented Mar 19, 2023

FYI - if we have a C binding, using JNA can avoid JNI glue code. But it still uses the JNI technology underneath.

Thanks @tisonkun, it is a solid approach.

Here are snippets of my approach:

public void testFileSystem(){
        FileSystemService service  = new FileSystemBuilder()
                .root("/tmp")
                .build();

        var op = new OperatorBuilder(service)
                .build();

        op.write("hello.txt","hello world");
        var rs = op.read("hello.txt");

        assert "hello world".equals(rs);
    }

For the rust part, I'm taking assumption where operator instance keeps resource which we don't want to re- create it per calling. So Operator.java will hold a pointer to the rust operator instance, and below are the method to create and free it:

#[no_mangle]
pub extern "C" fn java_get_operater() -> *const i32 {
    let op = Box::new(Operator{});
    let op_ptr = Box::into_raw(op);
    op_ptr as *const i32
}


#[no_mangle]
pub extern "C" fn java_free_operater(ptr: *mut Operator)  {
    unsafe {
        let _ = Box::from_raw(ptr);
    }
}

Comments are welcomed.

@tisonkun
Copy link
Member

@kidylee I'm glad to review a MVP, go ahead!

@tisonkun
Copy link
Member

tisonkun commented Apr 20, 2023

@ShadowySpirits
Copy link
Member

ShadowySpirits commented Apr 24, 2023

Findings for JNI x async -

I found a possible solution: we can store JavaVM as thread local variable, and when we need to execute Java code, attach the rust-managed thread to obtain Jenv. Additionally, we cloud hold Jenv as thread local variable.

thread_local! {
    static JAVA_VM: RefCell<Option<Arc<JavaVM>>> = RefCell::new(None);
}

#[no_mangle]
pub extern "system" fn Java_org_apache_opendal_Operator_getOperator0(
    env: JNIEnv,
    _class: JClass,
    input: JString,
    params: JObject,
) -> jlong {
    // build operator
    let operator = 1;

    let java_vm = Arc::new(env.get_java_vm().unwrap());

    let runtime = Builder::new_multi_thread()
        .worker_threads(4)
        .on_thread_start(move || {
            JAVA_VM.with(|cell| {
                *cell.borrow_mut() = Some(java_vm.clone());
            });
        })
        .build()
        .unwrap();
    
    Box::into_raw(Box::new((operator, runtime))) as jlong
}

And then we can use async/await without suspending the caller thread.

#[no_mangle]
pub unsafe extern "system" fn Java_org_apache_opendal_Operator_async_write(
    mut env: JNIEnv,
    _class: JClass,
    ptr: *mut (Operator, Runtime),
    file: JString,
    content: JString,
    future: JObject,
) {
    let (op, runtime) = &mut *ptr;

    let file: String = env.get_string(&file).unwrap().into();
    let content: String = env.get_string(&content).unwrap().into();
    let future = env.new_global_ref(future).unwrap();

    let x = async move {
        op.write(&file, content).await.unwrap();
        JAVA_VM.with(|cell| {
            let java_vm = cell.borrow();
            let java_vm = java_vm.as_ref().unwrap();
            let mut env = java_vm.attach_current_thread().unwrap();
            env.call_method(
                future,
                "complete",
                "(Ljava/lang/Object;)Z",
                &[JValue::Bool(1)],
            )
            .unwrap();
        });
    };
    runtime.spawn(x);
}

If the community approves this solution, I'd love to add asynchronous API to Java binding.

@kidylee
Copy link
Contributor

kidylee commented Apr 25, 2023

Thanks @ShadowySpirits , looking forward for this feature.

Regard to the approach, I think we need other design to manage thread lifecycle in Rust side, instead of creating it each time when calling native method.

One approach is we can use the register method when Java loading dylib. You can check this doc here.

If a library L is statically linked, then upon the first invocation of System.loadLibrary("L") or equivalent API, a JNI_OnLoad_L function will be invoked with the same arguments and expected return value as specified for the JNI_OnLoad function.

We can create tokio runtime in this OnLoad method.

When the class loader containing a statically linked native library L is garbage collected, the VM will invoke the JNI_OnUnload_L function of the library if such a function is exported.

Same as register, tokio runtime could be destroyed in OnUnload method.

JVM guaranteed dynamic lib will only be loaded once:

A library L that is statically linked will prohibit a library of the same name from being loaded dynamically.

@ShadowySpirits
Copy link
Member

We can create tokio runtime in this OnLoad method.

Thanks @kidylee ! Great idea, I will look into this approach.

@tisonkun
Copy link
Member

After a quick review on the patch, as long as we use global_reference, the parallelism is limited to a constant upper bound:

JNI ERROR (app bug): global reference table overflow (max=65535)

@ShadowySpirits
Copy link
Member

JNI ERROR (app bug): global reference table overflow (max=65535)

This limitation is enough for most scenarios. But we do need to eliminate this risk. One solution could be to provide users with the option to either throw an exception or initiate a blocking operation.

@tisonkun
Copy link
Member

This limitation is enough for most scenarios

If each call obtains one global ref, I'm afraid that the iops is limited by 65535.

@ShadowySpirits
Copy link
Member

If each call obtains one global ref, I'm afraid that the iops is limited by 65535.

IMO, this limitation means we can not hold over 65535 global refs at the same time. However, this limit is not a significant concern since the lifetime of each global reference ends with the completion of its respective operation. Therefore, our theoretical IOPS can exceed 65535.

And most blob service limit IOPS, 65535 is already a big number: https://www.alibabacloud.com/help/en/object-storage-service/latest/usage-notes

@Xuanwo
Copy link
Member Author

Xuanwo commented Apr 26, 2023

We don't need to address all possible problems right now. We can document this limitation and fix it in the future.

@kidylee
Copy link
Contributor

kidylee commented Apr 26, 2023

I agree with @ShadowySpirits, the scenarios of OpenDAL are about file access, bandwidth and processor open file limitation are less than this global references threshold, and as long as we managed it well(no leaks and release immediately), it should be OK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants