Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panama-backend - basic integration with jextract #499

Closed
wants to merge 73 commits into from

Conversation

jcrist1
Copy link
Contributor

@jcrist1 jcrist1 commented Jun 14, 2024

I've implemented a basic integration with jextract for a pure java backend. This is just a baseline to start building on, that generates the C2 backend, and uses that with jextract to create the java interface to the native code. There's no code generation yet for ergonomic interfaces. I feel like enums and structs will be relatively straightforward, so once again I will start off building the wrappers for opaque type and cleanup. Also I don't actually know any java yet 🤪, but no time like the present to learn. This is as discussed in #144

Edit (2024-07-11):

I've been struggling a bit to make time for this lately as other priorities have taken precedence but I've managed another big chunk today.

I'm still missing the actual complete codegen pipeline. For the time being I'm just copy pasting generated code from the insta snapshots into the project, to test that it works, as I don't want to implement all of the features yet, and it's more convenient to just panic. Otherwise, this is starting to come together, so I'm going to add the checklist again to keep track of what I need to do still.

  • primitive types
  • opaque types:
    • basic definiton
    • return a boxed opaque
    • as self parameter
    • as another parameter
  • structs
  • enums
  • writeable
    • maybe a better adapted writeable for panama
  • slices
    • primitive slices
    • str slices
    • owned slices I think I would need to integrate with jna again to support these as I can't find a way to free memory in panama (it seems to happen with closeables and a dedicated scoped allocator) I am doing some stuff with owned slices but it feels a little be dangerous. It feels like we could accidentally get a user after free somehow. But I haven't thought too deeply about it.
    • slices of strings
    • strings
  • borrows
    • borrows of parameters
    • in struct fields
  • nullables
  • fallibles


let mut lib_file = File::create(&lib_path)?;
for include in include_files {
writeln!(lib_file, "#include \"{include}\"")?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jextract suggests loading all headers into a single library file. This gets transformed into an enormous monster of java

tool/src/lib.rs Outdated
Comment on lines 83 to 91
attr_validator.support.renaming = true;
attr_validator.support.disabling = true;
attr_validator.support.iterators = true;
attr_validator.support.iterables = true;
attr_validator.support.indexing = true;
attr_validator.support.constructors = true;
attr_validator.support.named_constructors = true;
attr_validator.support.memory_sharing = true;
attr_validator.support.accessors = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just enabled everything for generating some code to start off with. Will adjust later

@jcrist1
Copy link
Contributor Author

jcrist1 commented Jun 14, 2024

I now manually implemented an opaque type with cleanup as well. This will be the model for the template. I also added a benchmark where I just instantiate the opaque struct because I was quite curious what the cost of GC was. With GC on the benchmark throughput more than halved. going from ~2.7M/s to 1.2 M/s. A slow down from about 350ns to ~800ns, which is just interesting. For comparison the opaque formatter in the example took ~8000ns, so this could be about a 10x speedup. All in all this seems worth pursuing and my biggest open questions for this have been resolved.

Copy link
Contributor

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is neat but i'm kind of surprised it works??

) -> std::io::Result<FileMap> {
let files = FileMap::default();
let mut context = c2::CContext::new(tcx, files, false);
context.run();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that after #495, the C2 backend can now be directly invoked per-type in a cleaner fashion, if you decide that's more useful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to implement that as a followup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually implemented that. Biggest problem is that I need the C runtime too, which I made pub (crate)

somelib_h.Opaque_destroy(this.internal);
}

public void assertStruct(MyStruct struct) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: wait, jextract generated this from the C header files? How did it know that these C header files correspond to a class with methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly no, that would be amazing 😃. That’s just a hand rolled wrapper for a more ergonomic type. I’m using this to build the templating for opaque types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything in the ntv package was generated using jextract from the c backend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! Okay, carry on!

import java.nio.charset.StandardCharsets;


public class Opaque {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the tool doesn't actually generate code, the class body has now been been copied from an insta snapshot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☝️ no longer valid 😝

import java.lang.ref.Cleaner;


public class Opaque {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the snapshot from where we copied the body

Comment on lines 324 to 334
// We can probably do boxed returns by just relying on jna
// public double[] asBoxedSLice() {
// try (var arena = Arena.ofConfined()) {
// var nativeVal = somelib_h.Float64Vec_as_boxed_slice(arena, internal);
// var data = dev.diplomattest.somelib.ntv.DiplomatF64View.data(nativeVal);
// var len = dev.diplomattest.somelib.ntv.DiplomatF64View.len(nativeVal);
// var returnVal = data.asSlice(0, len * JAVA_DOUBLE.byteSize()).toArray(JAVA_DOUBLE);
// Native.free(data.address());
// return returnVal;
// }
// }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what a boxed slice would look like using jna. It should be the only place where we'd need to use it.

import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class OpaqueBench {
Copy link
Contributor Author

@jcrist1 jcrist1 Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a shell command to actually run this yet. I only ran it from IntelliJ, but want to keep it as it'll be nice to have to compare to the kotlin backend once the icu4x example can be made to work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of files seem to be generated by default. It seems to be possible to pass in every single function and struct to jextract to specify which ones should be used to generate code, but I can't see a way to only forbid these standard constructions which are causing a lot of noise. I'd like to move that investigation to a followup.

@@ -17,6 +17,7 @@ mod ffi {
Box::new(Self(v.to_string()))
}

#[diplomat::attr(supports = memory_sharing, disable)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Manishearth I had an issue with passing owned strings in java as then I needed to pass an arena around which felt contradictory to the idea of an owned parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah you need to instead call diplomat_alloc and write to that string when you see this kind of thing.

well, you need to do that for all strings, the difference with owned strings is that you don't free it afterwards

@@ -33,6 +33,7 @@ pub type DiplomatStr = [u8];
pub type DiplomatStr16 = [u16];

/// Like [`u8`], but interpreted explicitly as a raw byte as opposed to a numerical value.
///
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this came from a clippy fix, but I'm not sure

@@ -38,15 +38,15 @@ pub(crate) fn attr_support() -> BackendAttrSupport {
a
}

#[derive(askama::Template)]
#[template(path = "c/runtime.h.jinja", escape = "none")]
pub(crate) struct Runtime;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need access to the c runtime for the java backend so that jextract can process it.

converted_value: Cow<'cx, str>,
}

mod arena {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted param conversions to not have to be aware of the context wherein they are done, and so introduced a kind of inversion of control, where if a conversion needs an arena it returns a variant that can only be accessed by providing an arena.


impl<'a, 'cx> TyGenContext<'a, 'cx> {
#[allow(unused)]
fn java_to_native<T: TyPosition>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per your comment last review @Manishearth I tried to harmonise the param conversion to just a language conversion, so it can be reused in e.g. the struct conversion. It feels a lot cleaner but I admit the control inversion above is a little weird.

@jcrist1
Copy link
Contributor Author

jcrist1 commented Sep 13, 2024

Okay @Manishearth I think this could do with another review. I substantially redid the native <-> java conversion code. Also added the ci steps. I'm still missing a ci step that actually runs the tests instead of skipping, but I'd like to wrap this up, and focus on smaller pieces at a time.

@jcrist1
Copy link
Contributor Author

jcrist1 commented Sep 13, 2024

Screenshot 2024-09-13 at 15 35 47 🎉 🎉 🎉 🎉 🎉 🎉

Copy link
Contributor

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, never posted this

@@ -0,0 +1,47 @@
// Generated by jextract
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we shouldn't need FILE

@@ -0,0 +1,219 @@
// Generated by jextract
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we don't need any of these either

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment was supposed to be on the threading files below this

@@ -17,6 +17,7 @@ mod ffi {
Box::new(Self(v.to_string()))
}

#[diplomat::attr(supports = memory_sharing, disable)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah you need to instead call diplomat_alloc and write to that string when you see this kind of thing.

well, you need to do that for all strings, the difference with owned strings is that you don't free it afterwards

// /path/to/mylib/include/mylib.h

let package = format!("{domain}.{lib_name}.ntv");
let mut command = std::process::Command::new("jextract");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: would be nice to accept the jextract binary location from the CLI/config

@Manishearth
Copy link
Contributor

Alright, so my biggest concern with this is making the JDK a part of the gen step. "Run tests for a different backend" isn't a common task for Diplomat development, but "regen test code for all backends" is (since it's common enough to add stuff to the feature_tests crate), and this means all Diplomat devs need to set up JDK, something that can often be very finicky.

I would like for the Java backend to be a part of this repo, however. We have a couple routes in front of us, and I'd like to know what you think:

  1. Diplomat devs get used to setting up JDK, potentially with us merging PRs that temporarily break the java gen task (we can keep it a separate CI job) with the understanding that one of us with a JDK setup will fix it. I can commit to doing so, though I'd rather not.
    • Downside: This is going to be really annoying for other Diplomat devs. I'm open to this but I'd rather not go down this route.
  2. This backend is added in-tree, however we do not check in generated code for it, and CI runs gen-java before running tests.
    • Downside: we lose out on Java diffs
  3. This backend is added in-tree, and we check in regular generated code but not the jextracted code, and CI runs jextract (or perhaps some command like diplomat-tool --backend jextractwrapper)
    • Downside: Kind of weird, we still lose out on diffs but not the important ones
  4. This backend is maintained as rust-diplomat/java as a separate crate, and has full CI there.
    • Downside: Diplomat changes to the HIR don't automatically get nicely propagated there, it's possible for the backends to diverge more in features. Right now even if I don't maintain the Kotlin backend whenever I change something it is relatively easy for me to ensure that Kotlin is kept up to date most of the time.
  5. We manually generate bindings using JEP 454 APIs rather than using jextract
  • Downside: this is a bunch of extra work

The last one is my preferred end state, honestly: when we were considering Panama we largely intended for the JEP 454 java.lang.foreign (etc) APIs to be used, but not jextract. I think jextract is a great way to bootstrap this project, though.

A hybrid route I find somewhat appealing is to take option 4 or 5 above ("don't check in the full bindings"), and over time move to a diplomat that doesn't need jextract. A particularly cool option would be for diplomat to generate an ntv/ folder that is API compatible with jextract, so we can run tests swapping out the diplomat-generated ntv with the jextracted ntv.

@jcrist1 what do you think? Ultimately you're the one doing the work here, so I don't want to dictate anything, but I want to try and resolve the tradeoff with the impact on other backends, and try to understand your own preferences here.

@jcrist1
Copy link
Contributor Author

jcrist1 commented Sep 19, 2024

@Manishearth thanks for the feedback. I'll probably just implement a filter to remove the extraneous generated files from jextract ... depending on the conclusion of what to do with this backend

... need to set up JDK, something that can often be very finicky.

write once run... something something ... I forget the rest 😛

Honestly the jextract thing bothers me from a rust devx perspective as well as an end user perspective. I agree that generating the JEP 454 binding code will probably be the way to go longer term, but I gotta be honest, I'm not super enthusiastic about doing that myself, not right now at least. While jextract generates a lot of superfluous stuff and is a bit of a pain to setup, it also seems like the easiest way to get functional bindings.

So for that moving a java backend to its own repo in the diplomat project is a nice compromise. We could get a working backend, and slowly transition to diplomat generated JEP 454 code. Having a repo in the rust-diplomat project would give the panama backend a bit more legitimacy (and hopefully direct other contributors to it) while signalling that the support level is lower and also allowing a more customised dev-setup. We could definitely open an issue around setting up the JEP 454 generation but it's not the most motivating kind of coding for me so progress there will be slow. I would like to pin the diplomat dependency to a specific commit and maybe set up an alert for when the latest main breaks it. That way people can still have a working tool, but I could get notified to address incompatible changes to the diplomat dependency so maintenance wouldn't fall on your shoulders.

@jcrist1
Copy link
Contributor Author

jcrist1 commented Sep 19, 2024

@Manishearth If we're agreed I can try and set up a standalone bin project tomorrow, if you can create the rust-diplomat/java repo. I'll see if I can get an "initial commit" PR by end of day

@Manishearth
Copy link
Contributor

@jcrist1 here you go: https://github.com/rust-diplomat/diplomat-java

@jcrist1
Copy link
Contributor Author

jcrist1 commented Sep 20, 2024

Closing in favor of rust-diplomat/diplomat-java#1

@jcrist1 jcrist1 closed this Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants