Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE-2464: Make scryer prolog usable as a shared library for other programs #2464

Open
jjtolton opened this issue Aug 3, 2024 · 3 comments · May be fixed by #2465
Open

ISSUE-2464: Make scryer prolog usable as a shared library for other programs #2464

jjtolton opened this issue Aug 3, 2024 · 3 comments · May be fixed by #2465

Comments

@jjtolton
Copy link

jjtolton commented Aug 3, 2024

I'd like to be able to use scryer as a shared library. Taking inspiration from the WASM code, I imagined something like

eval_code_c and free_c_string.

Note: I know absolutely nothing about rust, but from some hacking I have something that does... something.

#[cfg(not(target_arch = "wasm32"))]
use std::ffi::{c_char, CStr, CString};

#[cfg(not(target_arch = "wasm32"))]
#[no_mangle]
pub extern "C" fn eval_code_c(input: *const c_char) -> *mut c_char {
    use machine::mock_wam::*;

    let c_str = unsafe {
        assert!(!input.is_null());
        CStr::from_ptr(input)
    };
    let r_str = c_str.to_str().expect("Not a valid UTF-8 string");

    let mut wam = Machine::with_test_streams();
    let bytes = wam.test_load_string(r_str);
    let result_str = String::from_utf8_lossy(&bytes).to_string();

    let c_string = CString::new(result_str).expect("Failed to convert to CString");
    c_string.into_raw() // convert to raw pointer and prevent Rust from cleaning it up
}
#[no_mangle]
pub extern "C" fn free_c_string(ptr: *mut c_char) {
    unsafe {
        if ptr.is_null() {
            return;
        }
        let _ = CString::from_raw(ptr);
    };
}

Here is some test Python I'm using to work it:

import ctypes

lib = ctypes.cdll.LoadLibrary("/home/jay/programs/scryer-prolog/target/debug/libscryer_prolog.so")

lib.eval_code_c.argtypes = (ctypes.c_char_p,)
lib.eval_code_c.restype = ctypes.POINTER(ctypes.c_char)

lib.free_c_string.argtypes = (ctypes.c_char_p,)
lib.free_c_string.restype = None

def eval_and_free(s):
    res_ptr = lib.eval_code_c(s.encode('utf-8'))
    res = ctypes.cast(res_ptr, ctypes.c_char_p).value.decode()
    lib.free_c_string(res_ptr)
    return res


if __name__ == '__main__':
    print(eval_and_free("?- char_type(A, ascii_punctuation).")) #  console display: % Warning: singleton variables A at line 0 of user
    print(eval_and_free("?- char_type('\"', ascii_punctuation).")) # console display is empty, `res` above is an empty string
  

I apologize for insulting anyone with my barbarian prolog and rust, pointers (no pun intended) would be appreciated.

@jjtolton jjtolton changed the title Make scryer prolog usable as a shared library for other programs ISSUE-2464: Make scryer prolog usable as a shared library for other programs Aug 3, 2024
@jjtolton
Copy link
Author

jjtolton commented Aug 5, 2024

One other bit of criticism --

I copied this loop in run_query_generator() nearly verbatim from run_query() -- I just force it to break at the end of the loop, and I left it in the loop even though it's silly because it was working and I was concerned I would make a nuanced mistake in the logic if I changed the code.

However, this would be rather brittle, since any change to the loop in run_query() would cause the logic to diverge in run_query_generator(), and vice-versa, so perhaps it would make more sense to factor out the commonality in an auxiliary function than then reuse it between run_query_generator() and run_query().

@jjtolton
Copy link
Author

jjtolton commented Aug 5, 2024

One more question for anyone for more experienced rust specialists. I'm using a web development pattern here for a lot of these binding functions to return "ok/error/panic" via a string, i.e., https://github.com/mthom/scryer-prolog/pull/2465/files#diff-b1a35a68f14e696205874893c07fd24fdb88882b47c23cc0e0c80a30c7d53759R260-R280.

Is this typical for rust / c interop code or is that considered overkill?

@bakaq
Copy link
Contributor

bakaq commented Aug 5, 2024

[...] so perhaps it would make more sense to factor out the commonality in an auxiliary function than then reuse it between run_query_generator() and run_query().

I think run_query() should be implemented in terms of run_query_generator(). In Rust terms, run_query_generator() is the next() of an Iterator, and the result of run_query() should just be that iterator .collect()ed.

Or, even better, make run_query() return the iterator and leave to the user if they want to collect or iterate it. But that would be a breaking change, so it's probably best to make a run_query_iter() method that returns an iterator, and then the implementation of run_query() could literally just be run_query_iter().collect(). I talked about this in #1880 (comment) and #1880 (comment).

Is this typical for rust / c interop code or is that considered overkill?

This is very atypical for both Rust and C. In Rust you would usually use an enum like Result<T,E> to deal with errors, or some custom one like this:

enum QueryResult {
    Ok(QueryAnswer),
    Err(QueryError),
    Panic,
}

In C you would usually indicate the error with a return value (like a NULL pointer or a non 0 int), and then based on that the caller should react in different ways. Examples of ways to do that:

// On success returns the result and sets error_msg to NULL.
// On error returns NULL and makes error_msg point to the error message.
char *function1(int arg1, int arg2, char **error_msg);

// On success returns 0, makes result point to the result and sets error_msg to NULL.
// On error returns a non-zero value that indicates what kind of error occurred,
// sets result to NULL and makes error_msg point to the error message.
int function2(int arg1, int arg2, char **result, char **error_msg);

// Because in function2 only one of result and error_msg are ever set, we can
// use just one out pointer to represent both, but this can be a bit confusing so
// needs to be properly documented.
// On success returns 0 and makes result point to the result.
// On error returns a non-zero value that indicates what kind of error occurred
// and makes result point to the error message.
int function3(int arg1, int arg2, char **result);

// Usage
int main() {
    char *error_msg1 = NULL;
    char *result1 = function1(1, 2, &error_msg1);
    if (ret1 == NULL) {
       // Now result1 is NULL and error_msg1 is the error message.
    } else {
       // Now result1 is the result and error_msg1 is still NULL.
    }

    char *error_msg2 = NULL;
    char *result2 = NULL;
    int err2 = function2(1, 2, &result2, &error_msg2);
    if (err2 != 0) {
       // Now result2 is NULL and error_msg2 is the error message.
       // Can do a switch in the value of err2 to check what kind of 
       // error happened.
    } else {
       // Now result2 is the result and error_msg2 is still NULL.
    }

    char *result3 = NULL;
    int err3 = functio3(1, 2, &result3);
    if (err3 != 0) {
       // Now result3 is the error message.
       // Can do a switch in the value of err3 to check what kind of 
       // error happened.
    } else {
       // Now result3 is the result.
    }
}

In low level land where C lives it's in general very inconvenient to have to parse something like JSON just to get the results or even know if an error happened, because it involves third party libraries and probably a lot of allocation and memory management. It's a bit more acceptable in Rust because of the great ecosystem, but using better type-level representations is much preferred.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants