Virgil FFI is a simple, easy-to-use foreign function interface for calling between Haskell and Python, in both directions.
Its main goals are simplicity, safety and flexibility.
Virgil FFI is named after Dante's guide in the 'Divine Comedy', who guides him through the nine circles of hell and through purgatory so that he can finally reach heaven. Virgil FFI takes inspiration from the article Calling Purgatory from Heaven: Binding to Rust from Haskell and the classic paper Calling hell from heaven and heaven from hell, which first introduced the Haskell FFI. And just like there are nine circles in hell, Virgil FFI also has multiple 'levels'. The interface is implemented as a small number of higher-order functions that do incremental 'lifting' and 'lowering' of user-defined functions between these levels.
The result is a very clean, maintainable and extensible system.
Virgil-FFI was presented during the PyUtrecht meetup of 2024-09-17. Find the Slidedeck here.
Note that Virgil-FFI was researched/built during a one-day Hackathon. While the result is decent, you should consider it as a building block for your own FFI systems, rather than treating it as a fully finished system.
- Start the Haskell Runtime from Python
- Call Haskell functions from Python
- Call back into Python from Haskell, by passing arbitrary Python callables to a Haskell function.
- Pass arbitrary Python/Haskell types (as long as they are JSON or CBOR serializable)
- Raised Exceptions bubble up from Haskell to Python, and from Python to Haskell, including a cross-language stacktrace!
- Haskell's threaded runtime can do anything it wants in the background as it is not impacted by the GIL
- Very hard to accidentally cause 'segmentation faults' or other kinds of memory corruption or undefined behavior: values passed between languages are type-checked on the other side.
- Reasonably efficient: Overhead of calling a foreign function is the JSON/CBOR serialization overhead + `malloc`/`free` of the intermediate byte buffer.
- Supports functions of any arity, written in idiomatic style. (Implemented using the unpacking operator in Python and a currying trick in Haskell.)
TODO:
- Tests
- Managing the shared library location with Nix. (Currently using a symlink.)
- Passing Haskell functions as callbacks to Python is not (yet) supported, but adding support would be easy.
- Callbacks (to closures that capture data) should be used before the function they were passed to returns. Storing them for longer and calling them later could cause memory corruption. Solving this would require a more complicated implementation so Haskell could indicate to Python's GC when the object goes out of scope according to its GC (and vice versa).
- Currently, users can choose CBOR or JSON for serialization. This allows for support of a lot of basic types, but for specialized types this might require adding extra JSON/CBOR serialization hooks.
- Signal handlers: Both languages have a set of (default) signal handlers, but only one of the two can be active at a given time. Currently, the choice made (c.f. `cbits/virgil_wrapper.c`) is to restore the Python signal handler after Haskell has started up. This means that if Haskell is executing for a long time without calling any Python, `SIGINT` (Ctrl+C) will not trigger a (Haskell) `UserInterrupt` / (Python) `KeyboardInterrupt`. See this longer explanation on the Hyphen project for more details.
- Create a new Cabal project (wrapped with Nix), in which you depend on the `virgil` library.
- Its build target should be a `foreign-library` of the type `native-shared`. See the example project for details.
  - Make sure to include (a copy of) `cbits/virgil_wrapper.c` in the `c-sources` of the new foreign-library.
- Inside the main foreign library module, add the function:
foreign export ccall virgilRealloc :: Ptr a -> Int -> IO (Ptr a)
virgilRealloc :: Ptr a -> Int -> IO (Ptr a)
virgilRealloc ptr size = Foreign.Marshal.reallocBytes ptr size
This is required to allow Python to allocate/deallocate byte buffers in the same way as Haskell.
- Besides this 'special' function, add any function you like. The parameters of these functions can be of any type that has an `Aeson.FromJSON` instance, and the output must have an `Aeson.ToJSON` instance. These typeclasses are used to convert the parameters/output from/to CBOR (using `cborg-json`) or JSON (using `aeson`).
- The actual outward type signature should be `VirgilFunction`, and above the function add `foreign export ccall myFunctionName :: VirgilFunction`. Inside, use `Virgil.lowerCBOR` (or `Virgil.lowerJSON`) and call your real function implementation.
Example:
-- | Divide two integers
foreign export ccall divIntegers :: VirgilFunction
divIntegers :: HasCallStack => VirgilFunction
divIntegers = Virgil.lowerCBOR impl
  where
    impl :: HasCallStack => Integer -> Integer -> IO Integer
    impl left right = pure $ div left right
You can use any type supported by `Aeson` in the signatures. If you want to be able to have Python pass values of `Any` (serializable) type, use `Aeson.Value`.
To accept callbacks as parameters, rather than writing their type as `(a -> b -> IO c)` (which is not serializable), write `ForeignClosure (a -> b -> IO c)` (which is!).
You can turn an object of this type into a 'normal' callable function using `ForeignClosure.liftCBOR` (resp. `liftJSON`).
(Be sure to not store these callbacks or functions created from them after your function returns; see 'limitations'!)
-- | Call a Python function on each element in a list, returning the resulting list
foreign export ccall mappy :: VirgilFunction
mappy :: HasCallStack => VirgilFunction
mappy = Virgil.lowerCBOR impl
  where
    impl :: HasCallStack => [Aeson.Value] -> ForeignClosure (Aeson.Value -> IO Aeson.Value) -> IO [Aeson.Value]
    impl list callback = do
      let fun = ForeignClosure.liftCBOR callback
      mapM fun list
Your function is now ready to be called from Python!
You can load the Haskell dynamic library like so:
import virgil
dll = virgil.DynamicLibrary("path/to/NameOfCompiledLibrary.so")
To expose and wrap one of the Haskell functions from Python, use the `liftCBOR`/`liftJSON` functions on the `DynamicLibrary` instance:
haskellDivIntegers = dll.liftCBOR("divIntegers")
def divIntegers(left: int, right: int) -> int:
    """
    Divide two integers (secretly using Haskell)
    """
    return haskellDivIntegers(left, right)
Note that:
- The `liftCBOR` call is on the top source level.
  - This makes sure that if the function cannot be found, an error happens when the module is loaded (usually on app startup) rather than only when the function is called.
  - It also is more efficient as it is only called once.
- In the Python wrapper you can add any type hints and documentation.
- And if desired you can do any preprocessing/postprocessing as well.
If you want to pass a callback to Haskell, call `dll.lowerCBOR`/`dll.lowerJSON` on a callable object:
from typing import Any, Callable
haskellMappy = dll.liftCBOR("mappy")
def mappy(elems: list[Any], fun: Callable[[Any], Any]) -> list[Any]:
    """
    Call a Python function on each element of a list,
    returning the resulting list.
    """
    return haskellMappy(elems, dll.lowerCBOR(fun))
You're now ready to call the code:
>>> import example
>>> example.mappy([1,2,3,"a", [3,4,5]], lambda x: x * 2)
[2, 4, 6, 'aa', [3, 4, 5, 3, 4, 5]]
Virgil FFI is implemented as a stack of 'levels', with higher-order-functions (functions that operate on other functions) that lift resp. lower user-defined functions to higher (more versatile) resp. lower (more fundamental) levels.
flowchart TD
subgraph Haskell
HaAny[Arbitrary user code]
HaEx[Arbitrary single-parameter code]
HaVal[Rich Values]
HaBytes[ByteString]
HaByteBox[ByteBox]
HaAny <--"(un)currying"--> HaEx
HaEx <--"declawing/reraising Exceptions"--> HaVal
HaVal <--"(de)serializing"--> HaBytes
HaBytes <--"ownership"--> HaByteBox
end
subgraph Python
PyAny[Arbitrary user code]
PyEx[Arbitrary single-parameter code]
PyVal[Rich Values]
PyBytes[ByteString]
PyByteBox[ByteBox]
PyAny <--"(un)currying"--> PyEx
PyEx <--"declawing/reraising Exceptions"--> PyVal
PyVal <--"(de)serializing"--> PyBytes
PyBytes <--"ownership"--> PyByteBox
end
subgraph C-FFI
C["{char * bytes; size_t len;}"]
HaByteBox <--"memory management"--> C
PyByteBox <--"memory management"--> C
end
By implementing both a 'lift' and a 'lower' function in both languages, we gain the following benefits:
- Individual parts are very easily understood/maintained on their own.
- Individual parts can easily be tested: In the Haskell unit test suite we can lower a normal Haskell test-function rather than requiring a full Python function. (and vice-versa)
This is the 'starting' level. Both languages' FFI bindings are already able to talk C. However, C is a very crude language. Many more complicated constructs cannot easily be expressed at this level.
The main thing we use the C level itself for, is to communicate to Python where to find `realloc`.
`realloc` can be used to allocate, resize and deallocate arbitrary byte buffers.
However, it is paramount that both languages use the same memory allocator for this.
As it cannot be guaranteed that the Python interpreter and the Haskell shared library were compiled/linked against the same memory allocator,
exposing the memory allocator in use by Haskell is the easiest way to guarantee compatibility.
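As a rough illustration of what "one allocator for both sides" looks like from Python, the sketch below drives a single `realloc`-style entry point through `ctypes`. It does not use Virgil's exported `virgilRealloc`; instead it assumes a POSIX system where `ctypes.CDLL(None)` exposes the C library, and `realloc_bytes` is a hypothetical helper that mimics Haskell's `reallocBytes` (a null pointer means "allocate", a size of 0 means "free").

```python
import ctypes

# Assumption: POSIX; CDLL(None) gives access to the already-loaded C library.
libc = ctypes.CDLL(None)
libc.realloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.realloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def realloc_bytes(ptr, size):
    """Hypothetical helper: allocate (ptr=None), resize, or free (size=0) a buffer."""
    if size == 0:
        libc.free(ptr)
        return None
    new_ptr = libc.realloc(ptr, size)
    if new_ptr is None:
        raise MemoryError("realloc failed")
    return new_ptr


buf = realloc_bytes(None, 4)           # allocate 4 bytes
ctypes.memmove(buf, b"abcd", 4)        # fill the buffer
buf = realloc_bytes(buf, 8)            # grow it; the first 4 bytes are preserved
preserved = ctypes.string_at(buf, 4)
buf = realloc_bytes(buf, 0)            # deallocate
```

Because every allocation, resize and deallocation goes through the same function, a buffer created by one language can safely be released by the other.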
We're now able to create byte buffers in one language and pass them to the other language which can destroy them. However, in order to read from these buffers, it is necessary to know how large they are.
As we may be dealing with binary data that can contain null bytes, the normal C string type AKA 'null-terminated-byte-string' is not a good choice.
Instead, we implement a very simple 'fat pointer' type in both languages. At both sides, we write whatever is necessary to construct/destruct and read/write from a datastructure which in C would look like:
struct ByteBox {
  char *bytes;
  size_t size;
};
Since both the `ctypes` library in Python and the builtin FFI capabilities in (GHC) Haskell prefer structs to live behind pointers rather than passing/returning them directly from functions, we choose to use the core FFI type:
void virgil_function(const ByteBox *input, ByteBox *output);
Equivalent Haskell:
type VirgilFunction = Ptr ByteBox -> Ptr ByteBox -> IO ()
Equivalent Python:
somedll.myfunction.argtypes = [ctypes.POINTER(ByteBox), ctypes.POINTER(ByteBox)]
somedll.myfunction.restype = None
So the first parameter is an input parameter and the second parameter is an output parameter. We establish the convention that the output parameter will be:
- allocated by the caller (with a 0 size and a `nullptr` buffer),
- filled by the callee (by allocating a buffer of the appropriate size),
- and deallocated again by the caller (including deallocating the buffer).
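The convention above can be sketched entirely in Python with `ctypes`. This is only an illustration, not Virgil's actual code: the callee `reverse_bytes` and the helper `call_raw` are hypothetical names, the callee is a Python callback standing in for a Haskell function, and plain `malloc`/`free` from the C library (assumed reachable via `ctypes.CDLL(None)` on POSIX) stand in for the shared allocator.

```python
import ctypes

# Assumption: POSIX; the C library's malloc/free stand in for the shared allocator.
libc = ctypes.CDLL(None)
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


class ByteBox(ctypes.Structure):
    _fields_ = [("bytes", ctypes.c_void_p), ("size", ctypes.c_size_t)]


VirgilFunction = ctypes.CFUNCTYPE(None, ctypes.POINTER(ByteBox), ctypes.POINTER(ByteBox))


@VirgilFunction
def reverse_bytes(input_ptr, output_ptr):
    """Callee: reads the input box and fills the (empty) output box."""
    inp = input_ptr.contents
    data = ctypes.string_at(inp.bytes, inp.size)[::-1]
    out = output_ptr.contents
    out.bytes = libc.malloc(len(data) or 1)   # callee allocates the buffer
    ctypes.memmove(out.bytes, data, len(data))
    out.size = len(data)


def call_raw(func, data: bytes) -> bytes:
    """Caller: owns both boxes, and frees the buffer the callee allocated."""
    payload = ctypes.create_string_buffer(data, len(data))
    input_box = ByteBox(ctypes.addressof(payload), len(data))
    output_box = ByteBox(None, 0)             # 0 size, null buffer
    func(ctypes.byref(input_box), ctypes.byref(output_box))
    try:
        return ctypes.string_at(output_box.bytes, output_box.size)
    finally:
        libc.free(output_box.bytes)


reversed_bytes = call_raw(reverse_bytes, b"a\x00b")
```

Note that the explicit `size` field is what lets the payload contain null bytes.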
Level 1 abstracts all necessary details of C away, but manually writing to an output parameter feels non-ergonomic in Python and Haskell.
Luckily, it is very easy to add a small wrapper on top, which from the outside looks like
level1Function :: ByteString -> IO ByteString
def level1Function(input: bytes) -> bytes:
    ...
If we want to be super duper safe, we could `memcpy` the bytes to/from the ByteBox objects.
However, since level 1 is only used from level 2 (outside of Virgil FFI's own unit test suite), it is okay to 'borrow' the same underlying buffer.
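A minimal sketch of such a wrapper pair in Python is shown below. The names `lower_bytes` and `lift_bytes` are hypothetical, the C library's `malloc`/`free` (assumed reachable via `ctypes.CDLL(None)` on POSIX) stand in for the shared allocator, and for simplicity this version copies the bytes rather than borrowing the underlying buffer.

```python
import ctypes
from typing import Callable

# Assumption: POSIX; malloc/free stand in for the allocator shared by both sides.
libc = ctypes.CDLL(None)
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


class ByteBox(ctypes.Structure):
    _fields_ = [("bytes", ctypes.c_void_p), ("size", ctypes.c_size_t)]


RawFunction = ctypes.CFUNCTYPE(None, ctypes.POINTER(ByteBox), ctypes.POINTER(ByteBox))


def lower_bytes(fun: Callable[[bytes], bytes]):
    """Turn a bytes -> bytes function into a raw ByteBox-based function."""
    def raw(input_ptr, output_ptr):
        box = input_ptr.contents
        data = ctypes.string_at(box.bytes, box.size) if box.size else b""
        result = fun(data)
        out = output_ptr.contents
        out.bytes = libc.malloc(len(result) or 1)
        ctypes.memmove(out.bytes, result, len(result))
        out.size = len(result)
    return RawFunction(raw)


def lift_bytes(raw) -> Callable[[bytes], bytes]:
    """Turn a raw ByteBox-based function back into a bytes -> bytes function."""
    def fun(data: bytes) -> bytes:
        payload = ctypes.create_string_buffer(data, len(data))
        input_box = ByteBox(ctypes.addressof(payload), len(data))
        output_box = ByteBox(None, 0)
        raw(ctypes.byref(input_box), ctypes.byref(output_box))
        try:
            if not output_box.size:
                return b""
            return ctypes.string_at(output_box.bytes, output_box.size)
        finally:
            libc.free(output_box.bytes)
    return fun


raw_upper = lower_bytes(lambda b: b.upper())  # keep a reference so the callback stays alive
shout = lift_bytes(raw_upper)
shouted = shout(b"hi\x00there")
```

Lowering and then lifting should round-trip: `shout` behaves like the original lambda, even though every call passes through the C-level calling convention.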
Level 2 works well, if all you ever want to do is pass bytestrings back and forth. But it is easy to add support for other values on top, by serializing/deserializing (AKA encoding, marshalling, pickling, ...) them using a common serialization format.
Currently, Virgil FFI implements CBOR and JSON. It is recommended to use CBOR everywhere as it is much more efficient, except when debugging the internals of Virgil FFI for which the human-readability of JSON is nice.
Note that Virgil currently uses the `cborg-json` library Haskell-side, which might somewhat limit what types are supported.
On this layer, we make sure that input from the other side that cannot be parsed does not result in an exception; instead, such an error is turned into a value and passed on.
This is done so we can delay exception handling to its own level, and test the serde-ing on its own.
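The idea of "parse errors become values" can be sketched with JSON as follows. The helper names `lower_json` and `lift_json`, the `ParseError` name, and the use of a `Left`/`Right` wrapper at this level are assumptions made for illustration; they are not taken from Virgil's source.

```python
import json
from typing import Any, Callable


def lower_json(fun: Callable[[Any], Any]) -> Callable[[bytes], bytes]:
    """Wrap a value-level function so that it works on serialized bytes."""
    def on_bytes(raw: bytes) -> bytes:
        try:
            value = json.loads(raw)
        except ValueError as err:
            # JSONDecodeError and UnicodeDecodeError are both ValueErrors.
            # The failure is returned as a value instead of being raised.
            failure = {"Left": {"name": "ParseError", "message": str(err)}}
            return json.dumps(failure).encode()
        return json.dumps({"Right": fun(value)}).encode()
    return on_bytes


def lift_json(on_bytes: Callable[[bytes], bytes]) -> Callable[[Any], Any]:
    """Wrap a bytes-level function so that it accepts and returns rich values."""
    def fun(value: Any) -> Any:
        return json.loads(on_bytes(json.dumps(value).encode()))
    return fun


double_bytes = lower_json(lambda x: x * 2)
double = lift_json(double_bytes)
ok = double([1, 2])
bad = json.loads(double_bytes(b"{not json"))
```

Because malformed input yields a `Left` value rather than an exception, this layer can be tested without involving the exception-handling level at all.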
We have now arrived at the final real level: Dealing with exceptions.
The main thing to do here, is to turn exceptions into values when lowering, and to turn exception-values back into exceptions when lifting.
The mapping between exceptions and values is done using the basic `Either` type in Haskell.
On the Python side, this type looks like
ExceptionOrValue = {'Left': ExceptionValue} | {'Right': Any}
ExceptionValue = {'name': str, 'message': str, 'callstack': list[CallstackFrame], 'annotations': list[str]}
CallstackFrame = tuple[str, {'file': str, 'line': int, 'col': int}]
A little bit of helper code is used to convert a `list[CallstackFrame]` to/from a Haskell `CallStack` and a Python `Traceback` object.
Also, a select few common exceptions that exist in both languages are mapped between (like division by zero and SIGINT).
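A simplified Python sketch of this level is given below. The function names `declaw` and `reraise` and the `ForeignException` class are hypothetical; the dictionary layout follows the shape described above, with the column always set to 0 because it is not portably available from a Python traceback.

```python
import traceback
from typing import Any, Callable


class ForeignException(Exception):
    """Hypothetical exception type for errors that originated on the other side."""
    def __init__(self, name: str, message: str, callstack: list):
        super().__init__(f"{name}: {message}")
        self.name = name
        self.callstack = callstack


def declaw(fun: Callable[[Any], Any]) -> Callable[[Any], dict]:
    """Lowering: exceptions are caught and turned into 'Left' values."""
    def safe(value: Any) -> dict:
        try:
            return {"Right": fun(value)}
        except Exception as err:
            frames = traceback.extract_tb(err.__traceback__)
            callstack = [
                [f.name, {"file": f.filename, "line": f.lineno, "col": 0}]
                for f in frames
            ]
            return {"Left": {"name": type(err).__name__, "message": str(err),
                             "callstack": callstack, "annotations": []}}
    return safe


def reraise(safe: Callable[[Any], dict]) -> Callable[[Any], Any]:
    """Lifting: 'Left' values are turned back into raised exceptions."""
    def fun(value: Any) -> Any:
        result = safe(value)
        if "Left" in result:
            err = result["Left"]
            raise ForeignException(err["name"], err["message"], err["callstack"])
        return result["Right"]
    return fun


safe_div = declaw(lambda x: 10 // x)
div = reraise(safe_div)

caught = None
try:
    div(0)
except ForeignException as err:
    caught = err
```

In a real binding, `declaw` would run in the language that implements the function and `reraise` in the language that calls it, with the serialization level in between.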
We are now able to pass arbitrary values and run arbitrary code, which might even throw exceptions. However, one thing is still non-ergonomic: if we want to pass multiple parameters to the other side, we have to combine them together (in a list or tuple).
There exists a very simple convention to improve this, however: Always serialize the input parameters as a list. (Even if there are zero or only one parameter!) That way, any number of parameters can be handled uniformly on the other side.
In Python, the packing/unpacking operator `*` is used to turn multiple parameters into a list of arguments, and inside closures a list of parameters back into individual parameters.
In Haskell, a fancy little typeclass (c.f. the `Curry` module) is used to pass a tuple of arguments to a normal fully-curried function. (And do 'full uncurrying' for closures).
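On the Python side, the list convention needs only a few lines. The sketch below uses the hypothetical names `lower_args` and `lift_args`; it shows the general technique, not Virgil's exact implementation.

```python
from typing import Any, Callable


def lower_args(fun: Callable[..., Any]) -> Callable[[list], Any]:
    """Turn a function of any arity into one taking a single parameter list."""
    def single(params: list) -> Any:
        return fun(*params)          # unpack the list into positional arguments
    return single


def lift_args(single: Callable[[list], Any]) -> Callable[..., Any]:
    """Turn a single-parameter-list function back into one of any arity."""
    def fun(*params: Any) -> Any:
        return single(list(params))  # pack positional arguments into a list
    return fun


add = lift_args(lower_args(lambda a, b: a + b))
answer = lift_args(lower_args(lambda: 42))
```

Zero-argument and one-argument functions go through the same path, which is why the parameters are always wrapped in a list.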
We can now call completely arbitrary code from one language in the other, idiomatically. However, there is one final thing to make the interface extra flexible: support for closures.
The only thing we need to do to pass a closure, is to go through the levels in reverse:
We lower a closure down through all the levels until it has become a `VirgilFunction`, and in the other language lift it again until it is a normal callable function.
However, there is a little problem: a `VirgilFunction` is itself not CBOR/JSON serializable!
But this is easily solved: on x86 (and most other major architectures), function pointers are word-sized just like normal integers, so we can turn the resulting function pointer into a `{'foreignFunctionAddr': raw_function_addr}` dict.
A serialization handler is installed at both ends to make sure that this kind of dictionary is always recognized as a callable closure at the other end.
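The address trick can be demonstrated within a single Python process using `ctypes` and the standard `json` module. To keep the sketch short it uses a simplified `int -> int` C signature instead of the real `VirgilFunction` type, and the helper names `encode_closure` and `decode_closure` are hypothetical.

```python
import ctypes
import json

# Simplified stand-in for VirgilFunction: a C function taking and returning an int.
IntFunction = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


def encode_closure(c_fun) -> dict:
    """Represent a C-callable function as a serializable dict holding its address."""
    addr = ctypes.cast(c_fun, ctypes.c_void_p).value
    return {"foreignFunctionAddr": addr}


def decode_closure(obj: dict):
    """json object_hook: turn address dicts back into callables, leave others alone."""
    if set(obj) == {"foreignFunctionAddr"}:
        return IntFunction(obj["foreignFunctionAddr"])
    return obj


# The callback object must stay alive for as long as its address may be called.
double = IntFunction(lambda x: x * 2)

wire = json.dumps([21, encode_closure(double)])
arg, fun = json.loads(wire, object_hook=decode_closure)
result = fun(arg)
```

The comment about keeping `double` alive is the same lifetime concern noted in the limitations: once the original callback object is garbage-collected, the serialized address dangles.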
Besides the article Calling Purgatory from Heaven: Binding to Rust from Haskell and the classic paper Calling hell from heaven and heaven from hell, there was a lot of useful information in the Hyphen project. It also implements an FFI between Python and Haskell, although Hyphen works on Haskell source code: It expects GHC to be installed on the target system and tries to compile Haskell code on the fly.