Interface with Python (CPython) #703

Open
Tracked by #1600
certik opened this issue Jun 28, 2022 · 30 comments

Comments

@certik
Contributor

certik commented Jun 28, 2022

We need to design a decorator similar to ccall and ccallable (which interface to and from C), but to and from Python. Since our surface language is Python, the design can be different, for example:

@cpython
def f(n: i32, x: str) -> str:
    import sympy
    x_ = sympy.Symbol(x)
    e = ((x_+5)**n).expand()
    return str(e)

LPython will take the contents of the decorated function and turn it into a series of Python C/API calls in ASR, and it will correctly convert the arguments and the return value between ASR's i32/str and CPython objects.

This approach will allow calling any Python code naturally from LPython. With @ccall we can naturally call any C code.

The decorator can be just called @python, but since LPython is Python (subset at first, but we are keeping the door open to possibly support all of Python later), it might be confusing what exactly is meant by it.

The second decorator we need is to call a function from Python. Something like:

@cpython_callable
def fast_sum(x: f64[n]) -> f64:
    s: f64 = 0
    for i in range(n):
        s += x[i]
    return s

The body of the function is regular LPython code that our optimizers turn into top-performing (vectorized) machine code. The interface, however, is exposed to CPython: LPython converts a NumPy array into f64[n] on the way in and converts the f64 return value from ASR to CPython on the way out. Not sure about all the details yet in ASR, but with the C backend it will produce code that, when compiled, creates a CPython extension module (shared library) that you just import in Python and it will just work.

With these two decorators (@cpython and @cpython_callable) we can use LPython to create extension modules for CPython. We can easily call any C library if we want to (from this perspective it is very similar to Cython), and we can also implement things in regular LPython code, which gets highly optimized. Everything is just Python, so you can take any such LPython code and run it with CPython, and it will work (just slower).
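To illustrate the "it will work (just slower)" point, here is a minimal sketch of how the two decorators could behave when the same file is run under plain CPython. The stand-in definitions of i32, f64, cpython and cpython_callable are assumptions for this example, not the real lpython module, and fast_sum takes a plain list instead of f64[n] to keep the sketch free of dependencies.

```python
# Hypothetical CPython-side stand-ins; the real lpython module may differ.
i32 = int
f64 = float


def cpython(func):
    """Under CPython the body is already plain Python, so return it unchanged."""
    return func


def cpython_callable(func):
    """Under CPython nothing needs compiling; the function is used as-is."""
    return func


@cpython
def describe(n: i32, x: str) -> str:
    import math
    return f"{x}! = {math.factorial(n)}"


@cpython_callable
def fast_sum(x: list) -> f64:
    s: f64 = 0.0
    for i in range(len(x)):
        s += x[i]
    return s


label = describe(3, "n")
total = fast_sum([1.0, 2.0, 3.0])
```

Under a compiler, the same decorators would instead trigger wrapper generation, which is what keeps the source identical in both worlds.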

This is a very powerful design that allows one to stay 100% in Python. One starts with CPython, implements some code, then takes parts of it, extracts them into a separate module that LPython can compile, decorates the API with @cpython_callable, and uses LPython to compile it. Since @ccall works from both CPython and LPython, one can interface with any C code easily from both. It's a very simple gradual approach, and it allows achieving top performance as we deliver more on LPython as a regular production optimizing compiler that is very good with arrays.

@czgdp1807
Collaborator

So, here's what I understand from our discussions and the issue above:

  1. @cpython_callable - Makes the function callable from CPython. For that you will be required to create a CPython extension module which can be imported from CPython code.
  2. @cpython - Makes CPython functions callable from LPython executables. For this you will be required to use PyObject_* functions. For example, with PyObject_CallFunction you can easily call any CPython function from an executable.

Please let me know if I interpreted your thoughts correctly.

@certik
Contributor Author

certik commented Jun 28, 2022

Yes. When you say "you will be required", it means LPython will do it for you automatically. The user will not be required to do that.

@czgdp1807
Collaborator

Yeah. By "you will be required" I mean we will be required to add it in LPython so that it can create a CPython extension for the user.

@Smit-create
Collaborator

In my understanding, for cpython_callable:

  1. We should get the function, parse it and create an executable for the same.
  2. Whenever the CPython code calls the function, we should use the previously created executable and pass the arguments.

Is that correct? I guess, numba also does something similar.

@certik
Contributor Author

certik commented Jul 28, 2022

I think so. I think @cpython_callable should compile it to a regular binary function, and then create a Python wrapper by creating a Python extension module that calls this function.

@certik
Contributor Author

certik commented Mar 21, 2023

This should be called @lpython.jit, here is how to use it from CPython:

@lpython.jit
def fast_sum(x: f64[n]) -> f64:
    s: f64 = 0
    for i in range(n):
        s += x[i]
    return s

This will call LPython under the hood and give it the source code of the function. LPython compiles it via LLVM into a shared library and also emits Python wrappers for the function, all into the same shared library. The decorator then loads this shared library, so when you later do this in CPython:

x = fast_sum(np.array([1, 2, 3]))

It will call into the function implemented in the shared library.

To get started, we can just generate C code (or even Cython) for the Python wrappers part, and compile everything in. Later we can maybe represent such C code using ASR itself.

So here is a list of steps:

  1. Add a jit decorator to lpython.py that does the following:
  2. Saves the source code of the function fast_sum as a string into a file
  3. Call LPython on the file and create an object file via the LLVM backend
  4. Generate a C file with the Python wrappers using Python C/API (I would not use Cython)
  5. Compile the C file together with the object file and create a shared library (=Python extension module)
  6. Import it: from our_shared_library import fast_math_c
  7. Return fast_math_c as the result of the decorator

Once this works, we will make some of these steps more robust: step 4 should later be done by LPython instead of the decorator, and step 5 should also be done by LPython with the appropriate option. Also, step 2 should be done by calling LPython directly from CPython, without going through a file.
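As a rough illustration of step 4, the sketch below generates the C source of a wrapper module as a string. Everything in it is an assumption made for the example: the helper name generate_wrapper, the SCALARS table, the function name fast_add, and the mapping of i32 to C int. It only handles scalar arguments; array arguments such as f64[n] would additionally need the NumPy C API and are left out.

```python
# Hypothetical mapping from a few scalar LPython types to
# (PyArg_ParseTuple format code, C type, C expression that boxes the result).
SCALARS = {
    "i32": ("i", "int", "PyLong_FromLong((long){})"),
    "f64": ("d", "double", "PyFloat_FromDouble({})"),
}


def generate_wrapper(module, func, arg_types, ret_type):
    """Return C source for an extension module exposing one compiled function."""
    names = [f"a{i}" for i in range(len(arg_types))]
    c_types = [SCALARS[t][1] for t in arg_types]
    fmt = "".join(SCALARS[t][0] for t in arg_types)
    params = ", ".join(f"{c} {n}" for c, n in zip(c_types, names)) or "void"
    decls = " ".join(f"{c} {n};" for c, n in zip(c_types, names))
    refs = "".join(f", &{n}" for n in names)
    ret_c = SCALARS[ret_type][1]
    boxed = SCALARS[ret_type][2].format(f"{func}({', '.join(names)})")
    return f"""#include <Python.h>

/* Compiled separately; resolved when linking against the object file. */
{ret_c} {func}({params});

static PyObject *wrap_{func}(PyObject *self, PyObject *args) {{
    {decls}
    if (!PyArg_ParseTuple(args, "{fmt}"{refs})) return NULL;
    return {boxed};
}}

static PyMethodDef methods[] = {{
    {{"{func}", wrap_{func}, METH_VARARGS, NULL}},
    {{NULL, NULL, 0, NULL}}
}};

static struct PyModuleDef moduledef = {{
    PyModuleDef_HEAD_INIT, "{module}", NULL, -1, methods
}};

PyMODINIT_FUNC PyInit_{module}(void) {{ return PyModule_Create(&moduledef); }}
"""


c_source = generate_wrapper("our_shared_library", "fast_add", ["f64", "f64"], "f64")
```

The generated text would then be written to a .c file and compiled together with the object file from step 3, as described in step 5.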

@harshsingh-24
Contributor

harshsingh-24 commented Apr 12, 2023

How are these decorators like @ccall or @jit implemented in LPython? I am unaware of the files and places where things need to be added in order to make it work this way. Any sample PR so that I learn about their implementations in LPython? @certik @Thirumalai-Shaktivel

@certik certik mentioned this issue Apr 12, 2023
9 tasks
@Thirumalai-Shaktivel
Collaborator

Thirumalai-Shaktivel commented Apr 12, 2023

ccall is implemented here:


A simple code like this would work:

from lpython import f64, ccall

@ccall
def _lfortran_ssin(x: f64) -> f64:
    pass

print(_lfortran_ssin(0.8)) # Calls into the function from lfortran_intrinsics.c

jit is not implemented yet.

@harshsingh-24
Contributor

harshsingh-24 commented Apr 13, 2023

I was trying to make a decorator on my system first. It covers steps 1 and 2 as mentioned by @certik. Am I on the right track here, @Thirumalai-Shaktivel? It basically creates a new file called fast_sum.py that saves the definition of the function.

import inspect
import numpy as np

class jit:
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        # Get the source code of the function
        source_code = inspect.getsource(self.func)

        # Get the decorator name
        decorator_name = self.__class__.__name__

        # Remove the decorator name from the source code
        source_code = source_code.replace(f"@{decorator_name}\n", "", 1)

        # Create a filename based on the function name
        filename = self.func.__name__ + ".py"

        # Open the file for writing
        with open(filename, "w") as file:
            # Write the source code to the file
            file.write(source_code)

        # Call the original function
        return self.func(*args, **kwargs)

    
@jit
def fast_sum(x):
    s = 0
    for i in range(3):
        s += x[i]
    return s

x = fast_sum(np.array([1, 2, 3]))

@Thirumalai-Shaktivel
Collaborator

I think so.

@harshsingh-24
Contributor

harshsingh-24 commented Apr 13, 2023

> I think so.

How should I register jit in LPython? If I directly make a class in LPython as shown above, it does not work, @Thirumalai-Shaktivel. I think it is something related to ast_to_asr.cpp.

@certik
Contributor Author

certik commented May 9, 2023

For @cpython, all we need for MVP is this:

@cpython
def f(n: i32, x: str) -> str:
    import my_module
    return my_module.run(n, x)

So we need:

  • importing a Python module
  • calling a function from the module
  • passing any LPython type as a variable / argument into the function
  • converting the result from the function back into LPython and return it

The decorator should be called @cpython, consistent with the @lpython decorator.
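The four MVP pieces listed above map onto a short sequence of Python C/API calls. As a sketch, the same sequence can be tried from CPython itself through ctypes.pythonapi, which exposes the C/API of the running interpreter. The stdlib module math and its factorial function stand in for my_module.run, and a single integer argument stands in for "any LPython type"; both are assumptions for this example.

```python
import ctypes

api = ctypes.pythonapi  # the C/API of the running CPython interpreter

# Declare the signatures of the C/API functions used below.
api.PyImport_ImportModule.argtypes = [ctypes.c_char_p]
api.PyImport_ImportModule.restype = ctypes.py_object
api.PyObject_GetAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_GetAttrString.restype = ctypes.py_object
api.PyLong_FromLong.argtypes = [ctypes.c_long]
api.PyLong_FromLong.restype = ctypes.py_object
api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyObject_CallObject.restype = ctypes.py_object
api.PyLong_AsLong.argtypes = [ctypes.py_object]
api.PyLong_AsLong.restype = ctypes.c_long

# 1. Import a Python module.
module = api.PyImport_ImportModule(b"math")
# 2. Look up a function in the module.
func = api.PyObject_GetAttrString(module, b"factorial")
# 3. Convert the native argument into a Python object and call the function.
n_obj = api.PyLong_FromLong(5)
result_obj = api.PyObject_CallObject(func, (n_obj,))
# 4. Convert the result back into a native integer.
result = api.PyLong_AsLong(result_obj)
```

Generated LLVM or C code would issue the same calls directly and would also have to manage reference counts and NULL returns itself, which ctypes handles here.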

@Smit-create
Collaborator

For @cpython, we can use the following steps:

  1. Recognize the cpython decorator and emit an extension module in C using the Python C/API.
  2. Now we can use the emitted C function using BindC in LLVM.

Is this the way?

@certik
Contributor Author

certik commented May 24, 2023

We can start with:

@cpython
def f() -> str:
    import my_module
    return my_module.run()

Then the next step is:

@cpython
def f(n: i32, x: str) -> str:
    import my_module
    return my_module.run(n, x)

@certik
Contributor Author

certik commented May 24, 2023

Python allows the following:

@cpython
def f(n: i32) -> str:
    import sympy
    sympy.var("x")
    e = ((x+5)**n).expand()
    return str(e)

@certik
Contributor Author

certik commented May 24, 2023

One idea is to take this:

@cpython
def f(n: i32) -> str:
    import sympy
    sympy.var("x")
    e = ((x+5)**n).expand()
    return str(e)

and this gets converted to the following CPython function (you can even imagine it being in a file):

def f(n):
    import sympy
    sympy.var("x")
    e = ((x+5)**n).expand()
    return str(e)

Then we just call it: first convert all arguments, then call the function f using the Python C/API (PyObject_CallFunction), and finally convert the return value back. We have to somehow evaluate "f", or maybe we can create a separate a.py file and use that.
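One possible way to "evaluate f" without a separate file is to execute the function's source text into a fresh namespace and look the function up there. The sketch below shows that idea in pure Python, with math.factorial standing in for the sympy body so it runs with the standard library only. In C, the rough counterparts would be PyRun_String with Py_file_input, a dictionary lookup, and PyObject_CallFunction; this is an illustration of the idea, not a statement of how LPython does it.

```python
# Source of the undecorated function, as the compiler could embed it.
SOURCE = """
def f(n):
    import math
    return str(math.factorial(n))
"""

# Evaluate the definition in an isolated namespace.
namespace = {}
exec(SOURCE, namespace)

# Fetch the function object and call it with an already converted argument.
f = namespace["f"]
result = f(5)
```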

@certik
Contributor Author

certik commented May 24, 2023

I think we should design first an equivalent of @ccall. An example of a @ccall:

@ccall
def f(n: i32) -> str:
    pass

So we should also have:

@pythoncall(module="mymodule")
def run(n: i32) -> str:
    pass

Now we import the "mymodule.py" (from the path provided by -I to lpython) and call the function run from it, and convert arguments.

We'll have to add BindPython. And we add the "module" to Function.

Once we implement physical types, this is part of the physical type.
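For comparison, here is a sketch of how a @pythoncall decorator could behave when the file is run under plain CPython: it ignores the stub body and forwards the call to the function of the same name in the named module. The implementation below is hypothetical, and the stdlib module math with its factorial function stands in for mymodule and run.

```python
import functools
import importlib


def pythoncall(module):
    """Hypothetical CPython-side version: forward calls to `module.<stub name>`."""
    def decorator(stub):
        @functools.wraps(stub)
        def wrapper(*args):
            # Import lazily so the module is only needed when the stub is called.
            target = getattr(importlib.import_module(module), stub.__name__)
            return target(*args)
        return wrapper
    return decorator


@pythoncall(module="math")
def factorial(n: int) -> int:
    pass


value = factorial(5)
```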

@certik
Contributor Author

certik commented May 24, 2023

Another idea is to do this at import time:

from mymodule import run

But somehow specify that mymodule is a CPython module.

@certik
Contributor Author

certik commented May 24, 2023

In the same way we should add @pythoncallable, which wraps an LPython function for CPython. It can then likely be used from the @lpython decorator as well (it would generate LPython code that uses the @pythoncallable decorator). Mostly at the ASR level, the function with implementation (body) can have ABI=BindPython. This can make arrays have the physical type of NumPy arrays. Then our optimizer can cast a NumPy array to a descriptor array with no array data movement, thus ensuring good performance.
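The "no array data movement" idea can be pictured with a small CPython analogy. This is not LPython or NumPy code: the stdlib array.array stands in for a NumPy array and memoryview stands in for a descriptor (a pointer plus format, shape and strides), both chosen only so the sketch runs without extra packages.

```python
from array import array

# Stand-in for a NumPy array of f64 values (an assumption for this sketch).
data = array("d", [1.0, 2.0, 3.0])

# A memoryview describes the same buffer (format, shape, strides) without copying it.
descriptor = memoryview(data)

# Writing through the descriptor is visible in the original storage,
# which shows that no element was moved or duplicated.
descriptor[0] = 10.0
shared = data[0] == 10.0

layout = (descriptor.format, descriptor.shape, descriptor.strides)
```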

@certik
Contributor Author

certik commented May 30, 2023

Current status:

  • @lpython and @pythoncall are implemented as prototypes
  • We need to get arrays working in both
  • Get @cpythoncall working in LLVM
  • Also implement @pythoncallable in both the LLVM and C backends; it creates a function that can be called from Python, and the @lpython decorator can then use it.
  • Harden everything

@Shaikh-Ubaid
Collaborator

@certik it seems that there are two possible approaches to supporting @pythoncall in the LLVM backend.

  1. We declare all the functions and types needed from the Python C/API in LLVM and then link the Python library.
  2. We generate the functions that convert the arguments and return values in a C file and link the LLVM-generated code with this C file.

Which approach shall we follow?

@certik
Contributor Author

certik commented Jun 9, 2023

I would call the Python C/API directly from LLVM. If something cannot be done easily, then we can write a simple wrapper in our C runtime library for just that one thing. If (and only if) the Python C/API is called from LLVM, we also need to link with the Python shared library.

@Shaikh-Ubaid
Collaborator

> I would call the Python C/API directly from LLVM. If something cannot be done easily, then we can write a simple wrapper in our C runtime library for just that one thing. If (and only if) the Python C/API is called from LLVM, we also need to link with the Python shared library.

Got it. Thank you!

@Thirumalai-Shaktivel
Collaborator

Thirumalai-Shaktivel commented Jun 12, 2023

> We need to get arrays working in both

Apart from these, which other array signatures do we need to support?

def multiply(n: i32, x: f64[:]) -> f64[:]:
	pass
def multiply(n: i32, x: f64[:]) -> f64:
	pass

@certik
Contributor Author

certik commented Jun 12, 2023

I think these are the main use cases.

@certik
Contributor Author

certik commented Jun 12, 2023

This can't work:

def multiply(n: i32, x: f64[:]) -> f64[:]:
	pass

You have to use:

def multiply(n: i32, x: f64[:]) -> f64[n]:
	pass

@Thirumalai-Shaktivel
Collaborator

Thirumalai-Shaktivel commented Jun 19, 2023

What is the difference between @lpython and @pythoncallable?

My understanding is:

  • @lpython is compiled using CPython only. We store the function into a file, compile it using LPython and create a shared library. Later call the required symbol from the shared library.

  • @pythoncallable is compiled using LPython? Compile and create a shared library?
    Later using @lpython decorator import the function?

@Thirumalai-Shaktivel
Collaborator

Thirumalai-Shaktivel commented Jun 22, 2023

The following throws an error:

from lpython import f64, lpython
from numpy import sqrt, array

@lpython
def test():
    arr: f64[3] = array([1.,2.,3.])
    arr = sqrt(arr)

test()

Error:

$ python examples/expr2.py 
semantic error: Function 'sqrt' is not declared and not intrinsic
 --> ./lpython_decorator_test/test.py:4:11
  |
4 |     arr = sqrt(arr)
  |           ^^^^^^^^^ 


Note: if any of the above error or warning messages are not clear or are lacking
context please report it to us (we consider that a bug that must be fixed).
Traceback (most recent call last):
  File "/Users/thirumalai/Open_Source/lpython/examples/expr2.py", line 5, in <module>
    def test():
  File "/Users/thirumalai/Open_Source/lpython/src/runtime/lpython/lpython.py", line 674, in __init__
    assert r == 0, "Failed to create C file"
AssertionError: Failed to create C file

./lpython_decorator_test/test.py

@pythoncallable
def test():
    arr: f64[3] = array([1.,2.,3.])
    arr = sqrt(arr)

The problem is that we must insert the import statement from numpy import sqrt into the generated LPython file.
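One possible way to do that, sketched below, is to collect the module-level import statements from the original file with the ast module and prepend them to the extracted function. The helper name build_lpython_file is hypothetical, and this is only one option, not the fix that was adopted. The sketch relies on Python 3.8 or later, where a function node starts at its def line, so the decorator is dropped automatically.

```python
import ast


def build_lpython_file(module_source, func_name):
    """Return the module-level imports followed by the undecorated function."""
    tree = ast.parse(module_source)
    imports = []
    func_source = None
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            imports.append(ast.get_source_segment(module_source, node))
        elif isinstance(node, ast.FunctionDef) and node.name == func_name:
            # The node starts at the "def" line, so decorators are left out.
            func_source = ast.get_source_segment(module_source, node)
    if func_source is None:
        raise ValueError(f"function {func_name!r} not found at module level")
    return "\n".join(imports) + "\n\n" + func_source + "\n"


MODULE = """from lpython import f64, lpython
from numpy import sqrt, array

@lpython
def test():
    arr: f64[3] = array([1.,2.,3.])
    arr = sqrt(arr)

test()
"""

generated = build_lpython_file(MODULE, "test")
```

The source is only parsed, never executed, so neither lpython nor numpy has to be importable for this step.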

@Thirumalai-Shaktivel
Collaborator

Thirumalai-Shaktivel commented Jun 22, 2023

TODO for lpython
