adding UDF example #52

Merged (1 commit), Dec 30, 2022
47 changes: 47 additions & 0 deletions doc/howto.md
@@ -63,3 +63,50 @@ FROM penguins.csv
GROUP BY species
ORDER BY count DESC
```

## Register SQLite UDF

To register a user-defined function (UDF) when using SQLite, combine [SQLAlchemy's `@event.listens_for`](https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#user-defined-functions) with the [`create_function`](https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.create_function) method of Python's `sqlite3` connection. The steps are shown in the sections that follow.

### Install JupySQL

```{code-cell} ipython3
%pip install jupysql --quiet
```

### Create engine and register function

```{code-cell} ipython3
from sqlalchemy import create_engine
from sqlalchemy import event

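def mysum(x, y):
    return x + y

engine = create_engine("sqlite://")

# Register MYSUM on every new DBAPI connection the engine opens.
# name, narg, and func are passed positionally; newer Python versions
# deprecate passing them as keywords.
@event.listens_for(engine, "connect")
def connect(conn, rec):
    conn.create_function("MYSUM", 2, mysum)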
```
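
The `connect` listener above receives the raw DBAPI connection, which for SQLite is a `sqlite3.Connection`. As a minimal sketch of what the `create_function` call does on its own, assuming only the standard library and an in-memory database (no SQLAlchemy or JupySQL involved):

```python
import sqlite3


def mysum(x, y):
    # Plain Python function that SQLite will call for each MYSUM(...) invocation
    return x + y


# In-memory database, used here only for illustration
conn = sqlite3.connect(":memory:")

# Expose mysum to SQL under the name MYSUM, accepting exactly 2 arguments
conn.create_function("MYSUM", 2, mysum)

result = conn.execute("SELECT MYSUM(1, 2)").fetchone()[0]
print(result)

conn.close()
```

The registration is per connection, which is why the engine example hooks into the `connect` event instead of calling `create_function` once.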

### Create connection with existing engine

```{versionadded} 0.5.1

This is a good example. Can't we allow the functionality without passing existing engine? It adds complexity and friction to the users.

Author

We cannot. The engine has to be prepared beforehand to register the function. I'm unsure what you're proposing.


Can't you do something like this?

from sqlalchemy import create_engine, func

# Define a function that takes in a value and returns its square
def my_square_function(x):
    return x * x

# Create an engine and connect to a database
engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
connection = engine.connect()

# Use the func module to define the UDF as a SQL expression
square_udf = func.my_square_function(column)

# Use the UDF in a SELECT statement
query = (
    'SELECT *, {} AS squared_value FROM mytable'.format(square_udf)
)
result = connection.execute(query)

# Print the result set
for row in result:
    print(row)

# Close the connection
connection.close()
In this example, we define a function `my_square_function` that takes in a value and returns its square. We then use the `func` module to create a SQL expression for the function, and use it in a SELECT statement.

Keep in mind that this example uses PostgreSQL as the database backend, and you may need to modify the example slightly to work with a different database.

Author

Can you provide an example of how you think the API should look when using `%%sql`? The snippet you shared doesn't use it.

@idomic, Dec 30, 2022

# Load the extension and connect to the database
%load_ext sql
%sql postgresql://username:password@localhost:5432/mydatabase

# Define a function that takes in a value and returns its square
def my_square_function(x):
    return x * x

# Use the func module to define the UDF as a SQL expression
from sqlalchemy import func
square_udf = func.my_square_function(column)

# Use the UDF in a SELECT statement
result = %sql SELECT *, :square_udf AS squared_value FROM mytable

# Print the result set
print(result)

We might even want a tighter integration with the `func` part, maybe something like:

# Define the UDF as a SQL expression
%sql --udf my_square_function

# And then consume it
%sql SELECT *, :square_udf AS squared_value FROM mytable

Author

OK, I see. I think we can open an issue and discuss the API; I agree it's simpler. For now, I'd say let's get this example into the docs so people know they can do it, and then simplify it.

By the way, I don't think `sqlalchemy.func` is what we want. Based on this, it looks like it's a module for constructing SQL from Python, not really for UDFs.


Alright, I'll keep the original issue open.

I think it's `func` combined with something else, like `select()`; the developer should dig into it.

Pass existing engines to `%sql`
```

```{code-cell} ipython3
%load_ext sql
```

```{code-cell} ipython3
%sql engine
```

### Query

```{code-cell} ipython3
%%sql
SELECT MYSUM(1, 2)
```