Add documentations, tests and a small example (#109)
sfc-gh-lfallasavendano committed Jan 8, 2024
1 parent 7093d1f commit f98f826
Showing 19 changed files with 713 additions and 73 deletions.
129 changes: 129 additions & 0 deletions docs/snowpark/snowpark-backend-limitations.md
@@ -0,0 +1,129 @@
# Limitations of the Snowpark backend


The goal of the Snowpark backend is to generate expressions that can take advantage of the **Snowpark** infrastructure. Because of this, several language practices that are valid in `Morphir-IR`/Elm are not supported by this backend. When possible, a warning is generated in the code or in the `GenerationReport.md` file.

Some of these limitations include:

## Recursive functions manipulating dataframe expressions

This backend tries to generate as many dataframe expressions as possible. That is why many functions are generated as Scala functions that build dataframe expressions (or [Column](https://docs.snowflake.com/developer-guide/snowpark/reference/scala/com/snowflake/snowpark/Column.html) objects). For example:


```elm
double : Int -> Int
double x =
    if x == 0 then
        0
    else
        x * x
```

This is converted to:

```scala
def double(
    x: com.snowflake.snowpark.Column
)(
    implicit sfSession: com.snowflake.snowpark.Session
): com.snowflake.snowpark.Column =
    com.snowflake.snowpark.functions.when(
        (x) === (com.snowflake.snowpark.functions.lit(0)),
        com.snowflake.snowpark.functions.lit(0)
    ).otherwise((x) * (x))
```
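Because the generated function just returns a `Column` expression, it composes with ordinary DataFrame operations. The following usage sketch is illustrative and not backend output: it assumes the generated `double` is in scope, and the table name, profile file, and column `"X"` are hypothetical.

```scala
// Illustrative usage, not generated code: `double` builds a Column
// expression that Snowpark evaluates inside the query.
import com.snowflake.snowpark.{Session, functions}

implicit val session: Session =
  Session.builder.configFile("profile.properties").create // assumed config file
val df = session.table("MY_TABLE") // assumed table with numeric column "X"
df.select(double(functions.col("X")).as("X_DOUBLED")).show()
```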

As shown above, this function returns a `Column` instance. This object represents the actual expression tree that is processed by the **Snowpark** library. This transformation makes it impossible to convert functions that make recursive calls. For example:

```elm
factorial : Int -> Int
factorial n =
    if n == 0 then
        1
    else
        n * (factorial (n - 1))
```

The generated Scala code looks like this:

```scala
def factorial(
    n: com.snowflake.snowpark.Column
)(
    implicit sfSession: com.snowflake.snowpark.Session
): com.snowflake.snowpark.Column =
    com.snowflake.snowpark.functions.when(
        (n) === (com.snowflake.snowpark.functions.lit(0)),
        com.snowflake.snowpark.functions.lit(1)
    ).otherwise((n) * (mymodel.Basic.factorial((n) - (com.snowflake.snowpark.functions.lit(1)))))
```

Since this code is composed only of nested function calls, nothing stops the recursive call from being evaluated while the expression tree is being built. Executing this function therefore fails with a stack overflow:

```text
java.lang.StackOverflowError
com.snowflake.snowpark.Column.$eq$eq$eq(Column.scala:269)
mymodel.Basic$.factorial(Basic.scala:751)
mymodel.Basic$.factorial(Basic.scala:753)
mymodel.Basic$.factorial(Basic.scala:753)
mymodel.Basic$.factorial(Basic.scala:753)
mymodel.Basic$.factorial(Basic.scala:753)
mymodel.Basic$.factorial(Basic.scala:753)
mymodel.Basic$.factorial(Basic.scala:753)
mymodel.Basic$.factorial(Basic.scala:753)
...
```
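One possible workaround (not something the backend does automatically) is to unroll the recursion at expression-construction time when the input domain is bounded. In the sketch below the chain of `when` expressions is built by a plain Scala loop, so no recursive call happens while the expression tree is created; the function name, the bound, and the `-1` fallback value are all illustrative choices.

```scala
// Sketch of a bounded, non-recursive alternative: factorials are
// precomputed in Scala, and the result is a finite chain of `when`
// expressions over the Column `n`.
import com.snowflake.snowpark.Column
import com.snowflake.snowpark.functions.{lit, when}

def factorialUpTo(maxN: Int)(n: Column): Column = {
  def fact(i: Int): Long = (1 to i).foldLeft(1L)(_ * _)
  (0 to maxN).foldLeft(lit(-1L): Column) { (acc, i) =>
    when(n === lit(i), lit(fact(i))).otherwise(acc)
  }
}
```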

## Code that does not manipulate lists of table-like records

To take advantage of this backend, the code being processed has to manipulate lists of table-like records (e.g. records whose fields have only basic types). These structures are identified as [DataFrames](https://docs.snowflake.com/en/developer-guide/snowpark/scala/working-with-dataframes).


## Code that does not follow the backend conventions

The backend assumes some conventions to determine how to interpret the code that is being processed. These conventions are described in [snowpark-backend.md](snowpark-backend.md).


## Unsupported elements

There may be situations where this backend cannot convert an element from the **Morphir-IR**. Depending on the scenario, the backend generates default expressions or types to indicate that something was not converted.

For example, given that there is no support for the `List.range` function, we can try to convert the following snippet:

```elm
myFunc2 : Int -> Int
myFunc2 x =
    let
        y = x + 1
        z = y + 1
        r = List.range 10 20
    in
    x + y + z
```

The resulting code is the following:

```scala
def myFunc2(
    x: com.snowflake.snowpark.Column
)(
    implicit sfSession: com.snowflake.snowpark.Session
): com.snowflake.snowpark.Column = {
    val y = (x) + (com.snowflake.snowpark.functions.lit(1))
    val z = (y) + (com.snowflake.snowpark.functions.lit(1))
    val r = "Call not generated"
    ((x) + (y)) + (z)
}
```

Notice the `"Call not generated"` placeholder expression that was emitted for `r`. Also, the `GenerationReport.md` file is going to include an error message for this function:

```markdown
### MyModel:Basic:myFunc2

- Call to function not generated: Morphir.SDK:List:range
```
69 changes: 51 additions & 18 deletions docs/snowpark/snowpark-backend.md
@@ -4,26 +4,24 @@ id: snowpark-backend

# Snowpark Backend

The Morphir **Snowpark** backend generates Scala code that uses the [Snowpark](https://docs.snowflake.com/en/developer-guide/snowpark/scala/index) API.

## Generation conventions and strategies

The **Snowpark** backend supports two main code generation strategies:

- Generating code that manipulates [DataFrame](https://docs.snowflake.com/en/developer-guide/snowpark/scala/working-with-dataframes) expressions
- Generating "plain" Scala code

The backend uses a series of conventions to decide which strategy is used to convert a function. These conventions apply to the way types and functions are defined.

### Type definition conventions

Type definitions in the input **Morphir IR** are classified using the following conventions:

#### Records that represent tables

Records are classified as "representing a table definition" according to the types of its members. A [DataFrame](https://docs.snowflake.com/en/developer-guide/snowpark/scala/working-with-dataframes) compatible type is one of the following:

- A basic datatype
- Int
@@ -173,7 +171,7 @@ In this case references to `List Employee` are converted to DataFrames:

### Custom types

Two conventions are used to process [custom types](https://guide.elm-lang.org/types/custom_types.html) used as a field of a *DataFrame record*. These conventions depend on the presence of parameters in the type constructors.

#### 1. Convention for custom types without data

@@ -200,15 +198,15 @@ northDirections dirs =
|> List.filter (\e -> e.direction == North)
```

In this case it is assumed that the data stored in the `Directions` table has a column of type `VARCHAR` or `CHAR` containing the name of the constructor as text. For example:

| ID | DIRECTION |
|-----|------------|
| 10 | 'North' |
| 23 | 'East' |
| 43 | 'South' |

As a convenience, a Scala object is generated with the definition of the possible values:

```Scala
object CardinalDirection{
@@ -229,7 +227,7 @@ def West: com.snowflake.snowpark.Column =
```
Notice the use of [lit](https://docs.snowflake.com/developer-guide/snowpark/reference/scala/com/snowflake/snowpark/functions$.html#lit(literal:Any):com.snowflake.snowpark.Column) to indicate that we expect a literal string value for each constructor.

This object is used wherever the value of one of the constructors appears. For example, for the definition of `northDirections` above, the comparison with `North` is generated as follows:

```Scala
def northDirections(
@@ -245,12 +243,12 @@ This class is used where the value of the possible constructors is used. For exa

#### 2. Convention for custom types with data

If a custom type has constructors with parameters, this backend assumes that values of this type are stored in an [OBJECT column](https://docs.snowflake.com/en/sql-reference/data-types-semistructured#object).

The encoding of the column is defined as follows:

- Values are encoded as a `JSON` object
- A special property of this object called `"__tag"` is used to determine which variant is used in the current value
- All the parameters in order are stored in properties called `field0`, `field1`, `field2` ... `fieldN`

Given the following custom type definition:
Expand All @@ -268,7 +266,7 @@ type alias TasksEstimations =
}
```

The data for `TasksEstimations` is expected to be stored in a table using an `OBJECT` column:

| TASKID | ESTIMATION |
|--------|-------------------------------------------------------------------|
@@ -320,12 +318,21 @@ This code is generated as:
((tasksColumns.estimation("field0")) * (com.snowflake.snowpark.functions.lit(60))) + (tasksColumns.estimation("field1"))
).as("seconds"))
}

```
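Under this encoding, the variant of a value can also be inspected directly by reading the `__tag` property of the `OBJECT` column. The following is a hypothetical hand-written query, not backend output; the `tasks` DataFrame and the `ESTIMATION` column name are assumed from the table above.

```scala
// Illustrative: keep only the rows whose ESTIMATION value was built
// with the HoursMinutes constructor, by comparing the "__tag" property.
import com.snowflake.snowpark.functions.lit

val hoursMinutes =
  tasks.filter(tasks("ESTIMATION")("__tag") === lit("HoursMinutes"))
```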

#### 3. Convention for `Maybe` types

The [Maybe a](https://package.elm-lang.org/packages/elm/core/latest/Maybe) type is assumed to be a nullable database value. This means that the data is expected to be stored as follows:

| Elm value | Value stored in the Database |
|--------------|-------------------------------|
| `Just 10` | `10` |
| `Nothing` | `NULL` |
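In practice this means that operations on `Maybe` values map to SQL `NULL` handling. As an illustration (a sketch of the idea, not literal backend output, with a hypothetical function name), Elm's `Maybe.withDefault 0 x` corresponds naturally to `COALESCE(X, 0)`:

```scala
// Sketch: a default for a missing (Nothing/NULL) value becomes coalesce.
import com.snowflake.snowpark.Column
import com.snowflake.snowpark.functions.{coalesce, lit}

def withDefaultZero(x: Column): Column =
  coalesce(x, lit(0))
```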


### Function definition conventions

These conventions are based on the input and return types of a function. Two strategies are used: generating DataFrame expressions or generating plain Scala expressions. The following sections give more details.

#### Code generation using DataFrame expressions manipulation

@@ -519,7 +526,33 @@ In this case code for `avgSalaries` is going to perform a Scala division operati
}
```

Code generation for this strategy is meant to be used for code that manipulates the result of performing DataFrame operations. At this moment its coverage is very limited.

### Creation of empty DataFrames

Creating an empty list of table-like records is interpreted as creating an empty DataFrame. For example:

```elm
createDataForTest : List Employee -> DataFromCompany
createDataForTest emps =
{ employees = emps , departments = [] }
```

In this case the code is generated as follows:

```scala
def createDataForTest(
    emps: com.snowflake.snowpark.DataFrame
)(
    implicit sfSession: com.snowflake.snowpark.Session
): mymodel.Basic.DataFromCompany = {
    val empsColumns: mymodel.Basic.Employee = new mymodel.Basic.EmployeeWrapper(emps)
    mymodel.Basic.DataFromCompany(
        employees = emps,
        departments = mymodel.Basic.Department.createEmptyDataFrame(sfSession)
    )
}
```

Notice that this is the main reason for having an `implicit` parameter with the [Session object](https://docs.snowflake.com/en/developer-guide/snowpark/reference/scala/com/snowflake/snowpark/Session.html).
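A helper like `createEmptyDataFrame` needs the `Session` to build a DataFrame with the right schema but no rows. The following is a hypothetical sketch of what such a helper could look like; the field names and types are illustrative, and the actual generated code may differ.

```scala
// Sketch: build an empty DataFrame whose schema matches the record type.
import com.snowflake.snowpark.{DataFrame, Row, Session}
import com.snowflake.snowpark.types.{LongType, StringType, StructField, StructType}

object Department {
  def createEmptyDataFrame(sfSession: Session): DataFrame =
    sfSession.createDataFrame(
      Seq.empty[Row],
      StructType(Seq(
        StructField("NAME", StringType),
        StructField("ID", LongType)
      ))
    )
}
```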
