Kotlin DataFrame compiler plugin #704

koperagen · 2024-05-22T00:39:24Z

A place for discussion and questions about the Kotlin DataFrame compiler plugin.

The idea behind it is to make code such as the following compile, and to provide coding assistance in project files and, later, in Kotlin Notebooks, on top of the already existing code generation between notebook cells:

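import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

@DataSchema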
data class WikiData(val name: String, val paradigms: List<String>)

fun main() {
    val df = dataFrameOf(
        WikiData("Kotlin", listOf("object-oriented", "functional", "imperative")),
        WikiData("Haskell", listOf("Purely functional")),
        WikiData("C", listOf("imperative")),
    )
    val df1 = df.add("size") { 
        paradigms.size // `paradigms` is generated based on WikiData class structure
    }
    // `size` property is generated based on `add` argument
    df1.size.print()
}

A demo project that you can clone and run:
https://github.com/koperagen/df-plugin-demo

An issue that describes the required compiler API and provides some information about the use case:
https://youtrack.jetbrains.com/issue/KT-65859


Jolanrensen · 2024-06-19T10:42:10Z

We might need to do some additional research regarding the maintainability of the implementation, mainly in cases where we have to write the same DataFrame logic in two places.

With the plugin, operations on DataFrames happen in two places:

- The library itself:
  - This works mostly at runtime.
  - It is based not only on the structure, types, and names of the DataFrame, but also on its data.
- The compiler plugin:
  - This works during code analysis.
  - It is based purely on the structure, types, and names of the DataFrame.
  - It could carry some information under the hood, for example:
    - @Import JSON data
    - The state of a DSL scope, such as groupBy {}
    - df.transpose(): the resulting DataFrame will have a keys: String column containing the previous column names
    - Etc.
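To make the duplication concrete, here is a minimal sketch using hypothetical, heavily simplified models (ColumnSchema, Column, addToSchema and addToFrame are invented for illustration and are not the real DataFrame or plugin classes). The plugin side only computes the resulting schema of an add operation, while the library side applies the same structural rule and also evaluates the expression for every row. Both must produce the same structure, which is exactly what has to be kept in sync by hand when the logic lives in two places.

```kotlin
// Hypothetical, simplified models. These are NOT the real DataFrame or plugin classes.
data class ColumnSchema(val name: String, val type: String)         // structure only: what the plugin sees
data class Column(val schema: ColumnSchema, val values: List<Any?>) // structure plus data: what the library sees

// Plugin side (code analysis): compute only the resulting schema of add(name) { ... }.
fun addToSchema(schema: List<ColumnSchema>, name: String, type: String): List<ColumnSchema> =
    schema + ColumnSchema(name, type)

// Library side (runtime): the same structural rule, plus evaluation of the expression for every row.
fun addToFrame(
    columns: List<Column>,
    name: String,
    type: String,
    expression: (rowIndex: Int) -> Any?,
): List<Column> {
    val rowCount = columns.firstOrNull()?.values?.size ?: 0
    return columns + Column(ColumnSchema(name, type), List(rowCount) { expression(it) })
}

fun main() {
    val paradigms = Column(
        ColumnSchema("paradigms", "List<String>"),
        listOf(
            listOf("object-oriented", "functional", "imperative"),
            listOf("Purely functional"),
            listOf("imperative"),
        ),
    )
    val runtime = addToFrame(listOf(paradigms), "size", "Int") { row -> (paradigms.values[row] as List<*>).size }
    val analysis = addToSchema(listOf(paradigms.schema), "size", "Int")
    // Both sides must agree on the structure; keeping them in sync is the maintenance cost.
    println(runtime.map { it.schema } == analysis) // true
    println(runtime.last().values) // [3, 1, 1]
}
```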

I believe we should try, wherever we can, to share the logic between these two places. This is only possible where the logic depends exclusively on the structure, types, or names of the DataFrame. Sharing the logic will help us (and future contributors) to a) fix bugs more easily and b) keep the plugin and the library consistent.
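As a sketch of what "logic that depends only on names" could look like when shared, assume a hypothetical helper toCamelCase (invented for illustration, not an existing DataFrame function, and with a deliberately naive splitting rule). Both a runtime rename over actual columns and an analysis-time rename over a schema call the same helper, so a bug fix in the naming rule applies to both sides at once. The frame and schema representations here are plain maps, chosen only to keep the example self-contained.

```kotlin
// Hypothetical helper, not part of the DataFrame library: it depends only on a
// column name, so it could be shared by the library and the compiler plugin.
fun toCamelCase(name: String): String {
    val parts = name.split('_', '-', ' ').filter { it.isNotEmpty() }
    if (parts.isEmpty()) return name
    return parts.first().lowercase() +
        parts.drop(1).joinToString("") { p -> p.lowercase().replaceFirstChar { it.uppercase() } }
}

// Library side (runtime): column name -> values. Only names change, data is kept.
fun renameFrame(columns: Map<String, List<Any?>>): Map<String, List<Any?>> =
    columns.entries.associate { (name, values) -> toCamelCase(name) to values }

// Plugin side (analysis): column name -> type. Only names change, types are kept.
fun renameSchema(schema: Map<String, String>): Map<String, String> =
    schema.entries.associate { (name, type) -> toCamelCase(name) to type }

fun main() {
    val frame = linkedMapOf("first_name" to listOf<Any?>("Ada"), "birth-year" to listOf<Any?>(1815))
    val schema = linkedMapOf("first_name" to "String", "birth-year" to "Int")
    println(renameFrame(frame).keys)   // [firstName, birthYear]
    println(renameSchema(schema).keys) // [firstName, birthYear]
}
```

This sketch ignores name collisions (two names mapping to the same camelCase form); a real implementation would have to decide how to handle them, and that decision would then also live in one place.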

I see 3 options for us:

- Keep the logic separate (as with join, which generates names in two places: the plugin and the library).
  - Pro: This keeps the plugin an add-on to the library, without having to modify the library itself.
  - Pro: Types work differently in the compiler plugin; this allows us to work in two different worlds without difficult bridges.
  - Con: Maintainability and ensuring consistency are difficult.
- Create a new abstract tree structure as a supertype of both DataFrame and PluginDataFrameSchema (as with insert, which is also called from the plugin).
  - Pro: This allows us to share logic regarding structure and names.
  - Con: Type sharing is difficult because there is no easy ConeKotlinType <-> KType conversion.
  - Con: We would have to convert each supported API function to run on this generic tree structure instead.
- Share logic by running the original function on an "empty" DataFrame (as drafted for renameToCamelCase()).
  - Pro: This allows us to share logic regarding structure and names.
  - Pro: We can call the original function, as seen in the draft, so no rewrites of the library are needed.
  - Con: We would still need to write custom logic for more difficult operations or large DSLs.
  - Con: Sharing types is still difficult. We would need to either ignore types and store the TypeApproximation inside the DataFrame, or find a way to create an empty DataFrame with KTypes and a way to do ConeKotlinType <-> KType conversion.
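The third option can be illustrated with a toy model. Everything below (ToyColumn, ToyFrame, SchemaColumn, pluginRenameToCamelCase and the simplified toCamelCase) is hypothetical and only mirrors the idea, not the actual draft or the real DataFrame API. The plugin builds a frame with no rows from its schema, stores a type approximation (reduced to a String here) inside each column, calls the unchanged "library" function, and reads the resulting structure back. This corresponds to the "store the TypeApproximation inside the DataFrame" path; the ConeKotlinType <-> KType problem is sidestepped, not solved.

```kotlin
// Hypothetical toy model of option 3; NOT the real DataFrame or compiler plugin API.
data class ToyColumn(val name: String, val values: List<Any?>, val typeInfo: String)
data class ToyFrame(val columns: List<ToyColumn>)

// Hypothetical plugin-side schema entry; the type approximation is reduced to a String here.
data class SchemaColumn(val name: String, val typeApproximation: String)

// Simplified camelCase conversion used by the "library" function below.
fun toCamelCase(name: String): String {
    val parts = name.split('_', '-', ' ').filter { it.isNotEmpty() }
    if (parts.isEmpty()) return name
    return parts.first().lowercase() +
        parts.drop(1).joinToString("") { p -> p.lowercase().replaceFirstChar { it.uppercase() } }
}

// The "original" library function, written once for runtime frames.
// It never inspects typeInfo; it only carries it along unchanged.
fun ToyFrame.renameToCamelCase(): ToyFrame =
    ToyFrame(columns.map { it.copy(name = toCamelCase(it.name)) })

// Plugin side: reuse the library function by running it on a frame without rows.
fun pluginRenameToCamelCase(schema: List<SchemaColumn>): List<SchemaColumn> {
    // 1. Build an "empty" frame from the schema, storing the type approximation in each column.
    val empty = ToyFrame(schema.map { ToyColumn(it.name, emptyList(), it.typeApproximation) })
    // 2. Call the unchanged library function.
    val renamed = empty.renameToCamelCase()
    // 3. Read the resulting structure back into the plugin's representation.
    return renamed.columns.map { SchemaColumn(it.name, it.typeInfo) }
}

fun main() {
    val schema = listOf(SchemaColumn("first_name", "String"), SchemaColumn("birth-year", "Int"))
    println(pluginRenameToCamelCase(schema))
    // [SchemaColumn(name=firstName, typeApproximation=String), SchemaColumn(name=birthYear, typeApproximation=Int)]
}
```

The trick only works for functions that never need to look at values; anything whose output structure depends on the data, or that runs a large DSL, would still need custom plugin logic, as noted in the cons above.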

Feel free to edit this comment to add more pros and cons to each option or to add more options.

These are just my thoughts for now :) I'm curious to see what you think!

koperagen self-assigned this May 22, 2024

Jolanrensen mentioned this issue Jun 3, 2024: ☂ type: KType in DataColumnImpl mismatches actual values sometimes #713 (Closed)

koperagen mentioned this issue Jun 10, 2024: Compiler plugin #729 (Merged)

Jolanrensen added this to the Backlog milestone Jun 19, 2024

Jolanrensen added the labels enhancement (New feature or request) and research (This requires a deeper dive to gather a better understanding) Jun 19, 2024

Jolanrensen added the label Compiler plugin (Anything related to the DataFrame Compiler Plugin) Aug 8, 2024
