Kotlin Notebook crash when creating DataFrame from List<Map> #710

Closed
cmelchior opened this issue May 29, 2024 · 5 comments
Labels: bug (Something isn't working), not reproducible (We cannot reproduce this issue atm, needs extra attention)
@cmelchior
Contributor

This code:

%use dataframe
val df = (1..1000).map {
  mapOf(
    "id" to it,
    "value" to "value$it"
  )
}.toDataFrame()

Crashes Kotlin Notebooks with:

The problem is found in one of the loaded libraries: check library converters (fields callbacks)
Error compiling code:
@DataSchema
interface _DataFrameType { }

val ColumnsContainer<_DataFrameType>.entries: DataColumn<kotlin.collections.Set<kotlin.collections.Map.Entry<K, V>>> @JvmName("_DataFrameType_entries") get() = this["entries"] as DataColumn<kotlin.collections.Set<kotlin.collections.Map.Entry<K, V>>>
val DataRow<_DataFrameType>.entries: kotlin.collections.Set<kotlin.collections.Map.Entry<K, V>> @JvmName("_DataFrameType_entries") get() = this["entries"] as kotlin.collections.Set<kotlin.collections.Map.Entry<K, V>>
val ColumnsContainer<_DataFrameType>.keys: DataColumn<kotlin.collections.Set<K>> @JvmName("_DataFrameType_keys") get() = this["keys"] as DataColumn<kotlin.collections.Set<K>>
val DataRow<_DataFrameType>.keys: kotlin.collections.Set<K> @JvmName("_DataFrameType_keys") get() = this["keys"] as kotlin.collections.Set<K>
val ColumnsContainer<_DataFrameType>.size: DataColumn<Int> @JvmName("_DataFrameType_size") get() = this["size"] as DataColumn<Int>
val DataRow<_DataFrameType>.size: Int @JvmName("_DataFrameType_size") get() = this["size"] as Int
val ColumnsContainer<_DataFrameType>.values: DataColumn<kotlin.collections.Collection<V>> @JvmName("_DataFrameType_values") get() = this["values"] as DataColumn<kotlin.collections.Collection<V>>
val DataRow<_DataFrameType>.values: kotlin.collections.Collection<V> @JvmName("_DataFrameType_values") get() = this["values"] as kotlin.collections.Collection<V>
(df as org.jetbrains.kotlinx.dataframe.DataFrame<*>).cast<_DataFrameType>()

Errors:
Line_6.jupyter.kts (4:110 - 111) Unresolved reference: K
Line_6.jupyter.kts (4:113 - 114) Unresolved reference: V
Line_6.jupyter.kts (4:177 - 250) Unchecked cast: AnyCol /* = DataColumn<*> */ to DataColumn<Set<Map.Entry<[Error type: Unresolved type for K], [Error type: Unresolved type for V]>>>
Line_6.jupyter.kts (4:243 - 244) Unresolved reference: K
Line_6.jupyter.kts (4:246 - 247) Unresolved reference: V
Line_6.jupyter.kts (5:90 - 91) Unresolved reference: K
Line_6.jupyter.kts (5:93 - 94) Unresolved reference: V
Line_6.jupyter.kts (5:156 - 217) Unchecked cast: Any? to Set<Map.Entry<[Error type: Unresolved type for K], [Error type: Unresolved type for V]>>
Line_6.jupyter.kts (5:211 - 212) Unresolved reference: K
Line_6.jupyter.kts (5:214 - 215) Unresolved reference: V
Line_6.jupyter.kts (6:78 - 79) Unresolved reference: K
Line_6.jupyter.kts (6:135 - 175) Unchecked cast: AnyCol /* = DataColumn<*> */ to DataColumn<Set<[Error type: Unresolved type for K]>>
Line_6.jupyter.kts (6:172 - 173) Unresolved reference: K
Line_6.jupyter.kts (7:58 - 59) Unresolved reference: K
Line_6.jupyter.kts (7:140 - 141) Unresolved reference: K
Line_6.jupyter.kts (8:113 - 131) Unchecked cast: AnyCol /* = DataColumn<*> */ to DataColumn<Int>
Line_6.jupyter.kts (10:87 - 88) Unresolved reference: V
Line_6.jupyter.kts (10:148 - 195) Unchecked cast: AnyCol /* = DataColumn<*> */ to DataColumn<Collection<[Error type: Unresolved type for V]>>
Line_6.jupyter.kts (10:192 - 193) Unresolved reference: V
Line_6.jupyter.kts (11:67 - 68) Unresolved reference: V
Line_6.jupyter.kts (11:160 - 161) Unresolved reference: V

org.jetbrains.kotlinx.jupyter.exceptions.ReplLibraryException: The problem is found in one of the loaded libraries: check library converters (fields callbacks)
	at org.jetbrains.kotlinx.jupyter.exceptions.CompositeReplExceptionKt.throwLibraryException(CompositeReplException.kt:52)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImpl.process(FieldsProcessorImpl.kt:68)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$1.invoke(CellExecutorImpl.kt:98)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$1.invoke(CellExecutorImpl.kt:97)
	at org.jetbrains.kotlinx.jupyter.config.LoggingKt.catchAll(Logging.kt:77)
	at org.jetbrains.kotlinx.jupyter.config.LoggingKt.catchAll$default(Logging.kt:71)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl.execute(CellExecutorImpl.kt:97)
	at org.jetbrains.kotlinx.jupyter.repl.execution.CellExecutor$DefaultImpls.execute$default(CellExecutor.kt:12)
	at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl.evaluateUserCode(ReplForJupyterImpl.kt:581)
	at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl.access$evaluateUserCode(ReplForJupyterImpl.kt:136)
	at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl$evalEx$1.invoke(ReplForJupyterImpl.kt:439)
	at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl$evalEx$1.invoke(ReplForJupyterImpl.kt:436)
	at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl.withEvalContext(ReplForJupyterImpl.kt:417)
	at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl.evalEx(ReplForJupyterImpl.kt:436)
	at org.jetbrains.kotlinx.jupyter.messaging.IdeCompatibleMessageRequestProcessor$processExecuteRequest$1$response$1$1.invoke(IdeCompatibleMessageRequestProcessor.kt:140)
	at org.jetbrains.kotlinx.jupyter.messaging.IdeCompatibleMessageRequestProcessor$processExecuteRequest$1$response$1$1.invoke(IdeCompatibleMessageRequestProcessor.kt:139)
	at org.jetbrains.kotlinx.jupyter.execution.JupyterExecutorImpl$Task.execute(JupyterExecutorImpl.kt:42)
	at org.jetbrains.kotlinx.jupyter.execution.JupyterExecutorImpl$executorThread$1.invoke(JupyterExecutorImpl.kt:82)
	at org.jetbrains.kotlinx.jupyter.execution.JupyterExecutorImpl$executorThread$1.invoke(JupyterExecutorImpl.kt:80)
	at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)
Caused by: org.jetbrains.kotlinx.jupyter.exceptions.ReplCompilerException: Line_6.jupyter.kts (4:110 - 111) Unresolved reference: K
Line_6.jupyter.kts (4:113 - 114) Unresolved reference: V
Line_6.jupyter.kts (4:177 - 250) Unchecked cast: AnyCol /* = DataColumn<*> */ to DataColumn<Set<Map.Entry<[Error type: Unresolved type for K], [Error type: Unresolved type for V]>>>
Line_6.jupyter.kts (4:243 - 244) Unresolved reference: K
Line_6.jupyter.kts (4:246 - 247) Unresolved reference: V
Line_6.jupyter.kts (5:90 - 91) Unresolved reference: K
Line_6.jupyter.kts (5:93 - 94) Unresolved reference: V
Line_6.jupyter.kts (5:156 - 217) Unchecked cast: Any? to Set<Map.Entry<[Error type: Unresolved type for K], [Error type: Unresolved type for V]>>
Line_6.jupyter.kts (5:211 - 212) Unresolved reference: K
Line_6.jupyter.kts (5:214 - 215) Unresolved reference: V
Line_6.jupyter.kts (6:78 - 79) Unresolved reference: K
Line_6.jupyter.kts (6:135 - 175) Unchecked cast: AnyCol /* = DataColumn<*> */ to DataColumn<Set<[Error type: Unresolved type for K]>>
Line_6.jupyter.kts (6:172 - 173) Unresolved reference: K
Line_6.jupyter.kts (7:58 - 59) Unresolved reference: K
Line_6.jupyter.kts (7:140 - 141) Unresolved reference: K
Line_6.jupyter.kts (8:113 - 131) Unchecked cast: AnyCol /* = DataColumn<*> */ to DataColumn<Int>
Line_6.jupyter.kts (10:87 - 88) Unresolved reference: V
Line_6.jupyter.kts (10:148 - 195) Unchecked cast: AnyCol /* = DataColumn<*> */ to DataColumn<Collection<[Error type: Unresolved type for V]>>
Line_6.jupyter.kts (10:192 - 193) Unresolved reference: V
Line_6.jupyter.kts (11:67 - 68) Unresolved reference: V
Line_6.jupyter.kts (11:160 - 161) Unresolved reference: V
	at org.jetbrains.kotlinx.jupyter.repl.impl.JupyterCompilerImpl.compileSync(JupyterCompilerImpl.kt:201)
	at org.jetbrains.kotlinx.jupyter.repl.impl.InternalEvaluatorImpl.eval(InternalEvaluatorImpl.kt:120)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$result$1.invoke(CellExecutorImpl.kt:79)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$result$1.invoke(CellExecutorImpl.kt:77)
	at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl.withHost(ReplForJupyterImpl.kt:758)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl.execute(CellExecutorImpl.kt:77)
	at org.jetbrains.kotlinx.jupyter.repl.execution.CellExecutor$DefaultImpls.execute$default(CellExecutor.kt:12)
	at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$ExecutionContext.execute(CellExecutorImpl.kt:239)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration.execute(Integration.kt:77)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration.execute(Integration.kt:90)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration.updateAnyFrameVariable(Integration.kt:125)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration.access$updateAnyFrameVariable(Integration.kt:67)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration$onLoaded$4.invoke(Integration.kt:289)
	at org.jetbrains.kotlinx.dataframe.jupyter.Integration$onLoaded$4.invoke(Integration.kt:284)
	at org.jetbrains.kotlinx.jupyter.api.libraries.FieldHandlerFactory.createUpdateExecution$lambda$0(FieldHandlerFactory.kt:49)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImplKt.executeEx(FieldsProcessorImpl.kt:95)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImplKt.access$executeEx(FieldsProcessorImpl.kt:1)
	at org.jetbrains.kotlinx.jupyter.codegen.FieldsProcessorImpl.process(FieldsProcessorImpl.kt:47)
	... 18 more

@zaleslaw
Collaborator

Working solution:

val df = (1..1000).toDataFrame {
    "id" from { it }
    "value" from { "value$it" }
}

@zaleslaw
Collaborator

We need to check whether it behaves the same way in Gradle projects.

@Jolanrensen changed the title from "Kotlin Notebook crash when creating DataFrame from Map" to "Kotlin Notebook crash when creating DataFrame from Maps" on May 29, 2024
@Jolanrensen
Collaborator

Jolanrensen commented May 29, 2024

It works fine in Gradle projects, so we'll need to check what weird type inference is going on in the Jupyter integration.

It creates a DataFrame like:

| entries | keys | size | values |
|---|---|---|---|
| [id=1, value=value1] | [id, value] | 2 | [1, value1] |
| [id=2, value=value2] | [id, value] | 2 | [2, value2] |
| [id=3, value=value3] | [id, value] | 2 | [3, value3] |
| [id=4, value=value4] | [id, value] | 2 | [4, value4] |
| [id=5, value=value5] | [id, value] | 2 | [5, value5] |
| [id=6, value=value6] | [id, value] | 2 | [6, value6] |
| [id=7, value=value7] | [id, value] | 2 | [7, value7] |
| [id=8, value=value8] | [id, value] | 2 | [8, value8] |
| [id=9, value=value9] | [id, value] | 2 | [9, value9] |
| [id=10, value=value10] | [id, value] | 2 | [10, value10] |
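
The four columns above have the same names as the public properties of `kotlin.collections.Map`, which suggests that `toDataFrame()` treated each map as an ordinary object and read its properties rather than its key/value pairs. That reading is an inference from the output, not something confirmed against the library source. A minimal standard-library sketch, with no DataFrame dependency, that prints those properties for the first row:

```kotlin
fun main() {
    // One element of the list from the original report.
    val row = mapOf("id" to 1, "value" to "value1")

    // These are the Map properties whose names match the generated columns.
    println(row.entries) // [id=1, value=value1]
    println(row.keys)    // [id, value]
    println(row.size)    // 2
    println(row.values)  // [1, value1]
}
```

The printed values match the first row of the table, and the generic types of `entries`, `keys` and `values` are where the unresolved `K` and `V` in the generated accessors come from.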

However, was this the solution you were looking for?

I suspect you want to create a dataframe with a column id and a column value, right? Then @zaleslaw's solution indeed works great.

Constructing a DataFrame is usually done by column and not by row, as that's how they're stored in memory. That's why all DataFrame creation methods are built the way they are. If you have a List<Map<String, T>> and you want each map to be treated as a row, you could write something like this:

val df = (1..1000).map {
    mapOf("id" to it, "value" to "value$it")
}.toDataFrame {
    source.map { it["id"] }.toColumn() into "id"
    source.map { it["value"] }.toColumn() into "value"
}
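
The same row-to-column reshaping can be sketched with the standard library alone, which may help when the keys are not known in advance. `rowsToColumns` is a hypothetical helper name introduced here for illustration; it is not part of the DataFrame API.

```kotlin
// Hypothetical helper: turns a list of row maps into a map of column name -> column values.
fun rowsToColumns(rows: List<Map<String, Any?>>): Map<String, List<Any?>> {
    // Collect every key that appears in any row, keeping first-seen order.
    val columnNames = rows.flatMap { it.keys }.distinct()
    // For each column, read that key from every row; a missing key yields null.
    return columnNames.associateWith { name -> rows.map { row -> row[name] } }
}

fun main() {
    val rows = (1..3).map { mapOf("id" to it, "value" to "value$it") }
    println(rowsToColumns(rows)) // {id=[1, 2, 3], value=[value1, value2, value3]}
}
```

As far as I know, the DataFrame library can build a frame from a map of column names to value lists via `toDataFrame()`, but check that against the version you are using before relying on it.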

If you really have to construct a DataFrame row by row, you could in theory, but it would essentially entail making many small DataFrames and concatenating them, like this:

val df = (1..1000).map {
    mapOf("id" to it, "value" to "value$it")
}.map {
    dataFrameOf(header = it.keys, values = it.values)
}.concat()

@Jolanrensen added the bug (Something isn't working) label on May 29, 2024
@Jolanrensen changed the title from "Kotlin Notebook crash when creating DataFrame from Maps" to "Kotlin Notebook crash when creating DataFrame from List<Map>" on May 29, 2024
@cmelchior
Contributor Author

Ah yeah. Good explanation. I just copied some code from ChatGPT as part of testing something else when I saw the crash. So I didn't realize that it did things slightly wrong.

@zaleslaw added this to the 0.14.0 milestone on Jul 19, 2024
@zaleslaw added the not reproducible (We cannot reproduce this issue atm, needs extra attention) label on Jul 19, 2024
@zaleslaw
Collaborator

@cmelchior could we close?
