Unresolved reference when using Properties to access #703

murfel · 2024-05-21T10:51:00Z

I'm trying to use Properties to access a column, and compilation fails with "Unresolved reference: title". Here's my demo repo, branch demo.

I'm using DataFrame version 0.13.1, as suggested in the onboarding documentation, and the movies.csv dataset.

package org.example

import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.column
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.io.read

fun main() {
    val df = DataFrame.read("movies.csv")
    println(df.columnNames())  // [movieId, title, genres]

    // Properties - doesn't work "Unresolved reference: title"
//    df.title

    // Accessors - OK
    val title by column<String>()
    df[title].print()

    // Strings - OK
    df["title"].print()
}

(Following the getColumns doc page.)

It is extremely bizarre: the documentation shows a Properties tab on almost every single page, yet never mentions whether I need extra imports or anything else for it to work.


murfel · 2024-05-21T10:55:20Z

It's also a bit confusing that there's no tab which uses indexes.

I'd want to know that you can also select a column by index, e.g. df.getColumn(1).
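A small sketch of the difference between indexing rows and columns, assuming DataFrame 0.13.x. `df[index]` returns a row (`DataRow`), while `df.getColumn(index)` (the call mentioned above) returns a column. The sample data is made up and stands in for movies.csv; `sampleMovies` is a hypothetical helper.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical in-memory stand-in for movies.csv (made-up values)
fun sampleMovies(): AnyFrame = dataFrameOf("movieId", "title", "genres")(
    1, "Movie A", "Comedy",
    2, "Movie B", "Drama",
)

fun main() {
    val df = sampleMovies()

    // df[index] selects a ROW (a DataRow)
    val firstRow = df[0]
    println(firstRow["title"])               // Movie A

    // getColumn(index) selects a COLUMN; indexes are 0-based, so 1 is "title"
    val secondColumn = df.getColumn(1)
    println(secondColumn.name())             // title
    println(secondColumn.values().toList())  // [Movie A, Movie B]
}
```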

murfel · 2024-05-21T10:57:54Z

Also, here the Properties and Accessors tabs show identical code, except that the Accessors tab has an extra variable definition. Surely the Properties version can't work as written.

// Properties
df.getColumn { age }

// Accessors
val age by column<Int>()

df.getColumn { age }

https://kotlin.github.io/dataframe/getcolumn.html

Jolanrensen · 2024-05-21T11:13:10Z

Hi there!
Thanks for reaching out. You indeed make a good point that our documentation is a bit confusing in this regard. The generated properties are available out-of-the-box in notebooks. After a cell is executed, the data inside dataframe instances is analysed and extension properties like df.title will work. This is why it's featured so prominently in the documentation.

In Gradle projects, it requires a bit more configuration: we need to tell the compiler how and where to generate these extension properties.
As seen in that part of the documentation, in Gradle projects there are 3 ways. You can either:

Create an interface/data class annotated with @DataSchema. Then, after recompiling, DataFrames cast to that interface/class will have the extension properties available to them.
Add a reference to (a sample of) your data to a @file:ImportDataSchema(..) statement at the top of your file. This will generate @DataSchema interfaces and extension properties for you.
Add a dataframes { schema {} } task to your gradle file. This works the same as @file:ImportDataSchema.
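A minimal sketch of the first option, assuming a Gradle project where the Kotlin DataFrame Gradle plugin (and with it the KSP processor) is applied, so that extension properties are generated for `@DataSchema` declarations when the project is recompiled. The `Movie` interface, the sample rows and the helper names are hypothetical; in the original demo, `loadMovies()` would be `DataFrame.read("movies.csv").cast<Movie>()`.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf

// Hypothetical schema mirroring movies.csv's columns [movieId, title, genres].
// After recompiling, generated extension properties such as
// DataFrame<Movie>.title become available.
@DataSchema
interface Movie {
    val movieId: Int
    val title: String
    val genres: String
}

// In-memory stand-in for DataFrame.read("movies.csv"); the values are made up.
// cast<Movie>() tells the compiler which schema this DataFrame has.
fun loadMovies(): DataFrame<Movie> =
    dataFrameOf("movieId", "title", "genres")(
        1, "Movie A", "Comedy",
        2, "Movie B", "Drama",
    ).cast<Movie>()

// Uses the generated `title` property instead of df["title"]
fun titles(): List<String> = loadMovies().title.values().toList()

fun main() {
    println(titles())  // [Movie A, Movie B]
}
```

Without the plugin/KSP setup, the `loadMovies().title` line fails with the same "Unresolved reference: title" error from the original report.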

As for indexing, you're right, it's mentioned only here as far as I can see. That said, our documentation website is far from all-inclusive and needs a lot of work still. Discovering the API and possibilities from the IDE's autocomplete is the best way to explore the functionalities of DataFrame :).

Hopefully, this answered some of your questions/concerns. Feel free to reach out if you have more questions!

murfel · 2024-05-21T11:55:43Z

Re indexing, the link you provided indexes the row, not the column.

Otherwise thank you for clarification!

I hadn't even looked at the DataSchema documentation, since nothing pointed me to it. It would be useful to at least link to it from Getting Started on Gradle, and ideally from each "Properties" tab, too.

However, I understand that the documentation is not the priority right now. Something like a disclaimer, "Warning: documentation is in beta mode, missing information and discrepancies are possible", would be nice, so that users don't expect it to be perfect, double-check things, and don't get upset when something doesn't work.

murfel · 2024-05-21T12:15:17Z

In general, do you need feedback at this stage?

A few things I noticed that differ from pandas:

I cannot load a CSV without a header: I either get the first row as the header, or I need to provide my own header for every column.
select doesn't allow selecting the same column twice, because of the name conflict.

My use case is I have a CSV with 20 columns, out of which I only need 4, and two of them need to be duplicated. So I want to load, select required columns, and only then provide their headers.

val df = DataFrame.readCSV("filename.csv", header=???).select { cols(2, 4, 5, 17) } // cannot select { cols(2, 2) }

I understand that workarounds exist - create a fake header corresponding to indexes, header=(0..19).map { it.toString() } and insert the extra column afterwards, but it would be nice to have it out of the box if this is in the plans.

(I'm still unsure how to rename the header, though, apart from creating a new DataFrame.)
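The fake-header workaround described above could look roughly like this. It is a sketch assuming `DataFrame.readCSV`'s `header` parameter (the same one the pseudocode uses), which makes the file's first line count as data. The file contents, the chosen positions, the `loadSelected` helper and the final names (`id`, `amount`, `idAgain`) are all hypothetical, and a 6-column file stands in for the real 20-column one. Renaming at the end returns a new DataFrame rather than modifying the old one.

```kotlin
import java.io.File
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.readCSV

// Reads a header-less CSV, keeps columns 1 and 3 (0-based), duplicates
// column 1, and only then assigns real names.
fun loadSelected(path: String, columnCount: Int): AnyFrame {
    // Fake header "0", "1", ... so that columns can be referred to by position
    val fakeHeader = (0 until columnCount).map { it.toString() }
    return DataFrame.readCSV(path, header = fakeHeader)
        .select("1", "3")                // keep only the needed columns
        .add("1_copy") { it["1"] }       // insert the duplicated column afterwards
        .rename("1", "3", "1_copy").into("id", "amount", "idAgain")
}

fun main() {
    // Hypothetical header-less file with 6 columns
    val file = File.createTempFile("no-header", ".csv")
    file.deleteOnExit()
    file.writeText("a,1,x,10,p,true\nb,2,y,20,q,false\n")

    val df = loadSelected(file.path, columnCount = 6)
    println(df.columnNames())  // [id, amount, idAgain]
}
```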

murfel · 2024-05-21T12:47:04Z

Also this val title by column<String>() doesn't seem to pull the data.

This sample prints Amazing org.jetbrains.kotlinx.dataframe.impl.columns.ColumnAccessorImpl@70f02c32 instead of the actual title.

    val df = DataFrame.read("movies.csv")
    println(df.columnNames())  // [movieId, title, genres]
    val title by column<String>()
    val newDf = df.add("amazingTitle") { "Amazing $title" }
    println(newDf[0]["amazingTitle"])

koperagen · 2024-05-21T12:50:43Z

Feedback is appreciated :)
Try this
df.select { (col(2) named "col1") and (col(2) named "col2") /* and so on */ }
In general, every operation creates a new DataFrame, but that's fine because data is reused whenever possible.

To get the value through a column accessor in a DataRow context, you can use either the invoke or the get function:
val newDf = df.add("amazingTitle") { "Amazing ${title()}" }
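A sketch of both variants, assuming DataFrame 0.13.x. Writing `"Amazing $title"` interpolates the accessor object itself (hence the `ColumnAccessorImpl@...` output above); `title()` or `it[title]` instead read the value of that column in the current row. The `addAmazingTitles` helper and the sample data are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Adds two equivalent columns, reading "title" through a column accessor
fun addAmazingTitles(df: AnyFrame): AnyFrame {
    val title by column<String>()
    return df
        .add("amazingTitle") { "Amazing ${title()}" }  // invoke: value of `title` in the current row
        .add("viaGet") { "Amazing ${it[title]}" }      // get: the same value, read from the row `it`
}

fun main() {
    // Made-up stand-in for movies.csv
    val df = dataFrameOf("movieId", "title", "genres")(1, "Movie A", "Comedy")
    val newDf = addAmazingTitles(df)
    println(newDf[0]["amazingTitle"])  // Amazing Movie A
    println(newDf[0]["viaGet"])        // Amazing Movie A
}
```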

murfel · 2024-05-21T12:53:57Z

Thanks so much!

Aha, so I assume I cannot rename the header in place because DataFrame is sort of immutable, in the persistent Kotlin style, but I can create a new df with the new titles.

koperagen · 2024-05-21T12:55:41Z

Exactly this, yes
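A short sketch of renaming in this immutable style, assuming DataFrame 0.13.x: `rename(...).into(...)` leaves the original DataFrame untouched and returns a new one with the new column names (per the comment above, the underlying data may be reused). The `renameTitle` helper and sample data are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Returns a NEW DataFrame in which "title" is called "movieTitle"
fun renameTitle(df: AnyFrame): AnyFrame = df.rename("title").into("movieTitle")

fun main() {
    val df = dataFrameOf("movieId", "title")(1, "Movie A")
    val renamed = renameTitle(df)
    println(df.columnNames())       // [movieId, title]  (original unchanged)
    println(renamed.columnNames())  // [movieId, movieTitle]
}
```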

Jolanrensen added the question Further information is requested label May 21, 2024

Jolanrensen added the documentation Improvements or additions to documentation (not KDocs) label May 21, 2024

zaleslaw added this to the 0.14.0 milestone Jul 19, 2024

zaleslaw self-assigned this Jul 19, 2024

zaleslaw modified the milestones: 0.14.0, 0.15.0 Sep 4, 2024

zaleslaw removed their assignment Oct 1, 2024

zaleslaw modified the milestones: 0.15.0, 0.16.0 Oct 1, 2024

zaleslaw self-assigned this Oct 1, 2024
