WIP: First steps in defining a common interface for Tables/Frames #16
Why not call this |
That was also my first idea but I realized that @davidagold had developed AbstractTable in https://github.com/davidagold/AbstractTables.jl with a dependency on |
@@ -14,7 +14,7 @@ fields with possibly heterogeneous types. One of the primary goals of
 `StatsModels` is to make it simpler to transform tabular data into matrix format
 suitable for statistical modeling.

-At the moment, "tabular data" means an `AbstractDataTable`. Ultimately, the
+At the moment, "tabular data" means an `Table`. Ultimately, the
a `Table`, not an
...or change `Table` to `AbstractTable`. :-)
We would need to ask David, but I don't think we want to provide different table abstractions. And there's only one use of |
So, this is David, who has permanently locked himself out of his davidagold GH account. Alas. I think it should be fine to remove those dependencies from AbstractTables. I'd be hesitant, however, to start changing all the method signatures in this PR from |
The idea here is to remove |
Seems like traits would be useful here.
@andreasnoack, where is the current TableBase code? I'm starting to actively compile the various "abstracttable" attempts into a single package (taking parts of DataStreams, the AbstractDataFrame code, David's AbstractTables as well as his Relations.jl package). For everyone else, I think the biggest thing that would help as we try to converge on a single "AbstractTable" interface is the strongest set of use-cases for an AbstractTable. I'm going to try and dig more into the code here in StatsModels to figure out what exactly it "needs" from an AbstractTable interface, but it'd be great to have other strong use-cases. For me, I'm coming from the context of DataStreams, which requires functionality like getting the "schema" of a table and being able to get/set individual cells, as well as entire columns at a time. I think the ideal goal is to come up with:
In terms of a starter list of use-cases to consider, I can think of:
I think in theory the required interface for StatsModels is quite limited:
The issue is that the current code makes stronger assumptions about being able to access columns as vectors. I think @kleinschmidt had plans to change this, but it will take some work. So in the short term I think we should just write an abstraction which specifies that the abstract data table (or whatever we call that particular interface/trait) can be indexed with variable names, and that it returns vectors. We can always remove that requirement later, or provide inefficient fallbacks which would create such vectors for data tables which use a different storage. See also #14.
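To make the short-term proposal concrete, here is a rough sketch of what "indexable by variable name, returns a vector" could look like, including the kind of inefficient fallback mentioned for other storage layouts. All names in it (`AbstractTable`, `ColumnTable`, `RowTable`, `design_column`) are made up for illustration and are not taken from StatsModels, DataFrames, or DataTables.

```julia
# Hypothetical sketch: none of these names come from an existing package.
abstract type AbstractTable end

# Column-oriented storage: indexing by name returns the stored vector directly.
struct ColumnTable <: AbstractTable
    columns::Dict{Symbol,Vector}
end
Base.getindex(t::ColumnTable, name::Symbol) = t.columns[name]

# Row-oriented storage: every row is a NamedTuple with the same fields.
struct RowTable{T<:NamedTuple} <: AbstractTable
    rows::Vector{T}
end
# Inefficient fallback: materialize a fresh vector on every access.
Base.getindex(t::RowTable, name::Symbol) =
    [getproperty(row, name) for row in t.rows]

# Consumer code that relies only on name-based indexing returning a vector.
design_column(t::AbstractTable, name::Symbol) = Float64.(t[name])

ct = ColumnTable(Dict{Symbol,Vector}(:x => [1, 2, 3]))
rt = RowTable([(x = 1, y = "a"), (x = 2, y = "b")])
```

With this, `design_column` works on both storage layouts without knowing which one it was given; the row-oriented type simply pays for a copy on each access.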
Superseded by #71
There is almost nothing here yet. Just wanted to open the PR such that we have a place to discuss details of #14. This will require a new package with the common interface. So far I've named it `TableBase`. Both `DataFrames` and `DataTables` should then become subtypes of this new `Table` and some subset of the functions in the two packages should be defined as erroring methods in `TableBase` and overloaded in `DataFrames` and `DataTables`.
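As a minimal sketch of that pattern (an abstract supertype plus erroring generic methods that concrete types overload), something along these lines might work. The function names `columnnames` and `getcolumn` and the types `ToyFrame` and `EmptyTable` are placeholders invented for this example; the actual set of functions in `TableBase` is not specified here.

```julia
# Hypothetical sketch: function and type names are placeholders.

# Common supertype that the concrete table types would subtype.
abstract type Table end

# Generic fallbacks that error until a concrete type overloads them.
columnnames(t::Table) =
    error("columnnames is not implemented for $(typeof(t))")
getcolumn(t::Table, name::Symbol) =
    error("getcolumn is not implemented for $(typeof(t))")

# Stand-in for a concrete table type living in another package.
struct ToyFrame <: Table
    names::Vector{Symbol}
    columns::Vector{AbstractVector}
end

# The concrete type overloads the interface functions.
columnnames(t::ToyFrame) = t.names
function getcolumn(t::ToyFrame, name::Symbol)
    i = findfirst(==(name), t.names)
    i === nothing && throw(KeyError(name))
    return t.columns[i]
end

# A subtype that has not overloaded anything still hits the erroring fallback.
struct EmptyTable <: Table end

tf = ToyFrame([:x, :y], [[1, 2, 3], ["a", "b", "c"]])
```

Calling `columnnames(tf)` dispatches to the `ToyFrame` method, while `columnnames(EmptyTable())` falls through to the generic method and raises an error, which is the behavior described for methods that a package has not yet overloaded.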