EachRow(df)[i] should be a vector, not a DataFrame #375
Comments
How else would this be defined? I'm personally inclined to remove all of the iteration constructs for DataFrames.
Why? It's definitely important to be able to get a row of a DataFrame, in any case... is this possible right now?
Right now the iterator returns one-row DataFrames. All of the standard indexing rules for DataFrames apply to that DataFrame. Does that make possible what you're trying to do?
It's not causing problems for me right now, but I don't think that's what the iterator should return. What I meant was: if you removed the iteration constructs, how would you get rows from a dataframe? edit: it makes anything possible, just kind of ugly. It means that you have to do something like
to access elements in a row. I'm curious about how pandas handles this now.
Just to be sure I understand: you think that a row of a DataFrame should not be a DataFrame, but an You can always do iteration with explicit indexing. The virtue of the iterator seems restricted to composition with functions over iterators.
I would argue that a tuple would be better for this than an Any array. Of course, you can't index by name into a tuple or an array, so that's a downside of either of these alternatives.
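To make the trade-off concrete, here is a small sketch in current Julia syntax (not the Julia of this thread) comparing the candidate row representations. The column names `name` and `age` and the values are made up for illustration. `NamedTuple` was added to the language after this discussion took place, so it appears only as a point of comparison, not as something the participants had available.

```julia
# Hypothetical row with two columns, `name` and `age` (illustrative values).
row_any   = Any["alice", 31]            # Any array: forgets per-element types
row_tuple = ("alice", 31)               # tuple: keeps types, positional only
row_named = (name = "alice", age = 31)  # NamedTuple: keeps types and names

# Positional indexing works for all three.
@assert row_any[2] == row_tuple[2] == row_named[2] == 31

# Only the NamedTuple can also be indexed by column name.
@assert row_named.age == 31
@assert row_named[:name] == "alice"

# The Any array erases element types; the tuple keeps them.
@assert eltype(row_any) == Any
@assert typeof(row_tuple) == Tuple{String, Int}
```

None of these three representations lets you write back into the parent table, which is a separate concern from how elements are looked up.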
I think the row of a DataFrame should have the same type as a column of a DataFrame -- a Does that make sense?
Ah. I think Stefan's point about indexing is the obvious reason why one might like to have a DataFrame returned, rather than a DataArray. That said, I can see reasonable arguments for many approaches.
No, that doesn't really make sense: DataArrays are homogeneous whereas DataFrames are a heterogeneous bundle of homogeneous columns.
Ideally I'd like to have a
Hmmm. That seems like a non-trivial change to me. For me, the relevant equivalence is that a 1-row DataFrame, when indexed for its unique row, should return the DataFrame, not a separate entity. For what it's worth, our approach is like a sane version of R's approach. (R's approach is nutty because a single row of a 1-column DataFrame is a vector, but a multi-column DataFrame gives a DataFrame for a single row.)
That makes sense. It seems like making
I'm also missing the ability to access rows of a data frame and treat them Would an implementation of "rows" using (I actually implemented a separate row type at one point, but it turned out On Fri, Oct 11, 2013 at 1:33 PM, Julia Evans <notifications@github.com> wrote:
I'm confused by this. Doesn't returning a row as a single-value DataFrame do that? Or are you talking about having something like a DataRow type? That could be represented efficiently as a reference to the data frame plus a row index. Although that seems similar to what a SubDataFrame would be.
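A row type that is "a reference to the data frame plus a row index" could be sketched roughly as follows, in current Julia syntax. Everything here is hypothetical: `ToyFrame`, `RowRef`, `colindex`, `nrows`, and `eachrowref` are illustrative names, and `ToyFrame` is a toy stand-in, not the DataFrames type discussed in this thread.

```julia
# Hypothetical stand-in for a data frame: named, equal-length columns.
struct ToyFrame
    names::Vector{Symbol}
    columns::Vector{Vector}   # one (possibly differently typed) vector per column
end

# Hypothetical row type: a reference to the parent plus a row index, no copying.
struct RowRef
    parent::ToyFrame
    row::Int
end

# Translate a column name into its position, failing loudly if it is unknown.
function colindex(f::ToyFrame, name::Symbol)
    j = findfirst(==(name), f.names)
    j === nothing && throw(KeyError(name))
    return j
end

# Index a row by position or by column name; both return a single element.
Base.getindex(r::RowRef, j::Integer) = r.parent.columns[j][r.row]
Base.getindex(r::RowRef, name::Symbol) = r[colindex(r.parent, name)]
Base.length(r::RowRef) = length(r.parent.columns)

# Iterate rows lazily as RowRef values.
nrows(f::ToyFrame) = isempty(f.columns) ? 0 : length(f.columns[1])
eachrowref(f::ToyFrame) = (RowRef(f, i) for i in 1:nrows(f))

df = ToyFrame([:name, :age], Vector[["alice", "bob"], [31, 27]])
rows = collect(eachrowref(df))
@assert rows[2][:name] == "bob"   # index by name
@assert rows[2][2] == 27          # index by position
```

In this sketch a row element is a scalar rather than a one-element column, and name-based lookup still works because the row defers to its parent for column names.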
On Fri, Oct 11, 2013 at 10:26 PM, Stefan Karpinski <notifications@github.com
My actual issue is partly aesthetic and partly semantic. Aesthetically, x[1, As an example, I have a data processing pipeline which makes extensive use lanes['bam_dir'] = Taking advantage of the fact that a row is a dictionary: realigned_bams['target_intervals'] = I've wanted to port this code to Julia for a while--there are (non-Pandas) I realize this probably isn't a common use case for DataFrames (and I use Kevin
If
That's a very interesting idea--thanks Stefan! On Sat, Oct 12, 2013 at 1:55 PM, Stefan Karpinski
A related issue is that you can modify EachRow(df)[i] at will without any error, but the changes have absolutely no effect on the original DataFrame. Rather than a DataFrame, shouldn't EachRow(df)[i] be a SubDataFrame so that iterators can be used to modify data?
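This is the usual copy-versus-view distinction. A minimal sketch in current Julia syntax, using a plain matrix as a stand-in for the data frame (so this is not the DataFrames API, and the values are illustrative):

```julia
# A plain matrix stands in for the data frame here (illustrative values).
table = [1 2 3;
         4 5 6]

# Slicing copies: edits to the copy never reach `table`.
row_copy = table[2, :]
row_copy[1] = 40
@assert table[2, 1] == 4

# A view references the parent's storage: edits write through.
row_view = view(table, 2, :)
row_view[1] = 40
@assert table[2, 1] == 40
```

A row iterator that yields slices behaves like `row_copy`; one that yields something like a SubDataFrame would presumably behave like `row_view`.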
Right now EachRow(df)[i] is defined as a slice:
    getindex(itr::DFRowIterator, i::Any) = itr.df[i, :]
This means that if you set row = EachRow(df)[i], then row[1] is an array, not an element. Not sure how to handle this.