BUG: plot with kind=scatter fails when checking if an array is in the DataFrame #8852

Closed
TomAugspurger opened this issue Nov 18, 2014 · 16 comments · Fixed by #8929

@TomAugspurger
Contributor

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
c = np.array([1, 0])
df.plot(kind='scatter', x='A', y='B', c=c, cmap='spring')

fails with a TypeError.

We check elif c in self.data.columns, which tries to hash the array.

(I know I can just add the array as a column to the DataFrame)
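For anyone who wants to see the root cause without plotting anything, here is a minimal sketch. It assumes only that numpy and pandas are installed; the variable names hash_error and membership_error are just for illustration and do not appear in pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
c = np.array([1, 0])

# ndarrays are mutable, so they do not support hashing.
try:
    hash(c)
    hash_error = None
except TypeError as exc:
    hash_error = exc

# A membership test against the column Index needs a hashable key,
# so it fails in the same way as the bare hash() call.
try:
    c in df.columns
    membership_error = None
except TypeError as exc:
    membership_error = exc

print(type(hash_error).__name__, type(membership_error).__name__)
```

Both checks raise, which is why the elif branch blows up before the array ever reaches matplotlib.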

@TomAugspurger TomAugspurger added the Visualization plotting label Nov 18, 2014
@TomAugspurger TomAugspurger added this to the 0.16.0 milestone Nov 18, 2014
@jorisvandenbossche
Member

How is this supposed to work in this case? The length of the array is not equal to the number of elements in the DataFrame?

@shoyer
Member

shoyer commented Nov 19, 2014

@jorisvandenbossche c = np.array([1, 0]) also raises.

@jorisvandenbossche
Member

@shoyer yes, OK, I just found it a bit of a strange example (as this would have failed with matplotlib also).

I suppose this is related to #7780, and apparently this case wasn't caught by the tests? So it seems to be a regression from 0.14.

@shoyer
Member

shoyer commented Nov 19, 2014

@jorisvandenbossche Yes, this is a regression (and I introduced it in #7780). If nobody else gets to it first I can write a fix (should be pretty easy).

@TomAugspurger
Contributor Author

@jorisvandenbossche sorry about the bad example! I meant to have c=[1, 0]. My excuse is that I'm on a Windows PC right now and copy-pasting from the interpreter is a pain.

@jorisvandenbossche
Member

@TomAugspurger right-clicking with the mouse?

@TomAugspurger
Contributor Author

Wow, it is not at all obvious that that puts it on your clipboard :) Thanks.

@jorisvandenbossche
Member

Yep, indeed, it also took me a while to find this out. But now it is rather essential when I am working on Windows.

@jorisvandenbossche
Member

@jreback What is actually the best way to check if something is allowed to be a column name?

elif c in self.data.columns gives an error with, e.g., np.arrays, but how do we check for this?

@TomAugspurger
Contributor Author

I'm having trouble coming up with anything better than checking if it's an iterable.

There may be something in core.common.

@jorisvandenbossche
Member

@TomAugspurger there are some people here at the sprint working on this issue.

@TomAugspurger
Contributor Author

Great! I'll be offline for the next hour or so but after that I'll stick around the computer. Have them post if there are any questions.

@jorisvandenbossche
Member

The problem is that tuples, for example, are also iterables, but are allowed as column names.

@TomAugspurger
Contributor Author

Mmm. That's true.

So let's be explicit in the documentation and implementation. With

In [81]: df = pd.DataFrame({(1, 2): [1, 5], 3: [0, 1], 4: [1, 2]})

In [82]: df
Out[82]: 
   (1, 2)  3  4
0       1  0  1
1       5  1  2

A call to df.plot(kind='scatter', x=3, y=4, c=(1, 2)) should get the color (or the size, if s=(1, 2)) from the column (1, 2).
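To make the tuple case concrete, here is a small sketch of the lookups involved. It assumes pandas builds a flat (non-MultiIndex) column Index for this mixed set of labels, as in the output above; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({(1, 2): [1, 5], 3: [0, 1], 4: [1, 2]})

# Tuples are hashable, so membership is an ordinary label lookup.
tuple_is_label = (1, 2) in df.columns

# A tuple that is not a label is also fine to test; it is simply absent.
other_tuple_is_label = (1, 0) in df.columns

# Selecting by the tuple label returns that column's values.
color_values = df[(1, 2)].tolist()

print(tuple_is_label, other_tuple_is_label, color_values)
```

So "is it iterable?" cannot be the test: a tuple is iterable and still a perfectly valid column label.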

@TomAugspurger
Contributor Author

As for implementing that, the best approach may be a try / except?

@jreback
Contributor

jreback commented Nov 29, 2014

@jorisvandenbossche

if col in self.columns

will work, but put it in a try / except.
Also, if the columns are a MultiIndex, it's a bit tricky.
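A rough sketch of the try / except idea follows. The helper name resolve_c is hypothetical and this is not necessarily what the eventual fix does; it also ignores the MultiIndex case mentioned above.

```python
import numpy as np
import pandas as pd


def resolve_c(data, c):
    """Hypothetical helper: return column values if c names a column, else c itself."""
    try:
        is_column = c in data.columns
    except TypeError:
        # Unhashable inputs (ndarray, list) cannot be column labels.
        is_column = False
    return data[c].values if is_column else c


df = pd.DataFrame({(1, 2): [1, 5], 3: [0, 1], 4: [1, 2]})
arr = np.array([1, 0])

from_array = resolve_c(df, arr)      # unhashable, passed through untouched
from_tuple = resolve_c(df, (1, 2))   # hashable and a label, so looked up
from_name = resolve_c(df, 'red')     # hashable but not a label, passed through
```

The upside of try / except over an iterable check is that tuples keep working as labels, while arrays and lists fall through to matplotlib unchanged.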

aevri added a commit to aevri/pandas that referenced this issue Dec 2, 2014
Ensure that we can pass an np.array as 'c' straight through to
matplotlib, this functionality was accidentally removed previously.

Add tests.

Closes pandas-dev#8852