Implement frequency table function a la table in R #170

Closed
wesm opened this issue Sep 25, 2011 · 7 comments
@wesm
Member

wesm commented Sep 25, 2011

No description provided.

@wesm
Member Author

wesm commented Sep 25, 2011

A cut function would also be nice
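
For readers on current pandas: binning of this kind is available as `pandas.cut`. A minimal sketch, with made-up values, bin edges, and labels (none of them come from this thread), that bins a numeric column and then tabulates the bins:

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
breaks = pd.Series([26, 30, 54, 25, 70, 12, 18, 35])

# Assumed bin edges and labels; intervals are (0, 20], (20, 40], (40, 80].
bins = [0, 20, 40, 80]
labels = ["low", "mid", "high"]

# Assign each value to a bin, then count how many values fall in each bin.
binned = pd.cut(breaks, bins=bins, labels=labels)
freq = binned.value_counts()

print(freq)
```

The intervals are closed on the right by default, so a value of exactly 20 would land in `low`.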

@gregglind
Contributor

What is the right way to do a simple counts crosstab with marginals? For bonus points, all vars vs. all vars?

@wesm
Member Author

wesm commented Jan 13, 2012

Use pivot_table (it has margins, too). I'm pretty sure this issue can be closed; I just need to look at the functionality R provides and verify that it's addressed by an analogous pivot_table call.

@gregglind
Contributor

So, supposing I have columns 'a' and 'b', what is the simplest call to get the crosstab table? For some reason, pivot_table is tough for me!
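
For readers on current pandas, one possible answer to this question is `pandas.crosstab`. A minimal sketch, assuming a small made-up frame (the column `c` and all values are hypothetical), that builds the two-column table with marginals and then one table per pair of columns for the "all vars vs. all vars" case:

```python
from itertools import combinations

import pandas as pd

# Hypothetical data; only the column names 'a' and 'b' come from the question.
df = pd.DataFrame({
    "a": ["x", "x", "y", "y", "y"],
    "b": ["p", "q", "p", "p", "q"],
    "c": ["u", "v", "u", "v", "u"],
})

# Counts of 'a' against 'b', with row and column totals labelled "All".
simple = pd.crosstab(df["a"], df["b"], margins=True)

# One crosstab per unordered pair of columns.
tables = {
    (row, col): pd.crosstab(df[row], df[col], margins=True)
    for row, col in combinations(df.columns, 2)
}

print(simple)
```

Keeping the pairwise tables in a dict keyed by `(row, col)` is just one convenient layout; nothing in the thread prescribes it.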


@wesm
Member Author

wesm commented Jan 13, 2012

example:


In [10]: wp
Out[10]: 
    breaks wool tension
1   26     A    L      
2   30     A    L      
3   54     A    L      
4   25     A    L      
5   70     A    L      
6   52     A    L      
7   51     A    L      
8   26     A    L      
9   67     A    L      
10  18     A    M      
11  21     A    M      
12  29     A    M      
13  17     A    M      
14  12     A    M      
15  18     A    M      
16  35     A    M      
17  30     A    M      
18  36     A    M      
19  36     A    H      
20  21     A    H      
21  24     A    H      
22  18     A    H      
23  10     A    H      
24  43     A    H      
25  28     A    H      
26  15     A    H      
27  26     A    H      
28  27     B    L      
29  14     B    L      
30  29     B    L      
31  19     B    L      
32  29     B    L      
33  31     B    L      
34  41     B    L      
35  20     B    L      
36  44     B    L      
37  42     B    M      
38  26     B    M      
39  19     B    M      
40  16     B    M      
41  39     B    M      
42  28     B    M      
43  21     B    M      
44  39     B    M      
45  29     B    M      
46  20     B    H      
47  21     B    H      
48  24     B    H      
49  17     B    H      
50  13     B    H      
51  15     B    H      
52  15     B    H      
53  16     B    H      
54  28     B    H      

In [11]: wp.pivot_table('breaks', rows='wool', cols='tension', aggfunc='count')
Out[11]: 
tension  H  L  M
wool            
A        9  9  9
B        9  9  9

I'll have a look at R's table function and add a simple crosstab function or something similar.
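
In current pandas the `rows` and `cols` keywords of `pivot_table` are spelled `index` and `columns`. A sketch of the same count table under that spelling, rebuilding `wp` from the values printed above; the `pd.crosstab` line is shown as an alternative and is not part of the original comment:

```python
import pandas as pd

# The 54 'breaks' values from the printed frame, in row order.
breaks = [26, 30, 54, 25, 70, 52, 51, 26, 67, 18, 21, 29, 17, 12, 18, 35, 30, 36,
          36, 21, 24, 18, 10, 43, 28, 15, 26, 27, 14, 29, 19, 29, 31, 41, 20, 44,
          42, 26, 19, 16, 39, 28, 21, 39, 29, 20, 21, 24, 17, 13, 15, 15, 16, 28]

# Rows 1-27 are wool A, rows 28-54 are wool B; tension runs L, M, H in blocks of 9.
wool = ["A"] * 27 + ["B"] * 27
tension = (["L"] * 9 + ["M"] * 9 + ["H"] * 9) * 2

wp = pd.DataFrame(
    {"breaks": breaks, "wool": wool, "tension": tension},
    index=range(1, 55),
)

# Same call as In [11], with the newer keyword names.
counts = wp.pivot_table(values="breaks", index="wool", columns="tension",
                        aggfunc="count")

# Alternative: count co-occurrences directly, no values column needed.
xt = pd.crosstab(wp["wool"], wp["tension"])

print(counts)
```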

@wesm
Member Author

wesm commented Jan 14, 2012

Just wrote a blog post here: http://wesmckinney.com/blog/?p=443. I don't think it's necessary to add any more functions.

@wesm
Member Author

wesm commented Jan 16, 2012

OK Gregg, I'll bite:

In [7]: a
Out[7]: 
array([1, 2, 6, 6, 4, 0, 2, 0, 4, 3, 5, 1, 1, 2, 6, 3, 4, 4, 5, 4, 4, 5, 5,
       2, 1, 1, 6, 3, 5, 2, 5, 6, 2, 2, 5, 1, 1, 3, 1, 4, 1, 6, 0, 1, 3, 3,
       1, 4, 2, 1, 0, 5, 0, 5, 1, 1, 5, 0, 2, 4, 2, 4, 2, 2, 2, 6, 2, 0, 1,
       4, 6, 1, 4, 0, 5, 5, 3, 5, 5, 6, 0, 6, 6, 5, 0, 2, 4, 2, 2, 0, 5, 0,
       5, 6, 5, 6, 4, 5, 0, 4])

In [8]: b
Out[8]: 
array([0, 0, 0, 2, 0, 0, 2, 1, 1, 1, 2, 2, 0, 1, 0, 0, 2, 2, 1, 0, 0, 2, 1,
       1, 0, 2, 2, 1, 2, 1, 1, 1, 2, 1, 2, 0, 2, 1, 1, 0, 0, 0, 0, 2, 1, 1,
       2, 0, 0, 1, 1, 1, 2, 2, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 2, 0, 2, 0,
       0, 1, 1, 2, 0, 1, 2, 1, 1, 2, 0, 1, 0, 1, 1, 1, 2, 1, 2, 2, 0, 2, 1,
       2, 0, 1, 1, 2, 0, 0, 0])

In [9]: c
Out[9]: 
array([3, 3, 4, 1, 1, 3, 4, 4, 1, 0, 2, 2, 4, 2, 3, 0, 1, 0, 2, 0, 4, 1, 3,
       1, 0, 1, 1, 0, 1, 4, 1, 4, 2, 3, 3, 0, 3, 3, 1, 3, 0, 1, 4, 4, 3, 1,
       3, 1, 1, 4, 1, 0, 0, 3, 1, 3, 3, 3, 2, 2, 1, 2, 3, 4, 0, 3, 1, 3, 3,
       0, 4, 3, 0, 3, 0, 2, 4, 3, 1, 0, 4, 1, 3, 0, 1, 1, 4, 0, 0, 3, 2, 1,
       4, 2, 3, 2, 2, 1, 2, 0])

In [10]: result = crosstab(a, [b, c], rownames=['a'], colnames=('b', 'c'),
                          margins=True)

In [11]: result
Out[11]: 
b    0               1               2              All
c    0  1  2  3   4  0  1  2  3   4  0  1  2  3  4     
0    0  0  1  4   1  0  3  0  0   2  1  0  0  1  0  13 
1    3  0  0  3   1  0  2  0  1   1  0  1  1  2  1  16 
2    0  3  1  1   0  1  1  1  2   2  2  1  1  0  1  17 
3    1  0  0  0   0  2  1  0  2   1  0  0  0  0  0  7  
4    3  2  2  1   1  0  1  0  0   1  2  1  1  0  0  15 
5    0  1  0  0   0  3  1  1  4   0  0  3  3  2  1  19 
6    1  2  1  1   1  0  0  1  1   2  0  2  0  1  0  13 
All  8  8  5  10  4  6  9  3  10  9  5  8  6  6  3  100

I think that's pretty slick
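
The arrays above are long, so here is a smaller self-contained sketch of the same kind of call, assuming the function is reached as `pd.crosstab` in current pandas. The three arrays are made up for illustration and are not the ones from the transcript:

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: eight observations of three integer-coded variables.
a = np.array([0, 0, 1, 1, 2, 2, 0, 1])
b = np.array([0, 1, 0, 1, 0, 1, 0, 0])
c = np.array([0, 0, 1, 1, 0, 1, 0, 1])

# Rows are the values of a; columns are a two-level (b, c) hierarchy.
# margins=True appends an "All" row and an "All" column holding the totals.
result = pd.crosstab(a, [b, c], rownames=["a"], colnames=["b", "c"],
                     margins=True)

print(result)
```

The bottom-right cell is the grand total, so it should equal the number of observations.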

@wesm wesm closed this as completed Jan 16, 2012
yarikoptic added a commit to neurodebian/pandas that referenced this issue Jan 19, 2012
* master: (313 commits)
  TST: more Python 2.5 sadness
  TST: Python 2.5 float formatting changed
  TST: cast to i8 when checking margins
  BUG: DataFrame.join on keys produce wrong result, does not preserve order
  DOC: release notes
  ENH: xs level can take multiple levels, pass multiple levels to MultiIndex.droplevel, GH pandas-dev#371
  BUG: fix bugs related to comments in pandas-dev#371
  BUG: fix TextParser with list buglet, enable parsing of DataFrame output with index names
  BUG: convert tuples in concat to MultiIndex
  BUG: don't lose index names when adding row margin
  ENH: add margins to crosstab
  ENH: add crosstab function and test
  ENH: crosstab prototype function, API needs fleshing out, GH pandas-dev#170
  BUG: fix buglet with xs with level, GH pandas-dev#371
  TST: add test_sql.py module
  TST: testing, cleanup of io.sql module
  TST: indexing testing with minor Series.__getitem__ refactoring
  ENH: hack toward pandas-dev#629
  BUG: check for non-contiguous memory in SeriesGrouper, causing segfault
  ENH: add ability to pass list of dicts to DataFrame.append (GH pandas-dev#464)
  ...
dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019