Unexpected transform behavior on grouped dataset #3740
Comments
Looks like you have some uint dtypes. Also try this on master; I just fixed the cause of this.
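A quick way to check for unsigned-integer columns, sketched here with a made-up frame (`df` and its columns are stand-ins, not from the issue):

```python
import numpy as np
import pandas as pd

# Stand-in frame; dtype "uint8" mimics the unsigned-integer columns suspected above.
df = pd.DataFrame({"id": ["a", "a", "b"],
                   "value": np.array([1, 2, 3], dtype="uint8")})

print(df.dtypes)  # lists the dtype of every column
uint_cols = [c for c in df.columns if df[c].dtype.kind == "u"]  # kind "u" == unsigned int
print("uint columns:", uint_cols)
```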
Here is an example. This is in 0.11.0:
In 0.11.1 this works:
My DataFrame info is as follows:
So,
Can you give me a Dropbox link to the frame as a CSV?
@fonnesbeck can you give it a try with this PR? It should fix it; this was a pretty esoteric bug.
This is a reproduction. In 0.11.0:
In 0.11.1:
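A rough sketch of that repro, where the column names A and C and the repeated [1, 2, 3] values are guesses worked back from the z-scores quoted in the next comment:

```python
import pandas as pd

# Two groups of three integer observations; C is int64 going in.
df = pd.DataFrame({"A": list("xxxyyy"), "C": [1, 2, 3, 1, 2, 3]})

# Group-wise z-score; ddof=0 is assumed so that [1, 2, 3] maps to
# [-1.2247, 0.0, 1.2247].
zscore = lambda x: (x - x.mean()) / x.std(ddof=0)

# Per the thread, 0.11.0 handed back wrongly cast values for the integer
# column; with the fix the result should come back as float64.
print(df.groupby("A")["C"].transform(zscore))
```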
Right; however, the values for C are wrong. They should be upcast to floats, since they are z-scores: array([-1.22474487, 0., 1.22474487, -1.22474487, 0., 1.22474487])
What is your calculation? This seems correct (it's by group).
It works when your data is [1,2,3], but try it for the values in C from your df example above, or even [1,2,3,4]. Also, as I reported originally, the function works stand-alone but not as the argument to transform.
I think my bug-fix works, but in your lambda you need to be sure to use floats (otherwise it is actually correct), e.g.:
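A plausible reconstruction of the float-forcing lambda meant here (the 1.0 multiplication and ddof=0 are guesses, reusing the toy frame from the sketch above):

```python
import pandas as pd

df = pd.DataFrame({"A": list("xxxyyy"), "C": [1, 2, 3, 1, 2, 3]})

# Multiplying by 1.0 converts the integer column to float before the division;
# the alternative mentioned below is df["C"] = df["C"].astype(float) up front.
zscore = lambda x: (1.0 * x - x.mean()) / x.std(ddof=0)
print(df.groupby("A")["C"].transform(zscore))
```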
I think that would work; you are getting integer division. With a lambda like this pandas cannot infer what you actually want, so either astype the data on the way in, or use a lambda like the above.
Actually... hold on.
If you take the standard deviation of a series of integers, you should not get an integer back, since we are taking a square root of a sum. It's not clear why the explicit casting should be required.
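For reference, a tiny check of that point (not from the thread):

```python
import pandas as pd

s = pd.Series([1, 2, 3])       # dtype int64
print(s.std(), type(s.std()))  # 1.0 <class 'numpy.float64'> -- a float, as expected
```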
OK... I think I got it; I put another commit up, please give it a shot. Here's your original data set:
That works. Thanks!
Great! In time for 0.11.1.
This fix appears to work for some numeric columns in the sample DataFrame that I sent, but not others:
With the exception of the two string variables (id, treatment), the columns appear to be valid int64s with no missing data:
It's not clear why they are coming up NaN.
The particular groups that I looked at were all the same value. Can you show a particular group where that is not the case and they still come up NaN? Show your groupby as well, thanks.
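A made-up illustration of that situation (column names are invented): a group whose values are all identical has zero standard deviation, so a z-score-style transform yields 0/0, i.e. NaN, for that group:

```python
import pandas as pd

# Within patient "a" every measurement is identical, so the group's standard
# deviation is 0 and the z-score becomes NaN even though the data are valid ints.
df = pd.DataFrame({"patient": ["a", "a", "a", "b", "b", "b"],
                   "score": [5, 5, 5, 1, 2, 3]})
zscore = lambda x: (x - x.mean()) / x.std(ddof=0)
print(df.groupby("patient")["score"].transform(zscore))
# patient "a" comes back as NaN; patient "b" is finite.
```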
Yes, of course. My mistake, sorry.
No problem.
I have a simple longitudinal biomedical dataset that I am grouping according to the patient on which measurements are taken. Here are the first couple of groups:
However, when I try to transform these data, say by normalization, I get nonsensical results. The normalize function is straightforward, and works fine when applied to manually subsetted data (see the sketch below). Any guidance here is much appreciated; I'm hoping it's something obvious.
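A minimal sketch of the pattern described above, with invented column names and values, and a plain z-score standing in for normalize:

```python
import pandas as pd

# Hypothetical stand-in for the longitudinal dataset: integer measurements
# grouped by the patient they were taken on.
df = pd.DataFrame({"patient": [1, 1, 1, 2, 2, 2],
                   "measurement": [10, 12, 14, 3, 5, 7]})

def normalize(x):
    # Plain z-score; the exact definition used in the report is not shown here.
    return (x - x.mean()) / x.std(ddof=0)

# Works fine on a manually subsetted group:
print(normalize(df.loc[df["patient"] == 1, "measurement"]))

# The problematic call from the report: the same function passed to transform.
print(df.groupby("patient")["measurement"].transform(normalize))
```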