FIX: Standardize method applied on individual variables in OTC/dOTC #1896
This seems like a fine change. The `transform` signature is very new, so no need for a deprecation warning message. Just need to update the CHANGELOG.
@SarahG-579462 Thoughts on these changes? Especially
It's a step in the right direction, but I realize that's not even what I had in mind for the standardization. Right now, we just remove the mean/std of the cells, but those are simply evenly distributed. So it doesn't say much about the distributions at hand: we don't have the probability (or the count) of each cell. Perhaps a proper standardization would not work well, but this remains to be seen.
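To illustrate what "the probability of each cell" refers to, here is a minimal sketch, assuming the cells come from a regular histogram of the samples. The variable names (`grid`, `mu`) and the use of `np.histogramdd` are assumptions made for this example, not necessarily how xclim builds its bins.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=(1000, 2))  # 1000 samples, 2 variables

# Bin the samples on a regular 5x5 grid and keep the count in each cell.
counts, edges = np.histogramdd(x, bins=5)
centers = [(e[:-1] + e[1:]) / 2 for e in edges]
mesh = np.meshgrid(*centers, indexing="ij")
all_cells = np.stack([m.ravel() for m in mesh], axis=1)  # shape (25, 2)

# Keep only occupied cells; mu is the empirical probability of each one.
occupied = counts.ravel() > 0
grid = all_cells[occupied]
mu = counts.ravel()[occupied] / counts.sum()

# The unweighted mean treats every occupied cell equally,
# while the weighted mean accounts for how many samples fall in each cell.
cell_mean = grid.mean(axis=0)
weighted_mean = (mu[:, None] * grid).sum(axis=0)
```

Because the cell centers sit on a regular grid, `cell_mean` mostly reflects the extent of the grid, whereas `weighted_mean` reflects where the samples actually concentrate.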
This looks good. Did you want to include your work with the time step vs grid cell motion, or do you want that in a different PR?
Maybe the dOTC evolution through cells (more faithful to the paper) can wait for another PR. But, as I was referring to in my last comment, I don't think what we were doing previously is what I had in mind for standardization. Both are interesting in their own way, and don't yield very different results as far as I can see. The new usage:

```python
def _standardize(grid, mu):
    # mu: weight of each cell (assumed to sum to 1); reshape to broadcast over variables
    mu0 = mu.reshape(len(mu), 1)
    mean = (mu0 * grid).sum(axis=0)
    std = np.sqrt((mu0 * (grid**2 - mean**2)).sum(axis=0))
    return (grid - mean) / std

# this actually computes `( X - mean(X) ) / std(X)`
if normalization == "standardize":
    gridX = _standardize(gridX, muX)
    gridY = _standardize(gridY, muY)
# this is the old way, it computes `(gridX - mean(gridX) ) / std(gridX)`
elif normalization == "standardize_cells":
    gridX = (gridX - gridX.mean(axis=0)) / gridX.std(axis=0)
    gridY = (gridY - gridY.mean(axis=0)) / gridY.std(axis=0)
```

The differences are not huge, though.
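To make the distinction between the two options concrete, here is a small self-contained sketch on a toy one-variable grid. The function names and the toy values are hypothetical, chosen only for illustration. The weighted variance is written as the weighted mean of squared deviations, which matches `E[X^2] - mean^2` when the weights sum to 1.

```python
import numpy as np


def standardize_weighted(grid, mu):
    """Standardize cell positions using the cell probabilities mu (sum to 1)."""
    mu0 = mu.reshape(len(mu), 1)
    mean = (mu0 * grid).sum(axis=0)
    std = np.sqrt((mu0 * (grid - mean) ** 2).sum(axis=0))
    return (grid - mean) / std


def standardize_cells(grid):
    """Standardize cell positions, treating every cell as equally likely."""
    return (grid - grid.mean(axis=0)) / grid.std(axis=0)


# Four cells, one variable; most of the probability mass sits in the first cell.
grid = np.array([[0.0], [1.0], [2.0], [3.0]])
mu = np.array([0.7, 0.1, 0.1, 0.1])

weighted = standardize_weighted(grid, mu)
cells = standardize_cells(grid)

# Under the weights mu, `weighted` has mean 0 and variance 1.
# `cells` has mean 0 and variance 1 only when the cells are counted equally.
print(weighted.ravel())
print(cells.ravel())
```

When the cell weights are close to uniform the two results nearly coincide. They diverge as the mass concentrates in a few cells, as in this toy case.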
I'll just merge this PR now; other changes can be in another PR.
Pull Request Checklist:
- number) and pull request (:pull:`number`) has been added

What kind of change does this PR introduce?
- Renamed `transform` to `normalization`, which I feel is more specific and informative. Also, in `optimal_transport`, we should follow the snake-case convention: `numIterMax` -> `num_iter_max` (I guess this was done to imitate the signature of `ot.emd`, but I think we need to stick to a standard in `xclim` functions).

Does this PR introduce a breaking change?
- `transform` -> `normalization` in input arguments.

Other information:
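As an illustration of the naming point, here is a minimal sketch of how a snake_case argument could be forwarded to a solver whose own keyword is camelCase. `optimal_transport_sketch` and `fake_emd` are hypothetical names, and the `numItermax` keyword of the stand-in solver is an assumption made for the example. This is not the xclim implementation.

```python
import numpy as np


def fake_emd(a, b, M, numItermax=100000):
    """Hypothetical stand-in for a backend solver with a camelCase keyword.

    It only reports what it received, so the forwarding can be inspected.
    """
    return {"numItermax": numItermax, "cost_shape": M.shape}


def optimal_transport_sketch(mu_x, mu_y, cost, num_iter_max=100_000, solver=fake_emd):
    """Expose a snake_case argument and translate it for the backend."""
    return solver(mu_x, mu_y, cost, numItermax=num_iter_max)


mu_x = np.array([0.5, 0.5])
mu_y = np.array([0.25, 0.75])
cost = np.array([[0.0, 1.0], [1.0, 0.0]])

result = optimal_transport_sketch(mu_x, mu_y, cost, num_iter_max=50)
```

Callers only ever see `num_iter_max`, so the backend's naming stays an internal detail.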