[BUG] - error with SMOTENC fit_resample: ValueError: could not broadcast input array from shape (137,12) into shape (272,12) #837
Comments
I'm having a similar issue with some code I'm testing. If I discover anything I'll let you know. |
What are your |
@jox79 please post a code snippet in order to reproduce the error. |
I'm having the same problem. I'm using imbalanced-learn version 0.8.0. |
I have found a rather unattractive workaround for the meantime. I choose
smotenc = SMOTENC(
    categorical_features=[250],
    random_state=42,
    k_neighbors=5,
    sampling_strategy="minority",
)
for label in np.unique(y):
    X, y = smotenc.fit_resample(X, y)
Did I miss something? |
I'm still having this error, also with v0.8.1:
File "C:\CRIF\StrategyOne\S170\wspace\lab\venv\lib\site-packages\imblearn\base.py", line 83, in fit_resample
    output = self._fit_resample(X, y)
File "C:\CRIF\StrategyOne\S170\wspace\lab\venv\lib\site-packages\imblearn\over_sampling\_smote\base.py", line 518, in _fit_resample
    X_resampled, y_resampled = super()._fit_resample(X_encoded, y)
File "C:\CRIF\StrategyOne\S170\wspace\lab\venv\lib\site-packages\imblearn\over_sampling\_smote\base.py", line 311, in _fit_resample
    X_class, y.dtype, class_sample, X_class, nns, n_samples, 1.0
File "C:\CRIF\StrategyOne\S170\wspace\lab\venv\lib\site-packages\imblearn\over_sampling\_smote\base.py", line 103, in _make_samples
    X_new = self._generate_samples(X, nn_data, nn_num, rows, cols, steps)
File "C:\CRIF\StrategyOne\S170\wspace\lab\venv\lib\site-packages\imblearn\over_sampling\_smote\base.py", line 577, in _generate_samples
    ] = self._X_categorical_minority_encoded
Exception: could not broadcast input array from shape (6,154) into shape (455,154)
I have no idea how to solve it. |
The issue here is that the internal algorithm was wrongly designed for binary classification only, in the case where the median of the std. dev. == 0. This needs to be adapted to multiclass. I assume that it boils down to |
In short:
# we can replace the 1 entries of the categorical features with the
# median of the standard deviation. It will ensure that whenever
# distance is computed between 2 samples, the difference will be equal
# to the median of the standard deviation as in the original paper.
# In the edge case where the median of the std is equal to 0, the 1s
# entries will be also nullified. In this case, we store the original
# categorical encoding which will be later used for inversing the OHE
if math.isclose(self.median_std_, 0):
    self._X_categorical_minority_encoded = _safe_indexing(
        X_ohe.toarray(), np.flatnonzero(y == class_minority)
    )
Here, we need to store the encoding not only for the minority class but for all classes to be resampled. |
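A minimal NumPy-only sketch of why storing the encoding for the minority class alone fails in the multiclass case, and how a per-class store avoids it. This is not the imbalanced-learn implementation or the actual patch: the helper names `store_minority_only`, `store_per_class` and `restore_categorical` are hypothetical, and the toy data is made up for illustration.

```python
import numpy as np


def store_minority_only(X_cat, y):
    """Hypothetical helper: keep the categorical encoding of the smallest class only."""
    classes, counts = np.unique(y, return_counts=True)
    class_minority = classes[np.argmin(counts)]
    return X_cat[np.flatnonzero(y == class_minority)]


def store_per_class(X_cat, y, classes_to_resample):
    """Hypothetical helper: keep one categorical encoding per class to resample."""
    return {label: X_cat[np.flatnonzero(y == label)] for label in classes_to_resample}


def restore_categorical(X_class, n_continuous, stored):
    """Write a stored encoding back into the categorical columns of one class."""
    restored = X_class.copy()
    restored[:, n_continuous:] = stored  # row counts must match, else ValueError
    return restored


# Toy multiclass data: 6, 4 and 2 samples; 2 continuous + 3 one-hot columns.
rng = np.random.default_rng(0)
y = np.array([0] * 6 + [1] * 4 + [2] * 2)
n_continuous = 2
X_cont = rng.normal(size=(y.size, n_continuous))
X_cat = rng.integers(0, 2, size=(y.size, 3)).astype(float)
X = np.hstack([X_cont, X_cat])

# Class 1 is resampled too, but only the encoding of class 2 was stored.
X_class_1 = X[y == 1]
minority_only = store_minority_only(X_cat, y)
try:
    restore_categorical(X_class_1, n_continuous, minority_only)
    failed = False
except ValueError as exc:
    failed = True
    print(exc)

# Storing one encoding per resampled class keeps the shapes consistent.
per_class = store_per_class(X_cat, y, classes_to_resample=[1, 2])
fixed = restore_categorical(X_class_1, n_continuous, per_class[1])
```

In the binary case the only class being resampled is the minority class, so the two approaches coincide, which may be why the problem only shows up with more than two classes.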
Is there no way to have this issue fixed in one of the next releases? It is really important in my opinion. Thanks very much! |
@jox79 feel free to open a PR to fix the bug |
I got the same error.
It looks like when it's internally calling there may be some parameter that needs to be reset: I checked my input by calling the same
and it is:
I was thinking maybe the
from the traceback. I don't know, though, if this makes sense or could be helpful; I desperately need a fix for this, hahaha. |
It should be solved in #1015 |
Hi @glemaitre, just wondering when this change is going to be released. I think it didn't make it into 0.11.0, right? Seems like #1015 was merged a couple of days after the last release? |
It should already be available in the latest release, 0.11. |
Oh right I have updated to 0.11 and am still getting this error - it only seems to happen sometimes though... |
It could be another bug with the same error. |
Describe the bug
Error with SMOTENC.fit_resample: ValueError: could not broadcast input array from shape (137,12) into shape (272,12)
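For readers unfamiliar with this message, the following sketch shows the kind of NumPy assignment that produces it. It only reuses the shapes from the error above; the array names are illustrative and this is not the code path inside imbalanced-learn.

```python
import numpy as np

# Destination with 272 rows, source with 137 rows: the row counts differ and
# neither is 1, so NumPy cannot broadcast the source into the destination.
target = np.zeros((272, 12))
source = np.ones((137, 12))

try:
    target[:, :] = source
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

A source with a single row, shape (1, 12), would be broadcast silently instead of raising, so a mismatch of this kind can go unnoticed on some datasets.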
Steps/Code to Reproduce
Using the two X and y csv dataset attached:
X.zip
y.zip
I'm running:
Expected Results
No error is thrown.
Actual Results
Versions