[BUG]- error with SMOTENC fit_resample: ValueError: could not broadcast input array from shape (137,12) into shape (272,12 #837

jox79 · 2021-05-10T09:38:23Z

Describe the bug

Error with SMOTENC.fit_resample: ValueError: could not broadcast input array from shape (137,12) into shape (272,12)

Steps/Code to Reproduce

Using the two X and y csv dataset attached:

X.zip
y.zip

I'm running:

from imblearn.over_sampling import SMOTENC

# X and y are loaded from the attached CSV files.
smote = SMOTENC(
    categorical_features=[19],
    sampling_strategy="auto",
    random_state=0,
    n_jobs=8,
)
X, y = smote.fit_resample(X, y)

Expected Results

No error is thrown.

Actual Results

File "C:\Users\c42steguerri\PycharmProjects\StrategyLab\venv\lib\site-packages\imblearn\over_sampling\_smote\base.py", line 577, in _generate_samples
    ] = self._X_categorical_minority_encoded
ValueError: could not broadcast input array from shape (137,12) into shape (272,12)

Versions

System:
    python: 3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020, 10:41:24) [MSC v.1900 64 bit (AMD64)]
executable: C:\Users\c42steguerri\PycharmProjects\StrategyLab\venv\Scripts\python.exe
   machine: Windows-10-10.0.16299-SP0

Python dependencies:
          pip: 19.0.3
   setuptools: 40.8.0
      sklearn: 0.24.1
        numpy: 1.18.4
        scipy: 1.4.1
       Cython: None
       pandas: 1.0.5
   matplotlib: None
       joblib: 0.14.1
threadpoolctl: 2.0.0

Built with OpenMP: True


SkylarTrigueiro · 2021-06-01T18:10:37Z

I'm having a similar issue with some code I'm testing. If I discover anything I'll let you know.

chkoar · 2021-06-01T18:17:56Z

What are your imbalanced-learn versions?

chkoar · 2021-06-01T18:24:33Z

@jox79 please post a code snippet in order to reproduce the error.

jonasjostmann · 2021-06-24T14:27:40Z

I'm having the same problem. I'm using imbalanced-learn version 0.8.0.

jonasjostmann · 2021-06-24T16:22:52Z

I have found a rather unattractive workaround for the meantime: I set sampling_strategy='minority' and call fit_resample once per label.

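import numpy as np
from imblearn.over_sampling import SMOTENC

smotenc = SMOTENC(
    categorical_features=[250],
    random_state=42,
    k_neighbors=5,
    sampling_strategy="minority",
)

# One pass per label: each call over-samples only the current minority class.
for label in np.unique(y):
    X, y = smotenc.fit_resample(X, y)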

Did I miss something?
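For what it's worth, the loop above can be reasoned about with class counts alone. The sketch below is not imbalanced-learn code. It only assumes that sampling_strategy="minority" raises the smallest class up to the size of the largest one on each call, and it simulates that with a hypothetical helper, oversample_minority. The class counts are made up.

```python
from collections import Counter


def oversample_minority(counts):
    """Simulate one fit_resample call with sampling_strategy="minority".

    Assumption: only the smallest class is grown, up to the size of the
    largest class. No samples are generated here, only counts are updated.
    """
    target = max(counts.values())
    minority = min(counts, key=counts.get)
    updated = dict(counts)
    updated[minority] = target
    return updated


# Hypothetical label vector with three imbalanced classes.
y = ["a"] * 272 + ["b"] * 137 + ["c"] * 6
counts = dict(Counter(y))

# One pass per label, mirroring `for label in np.unique(y)`.
for _ in sorted(set(y)):
    counts = oversample_minority(counts)

print(counts)  # {'a': 272, 'b': 272, 'c': 272}
```

Under this assumption, with k classes the first k - 1 passes do the balancing and the last pass changes nothing, so the loop ends with every class at the majority size.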

jox79 · 2021-12-02T16:56:02Z

I'm still getting this error with v0.8.1 as well:

File "C:\CRIF\StrategyOne\S170\wspace\lab\venv\lib\site-packages\imblearn\base.py", line 83, in fit_resample
    output = self._fit_resample(X, y)
  File "C:\CRIF\StrategyOne\S170\wspace\lab\venv\lib\site-packages\imblearn\over_sampling\_smote\base.py", line 518, in _fit_resample
    X_resampled, y_resampled = super()._fit_resample(X_encoded, y)
  File "C:\CRIF\StrategyOne\S170\wspace\lab\venv\lib\site-packages\imblearn\over_sampling\_smote\base.py", line 311, in _fit_resample
    X_class, y.dtype, class_sample, X_class, nns, n_samples, 1.0
  File "C:\CRIF\StrategyOne\S170\wspace\lab\venv\lib\site-packages\imblearn\over_sampling\_smote\base.py", line 103, in _make_samples
    X_new = self._generate_samples(X, nn_data, nn_num, rows, cols, steps)
  File "C:\CRIF\StrategyOne\S170\wspace\lab\venv\lib\site-packages\imblearn\over_sampling\_smote\base.py", line 577, in _generate_samples
    ] = self._X_categorical_minority_encoded
Exception: could not broadcast input array from shape (6,154) into shape (455,154)

I have no idea how to solve it.

glemaitre · 2022-01-16T18:31:45Z

The issue here is that the internal algorithm was wrongly designed for binary classification only in the case where the median of the std. dev. == 0. This needs to be adapted to multiclass. I assume that it boils down to storing _X_categorical_minority_encoded for all the classes to be over-sampled and not only for the minority class.
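To check whether a dataset can hit this edge case, one option is to compute the median of the per-feature standard deviations of the continuous columns for each class. The sketch below is a rough diagnostic using NumPy only and assumes a dense numeric array. median_std_per_class is a hypothetical helper, not part of imbalanced-learn, and the toy data are invented.

```python
import math

import numpy as np


def median_std_per_class(X_continuous, y):
    """Return {class label: median of the per-feature std dev} for dense data."""
    X_continuous = np.asarray(X_continuous, dtype=float)
    y = np.asarray(y)
    result = {}
    for label in np.unique(y):
        rows = X_continuous[y == label]
        result[label.item()] = float(np.median(rows.std(axis=0)))
    return result


# Toy data: class 0 varies, class 1 has constant continuous features.
X_continuous = np.array(
    [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [5.0, 7.0], [5.0, 7.0]]
)
y = np.array([0, 0, 0, 1, 1])

stats = median_std_per_class(X_continuous, y)

# Classes whose median std dev is zero are candidates for the edge case.
zero_std_classes = [label for label, value in stats.items() if math.isclose(value, 0)]
print(zero_std_classes)  # [1]
```

A class showing up in zero_std_classes only indicates that its continuous features are mostly constant. Whether that triggers the error depends on the installed imbalanced-learn version.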

glemaitre · 2022-01-16T18:32:32Z

In short:

        # we can replace the 1 entries of the categorical features with the
        # median of the standard deviation. It will ensure that whenever
        # distance is computed between 2 samples, the difference will be equal
        # to the median of the standard deviation as in the original paper.

        # In the edge case where the median of the std is equal to 0, the 1s
        # entries will be also nullified. In this case, we store the original
        # categorical encoding which will be later used for inversing the OHE
        if math.isclose(self.median_std_, 0):
            self._X_categorical_minority_encoded = _safe_indexing(
                X_ohe.toarray(), np.flatnonzero(y == class_minority)
            )

Here, we need to store the encoding not only for the minority class but for all classes to be resampled.
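The shape mismatch in the error messages is consistent with this explanation: the stored block has one row per minority sample, while the slot being filled has one row per sample of whichever class is currently being over-sampled. The NumPy-only sketch below illustrates the idea of keeping one block per class. It is not the patch that went into the library. The dictionary encoded_by_class and the toy sizes (borrowed from the first error message) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hot block: 272 rows for class "a", 137 rows for class "b" (the minority).
y = np.array(["a"] * 272 + ["b"] * 137)
X_ohe = rng.integers(0, 2, size=(y.size, 12)).astype(float)

# Storing only the minority rows, as in the snippet above.
minority_only = X_ohe[np.flatnonzero(y == "b")]

# Slot for class "a": one row per sample of the class being over-sampled.
slot = np.zeros((np.count_nonzero(y == "a"), 12))
try:
    slot[:, :] = minority_only  # 137 rows cannot fill 272 rows
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Keeping one block per class to be over-sampled avoids the mismatch.
encoded_by_class = {
    label: X_ohe[np.flatnonzero(y == label)] for label in ("a", "b")
}
slot[:, :] = encoded_by_class["a"]
```

In the binary case the only class to over-sample is the minority one, so the two shapes always agree, which would explain why the problem only appears with more than two classes.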

jox79 · 2022-01-31T17:46:32Z

Is there any way to have this issue fixed in one of the next releases? It is really important in my opinion. Thanks very much!

glemaitre · 2022-01-31T17:47:34Z

@jox79 feel free to open a PR to fix the bug

freddyaboulton · 2022-06-03T14:53:58Z

I put up a fix here @jox79 #905

kelvinheng92 · 2022-09-05T06:03:47Z

Hi everyone, can I check the status of this MR? I am facing the same error. However, it's pretty random: sometimes it runs, sometimes it doesn't. Please see the error log below. Thanks a lot!

lolloconsoli · 2022-12-01T16:28:22Z

I got the same error. This is the traceback:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_112/2018849994.py in <module>
      6 Y_validation = np.asarray(LabelEncoder().fit_transform(Y_validation))
      7 print(f"Y_type {type(Y_training)}\tshape Y_train {Y_training.shape}")
----> 8 X_training_rus, Y_training_rus = over_sampler.fit_resample(X_train_concat, Y_training)
      9 print("Sampled!")
     10 

/opt/conda/lib/python3.7/site-packages/imblearn/base.py in fit_resample(self, X, y)
     75         check_classification_targets(y)
     76         arrays_transformer = ArraysTransformer(X, y)
---> 77         X, y, binarize_y = self._check_X_y(X, y)
     78 
     79         self.sampling_strategy_ = check_sampling_strategy(

/opt/conda/lib/python3.7/site-packages/imblearn/over_sampling/_random_over_sampler.py in _check_X_y(self, X, y)
    144             accept_sparse=["csr", "csc"],
    145             dtype=None,
--> 146             force_all_finite=False,
    147         )
    148         return X, y, binarize_y

/opt/conda/lib/python3.7/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    430                 y = check_array(y, **check_y_params)
    431             else:
--> 432                 X, y = check_X_y(X, y, **check_params)
    433             out = X, y
    434 

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    800                     ensure_min_samples=ensure_min_samples,
    801                     ensure_min_features=ensure_min_features,
--> 802                     estimator=estimator)
    803     if multi_output:
    804         y = check_array(y, accept_sparse='csr', force_all_finite=True,

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    596                     array = array.astype(dtype, casting="unsafe", copy=False)
    597                 else:
--> 598                     array = np.asarray(array, order=order, dtype=dtype)
    599             except ComplexWarning:
    600                 raise ValueError("Complex data not supported\n"

/opt/conda/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85

It looks like the problem occurs in the internal call to /opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator).

There may be some parameter that needs to be reset: the error is thrown by numpy when it calls array = np.asarray(array, order=order, dtype=dtype).

I checked my input by calling the same np.asarray() function:

print(f"Y_type {type(Y_training)}\tshape Y_train {np.asarray(Y_training).shape}")

and it is:

Y_type <class 'numpy.ndarray'>	shape Y_train (56123,)

I was thinking that maybe the force_all_finite or the ensure_2d arguments are the issue, also because these lines appear in the traceback:

/opt/conda/lib/python3.7/site-packages/imblearn/over_sampling/_random_over_sampler.py in _check_X_y(self, X, y)
    144             accept_sparse=["csr", "csc"],
    145             dtype=None,
--> 146             force_all_finite=False,
    147         )
    148         return X, y, binarize_y

I don't know whether this makes sense or is helpful, but I desperately need a fix for this.
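Since the traceback ends inside np.asarray, one thing worth ruling out is X itself, because the check above covers Y_training only. The sketch below assumes (this is not confirmed anywhere in the thread) that the input could be a list of blocks with mismatched widths, which NumPy refuses to convert. can_convert is a hypothetical helper.

```python
import numpy as np


def can_convert(data):
    """Return (True, shape) if np.asarray succeeds, else (False, error text)."""
    try:
        return True, np.asarray(data).shape
    except ValueError as exc:
        return False, str(exc)


# Two blocks with the same number of rows but different widths.
ragged = [np.zeros((2, 3)), np.zeros((2, 4))]
# Two blocks with identical shapes convert without trouble.
regular = [np.zeros((2, 3)), np.zeros((2, 3))]

ragged_ok, ragged_info = can_convert(ragged)
regular_ok, regular_info = can_convert(regular)
print(ragged_ok, regular_ok, regular_info)
```

The exact wording of the ValueError differs between NumPy versions, so the helper reports the message rather than matching on it.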

glemaitre · 2023-07-10T15:36:36Z

It should be solved in #1015

LukebethamStonehaven · 2023-09-07T08:16:28Z

Hi @glemaitre, just wondering when this change is going to be released. I think it didn't make it into 0.11.0, right? It seems like #1015 was merged a couple of days after the last release?

glemaitre · 2023-09-07T08:23:40Z

It should already be available in the latest release, 0.11.

LukebethamStonehaven · 2023-09-07T08:31:46Z

Oh right I have updated to 0.11 and am still getting this error - it only seems to happen sometimes though...

glemaitre · 2023-09-07T08:51:18Z

It could be another bug with the same error.
Don't hesitate to open a new issue with a minimal example that triggers the error.

jox79 changed the title ~~[BUG]~~ [BUG]- error with SMOTENC fit_resample: ValueError: could not broadcast input array from shape (137,12) into shape (272,12 May 12, 2021

freddyaboulton mentioned this issue Jun 1, 2022

[MRG] Fix SmoteNC zero variance resampling #905

Closed

glemaitre mentioned this issue Jul 10, 2023

FIX compute the median of std dev for each class to over-sample in SMOTENC #1015

Merged

glemaitre closed this as completed in #1015 Jul 10, 2023
