ValueError: high is out of bounds for int32 #66

Closed
XNarno opened this issue Jul 17, 2024 · 3 comments
Labels
bug Something isn't working

XNarno commented Jul 17, 2024

Bug description

The tutorial fails at the "initialize_active_learner" step in the "Setting up the Active Learner" section.

Steps to reproduce

Run the third notebook: https://github.com/webis-de/small-text/blob/main/examples/notebooks/03-active-learning-with-setfit.ipynb

Expected behavior

The notebook runs without raising this error.

Environment:

Python version: 3.12.4
small-text version: 1.4.0
small-text integrations (e.g., transformers): 4.42.3
PyTorch version (if applicable): /
OS : Windows 10 Enterprise

Installation (pip, conda, or from source): pip in a conda env
CUDA version (if applicable): /

Additional information

The error message:

`ValueError Traceback (most recent call last)
Cell In[26], line 27
22 active_learner.initialize_data(x_indices_initial, y_initial)
24 return x_indices_initial
---> 27 initial_indices = initialize_active_learner(active_learner, train.y)
28 labeled_indices = initial_indices

Cell In[26], line 22
19 #x_indices_initial = x_indices_initial.astype(int)
20 y_initial = y_train_int[x_indices_initial]
---> 22 active_learner.initialize_data(x_indices_initial, y_initial)
24 return x_indices_initial

File c:\Users\XXX\AppData\Local\anaconda3\envs\APIClassifier\Lib\site-packages\small_text\active_learner.py:154, in PoolBasedActiveLearner.initialize_data(self, indices_initial, y_initial, indices_ignored, indices_validation, retrain)
151 self.indices_ignored = np.empty(shape=(0), dtype=int)
153 if retrain:
--> 154 self._retrain(indices_validation=indices_validation)

File c:\Users\XXX\AppData\Local\anaconda3\envs\APIClassifier\Lib\site-packages\small_text\active_learner.py:393, in PoolBasedActiveLearner._retrain(self, indices_validation)
390 dataset.y = self.y
392 if indices_validation is None:
--> 393 self._clf.fit(dataset, **self.fit_kwargs)
394 else:
395 indices = np.arange(self.indices_labeled.shape[0])
...
File numpy\random\mtrand.pyx:780, in numpy.random.mtrand.RandomState.randint()

File numpy\random\_bounded_integers.pyx:1423, in numpy.random._bounded_integers._rand_int32()

ValueError: high is out of bounds for int32`

@XNarno XNarno added the bug Something isn't working label Jul 17, 2024
@chschroeder
Contributor

Hi @XNarno,

Thank you for reporting this! This is an error I haven't seen before. I suspect this is an issue that numpy has with Windows 10.

I will further investigate this. If you want to try the notebook, you could execute it in Google Colab for now.
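
To help narrow down whether the platform's default integer width is involved, a small diagnostic like the following could be run in the affected environment. This is only a sketch: `default_int_bits` is a hypothetical helper name, and the assumption (not confirmed in this thread) is that a 32-bit default integer is what triggers the error.

```python
import platform

import numpy as np


def default_int_bits() -> int:
    """Return the width in bits of NumPy's default integer type (np.int_)."""
    return np.dtype(np.int_).itemsize * 8


# Print the details that are useful to include in a bug report.
print("OS:", platform.system(), platform.release())
print("numpy:", np.__version__)
print("default integer width:", default_int_bits(), "bits")
```

If the reported width is 32 bits, random integer bounds above 2**31 - 1 cannot be represented in the default integer type.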

@chschroeder chschroeder added this to the small-text-1.4.1 milestone Jul 22, 2024
@chschroeder
Contributor

This is (partly) a peculiarity of Windows 10 and numpy.

  • For numpy>=2 the solution seems easy
  • For numpy<2, I am not yet sure what the solution should look like

Either way, it will be fixed in small-text 1.4.1.
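
As a rough illustration of this class of error (not the fix that went into small-text), the sketch below forces a 32-bit integer type so the failure can be reproduced on any platform, then shows that requesting a 64-bit type explicitly avoids it. The bound `HIGH` and the helper names `draw_seed` and `reproduces_int32_error` are assumptions made for the example; the actual bound passed inside the library is not shown in the traceback.

```python
import numpy as np

# Hypothetical upper bound, larger than the int32 maximum (2**31 - 1).
HIGH = 2**32


def reproduces_int32_error(high: int = HIGH) -> bool:
    """Return True if drawing with a 32-bit dtype raises ValueError."""
    try:
        # Forcing int32 mimics a platform whose default integer is 32 bits wide.
        np.random.randint(0, high, dtype=np.int32)
    except ValueError:
        return True
    return False


def draw_seed(high: int = HIGH) -> int:
    """Draw a random integer in [0, high) using an explicit 64-bit dtype."""
    return int(np.random.randint(0, high, dtype=np.int64))


print("int32 draw fails:", reproduces_int32_error())
print("int64 draw:", draw_seed())
```

Passing `dtype=np.int64` makes the call independent of the platform's default integer width, which is one possible way to handle both numpy<2 and numpy>=2.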

chschroeder added a commit that referenced this issue Aug 2, 2024
Signed-off-by: Christopher Schröder <chschroeder@users.noreply.github.com>

chschroeder commented Aug 2, 2024

I have a fix. Unfortunately, I don't have a system where I could test this. Whenever you have a moment, could you please let me know if the fix is working?

You can install from the v1.4.x branch directly with:

pip install git+https://github.com/webis-de/small-text.git@v1.4.x


2 participants