Issue with Sequence features #5866

Closed
alialamiidrissi opened this issue May 15, 2023 · 1 comment · Fixed by #5897

Comments

@alialamiidrissi

alialamiidrissi commented May 15, 2023

Describe the bug

Sequence features sometimes raise an error when length is set to a value other than -1 (the default, which means variable length).

Steps to reproduce the bug

import numpy as np
from datasets import ClassLabel, Dataset, Features, Sequence, Value
feats = Features({
    "target": ClassLabel(names=[0, 1]),
    "x": Sequence(feature=Value(dtype="float64", id=None), length=2, id=None),
})
data = {"target": np.ones(2000).astype(int), "x": np.random.rand(2000, 2)}
Dataset.from_dict(data, features=feats).flatten_indices()

Throws:

  TypeError: Couldn't cast array of type
  fixed_size_list<item: double>[2]
  to
  Sequence(feature=Value(dtype='float64', id=None), length=2, id=None)

The same code runs without errors when length=-1.

EDIT: The error only seems to occur when the dataset has more than 1000 rows; I'm not sure why.

Expected behavior

No exception

Environment info

  • datasets version: 2.10.1
  • Python version: 3.9.5
  • PyArrow version: 11.0.0
  • Pandas version: 1.4.1
@alialamiidrissi alialamiidrissi changed the title Issue with Sequence feature Issue with Sequence features May 16, 2023
@mariosasko
Collaborator

Thanks for reporting! I've opened a PR with a fix.
