Feature-extraction pipeline to return Tensor #10016

Closed
ierezell opened this issue Feb 4, 2021 · 9 comments · Fixed by #19257 or #19707

Comments

@ierezell
Contributor

ierezell commented Feb 4, 2021

🚀 Feature request

Currently, the feature-extraction pipeline, transformers.pipelines.feature_extraction.FeatureExtractionPipeline (l.82), returns super().__call__(*args, **kwargs).tolist()

This gives a list[float] (or a list[list[float]] if the input is a list[str]).

I guess this is meant to be framework-agnostic, but we can specify framework='pt' in the pipeline config, so I was expecting a torch.Tensor.

Could we add some logic to return tensors?

Motivation

The features will be used as input to other models, so keeping them as tensors (ideally already on the GPU) would avoid needless conversions.

Thanks in advance for the reply,

Have a great day.

@LysandreJik
Member

Hello! Indeed, this is a valid request. Would you like to open a PR and take a stab at it?

@ierezell
Contributor Author

ierezell commented Feb 5, 2021

@LysandreJik Hi, thanks for the fast reply!

OK, will do that :)
I will comment here when the PR is ready.

@ak314
Contributor

ak314 commented Mar 19, 2021

Hi @LysandreJik, is there any update on this issue? If @ierezell doesn't have time, I might be able to take a shot at it in the next few days.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@steysie

steysie commented May 27, 2022

Hi!
Is this issue still under consideration?
It would be awesome to be able to get tensors from the feature-extraction pipeline.

@LysandreJik
Member

I think we'd still be open to that; WDYT @Narsil?

@Narsil
Contributor

Narsil commented May 31, 2022

Sure!

Would adding an argument return_type="tensors" be OK? That way we can enable this feature without breaking backward compatibility.
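A minimal sketch of how such an opt-in argument could keep the old default intact. The function name `postprocess`, the default value `"list"`, and the use of `array.array` as a stand-in for a framework tensor are assumptions made for illustration only; this is not the actual pipeline code.

```python
from array import array


def postprocess(model_output, return_type="list"):
    """Hypothetical post-processing step of a feature-extraction pipeline.

    `model_output` is anything exposing `.tolist()` (a stand-in for a tensor).
    The default value "list" is an assumption that mirrors the existing behavior.
    """
    if return_type == "tensors":
        # Opt-in path: hand the tensor-like object back untouched.
        return model_output
    if return_type == "list":
        # Default path: same as today, convert to plain Python lists.
        return model_output.tolist()
    raise ValueError(f"Unsupported return_type: {return_type!r}")


# array.array is used here only because it has .tolist() and needs no extra install.
features = array("d", [0.1, 0.2, 0.3])

as_list = postprocess(features)                           # unchanged default
as_tensor = postprocess(features, return_type="tensors")  # new opt-in behavior
```

Because the default stays `"list"`, existing callers that rely on the `.tolist()` output would see no change.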

@ajsanjoaquin
Contributor

I'm baffled as to why returning the features as a list is the default behavior in the first place... Isn't one common usage of feature extraction to provide an input to another model, which means it is preferred to keep it as a tensor?

@Narsil
Contributor

Narsil commented Oct 17, 2022

@ajsanjoaquin

Well it depends, not necessarily. Another very common use case is to feed it to some feature database for querying later.
Those database engines are not necessarily expecting the same kind of tensors that you are sending.

But I kind of agree that it should at least be a numpy.array, because conversions between numpy and PT or TF are usually basically free, meaning it would be much easier to use that way.

Some pipelines were added a long time ago, when the situation was not as clear as it is today, and since we are very conservative regarding breaking changes, that can explain why some defaults are the way they are.

If/when v5 is prepared, there will likely be a lot of small but breaking changes in that regard.
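To illustrate the point about NumPy conversions being close to free, here is a small sketch that assumes `numpy` and `torch` are installed. On CPU, `torch.from_numpy` and `Tensor.numpy()` reuse the underlying buffer instead of copying it; moving the data to a GPU would still require a copy. The array contents are made up for the example.

```python
import numpy as np
import torch

# Stand-in for features returned as a NumPy array (2 tokens, 4 dimensions).
features = np.arange(8, dtype=np.float32).reshape(2, 4)

# NumPy -> PyTorch: no copy, the tensor views the same memory.
as_torch = torch.from_numpy(features)

# Writing through the tensor is visible in the original array,
# which shows that no data was duplicated.
as_torch[0, 0] = 42.0

# PyTorch -> NumPy: again a view on the same buffer (CPU tensors only).
back = as_torch.numpy()
same_buffer = np.shares_memory(features, back)
```

By contrast, going through `.tolist()` and then `torch.tensor(...)` rebuilds every element as a Python float and copies the data twice.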
