[S2T] Using Whisper, but an error occurred #2844

Closed
Winter-Dry opened this issue Jan 20, 2023 · 2 comments

@Winter-Dry

The code is:

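  # WhisperExecutor lives in paddlespeech/cli/whisper/infer.py (see the traceback).
  from paddlespeech.cli.whisper.infer import WhisperExecutor

  # audio_file (path to the input audio) and device (presumably a GPU id
  # string such as '0') are defined earlier in audio_seg_recognition_masr.py.
  recognizer_whisper = WhisperExecutor()
  result_tmp = recognizer_whisper(model='whisper',
                                  task='transcribe',
                                  size='medium',
                                  sample_rate=16000,
                                  config=None,
                                  ckpt_path=None,
                                  audio_file=audio_file,
                                  # lang='zh',
                                  device='gpu:' + device)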

Traceback (most recent call last):
  File "audio_seg_recognition_masr.py", line 191, in <module>
    run(video_path, audio_path, result_path, config['audio_rcg'])
  File "audio_seg_recognition_masr.py", line 132, in run
    result_tmp = recognizer_whisper(model='whisper',
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddlespeech/cli/utils.py", line 328, in _warpper
    return executor_func(self, *args, **kwargs)
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddlespeech/cli/whisper/infer.py", line 484, in __call__
    self.infer(model)
  File "<decorator-gen-688>", line 2, in infer
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddle/fluid/dygraph/base.py", line 375, in _decorate_function
    return func(*args, **kwargs)
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddlespeech/cli/whisper/infer.py", line 280, in infer
    self._outputs["result"] = self.model.transcribe(
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddlespeech/s2t/models/whisper/whipser.py", line 588, in transcribe
    result: DecodingResult = decode_with_fallback(segment)
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddlespeech/s2t/models/whisper/whipser.py", line 520, in decode_with_fallback
    decode_result = model.decode(segment, options, resource_path)
  File "<decorator-gen-685>", line 2, in decode
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddle/fluid/dygraph/base.py", line 375, in _decorate_function
    return func(*args, **kwargs)
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddlespeech/s2t/models/whisper/whipser.py", line 1298, in decode
    result = DecodingTask(model, options, resource_path).run(mel)
  File "<decorator-gen-682>", line 2, in run
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddle/fluid/dygraph/base.py", line 375, in _decorate_function
    return func(*args, **kwargs)
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddlespeech/s2t/models/whisper/whipser.py", line 1220, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features,
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddlespeech/s2t/models/whisper/whipser.py", line 1168, in _main_loop
    tokens, completed = self.decoder.update(tokens, logits,
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddlespeech/s2t/models/whisper/whipser.py", line 784, in update
    tokens = paddle.concat([tokens, next_tokens[:, None]], axis=-1)
  File "/share/program/workspace/miniconda3/envs/news/lib/python3.8/site-packages/paddle/tensor/manipulation.py", line 1140, in concat
    return _C_ops.concat(input, axis)
ValueError: (InvalidArgument) The shape of input[0] and input[1] is expected to be equal.But received input[0]'s shape = [5, 227], input[1]'s shape = [5, 1, 51865, 5].
  [Hint: Expected inputs_dims[i].size() == out_dims.size(), but received inputs_dims[i].size():4 != out_dims.size():2.] (at /paddle/paddle/phi/kernels/funcs/concat_funcs.h:55)

I'm confused: I only pass in the audio file path, yet it raises a dimension-mismatch error.
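The last frame of the traceback appends one decoded token per row with `paddle.concat([tokens, next_tokens[:, None]], axis=-1)`. For that to work, `next_tokens` has to be 1-D (shape `[5]` here, presumably one id per beam), so that `[:, None]` gives `[5, 1]` and the result is `[5, 228]`. The reported `input[1]` shape `[5, 1, 51865, 5]` means `next_tokens` arrived as `[5, 51865, 5]`, which looks like it still carries the 51,865-entry vocabulary axis. The sketch below reproduces both cases, assuming NumPy as a stand-in for Paddle (the indexing and concatenation rules involved are the same). The helper `append_step` is hypothetical, not PaddleSpeech code, and it does not reflect what PR #2825 actually changes.

```python
import numpy as np


def append_step(tokens: np.ndarray, next_tokens: np.ndarray) -> np.ndarray:
    """Hypothetical helper mirroring the failing line in whipser.py:
    paddle.concat([tokens, next_tokens[:, None]], axis=-1), using NumPy."""
    # next_tokens must be 1-D (one token id per row) so that
    # next_tokens[:, None] has shape [batch, 1], the same rank as tokens.
    if next_tokens.ndim != 1:
        raise ValueError(
            f"expected next_tokens of shape [batch], got {list(next_tokens.shape)}"
        )
    return np.concatenate([tokens, next_tokens[:, None]], axis=-1)


batch, length, vocab = 5, 227, 51865
tokens = np.zeros((batch, length), dtype=np.int64)

# Well-formed step: greedy argmax over the vocabulary axis of [batch, vocab] logits.
logits = np.random.default_rng(0).standard_normal((batch, vocab))
good_next = logits.argmax(axis=-1)            # shape (5,)
print(append_step(tokens, good_next).shape)   # (5, 228)

# Malformed step: a [5, 51865, 5] tensor becomes [5, 1, 51865, 5] after
# [:, None], i.e. the rank mismatch (4 vs 2) reported by paddle.concat.
bad_next = np.zeros((batch, vocab, batch), dtype=np.int64)
print(bad_next[:, None].shape)                # (5, 1, 51865, 5)
```

Calling `append_step(tokens, bad_next)` raises `ValueError`, just as `paddle.concat` does in the report.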

@zxcd
Collaborator

zxcd commented Jan 20, 2023

Duplicate of #2818.
You can try this PR: #2825.

@Winter-Dry
Author

Oh, thanks a lot!
