Next WeNet Roadmap #1683

robin1001 · 2023-02-08T04:05:08Z

We will mainly focus on the following two problems in Next WeNet.

1. NN-based contextual biasing and LM solutions. On the one hand, our final goal is a pure end-to-end model that includes contextual biasing and the LM. On the other hand, our current contextual biasing and LM solutions have many problems. For example, contextual biasing performs poorly on rare words, and the LM solution is complicated because it introduces an FST and token-passing beam search. We are also looking for new paradigms, such as joint text/audio learning and prompt learning.
2. Exploration of open-source big models, pretrained models, and multimodal models. These models keep growing in capability, influence, and interest, and we believe they may provide a final solution for general AI. It is hard for us to build such models directly because we lack the research and computation resources. However, we can explore how to use them in speech recognition applications, since open-source big models plus task-specific/private data may be the new paradigm for the next generation of AI.

We are open to other proposals. WeNet is a community-driven project, and we love your feedback and proposals on where we should be heading. Feel free to volunteer if you are interested in trying out some items (they do not have to be on the list).


Mddct · 2023-03-12T02:51:15Z

From Google's recent USM paper, we can see the following three points:

1. Injecting text
2. Simpler pre-training
3. Text to speech intermediate representation

I think these three are the ultimate weapons for speech recognition, whether at the signal level or at the text level.

The community is also a good way to collaborate on building the big model or paving the way for the new pipeline.

Mddct · 2023-03-14T15:21:34Z


Regarding point 2 (simpler pre-training), BEST-RQ may be a good start: https://github.com/wenet-e2e/wenet/tree/Mddct-bestrq/wenet/ssl/bestrq
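The linked branch is not reproduced here. As a rough illustration of why BEST-RQ counts as "simpler" pre-training, the NumPy sketch below shows its core idea as published: a frozen random projection plus a frozen random codebook turn each (already normalized) feature frame into a discrete label, so no quantizer has to be learned. The class name `RandomProjectionQuantizer` and its default sizes are hypothetical and illustrative, not WeNet's API. Frame stacking, masking, and the encoder loss are left out.

```python
import numpy as np


class RandomProjectionQuantizer:
    """Hypothetical BEST-RQ-style target generator (sketch, not WeNet's code).

    Both the projection and the codebook are random and frozen: they are
    never trained, they only turn feature frames into discrete labels.
    """

    def __init__(self, input_dim, codebook_dim=16, codebook_size=8192, seed=0):
        rng = np.random.default_rng(seed)
        # Xavier/Glorot-uniform projection matrix, frozen after creation.
        limit = np.sqrt(6.0 / (input_dim + codebook_dim))
        self.projection = rng.uniform(-limit, limit, (input_dim, codebook_dim))
        # Standard-normal codebook, l2-normalised once and frozen.
        codebook = rng.standard_normal((codebook_size, codebook_dim))
        self.codebook = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)

    def __call__(self, features):
        """Map normalised features of shape (T, input_dim) to labels of shape (T,)."""
        projected = features @ self.projection
        projected = projected / (np.linalg.norm(projected, axis=1, keepdims=True) + 1e-8)
        # For unit vectors ||c - p||^2 = 2 - 2 c.p, so the nearest codeword
        # is the one with the largest dot product.
        return np.argmax(projected @ self.codebook.T, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.standard_normal((100, 80))  # stand-in for 100 normalised fbank frames
    quantizer = RandomProjectionQuantizer(input_dim=80)
    labels = quantizer(feats)
    print(labels.shape)  # (100,)
```

During pre-training, some frames would be masked and the encoder trained with cross-entropy to predict these labels at the masked positions. The quantizer never receives gradients, which is what keeps the recipe simple compared with learned quantizers.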

robin1001 · 2023-03-15T01:57:25Z

@Mddct shares great insight into general speech recognition tasks.

github-actions · 2023-12-27T01:15:52Z

This issue has been automatically closed due to inactivity.

robin1001 pinned this issue Feb 8, 2023

xingchensong added the documentation label (Improvements or additions to documentation) Feb 21, 2023

xingchensong mentioned this issue in Transformer models #180 (Closed) Feb 21, 2023

robin1001 unpinned this issue Nov 3, 2023

github-actions bot added the Stale label Dec 27, 2023

github-actions bot closed this as completed Jan 3, 2024
