Support for the new 450 language translation models from Google T5X "madlad" - apparently Apache-2 #4316
Comments
@cmp-nct can you please link to the PR where this was completed?
I've closed it as not important, given the many things we need done and the almost zero interest in translation here.
This would be great if supported, +1. @cmp-nct maybe you can reopen the issue so support for these models can be planned.
That's interesting to me at least.
Perhaps relevant: #5763
madlad400 is remarkable in the sense that it has the most permissive license compared to Meta's nllb200 or Seamless Communication. The translation quality varies a lot across the 419 listed languages, with some language pairs producing very decent translations. Some language pairs seem to be heavily influenced by datasets from the European Parliament (with sometimes cute and hilarious translation mistakes coming from the European Parliament debates). I have heard from other users that some languages in the Indic and Chinese families should not be used at the moment. I would be hugely interested if someone could add support for it.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Surprising lack of interest in translation models, any reason why?
Example: https://huggingface.co/jbochi/madlad400-3b-mt/tree/main
In Google's own space: https://huggingface.co/google/madlad400-10b-mt
The author converted the three smallest models (3b, 7b, 10b) to the HF transformers format. Given the severe lack of non-English output in current models, a good translation model would be a gift.
I just tried the CPU demo of the 3B and it produced quite good output; if that gets better with 7B+, it would be a real solution for a huge number of people.
It could be added as a 2nd stage into llama.cpp
Though the architecture is "T5ForConditionalGeneration", which isn't supported.
So far there was no urgent reason to add those T5 models, as they did not stick out as special, but the idea of outputting text in every single language worldwide would be remarkable.
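For reference, a minimal sketch of how the converted checkpoints are driven through HF transformers. The `<2xx>` target-language prefix follows the jbochi/madlad400-3b-mt model card; the `build_madlad_prompt` and `translate` helper names are my own illustration, not anything from llama.cpp or the thread.

```python
def build_madlad_prompt(target_lang: str, text: str) -> str:
    """MADLAD-400 expects the target language as a <2xx> token prefix."""
    return f"<2{target_lang}> {text}"


def translate(text: str, target_lang: str) -> str:
    # Hypothetical helper: loads ~3B parameters, so it needs torch and
    # transformers installed and plenty of RAM. Imports are kept local
    # so the prompt helper above works without the heavy dependencies.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    name = "jbochi/madlad400-3b-mt"
    tokenizer = T5Tokenizer.from_pretrained(name)
    model = T5ForConditionalGeneration.from_pretrained(name)
    input_ids = tokenizer(
        build_madlad_prompt(target_lang, text), return_tensors="pt"
    ).input_ids
    outputs = model.generate(input_ids=input_ids, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


print(build_madlad_prompt("de", "How are you?"))  # <2de> How are you?
```

Supporting this in llama.cpp would mean implementing the T5 encoder-decoder graph rather than reusing the decoder-only path, which is presumably why the architecture is unsupported.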