Wrong detection of source language #6

Open
aneesh1122 opened this issue Nov 11, 2024 · 10 comments

Comments

@aneesh1122

aneesh1122 commented Nov 11, 2024

The sentence is "我們一同追著心中的夢想"

Google Translate detects it as Chinese (Traditional):
Screenshot_2024-11-12-02-52-19-386_com.brave.browser-edit.jpg

But your translator reports the source language as being the same as the target language.

For example, the source language is shown as English here:
IMG_20241112_025152_366.jpg

and as Russian here:
IMG_20241112_025519_608.jpg

even though the source sentence is the same in both translations.

This problem is only with Traditional Chinese. Simplified Chinese works fine.

Translation of Traditional Chinese works, but I'm working on a transliteration process, and for that the detected source language needs to be accurate.

@therealbush
Owner

I have not been very active on this library recently, and I apologize in advance if I cannot fix this in a timely manner. However, my first guess is that Language.kt is parsing the language string incorrectly. I may be able to take a closer look soon, but I would start there. I would also check the raw JSON response to verify whether the source language is correct there. It's also possible that the format of the JSON response has changed.

@therealbush
Owner

On a further look, I don't think it's related to the language-string-to-enum parsing; it's more likely just a change in the JSON format.

@aneesh1122
Author

aneesh1122 commented Nov 12, 2024

> On a further look I don't think it's related to language string to enum parsing, likely just a change in the JSON format

My friend twistios and I looked at the raw output and found that it's possible to transliterate the source sentence directly. He opened a pull request, and you've merged it.

Could you please publish a new release so that twistios and I can use it in a project we contribute to?

@therealbush
Owner

I need to fix #4 at some point as well but I can probably do 1.1.1 right now.

@aneesh1122
Author

> I need to fix #4 at some point as well but I can probably do 1.1.1 right now.

That would be great. Thanks

@therealbush
Owner

It appears that the raw JSON response from the Google API reports the target language as the source language when the source is set to auto, and only for Traditional Chinese. Maybe there is something I can do to fix this, but it seems to be on Google's end, not mine. It's still weird that the web version of Google Translate detects the source language correctly.
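One way to confirm this echo behaviour is to send the same text with two different targets and compare the reported source each time. The sketch below is only an illustration: `detect` is a hypothetical callback standing in for whatever call you use to reach the API and read the reported source code, and the plain-string language codes are assumptions, not this library's types.

```kotlin
// Hypothetical probe: `detect` stands in for a real call that translates the
// same text with source = auto and returns the reported source-language code.
fun looksLikeTargetEcho(
    detect: (target: String) -> String,
    targetA: String = "en",
    targetB: String = "ru",
): Boolean {
    require(targetA != targetB) { "Probe needs two different targets" }
    val reportedA = detect(targetA)
    val reportedB = detect(targetB)
    // A real detection would not change with the target; an echo mirrors it.
    return reportedA == targetA && reportedB == targetB
}

fun main() {
    // Simulated responses: one that mirrors the target, one that does not.
    val echoing: (String) -> String = { target -> target }
    val healthy: (String) -> String = { _ -> "zh-CN" }
    println(looksLikeTargetEcho(echoing)) // true
    println(looksLikeTargetEcho(healthy)) // false
}
```

English and Russian are used as defaults because those are the two targets shown in the screenshots at the top of this issue.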

@therealbush
Owner

Maybe, if it only happens for Traditional Chinese, you can use the fact that the reported source and the target are the same to assume that the source must be Traditional Chinese? lol
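A minimal sketch of that workaround, for the JVM only, could look like the following. Everything here is an assumption made for illustration: the function names are hypothetical, language codes are plain strings such as "auto" and "zh-TW" rather than this library's Language enum, and the extra Han-script check is my addition to avoid relabelling text that really is in the target language.

```kotlin
// True if the text contains at least one Han (CJK ideograph) code point.
fun containsHan(text: String): Boolean =
    text.codePoints().anyMatch { Character.UnicodeScript.of(it) == Character.UnicodeScript.HAN }

// Hypothetical helper: codes are plain strings, not the library's Language enum.
fun resolveSource(text: String, requested: String, reported: String, target: String): String {
    // Treat the result as an echo only when detection was requested and the
    // target is not itself a Chinese variant (where source == target is plausible).
    val echoed = requested == "auto" && reported == target && !target.startsWith("zh")
    return if (echoed && containsHan(text)) "zh-TW" else reported
}

fun main() {
    println(resolveSource("我們一同追著心中的夢想", "auto", "es", "es")) // zh-TW
    println(resolveSource("Hola", "auto", "es", "es"))                 // es
    println(resolveSource("我们", "auto", "zh-CN", "es"))               // zh-CN
}
```

The Han check cannot tell Traditional from Simplified Chinese or from Japanese kanji, so this only holds as long as the echo really is limited to Traditional Chinese, as observed in this thread.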

@therealbush
Owner

The following request:

translator.translateBlocking("我們一同追著心中的夢想", Language.SPANISH, Language.AUTO).rawData

produces this raw response:

[
   [
      [
         "Persigamos nuestros sueños juntos",
         "我們一同追著心中的夢想",
         null,
         null,
         11,
         null,
         null,
         [
            [
               
            ],
            [
               
            ]
         ],
         [
            [
               [
                  "af64405095a399ceb1e05c7abb7cda66",
                  "zh_en_2023q1.md"
               ]
            ],
            [
               [
                  "e050167b38f2a566522b4157651fc616",
                  "en_es_2023q1.md"
               ]
            ]
         ]
      ],
      [
         null,
         null,
         null,
         "Wǒmen yītóng zhuīzhe xīnzhōng de mèngxiǎng"
      ]
   ],
   null,
   "es",
   null,
   null,
   [
      [
         "我們一同追著心中的夢想",
         null,
         [
            [
               "Persigamos nuestros sueños juntos",
               null,
               true,
               false,
               [
                  11
               ]
            ],
            [
               "Persigamos juntos nuestros sueños",
               null,
               true,
               false,
               [
                  11
               ]
            ]
         ],
         [
            [
               0,
               11
            ]
         ],
         "我們一同追著心中的夢想",
         0,
         0
      ]
   ],
   1,
   [
      
   ],
   [
      [
         "es"
      ],
      null,
      [
         1
      ],
      [
         "es"
      ]
   ]
]
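For anyone reading the response by hand, here is a rough sketch of pulling out the two fields discussed in this thread. It assumes the JSON has already been parsed into nested lists by a parser of your choice, and the index positions are inferred from this single response only, since the format is undocumented and may change. The function names are hypothetical and not part of this library.

```kotlin
// Assumed layout, inferred only from the single response above:
//   raw[0] = list of segments; the segment carrying the romanization has it at index 3
//   raw[2] = reported source-language code
fun reportedSource(raw: List<Any?>): String? = raw.getOrNull(2) as? String

fun sourceRomanization(raw: List<Any?>): String? {
    val segments = raw.getOrNull(0) as? List<*> ?: return null
    // Take the first segment that has a string at index 3.
    return segments.firstNotNullOfOrNull { segment ->
        (segment as? List<*>)?.getOrNull(3) as? String
    }
}

fun main() {
    // Hand-built, trimmed stand-in for the parsed JSON array.
    val raw: List<Any?> = listOf(
        listOf(
            listOf("Persigamos nuestros sueños juntos", "我們一同追著心中的夢想", null, null, 11),
            listOf(null, null, null, "Wǒmen yītóng zhuīzhe xīnzhōng de mèngxiǎng"),
        ),
        null,
        "es",
    )
    println(reportedSource(raw))     // es
    println(sourceRomanization(raw)) // Wǒmen yītóng zhuīzhe xīnzhōng de mèngxiǎng
}
```

Note that the reported source is "es", the requested target, while the romanization is correct pinyin for the source sentence, so the transliteration can be used even when the reported source language is wrong.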

@therealbush
Owner

Created a new release

@aneesh1122
Author

aneesh1122 commented Nov 12, 2024

> maybe if it only happens for traditional chinese, you can use the fact that the source and target are the same to assume that the source must be traditional chinese? lol

I'm already using if (sourceLanguage == targetLanguage) to skip the translation process; otherwise I would have used this condition to treat the sentence as Chinese Traditional 😅

> Created a new release

Thanks
