Commit

Version 2023.11.12
- Force auto-orientation in scanning mode;
- The recognized text is again shown in a TXT file;
- Updated Finnish translation.
ruifontes committed Nov 12, 2023
1 parent 66cbd8f commit 1544d98
Showing 16 changed files with 174 additions and 150 deletions.
12 changes: 6 additions & 6 deletions 2023.9.26.json → 2023.11.12.json
Original file line number Diff line number Diff line change
Expand Up @@ -3,22 +3,22 @@
"displayName": "TesseractOCR: An OCR add-on",
"URL": "",
"description": "\nPerforms OCR on the selected image file, PDF, JPG, TIF, etc, or in a document through a scanner and shows the results in a browseable message.\nWindows+Control+r - Performs OCR to the selected image file\nWindows+Control+w - Scans and recognize the document in the scanner\n",
"sha256": "a6ba7b16427685044e38f28876e9b3d322f82c5cff391f113284dd4ad82723f7",
"sha256": "40b03482d0b7cb10fb4218d33261032e24664b2797de9aad19f6f19b0e56dae4",
"homepage": "https://github.com/ruifontes/tesseractOCR",
"addonVersionName": "2023.09.26",
"addonVersionName": "2023.11.12",
"addonVersionNumber": {
"major": 2023,
"minor": 9,
"patch": 26
"minor": 11,
"patch": 12
},
"minNVDAVersion": {
"major": 2019,
"minor": 3,
"patch": 0
},
"lastTestedVersion": {
"major": 2024,
"minor": 1,
"major": 2023,
"minor": 3,
"patch": 0
},
"channel": "stable",
Expand Down
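Since this hunk updates `addonVersionName`, `addonVersionNumber`, and `sha256` by hand, a small consistency check can help catch a mismatch before release. The following is a minimal sketch, not part of the commit: the helper names `version_number_from_name`, `manifest_version_is_consistent`, and `sha256_of_file` are hypothetical, and it assumes the version name always has the three-part `YYYY.MM.DD` form seen here.

```python
import hashlib
import re


def version_number_from_name(name):
    """Split a 'YYYY.MM.DD' version name into major/minor/patch integers.

    Assumption: the name has exactly three dot-separated numeric parts.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", name)
    if match is None:
        raise ValueError("unexpected version name: {!r}".format(name))
    major, minor, patch = (int(part) for part in match.groups())
    return {"major": major, "minor": minor, "patch": patch}


def manifest_version_is_consistent(manifest):
    """Return True when addonVersionName agrees with addonVersionNumber."""
    expected = version_number_from_name(manifest["addonVersionName"])
    return expected == manifest["addonVersionNumber"]


def sha256_of_file(path, chunk_size=65536):
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Values taken from the new side of the hunk above.
manifest = {
    "addonVersionName": "2023.11.12",
    "addonVersionNumber": {"major": 2023, "minor": 11, "patch": 12},
}
print(manifest_version_is_consistent(manifest))
```

The result of `sha256_of_file` on the released `.nvda-addon` file could then be compared with the `sha256` field; the file path is left to the caller because the commit does not show where the package is built.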
4 changes: 2 additions & 2 deletions addon/doc/de/readme.md
Expand Up @@ -2,7 +2,7 @@


* Autoren: Rui Fontes, Ângelo Abrantes und Abel Passos do Nascimento Jr.
* Aktualisiert am 09.04.2023
* Aktualisiert am 12.11.2023
* Laden Sie die [stabile Version][1] herunter
* Kompatibilität: NVDA-Version 2019.3 und höher

Expand Down Expand Up @@ -174,4 +174,4 @@ Dieses Add-On unterstützt die folgenden Dateitypen:
* spix
* webp

[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.09.26/tesseractOCR-2023.09.26.nvda-addon
[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.11.12/tesseractOCR-2023.11.12.nvda-addon
4 changes: 2 additions & 2 deletions addon/doc/es/readme.md
Expand Up @@ -2,7 +2,7 @@


* Autores: Rui Fontes, Ângelo Abrantes y Abel Passos do Nascimento Jr.
* Actualizado el 04/09/2023
* Actualizado el 12/11/2023
* Descargar [versión estable][1]
* Compatibilidad con NVDA: versión 2019.3 y posteriores

Expand Down Expand Up @@ -176,4 +176,4 @@ Este complemento soporta los siguientes tipos de archivos:
* webp


[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.09.26/tesseractOCR-2023.09.26.nvda-addon
[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.11.12/tesseractOCR-2023.11.12.nvda-addon
4 changes: 2 additions & 2 deletions addon/doc/fi/readme.md
Expand Up @@ -2,7 +2,7 @@


* Tekijät: Rui Fontes, Ângelo Abrantes ja Abel Passos do Nascimento nuorempi
* Päivitetty 04.9.2023
* Päivitetty 12.11.2023
* Lataa [vakaa versio][1]
* Yhteensopivuus: NVDA 2019.3 ja uudemmat

Expand Down Expand Up @@ -180,4 +180,4 @@ Tämä lisäosa tukee seuraavia tiedostotyyppejä:
* WebP


[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.09.26/tesseractOCR-2023.09.26.nvda-addon
[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.11.12/tesseractOCR-2023.11.12.nvda-addon
4 changes: 2 additions & 2 deletions addon/doc/fr/readme.md
Expand Up @@ -2,7 +2,7 @@


* Auteurs: Rui Fontes, Ângelo Abrantes et Abel Passos do Nascimento Jr.
* Mis à jour le 04/09/2023
* Mis à jour le 12/11/2023
* Télécharger [version stable][1]
* Compatibilité NVDA: version 2019.3 et ultérieure

Expand Down Expand Up @@ -176,4 +176,4 @@ Cette extension supporte les types de fichiers suivants:
* webp


[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.09.26/tesseractOCR-2023.09.26.nvda-addon
[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.11.12/tesseractOCR-2023.11.12.nvda-addon
6 changes: 3 additions & 3 deletions addon/doc/pt_BR/readme.md
Expand Up @@ -2,7 +2,7 @@


* Autores: Rui Fontes, Ângelo Abrantes e Abel Passos do Nascimento Jr.
* Actualizado em 04/09/2023
* Actualizado em 12/11/2023
* Descarregar a [versão estável][1]
* Compatibilidade: NVDA 2019.3 e seguintes

Expand All @@ -28,7 +28,7 @@ Os comandos predefinidos são:
Windows+Control+r - Para reconhecer o ficheiro seleccionado;
Windows+Control+w - Para digitalizar e reconhecer um documento através do scanner.

Depois é só esperar que se abra o ficheiro ocr.pdf.
Depois é só esperar que se abra o ficheiro ocr.txt.
Se pretender preservar o texto reconhecido, não se esqueça de guardar o documento com outro nome e noutro local, pois todos os ficheiros da pasta temporária são eliminados no início do próximo processo de OCR!


Expand Down Expand Up @@ -174,4 +174,4 @@ Este extra suporta os seguintes tipos de ficheiros:
* webp


[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.09.26/tesseractOCR-2023.09.26.nvda-addon
[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.11.12/tesseractOCR-2023.11.12.nvda-addon
6 changes: 3 additions & 3 deletions addon/doc/pt_PT/readme.md
Expand Up @@ -2,7 +2,7 @@


* Autores: Rui Fontes, Ângelo Abrantes e Abel Passos do Nascimento Jr.
* Actualizado em 04/09/2023
* Actualizado em 12/11/2023
* Descarregar [versão estável][1]
Compatibilidade: NVDA versão 2019.3 e posteriores

Expand Down Expand Up @@ -33,7 +33,7 @@ Windows+Control+w - Para digitalizar e reconhecer um documento através do scann
Windows+Control+c - Para cancelar o processo de digitalização.
Nota: Tem de ser executado antes de aparecer a caixa de diálogo que pergunta se pretende digitalizar mais páginas!

Depois é só esperar que a mensagem navegável apareça com o texto reconhecido.
Depois é só esperar que o ficheiro ocr.txt apareça com o texto reconhecido.
Se pretender preservar o texto reconhecido, não se esqueça de guardar o documento com outro nome e noutro local, pois todos os ficheiros da pasta temporária são eliminados no início do próximo processo de OCR!

Estes comandos podem ser modificados na caixa de diálogo \"Definir comandos\" na secção \"TesseractOCR\".
Expand Down Expand Up @@ -178,4 +178,4 @@ Este extra suporta os seguintes tipos de ficheiros:
* spix
* webp

[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.09.26/tesseractOCR-2023.09.26.nvda-addon
[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.11.12/tesseractOCR-2023.11.12.nvda-addon
4 changes: 2 additions & 2 deletions addon/doc/ru/readme.md
Expand Up @@ -3,7 +3,7 @@
TesseractOCR: Дополнение для распознавания текста.

* Авторы: Rui Fontes, Ângelo Abrantes и Abel Passos do Nascimento Jr.
* Обновлено 04/09/2023
* Обновлено 12/11/2023
* Скачать [стабильную версию][1]
* Совместимость: NVDA версии 2019.3 и новее
* [Страница дополнения на GitHub](https://github.com/ruifontes/tesseractOCR)
Expand Down Expand Up @@ -119,4 +119,4 @@ TesseractOCR: Дополнение для распознавания текст
* Русский язык: Валентин Куприянов.


[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.09.26/tesseractOCR-2023.09.26.nvda-addon
[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.11.12/tesseractOCR-2023.11.12.nvda-addon
4 changes: 2 additions & 2 deletions addon/doc/tr/readme.md
Expand Up @@ -2,7 +2,7 @@


* Yazarlar: Rui Fontes, Ângelo Abrantes and Abel Passos do Nascimento Jr.
* 04/09/2023'de güncellendi
* 12/11/2023'de güncellendi
* [Kararlı sürümü indirin][1]
* Uyumluluk: NVDA sürüm 2019.3 ve sonrası

Expand Down Expand Up @@ -183,4 +183,4 @@ Bu eklenti aşağıdaki dosya türlerini destekler:
* webp,


[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.09.26/tesseractOCR-2023.09.26.nvda-addon
[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.11.12/tesseractOCR-2023.11.12.nvda-addon
4 changes: 2 additions & 2 deletions addon/doc/uk/readme.md
Expand Up @@ -2,7 +2,7 @@


* Автори: Rui Fontes, Ângelo Abrantes і Abel Passos do Nascimento Jr.
* Оновлено 04/09/2023
* Оновлено 12/11/2023
* Завантажити [стабільну версію][1]
* Сумісність: NVDA версія 2019.3 і вище

Expand Down Expand Up @@ -183,4 +183,4 @@ Windows+Control+c - для скасування процесу скануван
* webp


[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.09.26/tesseractOCR-2023.09.26.nvda-addon
[1]: https://github.com/ruifontes/tesseractOCR/releases/download/2023.11.12/tesseractOCR-2023.11.12.nvda-addon
16 changes: 10 additions & 6 deletions addon/globalPlugins/tesseractOCR/__init__.py
Expand Up @@ -163,24 +163,24 @@ def OCR_image_files(self, path):
# Removing the extension in order to use the file name as new file name with the txt extension
jpgFilePath = "\"" + jpgFilePath[:-4] + "\""
global scanning, endTask, pngFilesPath
# If we are digitalizing from scanner is better have auto-orientation of the text...
if doc == 1:
global lang
lang = "osd" + "+" + lang
self.ocr = runInThread.RepeatBeep(delay=2.0, beep=(300, 300), isRunning=None)
self.ocr.start()
# Perform OCR to the selected image file
# Different command for scanned documents or files
if scanning == True:
# When digitizing from the scanner, it is better to have auto-orientation of the text...
lang = "osd" + "+" + lang
# Thai is written without spaces, so the parameter "preserve_interword_spaces=1" is necessary
if "tha" in lang:
command = "{} {} {} --dpi 300 --psm {} --oem 1 -c preserve_interword_spaces=1 -l {} quiet".format(tesseractPath, path, jpgFilePath, doc, lang)
command = "{} {} {} --dpi 300 --psm 1 --oem 1 -c preserve_interword_spaces=1 -l {} quiet".format(tesseractPath, path, jpgFilePath, lang)
else:
command = "{} {} {} --dpi 300 --psm {} --oem 1 -c tessedit_do_invert=0 -l {} quiet".format(tesseractPath, path, jpgFilePath, doc, lang)
command = "{} {} {} --dpi 300 --psm 1 --oem 1 -c tessedit_do_invert=0 -l {} quiet".format(tesseractPath, path, jpgFilePath, lang)
self.backgroundProcessing(command)
self.ocr.stop()
self.creatTXTFromVariousTXT()
else:
if doc == 1:
lang = "osd" + "+" + lang
# Thai is written without spaces, so the parameter "preserve_interword_spaces=1" is necessary
if "tha" in lang:
command = "{} {} {} --dpi 300 --psm {} --oem 1 -c preserve_interword_spaces=1 -l {} quiet".format(tesseractPath, path, pngFilesPath, doc, lang)
Expand Down Expand Up @@ -224,6 +224,9 @@ def backgroundProcessing(self, command):
def showResults(self):
self.ocr.stop()
from .vars import lang
# Opening the TXT file with OCR results.
z = ctypes.windll.shell32.ShellExecuteW(None, "open", ocrTxtPath, None, None, 10)
"""
# Getting the content of the TXT file to show it in a HTML message
with open(os.path.join(PLUGIN_DIR, "images", "ocr.txt"), "r", encoding = "utf-8") as f:
text = f.readlines()
Expand All @@ -240,6 +243,7 @@ def showResults(self):
"TesseractOCR",
True
)
"""

def doRoutines(self):
self.conv = runInThread.RepeatBeep(delay=2.0, beep=(200, 200), isRunning=None)
Expand Down
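The branching introduced in `OCR_image_files` (always add `osd` and use `--psm 1` when scanning; otherwise add `osd` only when `doc == 1` and pass `doc` as the page segmentation mode) can be summarized in a standalone sketch. This is an illustration under assumptions, not the add-on's code: `build_tesseract_command` and `open_in_default_app` are hypothetical helpers, the command is built as an argument list rather than the formatted string used in the commit, and the non-Thai option for the file branch is assumed to match the scanning branch because that part of the hunk is collapsed.

```python
import sys


def build_tesseract_command(tesseract_path, image_path, output_base,
                            lang, scanning, doc):
    """Build the Tesseract argument list following the logic in the diff."""
    # Scanned pages always get orientation detection; files only when doc == 1.
    if scanning or doc == 1:
        lang = "osd+" + lang
    # Scanning mode forces page segmentation mode 1; files use the doc setting.
    psm = 1 if scanning else doc
    # Thai is written without spaces, so inter-word spacing is preserved.
    if "tha" in lang:
        option = "preserve_interword_spaces=1"
    else:
        # Assumption: same option as the scanning branch shown in the diff.
        option = "tessedit_do_invert=0"
    return [
        tesseract_path, image_path, output_base,
        "--dpi", "300",
        "--psm", str(psm),
        "--oem", "1",
        "-c", option,
        "-l", lang,
        "quiet",
    ]


def open_in_default_app(path):
    """Open a file with its associated program, as showResults now does.

    Windows only. ShellExecuteW returns a value greater than 32 on success;
    10 is the show command used in the diff (SW_SHOWDEFAULT).
    """
    if sys.platform != "win32":
        raise OSError("ShellExecuteW is only available on Windows")
    import ctypes
    result = ctypes.windll.shell32.ShellExecuteW(None, "open", path, None, None, 10)
    return result > 32


print(build_tesseract_command("tesseract", "page.png", "ocr", "eng", True, 3))
```

Passing a list to `subprocess.run` avoids the manual quoting of paths that the string-based command needs, which is why the sketch does not wrap `output_base` in quotation marks.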