Commit

Move API docs to Swagger (#58)
Update the README docs, and move those about the API into method
annotations in the OcrController so they can be exposed via
auto-generated API docs with NelmioApiDocBundle.

Bug: https://phabricator.wikimedia.org/T285513
samwilson authored Aug 26, 2021
1 parent ca9fe3c commit 0f63ebc
Showing 13 changed files with 694 additions and 300 deletions.
1 change: 1 addition & 0 deletions .phpcs.xml
@@ -17,5 +17,6 @@
 <exclude-pattern>./node_modules/</exclude-pattern>
 <exclude-pattern>./bin/.phpunit/</exclude-pattern>
 <exclude-pattern>./public/build/</exclude-pattern>
+<exclude-pattern>./public/bundles/</exclude-pattern>
 <exclude-pattern>./assets/</exclude-pattern>
 </ruleset>
2 changes: 2 additions & 0 deletions .stylelintignore
@@ -0,0 +1,2 @@
public/bundles/

68 changes: 11 additions & 57 deletions README.md
@@ -1,60 +1,14 @@
Wikisource Google OCR tool
==========================
Wikimedia OCR
=============

![CI](https://github.com/wikimedia/wikimedia-ocr/workflows/CI/badge.svg)

Main documentation: https://wikisource.org/wiki/Wikisource:Google_OCR

This is a simple wrapper service around the Google Cloud Vision API,
enabling Wikisources to submit images for Optical Character Recognition
and retrieve the resultant text.

This works with more languages than the alternative service at https://tools.wmflabs.org/phetools
(used by e.g. https://wikisource.org/wiki/MediaWiki:OCR.js and similar scripts
on other Wikisources).

Requests can only be for images hosted on Commons.

## Usage

Send up to two parameters to `api.php`:

https://example.org/api.php?langs[]=[LANG_CODE_1]&langs[]=[LANG_CODE_2]&image=[IMAGE_URL]

And get back a JSON response with either 'text' or 'error' top-level items set:

{
'text': 'Lorem ipsum...',
'error': {
'code': '',
'message': ''
}
}
A web service and UI for providing OCR text from images hosted on MediaWiki wikis.
Can be integrated into the [ProofreadPage extension](https://www.mediawiki.org/wiki/Extension:ProofreadPage)
via the [Wikisource extension](https://www.mediawiki.org/wiki/Extension:Wikisource).

### Languages
Documentation:
* For system administrators: https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikisource/Wikimedia_OCR
* For Wikisource users: https://www.mediawiki.org/wiki/Help:Extension:Wikisource/Wikimedia_OCR
* Of the API: https://ocr.wmcloud.org/api/doc
* For contributors: [CONTRIBUTING.md](https://github.com/wikimedia/wikimedia-ocr/blob/main/CONTRIBUTING.md)

#### Google

Note that you should only set the `lang` parameter for languages that require it.
The [documentation](https://cloud.google.com/vision/reference/rest/v1/images/annotate#imagecontext) informs us of the following:

> In most cases, an empty value yields the best results since it enables automatic language detection.
> For languages based on the Latin alphabet, setting languageHints is not needed.
> In rare cases, when the language of the text in the image is known, setting a hint will help get better results
> (although it will be a significant hindrance if the hint is wrong).
> Text detection via the web interface returns an error if one or more of the specified languages is not
> one of the [supported languages](https://cloud.google.com/vision/docs/languages). API requests will succeed
> with a warning reporting invalid languages.
#### Tesseract

Languages supported by Tesseract are [listed in the user manual](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html).

## Contributing

See [CONTRIBUTING.md](https://github.com/wikimedia/wikimedia-ocr/blob/main/CONTRIBUTING.md)

## External links

* https://phabricator.wikimedia.org/T142768
* https://github.com/wikisource/google-cloud-vision-php
![CI](https://github.com/wikimedia/wikimedia-ocr/workflows/CI/badge.svg)
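For reference, the request format described in the README text removed above (repeated `langs[]` parameters plus an `image` parameter sent to `api.php`, answered with a JSON object holding `text` or `error`) can be sketched in a few lines of Python. This is only an illustration of the old documented interface, not code from this commit: the function names and the sample image URL are hypothetical, and treating an `error` object with empty `code` and `message` as "no error" is an assumption based on the sample response.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_ocr_url(base_url, image_url, langs=()):
    """Build a query URL with repeated langs[] parameters and one image parameter."""
    params = [("langs[]", lang) for lang in langs]
    params.append(("image", image_url))
    # urlencode percent-encodes the brackets and the image URL.
    return f"{base_url}?{urlencode(params)}"


def read_ocr_response(body):
    """Return the OCR text, or raise if the response carries an error.

    Assumption: an 'error' object whose code and message are both empty
    means that no error occurred.
    """
    data = json.loads(body)
    error = data.get("error") or {}
    if error.get("code") or error.get("message"):
        raise RuntimeError(f"OCR failed ({error.get('code')}): {error.get('message')}")
    return data.get("text", "")


if __name__ == "__main__":
    # Hypothetical Commons image URL, for illustration only.
    image = "https://upload.wikimedia.org/wikipedia/commons/a/ab/Example.jpg"
    url = build_ocr_url("https://example.org/api.php", image, ["en", "fr"])
    print(url)
    # Decoding the query string recovers the original parameters.
    print(parse_qs(urlsplit(url).query))
```

The language list is optional here, matching the removed note that language hints should be set only for languages that require them.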
4 changes: 3 additions & 1 deletion composer.json
@@ -7,11 +7,12 @@
 "php": ">=7.2.5",
 "ext-bcmath": "*",
 "ext-ctype": "*",
-"ext-iconv": "*",
 "ext-gd": "*",
+"ext-iconv": "*",
 "ext-json": "*",
 "google/cloud-vision": "^1.3",
 "imagine/imagine": "^1.2",
+"nelmio/api-doc-bundle": "^4.4",
 "sensio/framework-extra-bundle": "^6.1",
 "symfony/cache": "5.2.*",
 "symfony/console": "5.2.*",
@@ -20,6 +21,7 @@
 "symfony/framework-bundle": "5.2.*",
 "symfony/mailer": "^5.2",
 "symfony/monolog-bundle": "^3.7",
+"symfony/property-info": "5.2.*",
 "symfony/twig-bundle": "5.2.*",
 "symfony/webpack-encore-bundle": "^1.11",
 "symfony/yaml": "5.2.*",