Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Docker and GHA CD via Dockerhub #70

Merged
merged 16 commits into from
Oct 16, 2024
Merged

Conversation

bertsky
Copy link
Contributor

@bertsky bertsky commented Sep 30, 2024

No description provided.

Copy link
Collaborator

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I am aware that it has been deprecated for a while but until we have the binarization functionality in eynollah available via OCR-D interface, this is still one of the best binarizers out there and we should support it as a slim container in ocrd_all.

@bertsky
Copy link
Contributor Author

bertsky commented Oct 11, 2024

Py3.9 and 3.10 failures are extremely strange:

>>> import sbb_binarize.cli
>>> type(sbb_binarize.cli)
>>> sbb_binarize.cli is None
>>> import inspect
>>> inspect.ismodule(sbb_binarize.cli)
>>>

So it seems that Python has some kind of null type not made explicit. No idea what's wrong with the packaging. The RECORD seems fine, it lists all module files.

@bertsky
Copy link
Contributor Author

bertsky commented Oct 14, 2024

Py3.9 and 3.10 failures are extremely strange:

>>> import sbb_binarize.cli
>>> type(sbb_binarize.cli)
>>> sbb_binarize.cli is None
>>> import inspect
>>> inspect.ismodule(sbb_binarize.cli)
>>>

So it seems that Python has some kind of null type not made explicit. No idea what's wrong with the packaging. The RECORD seems fine, it lists all module files.

Migrating from setup.py to pyproject.toml did not help.

@bertsky
Copy link
Contributor Author

bertsky commented Oct 14, 2024

I'll also relax TF requirement slightly (to be in line with eynollah) – tested: no different between 2.11 and 2.12.

@bertsky
Copy link
Contributor Author

bertsky commented Oct 14, 2024

oh, wow – relaxing the TF requirements also magically fixed the problem with Py3.9 and 3.10! (I wonder why we did not receive a build-time error saying that no such TF version is available, though...)

Copy link
Contributor

@joschrew joschrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dockerimage does not work for me without these changes. I testet ocrd-sbb-binarize --help and a simple binarization, which took quite long (72 secs for a single image).
Maybe it would also be good to include one ore more models into the dockerimage, because I had to download one by myself with resmgr, but I am not sure if this should be included.

Dockerfile Outdated Show resolved Hide resolved
Dockerfile Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
bertsky and others added 2 commits October 15, 2024 12:41
Co-authored-by: joschrew <91774427+joschrew@users.noreply.github.com>
Co-authored-by: joschrew <91774427+joschrew@users.noreply.github.com>
@bertsky
Copy link
Contributor Author

bertsky commented Oct 15, 2024

which took quite long (72 secs for a single image).

yes, without a GPU this is slow, and even with it is not efficient. But that's well known and not the issue here.

Maybe it would also be good to include one ore more models into the dockerimage, because I had to download one by myself with resmgr, but I am not sure if this should be included.

No, that would just bloat the image. Downloading models in advance of using is the normal procedure for OCR-D. We don't want to reign in the admin's decision where to store persistent data (like models, which should not change as often as code).

Also I am wondering if the image should me based on the special ocr-d-tensorflow image as tensorflow is a requirement. But I might be totally wrong regarding the latter.

In principle, yes. But since here, TF is pinned to <=2.12, while our current core-cuda-tf2 is unrestricted on Py38 (and thus pulls 2.13) … https://github.com/OCR-D/core/blob/85bde1574293ea8b7ba29255fbb8e07312c28eb1/Makefile#L153-L158 … it would not help (only increase the image size even more).

But we should at least switch to core-cuda then...

@bertsky
Copy link
Contributor Author

bertsky commented Oct 15, 2024

So this is ready IMO. Fixes #67 and also brings support for Python 3.9 and 3.10.

@cneud
Copy link
Member

cneud commented Oct 16, 2024

Dear all, thx - I can merge this today but would have 2 more small requests:

  • can we also include Python 3.11 in the CI please
  • Tensorflow can be relaxed to 2.12.x iiuc

@bertsky
Copy link
Contributor Author

bertsky commented Oct 16, 2024

@cneud done!

  • can we also include Python 3.11 in the CI please

let's see if it works

  • Tensorflow can be relaxed to 2.12.x iiuc

IMO it would make sense now to relax Eynollah's current pin (2.12.1) as well (2.12.x) – so we don't reintroduce the conflict if some TF 2.12.2 should be published

@cneud
Copy link
Member

cneud commented Oct 16, 2024

relax Eynollah's current pin (2.12.1) as well (2.12.x)

Will do and thanks again! Ready to merge in 3, 2, 1...

@cneud cneud merged commit 5385162 into qurator-spk:master Oct 16, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants