switch images to alpine #624

Merged
merged 8 commits into from
Feb 9, 2024

Conversation

Zoey2936
Contributor

License Agreement for Contributions

By submitting this pull request, I acknowledge and agree that my contributions will be included in Stirling-PDF and that they can be relicensed in the future under MPL 2.0 (Mozilla Public License Version 2.0) license.

(This does not change the general open-source nature of Stirling-PDF, simply moving from one license to another license)

@Zoey2936 Zoey2936 requested a review from Frooodle as a code owner December 31, 2023 14:55
@Frooodle
Member

Wow, this is awesome work. I could never get Alpine images working because of my inexperience with Docker.

Have you been able to test these, including the Python/LibreOffice functionality?

@Frooodle
Member

Just did a quick run of all 3 Dockerfiles.

Ultra-lite seems to be missing Java.

@Frooodle
Member

Cool, with that it seems they all load to the homepage
image

I will do some tests on OCR and conversions, in parallel with your testing, to see if anything else is wrong

@Zoey2936
Contributor Author

From my tests (of the full image) everything works except one thing: when I try to convert a PDF to Word, the docker logs show `Error: source file could not be loaded` and an empty zip file is downloaded

@Frooodle
Member

Frooodle commented Dec 31, 2023

With docker compose I can expose the port,
but when running in Docker Desktop I get
image
without the EXPOSE 8080.

What is the normal expectation for Dockerfiles?

@Zoey2936
Contributor Author

docker run --rm -it -p 8080:8080 works for me on docker desktop

@Frooodle
Member

(Meaning i am unable to run it and access 8080)

@Frooodle
Member

Yeah, it works via CMD for me as well. I just find it odd that the Docker Desktop UI doesn't work with it or let you define ports, so I was wondering if this is something non-standard

@Zoey2936
Contributor Author

Zoey2936 commented Dec 31, 2023

I've never used EXPOSE in a Dockerfile since it does nothing (useful): https://docs.docker.com/engine/reference/builder/#expose

@Zoey2936
Contributor Author

Zoey2936 commented Dec 31, 2023

I've tried again, it now downloads the converted document, but it is unusable (pdf => html):
image

@Frooodle
Member

Something is definitely off with the outputs when compared to the old version
(You can test here)
https://pdf.adminforge.de/pdf-to-word
PDF to Word on your version gives me
image
vs
image

It seems it has just loaded the PDF file bytes into .doc format rather than actually converting

@Frooodle
Member

I see libreoffice-core is not available, so I tried just 'libreoffice'
and get a slightly better conversion now: all 3 pages are overlaid onto a single page and rotation is not kept, so something is weird,
but clearly something is different about the libre package

image

@Zoey2936
Contributor Author

I've found the error

Dockerfile-lite Outdated
python3 && \
wget https://bootstrap.pypa.io/get-pip.py -qO - | python3 - --break-system-packages --no-cache-dir --upgrade && \
# uno unoconv and HTML
pip install --break-system-packages --no-cache-dir --upgrade uno unoconv WeasyPrint && \
Contributor Author

@Zoey2936 Zoey2936 Dec 31, 2023

I've added WeasyPrint since it seemed to be small, no idea what it does / if it is needed / adds more features

Contributor Author

maybe it should be mentioned in the versions guide

Member

Weasyprint is for html to pdf

Contributor Author

should I keep it?

Member

Yes i dont want to lose any functionality

Contributor Author

I ask because lite didn't include WeasyPrint before

@Frooodle
Member

So right now the conversions are not happening in a way I am happy with: it seems to throw everything onto a single page, not sure why... Until that is resolved we won't be able to merge this.
However, I super appreciate your work; I will try to see what's up with libreoffice

@Zoey2936
Contributor Author

it seems to be a bug in libreoffice, so it will also affect debian/ubuntu when they update in the future

@Frooodle
Member

If it's the version and not the package in general, can we pull the older version?

@Zoey2936
Contributor Author

v7.3.7.2-r0 and earlier versions of libreoffice work; everything after v7.5.5.2-r0 does not, so the bug seems to have been introduced with v7.4 or v7.5
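
For anyone who wants to compare an installed package against the boundary reported in this comment, here is a minimal sketch. It assumes a `sort` that supports `-V` (as GNU coreutils does); the helper names `strip_apk_suffix` and `version_le` are hypothetical, and `7.3.7.2` is simply the last version reported as working here, not an upstream-confirmed cutoff.

```shell
#!/bin/sh
# Hypothetical helper: drop a leading "v" and the Alpine "-rN" revision suffix.
strip_apk_suffix() {
    v="${1#v}"
    printf '%s\n' "${v%-r[0-9]*}"
}

# Hypothetical helper: succeeds when $1 <= $2 in version order.
# Assumption: sort supports -V (version sort).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example with a version string taken from this comment.
candidate=$(strip_apk_suffix "v7.5.5.2-r0")
if version_le "$candidate" "7.3.7.2"; then
    echo "$candidate: at or below the last version reported working"
else
    echo "$candidate: newer than the last version reported working"
fi
```

This only encodes the lower boundary; it says nothing about whether a later release contains a fix.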

@Zoey2936
Contributor Author

https://packages.ubuntu.com/jammy/libreoffice currently the docker image uses v7.3

@Zoey2936
Contributor Author

using alpine:3.17 instead of :latest makes it work for lite and normal, ultra lite can stay at latest, but 3.17 is two versions old...

@Zoey2936
Contributor Author

Zoey2936 commented Jan 1, 2024

https://mirror1.hs-esslingen.de/pub/Mirrors/tdf/libreoffice/src/bugs-changelog-tag-libreoffice-7.6.4.1-release-7.6.4.1.log
"PDF: Conversion of pdf to docx or doc collapses all content onto one page (tdf#157589) [Kevin Suo]"

@Zoey2936
Contributor Author

Zoey2936 commented Jan 1, 2024

https://gitlab.alpinelinux.org/alpine/aports/-/issues/15628

COPY scripts/init-without-ocr.sh /scripts/init-without-ocr.sh
COPY pipeline /pipeline
COPY build/libs/*.jar app.jar

# Create user and group using Alpine's addgroup and adduser
#RUN addgroup -g $PGID stirlingpdfgroup && \
# adduser -u $PUID -G stirlingpdfgroup -s /bin/sh -D stirlingpdfuser && \
Contributor Author

is there a reason why this is disabled? Can it be removed? Or should I try to get it working in PR after this is merged?

Member

Previously I had some issues with it and have not had time to resolve them.
I would love to have the correct permissions and user applied

@Frooodle
Member

Frooodle commented Jan 1, 2024

> using alpine:3.17 instead of :latest makes it work for lite and normal, ultra lite can stay at latest, but 3.17 is two versions old...

Based on this and you using 3.19 is this PR now blocked on resolution of alpine updating its package manager?

@Zoey2936
Contributor Author

Zoey2936 commented Jan 1, 2024

> > using alpine:3.17 instead of :latest makes it work for lite and normal, ultra lite can stay at latest, but 3.17 is two versions old...
>
> Based on this and you using 3.19 is this PR now blocked on resolution of alpine updating its package manager?

yes, blocked until they fix libreoffice, but from what I know they are fast at fixing this

@Zoey2936
Contributor Author

should work now; I will create a PR when a fixed libreoffice version from outside edge can be used

@Frooodle
Member

> wkhtmltopdf is removed in newer alpine versions... - but calibre can be added by running apk add --no-cache calibre@testing
> I've looked into the https://github.com/Stirling-Tools/Stirling-PDF/blob/main/src/main/java/stirling/software/SPDF/config/PostStartupProcesses.java - I think it would need to be changed completely to work with alpine - and I want to ask why you set the timezone in it?

I had issues with the calibre install requiring user prompts without it

@Zoey2936
Contributor Author

Zoey2936 commented Feb 1, 2024

what do you mean? maybe try adding --no-interactive

@Frooodle
Member

Frooodle commented Feb 1, 2024

> what do you mean? maybe try adding --no-interactive

that didn't work for me, at least via ubuntu apt

@Zoey2936
Contributor Author

Zoey2936 commented Feb 2, 2024

I'm talking about apk

@Frooodle
Member

Frooodle commented Feb 6, 2024

Okay, let's remove wkhtmltopdf completely, that's fine. I'll merge your changes today or tomorrow, remove wkhtmltopdf, and add
apk add --no-cache calibre@testing
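
The `@testing` suffix only resolves if a repository with that tag has been registered first. A minimal sketch of that step, assuming the edge/testing URL that appears in the build log further down this thread; the function name `add_tagged_repo` and its optional third argument (the repositories file, defaulting to `/etc/apk/repositories`) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: register a tagged apk repository exactly once.
# Usage: add_tagged_repo TAG URL [REPOSITORIES_FILE]
add_tagged_repo() {
    tagged_line="@$1 $2"
    target_file="${3:-/etc/apk/repositories}"
    # Append only if the exact line is not already present.
    grep -qxF -- "$tagged_line" "$target_file" 2>/dev/null ||
        printf '%s\n' "$tagged_line" >> "$target_file"
}
```

Inside an image build this would be followed by the install itself, for example `add_tagged_repo testing https://dl-cdn.alpinelinux.org/alpine/edge/testing && apk add --no-cache calibre@testing`.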

@Frooodle
Member

Frooodle commented Feb 6, 2024

I was just rebuilding the dockerbaseFile (I think we can remove this base file now; it doesn't serve much purpose and everything can move to dockerFile)

I got
2.066 fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/community/x86_64/APKINDEX.tar.gz
2.242 ERROR: unable to select packages:
2.245 so:libpoppler.so.133 (no such package):
2.245 required by: libreoffice-draw-7.6.4.1-r0[so:libpoppler.so.133]

Edit: fixed by adding poppler to the install list

@Zoey2936
Contributor Author

Zoey2936 commented Feb 7, 2024

please retry

@Frooodle Frooodle merged commit f211eef into Stirling-Tools:main Feb 9, 2024
2 of 3 checks passed
@Frooodle
Member

https://github.com/Stirling-Tools/Stirling-PDF/actions/runs/7950256132/job/21702556824

gcc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DFFI_BUILDING=1 -I/usr/include/ffi -I/usr/include/libffi -I/usr/include/python3.11 -c src/c/_cffi_backend.c -o build/temp.linux-aarch64-cpython-311/src/c/_cffi_backend.o
#22 98.48 error: command 'gcc' failed: No such file or directory
#22 98.48 [end of output]
#22 98.48
#22 98.48 note: This error originates from a subprocess, and is likely not a problem with pip.
#22 98.48 ERROR: Failed building wheel for cffi
#22 98.48 Failed to build cffi
#22 98.48 ERROR: Could not build wheels for cffi, which is required to install pyproject.toml-based projects
#22 ERROR: process "/bin/sh -c echo "@testing https://dl-cdn.alpinelinux.org/alpine/edge/main" | tee -a /etc/apk/repositories && echo "@testing https://dl-cdn.alpinelinux.org/alpine/edge/community" | tee -a /etc/apk/repositories && echo "@testing https://dl-cdn.alpinelinux.org/alpine/edge/testing" | tee -a /etc/apk/repositories && apk add --no-cache ca-certificates tzdata tini bash curl openjdk17-jre libreoffice@testing python3 && wget https://bootstrap.pypa.io/get-pip.py -qO - | python3 - --break-system-packages --no-cache-dir --upgrade && pip install --break-system-packages --no-cache-dir --upgrade unoconv WeasyPrint && mkdir -p /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders && fc-cache -f -v && chmod +x /scripts/*.sh" did not complete successfully: exit code: 1

Any idea? works for other images

@Frooodle
Member

just not lite

@Zoey2936
Contributor Author

try adding build-base at the beginning using apk, and remove it at the end using apk del --no-cache build-base
and please do everything inside one RUN step to make the image smaller
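
Sketched as the shell body of a single RUN step, the suggestion could look roughly like this. The function name `install_python_tools` is hypothetical, the pip arguments are copied from the failing build log above, and whether further `-dev` packages are needed for the cffi wheel is not confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper for what would sit inside one RUN instruction:
# add the compiler toolchain, build the wheels, then remove the toolchain
# again so it never ends up in a separate image layer.
install_python_tools() {
    apk add --no-cache build-base &&
        pip install --break-system-packages --no-cache-dir --upgrade unoconv WeasyPrint &&
        apk del --no-cache build-base
}
```

Because the three commands are chained with `&&`, a failing pip step stops the chain and the build fails instead of silently producing an image without the Python tools.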

@Frooodle
Member

@Zoey2936 for running as a non-root user, what would be the best approach with volume mounting etc.?

I got Stirling-PDF running fine as a non-root user by using gosu/su-exec to switch from root to the stirlingpdf user via an entrypoint script,
but I can't get it working with a standalone user.

My concern is that I volume mount folders such as /logs, which the app (started by the stirling user) doesn't have permission to write to: since they are volume mounted, they use the host's permissions, and the owner is often root.

I can fix this by changing the permissions on the host side, but that would mess up everyone's installs, and I wouldn't want to force people to change permissions at host level for every install.

So is running as root and then switching to the new user the only way? (I can use the root user to set the volume-mounted folders to the stirling user after startup.)
Basically:

docker file

ENV PUID=1000 \
    PGID=1000 \
    UMASK=022 
RUN apk add --no-cache su-exec && \
    addgroup -g $PGID stirlingpdfgroup && \
    adduser -u $PUID -G stirlingpdfgroup -s /bin/sh -D stirlingpdfuser && \
    mkdir -p $HOME && chown stirlingpdfuser:stirlingpdfgroup $HOME
ENTRYPOINT ..... etc

entrypoint.sh

echo "Setting permissions and ownership for necessary directories..."
chown -R stirlingpdfuser:stirlingpdfgroup /logs /scripts /usr/share/fonts/opentype/noto /usr/share/tessdata /configs /customFiles
chmod -R 755 /logs /scripts /usr/share/fonts/opentype/noto /usr/share/tessdata /configs /customFiles

# Run the main command and switch to stirling user for rest of run
exec su-exec stirlingpdfuser "$@"

@Zoey2936
Contributor Author

yes the only way is to switch the user in the entrypoint
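
A slightly more defensive sketch of that entrypoint pattern, assuming the `stirlingpdfuser`/`stirlingpdfgroup` names and a few of the mount points from the snippet above; the `APP_DIRS` variable and the `entrypoint_main` function are hypothetical. It only fixes ownership and drops privileges when started as root, and otherwise runs the command directly.

```shell
#!/bin/sh
APP_USER="stirlingpdfuser"
APP_GROUP="stirlingpdfgroup"
# Assumption: a subset of the volume-mounted directories named in this thread.
APP_DIRS="/logs /configs /customFiles"

entrypoint_main() {
    if [ "$(id -u)" = "0" ]; then
        # Volume mounts keep the host's ownership, so fix them while still root.
        for dir in $APP_DIRS; do
            if [ -d "$dir" ]; then
                chown -R "$APP_USER:$APP_GROUP" "$dir"
            fi
        done
        # Replace this shell with the real command, running as the app user.
        exec su-exec "$APP_USER" "$@"
    fi
    # Already unprivileged: nothing to fix, just run the command.
    exec "$@"
}
```

In a real entrypoint script the file would end with `entrypoint_main "$@"`.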

@Frooodle
Member

Frooodle commented Apr 13, 2024

@Zoey2936 It seems 3.19.1 has the single-page libreoffice bug again, regardless of the repository used.
If I move the docker image to tag 20240329 it fixes this, but now that one has an issue with a python install as part of libreoffice:

0.648                py3-cffi-pyc-1.16.0-r1[python3~3.12]
0.648                py3-lxml-5.1.0-r0[python3~3.12]
0.648                py3-charset-normalizer-pyc-3.3.2-r1[python3]
0.648                py3-charset-normalizer-pyc-3.3.2-r1[python3~3.12]
0.649                py3-pygments-pyc-2.17.2-r1[python3]
0.649                py3-pygments-pyc-2.17.2-r1[python3~3.12]
0.650   so:libpython3.11.so.1.0 (no such package):
0.650     required by: libreoffice-common-7.6.4.1-r2[so:libpython3.11.so.1.0]
0.650   so:libmbedcrypto.so.7 (no such package):
0.650     required by: librist-0.2.10-r0[so:libmbedcrypto.so.7]

Any ideas?

@Zoey2936
Contributor Author

just wait... they updated to python 3.12 and their builder currently rebuilds all packages against python 3.12

@Frooodle
Member

Yeah, the first part seems fixed now, thanks for bringing that to my attention.
And now libreoffice isn't including soffice and other executables :(

@Zoey2936
Contributor Author

What do you mean?

@Frooodle
Member

Frooodle commented Apr 15, 2024

libreoffice-7.6.4.1-r3
seems to be missing many of the files required for libreoffice to function, making the overall install dramatically smaller (I think)

@Zoey2936
Contributor Author

So it doesn't work?

@Zoey2936
Contributor Author

I don't see any breaking changes in the last weeks https://gitlab.alpinelinux.org/alpine/aports/-/commits/master/community/libreoffice?ref_type=heads

@Frooodle
Member

Frooodle commented Apr 15, 2024

How odd...
Basically, what I see is that after adding

openssl \
openssl-dev \

to the apk installs in the current Dockerfile (it seems to require that), when I check usr/lib/libreoffice/program/
I would normally see soffice and several others, and these are suddenly missing

@Zoey2936
Contributor Author

@Frooodle
Member

Then i don't understand why it doesn't show :(

@Zoey2936
Contributor Author

What do you mean exactly with "doesn't show", does running soffice return command not found?

@Frooodle
Member

Correct and the file itself isn't in the final docker build for some reason
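
One way to make that check repeatable inside the built image is a small lookup like the sketch below. The function name `find_soffice` and the `SOFFICE_DIR` override are hypothetical; the default directory is the one mentioned earlier in this thread, written here as an absolute path on the assumption that it lives at the filesystem root.

```shell
#!/bin/sh
# Hypothetical helper: print the path of soffice if it can be found,
# first via PATH, then in the LibreOffice program directory.
find_soffice() {
    program_dir="${SOFFICE_DIR:-/usr/lib/libreoffice/program}"
    if command -v soffice >/dev/null 2>&1; then
        command -v soffice
    elif [ -x "$program_dir/soffice" ]; then
        printf '%s\n' "$program_dir/soffice"
    else
        echo "soffice not found on PATH or in $program_dir" >&2
        return 1
    fi
}
```

Running `find_soffice || exit 1` as the last command of the install step would make the build fail early when the executable is missing from the package.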

@Zoey2936
Contributor Author

there is also an issue opened: https://gitlab.alpinelinux.org/alpine/aports/-/issues/16005
