fix: switch to pdftohtml for pdf to html conversions #998

Merged 2 commits on Mar 29, 2024

sbplat (Member) commented Mar 29, 2024

Description

Switch from LibreOffice to pdftohtml for PDF to HTML conversions.

Closes #567
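
For readers unfamiliar with the tool, here is a rough sketch of calling pdftohtml (from poppler-utils) from a wrapper script. It is illustrative Python, not the code in this PR: the function names, the paths, and the choice of the `-s`, `-noframes`, and `-c` flags are assumptions made for the example.

```python
import subprocess
from pathlib import Path


def build_pdftohtml_command(pdf_path, output_prefix):
    """Return the argv list for one pdftohtml run (hypothetical helper)."""
    return [
        "pdftohtml",
        "-s",         # generate a single HTML document that includes all pages
        "-noframes",  # do not generate a frameset wrapper
        "-c",         # "complex" output mode
        str(pdf_path),
        str(output_prefix),
    ]


def convert_pdf_to_html(pdf_path, output_dir):
    """Run pdftohtml and return the directory that holds its output."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    prefix = output_dir / Path(pdf_path).stem
    # check=True raises CalledProcessError if pdftohtml exits non-zero.
    subprocess.run(build_pdftohtml_command(pdf_path, prefix), check=True)
    return output_dir
```

Building the argument list separately from running it keeps the command easy to inspect and test without the binary installed.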

Checklist:

  • I have read the Contribution Guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings

Contributor License Agreement

By submitting this pull request, I acknowledge and agree that my contributions will be included in Stirling-PDF and that they can be relicensed in the future under the MPL 2.0 (Mozilla Public License Version 2.0) license.

(This does not change the general open-source nature of Stirling-PDF, simply moving from one license to another license)

Frooodle (Member) commented Mar 29, 2024

is pdftohtml in docker image?

Frooodle (Member) commented

Also, SonarCloud can be ignored; it's more for visibility, so you can judge for yourself whether you want to make code changes.

sbplat (Member, Author) commented Mar 29, 2024

> is pdftohtml in docker image?

Not sure. Should add it to be safe.
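
One quick way to answer the question from inside a running container is to look the binary up on `PATH`. This is a minimal sketch, assuming Python is available in the image; `find_tool` is a hypothetical helper, not part of Stirling-PDF.

```python
import shutil


def find_tool(name="pdftohtml"):
    """Return the path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    print(path if path else "pdftohtml is not installed in this image")
```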

Frooodle (Member) commented

I only want to add it if the size of its dependencies is justified by the results it gives.

I know the optional install we have of calibre could also support this

sbplat (Member, Author) commented Mar 29, 2024

The optional install of calibre uses poppler to convert pdf to html, so it should be the same.

sbplat marked this pull request as ready for review March 29, 2024 20:33
sbplat requested a review from Frooodle as a code owner March 29, 2024 20:33

Quality Gate failed

Failed conditions
2 Security Hotspots
33.3% Duplication on New Code (required ≤ 3%)
E Reliability Rating on New Code (required ≥ D)

Frooodle (Member) left a comment

LGTM

sbplat merged commit dfb8c64 into main Mar 29, 2024
3 of 4 checks passed
szinn referenced this pull request in szinn/k8s-homelab Apr 3, 2024
…0.22.7 ) (#3352)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/stirling-tools/s-pdf](https://github.com/Stirling-Tools/Stirling-PDF) | patch | `0.22.6` -> `0.22.7` |

---

### Release Notes

<details>
<summary>Stirling-Tools/Stirling-PDF (ghcr.io/stirling-tools/s-pdf)</summary>

### [`v0.22.7`](https://github.com/Stirling-Tools/Stirling-PDF/releases/tag/v0.22.7): 0.22.7 Bug fixes for conversions and lang updates

[Compare Source](https://github.com/Stirling-Tools/Stirling-PDF/compare/v0.22.6...v0.22.7)

## Bug fixes

- pdftohtml fixes (thanks [@sbplat](https://github.com/sbplat))
- Word docs now support Chinese (WIP)
- PDF to Word fixes (the 1 page issue)

## Other

- Language updates!
- Error page now supports translations (thanks [@cocomastergo](https://github.com/cocomastergo))
- Lite docker image removed

#### What's Changed

- fix: switch to pdftohtml for pdf to html conversions by [@sbplat](https://github.com/sbplat) in [https://github.com/Stirling-Tools/Stirling-PDF/pull/998](https://github.com/Stirling-Tools/Stirling-PDF/pull/998)
- Update messages_ru_RU.properties by [@cocomastergo](https://github.com/cocomastergo) in [https://github.com/Stirling-Tools/Stirling-PDF/pull/999](https://github.com/Stirling-Tools/Stirling-PDF/pull/999)
- Update messages_it_IT.properties by [@albanobattistella](https://github.com/albanobattistella) in [https://github.com/Stirling-Tools/Stirling-PDF/pull/1002](https://github.com/Stirling-Tools/Stirling-PDF/pull/1002)
- doc: add --break-system-packages by [@NicolasFR](https://github.com/NicolasFR) in [https://github.com/Stirling-Tools/Stirling-PDF/pull/1001](https://github.com/Stirling-Tools/Stirling-PDF/pull/1001)
- Extract text from error pages and sync text variable to all lang by [@cocomastergo](https://github.com/cocomastergo) in [https://github.com/Stirling-Tools/Stirling-PDF/pull/1008](https://github.com/Stirling-Tools/Stirling-PDF/pull/1008)
- Update messages_it_IT.properties by [@albanobattistella](https://github.com/albanobattistella) in [https://github.com/Stirling-Tools/Stirling-PDF/pull/1010](https://github.com/Stirling-Tools/Stirling-PDF/pull/1010)
- Update messages_fr_FR.properties by [@NicolasFR](https://github.com/NicolasFR) in [https://github.com/Stirling-Tools/Stirling-PDF/pull/1011](https://github.com/Stirling-Tools/Stirling-PDF/pull/1011)
- remove lite package by [@Frooodle](https://github.com/Frooodle) in [https://github.com/Stirling-Tools/Stirling-PDF/pull/1012](https://github.com/Stirling-Tools/Stirling-PDF/pull/1012)
- Chinese font and word conv fix by [@Frooodle](https://github.com/Frooodle) in [https://github.com/Stirling-Tools/Stirling-PDF/pull/1014](https://github.com/Stirling-Tools/Stirling-PDF/pull/1014)

**Full Changelog**: Stirling-Tools/Stirling-PDF@v0.22.6...v0.22.7

</details>

Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>
@Frooodle Frooodle deleted the pdftohtml branch May 6, 2024 19:21

Successfully merging this pull request may close these issues.

pdf 2 html: All on one page.