Releases: py-pdf/pypdf
Releases · py-pdf/pypdf
Version 5.2.0, 2025-01-26
What's new
Deprecations (DEP)
- Deprecate with replacement CCITParameters (#3019) by @j-t-1
- Correct deprecation of interiour_color (#2947) by @j-t-1
New Features (ENH)
- Support alternative (U)F names for embedded file retrieval (#3072) by @stefan6419846
- Adding support for reading .metadata.keywords (#2939) by @Lucas-C
Bug Fixes (BUG)
- Handle further Tf operators in text extraction layout mode (#3073) by @blushingpenguin
- Ensure
add_metadata
can deal with_info = None
(#3040) by @xmo-odoo - Handle IndirectObject in CCITTFaxDecode filter (#2965) by @stefan6419846
- Handle chained colorspace for inline images when no filter is set (#3008) by @stefan6419846
- Avoid extracting inline images twice and dropping other operators (#3002) by @stefan6419846
- Fixed reference of value with
str.__new__
in TextStringObject (#2952) by @thomas-forte - Handle indirect objects in font width calculations (#2967) by @nsw42
- Title sometimes is bytes and not str (#2930) by @reformy
- Fix undefined variable for text extraction (regression) (#2934) by @stefan6419846
- Don't close stream passed to PdfWriter.write() (#2909) by @alexaryn
Robustness (ROB)
- Handle zero height fonts when extracting text (#3075) by @blushingpenguin
- Deal with content streams not containing streams (#3005) by @stefan6419846
- Gracefully handle some text operators when the operands are missing (#3006) by @stefan6419846
- Fall back to non-Adobe Ascii85 format for missing end markers (#3007) by @stefan6419846
- Ignore odd-length strings when processing cmap lines (#3009) by @stefan6419846
- Skip annotation destination being NullObject in PdfWriter (#2964) by @stefan6419846
- Skip destination page being None in PdfWriter (#2963) by @dxsooo
- Fix infinite loop case when reading null objects within an Array by @jakep-allenai
- Fixing infinite loop in ArrayObject read_from_stream (#2928) by @jakep-allenai
Documentation (DOC)
- Add note about default line colors (#3014) by @stefan6419846
Developer Experience (DEV)
- Remove ignoring Ruff rule PGH004 (#3071) by @j-t-1
- Tidy ignore array in tool.ruff.lint (#3069) by @j-t-1
- Move Windows CI to Python 3.13 (#3003) by @stefan6419846
- Move to Ubuntu 22.04 (#3004) by @stefan6419846
Maintenance (MAINT)
- Fix formatting of warning message and include exception message (#3076) by @stefan6419846
- Narrow return type for
ContentStream.operations
(#2941) by @kmurphy4
Testing (TST)
- Fix image similarity for upcoming Ubuntu 24.04 (#3039) by @stefan6419846
- Replace broken Apache Tika Corpora urls (#3041) by @stefan6419846
Code Style (STY)
Version 5.1.0, 2024-10-27
What's new
New Features (ENH)
- Add
layout_mode_font_height_weight
argument toPageObject.extract_text()
(#2920) by @hpierre001
Bug Fixes (BUG)
- Fix font specificier for FreeText annotation (#2893) by @ssjkamei
- Line breaks are not generated due to incorrect calculation of text leading (#2890) by @ssjkamei
- Improve handling of spaces in text extraction (#2882) by @ssjkamei
Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900) by @stefan6419846
Documentation (DOC)
- Use latest package versions (#2907) by @stefan6419846
- Correct example of reading FileAttachment annotation (#2906) by @j-t-1
Developer Experience (DEV)
- Update pinned requirements (#2918) by @stefan6419846
- Make make_release.py compatible with Windows environment (#2894) by @pubpub-zz
Maintenance (MAINT)
- Remove references to outdated Python versions (#2919) by @stefan6419846
- Generalize the method of obtaining space_code (#2891) by @ssjkamei
- Unnecessary character mapping process (#2888) by @ssjkamei
- New LZW decoding implementation (#2887) by @MartinThoma
Testing (TST)
- Add LzwCodec for encoding (#2883) by @MartinThoma
Code Style (STY)
Version 5.0.1, 2024-09-29
Version 5.0.1, 2024-09-29
New Features (ENH)
- Add
full
parameter to PdfWriter constructor (#2865)
Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (#2859)
- Cope with unbalanced delimiters in dictionary object (#2878)
- Cope with encoding with too many differences (#2873)
- Missing spaces in extract_text() method (#1328) (#2868)
- Tolerate truncated files and no warning when jumping startxref (#2855)
Robustness (ROB)
- Repair PDF with invalid Root object (#2880)
- Continue parsing dictionary object when error is detected (#2872)
- Merge documents with invalid pages in named destinations (#2857)
- Tolerate comments in arrays (#2856)
Developer Experience (DEV)
- Use latest Python version for benchmarking (#2879)
Maintenance (MAINT)
Version 5.0.0, 2024-09-17
Version 5.0.0, 2024-09-17
This version drops support for Python 3.7 (not maintained since July 2023), PdfMerger (use PdfWriter instead) and AnnotationBuilder (use annotations instead).
Deprecations (DEP)
- Remove the deprecated PfdMerger and AnnotationBuilder classes and other deprecations cleanup (#2813)
- Drop Python 3.7 support (#2793)
New Features (ENH)
- Add capability to remove /Info from PDF (#2820)
- Add incremental capability to PdfWriter (#2811)
- Add UniGB-UTF16 encodings (#2819)
- Accept utf strings for metadata (#2802)
- Report PdfReadError instead of RecursionError (#2800)
- Compress PDF files merging identical objects (#2795)
Bug Fixes (BUG)
- Fix sheared image (#2801)
Robustness (ROB)
- Robustify .set_data() (#2821)
- Raise PdfReadError when missing /Root in trailer (#2808)
- Fix extract_text() issues on damaged PDFs (#2760)
- Handle images with empty data when processing an image from bytes (#2786)
Developer Experience (DEV)
Version 4.3.1, 2024-07-21
Version 4.3.0, 2024-07-14
What's new
New Features (ENH)
- Accept ETen-B5 and UniCNS-UTF16 encodings (#2721) by @pubpub-zz
- Add decode_as_image() to ContentStreams (#2615) by @pubpub-zz
- context manager for PdfReader (#2666) by @tibor-reiss
- Add capability to set font and size in fields (#2636) by @pubpub-zz
- Allow to pass input file without named argument (#2576) by @pubpub-zz
Bug Fixes (BUG)
- Fix deprecation for Ressources when using old constants (#2705) by @stefan6419846
- Fix images issue 4 bits encoding and LUT starting with UTF16_BOM (#2675) by @pubpub-zz
- Reading large compressed images takes huge time to process (#2644) by @snanda85
- Highlighted Text Cannot Be Printed (#2604) by @Nifury
- Fix UnboundLocalError on malformed pdf (#2619) by @farjasju
Documentation (DOC)
- Various improvements on docstrings and examples by @j-t-1
Robustness (ROB)
- Cope with missing Standard 14 fonts in fields (#2677) by @pubpub-zz
- Improve inline image extraction (#2622) by @pubpub-zz
- Cope with loops in Fields tree (#2656) by @pubpub-zz
- Discard /I in choice fields for compatibility with Acrobat (#2614) by @pubpub-zz
- Cope with some issues in pillow (#2595) by @pubpub-zz
- Cope with some image extraction issues (#2591) by @pubpub-zz
Maintenance (MAINT)
- Deprecate interiour_color with replacement interior_color (#2706) by @j-t-1
- Add deprecate_with_replacement to PdfWriter.find_bookmark (#2674) by @j-t-1
Code Style (STY)
Version 4.2.0, 2024-04-07
What's new
New Features (ENH)
- Allow multiple charsets for NameObject.read_from_stream (#2585) by @pubpub-zz
- Add support for /Kids in page labels (#2562) by @stefan6419846
- Allow to update fields on many pages (#2571) by @pubpub-zz
- Tolerate PDF with invalid xref pointed objects (#2335) by @pubpub-zz
- Add Enforce from PDF2.0 in viewer_preferences (#2511) by @pubpub-zz
- Add += and -= operators to ArrayObject (#2510) by @pubpub-zz
Bug Fixes (BUG)
- Fix merge_page sometimes generating unknown operator 'QQ' (#2588) by @rfotino
- Fix fields update where annotations are kids of field (#2570) by @pubpub-zz
- Process CMYK images without a filter correctly (#2557) by @pubpub-zz
- Extract text in layout mode without finding resources (#2555) by @pubpub-zz
- Prevent recursive loop in some PDF files (#2505) by @pubpub-zz
Robustness (ROB)
- Tolerate "truncated" xref (#2580) by @pubpub-zz
- Replace error by warning for EOD in RunLengthDecode/ASCIIHexDecode (#2334) by @pubpub-zz
- Rebuild xref table if one entry is invalid (#2528) by @pubpub-zz
- Robustify stream extraction (#2526) by @pubpub-zz
Documentation (DOC)
- Update release process for latest changes (#2564) by @stefan6419846
- Encryption/decryption: Clone document instead of copying all pages (#2546) by @redfast00
- Minor improvements (#2542) by @j-t-1
- Update annotation list (#2534) by @j-t-1
- Update references and formatting (#2529) by @j-t-1
- Correct threads reference, plus minor changes (#2521) by @j-t-1
- Minor readability increases (#2515) by @j-t-1
- Simplify PaperSize examples (#2504) by @j-t-1
- Minor improvements (#2501) by @j-t-1
Developer Experience (DEV)
- Remove unused dependencies (#2572) by @stefan6419846
- Remove page labels PR link from message (#2561) by @stefan6419846
- Fix changelog generator regarding whitespace and handling of "Other" group (#2492) by @stefan6419846
- Add REL to known PR prefixes (#2554) by @stefan6419846
- Release using the REL commit instead of git tag (#2500) by @MartinThoma
- Unify code between PdfReader and PdfWriter (#2497) by @pubpub-zz
- Bump softprops/action-gh-release from 1 to 2 (#2514) by @dependabot[bot]
Maintenance (MAINT)
- Ressources → Resources (and internal name childs) (#2550) by @pubpub-zz
- Fix typos found by codespell (#2549) by @stefan6419846
- Update Read the Docs configuration (#2538) by @j-t-1
- Add root_object, _info and _ID to PdfReader (#2495) by @pubpub-zz
Testing (TST)
- Allow loading truncated images if required (#2586) by @stefan6419846
- Fix download issues from #2562 (#2578) by @pubpub-zz
- Improve test_get_contents_from_nullobject to show real use-case (#2524) by @stefan6419846
- Add missing test annotations (#2507) by @stefan6419846
Version 4.1.0, 2024-03-03
What's new
Generating name objects (NameObject
) without a leading slash is considered deprecated now. Previously, just a plain warning would be logged, leading to possibly invalid PDF files. According to our deprecation policy, this will log a DeprecationWarning for now.
New Features (ENH)
- Add get_pages_from_field (#2494) by @pubpub-zz
- Add reattach_fields function (#2480) by @pubpub-zz
- Automatic access to pointed object for IndirectObject (#2464) by @pubpub-zz
Bug Fixes (BUG)
- missing error on name without leading / (#2387) by @Rak424
- encode_pdfdocencoding() always returns bytes (#2440) by @sbourlon
- BI in text content identified as image tag (#2459) by @pubpub-zz
Robustness (ROB)
- Missing basefont entry in type 3 font (#2469) by @pubpub-zz
Documentation (DOC)
Developer Experience (DEV)
- Fix changelog for UTF-8 characters (#2462) by @stefan6419846
Maintenance (MAINT)
- Add _get_page_number_from_indirect in writer (#2493) by @pubpub-zz
- Remove user assignment for feature requests (#2483) by @stefan6419846
- Remove reference to old 2.0.0 branch (#2482) by @stefan6419846
Testing (TST)
- Fix benchmark failures (#2481) by @stefan6419846
- Resolve file naming conflict in test_iss1767 (#2445) by @sbourlon
Version 4.0.2, 2024-02-18
What's new
Bug Fixes (BUG)
- Use NumberObject for /Border elements of annotations (#2451) by @rsinger417
Documentation (DOC)
- Document easier way to update metadata (#2454) by @stefan6419846
- Typo
Polyline
\xe2\x86\x92PolyLine
in adding-pdf-annotations.md (#2426) by @CWKSC
Developer Experience (DEV)
- Bump codecov/codecov-action from 3 to 4 (#2430) by @dependabot[bot]
Testing (TST)
- Avoid catching not emitted warnings (#2429) by @stefan6419846
Version 4.0.1, 2024-01-28
What's new
Bug Fixes (BUG)
Testing (TST)
- Skip tests using fpdf2 if it's not installed (#2419) by @MartinThoma