ChangeLog
2019-07-03 Jay Berkenbilt <ejb@ql.org>
* Non-compatible API change: change
QPDFOutlineDocumentHelper::getTopLevelOutlines and
QPDFOutlineObjectHelper::getKids to return a std::vector instead
of a std::list of QPDFOutlineObjectHelper objects. This is to work
around bugs with some compilers' STL implementations that are
choking with list here. There's no deep reason for these to be
lists instead of vectors. Fixes #297.
2019-06-22 Jay Berkenbilt <ejb@ql.org>
* Handle encrypted files with missing or invalid /Length entries
in the encryption dictionary.
* QPDFWriter: allow calling set*EncryptionParameters before
calling setFilename. Fixes #336.
* It now works to run --completion-bash and --completion-zsh when
qpdf is started from an AppImage.
* Provided a more useful error message when Windows can't get
security context. Thanks to user zdenop for supplying some code.
Fixes #286.
* Favor PointerHolder over manual memory allocation in shippable
code where possible. Fixes #235.
* If pkg-config is available, use it to locate libjpeg and zlib. If
not, fall back to old behavior. Fixes #324.
* The "make install" target explicitly sets a mode rather than
relying on the user's umask. Fixes #326.
* When a file has linearization warnings but no errors, qpdf
--check and --check-linearization now exit with code 3 instead
of 2. Fixes #50.
* Add new function QUtil::read_file_into_memory.
2019-06-21 Jay Berkenbilt <ejb@ql.org>
* When supported, qpdf builds with -fvisibility=hidden, which
removes non-exported symbols from the shared library in a manner
similar to how Windows DLLs work. This is better for performance
and also better for safety and protection of private interfaces.
See https://gcc.gnu.org/wiki/Visibility. *NOTE*: If you are
getting linker errors trying to catch exceptions or derive things
from a base class in the qpdf library, it's possible that a
QPDF_DLL_CLASS declaration is missing somewhere. Please report
this as a bug at https://github.com/qpdf/qpdf/issues.
* Source-level incompatibility: remove the version
QPDF::copyForeignObject with an unused boolean parameter. If you
were, for some reason, calling this, just take the parameter away.
* Source-level incompatibility: remove the version
QPDFTokenizer::expectInlineImage with no arguments. It didn't
produce correct inline images. This is a very low-level routine.
There is little reason to call it outside of qpdf's lexical
engine.
* Source-level incompatibility: rename QUtil::strcasecmp to
QUtil::str_compare_nocase. This is a non-compatible change, but
QUtil::strcasecmp is hardly the most important part of qpdf's API.
The reason for this change is that strcasecmp is a macro on some
systems, and that was causing problems when QUtil.hh was included
in certain circumstances. Fixes #242.
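The macro clash described above arises because the preprocessor rewrites any token named strcasecmp, including a member of QUtil, when a system header defines that name as a macro. As a rough illustration of what a case-insensitive comparison under a macro-safe name can look like, here is a minimal sketch. The name str_compare_nocase_sketch and its body are hypothetical and are not qpdf's implementation; it only handles ASCII case folding.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in, not qpdf's code. Returns a negative value, zero, or
// a positive value (like strcmp) after lower-casing each byte with tolower.
int str_compare_nocase_sketch(std::string const& a, std::string const& b)
{
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Cast through unsigned char: tolower is undefined for negative values.
        int ca = std::tolower(static_cast<unsigned char>(a[i]));
        int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) {
            return ca < cb ? -1 : 1;
        }
    }
    // Common prefix is equal; the shorter string sorts first.
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;
}
```

Because the function name is not one that C libraries define as a macro, including it after system headers does not change its meaning.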
2019-06-20 Jay Berkenbilt <ejb@ql.org>
* Enable compilation with additional warnings for integer
conversion and sign (-Wsign-conversion, -Wconversion for gcc and
similar; -W3 for msvc) if supported. These warnings are on by
default but can be turned off by passing --disable-int-warnings.
* Fix all integer sign and conversion warnings. This makes all
integer type conversions that have potential data loss explicit
with calls that do range checks and raise an exception.
* Change out_bufsize argument to Pl_Flate's constructor from int to
unsigned int for compatibility with underlying zlib
implementation.
* Change QPDFObjectHandle::pipeStreamData's encode_flags argument
from unsigned long to int since int is the underlying type of the
enumerated type values that are passed to it. This change should
be invisible to virtually all code unless you are compiling with
strict warning flags and explicitly casting to unsigned long.
* Add methods to QPDFObjectHandle to return the value of Integer
objects as int and unsigned int with range checking and fallback
behavior to avoid silent underflow/overflow conditions.
* Add functions to QUtil to convert unsigned integers to strings,
avoiding implicit conversion between unsigned and signed integer
types.
* Add QIntC.hh, containing integer type converters that do range
checking.
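To make the idea of a range-checking integer converter concrete, the following is a minimal sketch of one common technique: convert, convert back, and compare both value and sign. The name checked_convert is hypothetical and this is not the code in QIntC.hh; it only shows the kind of check that turns a silent narrowing into an exception.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper, not taken from QIntC.hh. Converts between integer
// types and throws std::range_error if the value does not survive the trip.
template <typename To, typename From>
To checked_convert(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_convert works on integer types only");
    To result = static_cast<To>(value);
    // Round-trip check catches truncation; sign check catches cases such as
    // -1 becoming a large unsigned value or 4000000000u becoming negative.
    bool changed = static_cast<From>(result) != value;
    bool sign_flipped = (result < To(0)) != (value < From(0));
    if (changed || sign_flipped) {
        throw std::range_error("integer conversion out of range");
    }
    return result;
}
```

A call such as checked_convert<unsigned int>(some_int) then replaces an implicit conversion that the new warning flags would report.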
2019-06-18 Jay Berkenbilt <ejb@ql.org>
* Remove previously submitted qpdf_read_memory_fuzzer as it is a
small subset of qpdf_fuzzer.
2019-06-15 Jay Berkenbilt <ejb@ql.org>
* Update CI (Azure Pipelines) to run tests with some sanitizers.
* Do "ideal integration" with oss-fuzz. This includes adding a
better fuzzer with a seed corpus and adding automated tests of the
fuzzer with the test data.
* When parsing files, while reading an object, if there are too
many consecutive errors without enough intervening successes, give
up on the specific object. This reduces cases in which very badly
damaged files send qpdf into a tailspin reading one character at
a time and reporting warnings.
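The give-up rule for badly damaged objects can be pictured as a small counter. The sketch below is only an illustration under assumed thresholds; the class name ErrorBudget, its parameters, and the reset policy are hypothetical and are not taken from qpdf's parser.

```cpp
#include <cassert>

// Hypothetical sketch, not qpdf's code. Tracks parse outcomes and signals
// when too many errors have piled up without enough successes in between.
class ErrorBudget
{
  public:
    ErrorBudget(int max_errors, int successes_to_reset) :
        max_errors_(max_errors),
        successes_to_reset_(successes_to_reset)
    {
    }

    // Record one outcome. Returns false when the caller should give up on
    // the current object.
    bool record(bool success)
    {
        if (success) {
            // Only a long enough run of successes clears the error count.
            if (++good_run_ >= successes_to_reset_) {
                bad_count_ = 0;
            }
        } else {
            good_run_ = 0;
            ++bad_count_;
        }
        return bad_count_ <= max_errors_;
    }

  private:
    int max_errors_;
    int successes_to_reset_;
    int good_run_ = 0;
    int bad_count_ = 0;
};
```

With ErrorBudget budget(3, 2), a single success between errors does not reset the count, so the fourth error still ends the attempt.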
2019-06-13 Jay Berkenbilt <ejb@ql.org>
* Perform initial integration of Google's oss-fuzz project by
copying the fuzzer someone from Google already did into the qpdf
repository and adding build support. This shift in control is in
preparation for an ideal integration with oss-fuzz.
2019-06-09 Jay Berkenbilt <ejb@ql.org>
* When /DecodeParms is an empty list, ignore it on read and delete
it on write. Fixes #331.
2019-05-18 Jay Berkenbilt <ejb@ql.org>
* 8.4.2: release
2019-05-16 Jay Berkenbilt <ejb@ql.org>
* Fix memory error in Windows-only code from typo. Fixes #330.
2019-04-27 Jay Berkenbilt <ejb@ql.org>
* 8.4.1: release
2019-04-20 Jay Berkenbilt <ejb@ql.org>
* When qpdf --version is run, it will detect if the qpdf CLI was
built with a different version of qpdf than the library. This
usually indicates that multiple versions of qpdf are installed and
that the library path is not set up properly. This situation
sometimes causes confusing behavior for users who are not actually
running the version of qpdf they think they are running.
* Add parameter --remove-page-labels to remove page labels from
output. In qpdf 8.3.0, the behavior changed so that page labels
were preserved when merging and splitting files. Some users were
relying on the fact that if you ran qpdf --empty --pages ... all
page labels were dropped. This option makes it possible to get
that behavior if it is explicitly desired. Fixes #317.
* Add parameter --keep-files-open-threshold to override the
maximum number of files that qpdf will allow to be kept open at
once. Fixes #288.
* Handle Unicode characters in filenames properly on Windows. The
changes to support Unicode on the CLI in Windows broke Unicode
filenames on that platform. Fixes #298.
* Slightly tighten logic that determines whether an object is a
page. The previous logic was sometimes failing to preserve
annotations because they were passing the overly loose test for
whether something was a page. This fix has a slight risk of
causing some extraneous objects to be copied during page splitting
and merging for erroneous PDF files whose page objects contain
invalid types or are missing the /Type key entirely, both of which
would be invalid according to the PDF specification.
* Revert change that included preservation of outlines (bookmarks)
in --split-pages. The way it was implemented caused a very
significant performance penalty when splitting pages with
outlines. We need a better solution that only copies the relevant
items, not the whole tree.
2019-03-11 Jay Berkenbilt <ejb@ql.org>
* JSON serialization: add missing leading 0 to decimal values
between -1 and 1. Fixes #308.
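PDF allows real numbers such as .5 or -.25, but JSON requires a digit before the decimal point. A minimal sketch of the normalization is shown here; the function name add_leading_zero is hypothetical and this is not the code used by qpdf's JSON serializer.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not qpdf's code. Rewrites ".5" as "0.5" and "-.5" as
// "-0.5"; any other input is returned unchanged.
std::string add_leading_zero(std::string const& number)
{
    if (!number.empty() && number[0] == '.') {
        return "0" + number;
    }
    if (number.size() >= 2 && number[0] == '-' && number[1] == '.') {
        return "-0" + number.substr(1);
    }
    return number;
}
```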
2019-02-01 Jay Berkenbilt <ejb@ql.org>
* 8.4.0: release
2019-01-31 Jay Berkenbilt <ejb@ql.org>
* Bug fix: do better pre-checks on images before optimizing;
refuse to optimize images that can't be converted to JPEG because
of colorspace or depth.
* Add new options --externalize-inline-images, which converts
inline images larger than a specified size to regular images, and
--ii-min-bytes, which tweaks that size.
* When optimizing images, inline images are now included in the
optimization, first being converted to regular images. Use
--keep-inline-images to exclude them from optimization. Fixes #278.
* Add method QPDFPageObjectHelper::externalizeInlineImages, which
converts inline images whose size is at least a specified amount
to regular images.
* Remove traces of acroread, which hasn't been available in Linux
for a long time.
2019-01-30 Jay Berkenbilt <ejb@ql.org>
* Do not include space after ID operator in inline image data. The
token now correctly contains the image data, the EI operator,
and the delimiter that precedes the EI operator.
* Improve locating of an inline image's EI operator to correctly
handle the case of EI appearing inside the image data.
* Very low-level QPDFTokenizer API now includes an
expectInlineImage method that takes an input stream, enabling it
to locate an inline image's EI operator better. When this method
is called, the inline image token returned will not contain the EI
operator and will contain correct image data. This is called
automatically everywhere within the qpdf library. Most user code
will never have to use the low-level tokenizer API. If you use
Pl_QPDFTokenizer, this will be done automatically for you. If you
use the low-level API and call expectInlineImage, you should call
the new version.
2019-01-29 Jay Berkenbilt <ejb@ql.org>
* Bug fix: when returning an inline image token, the tokenizer no
longer includes the delimiter that follows EI. The
QPDFObjectHandle created from the token was correct.
* Handle files with direct page objects, which is not allowed by
the PDF spec but has been seen in the wild. Fixes #164.
2019-01-28 Jay Berkenbilt <ejb@ql.org>
* Bug fix: when using --stream-data=compress, object streams and
xref streams were not compressed. They were compressed if no
--stream-data option was specified. Fixes #271.
* When linearizing or getting the list of all pages in a file,
replace duplicated page objects with a shallow copy of the page
object. Linearization and all page manipulation APIs require page
objects to be unique. Pages that were originally duplicated will
still share contents and any other indirect resources. Fixes #268.
2019-01-26 Jay Berkenbilt <ejb@ql.org>
* Add --overlay and --underlay options. Fixes #207.
* Create examples/pdf-overlay-page.cc to demonstrate use of
page/form XObject interaction.
* Add new methods QPDFPageObjectHelper::getFormXObjectForPage,
which creates a form XObject equivalent to a page, and
QPDFObjectHandle::placeFormXObject, which generates content stream
code to place a form XObject on a page.
2019-01-25 Jay Berkenbilt <ejb@ql.org>
* Add new method QPDFObjectHandle::getUniqueResourceName() to
return an unused key available to be used in a resource
dictionary.
* Add new method QPDFPageObjectHelper::getAttribute() that
properly handles inherited attributes and allows for creation of a
copy of shared attributes. This is very useful if you are getting
an attribute of a page dictionary with the intent to modify it
privately for that page.
* Fix QPDFPageObjectHelper::getPageImages (and the legacy
QPDFObjectHandle::getPageImages()) to properly handle images in
inherited resources dictionaries.
2019-01-20 Jay Berkenbilt <ejb@ql.org>
* Tweak the content code generated for variable text fields to
better handle font sizes and multi-line text.
* When generating appearance streams for variable text
annotations, properly handle the cases of there being no
appearance dictionary, no appearance stream, or an appearance
stream with no BMC..EMC marker.
* When flattening annotations, remove annotations from the file
that don't have appearance streams. These were previously being
preserved, but since they are invisible, there is no reason to
preserve them when flattening annotations.
2019-01-19 Jay Berkenbilt <ejb@ql.org>
* NOTE: qpdf CLI: some non-compatible changes were made to how
qpdf interprets password arguments that contain Unicode characters
that fall outside of ASCII. On Windows, the non-compatibility was
unavoidable, as explained in the release notes. On all platforms,
it is possible to get the old behavior if desired, though the old
behavior would almost always result in files that other
applications were unable to open. As it stands, qpdf should now be
able to open files encrypted with a wide range of passwords
that some other viewers might not handle, though even now, qpdf's
Unicode password handling is not 100% complete.
* Add --password-mode option, which allows fine-grained control of
how password arguments are treated. This is discussed fully in the
manual. Fixes #215.
* Add option --suppress-password-recovery to disable the behavior
of searching for a correct password by re-encoding the provided
password. This option can be useful if you want to ensure you know
exactly what password is being used.
2019-01-17 Jay Berkenbilt <ejb@ql.org>
* When attempting to open an encrypted file with a password, if
the password doesn't work, try alternative passwords created by
re-interpreting the supplied password with different string
encodings. This makes qpdf able to recover passwords with
non-ASCII characters when either the decryption or encryption
operation was performed with an incorrectly encoded password.
* Fix data loss bug: qpdf was discarding referenced resources in
the case in which a page's resource dictionary contained an
indirect reference for either /Font or /XObject that contained
fonts or XObjects not referenced on all pages that shared the
resource. This was a "typo" in the code. The comment explained the
correct behavior, and the code was clearly intended to handle this
issue, but the implementation had an error in it. This is fixed by
a single-line change, which can be found in git commit
4bc434000c42a7191e705c8a38216ca6743ad9ff. That commit can be used
as a patch that applies cleanly against qpdf 8.1.0 and forward.
The bug was introduced in version 8.1.0. For the record, this is
the first bug in qpdf's history that could result in silent loss
of data when processing a correct input file. Fixes #276.
2019-01-15 Jay Berkenbilt <ejb@ql.org>
* Add QUtil::possible_repaired_encodings which, given a string,
generates other strings that represent re-interpretation of the
bytes in a different coding system. This is used to help recover
passwords if the password string was improperly encoded on a
different system due to user error or a software bug.
2019-01-14 Jay Berkenbilt <ejb@ql.org>
* Add new CLI flags to 128-bit and 256-bit encryption: --assemble,
--annotate, --form, and --modify-other to control encryption
permissions with more granularity than was allowed with the
--modify flag. Fixes #214.
* Add new versions of
QPDFWriter::setR{3,4,5,6}EncryptionParameters that allow
individual setting of the various permission bits. The old
interfaces are retained for backward compatibility. In the "C"
API, add qpdf_set_r{3,4,5,6}_encryption_parameters2. The new
interfaces use separate booleans for various permissions instead
of the qpdf_r3_modify_e enumerated type, which set permission bits
in predefined groups.
* Add versions of utf8 to single-byte character transcoders that
return a success code.
2019-01-13 Jay Berkenbilt <ejb@ql.org>
* Add several more string transcoding and analysis methods to
QUtil for bidirectional conversion between PDF Doc, Win Ansi, Mac
Roman, UTF-8, and UTF-16 along with detection of valid UTF-8 and
UTF-16.
2019-01-12 Jay Berkenbilt <ejb@ql.org>
* In the --pages option, allow the same page to be specified more
than once. You can now do "--pages A.pdf 1,1 --" or
"--pages A.pdf 1 A.pdf 1" instead of having to use two different
paths to specify A.pdf. Fixes #272.
* Add QPDFPageObjectHelper::shallowCopyPage(). This method creates
a new page object that is a "shallow copy" of the given page as
described in the comments in QPDFPageObjectHelper. The resulting
object has not been added anywhere but is ready to be passed to
QPDFPageDocumentHelper::addPage of its own QPDF or another QPDF
object.
* Add QPDF::getUniqueId() method to return an identifier that is
intended to be unique within the scope of all QPDF objects created
by the calling application in a single run.
* In --pages, allow "." as a replacement for the current input
file, making it possible to say "qpdf A.pdf --pages . 1-3 --"
instead of having to repeat the input filename.
2019-01-10 Jay Berkenbilt <ejb@ql.org>
* Add new configure option --enable-avoid-windows-handle, which
causes the symbol AVOID_WINDOWS_HANDLE to be defined. If set, we
avoid using Windows I/O HANDLE, which is disallowed in some
versions of the Windows SDK, such as for Windows phones.
QUtil::same_file will always return false in this case. Only
applies to Windows builds.
* Add new method QPDF::setImmediateCopyFrom. When called on a
source QPDF object, streams can be copied FROM that object to
other ones without having to keep the source QPDF or its input
source around. The cost is copying the streams into RAM. See
comments in QPDF.hh for setImmediateCopyFrom for a detailed
explanation.
2019-01-07 Jay Berkenbilt <ejb@ql.org>
* 8.3.0: release
* Add sample completion files in completions. These can be used by
packagers to install on the system wherever bash and zsh keep
their vendor-supplied completions.
* Add configure flag --enable-check-autofiles, which is on by
default. Packagers whose packaging systems automatically refresh
autoconf or libtool files should pass --disable-check-autofiles to
./configure to suppress warnings about automatically generated
files being outdated.
2019-01-06 Jay Berkenbilt <ejb@ql.org>
* Remove the restriction in most cases that the source QPDF used
in a copyForeignObject call has to stick around until the
destination QPDF is written. The exceptional case is when the
source stream gets its data using a
QPDFObjectHandle::StreamDataProvider. For a more in-depth
discussion, see comments around copyForeignObject in QPDF.hh.
Fixes #219.
2019-01-05 Jay Berkenbilt <ejb@ql.org>
* When generating appearances, if the font uses one of the
standard, built-in encodings, restrict the character set to that
rather than just to ASCII. This will allow most appearances to
contain characters from the ISO-Latin-1 range plus a few
additional characters.
* Add methods QUtil::utf8_to_win_ansi and
QUtil::utf8_to_mac_roman.
* Add method QUtil::utf8_to_utf16.
2019-01-04 Jay Berkenbilt <ejb@ql.org>
* Add new option --optimize-images, which recompresses every image
using DCT (JPEG) compression as long as the image is not already
compressed with lossy compression and recompressing the image
reduces its size. The additional options --oi-min-width,
--oi-min-height, and --oi-min-area prevent recompression of images
whose width, height, or pixel area (width * height) are below a
specified threshold.
* Add new option --collate. When specified, the semantics of
--pages change from concatenation to collation. See the manual for
a more detailed discussion. Fixes #259.
* Add new method QPDFWriter::getFinalVersion, which returns the
PDF version that will ultimately be written to the final file. See
comments in QPDFWriter.hh for some restrictions on its use. Fixes
#266.
* When unexpected errors are found while checking linearization
data, print an error message instead of calling assert, which
causes the program to crash. Fixes #209, #231.
* Detect and recover from dangling references. If a PDF file
contained an indirect reference to a non-existent object (which is
valid), when adding a new object to the file, it was possible for
the new object to take the object ID of the dangling reference,
thereby causing the dangling reference to point to the new object.
This case is now prevented. Fixes #240.
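The entry above does not say how the collision with dangling references is prevented. One possible approach, shown purely as an illustration, is to pick new object IDs above every ID that is either defined or merely referenced. The function next_object_id and its inputs are hypothetical and do not describe qpdf's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Hypothetical sketch, not qpdf's code. Chooses an object ID that is higher
// than every defined object and every referenced (possibly dangling) ID, so
// a new object can never be mistaken for the target of a dangling reference.
int next_object_id(std::set<int> const& defined, std::set<int> const& referenced)
{
    int max_id = 0;
    if (!defined.empty()) {
        max_id = std::max(max_id, *defined.rbegin());
    }
    if (!referenced.empty()) {
        max_id = std::max(max_id, *referenced.rbegin());
    }
    return max_id + 1;
}
```

If objects 1 through 3 exist and something refers to a missing object 7, this picks 8 rather than 4, and object 7 stays unresolved.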
2019-01-03 Jay Berkenbilt <ejb@ql.org>
* Add --generate-appearances flag to the qpdf command-line tool to
trigger generation of appearance streams.
* Fix behavior of form field value setting to handle the following
cases:
- Strings are always written as UTF-16
- Check boxes and radio buttons are handled properly with
synchronization of values and appearance states
* Define constants in qpdf/Constants.h for interpretation of
annotation and form field flags
* Add QPDFAnnotationObjectHelper::getFlags
* Add many new methods to QPDFFormFieldObjectHelper for querying
flags and field types
* Add new methods for appearance stream generation. See comments
in QPDFFormFieldObjectHelper.hh for generateAppearance() for a
description of limitations.
- QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded
- QPDFFormFieldObjectHelper::generateAppearance
* Bug fix: when writing form field values, always write string
values encoded as UTF-16.
* Add method QUtil::utf8_to_ascii, which returns an ASCII string
for a UTF-8 string, replacing out-of-range characters with a
specified substitute.
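The behavior described for QUtil::utf8_to_ascii can be approximated in a few lines. The sketch below is not the library's implementation and the name utf8_to_ascii_sketch is hypothetical; it assumes the input is well-formed UTF-8 and emits one substitute character per non-ASCII code point.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, not qpdf's code. Copies ASCII bytes through and
// replaces each multi-byte UTF-8 sequence with a single substitute.
std::string utf8_to_ascii_sketch(std::string const& utf8, char unknown_char = '?')
{
    std::string result;
    for (unsigned char ch : utf8) {
        if (ch < 0x80) {
            // Plain ASCII byte.
            result += static_cast<char>(ch);
        } else if ((ch & 0xC0) != 0x80) {
            // Lead byte of a multi-byte sequence: emit one substitute.
            result += unknown_char;
        }
        // Continuation bytes (10xxxxxx) are skipped.
    }
    return result;
}
```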
2019-01-02 Jay Berkenbilt <ejb@ql.org>
* Add method QPDFObjectHandle::getResourceNames that returns a set
of strings representing all second-level keys in a dictionary
(i.e. all keys of all direct dictionary members).
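To show what "all second-level keys" means, here is a small sketch using ordinary standard containers in place of PDF dictionaries. The Dict alias and the function second_level_keys are hypothetical; the real method operates on a QPDFObjectHandle.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of a resource dictionary: top-level keys such as /Font
// or /XObject map to sub-dictionaries keyed by resource name.
using Dict = std::map<std::string, std::map<std::string, int>>;

// Collect the keys of every direct sub-dictionary into one set.
std::set<std::string> second_level_keys(Dict const& resources)
{
    std::set<std::string> names;
    for (auto const& top : resources) {
        for (auto const& entry : top.second) {
            names.insert(entry.first);
        }
    }
    return names;
}
```

A name that appears under more than one top-level key is reported once, since the result is a set.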
2018-12-31 Jay Berkenbilt <ejb@ql.org>
* Add --flatten-annotations flag to the qpdf command-line tool for
annotation flattening.
* Add methods for flattening form fields and annotations:
- QPDFPageDocumentHelper::flattenAnnotations - integrate
annotation appearance streams into page contents with special
handling for form fields: if appearance streams are up to date
(/NeedAppearances is false in /AcroForm), the /AcroForm key of
the document catalog is removed. Otherwise, a warning is
issued, and form fields are ignored. Non-form-field
annotations are always flattened if an appearance stream can
be found.
- QPDFAnnotationObjectHelper::getPageContentForAppearance -
generate the content stream fragment to render an appearance
stream in a page's content stream as a form xobject. Called by
flattenAnnotations.
* Add method QPDFObjectHandle::mergeResources(), which merges
resource dictionaries. See detailed description in
QPDFObjectHandle.hh.
* Add QPDFObjectHandle::Matrix, similar to
QPDFObjectHandle::Rectangle, as a convenience class for
six-element arrays that are used as matrices.
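A six-element PDF matrix [a b c d e f] maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). The sketch below illustrates that arithmetic with hypothetical types; Matrix, Point, and apply_matrix here are stand-ins and are not the declarations in QPDFObjectHandle.hh.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only.
struct Matrix
{
    double a, b, c, d, e, f;
};

struct Point
{
    double x, y;
};

// Apply the PDF transformation [a b c d e f] to a point.
Point apply_matrix(Matrix const& m, Point const& p)
{
    return Point{m.a * p.x + m.c * p.y + m.e, m.b * p.x + m.d * p.y + m.f};
}
```

For example, {1, 0, 0, 1, 10, 20} is a pure translation by (10, 20), and {2, 0, 0, 3, 0, 0} scales x by 2 and y by 3.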
2018-12-23 Jay Berkenbilt <ejb@ql.org>
* When specifying @arg on the command line, if the file "arg" does
not exist, just treat this as a normal argument. This makes it
easier to deal with files whose names start with the @ character.
Fixes #265.
* Tweak completion so it works with zsh as well using
bashcompinit.
2018-12-22 Jay Berkenbilt <ejb@ql.org>
* Add new options --json, --json-key, and --json-object to
generate a json representation of the PDF file. This is described
in more depth in the manual. You can also run qpdf --json-help to
get a description of the json format.
2018-12-21 Jay Berkenbilt <ejb@ql.org>
* Allow --show-object=trailer for showing the document trailer.
* You can now use eval $(qpdf --completion-bash) to enable bash
completion for qpdf. It's not perfect, but it works pretty well.
2018-12-19 Jay Berkenbilt <ejb@ql.org>
* When splitting pages using --split-pages, the outlines
dictionary and some supporting metadata are copied into the split
files. The result is that all bookmarks from the original file
appear, and those that point to pages that are preserved work
while those that point to pages that are not preserved don't do
anything. This is an interim step toward proper support for
bookmark preservation in split files.
* Add QPDFOutlineDocumentHelper and QPDFOutlineObjectHelper for
handling outlines (bookmarks) including bidirectionally mapping
between bookmarks and pages. Initially there is no support for
modifying the outlines hierarchy.
2018-12-18 Jay Berkenbilt <ejb@ql.org>
* New method QPDFObjectHandle::getJSON() returns a JSON object
with a partial representation of the object. See
QPDFObjectHandle.hh for a detailed description.
* Add a simple JSON serializer. This is not a complete or
general-purpose JSON library. It allows assembly and serialization
of JSON structures with some restrictions, which are described in
the header file.
* Add QPDFNameTreeObjectHelper class. This class provides useful
methods for dealing with name trees, which are discussed in
section 7.9.6 of the PDF spec (ISO-32000).
* Preserve page labels when merging and splitting files. Prior
versions of qpdf simply preserved the page label information from
the first file, which usually wouldn't make any sense in the
merged file. Now any page that had a page number in any original
file will have the same page number after merging or splitting.
* Add QPDFPageLabelDocumentHelper class. This is a document helper
class that provides useful methods for dealing with page labels.
It abstracts the fact that they are stored as number trees and
deals with interpolating intermediate values that are not in the
tree. It also has helper functions used by the qpdf command line
tool to preserve page labels when merging and splitting files.
* Add QPDFNumberTreeObjectHelper class. This class provides useful
methods for dealing with number trees, which are discussed in
section 7.9.7 of the PDF spec (ISO-32000). Page label dictionaries
are represented as number trees.
* New method QPDFObjectHandle::wrapInArray returns the object
itself if it is an array. Otherwise, it returns an array
containing the object. This is useful for dealing with PDF data
that is sometimes expressed as a single element and sometimes
expressed as an array, which is a somewhat common PDF idiom.
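The single-element-or-array idiom mentioned for wrapInArray shows up, for example, in a stream's /Filter entry, which may be one name or an array of names. The following sketch models the wrapping step with a toy value type; Value and wrap_in_array are hypothetical and are not qpdf's classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy value: either a scalar or an array of values.
struct Value
{
    bool is_array;
    std::vector<Value> elements; // meaningful when is_array is true
    std::string scalar;          // meaningful when is_array is false
};

// Return arrays unchanged; wrap anything else in a one-element array.
Value wrap_in_array(Value const& v)
{
    if (v.is_array) {
        return v;
    }
    Value result{true, {}, ""};
    result.elements.push_back(v);
    return result;
}
```

After wrapping, calling code can always loop over elements without first checking which of the two forms the file used.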
2018-10-11 Jay Berkenbilt <ejb@ql.org>
* Files generated by autogen.sh are now committed so that it is
possible to build on platforms without autoconf directly from a
clean checkout of the repository. The configure script detects if
the files are out of date when it also determines that the tools
are present to regenerate them.
* Add build in Azure Pipelines, now that it is free for open
source projects.
2018-08-18 Jay Berkenbilt <ejb@ql.org>
* 8.2.1: release
* Add new option --keep-files-open=[yn] to control whether qpdf
keeps files open when merging. Prior to version 8.1.0, qpdf always
kept all files open, but this meant that the number of files that
could be merged was limited by the operating system's open file
limit. Version 8.1.0 opened files as they were referenced, but
this caused a major performance impact. Version 8.2.0 optimized
the performance but did so in a way that, for local file systems,
there was a small but unavoidable performance hit, but for
networked file systems, the performance impact could be very high.
Starting with version 8.2.1, the default behavior is that files
are kept open if no more than 200 files are specified, but that
the behavior can be explicitly overridden with the
--keep-files-open flag. If you are merging more than 200 files but
fewer than the operating system's max open files limit, you may
want to use --keep-files-open=y. If you are using a local file
system where the overhead is low and you might sometimes merge
more than the OS limit's number of files, you may want to specify
--keep-files-open=n. Fixes #237.
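The default rule described in this entry (keep files open when no more than 200 are specified, unless overridden) can be summarized as a small decision function. This is only a sketch of the stated policy; the enum KeepOpen and the function should_keep_files_open are hypothetical names, not part of qpdf.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the --keep-files-open setting: unset, =y, or =n.
enum class KeepOpen { automatic, yes, no };

// Decide whether input files stay open while merging.
bool should_keep_files_open(
    KeepOpen setting, std::size_t n_files, std::size_t threshold = 200)
{
    switch (setting) {
    case KeepOpen::yes:
        return true;
    case KeepOpen::no:
        return false;
    case KeepOpen::automatic:
        break;
    }
    // Default: keep files open if no more than `threshold` files are given.
    return n_files <= threshold;
}
```

The threshold parameter defaults to the 200-file limit stated above and could be varied to model a different cutoff.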
2018-08-16 Jay Berkenbilt <ejb@ql.org>
* 8.2.0: release
2018-08-14 Jay Berkenbilt <ejb@ql.org>
* For the mingw builds, change the name of the DLL import library
from libqpdf.a to libqpdf.dll.a to avoid confusing it with a
static library. This potentially clears the way for supporting a
static library in the future, though presently, the qpdf Windows
build only builds the DLL and executables. Fixes #225.
2018-08-13 Jay Berkenbilt <ejb@ql.org>
* Add new class QPDFSystemError, derived from std::runtime_error,
which is now thrown by QUtil::throw_system_error. This enables the
triggering errno value to be retrieved. Fixes #221.
2018-08-12 Jay Berkenbilt <ejb@ql.org>
* qpdf command line: add --no-warn option to suppress issuing
warning messages. If there are any conditions that would have
caused warnings to be issued, the exit status is still 3.
* Rewrite the internals of Pl_Buffer to be much more efficient in
use of memory at a very slight performance cost. The old
implementation could cause memory usage to go out of control for
files with large images compressed using the TIFF predictor.
Fixes #228.
2018-08-05 Jay Berkenbilt <ejb@ql.org>
* Bug fix: end of line characters were not properly handled inside
strings in some cases. Fixes #226.
* Bug fix: infinite loop on progress reporting for very small
files. Fixes #230.
2018-08-04 Jay Berkenbilt <ejb@ql.org>
* Performance fix: optimize page merging operation to avoid
unnecessary open/close calls on files being merged. Fixes #217.
* Add ClosedFileInputSource::stayOpen method, enabling a
ClosedFileInputSource to stay open during manually indicated
periods of high activity, thus reducing the overhead of frequent
open/close operations.
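The trade-off behind stayOpen can be sketched with a toy reader that
reopens its file for every read unless told to stay open.
ClosedFileReaderSketch, its read method, and openCount are
hypothetical names for this illustration and do not reflect the
actual ClosedFileInputSource interface.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

// Stand-in for illustration; not qpdf's ClosedFileInputSource.
class ClosedFileReaderSketch
{
  public:
    explicit ClosedFileReaderSketch(std::string const& filename) :
        filename_(filename)
    {
    }

    // While val is true, the file handle is kept between reads.
    void stayOpen(bool val)
    {
        stay_open_ = val;
        if (!stay_open_ && file_.is_open()) {
            file_.close();
        }
    }

    // Read up to len bytes starting at the current logical offset.
    std::string read(std::size_t len)
    {
        if (!file_.is_open()) {
            file_.open(filename_, std::ios::binary);
            if (!file_) {
                throw std::runtime_error("cannot open " + filename_);
            }
            ++open_count_;
        }
        file_.clear();
        file_.seekg(offset_, std::ios::beg);
        std::string buf(len, '\0');
        file_.read(&buf[0], static_cast<std::streamsize>(len));
        buf.resize(static_cast<std::size_t>(file_.gcount()));
        offset_ += static_cast<std::streamoff>(buf.size());
        if (!stay_open_) {
            file_.close();
        }
        return buf;
    }

    // Number of times the underlying file has been opened.
    int openCount() const { return open_count_; }

  private:
    std::string filename_;
    std::ifstream file_;
    std::streamoff offset_ = 0;
    bool stay_open_ = false;
    int open_count_ = 0;
};
```

In this sketch, two reads cost two opens by default but only one
open when bracketed by stayOpen(true) and stayOpen(false).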
2018-06-23 Jay Berkenbilt <ejb@ql.org>
* 8.1.0: release
2018-06-22 Jay Berkenbilt <ejb@ql.org>
* Bug fix: properly decrypt files with 40-bit keys that use
revision 3 of the security handler. Prior to this, qpdf was
reporting "invalid password" in this case. Fixes #212.
* With --verbose, print information about each input file when
merging files.
* Add progress reporting to QPDFWriter. Programmatically, you can
register a progress reporter with registerProgressReporter(). From
the command line, passing --progress will give progress indicators
in increments of no less than 1% as output files are written.
Fixes #200.
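One way to obtain "increments of no less than 1%" is to forward a
progress value only when the integer percentage has grown. The
sketch below assumes a byte-count style of progress and uses
hypothetical names (ProgressThrottleSketch, update). It is not the
QPDFWriter implementation.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Stand-in for illustration; not part of the qpdf API.
class ProgressThrottleSketch
{
  public:
    explicit ProgressThrottleSketch(std::function<void(int)> report) :
        report_(std::move(report))
    {
    }

    // Call as often as convenient; the callback fires only when the
    // whole-number percentage increases.
    void update(std::size_t done, std::size_t total)
    {
        if (total == 0) {
            return;
        }
        unsigned long long scaled =
            static_cast<unsigned long long>(done) * 100ULL;
        int percent = static_cast<int>(scaled / total);
        if (percent > last_percent_) {
            last_percent_ = percent;
            report_(percent);
        }
    }

  private:
    std::function<void(int)> report_;
    int last_percent_ = -1;
};
```

Successive reports from this sketch always differ by at least one
percentage point, however frequently update is called.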
* Add new method QPDF::getObjectCount(). This gives an approximate
(upper bound) count of objects in the QPDF object.
* Don't leave files open when merging. This makes it possible to
merge more files at once than the operating system's open file
limit. Fixes #154.
* Add ClosedFileInputSource class, an input source that keeps its
input file closed when not reading it. At the expense of some
performance, this allows you to operate on many files without
opening too many files at the operating system level.
* Add new option --preserve-unreferenced-resources, which
suppresses removal of unreferenced objects from page resource
dictionaries during page splitting operations.
2018-06-21 Jay Berkenbilt <ejb@ql.org>
* Add method QPDFPageObjectHelper::removeUnreferencedResources and
also QPDFPageDocumentHelper::removeUnreferencedResources that
calls the former on every page. This method removes any XObject or
Font references from the page's resource dictionary if they are
not referenced anywhere in any of the content streams. This
significantly reduces the size of split files whose pages
internally share resource dictionaries. Fixes #203.
* The --rotate option to qpdf no longer requires an explicit page
range. You can now rotate all pages of a document with
qpdf --rotate=angle in.pdf out.pdf. Fixes #211.
* Create examples/pdf-set-form-values.cc to illustrate use of
interactive form helpers.
* Added methods QPDFAcroFormDocumentHelper::setNeedAppearances and
added methods to QPDFFormFieldObjectHelper to set a field's value,
optionally updating the document to indicate that appearance
streams need to be regenerated.
* Added QPDFObjectHandle::newUnicodeString and
QPDFObjectHandle::unparseBinary to allow for more convenient
creation of strings that are explicitly encoded in UTF-16 BE. This
is useful for creating Unicode strings that appear outside of
content streams, such as in page labels, outlines, form field
values, etc.
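For reference, a PDF text string in UTF-16 BE starts with the byte
order mark FE FF followed by big-endian 16-bit code units. The
sketch below shows that encoding for a sequence of code points. The
function name to_utf16be_sketch and the choice of std::u32string as
input are assumptions of this example, not qpdf's interface.

```cpp
#include <string>

// Encode Unicode code points as UTF-16 BE with a leading byte order
// mark. Invalid code points are replaced with U+FFFD.
std::string to_utf16be_sketch(std::u32string const& code_points)
{
    std::string out("\xfe\xff");
    auto put16 = [&out](unsigned int v) {
        out += static_cast<char>((v >> 8) & 0xff);
        out += static_cast<char>(v & 0xff);
    };
    for (char32_t cp : code_points) {
        if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
            cp = 0xfffd;
        }
        if (cp < 0x10000) {
            put16(static_cast<unsigned int>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            unsigned int v = static_cast<unsigned int>(cp) - 0x10000;
            put16(0xd800 + (v >> 10));
            put16(0xdc00 + (v & 0x3ff));
        }
    }
    return out;
}
```

For example, U+00E9 encodes to the four bytes FE FF 00 E9, and
U+1F600 encodes to FE FF D8 3D DE 00.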
2018-06-20 Jay Berkenbilt <ejb@ql.org>
* Added new classes QPDFAcroFormDocumentHelper,
QPDFFormFieldObjectHelper, and QPDFAnnotationObjectHelper to
assist with working with interactive forms in PDF files. At
present, API methods for reading forms, form fields, and widget
annotations have been added. It is likely that some additional
methods for modifying forms will be added in the future. Note that
qpdf remains a library whose function is primarily focused around
document structure and metadata rather than content. As such, it
is not expected that qpdf will have higher level APIs for
generating form contents, but qpdf will hopefully gain the
capability to deal with the bookkeeping aspects of wiring up all
the objects, which could make it a useful library for other
software that works with PDF interactive forms. PDF forms are
complex, and the terminology around them is confusing. Please see
comments at the top of QPDFAcroFormDocumentHelper.hh for
additional discussion.
* Added new classes QPDFPageDocumentHelper and QPDFPageObjectHelper
for page-level API functions. These classes introduce a new API
pattern of document helpers and object helpers in qpdf. The helper
classes provide a higher level API for working with certain types
of structural features of PDF while still staying true to qpdf's
philosophy of not isolating the user from the underlying
structure. Please see the chapter in the documentation entitled
"Design and Library Notes" for additional discussion. The examples
have also been updated to use QPDFPageDocumentHelper and
QPDFPageObjectHelper when performing page-level operations.
2018-06-19 Jay Berkenbilt <ejb@ql.org>
* New QPDFObjectHandle::Rectangle class will convert to and from
arrays of four numerical values. Rectangles are used in various
places within the PDF file format and are called out as a specific
data type in the PDF specification.
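The conversion amounts to mapping four named coordinates to and
from a four-element array in the order lower-left x, lower-left y,
upper-right x, upper-right y. The sketch below uses a stand-in
struct and hypothetical function names rather than the qpdf class
itself.

```cpp
#include <array>

// Stand-in for illustration; not the qpdf Rectangle class.
struct RectangleSketch
{
    double llx;
    double lly;
    double urx;
    double ury;
};

// Rectangle to array in [llx lly urx ury] order.
std::array<double, 4> rectangle_to_array(RectangleSketch const& r)
{
    return {{r.llx, r.lly, r.urx, r.ury}};
}

// Array in [llx lly urx ury] order to rectangle.
RectangleSketch rectangle_from_array(std::array<double, 4> const& a)
{
    return RectangleSketch{a[0], a[1], a[2], a[3]};
}
```

A US Letter media box, [0 0 612 792], round-trips through these two
functions unchanged.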
2018-05-12 Jay Berkenbilt <ejb@ql.org>
* In newline before endstream mode, an extra newline was not
inserted prior to the endstream that ends object streams.
Fixes #205.
2018-04-15 Jay Berkenbilt <ejb@ql.org>
* Arbitrarily limit the depth of data structures represented by
direct objects. This is CVE-2018-9918. Fixes #202.
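The general technique is to pass the current depth down a recursive
parser and give up once it exceeds a fixed bound, so that hostile
input cannot exhaust the stack. The sketch below applies it to a
toy syntax of nested brackets. The limit of 50 and all names are
hypothetical and do not reflect the limit or code used by qpdf.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical limit for illustration; not the value qpdf uses.
constexpr int max_depth_sketch = 50;

// Parses one array starting at in[pos] == '[' and returns the
// deepest nesting level reached inside it.
int parse_array_sketch(std::string const& in, std::size_t& pos, int depth)
{
    if (depth > max_depth_sketch) {
        throw std::runtime_error("nesting too deep");
    }
    ++pos; // consume '['
    int deepest = depth;
    while (pos < in.size() && in[pos] != ']') {
        if (in[pos] == '[') {
            int d = parse_array_sketch(in, pos, depth + 1);
            if (d > deepest) {
                deepest = d;
            }
        } else {
            ++pos; // skip scalar characters in this toy syntax
        }
    }
    if (pos >= in.size()) {
        throw std::runtime_error("unterminated array");
    }
    ++pos; // consume ']'
    return deepest;
}

// Entry point: input must begin with '['.
int nesting_depth_sketch(std::string const& in)
{
    if (in.empty() || in[0] != '[') {
        throw std::runtime_error("expected [");
    }
    std::size_t pos = 0;
    return parse_array_sketch(in, pos, 1);
}
```

Input nested exactly 50 levels deep is accepted by this sketch, and
input nested 51 levels deep is rejected with an exception.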
2018-03-06 Jay Berkenbilt <ejb@ql.org>
* 8.0.2: release
* Properly handle pages with no contents. Fixes #194.
2018-03-05 Jay Berkenbilt <ejb@ql.org>
* Improve handling of loops while following cross reference
tables. Fixes #192.
2018-03-04 Jay Berkenbilt <ejb@ql.org>
* 8.0.1: release
* On the command line when specifying page ranges, support
preceding a page number by "r" to indicate that it should be
counted from the end. For example, the range r3-r1 would indicate
the last three pages of a document.
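The "r" prefix can be resolved against the page count as shown in
the sketch below. It handles only a single page or a single
"first-last" pair, and its function names are hypothetical. It is
not the parser used by the qpdf command line.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Resolve one page token ("7" or "r2") to a 1-based page number.
int resolve_page_sketch(std::string const& token, int page_count)
{
    bool from_end = !token.empty() && token[0] == 'r';
    std::string digits = from_end ? token.substr(1) : token;
    if (digits.empty() ||
        digits.find_first_not_of("0123456789") != std::string::npos) {
        throw std::invalid_argument("bad page: " + token);
    }
    int n = std::stoi(digits);
    // "rN" counts from the end: r1 is the last page.
    int page = from_end ? page_count + 1 - n : n;
    if (page < 1 || page > page_count) {
        throw std::out_of_range("page out of range: " + token);
    }
    return page;
}

// Expand "first-last" (or a single token) into page numbers.
std::vector<int> expand_range_sketch(std::string const& range, int page_count)
{
    std::size_t dash = range.find('-');
    int first = resolve_page_sketch(range.substr(0, dash), page_count);
    int last = (dash == std::string::npos)
        ? first
        : resolve_page_sketch(range.substr(dash + 1), page_count);
    std::vector<int> pages;
    int step = (first <= last) ? 1 : -1;
    for (int p = first; p != last + step; p += step) {
        pages.push_back(p);
    }
    return pages;
}
```

For a 10-page document, "r3-r1" expands to pages 8, 9, and 10 in
this sketch.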
2018-03-03 Jay Berkenbilt <ejb@ql.org>
* Ignore zlib data check errors while uncompressing streams. This
is consistent with behaviors of other readers and enables handling
of some incorrectly written zlib streams. Fixes #191.
2018-02-25 Jay Berkenbilt <ejb@ql.org>
* 8.0.0: release
2018-02-17 Jay Berkenbilt <ejb@ql.org>
* Fix QPDFObjectHandle::getUTF8Val() to properly handle strings
that are encoded with PDF Doc Encoding. Fixes #179.
* Add qpdf_check_pdf to the "C" API. This method just attempts to
read the entire file and produce no output, making it possible to
assess whether the file has any errors that qpdf can detect.
* Major enhancements to handling of type errors within the qpdf
library. This fix is intended to eliminate those annoying cases
where qpdf would exit with a message like "operation for
dictionary object attempted on object of wrong type" without
providing any context. Now qpdf keeps enough context to be able to
issue a proper warning and to handle such conditions in a sensible
way. This should greatly increase the number of bad files that
qpdf can recover, and it should make it much easier to figure out
what's broken when a file contains errors.
* Error message fix: replace "file position" with "offset" in
error messages that report lexical or parsing errors. Sometimes
it's an offset in an object stream or a content stream rather than
a file position, so this makes the error message less confusing in
those cases. It still requires some knowledge to find the exact
position of the error, since when it's not a file offset, it's
probably an offset into a stream after uncompressing it.
* Error message fix: correct some cases in which the object that
contained a lexical error was omitted from the error message.
* Error message fix: improve file name in the error message when
there is a parser error inside an object stream.
2018-02-11 Jay Berkenbilt <ejb@ql.org>
* Add QPDFObjectHandle::filterPageContents method to provide a
different interface for applying token filters to page contents
without modifying the ultimate output.
2018-02-04 Jay Berkenbilt <ejb@ql.org>
* Changes listed on today's date are numerous and reflect
significant enhancements to qpdf's lexical layer. While many
nuances are discussed and a handful of small bugs were fixed, it
should be emphasized that none of these issues have any impact on
any output or behavior of qpdf under "normal" operation. There are
some changes that have an effect on content stream normalization
as with qdf mode or on code that interacts with PDF files
lexically using QPDFTokenizer. There are no incompatible changes
for normal operation. There are a few changes that will affect the
exact error messages issued on certain bad files, and there is a
small non-compatible enhancement regarding the behavior of
manually constructed QPDFTokenizer::Token objects. Users of the
qpdf command line tool will see no changes other than the addition
of a new command-line flag and possibly some improved error
messages.
* Significant lexer (tokenizer) enhancements. These are changes to
the QPDFTokenizer class. These changes are of concern only to
people who are operating with PDF files at the lexical layer using
qpdf. They have little or no impact on most high-level interfaces
or the command-line tool.
New token types tt_space and tt_comment to recognize whitespace
and comments. This makes it possible to tokenize a PDF file or
stream and preserve everything about it.
For backward compatibility, space and comment tokens are not
returned by the tokenizer unless QPDFTokenizer::includeIgnorable()
is called.
Better handling of null bytes. These are now included in space
tokens rather than being their own "tt_word" tokens. This should
have no impact on any correct PDF file and has no impact on
output, but it may change offsets in some error messages when
trying to parse contents of bad files. Under default operation,
qpdf does not attempt to parse content streams, so this change is
mostly invisible.
Bug fix to handling of bad tokens at ends of streams. Now, when
allowEOF() has been called, these are treated as bad tokens
(tt_bad or an exception, depending on invocation), and a
separate tt_eof token is returned. Before, the bad token
contents were returned as the value of a tt_eof token. tt_eof
tokens are always empty now.
Fix a bug that would, on rare occasions, report the offset in an
error message in the wrong space because of spaces or comments
adjacent to a bad token.
Clarify in comments exactly where the input source is positioned
surrounding calls to readToken and getToken.
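The effect of opting in to space and comment tokens can be shown
with a toy tokenizer. It knows only about whitespace (including
null bytes), "%" comments, and everything else as words. It ignores
strings, delimiters, and the rest of PDF syntax. All names are
hypothetical and unrelated to the QPDFTokenizer interface.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class TokenTypeSketch { word, space, comment };

struct TokenSketch
{
    TokenTypeSketch type;
    std::string value;
};

// PDF whitespace characters: NUL, tab, LF, FF, CR, space.
bool is_pdf_space_sketch(char c)
{
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' ||
        c == '\r' || c == ' ';
}

// Space and comment tokens are returned only if include_ignorable
// is true; word tokens are always returned.
std::vector<TokenSketch>
tokenize_sketch(std::string const& in, bool include_ignorable)
{
    std::vector<TokenSketch> out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::size_t start = i;
        TokenTypeSketch type;
        if (is_pdf_space_sketch(in[i])) {
            type = TokenTypeSketch::space;
            while (i < in.size() && is_pdf_space_sketch(in[i])) {
                ++i;
            }
        } else if (in[i] == '%') {
            type = TokenTypeSketch::comment;
            while (i < in.size() && in[i] != '\n' && in[i] != '\r') {
                ++i;
            }
        } else {
            type = TokenTypeSketch::word;
            while (i < in.size() && !is_pdf_space_sketch(in[i]) &&
                   in[i] != '%') {
                ++i;
            }
        }
        if (type == TokenTypeSketch::word || include_ignorable) {
            out.push_back({type, in.substr(start, i - start)});
        }
    }
    return out;
}
```

With include_ignorable set, concatenating the token values of this
sketch reproduces the input byte for byte. Without it, only the
word tokens are returned.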
* Add a new token type for inline images. This token type is only
returned by QPDFTokenizer immediately following a call to
expectInlineImage(). This change includes internal refactoring of
a handful of places that all separately handled inline images. The
logic of detecting inline images in content streams is now handled
in one place in the code. Also we are more flexible about what
characters may surround the EI operator that marks the end of an
inline image.
* New method QPDFObjectHandle::parsePageContents() to improve upon
QPDFObjectHandle::parseContentStream(). The parseContentStream
method used to operate on a single content stream, but was fixed
to properly handle pages with contents split across multiple
streams in an earlier release. The new method parsePageContents()
can be called on the page object rather than the value of the
page dictionary's /Contents key. This removes a few lines of
boilerplate code from any code that uses parseContentStream, and
it also enables creation of more helpful error messages if
problems are encountered as the error messages can include
information about which page the streams come from.
* Update content stream parsing example
(examples/pdf-parse-content.cc) to use new
QPDFObjectHandle::parsePageContents() method in favor of the older
QPDFObjectHandle::parseContentStream() method.
* Bug fix: change where the trailing newline is added to a stream
in QDF mode when content normalization is enabled (the default for
QDF mode). Before, the content normalizer ensured that the output
ended with a trailing newline, but this had the undesired side
effect of including the newline in the stream data for purposes of
length computation. QPDFWriter already appends a newline without
counting it in the length, for better readability. Ordinarily this makes
no difference, but in the rare case of a page's contents being
split in the middle of a token, the old behavior could cause the
extra newline to be interpreted as part of the token. This bug
could only be triggered in qdf mode, which is a mode intended for
manual inspection of PDF files' contents, so it is very unlikely
to have caused any actual problems for people using qpdf for
production use. Even if it did, it would be very unusual for a PDF
file to actually be adversely affected by this issue.
* Add support for coalescing a page's contents into a single
stream if they are represented as an array of streams. This can be
performed from the command line using the --coalesce-contents
option. Coalescing content streams can simplify things for
software that wants to operate on a page's content streams without
having to handle weird edge cases like content streams split in
the middle of tokens. Note that
QPDFObjectHandle::parsePageContents and
QPDFObjectHandle::parseContentStream already handled split content
streams. This is mainly to set the stage for new methods of
operating on page contents. The new method
QPDFObjectHandle::pipeContentStreams will pipe all of a page's
content streams through a single pipeline. The new method
QPDFObjectHandle::coalesceContentStreams, when called on a page
object, will do nothing if the page's contents are a single
stream, but if they are an array of streams, it will replace the
page's contents with a single stream whose contents are the
concatenation of the original streams.
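The split-token edge case mentioned above is easy to see with plain
strings. The sketch below concatenates stream data in order.
coalesce_sketch is a hypothetical helper operating on std::string
values, not on qpdf stream objects.

```cpp
#include <string>
#include <vector>

// Concatenate the data of several content streams in order.
std::string coalesce_sketch(std::vector<std::string> const& streams)
{
    std::string out;
    for (auto const& s : streams) {
        out += s;
    }
    return out;
}
```

If one stream ends with "BT /F1 12 T" and the next begins with
"f ET", neither piece contains the Tf operator on its own, but the
concatenation "BT /F1 12 Tf ET" does. Software that reads each
stream separately would miss it, which is why working on a single
coalesced stream is simpler.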