XFA fields not updated when using update_page_form_field_values() #2824

pubpub-zz · 2024-09-01T14:19:02Z

Environment

Python 3.10
pypdf 4.3.1+dev (as of Sept 1st)

Code + PDF

cf #2780
When filling a form that contains XFA data, the fields in the XFA dataset are not updated.

ljbergmann · 2024-09-02T07:59:00Z

For my use case I found a solution: "just" parse the xfa:datasets XML, set the values, and save the XML string back. The question is whether that approach is valid for every XFA form. If it is, I'll gladly write a PR that extends the update_page_form_field_values method, or implement an additional method to accomplish this. But I'm not sure whether my approach is more than a shortcut.
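A minimal sketch of the approach described above, using only the standard library. It assumes the datasets packet has already been extracted from the PDF as bytes, and that each value lives in an unnamespaced data element whose tag equals the field's short name (e.g. `F1`). `set_dataset_values` is a hypothetical helper, not part of pypdf.

```python
import xml.etree.ElementTree as ET

# Namespace of the <xfa:datasets> / <xfa:data> wrapper elements.
XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/"


def set_dataset_values(datasets_xml: bytes, values: dict[str, str]) -> bytes:
    """Set the text of every data element whose tag is a key of `values`.

    Matching is by local element name only (a simplification): it assumes
    each field name occurs once inside <xfa:data>.
    """
    # Keep the "xfa:" prefix when the tree is serialized again.
    ET.register_namespace("xfa", XFA_DATA_NS)
    root = ET.fromstring(datasets_xml)
    data = root.find(f"{{{XFA_DATA_NS}}}data")
    if data is None:
        raise ValueError("no <xfa:data> element in datasets packet")
    for elem in data.iter():
        if elem.tag in values:
            elem.text = values[elem.tag]
    # Serialize back to bytes so it can replace the original stream content.
    return ET.tostring(root, encoding="utf-8")
```

Writing the resulting bytes back into the PDF stream is not shown. Because matching is by short name, repeated fields with the same name would all receive the same value.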

pubpub-zz · 2024-09-02T11:37:56Z

Working only on the XFA would not let standard tools extract the data, since they read it from the AcroForm field information.
My idea is simply to extend the existing update_form_fields to also update the XFA dataset if it exists.
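Per the PDF specification, the /XFA entry of the AcroForm dictionary is either a single stream or an array alternating packet names and streams. A sketch of the "if it exists" check for the array form, with a plain Python list standing in for the PDF array (in pypdf the elements would be string objects and indirect references to streams). `find_xfa_packet` is a hypothetical name; the single-stream form is not handled.

```python
from typing import Any, Optional, Sequence


def find_xfa_packet(xfa: Sequence[Any], name: str = "datasets") -> Optional[int]:
    """Return the index of the stream that follows packet `name`, or None.

    `xfa` is assumed to be the array form of /XFA:
    [name0, stream0, name1, stream1, ...].
    """
    if len(xfa) % 2 != 0:
        raise ValueError("XFA array must alternate packet names and streams")
    for i in range(0, len(xfa), 2):
        # Names sit at even positions; the matching stream follows directly.
        if str(xfa[i]) == name:
            return i + 1
    return None
```

A `None` result means the form has no datasets packet, so only the AcroForm values would need updating.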

ljbergmann · 2024-09-04T13:30:27Z

I identified something very interesting during the implementation of the proposed extension of update_form_fields.

The XFA "keys" of fields are different from the names pypdf uses for the AcroForm. To verify this, I created
pypdf_field_name_test.pdf. As the screenshot in the original issue shows, the field is called F1.

If you check the key provided by pypdf, you can see that it is 'F1[0]'. You can check this with the code below.

from pypdf import PdfReader

reader = PdfReader("pypdf_field_name_test.pdf")
fields = reader.get_form_text_fields()

print(fields)

Output: {'F1[0]': None}

If you look at the XFA template / dataset XML, the field is named F1.

<template xmlns="http://www.xfa.org/schema/xfa-template/3.3/"><?formServer defaultPDFRenderFormat acrobat10.0dynamic?>
	<subform name="form1" layout="tb" locale="de_DE" restoreState="auto">
		<pageSet>
			<pageArea name="Page1" id="Page1">
				<contentArea x="0.25in" y="0.25in" w="197.3mm" h="284.3mm"/>
				<medium stock="a4" short="210mm" long="297mm"/><?templateDesigner expand 1?>
			</pageArea><?templateDesigner expand 1?>
		</pageSet>
		<subform w="197.3mm" h="284.3mm" name="topform">
			<field name="F1" y="12.7mm" x="41.275mm" w="130.175mm" h="9mm">
				<ui>
					<textEdit>
						<border>
							<edge stroke="lowered"/>
						</border>
						<margin/>
					</textEdit>
				</ui>
				<font typeface="Arial"/>
				<para vAlign="middle"/>
				<caption>
					<para vAlign="middle"/>
					<value>
						<text>This is test of pypdf field names</text>
					</value>
				</caption>
			</field><?templateDesigner expand 1?>
		</subform>
		<proto/>
		<desc>
			<text name="version">11.0.9.20240701.1.52.2</text>
		</desc><?templateDesigner expand 1?><?renderCache.subset "Arial" 0 0 ISO-8859-1 4 72 18 0003002900370044004700480049004B004C004F005000510052005300560057005B005C FTadefhilmnopstxy?>
	</subform><?templateDesigner DefaultPreviewDynamic 1?><?templateDesigner DefaultRunAt client?><?templateDesigner FormTargetVersion 33?><?templateDesigner DefaultCaptionFontSettings face:Arial;size:10;weight:normal;style:normal?><?templateDesigner DefaultValueFontSettings face:Arial;size:10;weight:normal;style:normal?><?templateDesigner DefaultLanguage JavaScript?><?acrobat JavaScript strictScoping?><?templateDesigner Rulers horizontal:1, vertical:1, guidelines:1, crosshairs:0?><?templateDesigner Zoom 190?><?templateDesigner WidowOrphanControl 0?><?templateDesigner SaveTaggedPDF 1?><?templateDesigner SavePDFWithEmbeddedFonts 1?><?templateDesigner Grid show:1, snap:1, units:0, color:ff8080, origin:(0,0), interval:(125000,125000), objsnap:0, guidesnap:0, pagecentersnap:0?>
</template>
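To see where F1 sits in that hierarchy, here is a sketch that lists the dotted paths of named fields in a template packet (for the template above it would yield `form1.topform.F1`). It is simplified by assumption: only named `<subform>` and `<field>` elements contribute path segments, and `pageSet` contents and other elements are skipped. `field_paths` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def field_paths(template_xml: str) -> list[str]:
    """Return dotted paths such as "form1.topform.F1" for named fields."""
    # The default parser drops <?...?> processing instructions.
    root = ET.fromstring(template_xml)
    paths: list[str] = []

    def walk(node: ET.Element, prefix: list[str]) -> None:
        for child in node:
            tag = child.tag.split("}")[-1]  # drop the template namespace
            name = child.get("name")
            if tag == "field" and name:
                paths.append(".".join(prefix + [name]))
            elif tag == "subform":
                # Unnamed subforms add no segment to the path.
                walk(child, prefix + [name] if name else prefix)

    walk(root, [])
    return paths
```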

I suspect that the naming of the fields with [0] was a deliberate choice made in the implementation.

The question that now arises: shouldn't the names in the XFA and the AcroForm be identical? And if not, would removing the [0] to update the XFA be a valid approach?

In my opinion, the field names should be consistent, and therefore the AcroForm names should not contain [0].
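A sketch of the "remove the [0]" idea, plus a check for when it is lossy. The assumption here is that the bracketed number is an occurrence index among same-named siblings, so stripping it is only safe when no name repeats. Both functions are hypothetical helpers, not pypdf API.

```python
import re

# Matches an occurrence index such as "[0]" or "[12]".
_INDEX_RE = re.compile(r"\[\d+\]")


def strip_som_indices(acroform_name: str) -> str:
    """Turn e.g. "form1[0].topform[0].F1[0]" into "form1.topform.F1"."""
    return _INDEX_RE.sub("", acroform_name)


def has_repeated_siblings(acroform_names: list[str]) -> bool:
    """True if stripping indices would merge two distinct field names."""
    stripped = {strip_som_indices(n) for n in acroform_names}
    return len(stripped) != len(set(acroform_names))
```

For example, `row[0].F1[0]` and `row[1].F1[0]` both collapse to `row.F1`, so the index cannot simply be discarded for repeating subforms.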

Best regards,
Leon

pubpub-zz · 2024-09-04T17:37:29Z

Some information is provided in
https://pdfa.org/norm-refs/XFA-3_3.pdf

Look at "Field names", page 72 onward.
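Rather than dropping the index, a name like `form1[0].row[1].F1[0]` can be resolved step by step, picking the n-th same-named child at each level. This sketch assumes the data hierarchy under `<xfa:data>` mirrors the field names (as with default data binding of named subforms); `resolve_in_data` is a hypothetical helper, and a missing index is treated as `[0]`.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# One path segment: a name, optionally followed by "[index]".
_SEGMENT_RE = re.compile(r"^(?P<name>[^\[\]]+)(?:\[(?P<index>\d+)\])?$")


def resolve_in_data(data: ET.Element, som_name: str) -> Optional[ET.Element]:
    """Follow a dotted, indexed name down from the <xfa:data> element."""
    node = data
    for segment in som_name.split("."):
        match = _SEGMENT_RE.match(segment)
        if match is None:
            raise ValueError(f"unsupported name segment: {segment!r}")
        name = match["name"]
        index = int(match["index"] or 0)
        # Select the index-th child carrying this tag.
        same_named = [child for child in node if child.tag == name]
        if index >= len(same_named):
            return None
        node = same_named[index]
    return node
```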

This was referenced Sep 1, 2024:

ENH: add incremental capability to PdfWriter #2811 (Merged)

PDF-Form not editable after filling out text field (after upgrade from 3.9.* to 4.3*) #2780 (Closed)

ljbergmann added commit 7a65af9 to ljbergmann/pypdf that referenced this issue on Sep 4, 2024: "Initial development to incorporate XFA form updates py-pdf#2824"

ljbergmann added commit 5db62a9 to ljbergmann/pypdf that referenced this issue on Sep 4, 2024: "added support for radio buttons py-pdf#2824"

ianberg-volpe mentioned this issue on Sep 24, 2024 in "Switch to different pdf library for shinylive" ianberg-volpe/pdf-xfa-tools#1 (Open)

py-pdf deleted a comment Oct 25, 2024
