XFA fields not updated when using update_page_form_field_values() #2824

Open
pubpub-zz opened this issue Sep 1, 2024 · 4 comments

@pubpub-zz
Collaborator

Environment

Python 3.10
pypdf 4.3.1+dev (development version as of Sep 1, 2024)

Code + PDF

See #2780.
When modifying a PDF that contains an XFA form, the fields in the XFA dataset are not updated.

@ljbergmann

For my use case I found a solution by "just" parsing the xfa:dataset XML, setting the values, and saving the XML string back. The question is: is that a valid approach for every XFA form or not? If it is, I'll gladly write a PR that enhances the update_page_form_field_values method or adds a separate method to accomplish this. But I'm not sure whether my approach is more than a shortcut.
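
The parse, set, and serialize approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SAMPLE_DATASETS` is a made-up datasets packet, `set_xfa_values` is a hypothetical helper rather than part of pypdf, and matching on the local element name would touch every element with that name, which may be wrong for forms with repeated subforms.

```python
import xml.etree.ElementTree as ET

# Namespace used by the XFA datasets packet.
XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/"

# Hypothetical, minimal datasets packet for a form with one field named F1.
SAMPLE_DATASETS = (
    f'<xfa:datasets xmlns:xfa="{XFA_DATA_NS}">'
    "<xfa:data><form1><topform><F1/></topform></form1></xfa:data>"
    "</xfa:datasets>"
)


def set_xfa_values(datasets_xml: str, values: dict[str, str]) -> str:
    """Return a copy of datasets_xml with matching element texts replaced.

    Assumption: each key in `values` is the bare XFA field name (e.g. "F1").
    """
    # Keep the "xfa" prefix when the tree is serialized again.
    ET.register_namespace("xfa", XFA_DATA_NS)
    root = ET.fromstring(datasets_xml)
    for element in root.iter():
        # Drop a leading "{namespace}" so only the local name is compared.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in values:
            element.text = values[local_name]
    return ET.tostring(root, encoding="unicode")


updated = set_xfa_values(SAMPLE_DATASETS, {"F1": "hello"})
```

Writing the resulting string back into the PDF is left out here, since that step depends on how the datasets stream is stored in the document.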

@pubpub-zz
Collaborator Author

Working only on the XFA will not allow standard tools to extract data from the field information.
My idea is to extend the existing update_form_fields so that it also updates the XFA dataset if it exists.
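
As a rough sketch of the "if it exists" check: per the PDF specification, the /XFA entry of the AcroForm dictionary is either a single stream or an array that alternates packet names and streams. The helper below covers only the array form and uses plain Python values as stand-ins for the PDF objects; `find_xfa_packet` is a hypothetical name, not a pypdf function.

```python
from typing import Any, Optional, Sequence


def find_xfa_packet(
    xfa_entry: Sequence[Any], packet_name: str = "datasets"
) -> Optional[Any]:
    """Return the payload paired with packet_name, or None if it is absent.

    Assumption: xfa_entry is the array form of /XFA, laid out as
    [name0, payload0, name1, payload1, ...].
    """
    # Walk the array two items at a time: even slots are names, odd are payloads.
    for name, payload in zip(xfa_entry[0::2], xfa_entry[1::2]):
        if str(name) == packet_name:
            return payload
    return None


# Stand-in for an /XFA array; real entries would be PDF strings and streams.
example_entry = ["template", b"<template/>", "datasets", b"<xfa:datasets/>"]
datasets_payload = find_xfa_packet(example_entry)
```

An update routine could then change the AcroForm fields as it does today and touch the XFA data only when this lookup returns something other than `None`.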

@ljbergmann

I identified something very interesting while implementing the proposed extension of update_form_fields.

The XFA "keys" of the fields are different from the names pypdf uses for the AcroForm. To verify this, I created
pypdf_field_name_test.pdf. As you can see in this screenshot, the field is called F1.
[screenshot: grafik]

If you check the key provided by pypdf, you can see that it is 'F1[0]'. You can check with the code below.

from pypdf import PdfReader

reader = PdfReader("pypdf_field_name_test.pdf")
fields = reader.get_form_text_fields()

print(fields)

{'F1[0]': None}

If you look at the XFA template / dataset XML, the field is named F1.

<template xmlns="http://www.xfa.org/schema/xfa-template/3.3/"><?formServer defaultPDFRenderFormat acrobat10.0dynamic?>
	<subform name="form1" layout="tb" locale="de_DE" restoreState="auto">
		<pageSet>
			<pageArea name="Page1" id="Page1">
				<contentArea x="0.25in" y="0.25in" w="197.3mm" h="284.3mm"/>
				<medium stock="a4" short="210mm" long="297mm"/><?templateDesigner expand 1?>
			</pageArea><?templateDesigner expand 1?>
		</pageSet>
		<subform w="197.3mm" h="284.3mm" name="topform">
			<field name="F1" y="12.7mm" x="41.275mm" w="130.175mm" h="9mm">
				<ui>
					<textEdit>
						<border>
							<edge stroke="lowered"/>
						</border>
						<margin/>
					</textEdit>
				</ui>
				<font typeface="Arial"/>
				<para vAlign="middle"/>
				<caption>
					<para vAlign="middle"/>
					<value>
						<text>This is test of pypdf field names</text>
					</value>
				</caption>
			</field><?templateDesigner expand 1?>
		</subform>
		<proto/>
		<desc>
			<text name="version">11.0.9.20240701.1.52.2</text>
		</desc><?templateDesigner expand 1?><?renderCache.subset "Arial" 0 0 ISO-8859-1 4 72 18 0003002900370044004700480049004B004C004F005000510052005300560057005B005C FTadefhilmnopstxy?>
	</subform><?templateDesigner DefaultPreviewDynamic 1?><?templateDesigner DefaultRunAt client?><?templateDesigner FormTargetVersion 33?><?templateDesigner DefaultCaptionFontSettings face:Arial;size:10;weight:normal;style:normal?><?templateDesigner DefaultValueFontSettings face:Arial;size:10;weight:normal;style:normal?><?templateDesigner DefaultLanguage JavaScript?><?acrobat JavaScript strictScoping?><?templateDesigner Rulers horizontal:1, vertical:1, guidelines:1, crosshairs:0?><?templateDesigner Zoom 190?><?templateDesigner WidowOrphanControl 0?><?templateDesigner SaveTaggedPDF 1?><?templateDesigner SavePDFWithEmbeddedFonts 1?><?templateDesigner Grid show:1, snap:1, units:0, color:ff8080, origin:(0,0), interval:(125000,125000), objsnap:0, guidesnap:0, pagecentersnap:0?>
</template>

I suspect that naming the fields with [0] was a deliberate choice made in the implementation.

The question that arises now: shouldn't the names in the XFA and the AcroForm be identical, and if not, would removing the [0] in order to update the XFA be a valid approach?

In my opinion the names of fields should be consistent, and therefore the AcroForm names should not contain [0].
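
To make the question concrete, removing the index could look like the sketch below. Both helpers are hypothetical, and the dotted name in the example is an assumed illustration rather than output taken from the test PDF. Note that dropping the index loses information whenever a name occurs more than once (for example F1[0] and F1[1]), so this is not a general mapping.

```python
import re

# Matches an occurrence index such as "[0]" or "[12]".
_INDEX = re.compile(r"\[\d+\]")


def strip_indices(name: str) -> str:
    """Remove every [n] index from a (possibly dotted) field name."""
    return _INDEX.sub("", name)


def xfa_leaf_name(name: str) -> str:
    """Return the last dotted component of the name, without its index."""
    return strip_indices(name).rsplit(".", 1)[-1]


print(strip_indices("F1[0]"))                       # F1
print(xfa_leaf_name("form1[0].topform[0].F1[0]"))   # F1
```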

Best regards,
Leon

ljbergmann added a commit to ljbergmann/pypdf that referenced this issue Sep 4, 2024
ljbergmann added a commit to ljbergmann/pypdf that referenced this issue Sep 4, 2024
@pubpub-zz
Collaborator Author

Some information is provided in
https://pdfa.org/norm-refs/XFA-3_3.pdf

See "Field names", page 72 onward.
