BXC-4768 ignore pdf children #114

krwong · 2024-11-06T18:11:20Z

https://unclibrary.atlassian.net/browse/BXC-4768

Detect when an object is of type Document-PDF and skip over its children.
Set ENTRY_TYPE_FIELD (cdm2bxc_entry_type) to doc-pdf.

bbpennel · 2024-11-06T20:59:14Z

src/test/java/edu/unc/lib/boxc/migration/cdm/services/CdmIndexServiceTest.java

+            ResultSet rs = stmt.executeQuery("select " + joinedFields
+                    + " from " + CdmIndexService.TB_NAME + " order by " + CdmFieldInfo.CDM_ID + " asc");
+            rs.next();
+            assertEquals(17926, rs.getInt(CdmFieldInfo.CDM_ID));


It seems like this isn't actually ignoring the children of the Document-PDF object, since "page 1" is one of them? My impression is that we would not want any of the pages that appear in the cpd file to be in the index; otherwise we would have to ignore those pages somehow in other steps. So we may need to do something like what we do for compounds here:

chompb/src/main/java/edu/unc/lib/boxc/migration/cdm/services/CdmIndexService.java

Line 262 in c9a64a8

for (var pageEl : childRoot.getChildren("page")) {

but instead of assigning each child a parent, we delete the child.
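The suggestion above could be sketched roughly as follows. This is a minimal, standalone illustration and not the actual CdmIndexService code: the class `PdfChildFilter`, its methods, the in-memory `Map` standing in for the index table, and the assumed cpd layout (`<cpd>` with a `<type>` element and `<page>` elements that each carry a `<pageptr>`) are all hypothetical. Only the `Document-PDF` type and the `doc-pdf` entry type value come from this PR. It uses the JDK's DOM parser rather than the XML library used in chompb.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of chompb.
class PdfChildFilter {
    static final String DOC_PDF_CPD_TYPE = "Document-PDF"; // cpd type named in this PR
    static final String DOC_PDF_ENTRY_TYPE = "doc-pdf";    // entry type value named in this PR

    // Parse the cpd XML and return its root element (assumed layout, see lead-in).
    static Element parseCpd(String cpdXml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(cpdXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Unable to parse cpd XML", e);
        }
    }

    // True if the first <type> element in the cpd is Document-PDF.
    static boolean isDocumentPdf(Element root) {
        NodeList types = root.getElementsByTagName("type");
        return types.getLength() > 0
                && DOC_PDF_CPD_TYPE.equals(types.item(0).getTextContent().trim());
    }

    // Collect the <pageptr> value of every <page> in the cpd.
    static List<String> childIds(Element root) {
        List<String> ids = new ArrayList<>();
        NodeList pages = root.getElementsByTagName("page");
        for (int i = 0; i < pages.getLength(); i++) {
            NodeList ptrs = ((Element) pages.item(i)).getElementsByTagName("pageptr");
            if (ptrs.getLength() > 0) {
                ids.add(ptrs.item(0).getTextContent().trim());
            }
        }
        return ids;
    }

    // For a Document-PDF parent, delete its children from the index (id -> entry type)
    // instead of assigning them a parent, then mark the parent as doc-pdf.
    // Returns true if the index was changed.
    static boolean removePdfChildren(Map<String, String> index, String parentId, String cpdXml) {
        Element root = parseCpd(cpdXml);
        if (!isDocumentPdf(root)) {
            return false; // other compound types are left untouched
        }
        for (String childId : childIds(root)) {
            index.remove(childId);
        }
        index.put(parentId, DOC_PDF_ENTRY_TYPE);
        return true;
    }
}
```

In the real service the removal would presumably be a delete against the index table rather than a map operation; the map is only there to keep the sketch self-contained.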

krwong added 2 commits November 6, 2024 12:45

ignore children of doc-pdf cpd objects, assign doc-pdf type info

78c39e2

add test resources

c9a64a8

bbpennel reviewed Nov 6, 2024

remove pdf children

35caaee

bbpennel approved these changes Nov 8, 2024

bbpennel merged commit 11bb596 into main Nov 8, 2024
2 checks passed

bbpennel deleted the BXC-4768-ignore-pdf-children branch November 8, 2024 13:38
