Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Fix: index out of bounds #861 #842 #863

Merged
merged 3 commits into from
Mar 4, 2024
Merged

🐛 Fix: index out of bounds #861 #842 #863

merged 3 commits into from
Mar 4, 2024

Conversation

Ludy87
Copy link
Contributor

@Ludy87 Ludy87 commented Mar 3, 2024

Description

  1. Addition of Collections Import:
  • Added import java.util.Collections;
  • This allows the use of the Collections.sort() method, which is applied in the following modifications.
  1. Sorting of Pages:
  • By adding Collections.sort(pagesToRemove);, the page numbers to be removed are sorted in ascending order.
  • Sorting is crucial to ensure pages are removed in the correct sequence, especially when iterating backwards through the list, to avoid issues caused by page re-indexing after removals.
  1. Adjustment of Page Indexes:
  • In lines where pages are removed from the document (document.removePage(pageIndex);) or reordered (newPages.add(document.getPage(newPageOrder.get(i) - 1));), the index is adjusted by subtracting 1: pageIndex = pagesToRemove.get(i) - 1; and newPageOrder.get(i) - 1.
  • PDFBox uses 0-based indices for pages, but user inputs are typically 1-based (i.e., the first page is page 1, not page 0). Subtracting 1 from each index reconciles this difference, ensuring the correct page in the PDF document is referenced.

Why Were These Changes Made?

  • Sorting: Sorting the pages to be removed ensures that pages are removed in the correct order, avoiding shifts in indices that could lead to errors such as removing the wrong page or generating an IndexOutOfBoundsException.

  • Index Adjustment: Modifying the page index by 1 was done to ensure correct indexing, as user inputs are usually 1-based, while PDFBox (and most Java-based systems) use 0-based indexing. This adjustment prevents IndexOutOfBoundsException and ensures that the intended pages are correctly handled.

Closes #861 #842

Checklist:

  • I have read the Contribution Guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings

Contributor License Agreement

By submitting this pull request, I acknowledge and agree that my contributions will be included in Stirling-PDF and that they can be relicensed in the future under the MPL 2.0 (Mozilla Public License Version 2.0) license.

(This does not change the general open-source nature of Stirling-PDF, simply moving from one license to another license)

@Ludy87 Ludy87 requested a review from Frooodle as a code owner March 3, 2024 17:54
@Frooodle
Copy link
Member

Frooodle commented Mar 3, 2024

This bug was actually introduced by a change i made for
GeneralUtils.parsePageList

image

which has a isOneBased flag and defaulting etc..
might be good to set a flag for this instead of it using default
or maybe my code was done wrong (also likely)

@Ludy87
Copy link
Contributor Author

Ludy87 commented Mar 3, 2024

I can set the flag however you like it best.

@Frooodle Frooodle merged commit bdcccfd into Stirling-Tools:main Mar 4, 2024
4 checks passed
@Ludy87 Ludy87 deleted the fix_ex branch March 5, 2024 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Page extraction page-numbers
2 participants