pdfBookMark

Extract the table of contents from a PDF and turn it into bookmarks.

Requirements

- ghostscript
- pdftk
- pdftotext (from Poppler; may be included in TeX distributions)
- Windows

The command-line tools above must be installed and added to the PATH.

Usage

find string in specific pages

PS> all.ps1 <pdf> [<string> = 'Chapter'] [<pages to omit>]

For <pages to omit>, use a PowerShell range expression such as (1..10+15), which means pages 1 to 10 plus page 15.
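
The page expression is ordinary PowerShell: the range operator .. binds more tightly than +, so the range is built first and the extra page is appended to it. A small sketch for checking an expression in the console before passing it to all.ps1 (the variable names $omit and $more are only for illustration):

```powershell
# Range first, then array append: pages 1..10 plus page 15
$omit = (1..10+15)
$omit.Count          # 11
$omit -join ','      # 1,2,3,4,5,6,7,8,9,10,15

# Two ranges combined; the inner parentheses make the grouping explicit
$more = ((1..3) + (7..9))
$more -join ','      # 1,2,3,7,8,9
```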

extract table of contents

PS> table.ps1 <pdf> <page range> [<page count of page 1>] [<str>] [<separate char>]

For <page range>, give the first and last page separated by a comma: 8,10 means pages 8 to 10.

modify the text manually if the result is not accurate

If the OCR-generated text from the PDF file is inaccurate, you can use this command to extract the text with the original layout:

PS> pdftotext -f <first page> -l <last page> -layout -nopgbrk -raw <pdf> <output>

You can then edit the text file manually to correct any errors and pass it to table.ps1 with -text:

PS> table.ps1 <pdf> [<page count of page 1>] [<str>] [<separate char>] -text <text file>
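
For orientation when editing the text file: pdftk describes each bookmark as a BookmarkBegin / BookmarkTitle / BookmarkLevel / BookmarkPageNumber record. The sketch below shows one way corrected lines could be mapped to such records. It is not the code in table.ps1; the function name ConvertTo-BookmarkInfo, the assumed line shape ("title, dots, page number"), and the -Offset parameter are all assumptions made for illustration.

```powershell
# Hypothetical helper, not part of table.ps1: turns lines such as
# "Chapter 1 Introduction ..... 1" into pdftk bookmark records.
function ConvertTo-BookmarkInfo {
    param(
        [string[]]$Lines,   # lines of the edited text file
        [int]$Offset = 0    # assumed: PDF page number minus printed page number
    )
    foreach ($line in $Lines) {
        # Title, then a run of dots/spaces, then a trailing page number
        if ($line -match '^\s*(.+?)[\s.]+(\d+)\s*$') {
            'BookmarkBegin'
            "BookmarkTitle: $($Matches[1])"
            'BookmarkLevel: 1'
            "BookmarkPageNumber: $([int]$Matches[2] + $Offset)"
        }
    }
}

$toc = 'Chapter 1 Introduction ..... 1', 'Chapter 2 Methods ..... 15'
$info = ConvertTo-BookmarkInfo -Lines $toc -Offset 12
$info
```

Lines that do not end in a page number are skipped, so it is worth checking the edited file for entries that wrapped onto two lines. Records like these can be saved to a file and applied with pdftk's update_info operation, for example: pdftk <pdf> update_info <info file> output <new pdf>.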
