necm1/pdf-parse2

PDF Parse

A pure JavaScript, cross-platform module designed for extracting text from PDF files using [pdf.js](https://mozilla.github.io/pdf.js/).

Features

  • Extract text from PDF files.
  • Supports both browser and Node.js environments.
  • Easy to use, with a promise-based API.

Installation

npm install pdf-parse2

Or

yarn add pdf-parse2

Usage

Node.js

const fs = require('fs');
const PDFParse = require('pdf-parse2');

(async () => {
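  // Read the PDF into a Buffer; loadPDF accepts a Buffer or an ArrayBuffer.
  const dataBuffer = fs.readFileSync('path/to/your/document.pdf');
  // Give the instance its own name so it does not shadow the imported class.
  const parser = new PDFParse();

  try {
    const pdfData = await parser.loadPDF(dataBuffer);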
    console.log('Text:', pdfData.text);
  } catch (error) {
    console.error(error);
  }
})();

Browser

Make sure the pdf.js library is included in your project. You can then use PDFParse as in the Node.js example, except that the PDF file is fetched with the Fetch API or XMLHttpRequest instead of being read from disk.
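
The browser flow described above could look roughly like the sketch below. It is not taken from the library itself: `extractTextFromUrl` and its `fetchImpl` parameter are hypothetical names introduced for illustration, and the sketch assumes that a `PDFParse` instance is already available in the page (how the module is bundled or exposed in the browser is not specified here). It relies only on `loadPDF` accepting an ArrayBuffer and resolving to an object with a `text` property, as in the Node.js example.

```javascript
// Hypothetical helper: fetch a PDF and return its extracted text.
// `parser` is assumed to be a PDFParse instance created elsewhere.
// `fetchImpl` is injectable so the helper can be exercised without a network.
async function extractTextFromUrl(url, parser, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  // Read the response body as an ArrayBuffer, one of the documented src types.
  const arrayBuffer = await response.arrayBuffer();
  const pdfData = await parser.loadPDF(arrayBuffer);
  return pdfData.text;
}
```

Assuming a parser instance named `parser`, a call such as `extractTextFromUrl('/files/document.pdf', parser)` would return a promise for the extracted text. Checking `response.ok` first avoids handing an HTML error page to the parser.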

API Reference

  • loadPDF(src, options): Loads a PDF file and extracts text. src can be a Buffer or ArrayBuffer. options is optional.

  • renderPage(pageData, options): A helper function for rendering a single page. This function is used internally by loadPDF.
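
As a rough illustration of the `loadPDF(src, options)` signature, the sketch below wraps the call in a small guard that rejects anything other than the two documented source types. The helper names `isSupportedSource` and `loadText` are hypothetical and not part of the library. The shape of `options` is not documented here, so the sketch passes it through untouched.

```javascript
// Hypothetical guard: true when src is one of the documented input types.
function isSupportedSource(src) {
  // Buffer only exists in Node.js, so check for it before using it.
  const isNodeBuffer = typeof Buffer !== 'undefined' && Buffer.isBuffer(src);
  return isNodeBuffer || src instanceof ArrayBuffer;
}

// Hypothetical wrapper around loadPDF that returns only the extracted text.
// `parser` is assumed to be a PDFParse instance.
async function loadText(parser, src, options) {
  if (!isSupportedSource(src)) {
    throw new TypeError('src must be a Buffer or an ArrayBuffer');
  }
  // options is optional, so forward it only when the caller supplied one.
  const pdfData = options === undefined
    ? await parser.loadPDF(src)
    : await parser.loadPDF(src, options);
  return pdfData.text;
}
```

Failing early with a `TypeError` gives a clearer message than whatever the underlying parser would report for a string path or other unsupported input.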

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an issue for any bugs or feature requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.
