lumina-ai-inc/chunkr

Chunkr | Open Source Document Intelligence API

Production-ready API service for document layout analysis, OCR, and semantic chunking.
Convert PDFs, PPTs, Word docs & images into RAG/LLM-ready chunks.

Layout Analysis | OCR + Bounding Boxes | Structured HTML and Markdown | VLM Processing Controls

Try it out! · Report Bug · Contact

(Super) Quick Start

  1. Go to chunkr.ai
  2. Make an account and copy your API key
  3. Install our Python SDK:
    pip install chunkr-ai
  4. Use the SDK to process your documents:
    from chunkr_ai import Chunkr
    
    # Initialize with your API key from chunkr.ai
    chunkr = Chunkr(api_key="your_api_key")
    
    # Upload a document (URL or local file path)
    url = "https://chunkr-web.s3.us-east-1.amazonaws.com/landing_page/input/science.pdf"
    task = chunkr.upload(url)
    
    # Export results in various formats
    task.html(output_file="output.html")
    task.markdown(output_file="output.md")
    task.content(output_file="output.txt")
    task.json(output_file="output.json")
    
    # Clean up
    chunkr.close()
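
As a minimal sketch of the export step above, the helper below wraps the same calls in try/finally so the client is closed even if the upload or an export fails. It relies only on the methods shown above (`upload`, `html`, `markdown`, `content`, `json`, `close`); the function name `export_all` and the `out_dir` parameter are hypothetical and not part of the SDK.

```python
from pathlib import Path


def export_all(client, source, out_dir="output"):
    """Upload `source` and write the four export formats into `out_dir`.

    `client` is expected to behave like the Chunkr object from the
    quick start; `export_all` itself is a hypothetical helper.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {
        "html": out / "output.html",
        "markdown": out / "output.md",
        "content": out / "output.txt",
        "json": out / "output.json",
    }
    try:
        task = client.upload(source)
        task.html(output_file=str(paths["html"]))
        task.markdown(output_file=str(paths["markdown"]))
        task.content(output_file=str(paths["content"]))
        task.json(output_file=str(paths["json"]))
    finally:
        # Always release the client, even when a step above raised.
        client.close()
    return paths
```

With the SDK installed, this could be called as `export_all(Chunkr(api_key="your_api_key"), url)`.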

Documentation

Visit our docs for more information and examples.

Self-Hosted Deployment Options

Quick Start with Docker Compose

  1. Prerequisites: Docker with Docker Compose, and an NVIDIA CUDA GPU for the GPU setup below.

  2. Clone the repo:

git clone https://github.com/lumina-ai-inc/chunkr
cd chunkr
  3. Set up environment variables:
# Copy the example environment file
cp .env.example .env

# Configure your environment variables
# Required: LLM_KEY as your OpenAI API key
  4. Start the services:

With GPU (requires an NVIDIA CUDA GPU):

docker compose up -d
  5. Access the services:
    • Web UI: http://localhost:5173
    • API: http://localhost:8000

  6. Stop the services when done:
docker compose down
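
After `docker compose up -d`, the containers may take a while to accept requests. The sketch below polls the two addresses listed above until each one returns an HTTP response. It uses only the Python standard library; the function names `is_up` and `wait_for` are hypothetical, and treating any HTTP response (including an error status) as "up" is an assumption, since no dedicated health endpoint is documented here.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_up(url, timeout=3.0):
    """Return True if `url` answers with any HTTP response."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server replied, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timed out, or not speaking HTTP yet.
        return False


def wait_for(url, attempts=30, delay=2.0):
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_up(url):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_for("http://localhost:8000")` and `wait_for("http://localhost:5173")` return True once the API and Web UI respond.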

Deployment with Kubernetes

For production environments, we provide a Helm chart and detailed deployment instructions:

  1. See our detailed guide at kube/README.md
  2. Includes configurations for high availability and scaling

For enterprise support and deployment assistance, contact us.

Licensing

The core of this project is dual-licensed:

  1. GNU Affero General Public License v3.0 (AGPL-3.0)
  2. Commercial License

To use Chunkr without complying with the AGPL-3.0 license terms, you need a commercial license: contact us or visit our website.
