📜 DocETL: Powering Complex Document Processing Pipelines

DocETL is a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks. It offers:

An interactive UI playground for iterative prompt engineering and pipeline development
A Python package for running production pipelines from the command line or Python code

🌟 Community Projects

📚 Educational Resources

🚀 Getting Started

There are two main ways to use DocETL:

1. 🎮 Interactive UI Playground (Recommended for Development)

The UI Playground helps you iteratively develop your pipeline:

Experiment with different prompts and see results in real-time
Build your pipeline step by step
Export your finalized pipeline configuration for production use

To run the playground locally, you can either:

Use Docker (recommended for quick start): make docker
Set up the development environment manually

See the Playground Setup Guide for detailed instructions.

2. 📦 Python Package (For Production Use)

If you want to use DocETL as a Python package:

Prerequisites

Python 3.10 or later
An OpenAI API key (or the API key for the LLM provider of your choice)
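Before installing, you can confirm both prerequisites from Python. The sketch below uses a hypothetical helper, check_prerequisites, which is not part of DocETL. It checks only the interpreter version and whether OPENAI_API_KEY is set in the current process environment. It does not verify that the key is valid, and it does not read a .env file.

```python
from __future__ import annotations  # keeps annotations lazy on older Pythons

import os
import sys
from typing import Mapping, Sequence


def check_prerequisites(
    env: Mapping[str, str] = os.environ,
    version: Sequence[int] = sys.version_info,
) -> list[str]:
    """Return human-readable problems; an empty list means both checks pass.

    Hypothetical helper: covers only the two prerequisites listed above.
    """
    problems: list[str] = []
    # DocETL requires Python 3.10 or later.
    if tuple(version[:2]) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    # Only the process environment is inspected, not any .env file.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

For example, print(check_prerequisites()) prints [] when both prerequisites are met. Passing env and version explicitly makes the helper easy to test without touching the real environment.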

Install the package with pip:

pip install docetl

Create a .env file in your project directory:

OPENAI_API_KEY=your_api_key_here  # Required for LLM operations (or the key for the LLM of your choice)
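The .env file uses simple KEY=VALUE lines. The following is a minimal, standard-library-only sketch of how such lines are typically parsed, which can help you check what your file actually contains. It is not DocETL's loader: how DocETL reads .env is not described here, and libraries such as python-dotenv support more syntax. The comment and quoting rules in parse_env are assumptions stated in its docstring.

```python
from __future__ import annotations

import re

# KEY=VALUE with a shell-style identifier as the key and an optional "export ".
_LINE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple .env content into a dict (minimal sketch).

    Assumed rules: blank lines and lines starting with '#' are skipped;
    an unquoted value ends at ' #' (an inline comment); matching single or
    double quotes around a value are stripped. Escapes and multi-line
    values are not supported.
    """
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            continue  # skip malformed lines instead of failing
        key, value = match.group(1), match.group(2).strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # quoted: keep everything inside, even '#'
        else:
            value = value.split(" #", 1)[0].rstrip()  # drop inline comment
        env[key] = value
    return env
```

Applied to the line above, parse_env returns {"OPENAI_API_KEY": "your_api_key_here"}, with the inline comment removed.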

To see examples of how to use DocETL, check out the tutorial.

🎮 UI Playground Setup

To run the UI playground locally, you have two options:

Option A: Using Docker (Recommended for Quick Start)

The easiest way to get the playground running:

Create the required environment files:

Create .env in the root directory:

OPENAI_API_KEY=your_api_key_here
BACKEND_ALLOW_ORIGINS=
BACKEND_HOST=0.0.0.0
BACKEND_PORT=8000
BACKEND_RELOAD=True
FRONTEND_HOST=0.0.0.0
FRONTEND_PORT=3000
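A typo in this file (a missing key, a non-numeric port, or both servers on the same port) tends to surface only once the container starts. The hypothetical checker below, validate_root_env, takes the key/value pairs as a dict and reports such problems up front. The list of keys comes from the example above; treating BACKEND_RELOAD as a True/False flag is an assumption.

```python
from __future__ import annotations

from typing import Mapping

# Keys taken from the root .env example above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "BACKEND_ALLOW_ORIGINS",
    "BACKEND_HOST",
    "BACKEND_PORT",
    "BACKEND_RELOAD",
    "FRONTEND_HOST",
    "FRONTEND_PORT",
)


def _is_port(value: str) -> bool:
    """True if value is a decimal TCP port number in 1..65535."""
    return value.isascii() and value.isdigit() and 1 <= int(value) <= 65535


def validate_root_env(env: Mapping[str, str]) -> list[str]:
    """Return problems found in the root .env settings (hypothetical checker).

    An empty value is accepted for BACKEND_ALLOW_ORIGINS, as in the example.
    """
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in env]
    for key in ("BACKEND_PORT", "FRONTEND_PORT"):
        if key in env and not _is_port(env[key]):
            problems.append(f"{key} must be a port number, got {env[key]!r}")
    # Both servers run side by side, so they cannot share a port.
    if "BACKEND_PORT" in env and env.get("BACKEND_PORT") == env.get("FRONTEND_PORT"):
        problems.append("BACKEND_PORT and FRONTEND_PORT must differ")
    # Assumption: the backend reads BACKEND_RELOAD as a True/False flag.
    reload_flag = env.get("BACKEND_RELOAD")
    if reload_flag is not None and reload_flag.lower() not in ("true", "false"):
        problems.append(f"BACKEND_RELOAD should be True or False, got {reload_flag!r}")
    return problems
```

With the exact values shown above, validate_root_env returns an empty list.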

Create .env.local in the website directory:

OPENAI_API_KEY=sk-xxx
OPENAI_API_BASE=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini

NEXT_PUBLIC_BACKEND_HOST=localhost
NEXT_PUBLIC_BACKEND_PORT=8000
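Next.js exposes only environment variables prefixed with NEXT_PUBLIC_ to browser code, so NEXT_PUBLIC_BACKEND_HOST and NEXT_PUBLIC_BACKEND_PORT are presumably what the browser uses to reach the backend. Assuming the frontend combines them into a plain http://host:port URL (an assumption, not confirmed by this README), the hypothetical helper backend_url below builds that URL and rejects a port that disagrees with BACKEND_PORT in the root .env, since a mismatch would point the UI at the wrong port.

```python
from __future__ import annotations

from typing import Mapping


def backend_url(local_env: Mapping[str, str], root_env: Mapping[str, str]) -> str:
    """Build the backend URL the UI would call (assumed http://host:port form).

    Raises ValueError if website/.env.local and the root .env disagree on the port.
    """
    # Fallbacks mirror the example values above; they are illustrative only.
    host = local_env.get("NEXT_PUBLIC_BACKEND_HOST", "localhost")
    port = local_env.get("NEXT_PUBLIC_BACKEND_PORT", "8000")
    expected = root_env.get("BACKEND_PORT")
    if expected is not None and port != expected:
        raise ValueError(f"NEXT_PUBLIC_BACKEND_PORT={port} but BACKEND_PORT={expected}")
    return f"http://{host}:{port}"
```

With the example files, backend_url returns "http://localhost:8000".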

Run Docker:

make docker

This will:

Create a Docker volume for persistent data
Build the DocETL image
Run the container with the UI accessible at http://localhost:3000
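The container may need some time after make docker before the UI responds. A minimal standard-library sketch for polling http://localhost:3000 until it answers is below. The function name wait_for_http and its retry defaults are illustrative assumptions, not part of DocETL or its Makefile.

```python
from __future__ import annotations

import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(
    url: str, attempts: int = 30, delay: float = 2.0, timeout: float = 5.0
) -> bool:
    """Poll url until a server answers; return False if it never does."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout):
                return True
        except urllib.error.HTTPError:
            # Any HTTP status (even 404 or 500) means something is listening.
            return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, or timed out: not ready yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, wait_for_http("http://localhost:3000") returns True once the UI is reachable, or False if it is still unreachable after all attempts. HTTPError is caught before the broader OSError because it is a subclass of it and signals that a server did respond.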

To clean up Docker resources (note that this also deletes the Docker volume, and with it any persisted data):

make docker-clean

Option B: Manual Setup (Development)

For development or if you prefer not to use Docker:

Clone the repository:

git clone https://github.com/ucbepic/docetl.git
cd docetl

Set up environment variables in a .env file in the root (top-level) directory:

OPENAI_API_KEY=your_api_key_here
BACKEND_ALLOW_ORIGINS=
BACKEND_HOST=localhost
BACKEND_PORT=8000
BACKEND_RELOAD=True
FRONTEND_HOST=0.0.0.0
FRONTEND_PORT=3000
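The only difference between this file and the Docker version is BACKEND_HOST: inside a container the backend presumably binds to 0.0.0.0 so that Docker's published port can reach it, while a manual setup can bind to localhost. If you switch between the two setups, a small generator such as the hypothetical render_root_env below keeps the files consistent; it simply reproduces the two root .env examples in this README.

```python
from __future__ import annotations


def render_root_env(api_key: str, docker: bool) -> str:
    """Render the root .env shown in this README for Docker or manual setup."""
    settings = {
        "OPENAI_API_KEY": api_key,
        "BACKEND_ALLOW_ORIGINS": "",
        # Docker: listen on all interfaces so the published port works;
        # manual: localhost suffices because everything runs on one machine.
        "BACKEND_HOST": "0.0.0.0" if docker else "localhost",
        "BACKEND_PORT": "8000",
        "BACKEND_RELOAD": "True",
        "FRONTEND_HOST": "0.0.0.0",
        "FRONTEND_PORT": "3000",
    }
    # Dicts keep insertion order, so lines come out in the order above.
    return "".join(f"{key}={value}\n" for key, value in settings.items())
```

For example, pathlib.Path(".env").write_text(render_root_env("your_api_key_here", docker=False)) writes the manual-setup file.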

Then create a .env.local file in the website directory with the following contents:

OPENAI_API_KEY=sk-xxx
OPENAI_API_BASE=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini

NEXT_PUBLIC_BACKEND_HOST=localhost
NEXT_PUBLIC_BACKEND_PORT=8000

Install dependencies:

make install      # Install Python package
make install-ui   # Install UI dependencies

Note that the OpenAI API key, API base, and model name in website/.env.local are used only by the UI assistant, not by the DocETL pipeline execution engine.

Start the development server:

make run-ui-dev

Visit http://localhost:3000/playground to access the interactive UI.

🛠️ Development Setup

If you're planning to contribute or modify DocETL, you can verify your setup by running the test suite:

make tests-basic  # Runs basic test suite (costs < $0.01 with OpenAI)

For detailed documentation and tutorials, visit our documentation.
