
yeungadrian/markmagic


markmagic

Convert files into Markdown.

Supported file types (and processing engine):

Getting started

from pathlib import Path
from markmagic import convert_any

with Path("tests/data/docx/msft_pr.docx").open("rb") as f:
    convert_any(filename="msft_pr.docx", file=f)

To OCR a PDF with a vision language model (VLM), create a .env file in the root directory containing your API key:

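API_KEY="REPLACE"

Then pass settings with use_vlm=True to convert_any:

from pathlib import Path
# Assumption: Settings is importable from the package root; adjust the import if it lives elsewhere.
from markmagic import Settings, convert_any

# Enable the vision language model for OCR.
settings = Settings(use_vlm=True)

with Path("tests/data/pdf/msft_ar.pdf").open("rb") as f:
    convert_any(filename="msft_ar.pdf", file=f, settings=settings)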

Design / Limitations

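  • markmagic chooses a converter based only on the file extension; it does not inspect the file contents, so a misnamed file will be handled by the wrong converter
  • markmagic uses python-docx for .docx files, so it cannot extract text from shapes or images (consider using python-mammoth + markdownify instead)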

Goals / Motivation

TODOs:

  • TBD