lastmover/gitbook-fetcher

Gitbook Fetcher

Gitbook Fetcher downloads an entire GitBook website as Markdown files. It finds the pages through the site's sitemap, strips the <nav>, <aside>, and <footer> elements from each page, converts the remaining HTML to Markdown, and saves each page in a folder structure that mirrors its URL path.

Anyone can run this tool locally.

Features

  • Automatic Sitemap Parsing: Provide only the base URL (e.g., https://hyperliquid-co.gitbook.io) and the tool reads the site's sitemap-pages.xml to find the pages.
  • HTML Cleaning: Removes <nav>, <aside>, and <footer> elements.
  • Markdown Conversion: Converts cleaned HTML to Markdown.
  • Folder Structure: Saves files in a folder structure matching the URL paths.
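
The HTML cleaning step can be illustrated with a minimal sketch that uses only the Python standard library. It is not taken from this repository, which may well rely on a different parser; the names ChromeStripper and strip_chrome are hypothetical. The sketch re-emits a page while skipping everything inside <nav>, <aside>, and <footer>, and it also drops comments and the doctype.

```python
from html.parser import HTMLParser

# Tags whose whole subtree is dropped, matching the feature list above.
SKIPPED_TAGS = {"nav", "aside", "footer"}


class ChromeStripper(HTMLParser):
    """Hypothetical helper: re-emit HTML without nav/aside/footer subtrees."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a subtree.
        if self.skip_depth == 0 and tag not in SKIPPED_TAGS:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&#{name};")


def strip_chrome(html: str) -> str:
    """Return html with <nav>, <aside>, and <footer> subtrees removed."""
    parser = ChromeStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

The cleaned string would then be handed to whichever HTML-to-Markdown converter the project uses; that step is left out here because it depends on a third-party library.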

Requirements

  • Python 3.7+
  • uv (for dependency and project management)

Installation

Clone the repository and set up the environment:

git clone https://github.com/lastmover/gitbook-fetcher.git
cd gitbook-fetcher
uv sync

This installs the dependencies declared in pyproject.toml at the versions locked in uv.lock.

Usage

Run the script with your GitBook base URL. For example:

uv run main.py "https://hyperliquid-co.gitbook.io"

This will:

  • Automatically construct the sitemap URL (appending sitemap-pages.xml if necessary).
  • Parse the sitemap and extract page URLs.
  • Download, clean, and convert each page to Markdown.
  • Save the Markdown files in your current directory, following a directory structure that mirrors the GitBook URLs.
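
The sitemap and path-mapping steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in main.py: the function names are hypothetical, the sitemap is assumed to use the standard sitemaps.org namespace, and index.md is an assumed file name for the site root.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse

SITEMAP_NAME = "sitemap-pages.xml"
# Standard sitemap namespace (assumed to be what the site emits).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_url(base_url: str) -> str:
    """Append sitemap-pages.xml unless the URL already ends with it."""
    if base_url.endswith(SITEMAP_NAME):
        return base_url
    return base_url.rstrip("/") + "/" + SITEMAP_NAME


def page_urls(sitemap_xml: str) -> List[str]:
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def output_path(page_url: str) -> PurePosixPath:
    """Map a page URL to a relative .md path that mirrors the URL path."""
    parts = [p for p in urlparse(page_url).path.split("/") if p]
    if not parts:
        return PurePosixPath("index.md")  # assumed name for the site root
    # Append ".md" rather than replacing a suffix, so "v1.2" stays intact.
    return PurePosixPath(*parts[:-1], parts[-1] + ".md")
```

With these helpers, a page at https://hyperliquid-co.gitbook.io/docs/intro (a made-up path) would map to docs/intro.md relative to the current directory.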

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests for improvements or additional features.

License

This project is licensed under the MIT License.


Happy fetching!
