Gitbook Fetcher is a tool that downloads an entire GitBook website as Markdown files. It parses the site's sitemap automatically, strips unwanted page elements (navigation, sidebars, and footers), and saves each page into a folder structure that mirrors its URL path.
Anyone can run this tool locally.
Features:
- Automatic Sitemap Parsing: Just provide the base URL (e.g., `https://hyperliquid-co.gitbook.io`), and the tool will automatically use the site's `sitemap-pages.xml`.
- HTML Cleaning: Removes `<nav>`, `<aside>`, and `<footer>` elements.
- Markdown Conversion: Converts the cleaned HTML to Markdown.
- Folder Structure: Saves files in a folder structure matching the URL paths.
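The HTML-cleaning feature can be approximated with only the standard library. The sketch below is illustrative and is not the project's own code, which may use a different parser: it re-serializes a page with Python's `html.parser.HTMLParser` and drops every `<nav>`, `<aside>`, and `<footer>` subtree. The names `ChromeStripper` and `strip_chrome` are hypothetical, and HTML comments and doctype declarations are discarded as a simplification.

```python
from html import escape
from html.parser import HTMLParser

# Elements whose whole subtree is removed, per the feature list above.
STRIP_TAGS = {"nav", "aside", "footer"}


class ChromeStripper(HTMLParser):
    """Hypothetical cleaner: re-emits HTML, skipping STRIP_TAGS subtrees."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []  # collected output fragments
        self.depth = 0   # how many stripped elements we are currently inside

    @staticmethod
    def _attrs(attrs) -> str:
        # Rebuild the attribute string; value is None for bare attributes.
        return "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag in STRIP_TAGS:
            self.depth += 1
        elif not self.depth:
            self.parts.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if not self.depth and tag not in STRIP_TAGS:
            self.parts.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag in STRIP_TAGS:
            self.depth = max(0, self.depth - 1)
        elif not self.depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with entities decoded, so re-escape it on output.
        if not self.depth:
            self.parts.append(escape(data, quote=False))


def strip_chrome(html: str) -> str:
    """Return `html` with <nav>, <aside> and <footer> elements removed."""
    parser = ChromeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

For example, `strip_chrome("<nav>Menu</nav><p>Body</p>")` returns `"<p>Body</p>"`. Only the stripped tags are counted, which keeps the parser small but assumes those elements are properly nested in the page markup.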
Requirements:
- Python 3.7+
- uv (for dependency and project management)
Clone the repository and set up the environment:
```bash
git clone https://github.com/lastmover/gitbook-fetcher.git
cd gitbook-fetcher
uv sync
```

This installs all the dependencies specified in `pyproject.toml`, at the versions locked in `uv.lock`.
Run the script with your GitBook base URL. For example:
```bash
uv run main.py "https://hyperliquid-co.gitbook.io"
```
This will:
- Automatically construct the sitemap URL (appending `sitemap-pages.xml` if necessary).
- Parse the sitemap and extract the page URLs.
- Download, clean, and convert each page to Markdown.
- Save the Markdown files in your current directory, following a directory structure that mirrors the GitBook URLs.
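The URL-handling steps in this list (building the sitemap URL, extracting page URLs, and mapping each page to a local file) can be sketched as follows. This is an illustration, not the tool's `main.py`: all function names are hypothetical, "if necessary" is assumed to mean "unless the URL already ends in `sitemap-pages.xml`", the sitemap is assumed to follow the standard sitemaps.org `<urlset>` format, and saving the site root as `index.md` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

SITEMAP_NAME = "sitemap-pages.xml"
# Namespace from the sitemaps.org protocol, used by <urlset>, <url> and <loc>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_url(base_url: str) -> str:
    """Build the sitemap URL from a GitBook base URL.

    Assumption: "if necessary" means "unless the URL already ends in
    sitemap-pages.xml".
    """
    if base_url.endswith(SITEMAP_NAME):
        return base_url
    return base_url.rstrip("/") + "/" + SITEMAP_NAME


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the raw response body."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def page_urls(sitemap_xml: bytes) -> list:
    """Return the text of every <url><loc> entry in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def local_path(page_url: str, out_dir: str = ".") -> Path:
    """Map a page URL to a .md file whose folders mirror the URL path."""
    # Drop empty, "." and ".." segments so a URL cannot escape out_dir.
    parts = [p for p in urlparse(page_url).path.split("/") if p not in ("", ".", "..")]
    if not parts:
        parts = ["index"]  # Assumption: the site root is saved as index.md.
    # Append the suffix by hand: Path.with_suffix would clobber names like "v1.2".
    return Path(out_dir, *parts[:-1], parts[-1] + ".md")
```

Chained together, a run would iterate over `page_urls(fetch(sitemap_url(base)))`, fetch, clean, and convert each page, then write it to `local_path(url)` after creating its folders with `Path.mkdir(parents=True, exist_ok=True)`. For instance, `local_path("https://hyperliquid-co.gitbook.io/docs/intro")` gives `docs/intro.md`.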
Contributions are welcome! Feel free to open issues or submit pull requests for improvements or additional features.
This project is licensed under the MIT License.
Happy fetching!