DokuWiki Dumper

A tool for archiving DokuWiki.

We recommend running dokuWikiDumper on a modern filesystem such as ext4 or btrfs. NTFS is not recommended because it disallows many special characters in filenames.

For webmasters

If you don’t want your wiki to be archived, add the following to your domain/robots.txt:

```
User-agent: dokuWikiDumper
Disallow: /
```
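To check that a robots.txt with this rule really blocks the `dokuWikiDumper` user agent, a webmaster could test it with Python's standard `urllib.robotparser`. This is only a sketch: the URL and the `SomeOtherBot` agent are placeholders, and it does not show how dokuWikiDumper itself evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rule from above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: dokuWikiDumper
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL; substitute your own wiki.
url = "https://example.com/wiki/doku.php"
dumper_allowed = is_allowed(ROBOTS_TXT, "dokuWikiDumper", url)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", url)  # hypothetical agent
print(dumper_allowed, other_allowed)
```

The rule only names `dokuWikiDumper`, so other crawlers are unaffected by it.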

Requirements

dokuWikiDumper

Python 3.8+ (developed on py3.10)
beautifulsoup4
requests
lxml
rich

dokuWikiUploader

Uploads a wiki dump to the Internet Archive. Run `dokuWikiUploader -h` for help.

internetarchive
p7zip (provides the `7z` command; install the `p7zip-full` package)
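A quick way to see which of these requirements are missing from the current environment is sketched below. The import names are assumptions based on how these packages are normally imported (`beautifulsoup4` is imported as `bs4`), and the helper functions are illustrative only, not part of dokuWikiDumper.

```python
import importlib.util
import shutil

# Import names for the packages listed above (assumed; beautifulsoup4 -> bs4).
DUMPER_MODULES = ["bs4", "requests", "lxml", "rich"]
UPLOADER_MODULES = ["internetarchive"]
UPLOADER_TOOLS = ["7z"]  # command provided by p7zip


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names):
    """Return the command names that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]


print("missing dumper modules:", missing_modules(DUMPER_MODULES))
print("missing uploader modules:", missing_modules(UPLOADER_MODULES))
print("missing uploader tools:", missing_tools(UPLOADER_TOOLS))
```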

Install `dokuWikiDumper`

dokuWikiUploader is included in dokuWikiDumper.

Install `dokuWikiDumper` with `pip` (recommended)

https://pypi.org/project/dokuwikidumper/

```
pip3 install dokuWikiDumper
```

Install `dokuWikiDumper` with `Poetry` (for developers)

Install Poetry
```
pip3 install poetry
```

Install dokuWikiDumper

```
git clone https://github.com/saveweb/dokuwiki-dumper
cd dokuwiki-dumper
poetry install
rm -rf dist/
poetry build
pip install --force-reinstall dist/dokuWikiDumper*.whl
```

Usage

```
usage: dokuWikiDumper [-h] [--content] [--media] [--html] [--pdf] [--current-only] [--skip-to SKIP_TO] [--path PATH] [--no-resume] [--threads THREADS]
                      [--i-love-retro] [--insecure] [--ignore-errors] [--ignore-action-disabled-edit] [--ignore-disposition-header-missing]
                      [--trim-php-warnings] [--delay DELAY] [--retry RETRY] [--hard-retry HARD_RETRY] [--parser PARSER] [--username USERNAME]
                      [--password PASSWORD] [--cookies COOKIES] [--auto] [-u] [-g UPLOADER_ARGS]
                      url

dokuWikiDumper Version: 0.1.31

positional arguments:
  url                   URL of the dokuWiki (provide the doku.php URL)

options:
  -h, --help            show this help message and exit
  --current-only        Dump latest revision, no history [default: false]
  --skip-to SKIP_TO     !DEV! Skip to title number [default: 0]
  --path PATH           Specify dump directory [default: <site>-<date>]
  --no-resume           Do not resume a previous dump [default: resume]
  --threads THREADS     Number of sub threads to use [default: 1], not recommended to set > 5
  --i-love-retro        Do not check the latest version of dokuWikiDumper (from pypi.org) before running [default: False]
  --insecure            Disable SSL certificate verification
  --ignore-errors       !DANGEROUS! ignore errors in the sub threads. This may cause incomplete dumps.
  --ignore-action-disabled-edit
                        Some sites disable edit action for anonymous users and some core pages. This option will ignore this error and textarea not found
                        error.But you may only get a partial dump. (only works with --content)
  --ignore-disposition-header-missing
                        Do not check Disposition header, useful for outdated (<2014) DokuWiki versions [default: False]
  --trim-php-warnings   Trim PHP warnings from requests.Response.text
  --delay DELAY         Delay between requests [default: 0.0]
  --retry RETRY         Maximum number of retries [default: 5]
  --hard-retry HARD_RETRY
                        Maximum number of retries for hard errors [default: 3]
  --parser PARSER       HTML parser [default: lxml]
  --username USERNAME   login: username
  --password PASSWORD   login: password
  --cookies COOKIES     cookies file
  --auto                dump: content+media+html, threads=3, ignore-action-disable-edit. (threads is overridable)
  -u, --upload          Upload wikidump to Internet Archive after successfully dumped (only works with --auto)
  -g UPLOADER_ARGS, --uploader-arg UPLOADER_ARGS
                        Arguments for uploader.

Data to download:
  What info download from the wiki

  --content             Dump content
  --media               Dump media
  --html                Dump HTML
  --pdf                 Dump PDF (Only available on some wikis with the PDF export plugin) (Only dumps the latest PDF revision)
```

For most cases, you can use `--auto` to dump the site:

```
dokuWikiDumper https://example.com/wiki/ --auto
```

which is equivalent to

```
dokuWikiDumper https://example.com/wiki/ --content --media --html --threads 3 --ignore-action-disabled-edit
```
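When dumping several wikis from a script, it can help to build the command line programmatically. The sketch below only assembles an argument list from flags that appear in the help text above; the function name `build_dump_command` is hypothetical and not part of dokuWikiDumper.

```python
import shlex


def build_dump_command(url, threads=None, delay=None, cookies=None):
    """Return the argv list for a `dokuWikiDumper --auto` run."""
    argv = ["dokuWikiDumper", url, "--auto"]
    if threads is not None:
        # The help text notes that threads is overridable with --auto.
        argv += ["--threads", str(threads)]
    if delay is not None:
        argv += ["--delay", str(delay)]
    if cookies is not None:
        argv += ["--cookies", cookies]
    return argv


argv = build_dump_command("https://example.com/wiki/", threads=2, delay=0.5)
command_line = shlex.join(argv)
print(command_line)
# To actually start the dump: subprocess.run(argv, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems with unusual URLs or file paths.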

We highly recommend logging in with `--username` and `--password` (or with `--cookies`), because some sites prevent anonymous users from accessing certain pages or viewing the raw wikitext.

`--cookies` accepts a Netscape cookies file; you can use the cookies.txt extension to export cookies from Firefox. It also accepts a JSON cookies file created by Cookie Quick Manager.
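Before passing an exported Netscape cookies file to `--cookies`, you may want to confirm that it parses. A minimal sketch using the standard library's `http.cookiejar.MozillaCookieJar` follows; the sample cookie name and value are made up, and dokuWikiDumper's own loader may behave differently. The JSON format is not covered here.

```python
import http.cookiejar
import os
import tempfile


def load_cookie_names(path):
    """Parse a Netscape-format cookies file and return the cookie names."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session and expired cookies so that the whole file is inspected.
    jar.load(ignore_discard=True, ignore_expires=True)
    return sorted(cookie.name for cookie in jar)


# A made-up sample file: header line plus one tab-separated cookie record
# (domain, include-subdomains, path, secure, expiry, name, value).
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    "example.com\tFALSE\t/\tFALSE\t4102444800\tDokuWiki\tabc123\n"
)

with tempfile.TemporaryDirectory() as tmp:
    cookies_path = os.path.join(tmp, "cookies.txt")
    with open(cookies_path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)
    cookie_names = load_cookie_names(cookies_path)

print(cookie_names)
```

If the file is not in Netscape format, `load` raises `http.cookiejar.LoadError`.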

Dump structure

| Directory or File | Description |
|---|---|
| `attic/` | Old revisions of pages (wikitext). |
| `dumpMeta/` | (dokuWikiDumper only) Metadata of the dump. |
| `dumpMeta/check.html` | `?do=check` page of the wiki. |
| `dumpMeta/config.json` | The dump's configuration. |
| `dumpMeta/favicon.ico` | Favicon of the site. |
| `dumpMeta/files.txt` | List of filenames. |
| `dumpMeta/index.html` | Homepage of the wiki. |
| `dumpMeta/info.json` | Information about the wiki. |
| `dumpMeta/titles.txt` | List of page titles. |
| `html/` | (dokuWikiDumper only) HTML of the pages. |
| `media/` | Media files. |
| `meta/` | Metadata of the pages. |
| `pages/` | Latest page content (wikitext). |
| `*.mark` | Mark file. |
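To get a quick overview of a finished dump, you could count the files in each of the directories listed above. This is a rough sketch: the function name `summarize_dump` is hypothetical, and the demo builds a tiny fake dump in a temporary directory, with made-up file names, instead of reading a real one.

```python
import tempfile
from pathlib import Path

# Top-level directories from the dump structure table.
DUMP_DIRS = ["attic", "dumpMeta", "html", "media", "meta", "pages"]


def summarize_dump(dump_dir):
    """Return {directory name: number of files under it} for a dump."""
    root = Path(dump_dir)
    summary = {}
    for name in DUMP_DIRS:
        sub = root / name
        if sub.is_dir():
            summary[name] = sum(1 for path in sub.rglob("*") if path.is_file())
        else:
            summary[name] = 0  # directory absent, e.g. --media was not used
    return summary


# Demo on a fake dump with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp)
    (fake / "pages").mkdir()
    (fake / "pages" / "start.txt").write_text("Hello", encoding="utf-8")
    (fake / "attic").mkdir()
    (fake / "attic" / "start.1700000000.txt").write_text("Old", encoding="utf-8")
    summary = summarize_dump(fake)

print(summary)
```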

Available Backups/Dumps

Check out: https://archive.org/search?query=subject%3A"dokuWikiDumper"

How to import dump to DokuWiki

To import a dump into DokuWiki, add the following configuration to local.php:

```
$conf['fnencode'] = 'utf-8'; // Dokuwiki default: 'safe' (url encode)
# 'safe' => Non-ASCII characters will be escaped as %xx form.
# 'utf-8' => Non-ASCII characters will be preserved as UTF-8 characters.

$conf['compression'] = '0'; // Dokuwiki default: 'gz'.
# 'gz' => attic/<id>.<rev_id>.txt.gz
# 'bz2' => attic/<id>.<rev_id>.txt.bz2
# '0' => attic/<id>.<rev_id>.txt
```
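The effect of the `compression` setting on attic file names can be illustrated with a small sketch. It only restates the three patterns from the comments above; the function name, the page id `start`, and the revision id are made up for the example.

```python
# Suffix used for old revisions under each `compression` value (from the comments above).
ATTIC_SUFFIX = {"gz": ".txt.gz", "bz2": ".txt.bz2", "0": ".txt"}


def attic_filename(page_id, rev_id, compression="0"):
    """Return the attic path for one revision under the given compression setting."""
    return f"attic/{page_id}.{rev_id}{ATTIC_SUFFIX[compression]}"


# Hypothetical page id and revision id.
plain_name = attic_filename("start", 1700000000)
gz_name = attic_filename("start", 1700000000, compression="gz")
print(plain_name)
print(gz_name)
```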

Import the pages dir if you only need the latest version of each page.
Import the meta dir if you need the page changelogs.
Import the attic and meta dirs if you need the content of old revisions.
Import the media dir if you need the media files.

The dumpMeta and html dirs are only used by dokuWikiDumper; you can ignore them.
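The selection rules above can be sketched as a small copy script. It assumes that "importing" a directory means copying it into the directory of the same name under the DokuWiki data directory; that mapping, the function name `import_dump`, and all paths are assumptions for illustration, so check them against your installation and back up first.

```python
import shutil
import tempfile
from pathlib import Path


def import_dump(dump_dir, data_dir, parts=("pages", "meta", "attic", "media")):
    """Copy the chosen dump directories into `data_dir`; return the names copied."""
    copied = []
    for name in parts:
        src = Path(dump_dir) / name
        if src.is_dir():
            # dirs_exist_ok (Python 3.8+) merges into an existing directory.
            shutil.copytree(src, Path(data_dir) / name, dirs_exist_ok=True)
            copied.append(name)
    return copied


# Demo with temporary directories standing in for a dump and a data directory.
with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "dump"
    data = Path(tmp) / "data"
    (dump / "pages").mkdir(parents=True)
    (dump / "pages" / "start.txt").write_text("Hello", encoding="utf-8")
    (dump / "html").mkdir()  # dumper-only directory, never copied
    data.mkdir()
    copied = import_dump(dump, data, parts=("pages", "media"))
    imported_page = (data / "pages" / "start.txt").read_text(encoding="utf-8")
    html_copied = (data / "html").exists()

print(copied, imported_page, html_copied)
```

The default `parts` leaves out `dumpMeta` and `html`, matching the note that those directories can be ignored.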

Information

DokuWiki links

Other tools

wikiteam/WikiTeam, a tool for archiving MediaWiki, written in Python 2, which you won't want to use nowadays. :(
mediawiki-client-tools/MediaWiki Scraper (aka wikiteam3), a tool for archiving MediaWiki, forked from WikiTeam and rewritten in Python 3. (For lack of code writers and reviewers, STWP no longer maintains this repo.)
saveweb/WikiTeam3, forked from MediaWiki Scraper and maintained by STWP. :)
DigitalDwagon/WikiBot, a Discord and IRC bot that runs dokuWikiDumper and wikiteam3 in the background.

License

GPLv3

Contributors

This tool is based on an unmerged PR (8 years ago!) of WikiTeam: DokuWiki dump alpha by @PiRSquared17.

I (@yzqzss) have rewritten the code in Python 3 and added ~~some features, also fixed~~ some bugs.
