/!\ Important: **The script is not working anymore.** I don’t have time to maintain this script, and scribd’s interface changes quite often and I can’t update the script every month. I saw that now some alternative exists, so I don’t think I’ll continue to maintain it, but if someone wants to send some PR, feel free! The current code is not complicated to modify ;)
This script is a very short python script whose aim is to download scribd document into a PDF file.
Depending on what you need, you have several ways to install this script. Or you can use the online docker image (slower, but you are sure to have the good firefox/selenium/geckodriver version), or you can just install the python deps yourself and run it.
To use this script you first need to make sure that firefox
, python3
and the python libraries selenium
and fpdf
are installed. Note that it may be better to setup all of these library inside a virtualenv
to avoid version clash.
On debian-like systems, you can proceed using something like this:
sudo apt install firefox python3 python3-pip sudo pip3 install selenium sudo pip3 install fpdf sudo pip3 install Pillow
And if you prefer the virtualenv
version (make sure you are in an empty folder):
sudo apt install firefox python3 python3-pip pip3 install --user virtualenv pip3 install --upgrade virtualenv virtualenv -p python3 venv source venv/bin/activate pip3 install selenium pip3 install fpdf pip3 install Pillow
Then, download this script :
git clone https://github.com/tobiasBora/scribd-downloader-3.git
Make sure it’s executable
chmod +x scribd_downloader_3.py
as well as the last driver geckodriver available at https://github.com/mozilla/geckodriver/releases (make sure it’s the version corresponding to your hardware):
wget https://github.com/mozilla/geckodriver/releases/download/v0.19.0/geckodriver-v0.19.0-linux64.tar.gz tar zxvf geckodriver-v0.19.0-linux64.tar.gz
Great, you can now use the script !
First, make sure you have a recent docker installed (the docker in the repository are outdated most of the time). Then, just run:
sudo docker run -it --shm-size 2g -v $(pwd):/host -w /host tobiasbora/scribd-downloader:18.01 bash
It will download the online docker image, mount your local folder in /host
, and run a bash in this folder (the --shm-size
is very important if you don’t want firefox to crash). Then, you can simply run this command (don’t forget the xvfb-run
):
xvfb-run scribd_downloader_3.py <your url> <your pdf.pdf>
Ex:
xvfb-run scribd_downloader_3.py "https://www.scribd.com/doc/63942746/chopin-nocturne-n-20-partition" out.pdf
- convert into a pdf file an online scribd document
- deal with blured pages that would require an account
- deal with document with different page size
To use the script it’s pretty easy. Here is the general usage:
./scribd_downloader_3.py -p <PATH GECKODRIVER> <URL> <PDF OUTPUT>
(NB : if geckodriver
is in your global path, you can remove the flag p
)
For example if geckodriver
is in the same folder as the script, you can run something like:
./scribd_downloader_3.py -p . "https://www.scribd.com/document/31698781/Constitution-of-the-Mexican-Mafia-in-Texas" out.pdf
If you are on a headless server, or if the firefox window that is open bother you, you may want to install xvfb
and then run the same command prefixed with xvfb-run
:
xvfb-run ./scribd_downloader_3.py -p <PATH GECKODRIVER> <URL> <PDF OUTPUT>
the script will now run blindly !
By default, this script waits 1s between each “screenshot” to be sure the page is loaded. If you have a slow connection, this may be too short, creating some blank pages, and if you have a good connection this may be to big, and you may want to decrease this time to download the page quicker.
For example if you want to wait 0.2 seconds (I even tried 0s on a 77 pages document and it worked without any trouble) between each screenshot, just use the option “-w 0.2”:
Example:
./scribd_downloader_3.py -p . https://www.scribd.com/document/359303091/A-Repository-of-Unix-History-and-Evolution unix_history.pdf -w 0.2
I spent only a few hours to make this script, so it’s not supposed to be perfect, but for what I tried it works pretty nicely. However, because scribd is changing often, it may not be working, so if you have any trouble, please fill an issue here or edit the code (it should not be too complicated, the code is pretty straight forward and simple) and do a pull request.