LearnCPP Downloader

Multi-threaded web scraper to download all the tutorials from www.learncpp.com and convert them to PDF files concurrently.

Support ❤️

Please support here: https://www.learncpp.com/about/

Usage

Docker

Get the image

docker pull amalrajan/learncpp-download:latest

Then run the container

docker run --rm --name=learncpp-download --mount type=bind,destination=/app/learncpp,source=/home/amalr/temp/downloads amalrajan/learncpp-download

Replace /home/amalr/temp/downloads with the local path on your system where you want the downloaded files to be saved.

Local

You need Python 3.10 and wkhtmltopdf installed on your system.
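Before running locally, it can help to confirm both prerequisites are present. The following is a minimal sketch, not part of this repository: the helper name `missing_prerequisites` is hypothetical, and it assumes Python 3.10 is a minimum version rather than an exact requirement.

```python
import shutil
import sys


def missing_prerequisites(min_python=(3, 10), tool="wkhtmltopdf"):
    """Return a list of problems found; an empty list means both checks passed.

    Assumption: Python 3.10 is treated as a minimum version, not an exact pin.
    """
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # shutil.which returns None when the executable is not on PATH
    if shutil.which(tool) is None:
        problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

If the script prints nothing, both checks passed.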

Run it

Clone the repository

git clone https://github.com/amalrajan/learncpp-download.git

Install Python dependencies

cd learncpp-download
pip install -r requirements.txt

Run the script

scrapy crawl learncpp 

You'll find the downloaded files inside the learncpp directory under the repository root.
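To check what a run produced, a short script can list the generated files. This is a sketch under two assumptions: that it is run from the repository root, and that the output files carry a `.pdf` extension. The helper name `list_pdfs` is hypothetical.

```python
from pathlib import Path


def list_pdfs(output_dir="learncpp"):
    """Return the sorted names of PDF files found under output_dir."""
    root = Path(output_dir)
    if not root.is_dir():
        # Nothing downloaded yet, or the script was run from another directory
        return []
    return sorted(p.name for p in root.rglob("*.pdf"))


if __name__ == "__main__":
    pdfs = list_pdfs()
    print(f"{len(pdfs)} PDF file(s) found")
```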

FAQ

Rate Limit Errors:

  • Open settings.py.
  • Increase DOWNLOAD_DELAY from its default of 0 to 0.2 (seconds between requests).
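As a sketch, the change in settings.py could look like the fragment below. Only `DOWNLOAD_DELAY` comes from this README. The AutoThrottle lines are an optional alternative using Scrapy's built-in extension, and their values are illustrative assumptions rather than project defaults.

```python
# settings.py (fragment)

# Seconds to wait between consecutive requests to the same site
DOWNLOAD_DELAY = 0.2

# Optional, not from this README: let Scrapy adapt the delay to server load.
# The start delay value is an illustrative assumption.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 0.2
```

If rate limit errors persist, raising the delay further is a reasonable next step.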

High CPU Usage:

  • Adjust max_workers in learncpp.py.
  • Decrease it from the default of 192 to reduce CPU load.

self.executor = ThreadPoolExecutor(
    max_workers=192
)  # Limit to 192 concurrent PDF conversions
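One way to choose a lower value is to scale the worker count with the number of CPU cores instead of hard-coding it. The sketch below is not the project's code: `pick_max_workers`, its `per_core` factor, and the stand-in `convert` function are all hypothetical and only illustrate how the pool is sized.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_max_workers(cap=192, per_core=4):
    """Scale the worker count with CPU cores, never exceeding cap.

    Assumption: per_core=4 is an arbitrary starting point to tune.
    """
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(cap, cores * per_core))


def convert(page):
    """Stand-in for the real PDF conversion step."""
    return f"{page}.pdf"


# The executor runs at most pick_max_workers() conversions at once
with ThreadPoolExecutor(max_workers=pick_max_workers()) as executor:
    results = list(executor.map(convert, ["intro", "variables"]))
```

In learncpp.py, the computed value would replace the literal 192 passed to `ThreadPoolExecutor`.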

Further Issues:

License

GNU Affero General Public License v3