Modularized Python script for web scraping images.
You can read the changelog in CHANGELOG; it is a high-level overview, not a detailed one.
It scrapes the images from a website into an album-style layout on disk.
It's not perfect, but it's straight to the point. It (somewhat) complies with Pylint, where useful.
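To give an idea of what an album-style layout can look like, here is a minimal sketch. It is not the project's actual code: the `album_path` helper, the zero-padded numbering, and the `.jpg` fallback are all assumptions made for illustration.

```python
import re
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def album_path(root: Path, album_title: str, index: int, image_url: str) -> Path:
    """Build <root>/<album>/<NNN><ext> for the index-th image of an album (hypothetical helper)."""
    # Collapse every run of characters that is not a word character or a dash
    # into "_" so the album title becomes a safe folder name.
    folder = re.sub(r"[^\w\-]+", "_", album_title).strip("_") or "untitled"
    # Keep the extension used on the host; fall back to .jpg when the URL has none (assumption).
    ext = PurePosixPath(urlparse(image_url).path).suffix or ".jpg"
    # Zero-padded sequential names keep files sorted and make a gap easy to spot.
    return root / folder / f"{index:03d}{ext}"
```

With sequential names like these, a skipped image shows up as a missing number instead of going unnoticed.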
Downloading and organizing those images by hand is really annoying, whatever your reason for doing it, and it carries too much mental overhead. What if you skipped an image? How would you check? The files are unsorted and may keep their default name (the filename on the host). Will you rename them manually? How many will you miss while doing so? And so on.
Why act like a robot when a robot can do it for you?
The project started in 2022, more than a year ago. I applied small modifications here and there, iterating, and finally decided to encapsulate it for modular use.
It began as an assignment, which reminded me that web scraping exists as a technique, and an opportunity arose to automate one of my routines. I expanded on it until I decided to publish it as open source.
- Python 3.8.4 or higher
- not tested on lower versions, but it should work on >= 3.6.x
- A decent internet connection; it worked nicely at 20 MB/s
Libraries:
- Pip
Or manually install the packages listed in the requirements.txt file.
python -m pip install
# or pip3 install if you're on linux/unix systems
pip install -r requirements.txt # pip3 for any unix system
On Unix systems the command is usually pip3, since that is the name pip is installed under by default.
Follow the example.
To execute it, create an __init__.py at the root of the modules folder and "export" the init function you want to run; the __main__.py will handle the rest:
from .example import init
Then run the project from its root:
python . # python3 for unix/linux
# or run the project folder
python album-scraper
Or look out for the CLI, which might make this easier and offer more options.
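The "export an init function and let the entry point call it" idea described above can be sketched roughly as follows. This is not the project's real __main__.py: `resolve_init` and `run` are hypothetical names, and the only thing taken from the text is that the package re-exports a callable named `init`.

```python
import importlib
from types import ModuleType
from typing import Any, Callable


def resolve_init(package: ModuleType) -> Callable[[], Any]:
    """Return the callable named `init` that a package re-exports, or raise a clear error."""
    init = getattr(package, "init", None)
    if not callable(init):
        raise AttributeError(f"{package.__name__} does not export a callable 'init'")
    return init


def run(package_name: str) -> Any:
    """Import a package by name and call whatever `init` it exports (hypothetical entry point)."""
    package = importlib.import_module(package_name)
    return resolve_init(package)()
```

Swapping which scraper runs then only means changing the single import line in __init__.py; the entry point itself stays untouched.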
Supported websites are those without pagination, where all the links you want to scrape are visible on the homepage.
Page layouts (image designed with Excalidraw).
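For a site of that kind, a single pass over the homepage HTML is enough to find every link. The sketch below uses only the standard library; `LinkCollector` and `collect_links` are illustrative names, not part of this project, and a real site will probably need filtering to keep only the album links.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones and skip duplicates.
                absolute = urljoin(self.base_url, value)
                if absolute not in self.links:
                    self.links.append(absolute)


def collect_links(html: str, base_url: str) -> List[str]:
    """Return the unique absolute links found in one page, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Because nothing is paginated, there is no "next page" to follow: one homepage in, one list of links out.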
I do not endorse any illegal activity. Images retain their licensing and remain the property of their rightful author(s). If a website's robots.txt does not allow scraping it, any wrongful, mischievous, or illegal act remains illegal and is not my responsibility.
Use these scripts at your own risk.
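If you want to check a robots.txt before scraping, Python's standard urllib.robotparser can do it. The sketch below is not part of this project; `is_allowed` is a hypothetical helper, and it takes the robots.txt content as lines so it can be tried without network access.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines: Iterable[str], url: str, user_agent: str = "*") -> bool:
    """Tell whether the given robots.txt rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


rules = ["User-agent: *", "Disallow: /private/"]
print(is_allowed(rules, "https://example.com/albums/1"))
print(is_allowed(rules, "https://example.com/private/1"))
```

Against a live site, calling set_url() with the robots.txt address and then read() on the parser fetches the rules instead of passing them in by hand.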