Zelda

Finding urls for backfills on websites without sitemaps can be a pain. 😣

Scrape all the links on a webpage using Beautiful Soup!

Simply select the appropriate script for your scenario. Then modify the start url and classes according to the comments. Finally save and run it. The urls will print to your terminal.

The first three scenarios are for use with a specific webpage (example: https://website.com/news/world/2022/09/).

The fourth scenario is for webpages with pagination, such as press release sections and archives with multiple pages and a "Next" button (example: https://website.com/release/2/). Remember to modify the start and end of the loop.

Note: These scripts are fast! No need to make a cup of tea while they run.

Installation

Install the following libraries:

pip install requests
pip install html5lib
pip install bs4

Scenarios

Scenario 1: Extract all urls in <a> tags on a webpage
Scenario 2: Extract urls in <a> tags with a specific class on a webpage
Scenario 3: Extract urls in <a> tags, in <div> tags with a specific class, on a webpage
Scenario 4: Extract urls in <a> tags, in <div> tags with a specific class, on multiple webpages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
README.md		README.md
scenario1.py		scenario1.py
scenario2.py		scenario2.py
scenario3.py		scenario3.py
scenario4.py		scenario4.py
scrapememe.png		scrapememe.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Zelda

Installation

Scenarios

About

Releases

Packages

Languages

marisacassidy/zelda

Folders and files

Latest commit

History

Repository files navigation

Zelda

Installation

Scenarios

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages