Website crawler for automated link and image validity testing (Selenium Webdriver - Python)

sankaritan/site-crawler

Site Crawler - website link tester

Yet another simple script that automatically crawls a website and checks that links are valid and that images load successfully. It uses Selenium WebDriver and is run with pytest.

Features

  • crawls website links that match a certain pattern
  • reports broken links (the request for the target page returns an HTTP error)
  • reports invalid links (invalid characters in the link URL)
  • reports images that could not be loaded
  • can ignore links that match a certain pattern
  • supports HTTP basic authentication
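
The link-related features above can be sketched roughly as follows. This is an illustration only, not the code in site_crawler.py: the function name classify_link, its parameters, and the choice of "HTTP status 400 or higher" as the definition of a broken link are all assumptions. The status lookup is passed in as a callable so the sketch runs without network access.

```python
import re

# Characters that may appear in a URL per RFC 3986 (unreserved, reserved, '%').
# Anything else (a space, for example) marks the link as invalid.
_ALLOWED_URL_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def classify_link(url, accepted, ignored, fetch_status):
    """Return 'skipped', 'invalid', 'broken', or 'ok' for a single link.

    accepted     -- substrings; the link must contain at least one of them
    ignored      -- substrings; a link containing any of them is skipped
    fetch_status -- callable returning the HTTP status code for a URL
                    (hypothetical hook, injected so the sketch is testable)
    """
    if any(pattern in url for pattern in ignored):
        return "skipped"
    if not any(pattern in url for pattern in accepted):
        return "skipped"
    if not _ALLOWED_URL_CHARS.fullmatch(url):
        return "invalid"
    # Assumption: any 4xx or 5xx response counts as a broken link.
    return "broken" if fetch_status(url) >= 400 else "ok"
```

With the requests library, fetch_status could be something like lambda u: requests.get(u, timeout=10).status_code, and basic authentication could be added by passing auth=("user", "password") to the same call. The credentials and the timeout value here are placeholders.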

Execution

py.test site_crawler.py
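
For a sense of what such a run might check on each page, here is a minimal sketch of an image check driven by Selenium. It is an assumption about the approach, not an excerpt from site_crawler.py: the helper name find_unloaded_images, the poll parameter, and the use of naturalWidth as the "loaded" signal are all hypothetical. The driver argument is any object with Selenium's execute_script method.

```python
import time

# JavaScript executed in the page: collect the src of every <img> that has
# not finished loading, or that finished with zero natural width (which is
# how browsers report an image that failed to load).
_UNLOADED_IMAGES_JS = """
return Array.from(document.images)
    .filter(img => !(img.complete && img.naturalWidth > 0))
    .map(img => img.src);
"""


def find_unloaded_images(driver, delay, poll=0.5):
    """Wait up to `delay` seconds, then return the src of images still not loaded.

    Returns an empty list as soon as every image on the page has loaded.
    """
    deadline = time.monotonic() + delay
    while True:
        unloaded = driver.execute_script(_UNLOADED_IMAGES_JS)
        if not unloaded or time.monotonic() >= deadline:
            return list(unloaded)
        time.sleep(poll)  # give slow images more time before re-checking
```

A pytest test function could then call this helper after loading each page and assert that the returned list is empty, so that any unloaded image shows up as a test failure with its URL in the message.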

Required libraries

Install with pip:

  • pip install selenium pytest requests

Configuration

Set the following variables in site_crawler.cfg. Required:

  • base URL (the page the crawler starts on)
  • at least one acceptable URL substring (typically a domain such as "mypage.com"; the script crawls only links that contain a matching substring)
  • image time delay (how long to wait for images to load, in seconds)
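
As an illustration of how these three settings could be read, the sketch below parses an INI-style file with Python's configparser. The actual layout of site_crawler.cfg is not shown here, so the section name [crawler], the key names base_url, accepted_urls, and image_delay, and the comma-separated list format are all assumptions made for the example.

```python
import configparser

# Hypothetical file contents; the real site_crawler.cfg may use other names.
SAMPLE_CFG = """
[crawler]
base_url = http://mypage.com/
accepted_urls = mypage.com, mypage.org
image_delay = 2
"""


def load_config(text):
    """Parse the three required settings from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["crawler"]
    # Split the comma-separated substrings and drop empty entries.
    accepted = [s.strip() for s in section["accepted_urls"].split(",") if s.strip()]
    if not accepted:
        raise ValueError("at least one acceptable URL substring is required")
    return {
        "base_url": section["base_url"],
        "accepted_urls": accepted,
        "image_delay": section.getfloat("image_delay"),  # seconds
    }


config = load_config(SAMPLE_CFG)
```

A missing key raises KeyError, which surfaces a forgotten required setting before the crawl starts.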
