To run this:
- pip3 install -r requirements.txt
- python link_extractor.py --help
Output:
    usage: link_extractor.py [-h] [-m MAX_URLS] url

    Link Extractor Tool with Python

    positional arguments:
      url                   The URL to extract links from.

    optional arguments:
      -h, --help            show this help message and exit
      -m MAX_URLS, --max-urls MAX_URLS
                            Number of max URLs to crawl, default is 30.
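For readers curious how a command line like this is typically wired up, here is a minimal sketch of an `argparse` parser that would produce help text similar to the output above. It is an illustration only: the helper name `build_parser` is hypothetical, and treating `--max-urls` as an integer is an assumption, so the actual `link_extractor.py` may differ.

```python
import argparse


def build_parser():
    # Hypothetical helper; the option names and help strings are copied
    # from the --help output shown above.
    parser = argparse.ArgumentParser(
        prog="link_extractor.py",
        description="Link Extractor Tool with Python",
    )
    parser.add_argument("url", help="The URL to extract links from.")
    parser.add_argument(
        "-m", "--max-urls",
        type=int,      # assumption: the value is parsed as an integer
        default=30,    # default taken from the help text
        help="Number of max URLs to crawl, default is 30.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run anywhere.
args = build_parser().parse_args(["https://github.com", "-m", "2"])
print(args.url, args.max_urls)
```

Note that `argparse` exposes `--max-urls` as `args.max_urls`, with the dash replaced by an underscore.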
- For instance, to extract all links from the first 2 URLs found on github.com:
python link_extractor.py https://github.com -m 2
This will result in a large list; here are the last 5 links:
    [!] External link: https://developer.github.com/
    [*] Internal link: https://help.github.com/
    [!] External link: https://github.blog/
    [*] Internal link: https://help.github.com/articles/github-terms-of-service/
    [*] Internal link: https://help.github.com/articles/github-privacy-statement/
    [+] Total Internal links: 85
    [+] Total External links: 21
    [+] Total URLs: 106
This will also save these URLs in github.com_external_links.txt for external links and github.com_internal_links.txt for internal links.
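The output file names follow the pattern of the site's domain plus a fixed suffix. A rough sketch of how such names could be derived and written is shown below. The helper names `output_filenames` and `save_links` are hypothetical, and the one-URL-per-line file layout is an assumption, not something confirmed by the tool itself.

```python
from urllib.parse import urlparse


def output_filenames(url):
    # Hypothetical helper: build both file names from the URL's domain,
    # e.g. "github.com" for "https://github.com".
    domain = urlparse(url).netloc
    return f"{domain}_internal_links.txt", f"{domain}_external_links.txt"


def save_links(path, links):
    # Assumption: one URL per line, UTF-8 encoded.
    with open(path, "w", encoding="utf-8") as f:
        for link in links:
            f.write(link.strip() + "\n")


internal_path, external_path = output_filenames("https://github.com")
print(internal_path)
print(external_path)
```

Calling `save_links(internal_path, internal_urls)` with a collection of URLs would then write the internal links file, and likewise for the external one.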