scrape-cli is a command-line tool to extract HTML elements using an XPath query or a CSS3 selector.
It's based on the great and simple scraping tool written by Jeroen Janssens.
You can install scrape-cli using pipx:
pipx install scrape-cli
Or using pip:
pip install scrape-cli
Or install from source:
git clone https://github.com/aborruso/scrape-cli
cd scrape-cli
pip install -e .
Requirements:
- Python >=3.6
- requests
- lxml
- cssselect
A CSS selector query like this one:
curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
| scrape -be 'table.wikitable > tbody > tr > td > b > a'
or an XPath query like this one:
curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
| scrape -be '//table[contains(@class, "wikitable")]/tbody/tr/td/b/a'
Either one gives you back:
<html>
<head>
</head>
<body>
<a href="/wiki/Afghanistan" title="Afghanistan">
Afghanistan
</a>
<a href="/wiki/Albania" title="Albania">
Albania
</a>
<a href="/wiki/Algeria" title="Algeria">
Algeria
</a>
<a href="/wiki/Andorra" title="Andorra">
Andorra
</a>
<a href="/wiki/Angola" title="Angola">
Angola
</a>
<a href="/wiki/Antigua_and_Barbuda" title="Antigua and Barbuda">
Antigua and Barbuda
</a>
<a href="/wiki/Argentina" title="Argentina">
Argentina
</a>
<a href="/wiki/Armenia" title="Armenia">
Armenia
</a>
...
...
</body>
</html>
Some notes on the commands:
- -e sets the query;
- -b adds <html>, <head> and <body> tags to the HTML output.
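A minimal sketch of what the -b wrapping amounts to, mirroring the layout of the sample output above. The wrap_in_body function is a hypothetical name for illustration, not part of scrape-cli. Wrapping the matches this way turns them into a complete HTML document, which is presumably handy when the output is piped into another HTML-aware tool.

```python
from typing import Iterable


def wrap_in_body(fragments: Iterable[str]) -> str:
    """Wrap HTML fragments in <html>, <head> and <body> tags, one item per line."""
    lines = ["<html>", "<head>", "</head>", "<body>"]
    lines.extend(fragments)  # the selected elements, already serialized
    lines.extend(["</body>", "</html>"])
    return "\n".join(lines)


print(wrap_in_body(['<a href="/wiki/Albania" title="Albania">Albania</a>']))
```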
If you are looking for a precompiled executable for Linux, see the Releases page on GitHub, where you can find the latest precompiled binary.
I built the scrape-linux-x86_64 precompiled binary with PyInstaller, using this command:
pyinstaller --onefile scrape.py
Once built, it is a single executable file that you can run in a 64-bit Linux environment.