A ProductHunt.com miner in Python3.
- Execute the following commands:

  ```bash
  $ git clone https://github.com/collab-uniba/PH_miner.git
  $ git submodule init
  $ git submodule update
  ```
- Register two apps using the dashboard, `PH_miner` and `PH_updater`.
- For the first app, in the root folder, create the file `credentials_miner.yml` with the following structure:

  ```yaml
  api:
    key: CLIENT_KEY
    secret: CLIENT_SECRET
    redirect_uri: APP_REDIRECT_URI
    dev_token: DEVELOPER_TOKEN
  ```
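
  For reference, a credentials file with this structure might be loaded and handed to the `ph_py` wrapper roughly as follows. This is only a minimal sketch (it assumes the `ph_py` submodule is importable and that PyYAML is installed); the miner's actual loading code may differ:

  ```python
  import yaml
  from ph_py import ProductHuntClient

  # Read the credentials created above (api -> key / secret / redirect_uri / dev_token)
  with open("credentials_miner.yml") as f:
      api = yaml.safe_load(f)["api"]

  phc = ProductHuntClient(api["key"], api["secret"],
                          api["redirect_uri"], api["dev_token"])

  # Example request: list today's hunted products
  for post in phc.get_todays_posts():
      print(post.name)
  ```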
- For the second app, follow the same steps as above to create the file `credentials_updater.yml`.
- Create the folder `db/cfg/`, then create therein the file `dbsetup.yml` to set up the connection to the MySQL database:

  ```yaml
  mysql:
    host: 127.0.0.1
    user: root
    passwd: *******
    db: producthunt
    recycle: 3600
  ```

  NOTE: If you are using a MySQL database, the default `pool_recycle` parameter for resetting the database connection is fine, since `wait_timeout` is set to 28800 seconds by default. If you are using MariaDB instead, `wait_timeout` defaults to 600 seconds: edit the `my.cnf` file and change it to anything larger than the value chosen for `pool_recycle`.
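
  The `recycle` value is what gets passed to the connection pool as `pool_recycle`. A minimal sketch of how the settings in `dbsetup.yml` might be used to open the connection, assuming SQLAlchemy with the PyMySQL driver (the miner's actual database code may differ):

  ```python
  import yaml
  from sqlalchemy import create_engine, text

  # Read the database settings created above
  with open("db/cfg/dbsetup.yml") as f:
      cfg = yaml.safe_load(f)["mysql"]

  # Recycle pooled connections before the server's wait_timeout closes them
  engine = create_engine(
      "mysql+pymysql://{user}:{passwd}@{host}/{db}".format(**cfg),
      pool_recycle=cfg["recycle"],
  )

  with engine.connect() as conn:
      print(conn.execute(text("SELECT VERSION()")).scalar())
  ```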
- Install packages via pip:

  ```bash
  $ pip install -r requirements.txt
  ```
- Enable execution via crontab:

  ```bash
  $ crontab -e
  ```

  Add the following lines, making sure to enter the correct path:

  ```
  SHELL=bash
  # New products are uploaded at 12:01 AM PST (just past midnight, i.e., 9:01 AM in the CET timezone):
  # minute hour day-of-month month day-of-week command
  35 8 * * * /path/.../to/PH_miner/cronjob.sh >> /var/log/ph_miner.log 2>&1
  05 20 * * * /path/.../to/PH_miner/cronjob.sh --update -c credentials_updater.yml >> /var/log/ph_miner_updates.log 2>&1
  */30 * * * * /path/.../to/PH_miner/cronjob.sh --newest -c credentials_updater.yml >> /var/log/ph_miner.log 2>&1
  ```
- Enable the rotation of the log files:

  ```bash
  $ sudo ln -s /fullpath/to/../ph_miner.logrotate /etc/logrotate.d/ph_miner
  ```
- Install the Chromium browser and chromedriver. This step depends on the OS; on Ubuntu boxes, run:

  ```bash
  $ sudo apt-get install chromium-browser chromium-chromedriver
  $ sudo ln -s /usr/lib/chromium-browser/chromedriver /usr/bin/chromedriver
  ```
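
  To check that Selenium can drive the Chromium browser through the symlinked chromedriver, a quick sanity test along these lines can help (a sketch only; option handling may vary slightly between Selenium versions):

  ```python
  from selenium import webdriver

  options = webdriver.ChromeOptions()
  options.add_argument("--headless")    # run Chromium without a display
  options.add_argument("--no-sandbox")

  driver = webdriver.Chrome(options=options)  # finds /usr/bin/chromedriver on the PATH
  driver.get("https://www.producthunt.com/")
  print(driver.title)
  driver.quit()
  ```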
- Product Hunt API
- ph_py - ProductHunt.com API wrapper in Python
- Scrapy - A scraping and web-crawling framework
- Selenium - A suite of tools for automating web browsers
- ChromeDriver - Tool to connect to the Chromium web browser
- Beautiful Soup 4 - HTML parser
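
As a rough illustration of how the scraping pieces fit together, the HTML rendered by the Selenium-driven browser can be handed to Beautiful Soup 4 for parsing (hypothetical markup and selectors, not the project's actual parsing code):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet of rendered HTML, e.g. obtained via driver.page_source
html = '<div class="post"><a href="/posts/example">Example Product</a> <span>42 upvotes</span></div>'

soup = BeautifulSoup(html, "html.parser")
name = soup.find("a").get_text(strip=True)
votes = soup.find("span").get_text(strip=True)
print(name, "-", votes)
```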
The project is licensed under the MIT license.