WordPress content crawler. Stores the crawled posts in a database.

koaj/pracker

How does it work?

Pracker works on top of the WordPress REST API endpoint wp-json/wp/v2/posts/, so make sure the REST API is enabled on your WordPress site. To check, append wp-json/wp/v2/posts/ to your site URL: a JSON response means the API is enabled, and a 404 error means it is not. On WordPress 4.7 and later the REST API is part of core; on older versions you can enable it by installing the WP REST API plugin.
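As a minimal sketch of the check described above, the snippet below builds the URL you would open in a browser to see whether the API responds. The function name posts_endpoint is hypothetical and is not taken from Pracker's source; it only illustrates how the site URL and the route are joined.

```rust
/// Hypothetical helper (not from Pracker's source): joins a site URL
/// with the posts route wp-json/wp/v2/posts/.
fn posts_endpoint(site: &str) -> String {
    // Trim trailing slashes so the result never contains "//wp-json".
    format!("{}/wp-json/wp/v2/posts/", site.trim_end_matches('/'))
}

fn main() {
    // Open the printed URL in a browser: JSON means the REST API is enabled.
    println!("{}", posts_endpoint("https://wordpres-site-example.com/"));
}
```

The helper accepts the site URL with or without a trailing slash.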

How to run the WordPress content crawler:

cargo run -- -s https://wordpres-site-example.com/

Output:

Post {
    title: "Post title",
    content: "<b>Post content</b>",
    url: "wordpres-site-example.com/?p=1",
    id: 1
}
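The output above has the shape of Rust's pretty Debug formatting. The sketch below shows one way such a value could be modelled and printed. The field types are assumptions inferred from the sample output, and the real struct in Pracker may differ.

```rust
/// Assumed shape of a crawled post, inferred from the sample output only.
#[allow(dead_code)] // fields are only read through the derived Debug impl
#[derive(Debug)]
struct Post {
    title: String,
    content: String,
    url: String,
    id: u64, // assumption: the numeric WordPress post ID
}

fn main() {
    let post = Post {
        title: "Post title".to_string(),
        content: "<b>Post content</b>".to_string(),
        url: "wordpres-site-example.com/?p=1".to_string(),
        id: 1,
    };
    // {:#?} prints one field per line, indented.
    println!("{:#?}", post);
}
```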

Plain text (add -p to get the content without HTML tags):

cargo run -- -s https://wordpres-site-example.com/ -p

Output:

Post {
    title: "Post title",
    content: "Post content",
    url: "wordpres-site-example.com/?p=1",
    id: 1
}
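To illustrate the difference between the two outputs, here is a naive sketch of turning HTML content into plain text. This is not Pracker's implementation, and strip_tags is a hypothetical name. It simply drops everything between < and >, does not decode entities such as &amp;, and can be fooled by a > inside an attribute value.

```rust
/// Hypothetical helper: removes anything that looks like an HTML tag.
fn strip_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for ch in html.chars() {
        match ch {
            '<' => in_tag = true,            // a tag starts here
            '>' if in_tag => in_tag = false, // the tag ends here
            _ if !in_tag => out.push(ch),    // keep text outside tags
            _ => {}                          // skip characters inside a tag
        }
    }
    out
}

fn main() {
    println!("{}", strip_tags("<b>Post content</b>"));
}
```

For real-world content a proper HTML parser is the safer choice, since post bodies often contain entities, scripts, and malformed markup.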

Store the output in the database:

Create a database in PostgreSQL:

CREATE DATABASE pracker;

Create the .env file from the sample and adjust it for your database:

mv env.sample .env

Run the migrations (requires the Diesel CLI):

diesel migration run

Insert data into the database:

cargo run -- -s https://wordpres-site-example.com/ -i

Insert plain text into the database:

cargo run -- -s https://wordpres-site-example.com/ -ip

For more options, run cargo run -- --help.