Maman is a Rust Web Crawler saving pages on Redis.
Pages are send to list <MAMAN_ENV>:queue:maman
using
Sidekiq job format
{
"class": "Maman",
"jid": "b4a577edbccf1d805744efa9",
"retry": true,
"created_at": 1461789979, "enqueued_at": 1461789979,
"args": {
"document":"<html><body><a href='#' /><a href='/new' /></html>",
"urls": ["https://example.net/new"],
"headers": {"content-type": "text/html"},
"url": "https://example.net/"
}
}
cargo install maman
With just
PREFIX=~/.local just install
maman URL [LIMIT] [MIME_TYPES]
LIMIT
must be an integer or 0
is the default, meaning no limit.
- MAMAN_ENV=development
- REDIS_URL="redis://127.0.0.1/"
- RUST_LOG=maman=info
The MIT License
Copyright (c) 2016-2021 Laurent Arnoud laurent@spkdev.net