natanaelfneto/reddit_crawler_telergam_bot

Reddit Telegram Bot Web Crawler

Crawlers

The script in 'crawlers/Python/src/reddit_crawler.py' solves part 1 and should be run from the command line:

python reddit_crawler.py [threads]

dependencies:

  • beautifulsoup4==4.6.3
  • lxml==4.2.5

Usage is summarized in the table below:

| Field | Required | Description | Minimum upvotes |
|---|---|---|---|
| threads | yes | names of the threads to crawl | 5000 |

Example:

python reddit_crawler.py cats dogs

Example output:

URL: https://www.reddit.com/r/cats
STATUS: 200
REASON: OK

URL: https://www.reddit.com/r/dogs
STATUS: 200
REASON: OK

{
    "cats": [
        {
            "commentaries_link": "https://www.reddit.com/r/cats/comments/9nitmu/my_first_rescue_i_guess_im_gonna_be_a_catdad/",
            "subreddit": "cats",
            "subthread_link": "https://www.reddit.com/r/cats/search?q=flair_name%253A%2522Cat%2520Picture%2522&restrict_sr=1",
            "title": "My first rescue I guess I'm gonna be a catdad...",
            "upvotes": "7.8k"
        }
    ]
},
{
    "dogs": []
}
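The output above reports upvotes as display strings such as "7.8k", while the usage table sets a minimum of 5000. The following is a minimal sketch of how such strings could be compared against that threshold. The names `parse_upvotes`, `filter_popular`, and `MIN_UPVOTES` are hypothetical and are not taken from reddit_crawler.py.

```python
MIN_UPVOTES = 5000  # threshold stated in the usage table


def parse_upvotes(text):
    """Convert a display string such as '7.8k' or '812' into an integer."""
    text = text.strip().lower()
    if text.endswith("k"):
        # '7.8k' -> 7800; round() absorbs floating-point noise
        return int(round(float(text[:-1]) * 1000))
    return int(text)


def filter_popular(posts, minimum=MIN_UPVOTES):
    """Keep only the posts whose 'upvotes' field reaches the minimum."""
    return [post for post in posts if parse_upvotes(post["upvotes"]) >= minimum]


if __name__ == "__main__":
    sample = [
        {"title": "My first rescue", "upvotes": "7.8k"},
        {"title": "A quieter post", "upvotes": "812"},
    ]
    print(filter_popular(sample))
```

Only the "k" suffix seen in the example output is handled; any other suffix would raise a `ValueError`.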

Telegram bot

Part 2 was solved by using the script from part 1 as a Python module, together with an additional script for the bot, telegram.py.

dependencies:

  • beautifulsoup4==4.6.3
  • lxml==4.2.5
  • telepot==12.7

To start the bot, use the command:

python telegram.py <bot_token>

Start a conversation with the bot in the Telegram app: @your_chosen_name_bot. Send the command in the chat window:

/nadaprafazer [+ args]

Usage in the chat is summarized in the table below:

| Command/message | Accepts multiple arguments | Description | Minimum upvotes |
|---|---|---|---|
| /nadaprafazer | yes | returns posts from Reddit threads | 5000 |
| * | no | returns an error message | |

default error message in the chat: "Desculpe, não entendi." ("Sorry, I didn't understand.")
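The dispatch described by the table can be sketched as follows: a message starting with /nadaprafazer yields a list of thread names, and anything else yields the default error message. This is an illustration only. `parse_message` and the constants are hypothetical names, and the behaviour for a command sent without arguments is an assumption (an empty list is returned).

```python
COMMAND = "/nadaprafazer"
ERROR_MESSAGE = "Desculpe, não entendi."


def parse_message(text):
    """Return (threads, error); exactly one of the two is None."""
    tokens = text.split()  # splits on any run of whitespace
    if not tokens or tokens[0] != COMMAND:
        return None, ERROR_MESSAGE
    return tokens[1:], None


if __name__ == "__main__":
    print(parse_message("/nadaprafazer cats.    brazil cats"))
    print(parse_message("hello"))
```

Splitting on whitespace keeps repeated spaces from producing empty thread names, and leaves tokens such as "cats." untouched so that the crawler can report them as not found.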

Example:

/nadaprafazer cats.    brazil cats

The example produces the following console output:

Starting reddit_crawler_bot
Author: natanaelfneto
Description: An apprentice scrapper bot

Waiting for inputs...

INPUT:
        content type: text
        type: private
        chat id: 298223493
        pending requests: 4

URL: https://www.reddit.com/r/cats.
STATUS: 404
REASON: Not Found
No server response
1 out of 4 request sent

URL: https://www.reddit.com/r/brazil
STATUS: 200
REASON: OK
Attempt 1...

URL: https://www.reddit.com/r/brazil
STATUS: 200
REASON: OK
Attempt 2...

URL: https://www.reddit.com/r/brazil
STATUS: 200
REASON: OK
Attempt 3...

URL: https://www.reddit.com/r/cats
STATUS: 200
REASON: OK
4 out of 4 request sent
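The three "Attempt" entries for r/brazil in the log suggest that a request is repeated a limited number of times when nothing usable comes back. The exact retry condition is not documented here, so the following is only a sketch under that assumption. `crawl_with_retries` and the `fetch_posts` callable are hypothetical, and the limit of three attempts is inferred from the log.

```python
def crawl_with_retries(thread, fetch_posts, max_attempts=3):
    """Call fetch_posts(thread) until it returns posts or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        posts = fetch_posts(thread)
        if posts:
            return posts
        # Mirrors the 'Attempt N...' lines in the console log
        print("Attempt {}...".format(attempt))
    return []


if __name__ == "__main__":
    # Stand-in fetcher: fails twice, then returns one post
    responses = iter([[], [], [{"title": "The scent of spring", "upvotes": "5.8k"}]])
    print(crawl_with_retries("cats", lambda name: next(responses)))
```

Passing the fetcher as an argument keeps the retry logic independent of the HTTP layer, so it can be exercised without network access.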

And the bot replies in Telegram:

THREAD: cats. thread link
No threads with more than 5k upvotes could be retrieved, try again in a few minutes

THREAD: brazil thread link
No threads with more than 5k upvotes could be retrieved, try again in a few minutes

THREAD: cats thread link

title: The scent of spring
upvotes: 5.8k
link to commentaries: link
link to subthread: link

title: Not this year
upvotes: 11.3k
link to commentaries: link
link to subthread: link

title: Recently lost my two cats that my wife and I had for 10 years. The house felt too empty so we got these two floofs to bring some energy back to the home.
upvotes: 9.4k
link to commentaries: link
link to subthread: link
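A reply like the one above could be assembled from the crawler's result dictionary roughly as follows. This is a sketch, not the code in telegram.py: `format_reply` is a hypothetical name, the dictionary keys are the ones shown in the crawler's example output, and plain URLs are printed where the real reply shows hyperlinks labelled "link".

```python
NO_RESULTS = (
    "No threads with more than 5k upvotes could be retrieved, "
    "try again in a few minutes"
)


def format_reply(thread, posts):
    """Build the chat text for one thread from its list of post dicts."""
    lines = ["THREAD: {}".format(thread)]
    if not posts:
        lines.append(NO_RESULTS)
        return "\n".join(lines)
    for post in posts:
        lines.append("")  # blank line between posts
        lines.append("title: {}".format(post["title"]))
        lines.append("upvotes: {}".format(post["upvotes"]))
        lines.append("link to commentaries: {}".format(post["commentaries_link"]))
        lines.append("link to subthread: {}".format(post["subthread_link"]))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_reply("dogs", []))
```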
