Skip to content

astartes91/haskell-crawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

haskell-crawler

Objective

Implement a web crawler (an http-endpoint, which accepts a list of URLs and returns a list of their titles, where title is something inside html title tag)

Some questions remain unanswered: What to do with errors, empty or absent titles, redirects, duplicate URLs and how many requests will be performed per unit of time. Those details could be implemented in any way.

Set up

Install Haskell stack

Run

# run and watch files
stack build --exec haskell-crawler-exe

# test it!
curl "http://localhost:8080/"\
  --header "Content-Type: application/json"\
  --request POST\
  --data '[
"https://twitter.com",
"https://google.com",
"https://ya.ru",
"https://yandex.ru",
"https://kremlin.ru",
"https://tutu.ru",
"https://s7.ru",
"https://technocity.ru",
"https://avito.ru",
"https://apple.ru",
"https://interfacelift.com",
"https://haskell.org",
"https://elm-lang.org",
"https://telegram.org",
"https://wikipedia.org",
"https://mozilla.com",
"https://samsung.com",
"https://benq.com",
"https://rambler.ru",
"https://zte.com",
"https://hp.com"
]'


Other useful commands

# create an potimized build for production
stack build --ghc-options -O2

# locate an executable file after build
stack exec -- whereis haskell-crawler-exe

# copy an executable file to ~/.local/bin
stack build --copy-bins

About

Web crawler microservice

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published