HandsomeSoup

Current Status: Usable and stable. Needs GHC 7.6. Please file bugs!

HandsomeSoup is the library I wish I had when I started parsing HTML in Haskell.

It is built on top of HXT and adds a few functions that make it easier to work with HTML.

Most importantly, it adds CSS selectors to HXT. The goal of HandsomeSoup is to be a complete CSS2 selector parser for HXT.

Install

cabal install HandsomeSoup

Example

Nokogiri, the HTML parser for Ruby, has an example showing how to scrape Google search results. This is easy in HandsomeSoup:

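import Text.XML.HXT.Core
import Text.HandsomeSoup

main :: IO ()
main = do
    -- fromUrl builds an arrow that downloads and parses the page when run
    let doc = fromUrl "http://www.google.com/search?q=egon+schiele"
    -- select the result links and read the href attribute of each one
    links <- runX $ doc >>> css "h3.r a" ! "href"
    mapM_ putStrLn links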

What can HandsomeSoup do for you?

Easily parse an online page using `fromUrl`

let doc = fromUrl "http://example.com"

Or a local page using `parseHtml`

contents <- readFile "page.html"  -- placeholder path; use any local HTML file
let doc = parseHtml contents
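
For a version that runs without a network connection or a file on disk, here is a minimal sketch that feeds an inline string to `parseHtml`. The names `sampleHtml` and `extractLinks` are made up for this illustration and are not part of the library.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample document, standing in for the contents of a local file.
sampleHtml :: String
sampleHtml =
  "<html><body>\
  \<a href=\"/one\">One</a>\
  \<a href=\"/two\">Two</a>\
  \</body></html>"

-- Parse an HTML string and collect the href of every <a> element.
extractLinks :: String -> IO [String]
extractLinks html = runX $ parseHtml html >>> css "a" ! "href"

main :: IO ()
main = do
  links <- extractLinks sampleHtml
  mapM_ putStrLn links
```

Swapping `sampleHtml` for the result of `readFile` gives the local-file case described above.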

Easily extract elements using `css`

Here are some valid selectors:

doc >>> css "a"
doc >>> css "*"
doc >>> css "a#link1"
doc >>> css "a.foo"
doc >>> css "p > a"
doc >>> css "p strong"
doc >>> css "#container h1"
doc >>> css "img[width]"
doc >>> css "img[width=400]"
doc >>> css "a[class~=bar]"
doc >>> css "a:first-child"
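
To see what a selector returns, it helps to run it against a small document and print the text of each match. The sketch below assumes an inline sample page and a helper called `selectText`, both invented for this example. It uses HXT's `/>` and `getText` to read the text directly inside each selected element.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample page used only for this illustration.
sampleHtml :: String
sampleHtml =
  "<html><body>\
  \<div id=\"container\"><h1>Title</h1></div>\
  \<p><a class=\"foo\" href=\"/first\">First</a></p>\
  \<p><a href=\"/second\">Second</a></p>\
  \</body></html>"

-- Run a CSS selector over an HTML string and return the text nodes
-- that are direct children of each matched element.
selectText :: String -> String -> IO [String]
selectText html sel = runX $ parseHtml html >>> css sel /> getText

main :: IO ()
main = do
  headings <- selectText sampleHtml "#container h1"
  fooLinks <- selectText sampleHtml "a.foo"
  putStrLn ("headings: " ++ show headings)
  putStrLn ("a.foo:    " ++ show fooLinks)
```

Because `css` is an ordinary HXT arrow, it composes with the rest of HXT (`getText`, `getChildren`, `deep`, and so on) using `>>>`.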

Easily get attributes using `(!)`

doc >>> css "img" ! "src"
doc >>> css "a" ! "href"
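
When an attribute is needed together with other data from the same element, one option is to drop down to HXT's own `getAttrValue` and combine arrows with `&&&`. This is a sketch under assumptions: `sampleHtml` and `linkPairs` are hypothetical names, and the approach is one possible pattern rather than something the library prescribes.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample page used only for this illustration.
sampleHtml :: String
sampleHtml =
  "<html><body>\
  \<a href=\"/one\">One</a>\
  \<a href=\"/two\">Two</a>\
  \</body></html>"

-- For every <a> element, pair its href attribute with its link text.
linkPairs :: String -> IO [(String, String)]
linkPairs html =
  runX $ parseHtml html
     >>> css "a"
     >>> (getAttrValue "href" &&& (getChildren >>> getText))

main :: IO ()
main = do
  pairs <- linkPairs sampleHtml
  mapM_ (\(href, label) -> putStrLn (label ++ " -> " ++ href)) pairs
```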

Docs

Find Haddock docs on Hackage.

I also wrote The Complete Guide To Parsing HXT With Haskell.

Credits

Made by Adit.
