JSoup with html snippet
François De Serres edited this page Jul 10, 2015
JSoup is more HTML5-friendly than TagSoup. However, `html-snippet` passes a `java.io.StringReader` instance to `html-resource`, and `Jsoup/parse` has no overload that accepts a `Reader`, so JSoup cannot be plugged in as the parser for snippets as-is.
Many thanks to @dhruvbhatia for proposing this workaround:
```clojure
(ns my.namespace
  (:import [org.jsoup Jsoup]
           [org.jsoup.nodes Attribute Attributes Comment DataNode Document
            DocumentType Element Node TextNode XmlDeclaration]
           [org.jsoup.parser Parser Tag]))

;; Lower-cased keyword from a JSoup tag or attribute name, e.g. "DIV" -> :div
(def ^:private ->key (comp keyword #(.. % toString toLowerCase)))

(defprotocol IEnlive
  (->nodes [d] "Convert object into Enlive node(s)."))

;; Map each JSoup node type onto Enlive's node representation
(extend-protocol IEnlive
  Attribute
  (->nodes [a] [(->key (.getKey a)) (.getValue a)])        ; [:name "value"] pair

  Attributes
  (->nodes [as] (not-empty (into {} (map ->nodes as))))    ; attrs map, nil if empty

  Comment
  (->nodes [c] {:type :comment :data (.getData c)})

  DataNode                                                 ; e.g. <script>/<style> bodies
  (->nodes [dn] (str dn))

  Document
  (->nodes [d] (not-empty (map ->nodes (.childNodes d)))) ; seq of top-level nodes

  DocumentType
  (->nodes [dtd] {:type :dtd
                  :data ((juxt :name :publicid :systemid)
                         (->nodes (.attributes dtd)))})

  Element
  (->nodes [e] {:tag     (->key (.tagName e))
                :attrs   (->nodes (.attributes e))
                :content (not-empty (map ->nodes (.childNodes e)))})

  TextNode
  (->nodes [tn] (.getWholeText tn))

  nil
  (->nodes [_] nil))
```
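A quick way to see the Enlive node shape these extensions produce is to convert a single JSoup element at the REPL. This sketch assumes it is evaluated in the same namespace as the code above (so `Jsoup` is imported and `->nodes` is defined); `sample-p` is a hypothetical name. `Jsoup/parseBodyFragment` wraps the fragment in a shell document, so the `<p>` is picked back out with `getElementsByTag` before converting.

```clojure
;; Assumes the ns form and IEnlive extensions above are already loaded.
(def sample-p
  (-> (Jsoup/parseBodyFragment "<p class=\"greeting\">Hi <b>cgrand</b>!</p>")
      (.getElementsByTag "p") ; Elements, a java.util.List
      first                   ; the <p> element itself
      ->nodes))               ; convert it (and its children) to Enlive nodes

sample-p
;; => {:tag :p, :attrs {:class "greeting"},
;;     :content ("Hi " {:tag :b, :attrs nil, :content ("cgrand")} "!")}
```

Attributes become a keyword-keyed map (or `nil` when there are none), and text nodes stay plain strings, which is what Enlive's selectors and transformations expect.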
```clojure
;; redefined parser fn to support jsoup
(defn parser
  "Parse an HTML document stream into Enlive nodes using JSoup."
  [stream]
  ;; Jsoup/parse takes a byte stream, a charset name and a base URI ("" here)
  (with-open [^java.io.InputStream stream stream]
    (->nodes (Jsoup/parse stream "ISO-8859-1" ""))))
```
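If you would rather keep handing Enlive a `Reader` (as happens with a `java.io.StringReader`), one possible variant reads the `Reader` into a string first and calls the single-argument `Jsoup/parse`, which takes a `String`. This is a sketch under stated assumptions: `parser*` is a hypothetical name, it assumes the namespace above is loaded, and it assumes `html-resource` hands a `Reader` resource directly to the `:parser` function.

```clojure
;; Assumes the ns form and IEnlive extensions above are already loaded.
(defn parser*
  "Hypothetical variant of `parser` that accepts either a java.io.Reader
  or a java.io.InputStream."
  [source]
  (with-open [^java.io.Closeable source source]
    (->nodes
     (if (instance? java.io.Reader source)
       ;; Jsoup/parse has no Reader overload, so slurp the text into a String
       (Jsoup/parse ^String (slurp source))
       ;; Byte streams keep the charset choice made in `parser` above
       (Jsoup/parse ^java.io.InputStream source "ISO-8859-1" "")))))
```

With this variant, `(net.cgrand.enlive-html/html-resource (java.io.StringReader. "<h1>Hi, cgrand!</h1>") {:parser parser*})` should not need the byte-array detour. The trade-off is that the whole input is read into memory as a string, which is harmless for snippets but worth keeping in mind for large documents.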
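```clojure
;; then this will work: html-resource gets a byte stream (not a Reader),
;; which `parser` can pass straight to Jsoup/parse
(net.cgrand.enlive-html/html-resource
 (-> "<h1>Hi, cgrand!</h1>"
     (.getBytes "ISO-8859-1")
     java.io.ByteArrayInputStream.)
 {:parser parser})
```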
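Note that JSoup always builds a complete document, so the call above yields an `:html` node wrapping `:head` and `:body` rather than a bare `:h1`. For something closer to `html-snippet`, which returns just the fragment's nodes, one option is to bypass `html-resource`, call `Jsoup/parseBodyFragment` directly and convert the `<body>` children. `jsoup-snippet` is a hypothetical helper, and the sketch again assumes the namespace above is loaded.

```clojure
;; Assumes the ns form and IEnlive extensions above are already loaded.
(defn jsoup-snippet
  "Hypothetical helper: parse an HTML fragment into a seq of Enlive nodes,
  using JSoup instead of Enlive's default parser."
  [& values]
  (->> (Jsoup/parseBodyFragment (apply str values)) ; fragment lands in <body>
       (.body)                                      ; the <body> Element
       (.childNodes)                                ; just the fragment's nodes
       (map ->nodes)))

(jsoup-snippet "<h1>Hi, cgrand!</h1>")
;; => ({:tag :h1, :attrs nil, :content ("Hi, cgrand!")})
```

Like `html-snippet`, it takes any number of values and concatenates them with `str` before parsing.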