readability4s

A Scala library that extracts content from an article's HTML: title, full text, favicon, image, etc.

This project is a Scala port of Mozilla's Readability.js with a few tweaks and improvements. It targets Scala 2.12.

Usage

Import the project with Maven as follows:

<dependency>
  <groupId>com.github.ghostdogpr</groupId>
  <artifactId>readability4s</artifactId>
  <version>1.0.9</version>
</dependency>

To parse a document, create a Readability object from a URI string and an HTML string, then call parse(). Here's an example:

val article = Readability(url, htmlString).parse()

It returns an Option[Article]: None when the article could not be processed, or an Article holding the extracted fields, such as the title, full text, favicon, and image.
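Because parse() returns an Option, callers need to handle both outcomes. The following is a minimal sketch of one way to do that with pattern matching. To keep it runnable without the library on the classpath, it defines a hypothetical stand-in `Article` case class; the field names `title` and `textContent` and the helper `ArticleSummary.summarize` are assumptions for illustration, not the library's confirmed API.

```scala
// Hypothetical stand-in for the library's Article type.
// The field names below are assumptions made for this sketch.
final case class Article(title: String, textContent: String)

object ArticleSummary {
  // Turns the Option returned by parse() into a short, printable summary.
  def summarize(result: Option[Article]): String = result match {
    case Some(article) =>
      s"${article.title} (${article.textContent.length} characters)"
    case None =>
      "Article could not be processed"
  }
}
```

In a real project, the value passed to `summarize` would be the result of `Readability(url, htmlString).parse()`, and the stand-in case class would be replaced by the library's own `Article` type.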
