Robin is an XML parser and processing library that supports a sane version of HTML. It features a set of DOM utilities, including XPath 1.0 support, for interacting with and manipulating XML/HTML documents. Typical use cases include processing XML or HTML files and web scraping. Note that Robin is a non-validating parser: DTD structures are not used to validate the markup document.
All samples below are for the Node.js runtime.
JavaScript
const { Robin } = require("@ziord/robin");
const robin = new Robin("<tag id='1'>some value<data id='2'>123456</data></tag>", "XML"); // use "XML" mode - which is the default mode - for XML documents ("HTML" for HTML documents)
// pretty-printing the document
console.log(robin.prettify());
// alternatively
// const root = new Robin().parse("...some markup...");
// console.log(root.prettify());
TypeScript
import { Robin } from "@ziord/robin";
const robin = new Robin("<div id='1'>some value<span id='2'>123456</span></div>", "HTML"); // mode "HTML" for HTML documents
console.log(robin.prettify());
By Name
JavaScript
// find "data" element
const element = robin.dom(robin.getRoot()).find("data");
// pretty-print the element
console.log(element.prettify());
TypeScript
// find "span" element
import { ElementNode } from "@ziord/robin";
const element = robin.dom(robin.getRoot()).find<ElementNode>("span")!;
// pretty-print the element
console.log(element.prettify());
By Filters
JavaScript
const { DOMFilter } = require("@ziord/robin");
const root = robin.getRoot();
// find the first "data" element
robin.dom(root).find({filter: DOMFilter.ElementFilter("data")});
// find the first element having attribute "id"
robin.dom(root).find({filter: DOMFilter.AttributeFilter("id")});
// find the first element having attributes "id", "foo"
robin.dom(root).find({filter: DOMFilter.AttributeFilter(["id", "foo"])});
// find the first element having attribute "id"="2"
robin.dom(root).find({filter: DOMFilter.AttributeFilter({ id: "2" })});
// find the first "data" element having attribute "id"="2"
robin.dom(root).find({filter: DOMFilter.ElementFilter("data", { id: "2" })});
The TypeScript variant follows the same logic. The API also provides many other utility functions.
By Queries
JavaScript
// find "data" element
const element = robin.path(robin.getRoot()).queryOne("/tag/data");
// pretty-print the element
console.log(element.prettify());
TypeScript
// find "span" element
import { ElementNode } from "@ziord/robin";
const element = robin.path(robin.getRoot()).queryOne<ElementNode>("//span")!;
// pretty-print the element
console.log(element.prettify());
The XPath API also provides other utilities, such as query and queryAll.
From an element
JavaScript
// find "attributeKey" attribute
const attribute = element.getAttributeNode("attributeKey");
console.log(attribute.prettify());
From the DOM using the DOM API
JavaScript
// find "attributeKey" attribute from any "foo" element
const attribute = robin.dom(robin.getRoot()).findAttribute("foo", "attributeKey");
console.log(attribute.prettify());
console.log("key:", attribute.name.qname, "value:", attribute.value);
From the DOM using the XPath API
TypeScript
import { AttributeNode } from "@ziord/robin";
// find "attributeKey" attribute from any "foo" element
const attribute = robin.path(robin.getRoot()).queryOne<AttributeNode>("//foo[@attributeKey]/@attributeKey")!;
console.log("key:", attribute.name.qname, "value:", attribute.value);
From the DOM using the DOM API
TypeScript
import { TextNode } from "@ziord/robin";
// find a text node by partial, case-insensitive match
const text = robin.dom(robin.getRoot()).find<TextNode>({text: { value: "some part of the text", match: "partial-ignoreCase" }})!; // match: "partial" | "exact" | "partial-ignoreCase" | "exact-ignoreCase"
console.log(text.stringValue());
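The four match values listed in the comment above combine two choices: whole-string versus substring comparison, and case-sensitive versus case-insensitive comparison. The sketch below illustrates that reading with a standalone helper. `matchesText` and `MatchMode` are hypothetical names that are not part of Robin, and the semantics shown are an assumption inferred from the mode names, not a description of the library's implementation.

```typescript
// Hypothetical helper (NOT part of Robin): one plausible reading of the match modes.
type MatchMode = "partial" | "exact" | "partial-ignoreCase" | "exact-ignoreCase";

function matchesText(content: string, value: string, mode: MatchMode): boolean {
  // Assumption: the "-ignoreCase" suffix means both sides are compared case-insensitively.
  const ignoreCase = mode.indexOf("ignoreCase") !== -1;
  const haystack = ignoreCase ? content.toLowerCase() : content;
  const needle = ignoreCase ? value.toLowerCase() : value;
  // Assumption: "exact*" compares the whole string, "partial*" looks for a substring.
  return mode.indexOf("exact") === 0
    ? haystack === needle
    : haystack.indexOf(needle) !== -1;
}

const content = "Some Part of the Text, and more";
console.log(matchesText(content, "some part of the text", "partial-ignoreCase")); // true
console.log(matchesText(content, "some part of the text", "partial")); // false
console.log(matchesText(content, content, "exact")); // true
```

Under this reading, "partial-ignoreCase" is the most forgiving mode and "exact" the strictest, so the stricter modes are useful when several text nodes share a common fragment.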
From the DOM using the XPath API
TypeScript
import { TextNode } from "@ziord/robin";
// find the first text node in the document
const text = robin.path(robin.getRoot()).queryOne<TextNode>("(//text())[1]")!;
console.log(text.stringValue());
console.log(text.prettify());
TypeScript
import { CommentNode } from "@ziord/robin";
// find a comment
const comment = robin.dom(robin.getRoot()).find<CommentNode>({comment: { value: "some part of the comment", match: "partial" }})!; // match: "partial" | "exact" | "partial-ignoreCase" | "exact-ignoreCase"
console.log(comment.stringValue());
JavaScript
// get the element's textual content
let text = robin.dom(element).text(); // string
console.log(text);
// alternatively
text = element.stringValue();
console.log(text);
See the web scraper example for more usage.
Check out the docs. You can also take a look at some examples here.
If you have small questions that you feel aren't worth opening an issue for, use the project's discussions.
Simply run the following command in your terminal:
npm install @ziord/robin
Contributions are welcome! See the contribution guidelines to learn more. Thanks!
Please open an issue. Check out the issue template.
Robin is distributed under the MIT License.