Ruby microformat parser and HTML toolkit
- A robust microformat parser
- A command-line tool for parsing microformats from a url or a string of markup
- A DSL for defining semantic markup patterns
- Export microformats to other standards:
  - hCard => vCard
It is your lowercase semantic web friend.
Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns (e.g. XHTML, blogging).
Learn more about Microformats at http://microformats.org.
The command-line tool takes a SOURCE from standard input or as an argument:
$: curl http://markwunsch.com | prism --hcard > ~/Desktop/me.vcf
OR
$: prism --hcard http://markwunsch.com > ~/Desktop/me.vcf
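The two invocations above differ only in where the SOURCE comes from. As a rough sketch of that choice (not the prism executable's actual code; `resolve_source` is a hypothetical helper named here for illustration), a script could prefer the first non-flag argument and fall back to reading standard input:

```ruby
require 'stringio'

# Hypothetical helper, not part of Prism: pick the SOURCE for a CLI run.
# Prefers the first non-flag argument; otherwise reads all of standard input.
def resolve_source(argv, stdin = $stdin)
  source = argv.find { |arg| !arg.start_with?('-') }
  return source if source

  stdin.read
end

# SOURCE given as an argument: the URL is returned and stdin is never read.
resolve_source(['--hcard', 'http://markwunsch.com'])

# SOURCE piped in: StringIO stands in for the piped markup.
resolve_source(['--hcard'], StringIO.new('<div class="vcard"></div>'))
```

Passing the input stream as a parameter keeps the helper easy to exercise in tests without a real pipe.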
With Ruby and RubyGems:
gem install prism
Or clone the repository and run bundle install to get the development dependencies.
More on the way.
# All microformats
Prism.find 'http://foobar.com'
# A specific microformat
Prism.find 'http://foobar.com', :hcard
# Search HTML too
Prism.find big_string_of_html
twitter_contacts = Prism.find 'http://twitter.com/markwunsch', :hcard
me = twitter_contacts.first
me.fn
#=> "Mark Wunsch"
me.n.family_name
#=> "Wunsch"
me.url
#=> ["http://markwunsch.com/"]
File.open('mark.vcf','w') {|f| f.write me.to_vcard }
## Add me to your address book!
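To give a feel for what the hCard => vCard export produces, here is a minimal sketch that serializes the same three properties as vCard 3.0 text. It is an illustration only: the `HCard` and `Name` structs and `to_vcard_sketch` are hypothetical stand-ins, Prism's own `to_vcard` may emit different or additional properties, and value escaping (commas, semicolons) is left out.

```ruby
# Hypothetical stand-ins for a parsed hCard; Prism's own objects may differ.
Name  = Struct.new(:family_name, :given_name)
HCard = Struct.new(:fn, :n, :url)

# Serialize the card as vCard 3.0 text: one property per line, CRLF endings.
# Assumption: values contain no characters that need vCard escaping.
def to_vcard_sketch(card)
  lines = ['BEGIN:VCARD', 'VERSION:3.0']
  lines << "N:#{card.n.family_name};#{card.n.given_name};;;"
  lines << "FN:#{card.fn}"
  Array(card.url).each { |u| lines << "URL:#{u}" }
  lines << 'END:VCARD'
  lines.join("\r\n") + "\r\n"
end

me = HCard.new('Mark Wunsch', Name.new('Wunsch', 'Mark'), ['http://markwunsch.com/'])
puts to_vcard_sketch(me)
```

The structured N property carries the family and given names separately, while FN holds the formatted name, mirroring the `me.n.family_name` and `me.fn` accessors above.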
The Prism module defines a group of methods to search, validate, and extract nodes out of a Nokogiri document.
All microformats inherit from Prism::POSH, because all microformats begin as POSH (Plain Old Semantic HTML) formats. If you wanted to create your own POSH format, you'd do something like this:
class Navigation < Prism::POSH
  search {|document| document.css('ul#navigation') }
  # Search a Nokogiri document for nodes of a certain type
  validate {|node| node.matches?('ul#navigation') }
  # Validate that a node is the right element we want
  has_many :items do
    search {|doc| doc.css('li') }
  end
  # has_many and has_one define properties, which themselves inherit from
  # Prism::POSH::Base, so you can do :has_one, :has_many, :search, :extract, etc.
end
Now you can do:
nav = Navigation.parse_first(document)
# document is a Nokogiri document.
# parse_first extracts just the first example of the format out of the document
nav.items
# Returns an array of contents
# This method comes from the has_many call up above that defines the :items property
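For readers curious how a class-level DSL of this shape can work, here is a toy sketch in plain Ruby. It is not Prism's implementation: `ToyFormat` and `ToyNavigation` are hypothetical names, the "document" is an array of hashes rather than a Nokogiri document, and `has_many` takes a simple extractor block instead of a nested property definition.

```ruby
# A toy format base class, NOT Prism's implementation. "Documents" here are
# plain arrays of hashes so the sketch runs without Nokogiri.
class ToyFormat
  class << self
    # With a block: store how to find candidate nodes. Without: return it.
    def search(&block)
      @search = block if block
      @search
    end

    # With a block: store how to check a candidate node. Without: return it.
    def validate(&block)
      @validate = block if block
      @validate
    end

    # Define a reader method that runs the extractor against the wrapped node.
    def has_many(name, &extractor)
      define_method(name) { extractor.call(@node) }
    end

    # Run search, keep the first node that validates, and wrap it (or nil).
    def parse_first(document)
      node = search.call(document).find { |n| validate.call(n) }
      node && new(node)
    end
  end

  def initialize(node)
    @node = node
  end
end

class ToyNavigation < ToyFormat
  search   { |doc| doc.select { |n| n[:tag] == 'ul' } }
  validate { |node| node[:id] == 'navigation' }
  has_many(:items) { |node| node[:children].map { |li| li[:text] } }
end

document = [
  { tag: 'ul', id: 'footer',     children: [{ text: 'Legal' }] },
  { tag: 'ul', id: 'navigation', children: [{ text: 'Home' }, { text: 'About' }] }
]

nav = ToyNavigation.parse_first(document)
p nav.items # prints ["Home", "About"]
```

The blocks are stored on the subclass itself, so each format keeps its own search and validation rules, and `has_many` turns a property name into an ordinary instance method.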
- Mofo is a Ruby microformat parser backed by Hpricot.
- Sumo is a JavaScript microformat parser.
- Operator is a Firefox extension.
- hKit is a microformat parser for PHP.
- Oomph is a microformat toolkit add-in for Internet Explorer.
- HTML outliner (using HTML5 sectioning)
- HTML5 article, time, etc POSH support
- Extensions so you can do something like String.is_a_valid? :hcard in your tests
- Extensions to turn Ruby objects into semantic HTML. Hash.to_definition_list, Array.to_ordered_list, etc.
- Code is ugly. Especially XOXO.
- Better recursive parsing of trees. See above.
- Tests are all kinds of disorganized. And slow.
- Broader support for some of the weirder Patterns, like object[data]
- Man pages (see Ron)
Prism is licensed under the MIT License and is Copyright (c) 2010 Mark Wunsch.