An HTML parser library for Crystal like the amazing Nokogiri Ruby gem.
I won't pretend that Crystagiri does much as Nokogiri. All help is welcome! :)
Add this to your application's shard.yml
:
dependencies:
crystagiri:
github: madeindjs/crystagiri
and then run
$ shards install
require "crystagiri"
Then you can simply instantiate a Crystagiri::HTML
object from an HTML String
like this
doc = Crystagiri::HTML.new "<h1>Crystagiri is awesome!!</h1>"
... or directly load it from a Web URL or a pathname:
doc = Crystagiri::HTML.from_file "README.md"
doc = Crystagiri::HTML.from_url "http://example.com/"
Also you can specify
follow: true
flag if you want to follow redirect URL
Then you can search all XML::Node
s from the Crystagiri::HTML
instance. The tags found will be Crystagiri::Tag
objects with the .node
property:
- CSS query
puts doc.css("li > strong.title") { |tag| puts tag.node}
# => <strong class="title"> .. </strong>
# => <strong class="title"> .. </strong>
Known limitations: Currently, you can't use CSS queries with complex search specifiers like
:nth-child
- HTML tag
doc.where_tag("h2") { |tag| puts tag.content }
# => Development
# => Contributing
- HTML id
puts doc.at_id("main-content").tagname
# => div
- HTML class attribute
doc.where_class("summary") { |tag| puts tag.node }
# => <div class="summary"> .. </div>
# => <div class="summary"> .. </div>
# => <div class="summary"> .. </div>
I know you love benchmarks between Ruby & Crystal, so here's one:
require "nokogiri"
t1 = Time.now
doc = Nokogiri::HTML File.read("spec/fixture/HTML.html")
1..100000.times do
doc.at_css("h1")
doc.css(".step-title"){ |tag| tag }
end
puts "executed in #{Time.now - t1} milliseconds"
executed in 00:00:11.10 seconds with Ruby 2.6.0 with RVM on old Mac
require "crystagiri"
t = Time.now
doc = Crystagiri::HTML.from_file "./spec/fixture/HTML.html"
1..100000.times do
doc.at_css("h1")
doc.css(".step-title") { |tag| tag }
end
puts "executed in #{Time.now - t} milliseconds"
executed in 00:00:03.09 seconds on Crystal 0.27.2 on LLVM 6.0.1 with release flag
Crystagiri is more than two time faster than Nokogiri!!
Clone this repository and navigate to it:
$ git clone https://github.com/madeindjs/crystagiri.git
$ cd crystagiri
You can generate all documentation with
$ crystal doc
And run spec tests to ensure everything works correctly
$ crystal spec
Do you like this project? here you can find some issues to get started.
Contributing is simple:
- Fork it ( https://github.com/madeindjs/crystagiri/fork )
- Create your feature branch
git checkout -b my-new-feature
- Commit your changes
git commit -am "Add some feature"
- Push to the branch
git push origin my-new-feature
- Create a new Pull Request
See the list on Github