Skip to content

An Html parser library for Crystal (like Nokogiri for Ruby)

License

Notifications You must be signed in to change notification settings

madeindjs/Crystagiri

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

63 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Crystagiri

An HTML parser library for Crystal like the amazing Nokogiri Ruby gem.

I won't pretend that Crystagiri does much as Nokogiri. All help is welcome! :)

Installation

Add this to your application's shard.yml:

dependencies:
  crystagiri:
    github: madeindjs/crystagiri

and then run

$ shards install

Usage

require "crystagiri"

Then you can simply instantiate a Crystagiri::HTML object from an HTML String like this

doc = Crystagiri::HTML.new "<h1>Crystagiri is awesome!!</h1>"

... or directly load it from a Web URL or a pathname:

doc = Crystagiri::HTML.from_file "README.md"
doc = Crystagiri::HTML.from_url "http://example.com/"

Also you can specify follow: true flag if you want to follow redirect URL

Then you can search all XML::Nodes from the Crystagiri::HTML instance. The tags found will be Crystagiri::Tag objects with the .node property:

  • CSS query
puts doc.css("li > strong.title") { |tag| puts tag.node}
# => <strong class="title"> .. </strong>
# => <strong class="title"> .. </strong>

Known limitations: Currently, you can't use CSS queries with complex search specifiers like :nth-child

  • HTML tag
doc.where_tag("h2") { |tag| puts tag.content }
# => Development
# => Contributing
  • HTML id
puts doc.at_id("main-content").tagname
# => div
  • HTML class attribute
doc.where_class("summary") { |tag| puts tag.node }
# => <div class="summary"> .. </div>
# => <div class="summary"> .. </div>
# => <div class="summary"> .. </div>

Benchmark

I know you love benchmarks between Ruby & Crystal, so here's one:

require "nokogiri"
t1 = Time.now
doc = Nokogiri::HTML File.read("spec/fixture/HTML.html")
1..100000.times do
  doc.at_css("h1")
  doc.css(".step-title"){ |tag| tag }
end
puts "executed in #{Time.now - t1} milliseconds"

executed in 00:00:11.10 seconds with Ruby 2.6.0 with RVM on old Mac

require "crystagiri"
t = Time.now
doc = Crystagiri::HTML.from_file "./spec/fixture/HTML.html"
1..100000.times do
  doc.at_css("h1")
  doc.css(".step-title") { |tag| tag }
end
puts "executed in #{Time.now - t} milliseconds"

executed in 00:00:03.09 seconds on Crystal 0.27.2 on LLVM 6.0.1 with release flag

Crystagiri is more than two time faster than Nokogiri!!

Development

Clone this repository and navigate to it:

$ git clone https://github.com/madeindjs/crystagiri.git
$ cd crystagiri

You can generate all documentation with

$ crystal doc

And run spec tests to ensure everything works correctly

$ crystal spec

Contributing

Do you like this project? here you can find some issues to get started.

Contributing is simple:

  1. Fork it ( https://github.com/madeindjs/crystagiri/fork )
  2. Create your feature branch git checkout -b my-new-feature
  3. Commit your changes git commit -am "Add some feature"
  4. Push to the branch git push origin my-new-feature
  5. Create a new Pull Request

Contributors

See the list on Github