Pascal Raszyk edited this page May 12, 2016 · 7 revisions

Here are some pretty one-liners (OK, one-statements!) to give you an idea of what Infoboxer can do.

Some info on Argentina

puts Infoboxer.wp.get('Argentina').infobox.fetch('leader_name1')
# Prints: 
#  Cristina Fernández de Kirchner

Shows:

  • simple page and infobox data extraction;
  • readable representation of tree nodes.

Porsche 911 engines

table = Infoboxer.wp.get('Porsche 991').
  sections('Engines' => 'Performance').
  tables.first

headings = table.heading_row.cells.map(&:to_s)
# => ["Model", "Transmission", "Engine", "Top speed", "Acceleration 0-100", "Emissions"]

table.body_rows.
  map{|tr|
    headings.zip(tr.cells.map(&:to_s)).to_h
  }
# => [{"Model"=>"Carrera", "Transmission"=>"7-speed man", "Engine"=>"3.4", "Top speed"=>"289 km/h", "Acceleration 0-100"=>"4.8", "Emissions"=>"211 g/km"},
#     {"Model"=>"Carrera", "Transmission"=>"7-speed PDK", "Engine"=>"3.4", "Top speed"=>"287 km/h", "Acceleration 0-100"=>"4.6", "Emissions"=>"191 g/km"},
#     ...and so on...

Shows:

  • navigating by sections;
  • tables;
  • information extraction from tables.
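
The `zip`/`to_h` step above can be tried without Infoboxer or network access. The following is only a sketch in plain Ruby: the headings and rows are copied from the output shown above, and `rows_to_hashes` is a hypothetical helper name used for illustration, not part of Infoboxer.

```ruby
# Plain-Ruby stand-in for the table step: no Infoboxer, no network access.
# HEADINGS and ROWS repeat the values from the example output above.
HEADINGS = ["Model", "Transmission", "Engine", "Top speed", "Acceleration 0-100", "Emissions"].freeze
ROWS = [
  ["Carrera", "7-speed man", "3.4", "289 km/h", "4.8", "211 g/km"],
  ["Carrera", "7-speed PDK", "3.4", "287 km/h", "4.6", "191 g/km"]
].freeze

# Hypothetical helper (not an Infoboxer method): pairs each heading with the
# cell in the same column and turns every row into a Hash.
def rows_to_hashes(headings, rows)
  rows.map { |cells| headings.zip(cells).to_h }
end

records = rows_to_hashes(HEADINGS, ROWS)
puts records.first["Top speed"]
# Prints:
#  289 km/h
```

Note that `zip` pads with `nil` when a row has fewer cells than there are headings, so tables with merged or missing cells may need extra care.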

List of works by Kilgore Trout

Infoboxer.wp.get('Kilgore Trout').
  sections('"Works" by Kilgore Trout' => /.*/).
  lookup(:ListItem).map{|li|
    {
      title: li.lookup(:Italic).first.text,
      mention: li.lookup(:Italic)[1].text,
      type: li.in_sections.first.heading.text_
    }
  }
# => [{:title=>"Barring-gaffner of Bagnialto or This Year's Masterpiece", :mention=>"Breakfast of Champions", :type=>"Novels"},
#     {:title=>"The Big Board", :mention=>"Slaughterhouse-Five", :type=>"Novels"},
#     ...and so on

Shows:

  • navigating by sections;
  • node tree lookup;
  • node text extraction.

Wikivoyage: extract a list of venues by city and type

Infoboxer.wikivoyage.get('Chiang Mai').
  sections('See' => 'Elephants').templates(name: 'see').
  fetch_hashes('name', 'address', 'price')
# => [{"name"=>#<Var(name): Baanchang Elephant Park>, "address"=>#<Var(address): 147/1 Rachadamnoen Rd>, "price"=>#<Var(price): 4500 baht a day (can be split b...>}
#     ...and so on...

Shows:

  • use of sources other than Wikipedia;
  • navigation by sections;
  • use of templates inside the document body;
  • complex fetching from templates.
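
The hashes above hold `Var` nodes rather than plain strings. If plain strings are what you need, one possible approach is to call `to_s` on every value, the same way the table example calls it on cells. The sketch below assumes that `Var` nodes respond to `to_s` with their text; `FakeVar` is a hypothetical stand-in so that the snippet runs without Infoboxer, and `stringify_values` is an illustrative helper name, not a library method.

```ruby
# FakeVar is a hypothetical stand-in for the Var nodes shown in the output
# above; it only mimics "to_s returns the text" (an assumption about Var).
FakeVar = Struct.new(:name, :value) do
  def to_s
    value
  end
end

# Hypothetical helper (not an Infoboxer method): converts every value of
# every hash to a String. Hash#transform_values needs Ruby 2.4 or newer.
def stringify_values(hashes)
  hashes.map { |hash| hash.transform_values(&:to_s) }
end

# Sample data copied from the example output above.
VENUES = [
  { "name"    => FakeVar.new("name", "Baanchang Elephant Park"),
    "address" => FakeVar.new("address", "147/1 Rachadamnoen Rd") }
].freeze

p stringify_values(VENUES)
```

With real data, the argument would be the result of `fetch_hashes` instead of `VENUES`.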

Birthdays of some presidents!

Infoboxer.wp.
  get('Argentina', 'Bolivia', 'Chile').
  infobox.fetch('leader_name1').
    lookup(:Wikilink).follow.
    infobox.fetch_hashes('name', 'office', 'birth_date')
# => [{"name"=>#<Var(name): Cristina Fernández de Kirchner>, "office"=>#<Var(office): President of Argentina>, "birth_date"=>#<Var(birth_date): 1953-02-19>},
#    ...and so on

Shows:

  • extracting several pages at once (in one request to the Wikipedia API!);
  • working with a list of pages, which is as simple as working with a list of nodes;
  • following wikilinks and parsing the linked pages.

Next topics: