How To Use The Executable

Michael Telford edited this page Jul 8, 2024 · 12 revisions

When you install the wgit gem you also get an executable by the same name. The wgit executable starts an interactive shell session (pry if installed, irb if not) with the Wgit gem already required.

Start the executable with:

$ wgit
Skipping .env load because 'dotenv' isn't installed

Searching for .wgit.rb file in local and home directories...

Using 'irb' REPL because 'pry' isn't installed

wgit v0.11.0
------------

irb(main):001> url = Wgit::Url.new 'http://example.com'
=> #<Wgit::Url url="http://example.com" crawled=false>
irb(main):002> doc = Wgit::Crawler.new.crawl url
=> #<Wgit::Document url="http://example.com" html_size=1255>

Type exit or press Ctrl+D to finish and exit your session.

Connecting to a Database

When connecting to a database with Wgit, you can either pass the connection string directly to methods like Wgit::Database.new or set ENV['WGIT_CONNECTION_STRING'], which lets you omit the connection string parameter.

Therefore, you can set the environment variable when you start the executable:

  $ WGIT_CONNECTION_STRING="<your_connection_string>" wgit
  [1] pry(main)> db = Wgit::Database.new

Because the connection string is already set in ENV, Wgit::Database.new needs no connection string parameter.
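The fallback from an explicit parameter to the environment variable can be pictured in a few lines of plain Ruby. This is a minimal sketch and not Wgit's actual source: resolve_connection_string is a hypothetical helper, the env argument exists only so the example runs without touching your real environment, and the strings are placeholders.

```ruby
# Hypothetical helper (not part of Wgit): prefer an explicit argument,
# otherwise fall back to the WGIT_CONNECTION_STRING environment variable.
def resolve_connection_string(connection_string = nil, env = ENV)
  connection_string || env.fetch('WGIT_CONNECTION_STRING') do
    raise ArgumentError,
          'No connection string given and WGIT_CONNECTION_STRING is not set'
  end
end

# Placeholder values, for illustration only.
fake_env = { 'WGIT_CONNECTION_STRING' => '<your_connection_string>' }

puts resolve_connection_string('<explicit_string>', fake_env) # explicit argument wins
puts resolve_connection_string(nil, fake_env)                 # falls back to the env value
```

Raising when neither source is available is a design choice of this sketch; it surfaces a missing connection string at the point of use rather than later.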

Rather than providing the connection string on the command line every time, you can use .env and .wgit.rb files to store it and load it into the ENV Hash (see below).

Using .env to set the connection string

Start by creating a .env file. You'll also need to install the dotenv gem. For example:

$ gem install dotenv
$ touch .env
$ echo "WGIT_CONNECTION_STRING='<connection_string>'" >> .env
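To make it clearer what a .env loader does with that line, here is a deliberately simplified sketch in plain Ruby. It is not the dotenv gem's implementation (the real gem handles far more syntax), and load_env_lines is a hypothetical name. The sketch assumes KEY=value lines, strips one pair of surrounding quotes, and leaves already-set variables untouched.

```ruby
# Simplified stand-in for a .env loader; the real dotenv gem supports
# much more (export prefixes, interpolation, multi-line values, ...).
def load_env_lines(text, env = ENV)
  text.each_line do |line|
    key, value = line.strip.split('=', 2)
    next if key.nil? || key.empty? || value.nil? # blank lines, comments, junk

    # Strip one pair of matching surrounding quotes, if present.
    value = value[1..-2] if value.match?(/\A(['"]).*\1\z/)

    env[key] ||= value # values that are already set win
  end
  env
end

# Use a plain Hash here so the example does not modify the real ENV.
env = {}
load_env_lines("WGIT_CONNECTION_STRING='<connection_string>'\n", env)
puts env['WGIT_CONNECTION_STRING']
```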

[Optional] Using .wgit.rb to connect to the database

The wgit executable will look for and eval a .wgit.rb file, if one can be found. The two locations that are searched (in order) are the local directory and the home directory.

You can therefore use a .wgit.rb file to store fixtures and configuration, and to define helper functions that make it easy to index and search the web.
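The lookup order (local directory first, then home directory) can be sketched as follows. This is an illustration under stated assumptions, not the executable's real code: find_wgit_file is a hypothetical helper, and the sketch only reports which file would be picked up rather than evaluating it.

```ruby
# Hypothetical helper: returns the path of the first .wgit.rb file found
# in the given directories (searched in order), or nil if there is none.
def find_wgit_file(dirs = [Dir.pwd, Dir.home])
  dirs.map { |dir| File.join(dir, '.wgit.rb') }
      .find { |path| File.file?(path) }
end

path = find_wgit_file
puts(path ? "Would load #{path}" : 'No .wgit.rb file found')
```

Because the search stops at the first match, a .wgit.rb in the current directory takes precedence over one in your home directory.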

Start by creating a .wgit.rb file in either your local or home directory:

$ touch .wgit.rb

Save the following in your .wgit.rb file to connect to a database instance:

def db
  # We omit the <connection_string> param because it's set in ENV
  @db ||= Wgit::Database.new
end

Now, as soon as you start a shell session, you can access your database with calls like db.search(...).

Tips & Tricks

  • Because of scoping rules, any variables defined in .wgit.rb should be instance variables (e.g. @url) or be accessed via a getter method (e.g. def url; ...; end).
  • Add require 'wgit/core_ext' to your .wgit.rb file so you can use methods like String#to_url.
  • Remove the Wgit namespace around its classes by adding include Wgit to your .wgit.rb file.
  • Include Wgit::DSL, which provides convenience methods for crawling, indexing and searching.
  • If you find yourself doing the same thing regularly e.g. indexing the same site, then define a helper function in your .wgit.rb file to execute with a single call.
  • It can be helpful to keep your personal .wgit.rb file in the home directory and override it with a local .wgit.rb file when working on a specific project.
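The scoping rule in the first tip can be demonstrated without Wgit at all. The sketch below assumes the file is evaluated in its own local scope, as Ruby's load does; whether the executable uses load or another mechanism is an assumption here. The file contents, the site names and the URL are illustrative stand-ins for a real .wgit.rb.

```ruby
require 'tempfile'

# Stand-in for the contents of a .wgit.rb file (illustrative names only).
source = <<~'RUBY'
  site = 'http://example.com'   # local variable: scoped to this file only
  @site = 'http://example.com'  # instance variable on the top-level object
  def site_url                  # getter method: callable from the session
    @site_url ||= 'http://example.com'
  end
RUBY

Tempfile.create(['wgit', '.rb']) do |file|
  file.write(source)
  file.flush
  load file.path # assumption: evaluated in its own local scope, like load
end

puts defined?(site).inspect # nil: the local variable did not leak out
puts @site                  # the instance variable set by the file
puts site_url               # the getter method defined by the file
```

This is why a plain local assignment in .wgit.rb appears to vanish once the session starts, while instance variables and getter methods remain available.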