How To Use The Executable
When you install the wgit gem you also get an executable by the same name. The wgit executable starts an interactive shell session (pry if installed, irb if not) with the Wgit gem already required.
Start the executable with:
$ wgit
Skipping .env load because 'dotenv' isn't installed
Searching for .wgit.rb file in local and home directories...
Using 'irb' REPL because 'pry' isn't installed
wgit v0.11.0
------------
irb(main):001> url = Wgit::Url.new 'http://example.com'
=> #<Wgit::Url url="http://example.com" crawled=false>
irb(main):002> doc = Wgit::Crawler.new.crawl url
=> #<Wgit::Document url="http://example.com" html_size=1255>
Type exit or press Ctrl+D to end your session.
When connecting to a database with Wgit, you can pass the connection string manually to methods like Wgit::Database.new, or set the ENV['WGIT_CONNECTION_STRING'] value (allowing you to omit the connection string param).
Therefore, you can set the environment variable when you start the executable:
$ WGIT_CONNECTION_STRING="<your_connection_string>" wgit
[1] pry(main)> db = Wgit::Database.new
By setting the database's connection string in the ENV, you need not pass a connection string parameter.
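The fallback described above can be pictured with a small plain-Ruby sketch. This is not Wgit's source: resolve_connection_string is a hypothetical helper used only for illustration, and the only detail taken from the text is the WGIT_CONNECTION_STRING key.

```ruby
# Hypothetical helper illustrating the "explicit argument, else ENV" pattern.
# NOT part of Wgit; only the ENV key name comes from the documentation above.
def resolve_connection_string(connection_string = nil)
  # Prefer an explicitly passed value, otherwise fall back to the ENV.
  resolved = connection_string || ENV['WGIT_CONNECTION_STRING']

  if resolved.nil? || resolved.empty?
    raise ArgumentError, 'No connection string given or set in ENV'
  end

  resolved
end
```

With a pattern like this, an explicit argument always wins, and a missing value fails early with a clear error rather than at the first database call.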
An alternative to providing the connection string via the command line every time is to use .env and .wgit.rb files to set and store it in the ENV Hash (see below).
Start by creating a .env file. You'll also need to install the dotenv gem. For example:
$ gem install dotenv
$ touch .env
$ echo "WGIT_CONNECTION_STRING='<connection_string>'" >> .env
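The startup message "Skipping .env load because 'dotenv' isn't installed" suggests the .env load is optional. A minimal sketch of that pattern follows, assuming the dotenv gem's Dotenv.load call; load_env_file is a hypothetical name and this is not the executable's actual code.

```ruby
# Hypothetical sketch of an optional .env load; not the wgit executable's code.
def load_env_file
  require 'dotenv' # Raises LoadError when the gem isn't installed.
  Dotenv.load      # Reads .env from the current directory into ENV.
  true
rescue LoadError
  puts "Skipping .env load because 'dotenv' isn't installed"
  false
end
```

Rescuing LoadError keeps dotenv an optional dependency: the session still starts without it, and the connection string can then be supplied on the command line instead.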
The wgit executable will look for and eval a .wgit.rb file, if one can be found. The two locations that are searched (in order) are the local directory and the home directory.
You can therefore use a .wgit.rb file to store fixtures and configuration, and to define helper functions to easily index and search the web.
Start by creating a .wgit.rb file in either your local or home directory:
$ touch .wgit.rb
Save the following in your .wgit.rb file to connect to a database instance:
def db
  # We omit the <connection_string> param because it's set in ENV
  @db ||= Wgit::Database.new
end
Now, as soon as you start a shell session, you can access your database with commands like db.search(...) etc.
- Because of scoping rules, any variables defined in .wgit.rb should be instance variables (e.g. @url) or be accessed via a getter method (e.g. def url; ...; end).
- Add require 'wgit/core_ext' to your .wgit.rb file so you can use methods like String#to_url etc.
- Remove the Wgit namespace around its classes by adding include Wgit to your .wgit.rb file.
- Include the Wgit::DSL which provides convenience methods for crawling, indexing and searching.
- If you find yourself doing the same thing regularly e.g. indexing the same site, then define a helper function in your .wgit.rb file to execute with a single call.
- It can be helpful to keep your personal .wgit.rb file in the home directory and override it with a local .wgit.rb file when working on a specific project.
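The scoping tip in the list above can be demonstrated with plain Ruby. The sketch below assumes the file is evaluated in its own local scope, as Ruby's Kernel#load does; the executable's exact mechanism may differ. The scope_demo method and the file contents are hypothetical stand-ins for a real .wgit.rb.

```ruby
require 'tempfile'

# Loads a throwaway stand-in for .wgit.rb and reports what survives the load.
def scope_demo
  Tempfile.create(['wgit_scope_demo', '.rb']) do |file|
    file.write(<<~RUBY)
      local_url = 'http://example.com' # Local variable: scoped to the file.
      @site_url = 'http://example.com' # Instance variable on the top-level object.
      def site_url                     # Getter method: callable after loading.
        'http://example.com'
      end
    RUBY
    file.flush
    load file.path

    {
      local_visible: TOPLEVEL_BINDING.local_variable_defined?(:local_url),
      instance_variable: TOPLEVEL_BINDING.receiver.instance_variable_get(:@site_url),
      getter: site_url
    }
  end
end
```

Calling scope_demo returns local_visible: false, while both instance_variable and getter hold 'http://example.com', which is why the tip recommends instance variables or getter methods over local variables.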