node-phantom-simple

A bridge between PhantomJS / SlimerJS and Node.js.

This module is API-compatible with node-phantom but doesn't rely on WebSockets / socket.io. In essence the communication between Node and Phantom / Slimer has been simplified significantly. It has the following advantages over node-phantom:

  • Fewer dependencies/layers.
  • Doesn't use the unreliable and huge socket.io.
  • Works under cluster, because server.listen(0) works in cluster (node-phantom does not, due to how it works).
  • Supports SlimerJS.

Migrating 1.x -> 2.x

Your software should work without changes, but may show deprecation warnings about outdated signatures. To get rid of them, update:

  • options.phantomPath -> options.path
  • in .create(), .evaluate() and .waitForSelector() -> move the callback to the last position of the argument list.

That's all!
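
As a rough sketch of the two changes above: `migrateOptions` is a hypothetical helper (not part of node-phantom-simple) that renames the 1.x option, and the commented call shows the callback in the last position. The path string is a placeholder.

```javascript
// Hypothetical helper, not part of node-phantom-simple:
// renames the 1.x `phantomPath` option to the 2.x `path` option.
function migrateOptions(options) {
  var migrated = Object.assign({}, options);
  if ('phantomPath' in migrated) {
    // An explicit 2.x `path` wins over the legacy name.
    if (!('path' in migrated)) {
      migrated.path = migrated.phantomPath;
    }
    delete migrated.phantomPath;
  }
  return migrated;
}

// 2.x style: options first, callback last.
//   driver.create(migrateOptions({ phantomPath: '/path/to/phantomjs' }), callback);
```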

Installing

npm install node-phantom-simple

# Also need phantomjs OR slimerjs:

npm install phantomjs
# OR
npm install slimerjs

Note. SlimerJS is not headless and requires a windowing environment. Under Linux/FreeBSD/OSX, xvfb can be used to run it headlessly. For example, if you wish to run SlimerJS on Travis-CI, add these lines to your .travis.yml config:

before_script:
  - export DISPLAY=:99.0
  - "sh -e /etc/init.d/xvfb start"

Usage

You can use it exactly like node-phantom, and the entire API of PhantomJS should work, with the exception that every method call takes a callback (always as the last parameter), instead of returning values.

For example, this is an adaptation of a web scraping example:

var driver = require('node-phantom-simple');

driver.create({ path: require('phantomjs').path }, function (err, browser) {
  return browser.createPage(function (err, page) {
    return page.open("http://tilomitra.com/repository/screenscrape/ajax.html", function (err,status) {
      console.log("opened site? ", status);
      page.includeJs('http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js', function (err) {
        // jQuery Loaded.
        // Wait for a bit for AJAX content to load on the page. Here, we are waiting 5 seconds.
        setTimeout(function () {
          return page.evaluate(function () {
            //Get what you want from the page using jQuery. A good way is to populate an object with all the jQuery commands that you need and then return the object.
            var h2Arr = [],
                pArr = [];

            $('h2').each(function () { h2Arr.push($(this).html()); });
            $('p').each(function () { pArr.push($(this).html()); });

            return {
              h2: h2Arr,
              p: pArr
            };
          }, function (err,result) {
            console.log(result);
            browser.exit();
          });
        }, 5000);
      });
    });
  });
});
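
Because every method takes a Node-style callback as its last argument, the nesting above can be flattened with `util.promisify` from the Node.js standard library. The following is a minimal sketch, not part of the module: `promisifyMethod` and `fetchTitle` are hypothetical names, and `driver` is assumed to be the object returned by `require('node-phantom-simple')`.

```javascript
var util = require('util');

// Hypothetical helper: wraps one callback-last method of `target`
// (the driver, a browser, or a page) so that it returns a Promise.
function promisifyMethod(target, name) {
  return util.promisify(target[name].bind(target));
}

// Sketch of a scraping flow with async/await.
// `options` is passed through to .create() unchanged.
async function fetchTitle(driver, options, url) {
  var browser = await promisifyMethod(driver, 'create')(options);
  try {
    var page = await promisifyMethod(browser, 'createPage')();
    var status = await promisifyMethod(page, 'open')(url);
    if (status !== 'success') {
      throw new Error('Failed to open ' + url);
    }
    // The function below runs inside the page, not in Node.
    return await promisifyMethod(page, 'evaluate')(function () {
      return document.title;
    });
  } finally {
    // Always shut the engine down, even if a step failed.
    browser.exit();
  }
}
```

A caller would use it as `fetchTitle(driver, { path: require('phantomjs').path }, url).then(console.log)`.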

.create(options, callback)

options (not mandatory):

  • path (String) - path to phantomjs/slimerjs, if not set - will search in $PATH
  • parameters (Array) - CLI params for executed engine, [ { name: value } ].
  • ignoreErrorPattern (RegExp) - a regular expression that can be used to silence spurious warnings in console, generated by Qt and PhantomJS. On Mavericks, you can use /CoreText/ to suppress some common annoying font-related warnings.

For example

driver.create({ parameters: { 'ignore-ssl-errors': 'yes' } }, callback)

will start phantom as:

phantomjs --ignore-ssl-errors=yes
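
The mapping from option names to command-line flags can be pictured as below. This is an illustration only: `toCliArgs` is a hypothetical function and is not the module's actual implementation.

```javascript
// Illustrative only: one way a parameters object could be turned
// into command-line flags of the form --name=value.
function toCliArgs(parameters) {
  return Object.keys(parameters || {}).map(function (name) {
    return '--' + name + '=' + parameters[name];
  });
}

console.log(['phantomjs'].concat(toCliArgs({ 'ignore-ssl-errors': 'yes' })).join(' '));
// prints "phantomjs --ignore-ssl-errors=yes"
```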

You can rely on globally installed engines, but we recommend passing the path explicitly:

driver.create({ path: require('phantomjs').path }, callback)
// or for slimer
driver.create({ path: require('slimerjs').path }, callback)

You can also have a look at the test directory to see some examples of using the API. However, the de facto reference is the PhantomJS documentation. Just mentally substitute callbacks for all return values.

WebPage Callbacks

All of the WebPage callbacks have been implemented including onCallback, and are set the same way as with the core phantomjs library:

page.onResourceReceived = function(response) {
  console.log('Response (#' + response.id + ', stage "' + response.stage + '"): ' + JSON.stringify(response));
};

This includes the onPageCreated callback which receives a new page object.
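
A small sketch of wiring several callbacks at once, including `onPageCreated`. `attachLogging` is a hypothetical helper, and the sketch assumes that the page object passed to `onPageCreated` accepts handlers in the same way as its parent.

```javascript
// Hypothetical helper: attaches logging handlers to a page object.
// `log` defaults to console.log so the handlers can be exercised in isolation.
function attachLogging(page, log) {
  log = log || console.log;

  // Messages written with console.log() inside the page.
  page.onConsoleMessage = function (msg) {
    log('console: ' + msg);
  };

  // Each resource is reported in stages; log only the final one.
  page.onResourceReceived = function (response) {
    if (response.stage === 'end') {
      log('received #' + response.id + ' ' + response.url);
    }
  };

  // A page opened by this one arrives as a new page object;
  // attach the same handlers to it.
  page.onPageCreated = function (childPage) {
    attachLogging(childPage, log);
  };

  return page;
}
```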

Properties

Properties on the WebPage and Phantom objects are accessed via the get()/set() method calls:

page.get('content', function (err, html) {
  console.log("Page HTML is: " + html);
});

page.set('zoomFactor', 0.25, function () {
  page.render('capture.png');
});

// You can get/set nested values easily!
page.set('settings.userAgent', 'PhAnToSlImEr', callback);
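
Since set() reports completion through its callback, a read that depends on the new value belongs inside that callback. A minimal sketch follows; `setUserAgent` is a hypothetical helper that uses only the get()/set() calls shown above.

```javascript
// Hypothetical helper: sets the user agent, then reads it back to confirm.
function setUserAgent(page, userAgent, callback) {
  page.set('settings.userAgent', userAgent, function (err) {
    if (err) { return callback(err); }

    // Read the nested value back only after set() has completed.
    page.get('settings.userAgent', function (err, actual) {
      if (err) { return callback(err); }
      if (actual !== userAgent) {
        return callback(new Error('userAgent not applied: ' + actual));
      }
      callback(null, actual);
    });
  });
}
```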

Known issues

Engines are buggy. Here are some cases you should know.

  • .evaluate can return corrupted result:
    • SlimerJS: undefined -> null.
    • PhantomJS:
      • undefined -> null
      • null -> '' (empty string)
      • [ 1, undefined, 2 ] -> null
  • page.onConfirm() handler cannot return a value, due to the async nature of the driver. Use .setFn() instead: page.setFn('onConfirm', function () { return true; }).
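
One possible workaround for the corrupted .evaluate results is to return a JSON string from the page and decode it in Node. This is a sketch under the assumption that plain strings cross the bridge unchanged (none of the cases listed above involve strings). `decodeResult` is a hypothetical helper, not part of the module.

```javascript
// Node side: decode a value that the evaluated function wrapped as
// JSON.stringify({ value: ... }). Hypothetical helper.
function decodeResult(json) {
  return JSON.parse(json).value;
}

// Page side (runs inside the browser engine): wrap the result before returning.
//   page.evaluate(function () {
//     return JSON.stringify({ value: [ 1, undefined, 2 ] });
//   }, function (err, json) {
//     var result = decodeResult(json); // [ 1, null, 2 ]
//   });
```

With this wrapping, a top-level null or undefined is decoded as itself. An undefined inside an array still becomes null, because JSON has no representation for it, but the rest of the array is preserved.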

License

MIT

Other

Made by Matt Sergeant for Hubdoc Inc.
