Scrape pages with node/io.js and get a whole lot of metadata. Shows headers, Ajax requests/responses, rendered HTML, JavaScript ASTs, dependencies, console events, and a whole lot more.

zackiles/deep-scrape

Deep Scrape

Scrape and crawl pages with io.js and get a whole lot of metadata. Shows headers, Ajax requests/responses, rendered HTML, JavaScript ASTs, dependencies, console events, and a whole lot more. Crawl sites, or scrape a single page. Add cookies or proxy requests. Fingerprints common JavaScript libraries, and allows you to write your own fingerprints.
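To give a feel for what library fingerprinting can look like, here is a minimal sketch. It is not deep-scrape's actual API or rule format: the `FINGERPRINTS` table, the `fingerprintLibraries` function, and the patterns are all hypothetical and exist only for illustration. It matches simple regular expressions against the raw HTML of a page.

```javascript
// Hypothetical rule table (NOT deep-scrape's real format): each rule maps a
// library name to patterns that tend to show up in script URLs or markup.
var FINGERPRINTS = [
  { name: 'jQuery', patterns: [/jquery[^"'\s]*\.js/i] },
  { name: 'AngularJS', patterns: [/angular[^"'\s]*\.js/i, /\bng-app\b/i] },
  { name: 'Ember', patterns: [/ember[^"'\s]*\.js/i] }
];

// Return the names of all rules with at least one pattern matching the HTML.
// `rules` is optional; pass your own array to add custom fingerprints.
function fingerprintLibraries(html, rules) {
  rules = rules || FINGERPRINTS;
  return rules
    .filter(function (rule) {
      return rule.patterns.some(function (pattern) {
        return pattern.test(html);
      });
    })
    .map(function (rule) {
      return rule.name;
    });
}

// Example: a custom rule appended to the defaults (also hypothetical).
var customRules = FINGERPRINTS.concat([
  { name: 'Backbone', patterns: [/backbone[^"'\s]*\.js/i] }
]);
```

Calling `fingerprintLibraries(html, customRules)` checks the page against both the default and the custom rules. Pattern matching on raw HTML is cheap but can give false positives; inspecting globals in a rendered page would be more reliable.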

Installation

This was tested on node 0.12.x. It can be used as a module or run as a command-line script.

npm install deep-scrape
# or clone the repository and run it as a script.

Use Case

  • You are scraping websites with lots of JavaScript (Angular, Ember, Browserify).
  • You don't mind trading a bit of performance for more detailed scraping data.
  • You would like to find potential DOM sinks and sources on your pages (possibly for vulnerability scanning).
  • You need the most detailed metadata, metrics, and analytics on your scraped pages.
  • You would like to fingerprint the technologies a site or page may be using.
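As a rough illustration of the "DOM sinks and sources" use case, the sketch below scans JavaScript source text line by line for well-known source and sink tokens. It is a text-based heuristic written for this README, not deep-scrape's implementation (which, per the description above, works with JavaScript ASTs). The token lists and the `findSinksAndSources` function are assumptions and are far from exhaustive.

```javascript
// Illustrative token lists (assumptions, not exhaustive).
// Sources: places where attacker-controlled data can enter a page.
var DOM_SOURCES = ['location.hash', 'location.search', 'document.referrer',
                   'document.cookie', 'window.name'];
// Sinks: places where data can be executed or rendered as markup.
var DOM_SINKS = ['innerHTML', 'outerHTML', 'document.write', 'eval('];

// Scan source text and report each token hit with its 1-based line number.
function findSinksAndSources(code) {
  var findings = [];
  code.split('\n').forEach(function (line, index) {
    DOM_SOURCES.forEach(function (token) {
      if (line.indexOf(token) !== -1) {
        findings.push({ kind: 'source', token: token, line: index + 1 });
      }
    });
    DOM_SINKS.forEach(function (token) {
      if (line.indexOf(token) !== -1) {
        findings.push({ kind: 'sink', token: token, line: index + 1 });
      }
    });
  });
  return findings;
}
```

A hit only means the token appears in the text. Deciding whether data actually flows from a source to a sink needs real data-flow analysis over an AST, which this sketch does not attempt.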
