GitHub - alexymik/node-metainspector: Node npm for web scraping purposes. It scrapes a given URL, and returns you its title, meta description, meta keywords, an array with all the links, all the images in it, etc. Inspired by the metainspector Ruby gem

Node-Metainspector

MetaInspector is an npm package for web scraping. You give it a URL, and it lets you easily get the page's title, links, images, description, keywords, and meta tags.

MetaInspector is inspired by the MetaInspector Ruby gem by jaimeiniesta.

Scraped data

client.url                  # URL of the page
client.scheme               # scheme of the page (http, https)
client.host                 # hostname of the page (e.g. markupvalidator.com, without the scheme)
client.rootUrl              # root URL (scheme + host, e.g. http://simple.com/)
client.title                # title of the page, as string
client.links                # array of strings, with every link found on the page as an absolute URL
client.author               # page author, as string
client.keywords             # keywords from meta tag, as array
client.charset              # page charset from meta tag, as string
client.description          # meta description, or the first long paragraph if no meta description is found
client.image                # most relevant image, if defined with og:image
client.images               # array of strings, with every img found on the page as an absolute URL
client.feeds                # RSS or Atom links found in meta data fields, as array
client.ogTitle              # Open Graph title

Options

timeout - Time in ms that MetaInspector will wait for the URL to respond
maxRedirects - Maximum number of redirects MetaInspector will follow
limit - Maximum number of bytes MetaInspector will download when querying a site
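
As a rough sketch, the three options can be combined in a single object and passed as the second constructor argument. The values below are illustrative assumptions, not the package defaults.

```javascript
// Illustrative values only; these are not the package defaults.
var options = {
    timeout: 5000,          // wait up to 5000 ms for the URL to respond
    maxRedirects: 5,        // follow at most 5 redirects
    limit: 2 * 1024 * 1024  // download at most 2 MiB (2097152 bytes)
};

// Passed as the second constructor argument, as in the Usage section:
// var client = new MetaInspector("http://www.google.com", options);
```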

Usage

var MetaInspector = require('node-metainspector');
var client = new MetaInspector("http://www.google.com", { timeout: 5000 });

client.on("fetch", function(){
    console.log("Description: " + client.description);

    console.log("Links: " + client.links.join(","));
});

client.on("error", function(err){
    // Log the error object passed to the handler
    console.log(err);
});

client.fetch();
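
If you prefer promises over event handlers, the calls above can be wrapped as in the sketch below. `fetchPage` is a hypothetical helper, not part of the package; it uses only `on("fetch")`, `on("error")` and `fetch()` as shown in the usage example. A stand-in client built on Node's `events` module replaces the real one here so the snippet runs without network access.

```javascript
// Sketch only: fetchPage is a hypothetical helper, not part of node-metainspector.
// It relies only on the on("fetch"), on("error") and fetch() calls shown above.
function fetchPage(client) {
    return new Promise(function (resolve, reject) {
        client.on("fetch", function () { resolve(client); });
        client.on("error", function (err) { reject(err); });
        client.fetch();
    });
}

// Stand-in client so the sketch runs offline; a real MetaInspector
// instance would be passed to fetchPage in the same way.
var EventEmitter = require("events");
var fakeClient = new EventEmitter();
fakeClient.fetch = function () {
    this.title = "Example Domain";
    this.emit("fetch");
};

var pending = fetchPage(fakeClient).then(function (client) {
    console.log("Title: " + client.title);
    return client.title;
});
```

With the real package, `fetchPage(new MetaInspector(url, options))` should behave the same way, assuming the client emits `fetch` on success and `error` on failure as in the example above.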

TO DO

Add an absolutify function so that all URLs are returned as absolute URLs.

Finish implementation of the properties below:

client.internal_links     	# array of strings, with every internal link found on the page as an absolute URL
client.external_links     	# array of strings, with every external link found on the page as an absolute URL
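
One possible approach to these TO DO items, sketched with Node's built-in `URL` class (assumed available as a global). The names `absolutify` and `splitLinks` are hypothetical and do not exist in the package; "internal" is taken here to mean "same host as the page", which is an assumption about the intended behavior.

```javascript
// Sketch only: these helpers are hypothetical and not part of node-metainspector.

// Resolve a possibly relative href against the page URL.
// Returns null when the href cannot be parsed as a URL.
function absolutify(href, pageUrl) {
    try {
        return new URL(href, pageUrl).href;
    } catch (e) {
        return null;
    }
}

// Split absolute links into internal (same host as the page) and external.
function splitLinks(links, pageUrl) {
    var pageHost = new URL(pageUrl).host;
    var result = { internal: [], external: [] };
    links.forEach(function (link) {
        var host = new URL(link).host;
        (host === pageHost ? result.internal : result.external).push(link);
    });
    return result;
}

var page = "http://example.com/blog/";
var links = ["/about", "post.html", "http://other.example.org/x"]
    .map(function (href) { return absolutify(href, page); })
    .filter(Boolean); // drop hrefs that could not be parsed

var split = splitLinks(links, page);
console.log(split.internal); // links on the same host as the page
console.log(split.external); // links on other hosts
```

Comparing hosts is the simplest rule; a stricter version might also compare the scheme or treat subdomains as internal.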

ZOMG Fork! Thank you!

You're welcome to fork this project and send pull requests. Just remember to include tests.
