crishernandezmaps/liqen-scrapper

Liqen Scrapper 2

Find news articles and extract the relevant information from them.

This project uses

  1. Google Custom Search to search media websites.
  2. Scraping techniques to extract the content of an article.

Usage

This package includes 2 functions that can be used together or separately:

  • googleSearch(term, options) => Promise<Object> to perform a Google Search
  • downloadArticle(uri) => Promise<Object> to parse an article

Examples

Using only googleSearch

const { googleSearch } = require('liqen-scrapper')

const options = {
  apiKey: 'MY_GOOGLE_API_KEY',
  cx: 'MY_CX'
}

googleSearch('climate change', options)
  .then(result => result.items)
  .then(items => items.forEach(item => {
    console.log(item.title)
    console.log(item.link)
  }))

Using only downloadArticle

const { downloadArticle } = require('liqen-scrapper')

// Download and parse a single article, then print its title and a body preview
downloadArticle('http://cultura.elpais.com/cultura/2017/02/08/actualidad/1486573775_868895.html')
  .then(article => {
    console.log(article.metadata.title)
    console.log(article.body.html.slice(0, 80))
  })

Using both functions together

const { googleSearch, downloadArticle } = require('liqen-scrapper')
const options = {
  apiKey: 'MY_GOOGLE_API_KEY',
  cx: 'MY_CX'
}
const promiseOfArticles = googleSearch('climate change', options)
  .then(result => result.items.map(item => item.link))
  // Wait for every download before continuing
  .then(links => Promise.all(links.map(link => downloadArticle(link))))

promiseOfArticles
  .then(articles => articles.map(article => article.body.html))
  .then(bodies => {
    bodies.forEach(body => {
      console.log(body.slice(0, 80))
    })
  })
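
When many links are downloaded at once, a single failing page makes `Promise.all` reject as a whole. The following is a minimal sketch of a more tolerant variant, assuming Node.js 12.9 or later (for `Promise.allSettled`). The helper `collectBodies` and the stubs `fakeSearch` and `fakeDownload` are hypothetical and not part of this package. The stubs only imitate the result shapes used in the examples above (`result.items[].link` and `article.body.html`) so that the sketch runs without an API key or network access.

```javascript
// Hypothetical helper, not part of liqen-scrapper.
// `search` and `download` are injected; in real use they would be
// googleSearch and downloadArticle.
function collectBodies (search, download, term, options) {
  return search(term, options)
    .then(result => (result.items || []).map(item => item.link))
    // allSettled never rejects; it reports the outcome of each download
    .then(links => Promise.allSettled(links.map(link => download(link))))
    .then(outcomes => outcomes
      .filter(outcome => outcome.status === 'fulfilled')
      .map(outcome => outcome.value.body.html))
}

// Stubs with assumed result shapes, for illustration only.
const fakeSearch = () => Promise.resolve({
  items: [{ link: 'http://example.com/a' }, { link: 'http://example.com/b' }]
})
const fakeDownload = link => link.endsWith('/b')
  ? Promise.reject(new Error('download failed'))
  : Promise.resolve({ body: { html: '<p>article a</p>' } })

collectBodies(fakeSearch, fakeDownload, 'climate change', {})
  .then(bodies => bodies.forEach(body => console.log(body.slice(0, 80))))
```

With the real functions, the call would presumably look like `collectBodies(googleSearch, downloadArticle, 'climate change', options)`. Failed downloads are dropped silently here; logging `outcome.reason` for the rejected entries may be preferable in practice.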

Docs

See the /docs directory for further documentation.

About

A tool to collect news about environmental issues.
