Scraper

The system that scrapes data for the website.

Add a new scraper

  1. Create a scraper

Create a new file in ./groups/<name>.js

export const name = '<name of group>';
export const url = '<page that lists brands>';
export const infoUrl = '<wikipedia page>';

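export const scrapDetails = async (get$, getPage) => {
    // Placeholders: scrape these values from infoUrl.
    const description = '<description of group>';
    const picture = '<url of logo>';
    const details = {
        name,
        slug: slugify(name), // slugify is not defined in this file; import it from wherever the project provides it
        url,
        infoUrl,
        description,
        picture,
    };
    return details;
};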

export const scrapBrands = async (get$, getPage) => {
    // Add one entry per brand scraped from url.
    const brands = new Map();
    return brands;
};
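
The skeleton above calls slugify without showing where it comes from. As a rough illustration only, a slug helper typically lowercases the name and joins its words with hyphens. The function below, exampleSlugify, is a hypothetical stand-in and not the project's own helper, whose behaviour may differ:

```javascript
// Hypothetical slug helper, for illustration only.
const exampleSlugify = (text) =>
    text
        .normalize('NFKD')                // split accented letters into base letter + diacritic
        .replace(/[\u0300-\u036f]/g, '')  // drop the diacritics
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-')      // turn each run of other characters into one hyphen
        .replace(/^-+|-+$/g, '');         // trim leading and trailing hyphens

console.log(exampleSlugify('Procter & Gamble')); // procter-gamble
```
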
  2. Scrape details

Usually, we scrape details from the group's Wikipedia page.

A default scraper, getDetailsScraper, is available: given the group's URLs, it scrapes the name, description and logo of the group.

You can replace the scrapDetails function of your group with:

import { getDetailsScraper } from '../utils/index.js';

export const scrapDetails = getDetailsScraper(url, infoUrl);
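
The one-liner works because getDetailsScraper is called once with the two URLs and returns a function with the same (get$, getPage) signature as a hand-written scrapDetails. A minimal sketch of that factory pattern follows; makeDetailsScraper, the example URLs and the CSS selectors are assumptions made for illustration, not the project's implementation:

```javascript
// Hypothetical factory: returns an async scraper bound to the given URLs.
const makeDetailsScraper = (url, infoUrl) => async (get$, getPage) => {
    const $ = await get$(infoUrl); // assumed: get$ resolves to a Cheerio instance
    return {
        url,
        infoUrl,
        // The selectors below are guesses at a Wikipedia-like layout.
        name: $('h1').first().text().trim(),
        description: $('p').first().text().trim(),
        picture: $('.infobox img').first().attr('src'),
    };
};

// Build the scraper once; the runner calls it later with get$ and getPage.
const scrapDetails = makeDetailsScraper('https://example.com/brands', 'https://example.com/info');
```
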
  3. Scrape the brands

In your scrapBrands function, you can use either Cheerio or Puppeteer, through get$ and getPage respectively:

export const scrapBrands = async (get$, getPage) => {
    // Use one or the other:
    const $ = await get$(url); // Cheerio
    const page = await getPage(url); // Puppeteer
};
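
As an illustration of the Cheerio route, the sketch below collects links into the Map that scrapBrands returns. The a.brand selector, the example URL and the shape of each Map entry are assumptions; adapt them to the page being scraped and to what the existing groups store.

```javascript
const url = 'https://example.com/brands'; // hypothetical listing page

// In a group file this would be `export const scrapBrands`.
const scrapBrands = async (get$, getPage) => {
    const $ = await get$(url); // assumed to resolve to a Cheerio instance
    const brands = new Map();
    $('a.brand').each((index, element) => {
        const brandName = $(element).text().trim();
        if (!brandName) return; // skip links without text
        brands.set(brandName, { name: brandName, url: $(element).attr('href') });
    });
    return brands;
};
```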

Then you're free to use whatever library you need. For examples, look at the existing scrapers in ./packages/scraper/groups/*
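
For pages that only render their content in the browser, the Puppeteer route could look like the sketch below. It relies on Puppeteer's page.$$eval and assumes that getPage(url) resolves to a page already navigated to url; the selector, the example URL and the entry shape are hypothetical.

```javascript
const url = 'https://example.com/brands'; // hypothetical listing page

// In a group file this would be `export const scrapBrands`.
const scrapBrands = async (get$, getPage) => {
    const page = await getPage(url);
    // $$eval runs the callback in the browser against all matching elements.
    const links = await page.$$eval('a.brand', (elements) =>
        elements.map((element) => ({
            name: element.textContent.trim(),
            url: element.href,
        }))
    );
    // Key the Map by brand name (an assumed convention).
    return new Map(links.map((link) => [link.name, link]));
};
```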

  4. Run the command

yarn scrap <name>

This adds the new group and its brands to the shared data in ./packages/website/public/data.json

Usage

yarn start <group>

⚠️ New data replaces the previous data.