The system that scrapes data for the website.
- Create a scraper
Create a new file at ./groups/<name>.js:
export const name = '<name of group>';
export const url = '<page that lists brands>';
export const infoUrl = '<wikipedia page>';
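export const scrapDetails = async (get$, getPage) => {
  // slugify, description and picture are not defined in this template:
  // import or define slugify, and set description and picture from the scraped page.
  const details = {
    name,
    slug: slugify(name),
    url,
    infoUrl,
    description,
    picture,
  };
  return details;
};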
export const scrapBrands = async (get$, getPage) => {
  const brands = new Map();
  // Add the scraped brands to the map here.
  return brands;
};
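The template above calls slugify(name) without showing where it comes from. As a minimal sketch, assuming no shared helper is available, a slug function could look like the one below. This slugify is hypothetical; the project's own helper, if it has one, may produce different output.

```javascript
// Hypothetical helper: the project's real slugify may behave differently.
const slugify = (text) =>
  text
    .normalize('NFKD')               // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')     // replace each run of other characters with one hyphen
    .replace(/^-+|-+$/g, '');        // trim leading and trailing hyphens

// slugify('Procter & Gamble') → 'procter-gamble'
```

Stripping diacritics before replacing non-alphanumeric characters keeps names such as 'Nestlé' readable as 'nestle' rather than splitting them at the accent.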
- Scrape details
Usually, we scrape details from the group's Wikipedia page.
A default scraper, getDetailsScraper, is available: given the group's url and infoUrl, it scrapes the name, description and logo of the group.
You can replace your group's scrapDetails function with:
import { getDetailsScraper } from '../utils/index.js';
export const scrapDetails = getDetailsScraper(url, infoUrl);
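If the default scraper gets most fields right but one needs a manual value, it can be wrapped rather than rewritten. The sketch below assumes only what the snippet above implies: getDetailsScraper(url, infoUrl) returns an async function with the same (get$, getPage) signature as scrapDetails, resolving to a details object. withOverrides is a hypothetical helper, not part of ../utils/index.js.

```javascript
// Hypothetical wrapper, not part of the project's utils.
// Takes any scrapDetails-style function and returns a new one whose result
// has the given fields replaced.
const withOverrides = (scraper, overrides) => async (get$, getPage) => {
  const details = await scraper(get$, getPage);
  return { ...details, ...overrides }; // overrides win over scraped values
};

// Possible use in a group file (the description text is a placeholder):
// export const scrapDetails = withOverrides(
//   getDetailsScraper(url, infoUrl),
//   { description: '<manual description>' },
// );
```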
- Scrape the brands
In your scrapBrands function, you can use either Cheerio or Puppeteer, through get$ and getPage respectively:
export const scrapBrands = async (get$, getPage) => {
  // Cheerio: parse the page at url.
  const $ = await get$(url);
  // Puppeteer: open the page at url.
  const page = await getPage(url);
};
From there, you are free to use whatever library you need. For examples, see the existing scrapers in ./packages/scraper/groups/*.
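Whichever library extracts the links, the results still have to end up in the Map that scrapBrands returns. The sketch below shows one way to do that, under stated assumptions: toBrandMap is a hypothetical helper, and the Map shape (brand name as key, { name, url } as value) is a guess, so check the existing groups for the shape the website actually expects. The input is a plain array of { text, href } entries, which with Cheerio could be built from each link's .text() and .attr('href').

```javascript
// Hypothetical helper: the Map shape (brand name -> { name, url }) is an assumption.
const toBrandMap = (links, baseUrl) => {
  const brands = new Map();
  for (const { text, href } of links) {
    const name = text.trim().replace(/\s+/g, ' '); // normalize whitespace in the label
    if (!name || brands.has(name)) continue;       // skip empty labels and duplicates
    brands.set(name, {
      name,
      // Resolve relative links against the page they were scraped from.
      url: href ? new URL(href, baseUrl).href : undefined,
    });
  }
  return brands;
};
```

Keeping this step free of any scraping library means it behaves the same for the Cheerio and Puppeteer paths, and the first occurrence of a brand name wins when a page lists it twice.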
- Run the command
yarn scrap <name>
This adds the new group and its brands to the shared data in ./packages/website/public/data.json.
yarn start <group>