dreamjet31/toptal_scraper

Toptal Scraper

A web scraper built with Node.js and Puppeteer that scrapes developer resume data from Toptal, an exclusive network of the top freelance software developers, designers, finance experts, product managers, and project managers in the world.

Features

  • Scrape developer profiles from Toptal
  • Save scraped data into MongoDB

Scraped Data

The scraper gets the following data from each developer profile:

  • id
  • name
  • title
  • location
  • country
  • summary
  • skills
  • top_skills
  • portfolio
  • availability
  • preferred_env
  • amazing
  • work_exp
  • proj_exp
  • education
  • certification
  • category_skills
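
To make the shape of a stored record concrete, here is a minimal sketch of normalizing one scraped profile into an object with exactly the fields listed above. The field names come from the list; the default types (which fields are strings, arrays, or objects) and the helper name normalizeProfile are assumptions for illustration, not taken from the scraper's source.

```javascript
// Field names are from the README; the default types are assumptions.
const PROFILE_DEFAULTS = {
  id: null, name: "", title: "", location: "", country: "", summary: "",
  skills: [], top_skills: [], portfolio: [], availability: "",
  preferred_env: "", amazing: "", work_exp: [], proj_exp: [],
  education: [], certification: [], category_skills: {},
};

// Returns a fresh default so records never share one array/object instance.
function freshDefault(fallback) {
  if (Array.isArray(fallback)) return [];
  if (fallback !== null && typeof fallback === "object") return {};
  return fallback;
}

// Hypothetical helper: keeps only the known fields, trims strings,
// and fills in defaults for anything the page did not provide.
function normalizeProfile(raw) {
  const profile = {};
  for (const [key, fallback] of Object.entries(PROFILE_DEFAULTS)) {
    const value = raw[key];
    if (value === undefined || value === null) {
      profile[key] = freshDefault(fallback);
    } else {
      profile[key] = typeof value === "string" ? value.trim() : value;
    }
  }
  return profile;
}

const example = normalizeProfile({ id: "a1", name: "  Jane Doe ", skills: ["Node.js"] });
console.log(Object.keys(example).length, example.name);
```

Normalizing before saving keeps every document in the collection queryable with the same set of keys, even when a profile page omits a section.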

Getting Started

Prerequisites

  • Node.js and npm installed on your machine
  • A running MongoDB instance, either local or cloud-hosted (such as MongoDB Atlas)

Installing

  1. Clone this repository

    git clone https://github.com/dreamjet31/toptal_scraper.git
    
  2. Install the dependencies

    cd toptal_scraper
    npm install
    
  3. Create a .env file and add your MongoDB connection string:

    MONGODB_URI=mongodb+srv://<username>:<password>@cluster0.mongodb.net/test?retryWrites=true&w=majority
    

    Replace <username> and <password> with your MongoDB username and password.

  4. Run the scraper

    node index.js
    
  5. Wait for the script to finish. All data is saved in the MongoDB collection 'resume'.
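
One common pitfall with the connection string from step 3: MongoDB requires special characters in the username or password (such as @, : or /) to be percent-encoded. The sketch below shows one way to build the MONGODB_URI value safely. The helper name buildMongoUri is hypothetical, and the host and database defaults are simply the placeholder values from the example above.

```javascript
// Hypothetical helper: builds a connection string in the same form as the
// MONGODB_URI example, percent-encoding the credentials.
function buildMongoUri({ username, password, host = "cluster0.mongodb.net", database = "test" }) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `mongodb+srv://${user}:${pass}@${host}/${database}?retryWrites=true&w=majority`;
}

// The "@", ":" and "/" in the password are encoded instead of breaking the URI.
console.log(buildMongoUri({ username: "scraper", password: "p@ss:w/rd" }));
```

The printed value can be pasted into the .env file as the MONGODB_URI setting.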

NOTE: Make sure you have a stable internet connection while the script runs; dropped connections can cause the scrape to fail.
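
If the connection is occasionally flaky, wrapping each page load in a retry with exponential backoff can help. The following is a minimal sketch, not part of this repository: withRetry and backoffDelay are hypothetical helper names, and the retry count and delays are arbitrary assumptions.

```javascript
// Delay before the next try after failed attempt `attempt` (1-based):
// baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` until it succeeds or `retries` attempts have failed,
// then rethrows the last error.
async function withRetry(task, { retries = 3, baseMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(backoffDelay(attempt, baseMs));
    }
  }
  throw lastError;
}

// Hypothetical usage with a Puppeteer page object:
//   await withRetry(() => page.goto(profileUrl));
```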

Dependencies

  • dotenv: Loads environment variables from a .env file into process.env
  • memory-cache: In-memory cache that is simple to use
  • mongodb: Node.js driver for MongoDB
  • puppeteer: Provides a high-level API to control Chrome or Chromium over the DevTools Protocol
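
As an illustration of how the mongodb driver could be used to save profiles without creating duplicates on a re-run, here is a sketch that builds upsert operations keyed on the scraped id field. This is an assumption about one reasonable approach, not a description of how index.js actually writes data; buildUpsertOps is a hypothetical helper, and the collection name 'resume' is the one this README mentions.

```javascript
// Builds one upsert operation per profile, keyed on the scraped `id`.
// Profiles without an id are skipped because they cannot be matched later.
function buildUpsertOps(profiles) {
  return profiles
    .filter((p) => p && p.id !== undefined && p.id !== null)
    .map((p) => ({
      updateOne: {
        filter: { id: p.id },
        update: { $set: p },
        upsert: true,
      },
    }));
}

// Hypothetical usage with an already-connected driver `db` handle:
//   const ops = buildUpsertOps(profiles);
//   if (ops.length > 0) await db.collection("resume").bulkWrite(ops);

const ops = buildUpsertOps([{ id: "a1", name: "Jane" }, { name: "no id" }]);
console.log(ops.length);
```

With upserts, running the scraper a second time updates existing documents instead of inserting a second copy of each profile.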

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the ISC License.
