A web scraper built with Node.js and Puppeteer that scrapes developer resume information from Toptal, an exclusive network of the top freelance software developers, designers, finance experts, product managers, and project managers in the world.
- Scrape developer profiles from Toptal
- Save scraped data into MongoDB
The scraper gets the following data from each developer profile:
- id
- name
- title
- location
- country
- summary
- skills
- top_skills
- portfolio
- availability
- preferred_env
- amazing
- work_exp
- proj_exp
- education
- certification
- category_skills
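To make the record layout concrete, the sketch below lists the field names above and checks a scraped object against them. The helper name `missingFields` and the sample values are hypothetical, and the source does not specify the type of each field, so treat this as an illustration rather than the project's actual schema.

```javascript
// Field names taken from the list above.
const PROFILE_FIELDS = [
  'id', 'name', 'title', 'location', 'country', 'summary', 'skills',
  'top_skills', 'portfolio', 'availability', 'preferred_env', 'amazing',
  'work_exp', 'proj_exp', 'education', 'certification', 'category_skills',
];

// Hypothetical helper: returns the expected fields that are absent from a profile object.
function missingFields(profile) {
  return PROFILE_FIELDS.filter((field) => !(field in profile));
}

// Made-up sample record; values and types are assumptions for illustration only.
const sample = {
  id: 'example-id',
  name: 'Jane Doe',
  title: 'Software Developer',
  skills: ['JavaScript'],
};

console.log(missingFields(sample));
```

A check like this could run before saving, so that incomplete profiles are logged instead of being stored silently.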
- Node.js and npm installed on your machine; see the official Node.js installation guide.
- A MongoDB instance, running either locally or in the cloud (for example, MongoDB Atlas)
- Clone this repository:
  git clone https://github.com/dreamjet31/toptal_scraper.git
- Install the dependencies:
  cd toptal_scraper
  npm install
- Create a .env file and add your MongoDB connection string:
  MONGODB_URI=mongodb+srv://<username>:<password>@cluster0.mongodb.net/test?retryWrites=true&w=majority
  Replace <username> and <password> with the actual username and password for your MongoDB instance.
- Run the scraper:
  node index.js
- Wait for the script to finish. All data is saved in the MongoDB collection 'resume'.
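One practical detail about the connection string in the .env step: MongoDB connection URIs require special characters in the username or password (such as `@`, `:`, or `/`) to be percent-encoded. The sketch below shows one way to build the value; `buildMongoUri` is a hypothetical helper that is not part of this repository, and the credentials are made up.

```javascript
// Hypothetical helper: builds a mongodb+srv URI with percent-encoded credentials.
function buildMongoUri(username, password, host, database) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `mongodb+srv://${user}:${pass}@${host}/${database}?retryWrites=true&w=majority`;
}

// Made-up credentials; the host and database mirror the template shown in the .env step.
console.log(buildMongoUri('scraper', 'p@ss:word/1', 'cluster0.mongodb.net', 'test'));
```

The printed value can be pasted after `MONGODB_URI=` in the .env file.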
NOTE: Please ensure that you have a stable internet connection while running the script to successfully scrape the data.
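How the script writes to the 'resume' collection is not described here. As a minimal sketch, assuming each profile is upserted by its `id` so that re-running the scraper does not create duplicates, the write step could look like the following. `toUpsertOps` and `saveProfiles` are hypothetical names, and `collection` is assumed to be a Collection object from the mongodb driver.

```javascript
// Hypothetical helper: turns scraped profiles into bulkWrite upsert operations keyed on `id`.
function toUpsertOps(profiles) {
  return profiles.map((profile) => ({
    updateOne: {
      filter: { id: profile.id },
      update: { $set: profile },
      upsert: true,
    },
  }));
}

// `collection` is assumed to be a MongoDB driver Collection, e.g. db.collection('resume').
async function saveProfiles(collection, profiles) {
  if (profiles.length === 0) return 0; // bulkWrite rejects an empty operation list
  const result = await collection.bulkWrite(toUpsertOps(profiles));
  // Number of documents that were newly inserted or changed.
  return result.upsertedCount + result.modifiedCount;
}
```

Passing the collection in as an argument keeps the function easy to exercise with a stub, without a live database.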
- dotenv: Loads environment variables from a .env file into process.env
- memory-cache: In-memory cache that is simple to use
- mongodb: Node.js driver for MongoDB
- puppeteer: Provides a high-level API to control Chrome or Chromium over the DevTools Protocol
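To illustrate how Puppeteer is typically used for this kind of task, here is a minimal sketch that opens a page and reads two fields from it. The selectors (`h1`, `.skill`) and the function names are hypothetical placeholders; they are not taken from this repository or from Toptal's actual markup.

```javascript
// Hypothetical selectors; real profile pages will need their own.
const SELECTORS = { name: 'h1', skills: '.skill' };

// Reads a few fields from an already-loaded Puppeteer page.
async function extractProfile(page) {
  const name = await page.$eval(SELECTORS.name, (el) => el.textContent.trim());
  const skills = await page.$$eval(SELECTORS.skills, (els) =>
    els.map((el) => el.textContent.trim())
  );
  return { name, skills };
}

// Launches a browser, loads the URL, extracts the fields, and always closes the browser.
async function scrapeProfile(url) {
  const puppeteer = require('puppeteer'); // loaded lazily so extractProfile can be used without it
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await extractProfile(page);
  } finally {
    await browser.close();
  }
}
```

Separating extraction from browser setup means the extraction logic can be checked against a stub page object, and the `finally` block ensures the browser is closed even if a selector is missing.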
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the ISC License.