Web-scraping-project

My first web scraping project: using Python to collect data about the top repositories from the GitHub Topics page.

For this project I used JetBrains Datalore as my IDE; it is based on Jupyter notebooks. The idea was to gather data and create datasets about the top repositories for each programming topic available on the GitHub Topics page. So this is a typical web scraping project that can be done in Python and that covers the whole data pipeline. The GitHub data for this project is organized in two parts. The first is the main page that lists all topics (https://github.com/topics). The second is that every topic has its own page listing its top-starred repositories.
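The two-level layout described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `fetch_html` and `parse_topics` are hypothetical, and the sketch assumes that every topic page is linked from the listing with an href of the form `/topics/<name>`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://github.com"


def fetch_html(url):
    """Download a page and return its HTML (hypothetical helper)."""
    import requests  # imported here so the parser works without network access

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def parse_topics(html):
    """Return a list of {"topic", "url"} dicts found in a topics listing page."""
    soup = BeautifulSoup(html, "html.parser")
    topics = {}
    for link in soup.find_all("a", href=True):
        href = link["href"]
        # Assumption: a topic page lives at /topics/<name>, one path segment deep.
        if href.startswith("/topics/"):
            name = href[len("/topics/"):].strip("/")
            if name and "/" not in name:
                # setdefault keeps the first URL seen and drops duplicate links.
                topics.setdefault(name, urljoin(BASE_URL, href))
    return [{"topic": name, "url": url} for name, url in topics.items()]
```

A live run would then look like `parse_topics(fetch_html(BASE_URL + "/topics"))`, and each returned `url` points at the second level, the page of a single topic.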

I used the Requests and BeautifulSoup libraries for scraping, parsing and extracting the data I needed, with a small amount of Pandas. The logic of the Python script is as follows:
1. Create a CSV file with all topics present.
2. For every topic, extract data for the top repos and save it in CSV format.
3. Gather all CSV files in the data folder.
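The three steps above could be wired together along these lines. This is a sketch under stated assumptions rather than the notebook's code: `parse_repos` and `run_pipeline` are hypothetical names, the page markup (each repository announced in an `<h3>` with an owner link followed by a repository link) is assumed, and only owner, name and URL are extracted. `fetch_html` is any callable that takes a URL and returns HTML, for example a thin wrapper around `requests.get`.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup


def parse_repos(html):
    """Extract owner, repository name and URL from a topic page.

    Assumption: each repository sits in an <h3> holding two links,
    the first to the owner and the second to the repository.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for heading in soup.find_all("h3"):
        links = heading.find_all("a", href=True)
        if len(links) < 2:
            continue  # not a repository heading
        rows.append({
            "owner": links[0].get_text(strip=True),
            "repo": links[1].get_text(strip=True),
            "url": "https://github.com" + links[1]["href"],
        })
    return rows


def run_pipeline(topics, fetch_html, topics_csv="topics.csv", data_dir="data"):
    """Write the topics CSV, then one CSV per topic inside data_dir."""
    # Step 1: one CSV listing every topic.
    pd.DataFrame(topics).to_csv(topics_csv, index=False)

    # Steps 2 and 3: one CSV per topic, all gathered in the data folder.
    data_path = Path(data_dir)
    data_path.mkdir(parents=True, exist_ok=True)
    written = []
    for topic in topics:
        repos = parse_repos(fetch_html(topic["url"]))
        target = data_path / f"{topic['topic']}.csv"
        frame = pd.DataFrame(repos, columns=["owner", "repo", "url"])
        frame.to_csv(target, index=False)
        written.append(target)
    return written
```

Passing the fetch function in as an argument keeps the parsing and file-writing logic testable without network access; here `topics` is a list of dicts with `topic` and `url` keys.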

A possible next step for this project is a Streamlit app in which the user can choose which website to scrape, provided its structure is similar to GitHub's.
