
This project is a Python-based web scraper that extracts data on the largest public companies in the US by revenue from Wikipedia.


Prajjwol09/Web-Scraping


Largest Public Companies in the US Web Scraping Project

Overview:

This project is a Python-based web scraper that extracts the list of the largest public companies in the United States by revenue from Wikipedia. It fetches the webpage with requests, parses the HTML with BeautifulSoup, structures the scraped table in a pandas DataFrame, and exports the result to a CSV file for further analysis.

Features:

Scrapes data from a Wikipedia page containing a table of the largest public companies in the US.

Extracts company information such as ranking, name, revenue, and other details from the table.

Stores the scraped data in a pandas DataFrame.

Exports the data to a CSV file.
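
The parse-and-export steps listed above could look roughly like the sketch below. It is not the repository's actual script: the function name table_to_dataframe, the column headers, and the sample rows are all hypothetical, and the inline HTML is placeholder markup standing in for the real Wikipedia table so the example runs without network access.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Placeholder markup standing in for the Wikipedia table; the rows are
# made-up examples, not real company data.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Rank</th><th>Name</th><th>Revenue (USD millions)</th></tr>
  <tr><td>1</td><td>Example Corp A</td><td>1,000</td></tr>
  <tr><td>2</td><td>Example Corp B</td><td>900</td></tr>
</table>
"""


def table_to_dataframe(html: str) -> pd.DataFrame:
    """Parse the first <table> in `html` into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found")

    rows = table.find_all("tr")
    # The first row is assumed to hold the column headers.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        if len(cells) == len(headers):  # skip rows that do not match the header
            records.append(cells)
    return pd.DataFrame(records, columns=headers)


df = table_to_dataframe(SAMPLE_HTML)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
```

To write a file instead, pass a path such as df.to_csv("companies.csv", index=False); the file name here is only an example.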

Technologies Used:

Python: The core programming language used to write the script.

Requests: To fetch the HTML content of the Wikipedia page.

BeautifulSoup: For parsing and navigating the HTML content to extract data.

Pandas: For data manipulation and exporting the scraped data to a CSV file.

Jupyter Notebook (Optional): For testing and experimenting with the code interactively.
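
As an illustration of the fetching role that Requests plays here, a minimal sketch follows. The README does not state the exact page URL, so WIKI_URL is an assumption, and fetch_html, its parameters, and the User-Agent string are hypothetical names chosen for this example.

```python
import requests

# Assumed target page; the README does not give the exact URL.
WIKI_URL = (
    "https://en.wikipedia.org/wiki/"
    "List_of_largest_companies_in_the_United_States_by_revenue"
)


def fetch_html(url: str = WIKI_URL, session=None, timeout: float = 10.0) -> str:
    """Return the page body as text, raising on HTTP error statuses."""
    # Accepting a session makes the function easy to exercise without a network.
    http = session or requests.Session()
    response = http.get(
        url,
        headers={"User-Agent": "web-scraping-example/0.1"},
        timeout=timeout,
    )
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text
```

Calling fetch_html() with no arguments would request the assumed page; the returned string can then be handed to BeautifulSoup for parsing.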

Prerequisites:

Ensure you have the following libraries installed:

requests: For making HTTP requests to fetch the webpage.

BeautifulSoup (published on PyPI as beautifulsoup4 and imported as bs4): For parsing the HTML page.

pandas: For data manipulation and CSV export.
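
One way to confirm the libraries above are available before running the scraper is sketched below. The REQUIRED mapping and the missing_packages helper are hypothetical names introduced for this example; the mapping pairs each import name with the package name used by pip.

```python
from importlib.util import find_spec

# Maps each import name to the name used with `pip install`.
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4", "pandas": "pandas"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required libraries that cannot be imported."""
    return [
        pip_name
        for module, pip_name in required.items()
        if find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing libraries; install with: pip install " + " ".join(missing))
else:
    print("All required libraries are available.")
```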
