GitHub - Prajjwol09/Web-Scraping: This project is a Python-based web scraper that extracts data on the largest public companies in the US by revenue from Wikipedia.

Largest Public Companies in the US Web Scraping Project

Overview:

This project is a Python-based web scraper that extracts the list of the largest public companies in the United States by revenue from Wikipedia. It fetches the page with requests, parses the HTML with BeautifulSoup, loads the table into a pandas DataFrame, and exports the result to a CSV file for further analysis.

Features:

Scrapes data from a Wikipedia page containing a table of the largest public companies in the US.

Extracts each company's rank, name, revenue, and the other columns of the table.

Stores the scraped data in a pandas DataFrame.

Exports the data to a CSV file.
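
The four features above can be sketched as a short pipeline. This is a minimal illustration, not the notebook's actual code: the URL, the "wikitable" class lookup, the User-Agent string, the output filename, and all function names are assumptions made for the example.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumed URL: the README names only "Wikipedia", not a specific page.
URL = (
    "https://en.wikipedia.org/wiki/"
    "List_of_largest_companies_in_the_United_States_by_revenue"
)


def fetch_html(url: str) -> str:
    """Download the page and return its HTML, raising on HTTP errors."""
    headers = {"User-Agent": "web-scraping-example/0.1"}  # hypothetical identifier
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text


def parse_first_wikitable(html: str) -> pd.DataFrame:
    """Turn the first table with class 'wikitable' into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: the company list is the first "wikitable" on the page.
    table = soup.find("table", class_="wikitable")
    if table is None:
        raise ValueError("no table with class 'wikitable' found")

    rows = table.find_all("tr")
    # The first row is assumed to hold the column headers.
    columns = [cell.get_text(strip=True) for cell in rows[0].find_all("th")]

    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        # Skip rows whose cell count does not match the header row.
        if len(cells) == len(columns):
            records.append(cells)
    return pd.DataFrame(records, columns=columns)


def scrape_to_csv(url: str = URL, path: str = "companies.csv") -> pd.DataFrame:
    """Fetch, parse, and export the table; returns the DataFrame for inspection."""
    df = parse_first_wikitable(fetch_html(url))
    df.to_csv(path, index=False)  # index=False keeps the row index out of the file
    return df
```

Calling scrape_to_csv() runs the whole pipeline. Keeping the parsing step separate from the network request means it can be tried on a saved HTML file or a small inline string without contacting Wikipedia.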

Technologies Used:

Python: The core programming language used to write the script.

Requests: To fetch the HTML content of the Wikipedia page.

BeautifulSoup: For parsing and navigating the HTML content to extract data.

Pandas: For data manipulation and exporting the scraped data to a CSV file.

Jupyter Notebook (Optional): For testing and experimenting with the code interactively.

Prerequisites:

Ensure you have the following libraries installed:

requests: For making HTTP requests to fetch the webpage.

BeautifulSoup: For parsing the HTML page. It is installed as the beautifulsoup4 package and imported from bs4.

pandas: For data manipulation and CSV export.
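
Before running the notebook, it can help to confirm that the three libraries are importable. The sketch below is one possible check using only the standard library; the helper name and the mapping are illustrative assumptions, not part of the project.

```python
import importlib.util

# Maps the importable module name to the name used with pip.
# BeautifulSoup is imported as "bs4" but installed as "beautifulsoup4".
REQUIRED = {
    "requests": "requests",
    "bs4": "beautifulsoup4",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required modules that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    # Suggest a single install command covering everything that is absent.
    print("Missing packages. Install with: pip install " + " ".join(missing))
else:
    print("All prerequisites are installed.")
```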
