XML Data Scraper

This Python script scrapes XML data from a list of URLs stored in a CSV file, processes it, and saves the results into an Excel file.

Installation

Clone this repository:

git clone https://github.com/yourusername/xml-data-scraper.git

Install the required dependencies:

pip install pandas requests beautifulsoup4

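Place your CSV file containing XBRL URLs and company names in the directory you run the script from. It must be the only CSV file there, because the script loads a CSV only when exactly one is found in the current working directory.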
Run the script:
```
python code.py
```
The scraped data will be saved in a directory named "Scrapped_Data" as an Excel file named "xml_data.xlsx".

How It Works

Importing Libraries: The script imports pandas, os, requests, BeautifulSoup, and datetime.
Finding CSV File: It looks for a CSV file in the current working directory. If exactly one CSV file is found, it loads it into a DataFrame (df).
Data Cleaning: Column names are cleaned by removing whitespace and '**' characters.
Scraping XML Data: The function scrape_xml_data() iterates over each row of the DataFrame (df). For each row, it retrieves the XBRL URL, sends a GET request to the URL, parses the XML content using BeautifulSoup, and extracts required elements like turnover, net worth, and emissions data. It then appends this data into a list (scraped_data).
Collecting Scraped Data: After all URLs have been scraped, the list (scraped_data) is converted into a DataFrame (scraped_df) whose index starts at 1, and the time taken for scraping is calculated.
Creating Directory: A directory named "Scrapped_Data" is created if it doesn't already exist.
Saving to Excel: The DataFrame is saved as "xml_data.xlsx" inside the "Scrapped_Data" directory.
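
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual contents of the script: the helper names (find_single_csv, clean_columns, extract_fields, build_result, save_to_excel), the tag names in FIELDS, and the choice of Python's built-in "html.parser" are all hypothetical. The real XBRL element names depend on the filings being scraped.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical tag names, for illustration only. The real XBRL element
# names depend on the taxonomy of the filings being scraped.
FIELDS = {
    "turnover": "turnover",
    "net_worth": "networth",
}


def find_single_csv(directory="."):
    """Return the only CSV file in `directory`; raise if there is not exactly one."""
    csv_files = sorted(Path(directory).glob("*.csv"))
    if len(csv_files) != 1:
        raise FileNotFoundError(
            f"expected exactly one CSV file in {directory!r}, found {len(csv_files)}"
        )
    return csv_files[0]


def clean_columns(df):
    """Return a copy of `df` with '**' and surrounding whitespace removed from column names."""
    cleaned = df.copy()
    cleaned.columns = df.columns.str.replace("**", "", regex=False).str.strip()
    return cleaned


def extract_fields(xml_text, fields=FIELDS):
    """Parse one XML document and return the text of the first match for each field."""
    # html.parser ships with Python and lowercases tag names,
    # so the lookups use lowercase names.
    soup = BeautifulSoup(xml_text, "html.parser")
    row = {}
    for label, tag_name in fields.items():
        tag = soup.find(tag_name)
        # A missing element is recorded as None instead of raising.
        row[label] = tag.get_text(strip=True) if tag is not None else None
    return row


def build_result(rows):
    """Convert the list of scraped rows into a DataFrame indexed from 1."""
    scraped_df = pd.DataFrame(rows)
    scraped_df.index = range(1, len(scraped_df) + 1)
    return scraped_df


def save_to_excel(scraped_df, out_dir="Scrapped_Data", filename="xml_data.xlsx"):
    """Create the output directory if needed and write the DataFrame to Excel."""
    out_path = Path(out_dir)
    out_path.mkdir(exist_ok=True)
    target = out_path / filename
    scraped_df.to_excel(target)  # requires an Excel writer engine such as openpyxl
    return target
```

In a full run, the text passed to extract_fields would come from something like requests.get(url, timeout=30).text for each row of the cleaned DataFrame. Note that writing .xlsx files with pandas needs an Excel engine such as openpyxl installed alongside the dependencies listed under Installation.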
