Excel File Aggregator and Processor

Overview

This project is a Python-based tool designed to automate the aggregation and processing of multiple Excel files. It provides a user-friendly interface for selecting input and output directories, reads and concatenates data from multiple Excel sheets, standardizes column names, and removes duplicates based on specified criteria. The processed data is then saved into a new Excel file.

Key Features

Multi-File Aggregation: Reads multiple Excel files from a user-specified directory and aggregates data by sheet name.
Column Standardization: Standardizes column names for consistency across different data sets.
Duplicate Removal: Removes duplicate entries based on a unique identifier while retaining the entry with the highest specified value.
User-Friendly Interface: Utilizes a graphical user interface (GUI) for easy selection of input and output directories.
Output Generation: Saves the processed and cleaned data into a new Excel file in the user-specified output directory.

Requirements

Python 3.x
pandas
openpyxl
tkinter

Detailed Description

File Reading and Aggregation:
- Uses the glob module to identify and read all Excel files within the selected directory.
- Utilizes pandas to read and parse each sheet within the Excel files.
- Aggregates data from sheets with the same name across different files into a single DataFrame per sheet name.
Column Standardization:
- Standardizes variations of column names (e.g., SR, sr, SRID to SR ID, and AGECLASS to AGE) to ensure uniformity across all data sets.
Data Processing:
- Sorts the data by the unique identifier (SR ID) and a specified value (AGE) to prioritize entries with the highest value.
- Removes duplicate entries based on the unique identifier, retaining the entry with the highest specified value.
Output Generation:
- Saves the cleaned and processed data into a new Excel file using pandas' ExcelWriter.
- Each sheet name in the original data is preserved in the output file.
User Interaction:
- Utilizes tkinter to provide a GUI for folder selection, making it easy for users to specify input and output directories without manually entering paths.

Installation

Clone the repository:

git clone https://github.com/pbonotis15/aggregate-and-process-multiple-excel-files.git

Navigate to the project directory:
```
cd to_your_folder
```
Install the required packages:
```
pip install pandas openpyxl
```

Usage Instructions

Setup

(OPTIONAL) Install pyinstaller for creating an executable.

Running the Script

Run the script directly using Python to test and verify functionality.
```
python aggregate-tpbe.py
```
(OPTIONAL) Use pyinstaller to create an executable file:
```
pyinstaller --onefile aggregate-tpbe.py
```

Execution

Launch the executable or run the script.
Use the GUI to select the folder containing the Excel files for input.
Select the output folder where the processed Excel file will be saved.
The script will read, process, and save the data, displaying a confirmation message upon completion.

Example Use Case

A company regularly receives Excel reports from multiple departments. Each report contains various sheets with task data, but column names may vary and some tasks are duplicated. This tool can automate the aggregation of these reports, standardize the column names, remove duplicates, and produce a clean, consolidated Excel file for further analysis.

This project simplifies data consolidation, enhances data consistency, and saves time, making it ideal for organizations dealing with large volumes of Excel data from multiple sources.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Excel File Aggregator and Processor

Overview

Key Features

Requirements

Detailed Description

Installation

Usage Instructions

Setup

Running the Script

Execution

Example Use Case

Files

README.md

Latest commit

History

README.md

File metadata and controls

Excel File Aggregator and Processor

Overview

Key Features

Requirements

Detailed Description

Installation

Usage Instructions

Setup

Running the Script

Execution

Example Use Case