Intro

In a competitive business environment, effective employee performance management is crucial for maintaining productivity and achieving organizational goals. However, HR departments often struggle to gather, manage, and analyze performance data from various sources, which can lead to delayed or inaccurate insights. The "People Performance Data Pipeline" project addresses this problem by providing a streamlined process for collecting and managing performance indicators, such as daily tasks, from multiple data sources (APIs, databases, and Google Sheets). The project aims to improve the HR department's ability to report on, visualize, and predict employee performance, leading to better-informed decisions and improved productivity across the company.

Goals

The primary goals of the People Performance Data Pipeline project include:

Centralized Data Management: Collect and consolidate employee performance data from diverse sources into a single Delta Lake storage solution. This centralized approach ensures that all relevant data is readily accessible for analysis.
Data Transformation and Standardization: Convert raw performance data into meaningful metrics by assigning points to tasks and standardizing the data format. This transformation allows for easy calculation and comparison of employee performance across different departments.
Data Quality Assurance: Implement rigorous data validation processes to ensure the accuracy and reliability of the data stored in the Delta Lake. High-quality data is essential for generating trustworthy insights and making sound business decisions.
Enhanced Reporting and Analytics: Provide HR with the tools to generate detailed performance reports, create data visualizations, and leverage machine learning models to predict and boost departmental performance. This goal directly ties to improving key business metrics such as employee productivity, retention rates, and overall company efficiency.
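The transformation and validation goals above can be illustrated with a small, plain-Python sketch. It is not the project's notebook code: the task names, point values, and field names (`employee_id`, `task`) are hypothetical, and the real pipeline runs on PySpark with the points table loaded from the .csv file in the points folder.

```python
from collections import defaultdict

# Hypothetical points table; in the project this would come from the .csv
# file stored in the points folder. These task names and values are made up.
TASK_POINTS = {"code_review": 3, "bug_fix": 5, "documentation": 2}


def validate_record(record):
    """Return True if a raw task record has the fields the scoring step needs."""
    employee_id = record.get("employee_id")
    task = record.get("task")
    return bool(employee_id) and task in TASK_POINTS


def score_records(records):
    """Split records into valid and rejected, and total the points per employee."""
    totals = defaultdict(int)
    rejected = []
    for record in records:
        if validate_record(record):
            totals[record["employee_id"]] += TASK_POINTS[record["task"]]
        else:
            rejected.append(record)
    return dict(totals), rejected


# Hypothetical raw records, as they might arrive from the different sources.
raw_records = [
    {"employee_id": "E001", "task": "bug_fix"},
    {"employee_id": "E001", "task": "code_review"},
    {"employee_id": "E002", "task": "documentation"},
    {"employee_id": None, "task": "bug_fix"},         # missing employee id
    {"employee_id": "E003", "task": "unknown_task"},  # task not in points table
]

totals, rejected = score_records(raw_records)
print(totals)
print(f"rejected records: {len(rejected)}")
```

Keeping the rejected records instead of silently dropping them makes it possible to report how much incoming data failed validation, which is one way to track data quality over time.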

Solution

Installation

Create a new resource group in Microsoft Azure and add Azure Key Vault and Databricks to it.
Store the pipeline's secrets (API key, database link, Google Sheet URL, etc.) in the key vault.

Clone this repo

git clone https://github.com/ArkanNibrastama/people_performance_data_pipeline.git

Create the points delta_lake and store the .csv file in the points folder. Also create a Delta Lake for each of the bronze, silver, and gold stages.
Copy the data_ingestion, transformation, and data_validation folders into Databricks.
After that, create a job from the notebooks.
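Once the notebooks are in Databricks, they need to read the secrets stored in the key vault. The sketch below shows one way this might look, assuming a Databricks secret scope backed by the key vault. The scope name `hr-pipeline` and the secret names are hypothetical, not taken from this repo. Inside a notebook, `dbutils.secrets.get` would be passed as `get_secret`; a stand-in function is used here so the example also runs outside Databricks.

```python
# Hypothetical secret names; use the names actually stored in the key vault.
SECRET_KEYS = ("api-key", "database-url", "google-sheet-url")


def load_pipeline_secrets(get_secret, scope, keys=SECRET_KEYS):
    """Fetch each secret by calling get_secret(scope=..., key=...).

    In a Databricks notebook, pass dbutils.secrets.get as get_secret.
    """
    return {key: get_secret(scope=scope, key=key) for key in keys}


def fake_get_secret(scope, key):
    """Local stand-in for dbutils.secrets.get, returning dummy values."""
    store = {
        "api-key": "dummy-api-key",
        "database-url": "dummy-database-url",
        "google-sheet-url": "dummy-sheet-url",
    }
    return store[key]


# Hypothetical scope name; in Databricks this would be:
# secrets = load_pipeline_secrets(dbutils.secrets.get, scope="hr-pipeline")
secrets = load_pipeline_secrets(fake_get_secret, scope="hr-pipeline")
print(sorted(secrets))
```

Passing the getter in as a parameter keeps credentials out of the notebook source and lets the ingestion logic be exercised locally with dummy values.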

Conclusion

The People Performance Data Pipeline has had a significant impact on the HR department's ability to manage and analyze employee performance data. By centralizing and standardizing data from multiple sources, and processing more than 10,000 records with PySpark, the project has reduced the HR department's processing time by approximately 75%, enabling faster and more accurate reporting. The data validation processes have ensured high data quality, leading to more reliable insights and predictions. As a result, the HR department has been able to identify underperforming areas and take proactive measures to boost productivity, contributing to an increase in overall company performance. This project demonstrates the value of combining data engineering and machine learning to drive business success.
