dataingestionframework

Here are 6 public repositories matching this topic...

divithraju / divith-raju-PySpark-Projects

linux data opensource web hadoop ubuntu bigdata apache project python3 pyspark hdfs software-engineering user dataprocessing dataengineering project-repository dataingestionframework movies-streaming

Updated Aug 15, 2024
Python

divithraju / divith-raju-Webapplication-Spark-memory-cal

The Spark Memory Configuration Calculator is designed to help data engineers and Spark developers quickly determine the optimal memory and core configurations for their Spark clusters. With this tool, you can avoid common pitfalls and ensure your cluster resources are used efficiently, leading to better performance and lower costs.

linux open-source calculator data database hadoop ubuntu apache project pyspark hdfs memory-allocation dataplatform dataprocessing lowcost project-repository dataingestionframework

Updated Aug 15, 2024
Python
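
As a rough illustration of the kind of sizing the Spark Memory Configuration Calculator entry describes, here is a minimal sketch of a common community rule of thumb. It is not taken from the repository. The function name `plan_executors`, the `ExecutorPlan` type, the 5-cores-per-executor default, and the 1 core / 1 GB per-node reservation are all assumptions made for illustration. The 10% overhead with a 384 MiB floor mirrors Spark's default for `spark.executor.memoryOverhead`, applied here to the whole container as an approximation.

```python
from dataclasses import dataclass


@dataclass
class ExecutorPlan:
    num_executors: int       # candidate value for spark.executor.instances
    executor_cores: int      # candidate value for spark.executor.cores
    executor_memory_gb: int  # candidate value for spark.executor.memory


def plan_executors(nodes, cores_per_node, mem_per_node_gb, cores_per_executor=5):
    """Hypothetical rule-of-thumb sizing; all defaults are assumptions."""
    # Assumption: leave 1 core and 1 GB per node for the OS and daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_per_node_gb - 1

    executors_per_node = usable_cores // cores_per_executor
    # Assumption: give up one executor slot cluster-wide for the driver.
    num_executors = nodes * executors_per_node - 1
    if executors_per_node < 1 or num_executors < 1:
        raise ValueError("cluster too small for this rule of thumb")

    # Split node memory across executors, then set aside the overhead
    # (approximated as the larger of 384 MiB or 10% of the container).
    container_gb = usable_mem_gb / executors_per_node
    overhead_gb = max(0.384, 0.10 * container_gb)
    heap_gb = int(container_gb - overhead_gb)

    return ExecutorPlan(num_executors, cores_per_executor, heap_gb)


plan = plan_executors(nodes=10, cores_per_node=16, mem_per_node_gb=64)
print(plan)
```

Treat the result as a starting point only; real workloads usually need tuning against observed memory use and garbage-collection behavior.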

divithraju / divith-aju-Hadoop-Pyspark-pipeline

This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.

client documentation data database apache-spark pipeline bigdata project python3 pyspark hdfs software-engineering ecommerce-platform dataengineering datapreprocessing apache-hadoop-framework project-repository dataingestionframework

Updated Aug 17, 2024
Python
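
To make the log-pipeline entry above more concrete, here is a minimal sketch of the kind of aggregation such a pipeline might perform. It uses plain Python as a stand-in for PySpark so it runs without a cluster, and it is not the repository's code. The log format (`timestamp user_id action amount`, whitespace-separated) and the name `summarize_logs` are hypothetical.

```python
from collections import Counter


def summarize_logs(lines):
    """Count page views per user and total purchase revenue.

    Assumes a hypothetical format: 'timestamp user_id action amount'.
    """
    views = Counter()
    revenue = 0.0
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed format
        _timestamp, user_id, action, amount = parts
        if action == "view":
            views[user_id] += 1
        elif action == "purchase":
            revenue += float(amount)
    return views, revenue


sample = [
    "2024-08-17T10:00:00 u1 view 0",
    "2024-08-17T10:01:00 u1 purchase 20.5",
    "2024-08-17T10:02:00 u2 view 0",
    "2024-08-17T10:03:00 u1 view 0",
    "2024-08-17T10:04:00 u2 purchase 4.5",
    "malformed line",
]
views, revenue = summarize_logs(sample)
print(dict(views), revenue)
```

In a PySpark version the same logic would typically become a parse step followed by grouped aggregations over a DataFrame, which is what lets it scale to large volumes of log files.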

divithraju / divith-raju-Python

This repository highlights my ability to develop and integrate diverse Python solutions, ranging from API creation and data management to cloud service integration. Each project in this repository serves a specific purpose, demonstrating both fundamental concepts and practical applications that are essential in real-world software development.