You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Python library designed to enhance the developer experience when working with AWS Glue ETL and Python Shell jobs. It reduces boilerplate code, increases type safety, and improves IDE auto-completion, making Glue development easier and more efficient.
Data Engineering examples covering Airflow and Mage for workflows; dbt for BigQuery, Redshift, ClickHouse; Spark and Kafka for Batch/Streaming Processing
👷🌇 Set up and build a big data processing pipeline with Apache Spark, 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift) Terraform to setup the infrastructure and Integration Airflow to automate workflows🥊
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines
Scenario: You have CSV files containing daily stock prices with columns: Date, Ticker, Open, High, Low, Close, and Volume. You want to compute the daily average closing price for each stock.
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
This project is designed to fetch historical stock data for a predefined list of companies and store it in a MySQL database. It utilizes Python, Spark, and the Alpha Vantage API to perform data extraction, transformation, and loading (ETL) tasks.