dataproc-cluster

Here are 22 public repositories matching this topic...

Wittline / pyDag

Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

bigquery cloud big-data workflow-engine google-cloud data-engineering task-scheduler google-cloud-platform dataproc-cluster dag parallel-processing data-pipeline dataengineering dataproc directed-acyclic-graph task-scheduling

Updated Sep 19, 2022
Python

naranjja / gcp-jupyter-sql

Run Jupyter Notebooks (and store data) on Google Cloud Platform.

jupyter-notebook dataproc-cluster cloud-sql compute-engine

Updated Oct 6, 2017
Python

anjijava16 / GCP_Data_Enginner_Utils

GCP_Data_Enginner

python bigquery scala notebook gcp pubsub pyspark dataflow shell-script dataproc-cluster dataproc gcp-storage big-data-processing

Updated Sep 4, 2021
Shell

MarieeCzy / METAR-Data-Engineering-and-Machine-Learning-Project

An educational project to build an end-to-end pipeline for near-real-time and batch processing of data, which is then used for visualisation and a machine learning model.

python docker bigquery machine-learning looker big-data spark terraform pyspark dataproc-cluster googlecloudplatform dataproc prefect streamlit

Updated May 19, 2023
Python

spotify / limbo

scala spark google-cloud google-cloud-dataflow dataproc-cluster

Updated Jan 2, 2017
Scala

jaiswalanshul / gcp_dataproc_spark_airflow

Data Workflows with GCP Dataproc, Apache Airflow and Apache Spark

airflow spark gcp dataproc-cluster dataproc airflow-operators

Updated Mar 4, 2020
Python

bilalsp / yelp_etl

Yelp ETL Pipeline in Apache Spark on Google Cloud Dataproc

bigquery circleci spark apache-spark gcs dataproc-cluster etl-pipeline

Updated Jul 10, 2021
Jupyter Notebook

Keval-Gandevia / BigDataETLAndSentimentAnalysis

A Java-based project that extracts news articles from a large .sgm file, processes them, and loads them into a MongoDB database. It includes an Apache Spark job for word frequency analysis directly from .sgm files, and a sentiment analysis implementation using a Bag-of-Words model in Java.

java big-data apache-spark mongodb sentiment-analysis etl nosql regex gcp bag-of-words dataproc-cluster solid-principles

Updated Aug 22, 2024
Java

pietrocarbo / scala-ble

A Scala and Spark project for experimenting with map-reduce algorithms on graph-shaped big data

scala big-data spark apache-spark yarn hadoop cluster hdfs mapreduce google-cloud-platform dataproc-cluster triangle-counting friend-recommendation

Updated Jul 13, 2018
Scala

mr-ubik / google-nembo

Collection of personal resources on Google Cloud

docker google cloud tensorflow keras google-cloud datascience google-cloud-platform nvidia-docker dataproc-cluster google-compute-engine dataproc

Updated Dec 1, 2017

jonathanAmancioSales / Hadoop_Dataproc_Google_Cloud_Platform_DIO

Project for the course "Creating a Fully Managed Hadoop Ecosystem with Google Cloud Dataproc" from the Digital Innovation One Data Engineer Bootcamp

hadoop google-cloud pyspark dataproc-cluster google-cloud-dataproc

Updated Aug 21, 2021
Shell

vishnudxb / gcloud-dataproc-creation

Create a gcloud Dataproc cluster with this GitHub Action

testing big-data google-cloud pyspark spark-streaming dataproc-cluster

Updated Oct 18, 2020
Shell

akaliutau / gcp-prod-spark-cluster

Deploying a production-ready environment for a Spark cluster

devops terraform gcp dataproc-cluster custom-image

Updated Oct 30, 2022
HCL

vasisthasinghal / Yelp-Review-Classification

Training a classification model as a Dataproc job and using a Kafka/PubSub connector for real-time prediction with pre-trained models

gcp pubsub pyspark apache-kafka google-cloud-platform dataproc-cluster big-data-analytics pyspark-mllib

Updated Oct 11, 2020
Jupyter Notebook

tirthmehta / Google-Cloud-Platform-based-Hadoop-Map-Reduce

Determines which words occur in a dataset of textbooks, along with each word's occurrence count, using a Dataproc cluster on Google Cloud Platform.

java dataproc-cluster crawler4j googlecloud dataprocessing googlecloudplatform dataproc

Updated Jul 28, 2017
Java

InspiredcL / data-science-on-gcp

Source code: flight analysis based on the work of Valliappa Lakshmanan.

sql spark analytics tensorflow sparkml eda transform dataproc-cluster streaming-analytics mlops ingestion-pipeline bqml real-time-ml

Updated Aug 27, 2024
Jupyter Notebook

jjtoharia / Kaggle_Outbrain

Kaggle - Outbrain Click Prediction (Oct-2016 - Jan-2017)

python r spark python3 kaggle xgboost dataproc-cluster lstm-neural-networks

Updated Apr 21, 2017
R

natmurad / cloudbigdata

Content about how to create big data ecosystems on the Cloud

aws aws-s3 google-cloud data-engineering aws-ec2 dataproc-cluster aws-firehose

Updated Aug 28, 2021
HTML

Cyang18 / MusicProducer

This is a distributed system that utilizes Apache Spark through Dataproc. We use the Spotify API to send song data to Apache Spark, which then forwards the information to Google Cloud Services. The system processes this data to recommend songs based on the extracted information.

javascript hive apache python3 dataproc-cluster apachespark

Updated Oct 14, 2024
Python

mihir-robotics / pyspark-gcp-project

PySpark job that runs on a Dataproc cluster and loads data from Cloud Storage into a BigQuery table.

bigquery google-cloud dataproc-cluster pyspark-python

Updated Feb 15, 2024
Python
