Interactive and Reactive Data Science using Scala and Spark.
Updated May 16, 2023 - JavaScript
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Notes about Spark Streaming in Apache Spark
Pyspark Notebook With Docker
An image for running Jupyter notebooks and Apache Spark in the cloud on OpenShift
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Ansible roles to install a Spark Standalone cluster (HDFS/Spark/Jupyter Notebook) or an Ambari-based Spark cluster
Apache Zeppelin notebooks for Recommendation Engines using Keras and Machine Learning on Apache Spark
Tutorial for exploring FHIR data with Apache Spark in an interactive notebook
Implementation of Spark code in Jupyter notebooks. Topics include: RDDs and DataFrames, exploratory data analysis (EDA), handling multiple DataFrames, visualization, Machine Learning
Includes notes on using Apache Spark in general, notes on using Spark for Physics, how to run TPCDS on PySpark, how to create histograms with Spark, tools for performance testing CPUs, Jupyter notebook examples for Spark, and examples for Oracle and other DB systems.
Zeppelin notebook online
PySpark notebooks to learn Apache Spark (WIP)
PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformations and Actions, Spark DataFrame, Spark SQL, and more. It is completely free on YouTube and is beginner-friendly without any prerequisites.
One Click deployment of Notebooks - Bringing Notebooks to Production
Notebook image and notebook for feature reduction talk
Reusable Python classes that extend open source PySpark capabilities. Implementation examples are available under notebooks of the repo https://github.com/bennyaustin/synapse-dataplatform
Notebooks from the classes of 75-06 Organización de Datos (Data Organization) - FIUBA
An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker
PySpark & Jupyter Notebooks Deployed On Kubernetes
Gallery of Apache Zeppelin notebooks using Enth-Spark-AI.
Created by Matei Zaharia
Released May 26, 2014