Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
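The phrase "implicit data parallelism and fault tolerance" can be made concrete with a toy model. What follows is a minimal sketch in plain Python, not Spark and not its API. It assumes a word count split into partitions: each partition is processed by an independent, deterministic task, and the partial results are merged in a reduce step. A failed task is simply re-run on the same input, loosely mirroring how Spark recomputes a lost partition from its lineage. The helper names `partition`, `count_words`, `run_task`, and `word_count` are hypothetical. In PySpark the same job is usually written with RDD operations such as `flatMap`, `map`, and `reduceByKey`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Hypothetical helper: split records into roughly equal, contiguous chunks."""
    if not records:
        return []
    size = -(-len(records) // num_partitions)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_words(lines):
    """Per-partition task: a pure function of its input, so re-running it is safe."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def run_task(task, part, max_attempts=3):
    """Re-run a failed task on the same input (toy stand-in for lineage recomputation)."""
    for attempt in range(max_attempts):
        try:
            return task(part)
        except Exception:
            if attempt == max_attempts - 1:
                raise


def word_count(lines, num_partitions=4):
    """Map each partition in parallel, then reduce by merging the partial counts."""
    parts = partition(lines, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(lambda p: run_task(count_words, p), parts))
    total = Counter()
    for partial in partials:
        total += partial  # reduce step: combine per-partition results
    return total


if __name__ == "__main__":
    print(word_count(["a b a", "b c", "a"], num_partitions=2))
```

Here threads stand in for cluster executors; under CPython's GIL this shows the structure of the computation rather than a real speedup. The caller never schedules partitions explicitly, which is the sense in which the parallelism is "implicit".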
Open source platform for the machine learning lifecycle
Simple and Distributed Machine Learning
lakeFS - Data version control for your data lake | Git for data
酷玩 Spark (Coolplay Spark): Spark source code analysis, Spark libraries, and more
Interactive and Reactive Data Science using Scala and Spark.
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Apache Spark Docker image
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Feathr – A scalable, unified data and AI engineering platform for enterprise
Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
A curated list of awesome Apache Spark packages and resources.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
PySpark + Scikit-learn = Sparkit-learn
(Deprecated) Scikit-learn integration package for Apache Spark
MapReduce, Spark, Java, and Scala for Data Algorithms Book
R interface for Apache Spark
Created by Matei Zaharia
Released May 26, 2014