v0.0.23 - Basic RDD Support + Spark ML Cookbook

Pre-release
@anthony-khong anthony-khong released this 19 Aug 03:22
· 153 commits to develop since this release
be42842

Preliminary RDD support covering a subset of transformations, plus two completed parts of the Spark ML cookbook.

  • Basic RDD support: mainly basic transformations such as map, reduce, map-to-pair and reduce-by-key. The main challenge has been function serialisation; the approach is mainly taken from Sparkling and sparkplug.
  • Spark ML cookbook: added two chapters, one on Spark ML pipelines and one porting a customer segmentation blog post that uses non-negative matrix factorisation.
  • Better Geni CLI: new --submit command-line argument to emulate spark-submit.
  • Better CI steps: automated Geni CLI tests to avoid manual testing of the Geni REPL.
  • Completed benchmark results: added results from dplyr, data.table, tablecloth and tech.ml.dataset.
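
For readers new to the RDD transformations named in the first item, the following is a minimal sketch of what map, reduce, map-to-pair and reduce-by-key compute. It is plain Python on an in-memory list, not Geni's or Spark's API; the helper names `map_to_pair` and `reduce_by_key` and the sample data are assumptions made for illustration only.

```python
from functools import reduce


def map_to_pair(items, f):
    """Apply f to each item; f is expected to return a (key, value) tuple."""
    return [f(x) for x in items]


def reduce_by_key(pairs, f):
    """Combine all values that share a key using the binary function f."""
    acc = {}
    for key, value in pairs:
        # First value for a key is stored as-is; later ones are folded in.
        acc[key] = f(acc[key], value) if key in acc else value
    return acc


# Hypothetical sample data standing in for the elements of an RDD.
words = ["spark", "geni", "spark", "rdd", "geni", "spark"]

# map: transform every element independently.
lengths = list(map(len, words))

# reduce: fold all elements into a single value.
total_length = reduce(lambda a, b: a + b, lengths)

# map-to-pair followed by reduce-by-key: the classic word count.
pairs = map_to_pair(words, lambda w: (w, 1))
counts = reduce_by_key(pairs, lambda a, b: a + b)

print(total_length)
print(counts)
```

In a distributed setting the functions passed to these transformations have to be shipped to worker processes, which is why function serialisation is the hard part; this single-process sketch sidesteps that entirely.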