Stochastic-Gradient-Descent

The laboratory from CLOUDS Course at EURECOM

This repository contains the Stochastic Gradient Descent laboratory from the CLOUDS course at EURECOM, carried out as a group project with NGUYEN Van Tuan (Van-Tuan.Nguyen@eurecom.fr) and Yangxin YUAN (Yangxin.Yuan@eurecom.fr).

The CLOUDS (Distributed Systems and Cloud Computing) course was offered by Prof. Pietro Michiardi at EURECOM. Course details are available at http://michiard.github.io/DISC-CLOUD-COURSE/

Course Description

The goal of this course is to provide a comprehensive view of recent topics and trends in distributed systems and cloud computing. We will discuss the software techniques employed to construct and program reliable, highly-scalable systems. We will also cover the architecture design of modern datacenters that constitute a central topic of the cloud computing paradigm. The course is complemented by some lab sessions to get hands-on experience with Apache Spark.

The Laboratory Description

The goal of this notebook is to work on distributed optimization algorithms, which are the foundation for large scale analytics and machine learning. Specifically, we will focus on the details of stochastic gradient descent (SGD). To do so, we will work on a simple regression problem, where we will apply SGD to minimize a loss function, as defined for the problem at hand. The emphasis of this laboratory is not on the machine learning part: even if you've never worked on regression problems, this shouldn't prevent you from being successful in developing the Notebook.
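To make the setting concrete, here is one common formulation of the problem. It is an assumption for illustration, since this README does not state which loss the Notebook uses: a linear model with weights $w$, a mean squared error loss over $m$ training pairs $(x^{(i)}, y^{(i)})$, and a learning rate $\eta$ (all symbols are chosen here, not taken from the Notebook).

$$
J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( w^\top x^{(i)} - y^{(i)} \right)^2
$$

Gradient descent uses the gradient over the full dataset at each step:

$$
w \leftarrow w - \frac{\eta}{m} \sum_{i=1}^{m} \left( w^\top x^{(i)} - y^{(i)} \right) x^{(i)}
$$

Stochastic gradient descent replaces that sum with the term for a single randomly chosen example $i$:

$$
w \leftarrow w - \eta \left( w^\top x^{(i)} - y^{(i)} \right) x^{(i)}
$$

Under this formulation, the mini-batch variant averages the same per-example term over a small random subset of the data.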

The Notebook follows these steps:

  • Brief introduction to linear regression
  • Implementation of serial algorithms: from Gradient Descent, to Stochastic Gradient Descent, Batch Gradient Descent, and Mini-Batch Gradient Descent
  • Implementation of distributed algorithms with Apache Spark
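As a rough illustration of the serial algorithms listed above, the following minimal sketch fits a linear model with NumPy. It is not the Notebook's code: the function names `gradient` and `fit_linear`, the mean squared error loss, the synthetic data, and the hyperparameter values are all assumptions chosen for this example. A single `batch_size` parameter selects the variant: the full dataset size gives full-batch gradient descent, `1` gives stochastic gradient descent, and anything in between gives mini-batch gradient descent.

```python
import numpy as np


def gradient(w, X, y):
    """Gradient of the mean squared error 1/(2m) * ||Xw - y||^2 over the given rows."""
    residual = X @ w - y
    return X.T @ residual / len(y)


def fit_linear(X, y, batch_size, lr=0.1, epochs=500, seed=0):
    """Fit weights w by gradient steps on batches of `batch_size` rows.

    Hypothetical helper for illustration only:
    batch_size == len(y) -> full-batch gradient descent
    batch_size == 1      -> stochastic gradient descent
    otherwise            -> mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the rows at every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w -= lr * gradient(w, X[idx], y[idx])
    return w


# Synthetic, noise-free data (an assumption for this sketch): y = 1 + 2x
x = np.linspace(-1.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])  # first column models the intercept
y = 1.0 + 2.0 * x

w_gd = fit_linear(X, y, batch_size=len(y))  # full-batch gradient descent
w_sgd = fit_linear(X, y, batch_size=1)      # stochastic gradient descent
w_mb = fit_linear(X, y, batch_size=10)      # mini-batch gradient descent
```

On this noise-free data all three variants should approach the weights `[1, 2]`. The per-batch gradient is also the piece a distributed version would most naturally compute in parallel over data partitions, although how the Notebook does this with Apache Spark is not described here.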