Why we created 6.S188

Raymie Stata edited this page Feb 6, 2018 · 2 revisions

by Raymie Stata

The education of computer scientists today focuses almost exclusively on upstream software development issues, with little exposure to how this upstream software is transformed from a bundle of bits into a secure, reliable, scalable software service. In 6.S188, our IAP workshop on Site Reliability Engineering, we aim to expose computer science students to the disciplines and practices for deploying and running Software-as-a-Service (SaaS) systems.

Who are we? Why are we doing this? David Chaiken and I (Raymie Stata) have worked together for many years building and running large SaaS systems, first at Yahoo!, where we were the Chief Architect and CTO, respectively, and more recently as CTO and CEO of Altiscale, a startup that offered Apache Hadoop and Spark as a managed service. (David is now Chief Architect at Pinterest. I’m on sabbatical.)

We are both software developers by profession and by passion, but we’ve found ourselves siding with operations teams to push our fellow developers to pay more attention to the reliability, security, and operability of the software they were writing. There are many reasons developers prioritize functionality over operability. One unfortunate reason is that many simply lack the knowledge and experience required to build more operable systems. Our workshop is a small attempt to fill that gap.

The workshop consists of a few lectures plus hands-on lab work. We are particularly excited about the lab portion. Our goal is to provide a fairly realistic experience of operating a service. The lab is important because the value of operability is best appreciated when you have first-hand experience running a service. Using a Slack chatbot as a running example, students will be exposed to many of the processes of SaaS operations, including managing an incident, improving monitoring, improving automation, and pushing changes to production with feature flags.

We hope this class will inspire students to spend at least one summer internship as a Site Reliability Engineer. To that end, we will put together a book of the students’ resumes and send them out for consideration. Even if SRE isn’t your long-term career path, such an internship experience will make you a far better software engineer. (It would also stand out, in a very positive way, on your resume.)