Hints for Building & Operating High Quality Services

   Mark Verber

Early Draft, Version 0.15 -- October 1, 2003

In the spring of 1987 I found myself looking for a book which would provide guidance to people who were trying to deploy complex computing infrastructures.  A book from a systems perspective. There were plenty of books which talked about how to use various operating systems, that described the APIs and programming environments, books that examined core OS and kernel designs, etc., but I couldn't find anything that took a high-level look at building services.  I wanted something like The Elements of Networking Style by Michael Padlipsky but focused on services rather than networks. The closest thing I found to the book I wanted to give to people was the most excellent paper by Butler Lampson entitled Hints for Computer System Design which was presented at the 1983 ACM SOSP conference.

I made an outline for a book.  After two years I didn't have much to show for my work.  I tried to convince a number of friends who are much better writers of the need for a book on system building from an operations perspective. An "Elements of Style" / "Rules of Thumb" kind of book rather than an "insert part A into slot B" sort of book. While they agreed this would be a good idea, ten years went by and I still couldn't find the book I wanted to hand out.

In 1998-1999 I created some notes to help direct conversations with folks from Microsoft Research as they were writing up their Mega Services paper. When that project finished I took a few hours to type up my notes, which are contained in Hints to System Designers from Ops.

In the fall of 1999 I concluded that (1) no one was going to write the book I wanted, and (2) I wasn't going to get around to writing the book either. I realized that while a book was just too much for me to tackle, I could get some articles written. So I started to break my notes into a number of articles.  Alas, almost as soon as I started I joined a start-up and was working crazy hours. For a number of years I was too busy to finish this paper.  In the meantime it has become clear to me that what I was trying to capture was not merely hints to system designers (engineers), but also to architects, operational staff, and management teams.  So this larger document was born.

Studying the architecture, design, implementation, and operations of a number of complex services has led me to some generalizations about design principles, architectural elements, operational processes, and organizational structures which lead to the successful deployment of quality services. The generalizations found in this document are based on the author's experience with a range of mission-critical services for more than twenty-five years, as well as interviews and papers about other complex services.

A Good Foundation

Tackle Tough Manageability Issues

Build Operational Infrastructure

Develop Good Practices

Brad's Principles in 2001

Resources

About the Author