I strongly encourage people working on computing systems to read classic systems papers. These are papers that first introduced key ideas that have shaped our understanding of computing and often illuminate fundamentals more effectively than later papers. I have found that reading classic papers has saved me from re-inventing the wheel and falling into known pits, and has helped me reason from first principles.
Many of the ideas that drive today’s “innovations” trace their origins to the groundbreaking work done between the 1950s and 1980s. Back then resources were scarce, so people had to think and work harder than we do today. Technologies come and go, but foundations and core principles endure. Understanding the fundamental concepts ensures long-term adaptability and relevance in a rapidly changing field and enables first-principles reasoning.
My Reading List
I was asked by some students for a list of classic systems papers. Unfortunately, the lists of papers I made for previous classic systems reading groups aren’t in my possession. So… the following is a list of papers I made this afternoon by harvesting my “outbox”. These are the papers that I have most frequently recommended to people in email. Sometime later I will update this page with a more carefully curated list, which will also include some more recent papers such as MapReduce, which I consider classic even if it was published in the 21st century :). I am sure there are important topics not covered below, but this can be a starting point. The list is ordered by how frequently I sent each paper, not by importance.
- Hints for Computer Systems Design (1983) Butler Lampson: A timeless collection of insights and practical guidance for building complex systems, emphasizing simplicity, performance trade-offs, and fault tolerance. This is the paper I send to people more than any other by more than a factor of 2. There is an extended version released in 2021.
- Path-Based Failure and Evolution Management (2004) – Chen, Accardi, Kıcıman, Lloyd, Patterson, Fox, Brewer. The power of observational logs and analyzing paths to diagnose complex systems.
- Fallacies of distributed computing – L Peter Deutsch + Bill Joy, Dave Lyon, James Gosling: Assumptions system designers make which are never true in the real world
- Worse is Better (1991) Richard Gabriel: Why good enough often trumps “correct”.
- Epidemic algorithms for replicated database maintenance (1988) Demers, Greene, Hauser, Irish, Larson, Shenker, Sturgis, Swinehart, Terry: A landmark contribution to distributed systems, providing a practical, scalable, and fault-tolerant method for maintaining consistency across replicated data.
- The Log: What Every Software Engineer Should Know – Jay Kreps: Post from the LinkedIn data pipeline team which describes what logs are and how they are the heartbeat of all real systems.
- End-to-End Arguments in System Design (1984) by Saltzer, Reed, and Clark. Proposes a principle that influences the design of networks and systems, advocating that key functions should occur at endpoints rather than intermediate nodes.
- The Protection of Information in Computer Systems (1975) Saltzer, Schroeder: A foundational paper on computer security, introducing key concepts such as least privilege, fail-safe defaults, and complete mediation. These principles form the basis of secure system design.
- Reflections on Trusting Trust (1984) by Ken Thompson: Demonstrates the importance of understanding the tools used to build software, emphasizing the potential for malicious exploitation. So important in this age where the supply chain is being attacked.
- Using Encryption for Authentication in Large Networks of Computers (1978) Needham, Schroeder: The basis of most authentication systems that use cryptography.
- Time, Clocks, and the Ordering of Events in a Distributed System (1978) by Leslie Lamport: This paper explores synchronization in distributed systems, influencing distributed databases and consensus algorithms like Paxos.
- The Byzantine Generals Problem (1982) by Lamport, Shostak, and Pease: Introduced the problem of reaching agreement in a distributed system when some components fail arbitrarily or send conflicting information, foundational for blockchain and fault-tolerant systems.
- Markets and Computation: Agoric Open Systems (1988) by Miller & Drexler: I dream of a secure, reliable digital marketplace without centralized control. See the book The Ecology of Computation edited by B.A. Huberman for a number of intriguing papers which provide the soil for this paper.
- Bumper-Sticker Computer Science (1985) Jon Bentley: Truisms
- Rules of Thumb in Data Engineering (2000) Jim Gray, Prashant Shenoy: Values every engineer should know.
- A Note on Distributed Computing (1996) Waldo, Wyant, Wollrath, Kendall: Provides a timeless critique of oversimplified approaches to building distributed systems and shows the limitations of RPC systems.
- A Mathematical Theory of Communication (1948) by Claude Shannon: One of the most cited papers ever written. The foundation of information theory, this work explains how data is quantified and transmitted, influencing everything from data compression to network protocols.
Real World Systems
There are several projects which I often encourage people to look at because they shifted my perspective and showed what was possible. Some of these links are to Wikipedia and should be replaced with links to something that tells the story and significance of these systems.
Operating Systems
- Multics: so many firsts that I can’t list them all.
- KeyKOS: a secure capability-based system. Nice memorial for Norm Hardy.
- Plan 9: what UNIX should have become, but Linux’s success and Bell Labs business decisions kept it from flying.
Programming Environments (way more than just programming languages)
Not real-world systems, but some university research programs from UC Berkeley that are worth a long look: NOW, ROC, and RAD Lab.
Other People’s Classic Systems Papers List
- Werner Vogels, Amazon’s CTO: Back to Basics posts
- MIT Press: Ideas That Created the Future: Classic Papers of Computer Science
- Harvard’s CSCI E-191 (2024) Classics of Computer Science
- Rutgers’ CS417 (2020) Distributed Systems Reading List
- Dan Creswell’s Distributed Systems Reading List
- Murat Demirbas’s foundational distributed systems papers
- Fred Hebert’s A Distributed Systems Reading List (more of a description of topics than a list of papers)
- Valbonne Consulting’s An incomplete list of classic papers every Software Architect should read
- Hacker News: Periodically someone posts a list of classic papers. Here are two that a quick search surfaced
Reading Groups
I have been part of numerous “reading groups” which focused on “systems”. The participants would identify both classic and current papers they thought were important. Just making the list together was instructive and often led to a spirited discussion. Once we made a list of papers we would make a schedule that listed which papers were going to be covered and who was going to lead the discussion. We would typically meet once per week, often over lunch. Everyone in the group would read the designated paper.
For a more complete set of guidelines, see How to Lead a Technical Reading Group by Cathy Wu of MIT. I am sure there are other good materials out there about reading groups.
Related
- Papers We Love: Repository of good papers on GitHub
- Best Paper Awards from a variety of conferences
- A.M. Turing Award Winners in Alphabetical Order
- Murat Demirbas’s Advice to the young
- The Feynman Lectures on Physics
- Books that Really Changed Me
Are there papers that changed your perspective or have been foundational to you as a software engineer, systems designer, etc? Drop me a note because I would love to read them and potentially add them to my list.
“Instead of standing on each other’s shoulders, we stand on each other’s toes”. – Butler Lampson, quoting Hamming