Disclaimer: This post is a reflection of my mind after reading this paper, it is not a summary or review, and should not be used for reference. But for fun is ok. Besides what mentioned here, this paper has a lot of other wonderful ideas.
Designing a computer system is very different from designing an algorithm.
The overall idea that I get is “simplify”. Simplicity ensures functionality and enhance performance.
Defining scope of each sub-system and their interface are the most important - and most difficult - part of system design. Try to keep it simple and minimal, and don’t generalize.
Each interface is a small programming language: it defines a set of objects and the operations that can be used to manipulate the objects.
The well-known KISS principle (stands for Keep It Simple, Stupid) demonstrates this idea.
A simple system is faster, more accurate, and has easier-to-predict behaviors comparing to a general one. Usually, general means slower. And users have to pay for extra cost, although they do not need those extra power. However, a fast and general solution is always the best, if you can find it.
If complicated functionalities are needed, try to delegate to client. Your system should provide basic functions, with wide range of usage that user can create. Rather than trying to cover all the things by yourself and make your system heavy and slow, potentially introduce many bugs, and hard to maintain.
Do one thing at a time, and do it well.
With hard problems, divide and conquer maybe a good approach. Transform big problem into smaller and simpler ones. Handle normal and worst case separately may save time, instead of solving them together (over-generalize). If a simple and general solution exists, choose it.
If a straightforward, brute-force solution exists, and its speed is acceptable comparing to an optimized version, or input data is relatively small, use the brute-force one.
Besides of main idea of simplify, this paper also notice about safety. System designer should avoid choosing a terrible way, rather than finding an optimum. They also need to have clear vision among sub-systems.
It also gives some advises on improving performance, some of them I feel interesting (the others I don’t understand):
- Caching. Result of time-consuming operations should be cached. Be aware of cache invalidation.
- Use hints. Time can be saved, and calculation can be executed before it is clearly stated. Note that hints can be wrong, and should check against “truths”. But checking correctness in many cases are faster than finding answer. Hints can be found by:
- Occurred events
- Statistical/historical data
- Compute heavy tasks in background.
Some ideas to enhance fault-tolerance of a system:
- Use appended logs. Current state can be re-produced from a series of append-only modification logs. Logs should be well-formatted.
- Use transactions.
- Operations should be idempotent.
Examples are quite old (1970s, 1980s) and maybe unfamiliar with current computing. However, basic concepts live long.