Chapter 1 - Reliable, Scalable and maintainable applications

Chapter 1 of Designing Data Intensive Applications

  • Data systems don't fit into clear categories now, redis is used for queues, kafka has database like durability guarantees.
  • Application code is responsible for making sure data remains consistent between any number of data systems the application might not be using.
  • Each application made from smaller data systems is a new data system. You're also a data system designer.
  • Lots of tricky questions: ensure correctness, design a good API, reliability, performance.
  • Many factors influence design: skills and experience of people, timescale for delivery etc.
  • Three concerns that are most important in systems:
    • Reliability
    • Scalability
    • Maintainability


  • Application should performs the function that the user expected
  • Should tolerate the user making mistakes.
  • Performance should be good enough
  • Should prevent unauthorized access and abuse
  • Things that can go wrong are called faults.
  • Failures are when the system as a whole stops providing the required service to the user.
  • Fault tolerance doesn't mean being tolerant of ALL faults.
  • It's best to try to design a system that prevents faults from causing failures, i.e it tolerates faults.
  • You also want to prevent certain types of faults like security issues.

Hardware faults

  • Hard disk crashes, power faults etc.
  • Usually, add redundancy to mitigate this issue.

Software errors

  • Hardware failures are generally not related, but correlated software bugs can cause the entire system to go down.
  • Systematic errors can be avoided by thorough testing, carefully thinking about assumptions and interactions in the system.

Human errors

  • Humans are unreliable.
  • Make it easy to do the right thing.
  • Decouple experimentation by providing a sandbox environment.
  • Test thoroughly.
  • Allow quick and easy recovery from human errors.
  • Set up detailed and clear monitoring.