Month 6, Week 3

System Design & Architectural Thinking

From Code to Blueprint

Module 1: The Mindset of an Architect

Thinking in Systems, Not Just Code

What is System Design?

System design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements.

It's the high-level blueprint for how we build scalable, reliable, and maintainable applications.

The 4-Step Framework for System Design

A structured approach is crucial. We will use this framework for any design problem.

  1. Step 1: Requirements Clarification (Functional & Non-Functional) - What are we building?
  2. Step 2: Back-of-the-Envelope Estimation - How big is the problem?
  3. Step 3: High-Level Design (The Whiteboard Sketch) - What are the major components?
  4. Step 4: Deep Dive & Bottlenecks - Where will it break? How do we fix it?

Functional vs. Non-Functional Requirements

  • Functional Requirements: What the system must *do*.
    • e.g., "A user must be able to shorten a URL," "A user must be able to retrieve the original URL."
  • Non-Functional Requirements (NFRs): The qualities of the system. This is where architects earn their keep.
    • Scalability: How does the system handle increased load?
    • Availability: Is the system always online? (e.g., 99.99% uptime)
    • Latency: How fast is the system? (e.g., responses under 100ms)
    • Durability: Is data ever lost?

Module 2: The Core Components of Scale

The Architect's Toolkit

Scaling Vertically vs. Horizontally

  • Vertical Scaling ("Scaling Up"): Adding more resources (CPU, RAM) to a single server. It's easy, but it has a hard physical limit and gets very expensive.
  • Horizontal Scaling ("Scaling Out"): Adding more servers to your pool of resources. This is how massive applications are built. It's more complex but has almost no limit.

All the components we will discuss are tools for effective horizontal scaling.

Load Balancers

A load balancer is a server that sits in front of your application servers and distributes incoming traffic across them.

Key Benefits:

  • Scalability: Allows you to add more application servers to handle more traffic.
  • Availability: Can detect if a server has failed (health checks) and stop sending traffic to it, ensuring the application stays online.
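Both benefits can be sketched in a few lines. Here is a minimal round-robin balancer with health-check state; the server names are illustrative placeholders, and a real load balancer (e.g., NGINX, HAProxy, an AWS ALB) probes backends over the network rather than being told explicitly:

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin load balancer with health checks (sketch)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        # A failed health check removes the server from rotation.
        self.healthy.discard(server)

    def mark_up(self, server):
        # A recovered server rejoins the rotation.
        self.healthy.add(server)

    def next_server(self):
        # Skip unhealthy servers so traffic only reaches live backends.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")
picks = [lb.next_server() for _ in range(4)]  # app-2 never appears
```

Note that adding a fourth server is just another entry in the pool: that is horizontal scaling in action.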

Database Replication & Sharding

A single database is often the first bottleneck.

  • Replication (Read Scaling): Create copies of your database. A "leader" (master) database handles all writes, and "follower" (replica) databases handle all reads. This is great for read-heavy applications.
  • Sharding (Write Scaling): Horizontally partition your data. For example, users A-M go to Database 1, and users N-Z go to Database 2. This is much more complex but allows for massive write scalability.
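Range-based sharding like A-M/N-Z is easy to reason about but skews toward common first letters, so production systems usually hash the key instead. A minimal sketch of hash-based shard routing (the shard count and key names are assumptions for illustration):

```python
import hashlib

NUM_SHARDS = 4  # assumption: four user databases

def shard_for(user_id: str) -> int:
    """Route a user to a shard by hashing their ID.

    Hashing spreads load evenly across shards, unlike alphabetical
    ranges, and the same user always maps to the same shard.
    """
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every read and write for "alice" deterministically hits one shard.
alice_shard = shard_for("alice")
```

The trade-off: queries that span users (e.g., "all users who signed up today") now have to touch every shard, which is part of why sharding is "much more complex."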

Caching

The most important tool for improving read performance. We place a high-speed in-memory cache (like Redis) in front of our database.

By serving frequent requests from the cache, we dramatically reduce database load and improve latency.
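The usual pattern here is "cache-aside": check the cache first, and only go to the database on a miss. A minimal sketch using plain dicts as stand-ins for Redis and the database (the key names and TTL are illustrative assumptions):

```python
import time

db = {"user:42": {"name": "Ada"}}           # stand-in for the database
cache: dict = {}                            # stand-in for Redis
TTL_SECONDS = 60                            # entries expire after a minute

def get_user(key: str) -> dict:
    """Cache-aside read: try the cache first, fall back to the DB."""
    entry = cache.get(key)
    if entry and time.time() - entry[0] < TTL_SECONDS:
        return entry[1]                     # cache hit: no DB load
    value = db[key]                         # cache miss: read the DB...
    cache[key] = (time.time(), value)       # ...and populate the cache
    return value
```

With a high hit rate, most reads never touch the database at all; the TTL bounds how stale a cached entry can get.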

Content Delivery Networks (CDNs)

A CDN is a geographically distributed network of proxy servers and their data centers.

It is a cache for your static assets (images, videos, JS/CSS files). It serves these assets from a server that is geographically close to the user, dramatically reducing latency.

Mid-Lecture Knowledge Check

Module 3: The CAP Theorem

The Fundamental Trade-off of Distributed Systems

Brewer's Theorem

The CAP Theorem states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:

  • Consistency (C): Every read receives the most recent write or an error. All nodes see the same data at the same time.
  • Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
  • Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

The Choice: CP vs. AP

In a modern distributed system, network partitions (P) are a fact of life. You must assume they will happen. Therefore, the real trade-off is between Consistency and Availability.

  • CP (Consistency & Partition Tolerance): If a partition occurs, the system refuses to respond (returning an error or timing out) rather than risk serving inconsistent data, sacrificing availability on the affected nodes. (e.g., Banking systems, financial ledgers).
  • AP (Availability & Partition Tolerance): If a partition occurs, the system will continue to respond, even if it means the data is stale or "eventually consistent." (e.g., Social media likes, DNS).

As an architect, you must decide which guarantee is more important for your specific feature.

Module 4: Practical Design - A URL Shortener

Applying the Framework

Step 1: Requirements Clarification

  • Functional:
    1. Users can submit a long URL and receive a shorter, unique URL.
    2. Users who visit a short URL are redirected to the original long URL.
  • Non-Functional:
    1. Must be highly available (people need links to work).
    2. Redirects must be extremely fast (low latency).
    3. Short URLs should not be guessable.

Step 2: Back-of-the-Envelope Estimation

Assume 100 million new URLs per month. What are the read and write loads?

  • Writes: `100M / (30 days * 24 hrs * 3600s)` ≈ 40 URLs/sec.
  • Reads: Assume a 10:1 read-to-write ratio. `40 * 10` = 400 redirects/sec.
  • Storage: `100M/month * 12 months * 5 years` = 6 Billion URLs over 5 years. If each record is ~500 bytes, `6B * 500B` ≈ 3 TB of storage.
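The arithmetic above is worth doing explicitly; a few lines of Python reproduce every estimate (the 10:1 ratio, 5-year retention, and 500-byte record size are the assumptions stated above):

```python
urls_per_month = 100_000_000
seconds_per_month = 30 * 24 * 3600              # ~2.59 million seconds

writes_per_sec = urls_per_month / seconds_per_month   # ~39/sec
reads_per_sec = writes_per_sec * 10                   # assumed 10:1 ratio

total_urls = urls_per_month * 12 * 5                  # 5 years of retention
storage_bytes = total_urls * 500                      # ~500 bytes per record

print(f"{writes_per_sec:.0f} writes/sec")
print(f"{reads_per_sec:.0f} reads/sec")
print(f"{storage_bytes / 1e12:.1f} TB")
```

Precision is not the point here: knowing we face hundreds of reads per second and single-digit terabytes tells us a cache and read replicas matter far more than exotic storage.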

The system is read-heavy. This is a critical insight.

Step 3: High-Level Design (v1)

A single server and a single database.

Write Flow: `POST /shorten` with a long URL. The app generates a unique hash, stores `(hash, longURL)` in the database, and returns the short URL.

Read Flow: `GET /{hash}`. The app looks up the hash in the database and performs a 301 redirect to the original URL.

This has single points of failure and will not scale.

Step 4: Deep Dive & Scaling (v2)

Let's add our scaling components.

  1. Add a Load Balancer and multiple API Servers (Horizontal Scaling).
  2. Since it's read-heavy, use Database Replication (Leader-Follower).
  3. The redirect is a perfect use case for a Cache. Add a Redis cache that stores `(hash, longURL)`. Most read requests will now hit the fast cache and never touch the database.
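Putting steps 2 and 3 together, the v2 read path becomes: check the cache, fall back to a read replica on a miss, and return a 301. A sketch with dicts standing in for Redis and the replica (the hash value and URL are illustrative):

```python
# Stand-ins for the real components (Redis cache, follower database).
redis_cache: dict = {}
read_replica = {"abc123x": "https://example.com/very/long/path"}

def redirect(hash_code: str):
    """v2 read path: cache first, read replica only on a miss."""
    long_url = redis_cache.get(hash_code)
    if long_url is None:
        long_url = read_replica[hash_code]   # only misses touch the DB
        redis_cache[hash_code] = long_url    # warm the cache for next time
    return 301, long_url                     # permanent redirect
```

After the first request for a hash, every subsequent redirect is served from memory, which is how we meet the low-latency NFR at 400 reads/sec.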

In-Class Practical Exercise

Whiteboard Design: A Social Media Feed

Your task is to work through the 4-step framework to design the backend for a simplified social media feed. We will do this together on the whiteboard.

  1. Requirements:
    • Users can post short text messages ("tweets").
    • Users can follow other users.
    • A user's home feed should show the most recent tweets from everyone they follow.
    • Must be highly available and eventually consistent.
  2. Estimations: Let's assume 1 Billion users and 500 Million daily active users. How many tweets per second? How many feed loads per second? Is it read-heavy or write-heavy?
  3. High-Level Design: What are the main API endpoints (`POST /tweet`, `GET /feed`)? What are the main database tables (`users`, `tweets`, `follows`)?
  4. Deep Dive & Bottlenecks: The `GET /feed` is the critical path. How do we generate it?
    • Fan-out on Write: When a user tweets, we push that tweet into the timeline of *all* their followers. Fast reads, slow writes.
    • Fan-out on Read: When a user requests their feed, we find everyone they follow, get all their recent tweets, merge, and sort them. Slow reads, fast writes.
    • Which is better? Let's discuss the trade-offs. How does caching fit in?
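To seed the discussion, here is a minimal fan-out-on-write sketch using in-memory dicts (all names and the timeline size are illustrative assumptions, not a real implementation):

```python
from collections import defaultdict, deque

followers: dict = defaultdict(set)  # author -> set of followers
timelines = defaultdict(lambda: deque(maxlen=100))  # precomputed feeds

def follow(follower: str, author: str) -> None:
    followers[author].add(follower)

def post_tweet(author: str, text: str) -> None:
    # Fan-out on write: the write cost grows with follower count
    # (the "celebrity problem"), but reads become trivially cheap.
    for user in followers[author]:
        timelines[user].appendleft((author, text))

def get_feed(user: str) -> list:
    # Read is a single timeline lookup: fast and cache-friendly.
    return list(timelines[user])
```

Notice the trade-off in the loop: a user with 50 million followers turns one tweet into 50 million timeline writes, which is why real systems often use a hybrid (fan-out on write for most users, fan-out on read for celebrities).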

Final Knowledge Check