System Design Framework
How to structure a system design answer, reason about trade-offs, and communicate technical depth as a PM.
What PMs Are Expected to Know
System design questions for PMs are not the same as for engineers. You will not be asked to implement a consistent hashing algorithm or write distributed transaction code. You are expected to:
- Structure a system at a high level and explain how the pieces fit together
- Identify the right trade-offs and explain the reasoning
- Speak credibly with engineers about scale, reliability, and architecture decisions
- Demonstrate that you can translate user and product requirements into technical constraints
The signal interviewers look for: can this PM hold a whiteboard session with their engineering team? Can they spot when a technical decision has product implications?
The 5-Step Approach
Step 1: Clarify Requirements
Do not start by drawing boxes. Spend the first 2-3 minutes establishing scope.
Functional requirements — what the system must do:
Non-functional requirements — how the system must perform:
Establish your scale envelope upfront. A system for 10,000 users is designed very differently from one for 100 million.
Step 2: Estimate Capacity
Back-of-the-envelope math shows the interviewer you understand the scale of the problem. You do not need precision — order-of-magnitude thinking is the goal.
Useful reference points:
Estimate storage: daily writes × average object size × retention period
Estimate throughput: daily active users × average requests per user per day ÷ 86,400 seconds
State your assumptions out loud, for example: "I'll assume 50M DAU, each making 20 read requests per day, which gives us roughly 12,000 read requests per second on average."
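As a rough sketch, the two formulas above can be written out in Python. The function names and the storage inputs (10M writes per day, 1 KB objects, one-year retention) are illustrative assumptions; only the 50M DAU and 20 reads per user come from the example above.

```python
SECONDS_PER_DAY = 86_400


def average_rps(daily_active_users: int, requests_per_user_per_day: float) -> float:
    """Average requests per second, assuming traffic is spread evenly over a day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def storage_bytes(daily_writes: int, avg_object_bytes: int, retention_days: int) -> int:
    """Raw storage needed, before replication, indexes, or compression."""
    return daily_writes * avg_object_bytes * retention_days


# Throughput: the 50M DAU x 20 reads/day example from the text.
read_rps = average_rps(50_000_000, 20)
print(f"average read RPS: {read_rps:,.0f}")

# Storage: hypothetical inputs chosen only to show the arithmetic.
total = storage_bytes(daily_writes=10_000_000, avg_object_bytes=1_000, retention_days=365)
print(f"raw storage: {total / 1e12:.2f} TB")
```

The result is an average. Real traffic has peaks, so it is worth saying out loud that peak load will be some multiple of this number and that the multiple is itself an assumption.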
Step 3: Design the High-Level Architecture
Start with a simple diagram. Add complexity only when you can justify it with a requirement.
Core building blocks:
Clients (web, mobile, IoT) send requests to your system. Think about whether clients need a native app or a web interface, and whether offline support matters.
Load Balancer distributes incoming requests across multiple servers. Prevents any single server from becoming a bottleneck. Also handles health checks and automatic failover.
Application Servers (the API layer) contain your business logic. Should be stateless — any server can handle any request. Stateless servers scale horizontally: just add more.
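To make the load balancer and stateless-server idea concrete, here is a minimal round-robin sketch. The class and method names are hypothetical, and production load balancers do far more than this; the point is only that, because servers hold no per-user state, any healthy server can take the next request.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        # In a real system a failed health check would trigger this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # At most one full lap around the ring is needed to find a healthy server.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")
```

Scaling horizontally in this model is just adding another entry to the server list, which is why statelessness matters.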
Databases store persistent data. The biggest design decision in most systems.
Cache (Redis, Memcached) stores frequently read data in memory for fast retrieval. Dramatically reduces database load for read-heavy systems.
Message Queue (Kafka, SQS, RabbitMQ) decouples producers from consumers. Enables async processing, absorbs traffic spikes, and buffers work so it is not lost when a downstream service is slow or temporarily down.
CDN (Content Delivery Network) serves static assets (images, video, JS) from servers geographically close to the user. Essential for media-heavy products.
Object Storage (S3, GCS) stores large blobs — images, videos, files — cheaply and durably. Not a database; not meant for transactional queries.
Search Index (Elasticsearch) enables full-text and fuzzy search. Transactional databases are generally not built for this kind of query; maintain a separate search index and sync it asynchronously.
Step 4: Dive Into Key Trade-offs
Pick 2-3 areas where the design involves a meaningful trade-off and explain your reasoning.
SQL vs. NoSQL
| | SQL (Postgres, MySQL) | NoSQL (DynamoDB, MongoDB, Cassandra) |
Rule of thumb: Start with SQL. Move to NoSQL when you have a concrete scale or schema-flexibility problem that SQL cannot handle.
Caching strategy
- Cache-aside (lazy loading): Application checks cache first; on a miss, it fetches from the DB and populates the cache. Simple and resilient, but the first request for each key is slow.
- Write-through: Every write goes to the cache and the DB together. Keeps the cache warm and in sync, but adds write latency.
- TTL: Every cached entry expires after a set time. Bounds how stale data can get, but may cause a thundering herd (many cache misses hitting the DB at once when TTLs expire together); adding random jitter to TTLs spreads the expirations out.
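The cache-aside and TTL ideas above can be sketched together in a few lines. This is an in-memory illustration under stated assumptions, not how Redis or Memcached work internally: the class name, the `load_from_db` callback, and the `jitter_fraction` parameter are all hypothetical.

```python
import random
import time


class CacheAside:
    """Cache-aside reads with a TTL per entry and optional random jitter."""

    def __init__(self, load_from_db, ttl_seconds=3600.0, jitter_fraction=0.1,
                 clock=time.monotonic):
        self._load = load_from_db        # called only on a cache miss
        self._ttl = ttl_seconds
        self._jitter = jitter_fraction   # 0.1 means TTL varies by up to +/-10%
        self._clock = clock
        self._entries = {}               # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]              # hit: the database is not touched
        value = self._load(key)          # miss or expired: go to the database
        # Jitter spreads expirations so entries written together do not all
        # expire in the same instant.
        ttl = self._ttl * (1 + random.uniform(-self._jitter, self._jitter))
        self._entries[key] = (value, now + ttl)
        return value


# Usage with a dict standing in for the database.
fake_db = {"user:1": "Ada"}
cache = CacheAside(load_from_db=fake_db.get, ttl_seconds=3600)
print(cache.get("user:1"))  # first call loads from fake_db
print(cache.get("user:1"))  # second call is served from the cache
```

In an interview, the useful detail is the pair of numbers you choose: the TTL (how stale is acceptable) and the jitter (how much you spread out expirations).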
Synchronous vs. asynchronous processing
- Sync: User waits for the full response. Simple, but the user is blocked by every downstream service call.
- Async (via message queue): User gets an immediate acknowledgment; processing happens in the background. Better for tasks that are slow, unreliable, or safe to retry, but it adds complexity and means the user-visible result is only eventually consistent.
Example: When a user posts a photo on Instagram, the upload itself is synchronous. But generating thumbnails, running content moderation, and pushing to followers' feeds all happen asynchronously via queues.
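The photo example can be sketched with Python's standard-library `queue` and `threading` modules standing in for a real message queue. The function names and task labels are illustrative assumptions; a production system would use a durable queue and separate worker processes.

```python
import queue
import threading

jobs = queue.Queue()   # stands in for Kafka / SQS / RabbitMQ
processed = []         # records what the background worker has done


def worker():
    """Background consumer: pulls jobs off the queue until it sees None."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        processed.append(job)  # real work (resize, moderate, fan out) goes here
        jobs.task_done()


def upload_photo(photo_id):
    """Synchronous path: accept the upload, enqueue follow-up work, return at once."""
    for task in ("thumbnail", "moderation", "fanout"):
        jobs.put((photo_id, task))
    return {"photo_id": photo_id, "status": "accepted"}


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

ack = upload_photo("p1")  # the user gets this immediately
jobs.join()               # demo only: wait for background work to finish
jobs.put(None)            # tell the worker to stop
consumer.join()
print(ack, processed)
```

The acknowledgment returns before any thumbnail exists, which is exactly the eventual-consistency trade-off: the product has to decide what the user sees in the meantime.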
CAP Theorem
In a distributed system, you can guarantee at most two of three properties:
- Consistency (C): Every read sees the most recent write
- Availability (A): Every request gets a response (not an error)
- Partition tolerance (P): The system continues to work when network partitions occur
Since network partitions are a physical reality, real distributed systems must tolerate them. The practical choice is what to do during a partition: CP (stay consistent but possibly reject requests) or AP (always respond but possibly return stale data).
PM translation: For most consumer products, eventual consistency is acceptable. A user seeing a slightly stale follower count is not a crisis. But for payments or inventory, strong consistency is required — showing someone a product as "in stock" when it is not has real business consequences.
Step 5: Address Reliability and Failure
A system that cannot handle failures is not production-ready. Walk through how your design handles the most important failure modes.
Redundancy: Every critical component should have a backup. No single points of failure.
Graceful degradation: When a component fails, the system should degrade gracefully rather than collapsing entirely.
Rate limiting: Protect your system from abusive clients and traffic spikes. Implement at the API gateway level.
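One common way to implement rate limiting is a token bucket. The sketch below is an assumption about how a gateway might do it, not a description of any specific product; the class name and parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)   # start with a full bucket
        self._last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Add tokens earned since the last call, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

A gateway would typically keep one bucket per client or API key, so the product decision is what the capacity and refill rate should be for each tier of user.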
Circuit breaker: If a downstream service starts failing, stop sending requests to it rather than letting failures cascade. After a timeout, try again with a small percentage of traffic.
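A minimal sketch of the circuit breaker pattern follows. All names are hypothetical, and it simplifies the recovery step: instead of routing a small percentage of traffic, it lets a single trial call through once the timeout has elapsed.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being tried."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed (healthy)

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("downstream marked unhealthy; failing fast")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

When the breaker is open, the caller still needs something to show the user, which is where graceful degradation (a cached result, a default, a reduced feature) comes in.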
Data durability: For critical data, use replication and backups. Define your Recovery Point Objective (RPO: how much data loss is acceptable) and Recovery Time Objective (RTO: how long the system can be down).
Common PM System Design Questions
"Design a URL shortener (bit.ly)" Core insight: read-heavy (redirects massively outnumber writes), needs low latency, globally distributed. Key decisions: hash function for short codes, caching hot URLs, handling custom aliases.
"Design a notification system" Core insight: async by nature, must handle multiple channels (push, email, SMS), needs deduplication, rate limiting per user, and priority queuing (transactional > marketing).
"Design a news feed (Twitter/Instagram)" Core insight: the fan-out problem — when a celebrity with 50M followers posts, do you push to all feeds immediately (fan-out on write) or compute feeds on read (fan-out on read)? Most systems use a hybrid: push for normal users, pull for celebrities.
"Design a ride-sharing system (Uber)" Core insight: real-time location matching under strict latency constraints. Key components: geospatial index for driver locations, matching algorithm, trip state machine, surge pricing engine.
Common Mistakes to Avoid
- Jumping to a solution before clarifying requirements and scale
- Designing for 1 billion users when the question is about an MVP
- Adding complexity without justifying it with a concrete requirement
- Forgetting failure modes — every good design addresses what happens when components fail
- Being too vague about trade-offs — "we'd use a cache" is weak; "we'd use a cache-aside strategy with a 1-hour TTL to handle the read-heavy access pattern" is strong
- Not knowing the difference between SQL and NoSQL well enough to defend a choice
- Ignoring the CAP theorem implications of your data store choices