1. The Problem: Naive Sharding Bad for Scaling
Using hash(key) % N distributes data evenly, but resizing the cluster is disastrous. Change the number of nodes below and watch how many keys (dots) change color (move servers).
Impact of Resizing
System stable. No nodes added or removed yet.
Cache Stampede Risk: In a real system, moving 70%+ of keys means your database gets hit by 70% of traffic instantly. This brings sites down.
2. The Solution: Consistent Hashing Scalable
Nodes and Keys are placed on a "Ring". A key belongs to the first node found clockwise. Add a node, and it only "steals" keys from its immediate neighbor.
Rebalancing Impact
vs. ~67% with Modulo
Simulation Log
Partitioning (Sharding)
Splits data into subsets. Each node owns a slice of the data.
- check Scales RAM (Total cache size = Sum of nodes)
- check Scales Writes (Parallel writing)
- close Complexity in rebalancing (solved by Consistent Hashing)
Replication
Copies the same data to multiple nodes (Master-Replica).
- check Scales Reads (Read from any replica)
- check High Availability (Failover if master dies)
- close Writes are limited to Master capacity
Virtual Nodes
A refinement of Consistent Hashing. Each physical server appears multiple times on the ring.
- check Prevents "Hot Spots" (uneven data distribution)
- check Better load balancing when nodes have different capacities
Real-World Implementation: Redis Cluster
Hybrid Approach
Redis Cluster combines Sharding and Replication. It does not use a pure Consistent Hashing ring (0-360°). Instead, it uses Hash Slots.
- • There are fixed 16,384 hash slots.
- • Key location =
CRC16(key) % 16384. - • Every node in the cluster is responsible for a subset of these slots.
- • Moving a slot is easier than recalculating a full ring, allowing for cleaner resharding.