Race Conditions
A race condition occurs when a program's behavior depends on the relative timing or interleaving of events it does not control, such as the order in which threads are scheduled.
Imagine two people using the same bank account at the exact same moment, one depositing and one withdrawing. If the system doesn't "lock" the account during each transaction, both might read the old balance, compute a new one from it, and write it back, so the later write silently overwrites the earlier one. This is the "Lost Update" problem.
How it works conceptually:
- Thread A reads Balance ($100)
- Thread B reads Balance ($100) // B doesn't know A is busy!
- Thread A calculates $100 + $10 = $110
- Thread B calculates $100 - $10 = $90
- Thread A writes $110
- Thread B writes $90 // Overwrites A's $110: the deposit is lost!
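The interleaving above can be reproduced in a small sketch. A real race shows up only occasionally, so this example uses a `threading.Barrier` to force both threads to read before either one writes; the barrier and the function name `unsafe_update` are assumptions made for illustration.

```python
import threading

balance = 100
# Illustration only: the barrier forces both reads to happen before either write.
both_have_read = threading.Barrier(2)

def unsafe_update(delta):
    global balance
    current = balance            # read the balance ($100 in both threads)
    both_have_read.wait()        # neither thread knows the other is busy
    balance = current + delta    # write back, based on the stale read

a = threading.Thread(target=unsafe_update, args=(10,))   # Thread A: +$10
b = threading.Thread(target=unsafe_update, args=(-10,))  # Thread B: -$10
a.start()
b.start()
a.join()
b.join()

# The correct result would be 100, but one update is always lost here:
print(balance)  # 110 or 90, depending on which thread writes last
```

Whichever thread writes last wins, and the other update disappears without any error being raised.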
Deadlocks
A deadlock occurs when two or more threads are blocked forever, each waiting for a resource that another one of them holds.
It is also known as the "Deadly Embrace".
The Coffman Conditions:
A deadlock can only happen if all four hold at the same time:
- Mutual Exclusion: Resources can't be shared (e.g., a printer).
- Hold and Wait: A thread holds a resource while waiting for another.
- No Preemption: Resources can't be forcibly taken away.
- Circular Wait: The waiting threads form a cycle (e.g., A waits for B, B waits for A).
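As a rough illustration of the four conditions acting together, the sketch below has two threads take a printer lock and a disk lock in opposite orders. The barriers and the 0.2 second timeout are assumptions added so the demo terminates; in a real deadlock the second `acquire()` would simply block forever.

```python
import threading

printer = threading.Lock()   # mutual exclusion: one holder at a time
disk = threading.Lock()
both_holding = threading.Barrier(2)  # illustration only: line the threads up
both_tried = threading.Barrier(2)
got_second = {}

def worker(name, first, second):
    with first:                          # hold one resource...
        both_holding.wait()              # ...until the other thread holds its own
        # Hold and wait: ask for the second resource while keeping the first.
        # The timeout stands in for "blocked forever" so the demo can finish.
        acquired = second.acquire(timeout=0.2)
        got_second[name] = acquired
        if acquired:
            second.release()
        both_tried.wait()                # keep holding `first` until both attempts end

# Circular wait: A holds printer and wants disk, B holds disk and wants printer.
a = threading.Thread(target=worker, args=("A", printer, disk))
b = threading.Thread(target=worker, args=("B", disk, printer))
a.start()
b.start()
a.join()
b.join()

print(got_second)  # neither thread obtained its second lock
```

Neither lock can be taken away from its holder (no preemption), so without the timeout both threads would wait indefinitely.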
How to fix?
Establish a strict Lock Ordering. If everyone must acquire the Printer before the Disk, a circular wait becomes impossible.
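One possible way to enforce such an ordering is to route every acquisition through a helper that sorts the locks by a fixed rank. The names `LOCK_RANK`, `acquire_in_order`, and `release_all` are hypothetical helpers for this sketch, not part of the `threading` module.

```python
import threading

printer = threading.Lock()
disk = threading.Lock()

# Hypothetical global ranking: a lower rank must always be acquired first.
LOCK_RANK = {id(printer): 0, id(disk): 1}

def acquire_in_order(*locks):
    """Acquire the given locks in rank order, whatever order the caller listed."""
    ordered = sorted(locks, key=lambda lk: LOCK_RANK[id(lk)])
    for lk in ordered:
        lk.acquire()
    return ordered

def release_all(ordered):
    """Release in reverse order of acquisition."""
    for lk in reversed(ordered):
        lk.release()

completed = []

def job(name, first, second):
    held = acquire_in_order(first, second)   # Printer before Disk, always
    try:
        completed.append(name)               # stand-in for the real work
    finally:
        release_all(held)

# The two jobs ask for the locks in opposite orders, yet cannot deadlock.
t1 = threading.Thread(target=job, args=("print-then-save", printer, disk))
t2 = threading.Thread(target=job, args=("save-then-print", disk, printer))
t1.start()
t2.start()
t1.join(timeout=5)
t2.join(timeout=5)

print(sorted(completed))
```

Because both threads always take the Printer first, one of them waits at the first lock instead of holding the Disk, which removes the circular wait.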
Real-World Disasters
Therac-25 (1985)
A radiation therapy machine killed patients due to a race condition. If an operator typed commands too quickly, the software could set the beam to high power while failing to move the protective hardware into place, delivering a massive overdose.
Mars Pathfinder (1997)
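The lander kept rebooting on Mars because of priority inversion. A low-priority meteorological thread held a mutex that a high-priority bus thread needed, but a medium-priority thread kept preempting the low-priority one, so the mutex was not released and the high-priority thread stalled until a watchdog timer reset the system.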
Northeast Blackout (2003)
A race condition in the alarm system software caused the primary server to freeze. The backup system failed to sync, leaving operators blind to the cascading power grid failures.
The Mitigation Toolkit
Mutex (Mutual Exclusion)
The most common tool. It acts like a key to a bathroom. Only one thread can hold the key at a time. If others want in, they must wait in line.
Semaphores
Like a nightclub bouncer with a clicker. It allows a specific number of threads (N) to access a resource simultaneously. Good for connection pools.
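A minimal sketch of the bouncer idea, assuming a pool limited to three simultaneous connections. `MAX_CONNECTIONS`, `use_connection`, and the sleep that stands in for real work are all hypothetical.

```python
import threading
import time

MAX_CONNECTIONS = 3                        # hypothetical pool size (N)
pool_slots = threading.Semaphore(MAX_CONNECTIONS)

stats_lock = threading.Lock()              # protects the two counters below
in_use = 0
peak_in_use = 0

def use_connection():
    global in_use, peak_in_use
    with pool_slots:                       # blocks once N threads are already inside
        with stats_lock:
            in_use += 1
            peak_in_use = max(peak_in_use, in_use)
        time.sleep(0.01)                   # pretend to talk to the database
        with stats_lock:
            in_use -= 1

threads = [threading.Thread(target=use_connection) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak_in_use)  # never more than MAX_CONNECTIONS
```

Ten threads compete, but the semaphore admits at most three at a time; the rest wait until a slot is released.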
Atomic Operations
Hardware-supported operations that complete as a single indivisible step. They cannot be interrupted, so other threads never observe a half-finished result. Great for simple counters, but hard to use for complex logic.
import threading

lock = threading.Lock()
balance = 0

def safe_deposit(amount):
    # Without this declaration, Python treats balance as a local variable
    global balance
    # 1. Acquire the lock (wait if another thread holds it)
    lock.acquire()
    try:
        # Critical Section: Only one thread here!
        current = balance
        balance = current + amount
    finally:
        # 2. ALWAYS release the lock, even if an exception was raised
        lock.release()
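The same critical section can also be written with a `with` statement, which acquires the lock on entry and releases it on exit, including when an exception is raised. The sketch below is a self-contained variant with a small usage example; the helper `depositor` and the thread and iteration counts are arbitrary choices for illustration.

```python
import threading

lock = threading.Lock()
balance = 0

def safe_deposit(amount):
    global balance
    with lock:                     # acquired here, released when the block exits
        balance = balance + amount

def depositor():
    # Each worker makes 1000 one-dollar deposits.
    for _ in range(1000):
        safe_deposit(1)

threads = [threading.Thread(target=depositor) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # 8000: no deposit is lost
```

Every read-modify-write of `balance` happens while the lock is held, so all 8 x 1000 deposits are counted.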