Beginners often think concurrency is about:
“Using multiple threads to go faster.”
In HFT, concurrency is about:
Avoiding coordination costs while staying correct.
Most latency disasters come not from computation, but from threads waiting on each other.
1. Why Concurrency Is Harder Than It Looks
Modern CPUs are fast, but:
- Memory is shared between cores
- Execution is out of order
- Writes made by one core are not instantly visible to others
When multiple threads touch the same data:
- Cache lines bounce between cores
- Pipelines stall
- Latency explodes
HFT systems treat shared state as a liability.
2. The Real Cost of Locks
Locks seem simple:
- Acquire
- Modify
- Release
But under the hood, locks:
- Serialize execution
- Trigger cache invalidations
- Cause thread blocking
Even an uncontended lock costs atomic read-modify-write operations on acquire and release.
In HFT hot paths, locks are usually forbidden.
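A minimal sketch of the difference: two ways to keep a shared counter. The class names are illustrative, not from any library.

```cpp
#include <atomic>
#include <mutex>

// Mutex version: every increment is serialized through lock/unlock,
// which may block and may involve the kernel under contention.
struct LockedCounter {
    std::mutex m;
    long value = 0;
    void add(long n) {
        std::lock_guard<std::mutex> guard(m);  // acquire; may block
        value += n;
    }
};

// Atomic version: a single hardware read-modify-write; never blocks.
struct AtomicCounter {
    std::atomic<long> value{0};
    void add(long n) {
        value.fetch_add(n, std::memory_order_relaxed);
    }
};
```

Both are correct; the point is that the lock-free version has no path on which a thread can be descheduled while holding a resource.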
3. Why Blocking Is Worse Than Spinning
When a thread blocks:
- The OS schedules something else
- Context switch occurs
- Cache state is lost
This unpredictability is fatal for HFT.
HFT prefers:
- Busy spinning
- Short critical sections
- Dedicated cores
Again, CPU waste is cheaper than jitter.
4. Single-Writer Principle
One of the most powerful HFT patterns:
Only one thread is allowed to write a piece of data.
Multiple readers are allowed.
Benefits:
- No write-write races
- Simplified memory ordering
- Fewer cache invalidations
Most market data and order books follow this model.
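A minimal sketch of the pattern, using a hypothetical `LastPrice` type: one designated thread stores, any number of threads load. Because there is never a competing writer, no compare-and-swap loop is needed.

```cpp
#include <atomic>

// Single-writer published value. publish() must only ever be called
// from one thread; read() is safe from any thread.
class LastPrice {
    std::atomic<long> price_{0};
public:
    void publish(long p) {                          // single writer only
        price_.store(p, std::memory_order_release);
    }
    long read() const {                             // any reader
        return price_.load(std::memory_order_acquire);
    }
};
```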
5. Lock-Free Does Not Mean Race-Free
Lock-free code:
- Avoids mutexes
- Uses atomic operations
But it still requires:
- Careful design
- Deep understanding of memory visibility
Incorrect lock-free code is worse than locked code.
Correctness always comes first.
6. Atomics and Memory Ordering (Beginner View)
Atomic operations guarantee:
- No torn reads/writes
But visibility depends on memory ordering.
At a high level:
- Relaxed → fastest, weakest guarantees
- Acquire/Release → common in HFT
- Sequentially consistent → safest, slowest
Most HFT systems avoid full sequential consistency.
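The acquire/release pairing can be sketched like this: the writer fills a plain payload, then publishes a flag with release; a reader that observes the flag with acquire is guaranteed to also see the payload. With relaxed ordering on the flag, that guarantee disappears.

```cpp
#include <atomic>

long payload = 0;                  // plain (non-atomic) data
std::atomic<bool> flag{false};

void producer() {
    payload = 42;                                  // 1. write the data
    flag.store(true, std::memory_order_release);   // 2. publish it
}

bool consumer(long& out) {
    if (flag.load(std::memory_order_acquire)) {    // 3. see the publish
        out = payload;                             // 4. data is visible
        return true;
    }
    return false;
}
```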
7. False Sharing: The Silent Killer
Two threads update different variables that happen to share a cache line.
Result:
- Cache line ping-pongs between cores
- Massive latency increase
HFT engineers:
- Pad structures
- Align data
- Separate hot fields
Memory layout is part of concurrency design.
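A minimal padding sketch, assuming a 64-byte cache line (typical on x86): `alignas` forces each hot counter onto its own line, so two threads incrementing them no longer invalidate each other's cache.

```cpp
#include <atomic>

// Without the alignas, hits and misses could share one 64-byte line,
// and writes from two threads would bounce that line between cores.
struct PaddedCounters {
    alignas(64) std::atomic<long> hits{0};    // its own cache line
    alignas(64) std::atomic<long> misses{0};  // its own cache line
};

static_assert(sizeof(PaddedCounters) >= 128,
              "each counter should occupy its own cache line");
```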
8. Ring Buffers: The HFT Workhorse
Many HFT systems use:
- Single-producer, single-consumer ring buffers
Why?
- No locks
- Predictable memory access
- Constant-time operations
This pattern appears everywhere:
- Market data pipelines
- Order routing
- Logging
Simple structures often win.
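A minimal SPSC ring buffer sketch (the class name is illustrative): one thread calls `push()`, one thread calls `pop()`. Each index is written by exactly one side, which is the single-writer principle again, so release/acquire atomics are enough and no locks are needed.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Single-producer single-consumer ring. N must be a power of two so
// the index wrap is a cheap bitmask.
template <typename T, std::size_t N>
class SpscRing {
    std::array<T, N> buf_{};
    alignas(64) std::atomic<std::size_t> head_{0};  // written by producer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by consumer
public:
    bool push(const T& v) {                         // producer thread only
        std::size_t h = head_.load(std::memory_order_relaxed);
        std::size_t t = tail_.load(std::memory_order_acquire);
        if (h - t == N) return false;               // full
        buf_[h & (N - 1)] = v;
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
    std::optional<T> pop() {                        // consumer thread only
        std::size_t t = tail_.load(std::memory_order_relaxed);
        std::size_t h = head_.load(std::memory_order_acquire);
        if (t == h) return std::nullopt;            // empty
        T v = buf_[t & (N - 1)];
        tail_.store(t + 1, std::memory_order_release);
        return v;
    }
};
```

Every operation is constant time, touches memory sequentially, and never blocks, which is why this shape shows up in all three places listed above.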
9. Message Passing Over Shared State
Instead of sharing objects, HFT systems:
- Pass messages
- Use queues
- Isolate responsibilities
This:
- Reduces contention
- Improves reasoning
- Matches hardware reality
Concurrency becomes data flow, not shared mutation.
10. Latency vs Throughput Tradeoffs
High-throughput systems batch work.
HFT systems often:
- Avoid batching
- Process one message at a time
This sacrifices throughput for:
- Lower latency
- Faster reaction
Design always depends on strategy requirements.
11. Beginner Mental Model
Think of concurrency as:
Many cooks sharing a tiny kitchen
The fewer shared tools, the smoother the workflow.
Isolation beats coordination.
12. What Comes Next?
Now that we can process data safely and fast, we must understand what the data represents.
- Order books
- Matching engines
- Why microseconds matter economically
➡ Article 6: Market Microstructure & Trading Mechanics
