Skip to content

SPSC Queue

A lock-free single-producer single-consumer queue.

#include <loon/spsc.hpp>

Overview

loon::SpscQueue is a lock-free queue designed for exactly one producer thread and one consumer thread. It uses atomic operations instead of mutexes, making it ideal for inter-thread communication in low-latency systems.

Usage

loon::SpscQueue<int, 1024> queue;  // fixed capacity of 1024

// Producer thread
queue.push(42);
queue.push(43);

// Consumer thread
int value;
if (queue.pop(value)) {
    std::cout << value << std::endl;  // 42
}

queue.empty();     // check if empty
queue.full();      // check if full
queue.capacity();  // 1024

API Reference

Constructors

Constructor Description
SpscQueue() Default constructor

Member Functions

Operation Return Type Description
push(value) bool Push value (returns false if full)
pop(value&) bool Pop into reference (returns false if empty)
empty() bool True if queue is empty
full() bool True if queue is full
capacity() size_t Maximum capacity (N)

Complexity

Operation Time Space
push(value) O(1) O(1)
pop(value) O(1) O(1)
empty() / full() O(1) O(1)
capacity() O(1) O(1)

Thread Safety

Single Producer, Single Consumer Only

This queue is designed for exactly one producer thread and one consumer thread. Using multiple producers or multiple consumers leads to undefined behavior.

Performance

18.7x faster than mutex-protected queues in benchmarks.

Operation Latency

Operation SpscQueue MutexQueue Speedup
Interleaved push/pop 2.40 ns 44.9 ns 18.7x
Round-trip (16B) 2.42 ns 46.0 ns 19.0x
Round-trip (64B) 4.21 ns 47.2 ns 11.2x
Round-trip (256B) 13.3 ns 55.8 ns 4.2x

Multi-threaded Producer/Consumer

Items SpscQueue MutexQueue Speedup
4,096 297M ops/s 25.8M ops/s 11.5x
65,536 367M ops/s 23.1M ops/s 15.9x

See Benchmarks for full results.

Why It's Fast

  • Lock-free: Uses atomic operations instead of mutex locks
  • No contention: Designed for exactly one producer and one consumer
  • Cache-line padding: Producer and consumer indices on separate cache lines to prevent false sharing
  • Cached index optimization: Each thread caches the other's index, avoiding cross-cache-line reads on the hot path
  • Ever-increasing indices: Full/empty checks use subtraction only — no modulo
  • Minimal synchronization: Only acquire/release memory ordering where needed

Typical Use Cases

  • Audio/video processing pipelines
  • Sensor data acquisition
  • Log message passing
  • Real-time event handling
  • High-frequency trading order routing