Faster WebSocket Compression

WebSocket compression is essential for reducing bandwidth and improving responsiveness, especially when transmitting repetitive data like JSON payloads. The permessage-deflate extension compresses WebSocket message payloads on the fly — but the speed of that compression directly impacts your application's throughput.

Starting with sgcWebSockets 2026.4.0, the permessage-deflate implementation has been completely rewritten for significantly better performance. In our benchmarks, small messages compress and decompress up to 15x faster, with measurable gains across all payload sizes.

What Changed?

The previous implementation initialized and destroyed the compression engine on every single WebSocket frame. This meant that even a tiny 1 KB message paid the full cost of setting up the compressor, compressing the data, and then tearing everything down — only to repeat the entire process for the next message.

The new implementation keeps the compression engine alive across frames. It is initialized once when the first frame arrives and reused for the lifetime of the connection. This eliminates the per-frame setup overhead and also allows the engine to learn from previous messages, resulting in faster compression of repetitive data patterns.
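The idea of a per-connection context can be sketched with Python's zlib (illustrative only — sgcWebSockets is a Delphi library, and names like `send_frame` are invented for this example). permessage-deflate uses raw DEFLATE, hence `wbits=-15`:

```python
import zlib

RAW_DEFLATE = -15  # raw DEFLATE, no zlib header, as permessage-deflate requires

# One persistent compressor/decompressor pair per connection,
# created once and reused for every frame.
comp = zlib.compressobj(wbits=RAW_DEFLATE)
decomp = zlib.decompressobj(wbits=RAW_DEFLATE)

def send_frame(payload: bytes) -> bytes:
    # Z_SYNC_FLUSH emits a complete, decodable block while keeping the
    # sliding-window history alive for the next message.
    return comp.compress(payload) + comp.flush(zlib.Z_SYNC_FLUSH)

msg = b'{"ticker":"ACME","price":123.45}'
first = send_frame(msg)
second = send_frame(msg)  # history from `first` makes the repeat tiny
out1 = decomp.decompress(first)
out2 = decomp.decompress(second)
```

Because the window persists, the second identical message compresses to little more than a back-reference into the history, whereas a setup/teardown-per-frame design would compress every message from scratch.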

In addition to the persistent compression context, the new implementation includes several other optimizations:

  • Pre-allocated memory buffers — Buffers are allocated once and reused, avoiding repeated memory allocation on every frame.
  • Direct memory access — When the input is already in memory, the engine reads it directly without copying it into intermediate buffers first.
  • Reused temporary streams — Internal working streams are created once in the constructor instead of being created and destroyed on every compress/decompress call.
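In Python terms, the buffer-reuse and zero-copy ideas look roughly like this (a hedged sketch; the 64 KB size and the `compress_frame` helper are invented for illustration):

```python
import zlib

comp = zlib.compressobj(wbits=-15)   # raw DEFLATE, as permessage-deflate uses
buf = bytearray(64 * 1024)           # pre-allocated once, reused for every frame
view = memoryview(buf)               # zero-copy window into the buffer

def compress_frame(length: int) -> bytes:
    # The compressor reads the payload directly from the shared buffer
    # via the buffer protocol -- no intermediate copy is made.
    return comp.compress(view[:length]) + comp.flush(zlib.Z_SYNC_FLUSH)

payload = b'{"k":1}' * 100
buf[:len(payload)] = payload          # payload already sits in memory
frame = compress_frame(len(payload))
restored = zlib.decompressobj(wbits=-15).decompress(frame)
```

The same frame-sized buffer serves every call, so steady-state operation performs no per-frame allocations at all.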

Benchmark Results

We ran 10,000 compress + decompress round-trips for each message size. Every round-trip compresses a JSON payload and then decompresses it back, verifying the output matches the original. The test was performed on a Windows 64-bit machine compiled with Delphi 12 Athens.
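A comparable harness in Python conveys the methodology (this is a sketch of the measurement loop, not the actual Delphi benchmark code):

```python
import time
import zlib

def roundtrip_ms(payload: bytes, iterations: int = 10_000) -> float:
    """Run `iterations` compress+decompress round-trips with a persistent
    raw-deflate context, verifying each result, and return elapsed ms."""
    comp = zlib.compressobj(wbits=-15)
    decomp = zlib.decompressobj(wbits=-15)
    start = time.perf_counter()
    for _ in range(iterations):
        frame = comp.compress(payload) + comp.flush(zlib.Z_SYNC_FLUSH)
        if decomp.decompress(frame) != payload:  # verify the round-trip
            raise ValueError("round-trip mismatch")
    return (time.perf_counter() - start) * 1000.0

# roughly a 1 KB JSON-like payload
elapsed = roundtrip_ms(b'{"sensor":42,"value":3.14}' * 40)
```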

Default Configuration (persistent context)

This is the default mode where the compression context is maintained across frames — the most common real-world scenario:

Message Size   Previous (ms)   New (ms)   Speedup
1 KB           437             28         15.6x faster
4 KB           480             88         5.5x faster
16 KB          546             431        1.3x faster
64 KB          1,994           1,725      1.2x faster

With NoContextTakeOver (independent frames)

When NoContextTakeOver is enabled, each frame is compressed independently. Even in this mode, the buffer reuse and direct memory access optimizations provide a solid improvement:

Message Size   Previous (ms)   New (ms)   Speedup
1 KB           149             75         2.0x faster
4 KB           173             100        1.7x faster
16 KB          302             228        1.3x faster
64 KB          1,216           1,094      1.1x faster
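In zlib terms, NoContextTakeOver amounts to resetting (or recreating) the compression context for every message, so the sliding window never carries over between frames. A Python sketch under that assumption (`compress_independent` is an invented name):

```python
import zlib

def compress_independent(payload: bytes) -> bytes:
    # no_context_takeover: a fresh raw-deflate context per message,
    # so no compression history is shared between frames.
    c = zlib.compressobj(wbits=-15)
    return c.compress(payload) + c.flush(zlib.Z_SYNC_FLUSH)

msg = b'{"id":1,"status":"ok"}'
a = compress_independent(msg)
b = compress_independent(msg)  # identical output: no history to exploit
restored = zlib.decompressobj(wbits=-15).decompress(a)
```

Every identical message now produces byte-identical output of the same size, which is why this mode gains less from the rewrite: the buffer and copy overhead is saved, but the per-message context work remains by design.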

Who Benefits Most?

The improvement is most dramatic for applications that exchange many small messages — which is exactly the typical WebSocket use case:

Chat & Messaging
Short text messages (typically under 4 KB) see the biggest gains: 5–15x faster compression.
Real-time Data Feeds
JSON updates for dashboards, stock tickers, and IoT sensors benefit from both speed and the persistent context learning repetitive patterns.
Gaming & Multiplayer
Frequent small state updates benefit from the low per-frame overhead.
High-Concurrency Servers
Less CPU time per frame means the server can handle more simultaneous connections.

Fully Compatible

The optimization is fully transparent — no code changes are needed in your application. The compressed data on the wire is identical to the previous version, so upgraded servers work seamlessly with existing clients and vice versa.

The new implementation supports all platforms and compilers:

  • Delphi 7 through Delphi 13 (including C++Builder)
  • Windows, macOS, Linux, Android, iOS
  • 32-bit and 64-bit targets

Upgrade to 2026.4.0

The permessage-deflate optimization is available in sgcWebSockets 2026.4.0. Simply update to the latest version and your WebSocket connections will automatically benefit from faster compression. Download at esegece.com.

Special thanks to Michael for contributing the initial optimized implementation that inspired this work. His research into persistent zlib contexts and direct memory access laid the foundation for these performance improvements.
