Legion: An Ultra-Fast, Multi-Threaded Logging Library

By Siddharth Sabron, June 14, 2025

This article presents a comprehensive design document for a high-performance Java logging library, based on deep research and proven architectural patterns.

The development of an ultra-fast, high-throughput, low-dependency logging library for Java applications, particularly for Spring environments, necessitates a rigorous architectural approach. The core value proposition of such a library lies in its ability to minimize performance overhead on application threads while ensuring comprehensive log capture, request tracing, and visualization-friendly JSON output. This objective is achieved through fundamental principles such as asynchronous processing, meticulous zero-allocation strategies, and intelligent concurrency management. The design emphasizes leveraging core Java constructs where possible, while pragmatically acknowledging the irreplaceable role of battle-tested components like the LMAX Disruptor for achieving truly “ultra-fast” performance characteristics. The success of this endeavor hinges on a deep understanding of JVM optimization, concurrent programming paradigms, and rigorous benchmarking.

1. Core Design & Foundation: High-Level Architecture

The pursuit of an “ultra-fast” logging library begins with foundational architectural principles that dictate how log events are handled from their inception to their persistence. These principles are designed to ensure minimal impact on the application’s critical path and maximize logging throughput.

Asynchronous Processing: Decoupling for Minimal Caller Impact

A cornerstone for achieving “ultra-fast” performance is the immediate decoupling of logging events from the caller thread. This means that the application thread, upon initiating a log request, should relinquish control as quickly as possible, allowing the core business logic to proceed without delay. This asynchronous approach is widely recognized as a technique to significantly improve application logging performance by offloading all I/O operations to separate threads. For instance, Log4j 2’s asynchronous loggers are specifically designed to return control to the application with minimal delay.

The underlying reason for this design choice is rooted in the nature of I/O operations. Synchronous I/O, whether to disk or network, inherently blocks the executing thread until the operation completes. If logging were synchronous, the application’s responsiveness and overall throughput would be directly constrained by the speed of log persistence. By immediately handing off the log event to a dedicated queue and returning control, the application thread is shielded from these I/O latencies, thereby maintaining high responsiveness and throughput for its primary functions. This fundamental decoupling is a direct driver for achieving minimal overhead on the main application threads.
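
As a concrete illustration, the entire caller-side cost can be reduced to a single non-blocking enqueue. This is a minimal sketch (class and field names are illustrative; the real LogBuffer is detailed in section 2.3):

public final class AsyncHandoff {
    // Bounded queue standing in for the LogBuffer of section 2.3.
    private final java.util.concurrent.ArrayBlockingQueue<String> buffer =
            new java.util.concurrent.ArrayBlockingQueue<>(65_536);

    // The caller's only work: one non-blocking offer, then immediately return.
    // Returns false instead of blocking when the buffer is full (see 2.3).
    public boolean log(String message) {
        return buffer.offer(message);
    }
}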

Batching: Amortizing I/O Costs for High Throughput

To further optimize I/O operations, log events should be processed and written in batches. The cost associated with initiating an I/O operation (e.g., a system call for a file write or a network packet transmission) often has a fixed overhead, regardless of the amount of data being transferred (up to a certain point). By accumulating multiple log events into a single, larger batch before writing, this fixed overhead is amortized over numerous log entries. This significantly reduces the number of I/O calls, leading to a substantial increase in overall logging throughput.

For example, asynchronous appenders in Log4j2 are designed to flush to disk at the end of a batch, which is more efficient than immediate flushing for each log event. While batching enhances throughput by reducing the frequency of costly I/O operations, it introduces a slight increase in latency for individual log events. Each event must wait in the buffer until a batch is full or a timeout occurs. This represents a classic trade-off between maximizing throughput and minimizing individual log event latency, a balance that requires careful tuning.
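
The effect of batching is easiest to see in code. A minimal sketch, assuming a BlockingQueue of pre-rendered lines and the LogSink interface defined later in section 2.6 (names are illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

final class BatchDrainer {
    private final List<String> batch = new ArrayList<>(512); // reused across iterations

    void drainAndWrite(BlockingQueue<String> queue, LogSink sink, int maxBatch) {
        batch.clear();
        queue.drainTo(batch, maxBatch); // one lock acquisition moves many events
        if (!batch.isEmpty()) {
            sink.write(batch);          // one I/O call instead of batch.size() calls
            sink.flush();
        }
    }
}

A single flush per batch is precisely the trade described above: throughput rises because the fixed I/O cost is shared, while an individual event may wait slightly longer before reaching disk.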

Zero-Allocation (Garbage-Free where possible): Minimizing GC Pressure in the Critical Path

A critical aspect of “ultra-fast” performance and consistent low latency is the minimization of new object allocations during the logging critical path. Excessive object creation leads to increased pressure on the Java Garbage Collector (GC), resulting in more frequent and potentially longer GC pauses, which can manifest as latency spikes. Libraries like ZeroLog, though a .NET library, exemplify this principle by aiming for a “complete zero-allocation manner” after initialization to prevent GC triggers. Log4j2 also implements a “garbage-free mode” by reusing objects and buffers to reduce GC pressure.

It is important to understand the practical implications of “zero-allocation.” While the aspiration is to eliminate all new object allocations after initialization, achieving this absolutely in a feature-rich logging library can be exceptionally challenging. Log4j2’s “garbage-free mode” demonstrates this nuance, noting that certain scenarios, such as logging messages exceeding a specific character limit, logging numerous parameters, or using certain lambda expressions, may still result in temporary object allocations. This suggests that “garbage-free” in a real-world context often translates to “minimal garbage” or “garbage-free under ideal, frequently occurring conditions.” The design should therefore prioritize object reuse in the most performance-critical paths, such as LogEvent creation and population, while acknowledging and documenting scenarios where some allocation might be unavoidable, or where pooling mechanisms can mitigate their impact.

Thread Isolation: Dedicated Logging Workers for Application Stability

To prevent logging operations from adversely affecting the main application’s responsiveness or stability, the threads responsible for processing and writing logs must be isolated from the application’s core threads. This separation ensures that even if logging operations encounter bottlenecks (e.g., slow disk writes, network congestion), the main application threads remain unblocked and continue executing business logic without interruption.

Asynchronous logging inherently promotes this thread isolation. The ExecutorService is a fundamental Java utility for managing thread execution and task scheduling, facilitating the creation of dedicated thread pools for logging activities. This approach to thread pool isolation prevents resource exhaustion within the logging subsystem from cascading and affecting other critical services within the application. Similarly, ThreadLocal variables contribute to thread isolation by providing each thread with its own independent storage, preventing data interference between concurrent operations. By containing logging-related performance fluctuations or failures within their own isolated thread pool, the overall system gains improved stability and resilience.
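
A minimal sketch of such isolation, assuming a single dedicated writer thread (the thread name and priority tweak are illustrative choices, not requirements):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

final class LoggingThreads {
    static ExecutorService newLoggingExecutor() {
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, "legion-log-writer"); // identifiable in thread dumps
            t.setDaemon(true);                        // never prevents JVM shutdown
            t.setPriority(Thread.NORM_PRIORITY - 1);  // yield CPU to business threads
            return t;
        };
        return Executors.newSingleThreadExecutor(factory);
    }
}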

1.2. Core Components Overview

The architecture of this ultra-fast logging library is composed of several interconnected components, each with a specific responsibility: the reusable LogEvent data carrier (section 2.1), the ThreadLocal-backed MyMDC context (2.2), the central LogBuffer queue (2.3), the LogProcessor worker threads (2.4), the JSON-producing LogFormatter (2.5), the pluggable LogSink outputs (2.6), and the lightweight MyLogger facade (2.7). Each is examined in detail in section 2.

1.3. Dependency Strategy: The “Less Libraries” Imperative

A core tenet of this project is to minimize external dependencies, thereby reducing the library’s footprint, potential for dependency conflicts, and overall complexity.

2. Low-Level Design & Implementation Details: Engineering for Speed

2.1. LogEvent Class: Object Pooling and reset() for Garbage-Free Operation

The LogEvent class serves as the data carrier for each log entry. Its design is critical for achieving garbage-free logging.
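
A minimal sketch of such a poolable event (field names are illustrative; the real class would mirror the JSON schema of section 2.5):

final class LogEvent {
    long timestampMillis;
    String level;
    String loggerName;
    String threadName;
    final StringBuilder message = new StringBuilder(256); // reused, grows once, never replaced
    Throwable throwable;

    // Clears all state so the same instance can go back to the pool;
    // reuse replaces one allocation (plus eventual GC) per log call.
    void reset() {
        timestampMillis = 0L;
        level = null;
        loggerName = null;
        threadName = null;
        message.setLength(0); // keeps the backing char[]
        throwable = null;
    }
}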

2.2. Custom MDC Implementation (MyMDC): Efficient ThreadLocal Management and Map Reuse

The Mapped Diagnostic Context (MDC) provides a mechanism to inject contextual information into log entries, such as a requestId for tracing. The custom MyMDC implementation will leverage ThreadLocal for this purpose.
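
A minimal sketch, assuming String keys and values (the map copy in snapshot() is one deliberate small allocation so the logging thread never reads a map the request thread is still mutating):

import java.util.HashMap;
import java.util.Map;

public final class MyMDC {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(() -> new HashMap<>(8)); // created once per thread

    public static void put(String key, String value) { CONTEXT.get().put(key, value); }

    public static String get(String key) { return CONTEXT.get().get(key); }

    // Copied for safe hand-off into a LogEvent consumed on another thread.
    public static Map<String, String> snapshot() { return new HashMap<>(CONTEXT.get()); }

    // Must run in a finally block at request end: on pooled server threads a
    // stale map would leak one request's context into the next.
    public static void clear() { CONTEXT.get().clear(); }
}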

2.3. LogBuffer Deep Dive: Achieving Extreme Throughput

The LogBuffer is the central component for inter-thread communication, directly influencing the “ultra-fast” performance claim.

Option A (Extreme - LMAX Disruptor)

For achieving the highest possible throughput and lowest latency, especially in high-volume logging scenarios, integrating the LMAX Disruptor library is often considered a necessity. The Disruptor is not merely a queue; it is a concurrency pattern designed for “mechanical sympathy” with modern hardware, minimizing cache contention and false sharing.

Its core components include the RingBuffer (a pre-allocated circular array of event slots), Sequences (cache-line-padded counters that track producer and consumer progress), the SequenceBarrier (which coordinates consumers with producers), WaitStrategies (pluggable policies such as blocking, yielding, or busy-spinning for consumers awaiting new events), and EventProcessors with their EventHandlers (the consumer loop and the user callback, respectively).

To integrate, a LogEvent factory would be defined for the Disruptor to pre-allocate LogEvent objects. EventTranslator instances would be used by producer threads to publish LogEvent data to the Ring Buffer, and EventHandler implementations would be used by LogProcessor threads to consume and process these events. Log4j2’s asynchronous loggers leverage the Disruptor, demonstrating its capability to achieve significantly higher throughput and lower latency compared to ArrayBlockingQueue-based solutions.
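
A minimal wiring sketch of that flow, assuming the com.lmax:disruptor dependency and the LogEvent sketch from section 2.1 (the buffer size and class name are illustrative; the ring size must be a power of two):

import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.EventTranslatorOneArg;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

final class DisruptorBufferSketch {
    public static void main(String[] args) {
        Disruptor<LogEvent> disruptor = new Disruptor<>(
                LogEvent::new,               // EventFactory: every slot pre-allocated once
                65536,                       // power-of-two ring size
                DaemonThreadFactory.INSTANCE);

        // Consumer side (LogProcessor): endOfBatch gives natural flush points.
        disruptor.handleEventsWith((EventHandler<LogEvent>) (event, sequence, endOfBatch) -> {
            // format and write the event; flush the sink when endOfBatch is true
        });
        disruptor.start();

        // Producer side: the translator copies data into the pre-allocated slot,
        // so no new object crosses the thread boundary.
        EventTranslatorOneArg<LogEvent, String> translator = (event, sequence, msg) -> {
            event.reset();
            event.timestampMillis = System.currentTimeMillis();
            event.message.append(msg);
        };
        disruptor.getRingBuffer().publishEvent(translator, "hello");
    }
}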

However, the Disruptor’s performance is highly context-dependent. While it excels in single-producer, single-consumer scenarios, its advantage in multi-producer environments, especially without explicit bulk operations, can be less pronounced, and throughput can even fall below that of a well-optimized ArrayBlockingQueue. The Disruptor also requires a substantial upfront memory allocation for its ring buffer (e.g., 80-140MB for Log4j2), a trade-off for its garbage-free design. This memory footprint must be considered for microservices or memory-constrained environments.

Option B (High Performance - Custom ArrayBlockingQueue)

As an alternative to the Disruptor, a java.util.concurrent.ArrayBlockingQueue<LogEvent> can be used. This is a fixed-capacity, array-backed bounded blocking queue guarded by a single ReentrantLock, which makes it simple and predictable, though that single lock becomes a contention point when many producer threads log concurrently.

While ArrayBlockingQueue is simpler to implement and avoids the external Disruptor dependency, it typically does not match the peak performance of a finely tuned Disruptor for extreme throughput scenarios. Log4j2’s asynchronous appenders, which use ArrayBlockingQueue, are demonstrably slower than its Disruptor-based asynchronous loggers.

Capacity & Full Policies

Determining the optimal buffer size is crucial. The Disruptor requires a power-of-two capacity, which lets it replace modulo arithmetic with cheap bit-masking when wrapping ring indices. For ArrayBlockingQueue, a reasonable fixed size must be chosen: too small and producers stall or drop events under bursts; too large and memory is wasted while drain latency grows.

When the LogBuffer is full, the library’s behavior must be explicitly defined. The main options are: block the producer (full backpressure, preserving every event at the cost of caller latency); discard the event, typically below a configurable severity and with a counter of dropped events, as Log4j2’s DiscardingAsyncQueueFullPolicy does; or attempt a timed offer, waiting a bounded interval before falling back to discarding.

A hybrid approach might be implemented, where low-priority logs are discarded when the buffer is full, while high-priority logs might block or be buffered to a secondary, smaller critical buffer.
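
A minimal sketch of such a hybrid policy (the priority split and the 10 ms wait are illustrative tuning knobs):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class FullPolicy {
    private final LongAdder dropped = new LongAdder(); // observable drop counter

    boolean publish(BlockingQueue<LogEvent> buffer, LogEvent event, boolean highPriority)
            throws InterruptedException {
        if (buffer.offer(event)) return true;   // fast path: space available
        if (!highPriority) {
            dropped.increment();                // count, never block, for low-priority logs
            return false;
        }
        // Important events get a short bounded wait, never an unbounded block.
        return buffer.offer(event, 10, TimeUnit.MILLISECONDS);
    }
}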

The choice between the Disruptor and ArrayBlockingQueue is not a simple matter of “faster is better.” While the Disruptor offers exceptional low-latency and high-throughput characteristics in specific configurations, its complexity and memory footprint are significant, and for multi-producer workloads a well-tuned ArrayBlockingQueue with bulk operations can sometimes rival or even surpass it. “Ultra-fast,” in other words, depends on the specific workload pattern and on careful tuning of the chosen concurrency mechanism.

2.4. LogProcessor (Worker Threads): Thread Management, Batching Logic, and Graceful Shutdown

The LogProcessor is responsible for consuming LogEvents from the LogBuffer and writing them to the configured sinks.
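
A simplified sketch of such a worker, assuming the LogEvent and LogSink sketches from sections 2.1 and 2.6, with a formatter function standing in for the LogFormatter of section 2.5:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class LogWorker implements Runnable {
    private final BlockingQueue<LogEvent> buffer;
    private final Function<LogEvent, String> formatter; // e.g., the JSON formatter of 2.5
    private final LogSink sink;
    private final List<LogEvent> events = new ArrayList<>(512); // reused batch holders
    private final List<String> lines = new ArrayList<>(512);
    private volatile boolean running = true;

    LogWorker(BlockingQueue<LogEvent> buffer, Function<LogEvent, String> formatter, LogSink sink) {
        this.buffer = buffer;
        this.formatter = formatter;
        this.sink = sink;
    }

    public void run() {
        try {
            while (running || !buffer.isEmpty()) { // keep draining after shutdown is requested
                events.clear();
                lines.clear();
                LogEvent first = buffer.poll(100, TimeUnit.MILLISECONDS); // bounded wait
                if (first == null) continue;
                events.add(first);
                buffer.drainTo(events, 511);       // opportunistic batching
                for (LogEvent e : events) lines.add(formatter.apply(e));
                sink.write(lines);
                sink.flush();                       // one flush per batch, not per event
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();     // restore the flag, fall through to close
        } finally {
            sink.close();                           // final flush on shutdown
        }
    }

    void shutdown() { running = false; }            // graceful: remaining events still drain
}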

2.5. LogFormatter: Crafting Visualization-Friendly JSON

The LogFormatter transforms a LogEvent into a structured JSON string.

Schema Definition

The JSON schema must be rich enough for dashboards and correlation queries, yet stable and consistent so that external aggregation and visualization tools can index every field reliably:

{
  "timestamp": "ISO_DATE_TIME",
  "level": "INFO|WARN|ERROR|DEBUG|TRACE",
  "serviceName": "String",
  "hostName": "String",
  "requestId": "UUID_STRING",
  "threadName": "String",
  "loggerName": "String",
  "message": "String",
  "context": {
    "userId": "123",
    "sessionId": "abc"
  },
  "data": {
    "productId": "PROD-001",
    "operationTimeMs": 50,
    "eventIdentifier": "PRODUCT_RETRIEVAL_SUCCESS"
  },
  "exception": {
    "type": "String",
    "message": "String",
    "stackTrace": "String"
  }
}

Optimization: StringBuilder and ByteArrayOutputStream Reuse

To minimize object allocations, reusing mutable buffers is crucial.
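
A minimal sketch of both reuses named in the heading (buffer sizes are illustrative; with a single consumer thread, plain fields would work as well as ThreadLocals):

import java.io.ByteArrayOutputStream;

final class ReusableBuffers {
    private static final ThreadLocal<StringBuilder> LINE =
            ThreadLocal.withInitial(() -> new StringBuilder(1024));
    private static final ThreadLocal<ByteArrayOutputStream> BYTES =
            ThreadLocal.withInitial(() -> new ByteArrayOutputStream(4096));

    static StringBuilder line() {
        StringBuilder sb = LINE.get();
        sb.setLength(0);  // reuses the existing backing char[]
        return sb;
    }

    static ByteArrayOutputStream bytes() {
        ByteArrayOutputStream out = BYTES.get();
        out.reset();      // keeps the grown byte[] for the next batch
        return out;
    }
}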

Jackson vs. Gson

Given the “ultra-fast” requirement, Jackson is the preferred choice over Gson: its streaming API is among the fastest JSON serializers on the JVM, and depending only on the jackson-core module (no databind, no reflection) keeps the “less libraries” goal intact.
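
A minimal sketch against jackson-core’s streaming API, assuming the LogEvent sketch from section 2.1 (only a few fields are shown, and a StringWriter is used for brevity where the garbage-free path would target a reused buffer):

import java.io.IOException;
import java.io.StringWriter;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;

final class JsonLineWriter {
    private static final JsonFactory FACTORY = new JsonFactory(); // thread-safe, reusable

    static String toJson(LogEvent event) throws IOException {
        StringWriter out = new StringWriter(512);
        try (JsonGenerator gen = FACTORY.createGenerator(out)) {
            gen.writeStartObject();
            gen.writeNumberField("timestamp", event.timestampMillis);
            gen.writeStringField("level", event.level);
            gen.writeStringField("loggerName", event.loggerName);
            gen.writeStringField("threadName", event.threadName);
            gen.writeStringField("message", event.message.toString());
            gen.writeEndObject();
        } // close() also flushes the generator
        return out.toString();
    }
}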

2.6. LogSink Interface and Implementations: Efficient Data Persistence

The LogSink interface defines the contract for writing log entries.

interface LogSink {
    void write(List<String> batchedLogEntries); // invoked once per batch, never per event
    void flush();                               // force buffered bytes to the destination
    void close();                               // flush remaining data and release resources
}

FileLogSink
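
A minimal FileLogSink sketch for the interface above, using a buffered, append-only writer (I/O failures are deliberately contained rather than thrown; see section 4.2):

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class FileLogSink implements LogSink {
    private final BufferedWriter writer;

    FileLogSink(Path file) throws IOException {
        this.writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public void write(List<String> batchedLogEntries) {
        try {
            for (String line : batchedLogEntries) {
                writer.write(line);
                writer.newLine();
            }
        } catch (IOException e) {
            // contained: a failing disk must never surface in the application
        }
    }

    public void flush() { try { writer.flush(); } catch (IOException ignored) {} }

    public void close() { flush(); try { writer.close(); } catch (IOException ignored) {} }
}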

NetworkLogSink (e.g., HTTP/Kafka)
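
For the network variant, a sketch using only the JDK’s built-in HttpClient (Java 11+), posting each batch as one newline-delimited request; a Kafka variant would follow the same shape but wrap a batched Kafka producer, at the cost of an extra dependency. The endpoint and content type here are illustrative:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

final class HttpLogSink implements LogSink {
    private final HttpClient client = HttpClient.newHttpClient();
    private final URI endpoint; // e.g., a log collector's ingest URL

    HttpLogSink(URI endpoint) { this.endpoint = endpoint; }

    public void write(List<String> batchedLogEntries) {
        String body = String.join("\n", batchedLogEntries); // one request per batch
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/x-ndjson")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        // Async send so a slow collector cannot stall the logging worker.
        client.sendAsync(request, HttpResponse.BodyHandlers.discarding());
    }

    public void flush() { /* batches are sent eagerly in write() */ }

    public void close() { /* HttpClient holds no resources requiring explicit release */ }
}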

2.7. MyLogger Facade: The Ultra-Lightweight Entry Point

The MyLogger facade is the public API. It must be exceptionally fast.
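
A sketch of the facade’s hot path (the int-based levels and field layout are illustrative): the is-enabled check costs a single volatile read, and a disabled call returns before allocating or enqueuing anything.

public final class MyLogger {
    private final String name;
    private volatile int threshold = Level.INFO; // reconfigurable at runtime

    MyLogger(String name) { this.name = name; }

    public boolean isDebugEnabled() { return Level.DEBUG >= threshold; } // one field read

    public void info(String message) {
        if (Level.INFO < threshold) return; // disabled: zero allocation, zero enqueue
        // otherwise: acquire a pooled LogEvent, populate it, hand off to the LogBuffer
    }

    static final class Level {
        static final int TRACE = 0, DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4;
    }
}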

2.8. Spring Integration: RequestIdFilter and Lifecycle Hooks
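
The section title’s RequestIdFilter can be sketched against the plain servlet API (jakarta.servlet as in Spring Boot 3; older stacks use javax.servlet), keeping Spring itself out of the core library. Registration order matters: it should run first in the chain, e.g., via a FilterRegistrationBean with highest precedence. The header name is an illustrative convention:

import java.io.IOException;
import java.util.UUID;
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;

public final class RequestIdFilter implements Filter {
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String incoming = (req instanceof HttpServletRequest http)
                ? http.getHeader("X-Request-Id") : null; // reuse an upstream ID if present
        MyMDC.put("requestId", incoming != null ? incoming : UUID.randomUUID().toString());
        try {
            chain.doFilter(req, res);
        } finally {
            MyMDC.clear(); // critical on pooled server threads (see section 2.2)
        }
    }
}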

3. Deep Research & Advanced Optimization: Pushing the Performance Envelope

3.1. Comprehensive Garbage-Free Strategies

3.2. Concurrency Model Deep Dive

3.3. I/O Optimization: Advanced NIO Nuances

3.4. Logging Levels & Filtering

3.5. Visualization & Schema Refinement

3.6. Request Unique ID Lifecycle (Traceability)

4. Rigorous Testing & Reliability: Ensuring Production Readiness

4.1. Benchmarking: Measuring “Ultra-Fast” Performance

4.2. Failure Modes & Graceful Degradation: Resilience in Adversity

A production-ready library must degrade gracefully without crashing the application.
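
One concrete pattern is a guarding decorator around any LogSink: failures are contained, counted, and self-reported at a bounded rate, and nothing ever propagates to the caller. A sketch (the one-minute report interval is an illustrative choice):

import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class GuardedSink implements LogSink {
    private final LogSink delegate;
    private final AtomicLong failures = new AtomicLong();
    private volatile long lastReportMillis;

    GuardedSink(LogSink delegate) { this.delegate = delegate; }

    public void write(List<String> batchedLogEntries) {
        try {
            delegate.write(batchedLogEntries);
        } catch (RuntimeException e) {
            long n = failures.incrementAndGet();
            long now = System.currentTimeMillis();
            if (now - lastReportMillis > 60_000) { // rate-limited self-diagnostics
                lastReportMillis = now;
                System.err.println("legion: sink failing (" + n + " errors so far): " + e);
            }
        }
    }

    public void flush() { try { delegate.flush(); } catch (RuntimeException ignored) {} }

    public void close() { try { delegate.close(); } catch (RuntimeException ignored) {} }
}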

4.3. Configuration: Custom Parser vs. Standard Properties/YAML

For a low-dependency library, avoid heavy configuration frameworks.
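
A sketch using plain java.util.Properties, which ships with the JDK; the legion.* keys and defaults are illustrative, not a fixed contract:

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

final class LegionConfig {
    final int bufferSize;
    final String level;
    final String filePath;

    LegionConfig(Properties p) {
        this.bufferSize = Integer.parseInt(p.getProperty("legion.buffer.size", "65536"));
        this.level = p.getProperty("legion.level", "INFO");
        this.filePath = p.getProperty("legion.file.path", "logs/app.log");
    }

    // Loads legion.properties from the classpath; absence simply means defaults.
    static LegionConfig load() throws IOException {
        Properties p = new Properties();
        try (InputStream in = LegionConfig.class.getResourceAsStream("/legion.properties")) {
            if (in != null) p.load(in);
        }
        return new LegionConfig(p);
    }
}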

4.4. Testability: Unit Testing Performance-Critical Paths

Conclusion & Future Outlook

The development of an “ultra-fast,” low-dependency, multi-threaded Java logging library is a challenging but achievable endeavor. The core principles of asynchronous processing, rigorous zero-allocation strategies, intelligent concurrency management, and dedicated thread isolation are fundamental to achieving minimal overhead and high throughput.

The success of such a library is measured by its empirical performance and resilience. This necessitates continuous, rigorous benchmarking using tools like JMH, focusing on tail latencies. Furthermore, the library must be designed with robust failure modes and graceful degradation strategies, ensuring that logging issues never compromise the main application’s stability.

Future enhancements could include: alternative lock-free queue backends such as JCTools (an approach also explored for Log4j2 itself), additional sinks such as a batched Kafka producer, and deeper distributed-tracing integration so the requestId participates in standard correlation headers.

By adhering to these principles, a truly “ultra-fast,” high-throughput, low-dependency logging library can be realized, providing invaluable observability for performance-critical Java applications.


Works Cited

  1. Asynchronous loggers :: Apache Log4j - Apache Logging Services
  2. Unlocking Java Asynchronous Programming: Key Concepts, Tools, and Benefits
  3. Log4j 2 Lock-free Asynchronous Loggers for Low-Latency Logging
  4. ZeroLog 2.2.0 - NuGet
  5. Abc-Arbitrage/ZeroLog: A high-performance, zero … - GitHub
  6. Garbage-free logging :: Apache Log4j
  7. Log4j – Garbage-free Steady State Logging - Apache Log4j 2
  8. Log4j 2: The Complete Guide to Modern Java Logging - Dash0
  9. Thread Pool Isolation | Apache Dubbo
  10. Java Thread-Local Variables: Explained - DZone
  11. Java ThreadLocal - Java and Spring Trends
  12. An Introduction to ThreadLocal in Java | Baeldung
  13. Jackson vs Gson: Edge Cases in JSON Parsing for Java Apps - DZone
  14. Efficient JSON Parsing in Java: Jackson vs. Gson vs. JSON-B - Java …
  15. Why did log4j2 used LMAX Disruptor in Async logger instead of any other built in non blocking data structure? - Stack Overflow
  16. A Well Known But Forgotten Trick: Object Pooling - High Scalability -
  17. Object Pooling in Java - memory management - Stack Overflow
  18. Logback MDC: An Essential Tool For Effective Logging In Java
  19. Log4j 2 Thread Context - Apache Logging Services
  20. Java ThreadLocal: Unveiling the Power of Thread-Specific Data - Devzery
  21. How to clean up ThreadLocals - Stack Overflow
  22. Adding user info to log entries in a multi-user app using Mapped Diagnostic Context
  23. Correlation in logging - DEV Community
  24. Java • ThreadLocal Best Practices - KapreSoft
  25. LMAX Disruptor User Guide
  26. Re: Disruptor vs ArrayBlockingQueue tests. Disruptor is better only …
  27. Create an alternative async logger implementation using JCTools · Issue #2220 · apache/logging-log4j2 - GitHub
  28. Analysis of Performance Practices in Java Log and Principle Interpretation - Alibaba Cloud
  29. What does the term “backpressure” mean in Rxjava? - Stack Overflow
  30. DiscardingAsyncQueueFullPolicy (Apache Log4j Core 2.24.3 API)
  31. Debugging Batch Applications - Documentation | Spring Cloud Data Flow
  32. Reuse StringBuilder for Efficiency | Baeldung
  33. XML Performance Optimization - Speed and Memory Efficiency - Web Reference
  34. Java File writing I/O performance - TMSVR
  35. Reading/Writing to/from Files using FileChannel and ByteBuffer in Java - Java Code Geeks
  36. Boost Java performance with memory-mapped files | Transloadit
  37. asynchronous - MappedByteBuffer throwing a java.nio …
  38. Java ByteBuffer performance issue - Stack Overflow
  39. ByteBuffer.allocateDirect ridiculously slow| JBoss.org Content Archive (Read Only)
  40. NIO Performance Improvement compared to traditional IO in Java - Stack Overflow
  41. IO and NIO performance difference and example - Stack Overflow
  42. Java NIO AsynchronousFileChannel - Jenkov.com
  43. Troubleshooting Java NIO Non-Blocking IO Performance Pitfalls - Java Tech Blog
  44. Simple Java Rest Client posting JSON to a server - GitHub Gist
  45. POST request send JSON data Java HttpUrlConnection - Stack Overflow
  46. How can I use java to send HTTP post request of a JSON array of strings? - Stack Overflow
  47. Enable logging to TCP inputs in your Java project - Splunk Dev
  48. A Guide to Using Raw Sockets - Open Source For You
  49. Basic Distributed Counter using Java Sockets - Codemia
  50. mirsamantajbakhsh/RawSocket: A simple code for raw socket in Java using JNetPCAP. - GitHub
  51. Kafka performance: 7 critical best practices - NetApp Instaclustr
  52. How to Improve Kafka Performance: A Comprehensive Guide
  53. Failure Handling Mechanisms in Microservices and Their Importance - DZone
  54. How Do I Handle Failures in Microservices? - JavaDZone
  55. LoggingRetryPolicy (DataStax Java Driver for Apache Cassandra - Binary distribution 2.1.10 API)
  56. Java Client | GridGain Documentation
  57. Timeouts, retries and backoff with jitter - AWS
  58. Exponential Backoff in RxJava - Stack Overflow
  59. Mastering Backpressure in Project Reactor: Common Pitfalls | Java Tech Blog
  60. Java Logging: Troubleshooting Tips and Best Practices - Last9
  61. Implementing Correlation ids in Spring Boot (for Distributed Tracing in SOA/Microservices)
  62. ThreadPool with CompletableFuture (need MDC propagation) : r/SpringBoot - Reddit
  63. Java Microbenchmark Harness (JMH) - GeeksforGeeks
  64. ExecutorService with backpressure - java - Stack Overflow
  65. Java • Logback Mapped Diagnostic Context (MDC) in Action | KapreSoft
  66. OpenElements/java-logger-benchmark: A benchmark for … - GitHub
  67. Benchmark - Swift Package Index
  68. Performance Analysis using JMH | The Backend Guy
  69. jmh: Run benchmark concurrently - java - Stack Overflow
  70. java - JMH - How to correctly benchmark Thread Pools? - Stack Overflow
  71. Is your reported p99 wrong? - Random Notes and Cheat Sheets - Kirk Pepperdine
  72. Design for graceful degradation | Cloud Architecture Center | Google …
  73. A guide to graceful degradation in web development - LogRocket Blog
  74. Logging Best Practices to Reduce Noise and Improve Insights - Last9
  75. ralfstx/minimal-json: A fast and small JSON parser and … - GitHub
  76. mmastrac/nanojson: A tiny, compliant JSON parser and … - GitHub
