Latency, part one: Why Latency Matters

Latency issues are common in embedded software. They are caused by complex, hidden parts of the software stack and usually become apparent late in the development process. By nature, they are costly and tricky to fix.

This was not an issue as long as embedded applications remained simple. Most applications were not affected because they tolerated latencies in the tens or hundreds of milliseconds. When latency did matter, the engineer in charge of the project knew the whole code base and could spend time tuning timings with targeted code optimizations.

This is no longer the case. Modern real-time applications (think machine vision, deep learning) are now entering life-critical embedded systems. They run on powerful multi-core processors and use deep software stacks totaling several million lines of code. Their specifications require sub-millisecond latencies. The complexity at play makes it imperative to use software designed for real-time systems.

Let's analyze what latency is, what causes it and how we can minimize it.


What is latency exactly?

Latency can be defined as the time elapsed between an event and the response to that event. A latency issue occurs when that duration becomes longer than the required response time of the application.

Jitter is the variability in latency. Symptoms of jitter may include variations in response time, missed deadlines and the overall impression that the application is poorly designed.
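To make these two definitions concrete, here is a minimal C sketch that computes the latency of each event from a pair of timestamps, then reports the worst case and the jitter. It assumes the timestamps were already collected in microseconds (how to capture them is out of scope here) and uses one common definition of jitter, the peak-to-peak spread (maximum minus minimum); other definitions, such as the standard deviation, are also in use. The names latency_stats, latency_summary and misses_deadline are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of a series of latency measurements, in microseconds. */
typedef struct {
    uint64_t min_us;    /* best observed latency */
    uint64_t max_us;    /* worst observed latency */
    uint64_t jitter_us; /* peak-to-peak jitter: max_us - min_us */
} latency_stats;

/* event_us[i] is the time of event i, response_us[i] the time of its response. */
latency_stats latency_summary(const uint64_t *event_us,
                              const uint64_t *response_us, size_t n)
{
    assert(n > 0);
    latency_stats s = { UINT64_MAX, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        assert(response_us[i] >= event_us[i]);
        uint64_t lat = response_us[i] - event_us[i];
        if (lat < s.min_us) s.min_us = lat;
        if (lat > s.max_us) s.max_us = lat;
    }
    s.jitter_us = s.max_us - s.min_us;
    return s;
}

/* A latency issue: the worst case exceeds the required response time. */
int misses_deadline(latency_stats s, uint64_t deadline_us)
{
    return s.max_us > deadline_us;
}
```

For example, latencies of 120, 150 and 400 µs give a jitter of 280 µs and miss a 300 µs deadline, even though two of the three responses were fast. This is why real-time work focuses on the worst case rather than the average.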

Latency in operating systems

The main element that enables the development of complex systems today is the operating system (OS). The OS is the abstraction layer over the hardware. By masking the complexity of modern processors, the OS enables the development of high-level libraries and frameworks for complex tasks. Nowadays, using an OS is practically the only way to develop high-performance applications.

In these systems, the observed latency has two main components. First, the OS receives the event and transfers it to the right part of the application. Then, the application treats the event and emits the appropriate response.

The OS latency is defined as the time between an event and the start of execution of the application code that responds to that event. Several functional steps contribute to this delay. Let's analyze the steps that the OS kernel must perform.
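The decomposition above can be summarized as a sum. Assuming the four kernel steps listed below are the only OS contributions, the observed latency L can be written as follows; the symbols are our own labels, not notation from a specific kernel.

```latex
\begin{aligned}
L &= L_{OS} + L_{app} \\
L_{OS} &= t_{irq} + t_{isr} + t_{sched} + t_{disp} \\
\max L_{OS} &\le \max t_{irq} + \max t_{isr} + \max t_{sched} + \max t_{disp}
\end{aligned}
```

Here t_irq, t_isr, t_sched and t_disp stand for the interrupt entry, ISR, scheduling and dispatch steps. Because the worst case of a sum is never larger than the sum of the worst cases, bounding each step separately gives a safe, if sometimes pessimistic, bound on the OS latency.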

1. The kernel receives an interrupt request

This step represents the transition from hardware to software. It is the time that elapses between the moment the physical interrupt line is asserted and the moment the kernel reaches the beginning of the Interrupt Service Routine (ISR). This delay is usually short and predictable. However, several sources of jitter may already appear. Since an interrupt forces the processor to branch to the ISR, the ISR code and data may not be in the cache, causing cache misses. Furthermore, some RTOSes mask interrupts to create critical sections: an interrupt raised during such a section stays pending until interrupts are unmasked, which makes the response time harder to predict. This shows that the design of the OS matters from the very first step.
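The effect of interrupt masking can be captured with a very simple model. In the C sketch below, an interrupt arriving while interrupts are masked waits until the mask is lifted, on top of a fixed entry cost. The function name and the single masked window are hypothetical simplifications; base_ns stands for the hardware entry cost, including any cache misses, and is not modeled further.

```c
#include <assert.h>
#include <stdint.h>

/* Delay between the assertion of the interrupt line (arrival_ns) and ISR entry,
   when interrupts are masked during [mask_start_ns, mask_end_ns). */
uint32_t irq_entry_delay_ns(uint32_t arrival_ns, uint32_t mask_start_ns,
                            uint32_t mask_end_ns, uint32_t base_ns)
{
    assert(mask_start_ns <= mask_end_ns);
    uint32_t wait = 0;
    if (arrival_ns >= mask_start_ns && arrival_ns < mask_end_ns)
        wait = mask_end_ns - arrival_ns; /* IRQ stays pending until unmasked */
    return base_ns + wait;
}
```

The worst case occurs when the interrupt arrives right as the critical section begins: the delay is then the base cost plus the full length of the longest masked section. This is why RTOS designers try to keep critical sections short and bounded.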

2. The kernel runs the Interrupt Service Routine

After receiving the event, the kernel must identify the thread that will handle it. This step varies from one OS to another. There are two main approaches.

Monolithic kernels prefer to run complex ISRs in kernel mode to save time later on. The kernel immediately propagates the event to the module in charge of that event. In turn, the module interprets the event and creates high-level abstractions. The most famous example of a monolithic kernel is Linux.

Embedded systems usually favor the microkernel approach. Microkernel systems leave the complex parts of the ISR to a user-mode thread. In this model, the part of the ISR that runs in the kernel is extremely short and predictable. We followed this philosophy in the development of Maestro.
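As an illustration of this split, here is a single-threaded C sketch; it is not Maestro code. A hypothetical in-kernel top half only records the interrupt number in a fixed-size queue in constant time, and a hypothetical user-mode driver thread fetches the pending events later to do the complex work. The sketch deliberately leaves out concurrency: a real queue shared between interrupt and thread context would need atomic accesses or memory barriers on head and tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_LEN 8u /* power of two, so index wrap-around stays consistent */

/* Hypothetical queue shared by the top half and the driver thread. */
typedef struct {
    uint32_t pending[QUEUE_LEN]; /* recorded interrupt numbers */
    uint32_t head;               /* total events written */
    uint32_t tail;               /* total events consumed */
} event_queue;

/* Kernel-mode part: constant time, no parsing, no allocation.
   Returns false (event dropped) if the queue is full. */
bool isr_top_half(event_queue *q, uint32_t irq)
{
    if (q->head - q->tail == QUEUE_LEN)
        return false;
    q->pending[q->head % QUEUE_LEN] = irq;
    q->head++;
    return true;
}

/* User-mode part: fetches the oldest pending event, if any.
   The complex, variable-length handling would happen after this call. */
bool driver_thread_next(event_queue *q, uint32_t *irq)
{
    assert(irq != NULL);
    if (q->tail == q->head)
        return false;
    *irq = q->pending[q->tail % QUEUE_LEN];
    q->tail++;
    return true;
}
```

The point of the design is that the kernel-mode part has a small, fixed cost regardless of what the event means, so its contribution to latency is easy to bound; the variable-cost work runs in a thread that the scheduler can prioritize like any other.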

While monolithic kernels have better overall performance, microkernels are more robust and predictable. Monolithic kernels also tend to have longer ISRs and therefore larger latencies at this step.

3. The kernel calls the scheduler

The thread that responds to the event is now ready for execution. However, the kernel may decide to execute it later: other events may be more important. The kernel must decide which thread should run now. This step is called scheduling. Scheduling is usually the main source of kernel latency, for two reasons. First, the scheduler itself can take a significant amount of time to execute. Second (and most important), the time that elapses between the scheduling request and the actual invocation of the scheduler can be quite long, depending on the architecture of the kernel.
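One common way to keep the execution time of the scheduler itself small and bounded is a priority bitmap, so that picking the next thread costs the same whatever the number of threads. The C sketch below is a generic illustration under our own assumptions (32 priority levels, 0 is the highest, one thread per level); it is not a description of how Linux or Maestro implement their schedulers. It relies on __builtin_ctz, a GCC/Clang built-in that most CPUs execute in a few instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit p set: the thread at priority p is ready. Priority 0 is the highest. */
typedef struct {
    uint32_t ready;
} ready_map;

void mark_ready(ready_map *m, unsigned prio)
{
    assert(prio < 32);
    m->ready |= 1u << prio;
}

/* Assumes one thread per priority level, so blocking clears the whole level. */
void mark_blocked(ready_map *m, unsigned prio)
{
    assert(prio < 32);
    m->ready &= ~(1u << prio);
}

/* Returns the highest ready priority, or -1 if no thread is ready. */
int pick_next(const ready_map *m)
{
    if (m->ready == 0)
        return -1; /* __builtin_ctz(0) is undefined */
    return __builtin_ctz(m->ready); /* lowest set bit = highest priority */
}
```

A scheduler that instead scanned a list of all threads would take longer as the system grows, which adds load-dependent jitter to this step.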

Standard Linux-based solutions call the scheduler at periodic intervals, the so-called scheduler tick. This interval is a critical part of the total latency, as it can last up to several milliseconds on out-of-the-box systems. You can mitigate this issue by configuring the kernel properly. However, it is a common pitfall for unwary developers.
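Under the simplified model described above, where a pending scheduling request waits for the next periodic tick, the added delay is bounded by one tick period. The C sketch below does that arithmetic; tick_period_us and wait_for_next_tick_us are hypothetical helpers, ticks are assumed to occur at multiples of the period, and the example frequencies are the usual choices for the Linux CONFIG_HZ option (100, 250, 300 or 1000 Hz).

```c
#include <assert.h>
#include <stdint.h>

/* Tick period in microseconds for a tick frequency of hz interrupts per second. */
uint32_t tick_period_us(uint32_t hz)
{
    assert(hz > 0);
    return 1000000u / hz;
}

/* Wait until the next tick for a request raised at time t_us, with ticks at
   0, period, 2 * period, ... A request raised exactly on a tick waits 0. */
uint32_t wait_for_next_tick_us(uint32_t t_us, uint32_t hz)
{
    uint32_t period = tick_period_us(hz);
    uint32_t since_tick = t_us % period;
    return since_tick == 0 ? 0 : period - since_tick;
}
```

With a 100 Hz tick, a request may wait up to 10 ms; at 1000 Hz the bound drops to 1 ms, which is still far above a sub-millisecond requirement.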

4. The kernel dispatches the thread

After taking a scheduling decision, all that remains is to execute the selected thread. The kernel must pause one of the threads that are currently running and start executing the new one. This is called preemption.

The duration of a preemption depends on the total load of the system and on the scheduling policy in use. Once again, this component depends on the design of the kernel. On partitioned systems, threads always execute on the same processor core. On global systems, threads can execute on any core available at the moment. While partitioned systems are simpler, they may waste time waiting for a high-priority thread to finish even if another core is available. On the other hand, global systems do not have that limitation, but their implementation is often poor with regard to real-time constraints. We will talk more about these differences in a later article.
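The difference between the two policies can be shown with a toy model. In the C sketch below, each core is busy with higher-priority work until a given time; under partitioned scheduling the new thread must wait for its assigned core, while under global scheduling it starts on whichever core frees up first. The function names are hypothetical, and the model ignores migration and cache costs, which are precisely where real global schedulers pay a price.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* busy_until[c]: time at which core c finishes its higher-priority work. */

/* Partitioned: the thread can only run on its assigned core. */
uint32_t start_partitioned(const uint32_t *busy_until, size_t assigned_core,
                           uint32_t now)
{
    uint32_t t = busy_until[assigned_core];
    return t > now ? t : now;
}

/* Global: the thread runs on the first core that becomes available. */
uint32_t start_global(const uint32_t *busy_until, size_t ncores, uint32_t now)
{
    assert(ncores > 0);
    uint32_t best = UINT32_MAX;
    for (size_t c = 0; c < ncores; c++) {
        uint32_t t = busy_until[c] > now ? busy_until[c] : now;
        if (t < best)
            best = t;
    }
    return best;
}
```

With core 0 busy until t = 500 and core 1 idle, a thread released at t = 100 and pinned to core 0 starts at 500 under the partitioned policy, but at 100 under the global one.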

Hardware-related latencies

Some forms of latency are inherent to the design of modern processors. The most typical example comes from the hierarchical architecture of memory. Modern embedded processors usually have two levels of cache in front of the RAM.

A typical cause of jitter is a cache miss: the code in charge of handling an event is not in the cache and must be fetched from a slower level. Cache misses can happen at any level of the cache hierarchy. Except in specific applications, it is usually impossible to avoid cache misses at run-time. Designers of hard real-time avionics applications often disable caches entirely. Of course, this has a major impact on performance. For less critical domains, several techniques exist to make the system more predictable while keeping the performance benefits of caches.
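To see why cache misses translate into jitter, the C sketch below counts the cycles spent on memory accesses for the same piece of code under different hit patterns. It assumes a simplified model in which each access first tries L1, then L2, then RAM, paying the lookup cost of every level it visits; the cycle costs in the example are hypothetical round numbers, not figures for a specific processor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lookup cost of each level, in CPU cycles. */
typedef struct {
    uint32_t l1, l2, ram;
} mem_costs;

/* Cycles for n accesses, of which l1_hits hit in L1 and l2_hits hit in L2;
   the remaining accesses go all the way to RAM. */
uint64_t access_cycles(mem_costs c, uint32_t n, uint32_t l1_hits, uint32_t l2_hits)
{
    assert((uint64_t)l1_hits + l2_hits <= n);
    uint32_t ram_accesses = n - l1_hits - l2_hits;
    return (uint64_t)l1_hits * c.l1
         + (uint64_t)l2_hits * (c.l1 + c.l2)
         + (uint64_t)ram_accesses * (c.l1 + c.l2 + c.ram);
}
```

With costs of 4, 12 and 100 cycles, 1000 accesses take 4000 cycles when everything hits in L1, 7200 cycles with 900 L1 hits and 80 L2 hits, and 116000 cycles when every access misses both caches: the same code can be almost 30 times slower depending on the state of the cache.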

We will not go over the standard issues caused by swapping memory to disk at run-time. Nowadays, several forms of disks can be present at the same time (PCI-Express SSD, SD card, USB drive, …) with different access times. Delays caused by I/O accesses can become unpredictable when the mass storage architecture is used naïvely. Designing a reliable application on systems that allow swapping is a challenge in and of itself.

[Figure: Multi-core processor memory hierarchy]

However, software solutions can mitigate latencies caused by the hardware. You can learn more about avoiding interference in memory accesses in this article.


Latency has multiple causes. All these sources accumulate and can generate large and unpredictable delays. Latencies caused by the operating system affect every application running on top of it. These latencies may not be acceptable for demanding applications.

The next article will focus on how we can adapt a standard Linux-based system to reduce latencies and provide acceptable results for non-critical applications.

Olivier Desenfans

Olivier is a senior software engineer at HIPPEROS. He is passionate about embedded systems technologies and designing modern, safe applications.