Latency

Latency is the amount of time it takes a process to be given a CPU once it is made runnable. The graph at right represents, on average, the latency of all processes during the last sample period. A process may become runnable yet still have to wait for a free processor if
  1. All processors are currently busy, and
  2. Higher or equal priority tasks are already waiting in front of it.
Some code paths are not optimized to search for idle processors upon wakeup, so processes awakened by those paths must wait for a rebalance to happen (1 ms or less if a processor is idle).

Ideally, there are always fewer processes than processors, and every process has a processor waiting for it when it wants to run. In practice, this would characterize an underutilized machine. The goal is to keep latencies as close to zero as possible; if latency is always exactly zero, though, then perhaps your load is mismatched to the machine.

The lower this number is, the better.

Runslice

A runslice is the amount of time a process takes (on average) when it runs. This differs from a timeslice, which is the amount of time it is permitted to run before being forced off the processor. The graph at right indicates the average runslice time of all processes during the sample period.

The numbers here are neither inherently good nor inherently bad, but can be used to help characterize the load on the system. Loads which have very short runslices are not using their full timeslice, and so modifying the length of those timeslices (either through kernel twiddling or nice(1)) is unlikely to have any effect on the system. They are voluntarily giving up the processor, probably because they are waiting for some other event such as a signal, semaphore, or I/O completion. Loads which have large runslices, on the other hand, are more compute-intensive and may be leaving the processor involuntarily. These types of processes may well see different behavior if their timeslices are shortened or lengthened.

Runslice information and latency information may be used to roughly predict the average queue length. If processes are waiting 10ms, on average, and running only 2ms, on average, then when a process is queued it probably has 10/2 or 5 processes in front of it. On average, remember!
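The estimate above can be written as a one-line calculation. This is only a minimal sketch of that rule of thumb; the function name and its parameters are made up for illustration and are not part of any tool described here.

```python
def estimated_queue_length(avg_latency_ms, avg_runslice_ms):
    """Rough average number of processes ahead of a newly queued process.

    Assumes each process ahead in the queue runs for about one average
    runslice, so waiting time / runslice approximates the queue length.
    """
    if avg_runslice_ms <= 0:
        raise ValueError("average runslice must be positive")
    return avg_latency_ms / avg_runslice_ms


# The example from the text: waiting 10ms and running 2ms, on average.
print(estimated_queue_length(10, 2))
```

As with the graphs themselves, the result is an average and says nothing about any individual wakeup.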

Should you wish to, the easiest way to change a timeslice is to use nice(1). For comparison, the normal timeslice for a process with a nice of 0 is 100ms. It can range from as little as 10ms to as much as 200ms. Changing the minimum, the maximum, or the algorithm that calculates the values in between does, however, require an actual kernel patch.

load_balance()

The top graph at right shows the number of times load_balance() was called per second. This will range from about 5 times per second per processor, to as high as a thousand times per second per processor when it is idle.

The number of times it is called is not, in itself, of great interest, but if it is being called frequently to do balancing and failing to find processes to balance (see find_busiest_queue()), some heuristics may need to be tuned.

load_balance() calls when idle for some time

load_balance() may be called from a limited number of places. It may be called when a processor is idle or busy. In addition, it may be called when the scheduler realizes a processor is about to go idle, in an attempt to bring over one or more processes to this soon-to-be-idle processor. This graph indicates the number of times load_balance() was called while the processor was already idle.

When the processor is idle, load_balance() can be called every clock tick, or up to a thousand times per second, per processor. (If the processor is idle, why not go looking for jobs?) On an 8-processor system that was completely idle, then, you'd expect to see nearly 8000 calls per second. If this number is lower, it indicates either that the system was not completely idle or that the queue is not being checked once per clock tick. Being low or high in any of these categories is not in itself necessarily good or bad, but it can provide evidence or confirmation of other behavior in the system.

load_balance() calls when newly idle

This graph indicates the number of times load_balance() was called while the processor was about to become idle.

load_balance() calls when busy

The last graph at right indicates the number of times load_balance() was called while the processor was busy.

When the processor is busy, load_balance() is called far less often than when idle: typically about five times per second, per processor. On an 8-processor system that was fully loaded, then, you'd expect to see about 40 calls per second. If this number is lower, it indicates the processor was not that busy.
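The per-processor rates quoted for idle and busy processors can be combined into a rough expected call count for comparison against the graphs. The sketch below is illustrative only: the constant and function names are hypothetical, the idle rate assumes a 1000 Hz clock tick, and the busy rate is the approximate figure given above rather than an exact kernel value.

```python
# Assumed rates, taken from the descriptions above (not exact kernel values).
IDLE_CALLS_PER_SEC = 1000  # one call per clock tick, assuming a 1000 Hz tick
BUSY_CALLS_PER_SEC = 5     # "about five times per second" when busy


def expected_load_balance_calls(idle_cpus, busy_cpus):
    """Approximate load_balance() calls per second across the whole system."""
    return idle_cpus * IDLE_CALLS_PER_SEC + busy_cpus * BUSY_CALLS_PER_SEC


print(expected_load_balance_calls(8, 0))  # completely idle 8-processor system
print(expected_load_balance_calls(0, 8))  # fully loaded 8-processor system
```

A measured rate well below the figure for the matching case suggests the processors were not in the state you assumed for the whole sample period.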

pull_task()

pull_task() is called to move exactly one task from one runqueue to another. It is presumed we are pulling from another queue to our own, as this simplifies the process considerably. The graph at right shows the number of times pull_task() was called. pull_task() can be called when the processor is idle, newly idle, or busy.

This number is neither good nor bad by itself, but since it represents the end result of a complex decision-making process, knowing how many tasks were pulled might be very helpful in determining how successful other functions are.

pull_task() calls when idle

The graph at right shows the number of times pull_task() was called while the processor was idle.

This number is neither good nor bad by itself, but since it represents the end result of a complex decision-making process, knowing how many tasks were pulled might be very helpful in determining how successful other functions are.

pull_task() calls when busy

pull_task() can be called when the processor is idle, newly idle, or busy. The graph at right shows the number of times pull_task() was called while the processor was busy.

This number is neither good nor bad by itself, but since it represents the end result of a complex decision-making process, knowing how many tasks were pulled might be very helpful in determining how successful other functions are.

pull_task() calls when newly idle

pull_task() can be called when the processor is idle, newly idle, or busy. The graph at right shows the number of times pull_task() was called while the processor was newly idle.

This number is neither good nor bad by itself, but since it represents the end result of a complex decision-making process, knowing how many tasks were pulled might be very helpful in determining how successful other functions are.

sched_migrate_task()
migrate_to_cpu()

The graph at right indicates the number of times sched_migrate_task() (or migrate_to_cpu(), in earlier versions) was called. This function can only be called when a process execs. The theory is that when a process execs, it is giving up its previous image and we can be untroubled about such things as whether it is likely that the memory it wants to use is already in cache. Upon exec, it will require new text and new data pages anyway.

Unless the system is madly creating processes, this is unlikely to be called more than a few times per second per processor, and in many benchmarks it's not unusual to see it tail off to zero after initialization completes and a few core processes are started.

This number is neither good nor bad, but it helps characterize the load and possibly add more meaning to other data.

load_balance() when idle

load_balance() may be called from a limited number of places. It may be called when a processor is idle, or busy. In some kernels, it may also be called from schedule() when the scheduler realizes a processor is about to go idle, in an attempt to bring over one or more processes to this soon-to-be-idle processor. This graph indicates the number of times load_balance() was called while the processor was idle.

When the processor is idle, load_balance() is called every clock tick, or up to a thousand times per second, per processor. (If the processor is idle, why not go looking for jobs?) On an 8-processor system that was completely idle, then, you'd expect to see about 8000 calls per second. If this number is lower, it indicates the processor was busier. Being low or high in itself isn't necessarily good or bad, but it can provide evidence or confirmation of other behavior in the system.

Tasks moved in active_load_balance()

The graph at right shows the number of tasks moved by active_load_balance(). The purpose of active_load_balance() is described in the section on calls to active_load_balance().

The lower this number is, the better.

active_load_balance()

The graph at right shows the number of calls to active_load_balance(). Normally tasks are pulled from other processors to the processor doing the balancing. This function is called by the migration threads when an overburdened processor realizes that not enough of the other processors are stealing its tasks -- in essence doing a "push" rather than a "pull". This is usually a complex procedure, and is a stopgap measure to prevent an imbalance from persisting too long on the system. Ideally, the other balancing algorithms are doing a good job and this gets called very infrequently.

The lower this number is, the better. If it is high (more than a few times per second), other balancing algorithms may need to be tuned.

Imbalances detected while in load_balance()

When load_balance() is called, it will call find_busiest_queue() to determine if there is any queue busier than itself from which it should pull tasks. If find_busiest_queue() is successful, it will also include the imbalance that it found.

In releases prior to 2.6.6, if this queue has 5 processes waiting to run and another queue has 7, find_busiest_queue() will indicate an imbalance of 2. (Note that the actual number of tasks that need to move to create balance is half that, or 1.) In releases subsequent to 2.6.6, and in -mm trees after 2.6.2, the "imbalance" is actually the number of tasks to move -- that is, it's already divided by two. In the above example, it would return 1, not 2.
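The two reporting conventions can be compared with a small sketch. This is not the kernel's code: the function and parameter names are hypothetical, and the rounding of odd differences (integer division here) is an assumption, since the text only covers the even case.

```python
def reported_imbalance(local_running, busiest_running, already_halved):
    """Imbalance as it would be reported under each convention.

    already_halved=False models the older convention (raw difference in
    queue lengths); already_halved=True models the newer one (number of
    tasks to move). Integer division for odd differences is an assumption.
    """
    difference = max(busiest_running - local_running, 0)
    return difference // 2 if already_halved else difference


# The example from the text: this queue has 5 waiting, another has 7.
print(reported_imbalance(5, 7, already_halved=False))
print(reported_imbalance(5, 7, already_halved=True))
```

When comparing graphs taken from different kernel versions, keep in mind that the same underlying imbalance appears twice as large under the older convention.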

Since load_balance() is called so frequently when the machine is idle, counting a failure of find_busiest_queue() as "zero imbalance" would quickly run the numbers uselessly close to zero. So what is graphed here to the right is the average imbalance when there was an imbalance found. As an exceptional case, if load_balance() was not called during the sample or was but never detected an imbalance, a value of zero was entered on the graph rather than create discontinuities.
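The averaging rule described above, including the zero entered when no imbalance was found, might look like the following. It is a sketch under assumptions: the function name is made up, and the input is taken to be a list of the imbalances from only those load_balance() calls in the sample that actually found one.

```python
from typing import Sequence


def average_imbalance(imbalances_found: Sequence[int]) -> float:
    """Average imbalance over the calls that found an imbalance.

    Calls where find_busiest_queue() found nothing are excluded by the
    caller, so they do not drag the average toward zero. If no imbalance
    was found at all in the sample, zero is returned so the graph has no gap.
    """
    if not imbalances_found:
        return 0.0
    return sum(imbalances_found) / len(imbalances_found)


print(average_imbalance([2, 4]))  # two calls found imbalances of 2 and 4
print(average_imbalance([]))      # no imbalance detected in this sample
```

One consequence of this choice is that a zero on the graph means "nothing detected", not "perfectly balanced on average".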

The lower this number, the better.

sys_sched_yield()

The graph at right shows the number of times sys_sched_yield() was called. This function instructs the scheduler to take the caller off the processor. How long it should be off the processor is implementation-dependent, and programs using this function to create very short delays usually need to be retuned after system modifications, much to the maintainers' chagrin.

Because of its unpredictability as a substitute for a quick delay and the subsequent need to constantly retune applications utilizing it, use of this function is discouraged. Accordingly, lower numbers are better, with zero being the best score possible. Nevertheless, some applications and libraries still use it (notably many Java implementations); these applications and libraries may benefit from retuning from time to time as the operating system changes.

schedule()

The graph at right shows the number of times schedule() was called. This function is the heart of the scheduler and is called every time a scheduling decision needs to be made or possibly reevaluated -- that is, every time a sleeping process wakes up or a running process goes to sleep. It's also called at many other times when priorities may have changed and the "currently running process" may need to be changed. Systems with low runslices may see a correspondingly higher frequency of schedule() calls, as more jobs are being switched in and out per second.

This number is neither good nor bad, but does help characterize the load when interpreting other results. Although it's been written carefully, schedule() is not a cheap function to call and any modifications at either the kernel or user level that result in fewer calls to schedule() will probably improve performance.

sched_balance_exec()

The graph at right shows the number of times sched_balance_exec() was called. This function is called each time a process does an exec(). When possible, it will call sched_migrate_task() to move the task to a less busy CPU, since at exec time the task has no resident text or data pages on this (or any) processor. See the description of sched_migrate_task() for a more thorough explanation.

This number, like the count of sched_migrate_task(), is neither good nor bad, but does help characterize the load when interpreting other results.