Threads, Threads Everywhere. And, Not a Needle In Sight

Introduction

Several terms have been used liberally throughout articles on thinkmiddleware.com. A moment should be taken to formally define them. I am going to attempt to present these concepts in as generic a sense as possible, but the truth of the matter is, every OS has its own threading model and implementation. And, every implementation has its own peculiar quirks.

This article is not an exhaustive study of threading concepts or implementation on any operating system. That would be a multi-volume book. This article attempts to lay out the conceptual framework of threading concepts that lie between Java threads and an OS’s dispatchable entity.

In the Generic Sense

An OS scheduler is the piece of an operating system that decides what program or part of the operating system gets to run next; it makes this determination at set intervals, called the quantum. In the old days, a process was the dispatchable unit of a Unix-like operating system. This means that the OS scheduler dispatched processes to run on the CPU. More recently, threads have become the dispatchable unit. Or, more appropriately, kernel-level entities that represent threads have become the dispatchable unit. There are exceptions to this rule as we will see; in Linux, processes are still the basic dispatchable unit.

A process is a set of data structures that maintain state, an address space (memory), and other OS resources that provide the execution environment for a running program. A process contains (links to) another set of data structures that represent an abstraction known as a Light Weight Process or LWP.

An LWP is a kernel-level abstraction, another set of data structures, that represents a user-level thread; this abstraction hides the details in the kernel. It also divorces the user-level thread concept from kernel-level threads or an equivalent mechanism within the kernel.

A user-level thread (or user thread), or native thread in Java speak, creates a parallel execution path within a program. User threads are created, controlled, and destroyed by application code using a threading API provided by an OS library (or a POSIX wrapper library). This OS thread library exposes a documented threading API and implements it on top of the kernel interface for kernel threads and LWPs. This is the concept most people have in mind when they think of a thread. We’re interested in what’s happening below user threads.
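To make the "parallel execution path" idea concrete, here is a minimal Java sketch. The class and method names (`ParallelPathDemo`, `runOnWorker`) and the thread name `worker-1` are my own illustrative choices. It only shows the create/start/join life cycle from the application's side. What the thread library and kernel do underneath is not visible from this code.

```java
// Minimal sketch: the calling thread starts a second thread and waits for it.
class ParallelPathDemo {

    // Starts one worker thread, waits for it to finish, and returns the
    // name of the thread that actually executed the worker code.
    static String runOnWorker() {
        final String[] ranOn = new String[1];
        Thread worker = new Thread(() -> {
            // This body runs on the new thread, not on the caller's thread.
            ranOn[0] = Thread.currentThread().getName();
        }, "worker-1");
        worker.start(); // request a new thread of execution
        try {
            worker.join(); // block until the worker's execution path ends
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return ranOn[0];
    }

    public static void main(String[] args) {
        System.out.println("caller ran on: " + Thread.currentThread().getName());
        System.out.println("worker ran on: " + runOnWorker());
    }
}
```

The `join()` call is what makes it safe for the caller to read the value the worker wrote.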

User-thread implementations begin to vary when you look at the mapping of user threads to LWPs. Depending on the architecture of an OS thread library, there may be a one-to-one (1:1) mapping between these concepts or there may be an M:N relationship. In the first case, there is a permanent one-to-one mapping between user threads and LWPs; in the latter, the mappings change over time. In the first case, mapping CPU utilization to a particular user-level thread is easy. In the latter case, this could prove difficult: you wouldn’t know which user thread is currently mapped to an LWP, and the mapping could change at some regular time slice. In older systems, an M:1 model was often used, in which M user threads were mapped to a single LWP. The OS vendors demonstrate significant philosophical differences at this juncture; more to come on this topic. The part of the threading library that performs this mapping is sometimes called a virtual processor because the thread library is scheduling user threads across the available time slices of an LWP or kernel thread.
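As a rough analogy only, the M:1 idea can be sketched in Java as a toy round-robin scheduler that multiplexes several hypothetical "user threads" onto the single thread that calls `run()`. Everything here (`ToyManyToOneScheduler`, `ToyThread`, the step strings) is invented for illustration. A real thread library switches register and stack contexts rather than iterating over a list of steps, so treat this as a picture of the mapping, not of an implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy M:1 model: M "user threads" share the one thread that calls run().
class ToyManyToOneScheduler {

    // A hypothetical "user thread": a name plus a sequence of steps to execute.
    static final class ToyThread {
        final String name;
        final Iterator<String> steps;

        ToyThread(String name, String... steps) {
            this.name = name;
            this.steps = Arrays.asList(steps).iterator();
        }
    }

    // Round-robin: run one step of the thread at the head of the run queue,
    // then move it to the back if it may still have work. Returns the trace.
    static List<String> run(List<ToyThread> threads) {
        Deque<ToyThread> runQueue = new ArrayDeque<>(threads);
        List<String> trace = new ArrayList<>();
        while (!runQueue.isEmpty()) {
            ToyThread current = runQueue.pollFirst();
            if (current.steps.hasNext()) {
                trace.add(current.name + ":" + current.steps.next());
                runQueue.addLast(current);
            }
            // A thread with no steps left is simply dropped from the queue.
        }
        return trace;
    }

    public static void main(String[] args) {
        List<ToyThread> threads = Arrays.asList(
                new ToyThread("A", "1", "2", "3"),
                new ToyThread("B", "1"),
                new ToyThread("C", "1", "2"));
        System.out.println(run(threads)); // [A:1, B:1, C:1, A:2, C:2, A:3]
    }
}
```

Only one step runs at a time no matter how many toy threads exist, which is the scalability limit of the M:1 model. It also shows why attributing CPU time to one particular user thread is awkward when the kernel sees a single entity.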

A kernel-level thread (or kernel thread) exists within the kernel. Unlike a user thread, program code has no way of directly interacting with kernel threads. There is usually a one-to-one mapping between kernel threads and LWPs. On some systems, the two concepts may simply be the same. These threads are the kernel-level entities that a scheduler dispatches.

A multithreaded program contains multiple user threads; if the program does not spawn additional threads, there will only be one thread. In this single threaded case, the concept of LWPs and kernel threads (even though present) is virtually hidden.

We are supremely interested in multithreaded processes; the JVM is multithreaded before a Java program even has an opportunity to spawn extra Java threads. In modern JVMs, there is a one-to-one correspondence between Java Threads and native threads. A native thread (often called a user thread) is a thread that is created by the operating system’s thread library as previously described.
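A quick way to see that the JVM is already multithreaded is to ask it for its live threads before the program starts any of its own. The sketch below uses `Thread.getAllStackTraces()`; the class name `JvmThreadLister` is my own. The exact thread names vary by JVM vendor and version. This call only reports threads visible at the Java level, so JVM-internal helpers (garbage collection workers, for example) may not show up.

```java
import java.util.Set;
import java.util.TreeSet;

// Lists the threads the JVM reports before the program creates any itself.
class JvmThreadLister {

    // Returns the names of all live threads visible at the Java level,
    // sorted alphabetically, with daemon threads marked.
    static Set<String> liveThreadNames() {
        Set<String> names = new TreeSet<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName() + (t.isDaemon() ? " (daemon)" : ""));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : liveThreadNames()) {
            System.out.println(name);
        }
    }
}
```

Run from a plain `main` method, the list normally contains more than the `main` thread.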

The exact definition and implementation of these concepts vary significantly between operating systems. The remainder of this article touches on the differences between the popular OSes where Java-based middleware is deployed.

Solaris

In Solaris 9 and above (and OpenSolaris), Sun has implemented a 1:1 threading model. In Solaris 8, the default thread library used an M:N model. In Solaris, LWPs and kernel threads are separate concepts, but a 1-1 mapping between the two exists.

This Sun whitepaper provides a good introduction to Solaris threading.

Linux

Starting in the Linux 2.6 kernel, a new threading library was introduced called NPTL, the Native POSIX Thread Library. Prior to this, the threading implementation had numerous issues that hampered scalability and efficiency; this article summarizes the history of Linux threads nicely.

In Linux, the dispatchable unit has always been the process. Even NPTL threads are still implemented using the same data structures that implement a process. NPTL implements a 1:1 threading model. The 2.6 kernel also introduced a new scheduler with O(1) time complexity: it takes the same amount of time to choose the next thread to schedule on the CPU regardless of whether there are 10 or 1000 threads. The Linux kernel distinguishes between a regular process (seen in top) and a process that represents an LWP mapping to a thread in another process’s address space.
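On Linux, one way to glimpse these per-thread kernel entities is the `/proc/self/task` directory, which holds one entry per task (thread) in the current process. The following Java sketch simply counts those entries. The class name `LinuxTaskCounter` and the convention of returning -1 when the directory is unavailable are my own assumptions, and the code reports nothing useful on other operating systems.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Counts the kernel tasks (threads) that back the current JVM process on Linux.
class LinuxTaskCounter {

    // Returns the number of entries under /proc/self/task, or -1 if that
    // directory cannot be read (for example, on a non-Linux system).
    static long countKernelTasks() {
        Path taskDir = Paths.get("/proc/self/task");
        if (!Files.isDirectory(taskDir)) {
            return -1;
        }
        try (Stream<Path> entries = Files.list(taskDir)) {
            return entries.count();
        } catch (IOException | UncheckedIOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        long tasks = countKernelTasks();
        if (tasks < 0) {
            System.out.println("/proc/self/task is not available on this system");
        } else {
            System.out.println("kernel tasks in this JVM process: " + tasks);
        }
    }
}
```

With a 1:1 model, the count should move up and down as the JVM starts and ends native threads, although it also includes JVM-internal threads that never appear as Java threads.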

This IBM performance tuning guide gives more information about Linux threading.

AIX

On AIX 5.3, the dispatchable unit is a kernel thread.

By default, the AIX user thread library implements an M:N threading model; however, this can be changed by setting an environment variable prior to creating a multithreaded process. This article shows how to configure a 1-1 threading model for a process. By default, the IBM JVM uses a 1-1 threading model.

The LWP construct is absent, but AIX kernel threads track similar data structures that an LWP might possess in another OS.

More information about AIX threads can be found here and there.

Windows

Windows is not a POSIX-compliant operating system; in some cases, it is partially compliant. Projects like Cygwin provide a degree of POSIX compliance. That makes it unique in our discussion of threading libraries.

The dispatchable unit in Windows is the thread. This presentation provides a good overview of Windows thread scheduling.

Windows has similar process and user thread concepts, but the Windows thread API is very different from the POSIX thread API.

Further discussion on the Windows thread model will occur in a future article.

Sun JVM

The relationship between Java threads and native threads is not dictated by the JVM specification. The Virtual Machine implementer is free to implement this mapping as it sees fit.

The Sun JVM maintains a 1-1 relationship between Java threads and native threads. Furthermore, the mapping of any one Java thread and its native thread remains static for the lifetime of the Java Thread.

The Sun documentation provides a good article complete with history of the threading implementation in the Hotspot JVM.

When N Java threads are mapped to a single user thread, this is called Green Threads*, the Sun Java 1.1 threading model. Consequently, N Java threads are mapped to a single LWP or kernel thread as well. This is a more portable threading implementation, but it severely limits the scalability of the application. It also forces the JVM to be the Java thread scheduler instead of relying on the underlying OS scheduler, as modern JVM implementations do.

*Note, some of the information presented regarding Green Threads on the Linux platform predates NPTL.

IBM JVM

The IBM JVM also maintains a 1-1 mapping between Java threads and native threads. Given the AIX details discussed previously, the IBM JVM running on AIX 5.x likewise provides a 1-1 mapping between Java threads and kernel threads that remains static for each Java thread through the life of the Java process.

The history and implementation of the IBM JVM are similar to those of the Sun JVM with regard to threading.

The Journey From Java Thread to Kernel Thread

This entire discussion has brought us from the Java threads that are fed by Java bytecode instructions to the kernel threads (or equivalent mechanisms) that are dispatched by the OS scheduler. On common platforms and JVM implementations, we can generally say that there is a 1-1 mapping between the following concepts:

* Java thread
* Native Thread (or User Thread)
* LWPs (Light Weight Processes)
* Kernel Threads

As we’ve seen, in some cases these concepts are not relevant to the OS under discussion. But, the overall relationship still holds for modern JVMs.

If there is a 1-1 relationship between the dispatchable unit and user threads (i.e., a 1:1 thread architecture), then the threads of a single process can be spread across multiple processor run queues. This means that multiple threads in the same process can be simultaneously executing instructions on different CPUs (true concurrency). If Java threads have a 1-1 mapping to native threads, Java threads can also achieve true concurrency.
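Here is a small Java sketch of what that buys an application: a CPU-bound sum split across several threads, which a 1:1 model allows the OS to run on different CPUs at the same time. The class name `ParallelSum` and the chunking scheme are illustrative assumptions. Whether the threads really run simultaneously depends on the hardware and the OS scheduler, so the code checks only the result, not the speed.

```java
// Splits a CPU-bound sum of 1..n across several Java threads.
class ParallelSum {

    // Sums the integers 1..n using threadCount worker threads,
    // each responsible for one contiguous chunk of the range.
    static long sum(long n, int threadCount) {
        long[] partial = new long[threadCount];
        Thread[] workers = new Thread[threadCount];
        long chunk = n / threadCount;
        for (int i = 0; i < threadCount; i++) {
            final int idx = i;
            final long from = idx * chunk + 1;
            // The last worker also takes any remainder of the range.
            final long to = (idx == threadCount - 1) ? n : (idx + 1) * chunk;
            workers[i] = new Thread(() -> {
                long s = 0;
                for (long v = from; v <= to; v++) {
                    s += v;
                }
                partial[idx] = s; // each worker writes only its own slot
            });
            workers[i].start();
        }
        long total = 0;
        for (int i = 0; i < threadCount; i++) {
            try {
                workers[i].join(); // after join, the worker's write is visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
            total += partial[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs reported by the JVM: " + cpus);
        System.out.println("sum(1..1000000) = " + sum(1_000_000L, cpus));
    }
}
```

Under a Green Threads style model, the same code would still produce the right answer, but all workers would share a single dispatchable unit and could not overlap on multiple CPUs.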

Java Thread Priority & Scheduling

This is hardly an exhaustive discussion of threads. But, one additional concept deserves explicit mention. A cornerstone of OS schedulers is thread priority; this concept is a central decision point for which thread runs next on a CPU. Native threads have a thread priority; Java threads have a thread priority. The JVM spec does not define the relationship between the two priorities. Java thread priorities are based on a scale of one to ten: five is the default priority, one is the lowest, and ten is the highest.
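The Java side of this scale is visible through constants and methods on `java.lang.Thread`. The sketch below (with a class name, `PriorityDemo`, of my own choosing) sets a priority on a thread and reads it back. It says nothing about how a given OS maps that value to a native priority, which, as noted, the specification leaves open.

```java
// Shows the Java priority scale and how a priority is requested for a thread.
class PriorityDemo {

    // Creates a thread (never started), requests the given priority, and
    // returns the priority the thread reports afterwards. The JVM caps the
    // value at the maximum allowed for the thread's group.
    static int requestPriority(int requested) {
        Thread t = new Thread(() -> { });
        t.setPriority(requested); // throws IllegalArgumentException if outside 1..10
        return t.getPriority();
    }

    public static void main(String[] args) {
        System.out.println("MIN_PRIORITY  = " + Thread.MIN_PRIORITY);  // 1
        System.out.println("NORM_PRIORITY = " + Thread.NORM_PRIORITY); // 5
        System.out.println("MAX_PRIORITY  = " + Thread.MAX_PRIORITY);  // 10
        System.out.println("requested 7, got " + requestPriority(7));
    }
}
```

A new thread starts with the priority of the thread that created it, so code that never calls `setPriority` usually runs at the default of five.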

In modern JVM implementations, Java thread priority and scheduling are handled by the underlying operating system. This works efficiently because of the 1-1 mapping between Java threads and the dispatchable unit.

Preemptable

This is a characteristic of thread implementations that means a thread can be interrupted (i.e., taken off the CPU) at almost any arbitrary point in a sequence of instructions. This characteristic is generally present in Java thread, user thread, and kernel thread implementations on modern systems.
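One practical consequence for Java code: because a thread can be taken off the CPU between reading a shared value and writing it back, a plain `count++` on a shared field can lose updates. The sketch below shows one common way to stay correct, using `AtomicLong` from the standard library. The class name `PreemptionSafeCounter` and the thread and iteration counts are arbitrary choices for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Counts with several threads in a way that stays correct under preemption.
class PreemptionSafeCounter {

    // Each of threadCount workers increments a shared counter
    // incrementsPerThread times; returns the final counter value.
    static long countWithAtomic(int threadCount, int incrementsPerThread) {
        AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    // Read-modify-write as one atomic step, so being
                    // preempted mid-loop cannot lose an update.
                    counter.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomic(4, 100_000)); // 400000
    }
}
```

Replacing the `AtomicLong` with an ordinary `long` field and `++` would make the final value unpredictable on a preemptive, multi-CPU system.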

My Conclusions

From the standpoint of building a highly scalable, server-side middleware environment, having Java threads scheduled across multiple run queues (CPUs) is a desirable trait, as just described. Likewise, troubleshooting issues (such as CPU utilization) becomes much easier if a Java thread maps to a unique LWP or kernel thread.
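As a sketch of the troubleshooting side, the standard `java.lang.management.ThreadMXBean` can report CPU time per Java thread on JVMs that support it. The class name `ThreadCpuReport` and the report format are my own. On a JVM where CPU time measurement is unsupported or disabled, the method below simply returns an empty map.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Reports CPU time consumed by each live Java thread, where the JVM supports it.
class ThreadCpuReport {

    // Maps "thread name (id)" to CPU time in milliseconds.
    // Returns an empty map if the JVM cannot measure per-thread CPU time.
    static Map<String, Long> cpuMillisByThread() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> report = new LinkedHashMap<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return report;
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread has ended
            long nanos = mx.getThreadCpuTime(id);   // -1 if unavailable
            if (info != null && nanos >= 0) {
                report.put(info.getThreadName() + " (" + id + ")", nanos / 1_000_000);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        cpuMillisByThread().forEach(
                (name, millis) -> System.out.println(name + ": " + millis + " ms"));
    }
}
```

A report like this is only meaningful per thread because each Java thread has its own native thread underneath. With many Java threads sharing one dispatchable unit, the OS would have a single number for all of them.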

For these two reasons, the current threading models used by JVMs and Operating Systems represent a significant advancement over earlier models.

References

[1] http://java.sun.com/docs/hotspot/threads/threads.html
[2] http://java.sun.com/docs/books/jvms/
[3] http://www.sun.com/software/whitepapers/solaris9/multithread.pdf
[4] http://www.ddj.com/linux-open-source/184406204
[5] http://en.wikipedia.org/wiki/O(1)_scheduler
[6] http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.prftungd/doc/prftungd/thread_tuning.htm
[7] http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/IBMp690/IBM/usr/share/man/info/en_US/a_doc_lib/aixprggd/genprogc/understanding_threads.htm
[8] http://en.wikipedia.org/wiki/POSIX#POSIX_for_Windows
[9] http://www.opengroup.org/onlinepubs/009695399/
[10] http://www.unix.org/apis.html
[11] http://www.opengroup.org/onlinepubs/007908799/xsh/threads.html
[12] http://www.i.u-tokyo.ac.jp/edu/training/ss/lecture/new-documents/Lectures/03-ThreadScheduling/ThreadScheduling.ppt