Concurrency And Server-Side Networking APIs — Part 2

Introduction

In the second part of this series we continue to explore how Unix-like operating systems implement the networking and threading APIs. In particular, we are interested in how the kernel allows server programs to accept incoming connections. Part two explores implementing concurrency in servers written in native code (C). Once again, note that this series explores what is possible and how things work, not best practices.

Calling accept() With Multiple Processes Listening On The Same Endpoint (IP:Port):

If two instances of the server program given in the last article are started with the same port, the second instance will receive an error from the bind() system call:

[a_user@computer server]$ ./server -p 5001 &
[1] 20253
[a_user@computer server]$ ./server -p 5001 &
bind() failed: errno=Address already in use
: Address already in use
[2] 20255

If the server program is modified to fork N children after it has called bind() and listen(), the children inherit the listening file descriptor, so every instance of the server program shares the same socket. The operating system will then load balance incoming connections between the server instances. The multi-process server program, mpserver.c, demonstrates this property. The output below shows the program being started with three server processes, followed by repeated client calls to port 5001 cycling through the mpserver instances. Note that the server identifies itself by returning “server_pid=xxxxx” to the client.

[a_user@computer mpserver]$ ./mpserver -p 5001 -n 3 &

[a_user@computer mpserver]$ ps
PID TTY TIME CMD
20708 pts/1 00:00:00 mpserver
20709 pts/1 00:00:00 mpserver
20710 pts/1 00:00:00 mpserver
20791 pts/1 00:00:00 ps
21905 pts/1 00:00:00 bash

[a_user@computer mpserver]$ telnet localhost 5001
Trying 127.0.0.1…
Connected to localhost.
Escape character is ‘^]’.
connected to server: sd=4,server_pid=20710
Connection closed by foreign host.
[a_user@computer mpserver]$ telnet localhost 5001
Trying 127.0.0.1…
Connected to localhost.
Escape character is ‘^]’.
connected to server: sd=4,server_pid=20709
Connection closed by foreign host.
[a_user@computer mpserver]$ telnet localhost 5001
Trying 127.0.0.1…
Connected to localhost.
Escape character is ‘^]’.
connected to server: sd=4,server_pid=20708
Connection closed by foreign host.
[a_user@computer mpserver]$ telnet localhost 5001
Trying 127.0.0.1…
Connected to localhost.
Escape character is ‘^]’.
connected to server: sd=4,server_pid=20710
Connection closed by foreign host.
[a_user@computer mpserver]$ telnet localhost 5001
Trying 127.0.0.1…
Connected to localhost.
Escape character is ‘^]’.
connected to server: sd=4,server_pid=20709
Connection closed by foreign host.
[a_user@computer mpserver]$ telnet localhost 5001
Trying 127.0.0.1…
Connected to localhost.
Escape character is ‘^]’.
connected to server: sd=4,server_pid=20708
Connection closed by foreign host.

To demonstrate that multiple processes are calling accept at the same time on the same endpoint, strace output was captured for the processes shown above using the following command:

strace -o /tmp/strace1.out -f -p 20708 -p 20709 -p 20710

The output for this command was:

20708 accept(3,
20709 accept(3,
20710 accept(3,

The separate processes can be seen blocking in calls to accept(), waiting for an incoming connection. In the mpserver.c source code, each process passes the same file descriptor value into accept(), which matches the strace output. So there really are three distinct processes listening on the same endpoint at the same time.

In the absence of the multithreaded programming paradigm, this is a relatively easy way to increase server-side throughput, assuming sufficient CPU resources.

It is important to note that no synchronization mechanism is used between the processes that are calling accept() on the endpoint. In this simple example, nothing strictly warrants one. However, any non-trivial application would probably need synchronization at some level.

Calling accept() With Multiple Threads: C

The multithreaded paradigm provides another option for achieving concurrency in network servers. It is possible to have a pool of threads in a single process simultaneously calling accept() on the same endpoint. This is achieved by creating the socket, binding to an endpoint, and listening on that endpoint before creating the threads that call accept(). The socket descriptor is then passed into each thread. Each individual thread (a lightweight process, or LWP) can call the accept() system call at the same time. The operating system will hand requests to the thread pool in much the same way as it does for multiple processes calling accept() on the same endpoint. This is known to work on Linux 2.6.x and Solaris 2.8 and above. Most Unix-like operating systems will behave in a similar fashion.

The first question to ask is whether the networking functions we’ve been working with are thread safe. A white paper from The Open Group, listed in the references at the end of this article, answers this question: the networking functions used in this section are thread safe.

It should be noted that this is not the most scalable way of achieving concurrency in a server-side program. I am writing these examples to demonstrate a very specific point.

An example of a multithreaded server program is tserver.c. This program can be compiled with the following command:

gcc -o tserver tserver.c -lpthread

The server can be run with the following:

./tserver -p 5001 -n 10

The server program will begin listening for incoming requests on port 5001 and spawn ten threads to handle them. Demonstrating the load balancing of requests is left as an exercise for the reader.

To demonstrate that there are ten threads concurrently making accept() system calls, strace was called on the above program. On Linux, the strace command will trace all LWPs in a multithreaded process:

strace -o /tmp/strace1.out -f -p pid

Assuming tserver is at rest (i.e., no incoming requests), the strace output looks similar to the following:

13511 accept(3,
13510 accept(3,
13509 accept(3,
13508 accept(3,
13507 accept(3,
13506 accept(3,
13505 accept(3,
13504 accept(3,
13503 accept(3,
13502 accept(3,
13501 futex(0xb7f5bbd8, FUTEX_WAIT, 13502, NULL

Notice that ten threads are reported in unfinished accept() calls; these LWPs are in a sleep state waiting for an incoming connection. One thread is waiting in a futex() system call, a locking mechanism provided by the Linux 2.6 kernel. This last thread is the main() function, waiting in a call to pthread_join() for the first thread that was created (lwpid=13502) to end.

References

[1] http://www.opengroup.org/onlinepubs/009695399/

[2] http://www.unix.org/version2/whatsnew/networking.html