Virtual Memory — Linux

Introduction

It has been a couple of weeks since the last article was published. Recently, I had to dig into Virtual Memory on AIX to work through some issues, and I realized there were several details I needed to explore further to understand Virtual Memory implementations, in particular the differences between the Virtual Memory Managers on Solaris, AIX, Linux, and Windows. I am starting with the Linux implementation because it is the easiest to access.

This is an attempt at summarizing answers to common questions I found myself asking.

Part I of this series is relevant to the Linux 2.6 kernel. For a brief history of the Linux Kernel, see here. Other OSes will be covered in future articles.

Virtual Memory Managers

A basic introduction to Virtual Memory can be found here.

The details of the Linux Virtual Memory Manager can be found in [3], [4], [5], & [6]. The details will not be rehashed here.

Linux Virtual Memory Q&A

1. How do you determine the Virtual Memory size of a process?
2. How do you identify the Virtual Memory range (and size) of a process’s native heap?
3. What structure does the OS use to store file system data in memory? Often called a page cache?
4. How can you identify the total amount of anonymous memory being used by all user space processes?
5. How do you map each lwpid’s stack to virtual memory range?
6. How do you generate a listing of shared libraries being used and their virtual memory addresses for each process?
7. How do you determine how much virtual memory is currently being used globally by the system?
8. How do you determine how much physical memory the kernel is using?
9. How do you determine how much physical memory is currently being used by a process?
10. What does Linux do with unused physical memory?
11. What is the Virtual Memory Page size or page sizes available on the system?
12. What consumes memory on a running system?
13. How do you determine the amount of swap space being used by a process?
14. How do you determine the amount of swap space being used globally by all processes?
15. How do you determine how much physical memory is being used globally?

1. How do you determine the virtual memory size of a process?
a. Does the OS actually reserve virtual memory?
b. What is the reserved size of virtual memory? How do you determine the size?
c. If virtual memory isn’t reserved, how much virtual memory is currently used?

This can be a fuzzy concept. In most cases, an OS will not allocate a page of physical memory until it is actually needed. This allows the OS to make as efficient use of physical memory as possible. Just because a bit of C code calls malloc() and allocates 10MB of heap space doesn’t mean that 10MB of physical memory is consumed at that point. The virtual memory footprint of the process can reflect those ten megabytes, but in many cases no physical memory backs them until the program actually attempts to access or modify an address in that range. To further obfuscate the answer, part of a process’s virtual memory size includes shared libraries, shared memory segments, and similar constructs that can be shared by multiple processes.

Linux doesn’t reserve physical memory or swap space to back a process’s entire virtual address space. It uses demand paging: a page of virtual memory is not given a physical memory frame until the process attempts to use an address in the page’s range. Top reports a “VIRT” size, which can grow rather large for JVMs. This large size is composed mainly of the JVM Heap, System Heap, thread (native and Java) stack pages, and shared libraries. At the initial creation of a java process with a minimum and maximum memory setting of 1024MB, neither physical memory nor swap space is allocated to back most of the virtual memory the Java process is using; this is why some would call this number meaningless. After the process has been running long enough for most of the Java Heap to be used, the OS will have allocated many Virtual Memory pages to accommodate the Java Heap. At this point, the java process will have either physical memory frames or swap space allocated to back those pages. If the full range of the Java Heap is accessed regularly (imagine a garbage collector routinely scanning the full Java Heap address space), then the Resident Set Size (the RES column in top) would be large. At the other extreme, if the java process starts a HelloWorld program with a small memory footprint and then calls Thread.sleep() for an hour, the java process would have a large virtual memory footprint reported by top, but very little physical memory or swap space would be allocated even after the process has existed for an hour. It can be helpful to know the full Virtual Memory address space size, but it is important to understand that this number doesn’t tell you how much physical memory or swap space a process is currently using.
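The gap between virtual size and resident size can also be read from /proc/pid/status, which reports both as VmSize and VmRSS. The following is a minimal Python sketch rather than a definitive tool; the function names are hypothetical, and demand_paging_demo() assumes it runs on a Linux system with /proc mounted.

```python
import mmap


def parse_status_kb(status_text):
    """Return a dict of the Vm* fields (in kB) found in /proc/<pid>/status text."""
    sizes = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and key.startswith("Vm") and fields and fields[0].isdigit():
            sizes[key] = int(fields[0])
    return sizes


def read_status_kb(pid="self"):
    """Read and parse /proc/<pid>/status (assumes Linux with /proc mounted)."""
    with open("/proc/%s/status" % pid) as f:
        return parse_status_kb(f.read())


def demand_paging_demo(size=10 * 1024 * 1024):
    """Map anonymous memory, then touch it, sampling VmSize/VmRSS at each step."""
    before = read_status_kb()
    region = mmap.mmap(-1, size)      # anonymous mapping: virtual size grows
    mapped = read_status_kb()
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1            # first write to a page forces a frame
    touched = read_status_kb()
    region.close()
    return before, mapped, touched
```

On a live system, comparing the three samples returned by demand_paging_demo() should show VmSize rising as soon as the region is mapped, while VmRSS rises only after the pages are touched.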

2. How do you identify the virtual memory range (and size) of a process’s native heap?

A process’s native heap address range and size can be identified with the following:

[root@taz 8091]# ps -ef | grep slapd
root 8091 1 0 Nov16 ? 00:00:14 /usr/local/libexec/slapd

cat /proc/8091/smaps | less
.
.
.
084f4000-087e8000 rw-p 084f4000 00:00 0 [heap]
Size: 3024 kB
Rss: 2840 kB
Pss: 2840 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 2840 kB
Referenced: 2724 kB

So, the slapd process’s (OpenLDAP) heap starts at 0x084f4000 and has a size of 3024 kB.
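The size reported by smaps is simply the end address minus the start address of the [heap] mapping. As a small Python sketch (find_heap is a hypothetical helper, not part of any standard tool), the same lookup can be done on the text of /proc/pid/maps or /proc/pid/smaps:

```python
def find_heap(maps_text):
    """Return (start, end, size_kb) of the [heap] mapping, or None if absent.

    Accepts the text of /proc/<pid>/maps or /proc/<pid>/smaps; lines that are
    not mapping headers (such as "Size: 3024 kB") are ignored.
    """
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[-1] == "[heap]":
            start_hex, end_hex = fields[0].split("-")
            start, end = int(start_hex, 16), int(end_hex, 16)
            return start, end, (end - start) // 1024
    return None
```

For the slapd mapping above, 0x087e8000 minus 0x084f4000 is 0x2f4000 bytes, which is the 3024 kB that smaps reports.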

3. What structure does the OS use to store file system data in memory? Often called a page cache?

Linux stores pages of memory containing file data in a Page Cache. The amount of memory allocated to the page cache is reported as “cached” in top, vmstat, and similar tools. Linux attempts to use available physical memory efficiently; so, most available physical memory that isn’t being used by the kernel or processes will be given to the Page Cache or Buffer Cache (see below).

File system blocks are stored in a subset of the Page Cache called the Buffer Cache; this is the “buffers” statistic reported in vmstat and top.

4. How can you identify the total amount of anonymous memory being used by all user space processes?

[root@taz 8091]# cat /proc/meminfo | grep Anon
AnonPages: 119496 kB
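When several /proc/meminfo fields are needed at once, it can be easier to load the whole file into a dictionary than to grep for each one. This is a minimal Python sketch; parse_meminfo is a hypothetical helper name.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}.

    Values are in kB, except for count fields such as HugePages_Total,
    which carry no unit in the file.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values
```

On a live system the input would come from open("/proc/meminfo").read(), and the anonymous memory total is then parse_meminfo(text)["AnonPages"].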

5. How do you map each lwpid’s stack to Virtual Memory address range?

For an arbitrary, single-threaded process, the thread stack belonging to the only LWP the process has can be found with the following command:

pmap pid | grep stack

For a multithreaded process, the first LWP’s stack can be identified using the previous command. For any other LWP in the process, do the following to identify the LWP’s stack address:

Identify the thread’s lwpid in /proc/pid/task.

[root@taz task]# ls -l
total 0
dr-xr-xr-x 5 root root 0 2008-12-24 15:51 8091
dr-xr-xr-x 5 root root 0 2008-12-24 17:27 8092
dr-xr-xr-x 5 root root 0 2008-12-24 17:27 8922

Notice, the lwpid 8091 is the same as the process pid. This is the thread that would run the main() function in a C program on Linux.

Start the gdb debugger and run the following commands.

[root@taz 8091]# gdb
GNU gdb Fedora (6.8-17.fc9)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".
(gdb) attach 8091
Attaching to process 8091
Reading symbols from /usr/local/libexec/slapd…(no debugging symbols found)…done.
.
.
.
(gdb) info threads
3 Thread 0xb7d1db90 (LWP 8092) 0x00110416 in __kernel_vsyscall ()
2 Thread 0xb791cb90 (LWP 8922) 0x00110416 in __kernel_vsyscall ()
* 1 Thread 0xb7f1e950 (LWP 8091) 0x00110416 in __kernel_vsyscall ()
(gdb) thread 2
[Switching to thread 2 (Thread 0xb791cb90 (LWP 8922))]#0 0x00110416 in __kernel_vsyscall ()
(gdb) where
#0 0x00110416 in __kernel_vsyscall ()
#1 0x00cccba5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x081353ec in geteuid ()
#3 0x0851810c in ?? ()
#4 0x085180f4 in ?? ()
#5 0x00000000 in ?? ()
(gdb) info frame
Stack level 0, frame at 0xb791c170:
eip = 0x110416 in __kernel_vsyscall; saved eip 0xcccba5
called by frame at 0xb791c190
Arglist at 0xb791c168, args:
Locals at 0xb791c168, Previous frame’s sp is 0xb791c170
Saved registers:
eip at 0xb791c16c
(gdb) quit

Switch to the thread that matches the LWPID of interest. Notice that the debugger gives the current stack frame’s location in virtual memory, in this case 0xb791c170. Now, look in the pmap output for the mapping whose address range contains this address:

pmap 8091
.
.
.
b751c000 4K ----- [ anon ]
.
.
.
total 32072K

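Check that the frame address falls inside the mapping’s range before drawing a conclusion. The segment shown here is an anonymous 4K mapping at 0xb751c000 with no access permissions; a mapping like that is typically the guard page placed next to a thread stack rather than the stack itself, and 0xb791c170 does not fall within a 4K range starting at 0xb751c000. The thread’s stack is the readable and writable anonymous mapping whose range contains the frame address. Since the page size on this system is 4K, a 4K mapping holds exactly one page.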
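Matching a frame address to a mapping by eye is error-prone, so the lookup can be scripted against /proc/pid/maps. The sketch below is a hypothetical Python helper, not a standard tool, and the sample lines in the usage note are made up for illustration because the full pmap output was elided above.

```python
def find_mapping(maps_text, address):
    """Return (start, end, perms) of the mapping containing address, or None.

    maps_text is the text of /proc/<pid>/maps; each mapping line begins with
    "start-end perms ...", with both addresses in hexadecimal.
    """
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or "-" not in fields[0]:
            continue
        start_hex, end_hex = fields[0].split("-", 1)
        try:
            start, end = int(start_hex, 16), int(end_hex, 16)
        except ValueError:
            continue  # not an address range, skip the line
        if start <= address < end:  # the end address is exclusive
            return start, end, fields[1]
    return None
```

For example, given hypothetical lines "b711c000-b711d000 ---p ..." followed by "b711d000-b791d000 rw-p ...", find_mapping(text, 0xb791c170) would select the second, read/write mapping and not the 4K region with no permissions.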

6. How do you generate a listing of shared libraries being used and their virtual memory addresses for each process?

The pmap command can be used to generate a list of shared libraries loaded into a process’s address space:

[root@host task]# pmap 8091 | grep so | grep "r-x--" | awk '{print $4}'
/usr/lib/sasl2/liblogin.so.2.0.22
/usr/lib/sasl2/libanonymous.so.2.0.22
/usr/lib/sasl2/libplain.so.2.0.22
/usr/lib/sasl2/libdigestmd5.so.2.0.22
/usr/lib/sasl2/libsasldb.so.2.0.22
/usr/lib/sasl2/libcrammd5.so.2.0.22
/lib/libresolv-2.8.so
/lib/libcom_err.so.2.1
/usr/lib/libkrb5support.so.0.1
/usr/lib/libgssapi_krb5.so.2.2
/usr/lib/libk5crypto.so.3.1
/lib/libkeyutils-1.2.so
/lib/ld-2.8.so
/lib/libc-2.8.so
/lib/libdl-2.8.so
/lib/libpthread-2.8.so
/lib/libselinux.so.1
/lib/libz.so.1.2.3
/lib/libcrypto.so.0.9.8g
/usr/lib/libkrb5.so.3.3
/lib/libssl.so.0.9.8g
/lib/libcrypt-2.8.so
/usr/lib/libsasl2.so.2.0.22
/lib/libdb-4.6.so

The pmap command can also be used to find the location of each shared library in the process’s virtual address space. Note that each library has a read/execute address range that contains its code and one or more further ranges (read-only or read/write) for its data.

[root@host task]# pmap 8091 | grep so
00111000 16K r-x-- /usr/lib/sasl2/liblogin.so.2.0.22
00115000 4K rw--- /usr/lib/sasl2/liblogin.so.2.0.22
00116000 16K r-x-- /usr/lib/sasl2/libanonymous.so.2.0.22
0011a000 4K rw--- /usr/lib/sasl2/libanonymous.so.2.0.22
0011b000 16K r-x-- /usr/lib/sasl2/libplain.so.2.0.22
0011f000 4K rw--- /usr/lib/sasl2/libplain.so.2.0.22
00120000 48K r-x-- /usr/lib/sasl2/libdigestmd5.so.2.0.22
0012c000 4K rw--- /usr/lib/sasl2/libdigestmd5.so.2.0.22
0012d000 1176K r-x-- /usr/lib/sasl2/libsasldb.so.2.0.22
00253000 8K rw--- /usr/lib/sasl2/libsasldb.so.2.0.22
00255000 16K r-x-- /usr/lib/sasl2/libcrammd5.so.2.0.22
00259000 4K rw--- /usr/lib/sasl2/libcrammd5.so.2.0.22
00a4e000 68K r-x-- /lib/libresolv-2.8.so
00a5f000 4K r---- /lib/libresolv-2.8.so
00a60000 4K rw--- /lib/libresolv-2.8.so
00a65000 8K r-x-- /lib/libcom_err.so.2.1
00a67000 4K rw--- /lib/libcom_err.so.2.1
00a6a000 32K r-x-- /usr/lib/libkrb5support.so.0.1
00a72000 4K rw--- /usr/lib/libkrb5support.so.0.1
00a95000 180K r-x-- /usr/lib/libgssapi_krb5.so.2.2
00ac2000 8K rw--- /usr/lib/libgssapi_krb5.so.2.2
00ac6000 144K r-x-- /usr/lib/libk5crypto.so.3.1
00aea000 4K rw--- /usr/lib/libk5crypto.so.3.1
00af3000 8K r-x-- /lib/libkeyutils-1.2.so
00af5000 4K rw--- /lib/libkeyutils-1.2.so
00b06000 112K r-x-- /lib/ld-2.8.so
00b22000 4K r---- /lib/ld-2.8.so
00b23000 4K rw--- /lib/ld-2.8.so
00b26000 1420K r-x-- /lib/libc-2.8.so
00c89000 8K r---- /lib/libc-2.8.so
00c8b000 4K rw--- /lib/libc-2.8.so
00cbc000 12K r-x-- /lib/libdl-2.8.so
00cbf000 4K r---- /lib/libdl-2.8.so
00cc0000 4K rw--- /lib/libdl-2.8.so
00cc3000 84K r-x-- /lib/libpthread-2.8.so
00cd8000 4K r---- /lib/libpthread-2.8.so
00cd9000 4K rw--- /lib/libpthread-2.8.so
00cde000 104K r-x-- /lib/libselinux.so.1
00cf8000 4K r---- /lib/libselinux.so.1
00cf9000 4K rw--- /lib/libselinux.so.1
00cfc000 76K r-x-- /lib/libz.so.1.2.3
00d0f000 4K rw--- /lib/libz.so.1.2.3
05cfc000 1244K r-x-- /lib/libcrypto.so.0.9.8g
05e33000 80K rw--- /lib/libcrypto.so.0.9.8g
05e4c000 628K r-x-- /usr/lib/libkrb5.so.3.3
05ee9000 12K rw--- /usr/lib/libkrb5.so.3.3
05f24000 284K r-x-- /lib/libssl.so.0.9.8g
05f6b000 16K rw--- /lib/libssl.so.0.9.8g
06036000 36K r-x-- /lib/libcrypt-2.8.so
0603f000 4K r---- /lib/libcrypt-2.8.so
06040000 4K rw--- /lib/libcrypt-2.8.so
068e3000 96K r-x-- /usr/lib/libsasl2.so.2.0.22
068fb000 4K rw--- /usr/lib/libsasl2.so.2.0.22
06954000 1300K r-x-- /lib/libdb-4.6.so
06a99000 12K rw--- /lib/libdb-4.6.so
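Because each library appears on several lines, it can be useful to total the virtual address space each one occupies. The following Python sketch (library_sizes_kb is a hypothetical helper) sums the size column of pmap output per library path. It assumes the four-column "address size perms path" layout shown above, and it skips anonymous mappings because their bracketed name splits into extra fields.

```python
def library_sizes_kb(pmap_text):
    """Sum the mapped size (in kB) per shared library from pmap output."""
    totals = {}
    for line in pmap_text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # header, total line, or "[ anon ]" mapping
        size, path = fields[1], fields[3]
        if size.endswith("K") and size[:-1].isdigit() and ".so" in path:
            totals[path] = totals.get(path, 0) + int(size[:-1])
    return totals
```

For liblogin.so.2.0.22 in the listing above this gives 16K + 4K = 20K of virtual address space.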

7. How do you determine how much virtual memory is currently being used globally by the system?

This is a nebulous question. The answer depends upon how you define “all virtual memory in use”. Let’s define it as all memory that processes have allocated, and that the kernel has therefore promised to back with physical memory or swap, whether or not every page has been touched yet. This is referred to as committed memory.

A rough estimate of how much committed Virtual Memory is being used by the system can be found with the following command:

cat /proc/meminfo | grep Committed_AS

On the test system,

[root@host bin]# cat /proc/meminfo | grep Committed_AS
Committed_AS: 860144 kB
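Committed_AS is easier to interpret next to CommitLimit, which /proc/meminfo also reports. As a hedged sketch, the limit is swap plus a percentage of RAM (excluding huge pages) set by vm.overcommit_ratio; the default ratio of 50 is an assumption here and should be checked on the system in question. The kernel only enforces the limit in strict overcommit mode (vm.overcommit_memory=2). The function names below are hypothetical.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """Approximate CommitLimit: swap plus overcommit_ratio percent of RAM.

    The kernel does this arithmetic in pages, so the figure in
    /proc/meminfo can differ from this result by a few kB.
    """
    return swap_total_kb + (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100


def commit_usage(committed_as_kb, limit_kb):
    """Fraction of the commit limit currently promised to processes."""
    return committed_as_kb / limit_kb
```

With the values from the /proc/meminfo listing later in this article (MemTotal 1025980 kB, SwapTotal 2031608 kB), this yields 2544598 kB, within a few kB of the reported CommitLimit of 2544596 kB. The Committed_AS of 937076 kB in that listing is roughly 37% of the limit.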

8. How do you determine how much physical memory the kernel is using?

The approximate amount of memory being used by the kernel can be found using information given here.

Generally, memory used by the Page Cache and Buffer Cache is not considered memory used by the kernel, though it isn’t really being used by processes either.

This link may also be of interest.

9. How do you determine how much physical memory is currently being used by a process?

Look at the Resident Set Size (RSS): the RES column of top, or the RSS column of “ps aux” output. Note that “ps -elf” reports SZ rather than RSS.

10. What does Linux do with unused physical memory?

Linux attempts to efficiently use all physical memory available to the system. Physical Memory that is not being used by the kernel or processes is mostly allocated to the Page Cache and Buffer Cache. A small amount of memory is reserved (the Free List) to accommodate requests for new memory. This is a different philosophy than something like Solaris 10, which tends to maintain a very large free pool of memory. Be aware of how your OS manages memory resources.

11. What is the Virtual Memory Page size or page sizes available on the system?

The getconf command can provide this information.

[root@host bin]# getconf PAGESIZE
4096
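The page size matters when reading counters that are expressed in pages, such as those in /proc/vmstat. A short Python sketch follows; mmap.PAGESIZE is the standard library’s view of the same value that getconf PAGESIZE prints, and pages_to_kb is a hypothetical helper.

```python
import mmap


def pages_to_kb(pages, page_size=mmap.PAGESIZE):
    """Convert a page count to kB for a given page size in bytes."""
    return pages * page_size // 1024
```

For example, the nr_free_pages value of 4120 shown in the /proc/vmstat listing later in this article corresponds to 16480 kB with 4096-byte pages.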

12. What consumes memory on a running system?

  • Kernel
  • Kernel Caches (for example, Inode Cache/ICache and Dentry Cache/DCache)
  • Page Cache
  • Buffer Cache
  • Anonymous Memory (process heap, thread stacks, text segment, data segment, etc.)
  • Shared Libraries
  • IPC Resources

13. How do you determine the amount of swap space being used by a process?

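Unless additional kernel debugging is enabled, getting an accurate count in the general case would be difficult. Later 2.6 kernels add a Swap field to each mapping in /proc/pid/smaps; where it is present, summing it across all mappings gives the process’s swap usage.

Note, the SWAP column available in top is the total Virtual Memory minus Resident Set Size. It therefore also counts pages that the VMM has not yet allocated, so it overstates the swap space actually in use.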
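Per-process totals can be built by summing one field across every mapping in /proc/pid/smaps. The sketch below is a minimal Python illustration with a hypothetical function name. Summing "Swap" assumes a kernel whose smaps output includes that field (the smaps sample earlier in this article does not show one); fields such as Rss and Pss can be summed the same way.

```python
def sum_smaps_field_kb(smaps_text, field):
    """Sum one per-mapping field (for example "Rss" or "Swap") over smaps text.

    Returns 0 when the field never appears, which is also what a kernel
    without that field would produce.
    """
    prefix = field + ":"
    total = 0
    for line in smaps_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == prefix and parts[1].isdigit():
            total += int(parts[1])
    return total
```

On a live system the input would be the contents of /proc/pid/smaps for the process of interest, which usually requires the same user or root to read.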

14. How do you determine the amount of swap space being used globally by all processes?

Swap used can be found with top, vmstat, /proc/meminfo or other tools that are described below.

15. How do you determine how much physical memory is being used globally?

The amount of Physical Memory used can be found using top, vmstat, /proc/meminfo or other tools that are described below.

Monitoring Linux Memory Utilization

The most widely known tool for monitoring a Linux system is top. Below is the fragment of top output relevant to virtual memory.

top - 00:07:46 up 64 days, 2:49, 5 users, load average: 0.00, 0.00, 0.00
Tasks: 182 total, 1 running, 181 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1025980k total, 1011040k used, 14940k free, 153324k buffers
Swap: 2031608k total, 48684k used, 1982924k free, 465268k cached

The bottom two lines are relevant to our discussion. This system has a total of 1GB of physical memory. Currently, 1011040KB are in use and 14940KB are free. Furthermore, 153324KB are allocated to the Buffer Cache (buffers) and 465268KB are used by the Page Cache (cached). This system has 2GB of swap space; only 48684KB are in use. A very small amount of swap space in use is generally a good thing.

The vmstat command can be used to show swapping activity (i.e., pages brought into physical memory from swap, si, or pages removed from physical memory to swap, so). The following output was produced by running “vmstat 1”. There is no swap activity whatsoever (the si and so columns under swap). This is good.

[root@host proc]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 48680 13720 250892 327024 0 0 96 83 3 3 1 1 96 3 0
0 0 48680 13472 250892 327284 0 0 256 0 232 355 0 1 98 1 0
0 0 48680 13332 250900 327376 0 0 128 16 308 484 0 1 99 0 0
0 0 48680 13208 250900 327512 0 0 128 0 173 322 0 0 100 0 0
0 0 48680 12976 250900 327796 0 0 256 0 222 440 0 0 99 1 0
0 0 48680 12836 250900 327896 0 0 128 0 181 316 0 1 99 0 0
0 0 48680 14632 249768 327896 0 0 256 0 247 465 0 0 100 0 0
0 0 48680 14432 249768 328016 0 0 128 0 173 330 0 0 100 0 0
0 0 48680 14324 249768 328132 0 0 128 0 206 428 0 1 99 0 0
0 0 48680 14076 249776 328384 0 0 264 0 294 381 0 1 97 2 0
0 0 48680 14060 249776 328412 0 0 0 0 284 435 0 0 100 0 0
0 0 48680 13828 249776 328592 0 0 204 0 239 373 0 1 98 1 0
0 0 48680 13452 249776 329104 0 0 472 0 510 589 0 1 98 1 0
0 0 48680 13080 249776 329476 0 0 384 0 446 452 1 0 97 2 0
0 0 48680 14720 248624 329584 0 0 372 0 525 598 0 1 98 1 0
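Scanning the si and so columns by eye works for a short capture, but a script is handier for long runs. The following is a small Python sketch (swap_activity is a hypothetical helper) that locates the two columns from the header row, so it does not depend on their position.

```python
def swap_activity(vmstat_text):
    """Return a list of (si, so) pairs, one per data row of vmstat output."""
    columns = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            columns = fields  # header row naming each column
            continue
        if columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            row = dict(zip(columns, (int(f) for f in fields)))
            samples.append((row["si"], row["so"]))
    return samples
```

Any sample where si or so is non-zero indicates paging to or from swap during that interval; for the capture above every pair is (0, 0).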

Another useful tool for monitoring Linux memory is the free command. Output from the same system as the previous data is below. The free command provides a more succinct version of the data presented by top. This command has the advantage of showing used and free Physical Memory both with and without the Buffer Cache and Page Cache memory taken into account. Physical Memory allocated to these two caches can be reallocated, quickly, to user space processes. So, from a certain standpoint, measuring physical memory without the caches makes sense.

[root@host task]# free -tm
             total       used       free     shared    buffers     cached
Mem:          1001        991          9          0        451         81
-/+ buffers/cache:        458        543
Swap:         1983         47       1936
Total:        2985       1039       1946

This system has 1001MB of physical memory; 991MB are currently in use, if you count the 451MB allocated for the Buffer Cache and 81MB allocated for the Page Cache as used memory. If you don’t count these caches as used memory, there are 458MB in use. Likewise, there are either 9MB or 543MB free, depending on how the Page Cache and Buffer Cache are counted. The familiar swap numbers are also reported (1983MB (Total) = 47MB (Used) + 1936MB (Free)).
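The “-/+ buffers/cache” row is simple arithmetic on four numbers. As a Python sketch (free_summary_kb is a hypothetical helper), here is how the two views of used and free memory are derived:

```python
def free_summary_kb(total, free, buffers, cached):
    """Compute used/free memory with and without the caches counted as used."""
    used = total - free
    caches = buffers + cached
    return {
        "used": used,                        # caches counted as used
        "free": free,
        "used_minus_caches": used - caches,  # the "-" half of -/+ buffers/cache
        "free_plus_caches": free + caches,   # the "+" half of -/+ buffers/cache
    }
```

Applied to the top output shown earlier (1025980k total, 14940k free, 153324k buffers, 465268k cached), this gives 1011040k used, which matches top, and 392448k used and 633532k free once the caches are treated as reclaimable.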

The /proc filesystem provides basic statistics about a system’s memory usage. Tools like top and vmstat use this data to display their system-wide memory usage information.

[root@host task]# cat /proc/meminfo
MemTotal: 1025980 kB
MemFree: 12916 kB
Buffers: 467144 kB
Cached: 70124 kB
SwapCached: 13868 kB
Active: 336384 kB
Inactive: 382292 kB
HighTotal: 122048 kB
HighFree: 224 kB
LowTotal: 903932 kB
LowFree: 12692 kB
SwapTotal: 2031608 kB
SwapFree: 1982924 kB
Dirty: 310296 kB
Writeback: 0 kB
AnonPages: 180992 kB
Mapped: 40412 kB
Slab: 221368 kB
SReclaimable: 208812 kB
SUnreclaim: 12556 kB
PageTables: 5480 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 2544596 kB
Committed_AS: 937076 kB
VmallocTotal: 110584 kB
VmallocUsed: 5668 kB
VmallocChunk: 104600 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 4096 kB

For detailed information about what these numbers mean, see here. /proc/vmstat also contains detailed information about the Virtual Memory subsystem of a running Linux system.

[root@host proc]# cat vmstat
nr_free_pages 4120
nr_inactive 88112
nr_active 88917
nr_anon_pages 47846
nr_mapped 10103
nr_file_pages 132557
nr_dirty 13377
nr_writeback 0
nr_slab_reclaimable 53784
nr_slab_unreclaimable 3100
nr_page_table_pages 1368
nr_unstable 0
nr_bounce 0
nr_vmscan_write 764459
pgpgin 530590528
pgpgout 458650507
pswpin 56956
pswpout 110857
pgalloc_dma 7137019
pgalloc_normal 380129111
pgalloc_high 23896564
pgalloc_movable 0
pgfree 411167006
pgactivate 55259924
pgdeactivate 54620998
pgfault 329207726
pgmajfault 60938
pgrefill_dma 2549147
pgrefill_normal 111061847
pgrefill_high 7366276
pgrefill_movable 0
pgsteal_dma 3172910
pgsteal_normal 163507118
pgsteal_high 3891553
pgsteal_movable 0
pgscan_kswapd_dma 4303761
pgscan_kswapd_normal 165197526
pgscan_kswapd_high 4095892
pgscan_kswapd_movable 0
pgscan_direct_dma 723
pgscan_direct_normal 26336
pgscan_direct_high 512
pgscan_direct_movable 0
pginodesteal 0
slabs_scanned 2466235264
kswapd_steal 170547397
kswapd_inodesteal 6521849
pageoutrun 3036257
allocstall 333
pgrotated 248259

See here for more information about what these numbers mean.

Tuning

A good article regarding Linux Virtual Memory tuning can be found here.

Reference

[1] http://en.wikipedia.org/wiki/Virtual_memory

[2] http://www.linuxweblog.com/meminfo

[3] http://www.halobates.de/memory.pdf

[4] http://www.linux-security.cn/ebooks/ulk3-html/

[5] http://www.redhat.com/magazine/001nov04/features/vm/

[6] http://www.informit.com/content/images/0131453483/downloads/gorman_book.pdf

[7] http://www.ibm.com/developerworks/linux/library/l-linux-kernel/

[8] http://www.dba-oracle.com/t_tuning_linux_kernel_2_6_oracle.htm