Linux memory hotplug (hotremoval, strictly speaking)

Warning: This page is outdated and no longer maintained.


Scope of this document

This document is about adding memory hotplug functionality to Linux. Hotplug, in general, means adding and removing devices while the target system is running. In the case of memory, removal is the challenging part because all the memory of the module being removed must be freed first. I'll exclusively explain how memory can be freed. Other items, such as managing memory configuration data (pgdat, zone_table, etc.), are also integral but are outside the scope of this document.

Overview of the patch

The purpose of this patch is to demonstrate and test data purging from a memory zone without special hardware requirements. It splits highmem into chunks (whose sizes are set by a kernel config option) at boot time. The following operations are possible for each zone.

/proc/memhotplug interface

Note that older revisions have a different interface. There are a few more commands for debugging.

Page remapping operation

The remap operation does the following to every page of a zone.

  1. allocate newpage
  2. lock newpage
  3. modify oldpage entry in the corresponding radix tree with newpage
  4. clear all PTEs that refer to oldpage
  5. memcpy(newpage, oldpage, PAGE_SIZE)
  6. set uptodate flag of newpage
  7. unlock newpage and wakeup waiters

The key point is to block accesses to the page under operation by modifying the radix tree. After the radix tree has been modified, no new access goes to oldpage. Accesses to newpage are blocked until the data is ready, because newpage is locked and !uptodate.
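
The ordering of the seven steps can be illustrated with a small userspace model. This is only a sketch, not code from the patch: `struct sim_page`, `struct sim_mapping`, and `sim_remap()` are hypothetical names, a single pointer slot stands in for the radix tree entry, a small array stands in for the PTEs, and plain booleans stand in for the page lock and the uptodate flag. No real locking or waiter wakeup is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096
#define SIM_MAX_PTES  8

/* Hypothetical userspace stand-in for struct page (not kernel code). */
struct sim_page {
    bool locked;
    bool uptodate;
    unsigned char data[SIM_PAGE_SIZE];
};

/* One page cache slot (stands in for a radix tree entry) and the
 * PTEs that may refer to the page stored in it. */
struct sim_mapping {
    struct sim_page *slot;
    struct sim_page *ptes[SIM_MAX_PTES];
};

/* Replace the page in m->slot with a fresh copy, following the step
 * order of the list above. Returns the new page, or NULL if the
 * allocation fails (in which case nothing has been changed). */
static struct sim_page *sim_remap(struct sim_mapping *m)
{
    struct sim_page *oldpage = m->slot;
    struct sim_page *newpage = calloc(1, sizeof(*newpage)); /* 1. allocate */

    if (newpage == NULL)
        return NULL;

    newpage->locked = true;                         /* 2. lock newpage */
    m->slot = newpage;                              /* 3. swap the lookup entry */
    for (int i = 0; i < SIM_MAX_PTES; i++)          /* 4. clear PTEs of oldpage */
        if (m->ptes[i] == oldpage)
            m->ptes[i] = NULL;
    memcpy(newpage->data, oldpage->data, SIM_PAGE_SIZE); /* 5. copy the data */
    newpage->uptodate = true;                       /* 6. mark data valid */
    newpage->locked = false;                        /* 7. unlock (wakeup not modeled) */
    return newpage;
}
```

Between steps 3 and 7 any lookup through `m->slot` finds a page that is locked and not yet uptodate, which is the state the paragraph above relies on to hold off readers until the copy is finished.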

In some cases, a remap operation needs to be rolled back and retried later. This is a bit tricky because it is likely that some processes have already looked up the radix tree and are waiting for the lock on newpage. Such processes need to discard newpage and look up the radix tree again, as newpage is now invalid. To achieve this, I defined a new page flag (PG_again).

  1. Roll back the radix tree change.
  2. Set the PG_again bit of newpage and unlock it.
  3. Woken-up processes see the PG_again bit and look up the radix tree again.
  4. Wait until the page count of newpage falls to 1 (the remaining reference is held by the remapd process).
  5. Roll back is complete. newpage can be freed.
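
The roll back protocol can be modeled in the same spirit. Again this is a single-threaded sketch under stated assumptions, not the patch itself: `struct rb_page`, `struct rb_cache`, `rb_rollback()`, `rb_waiter_wakeup()`, and `rb_can_free()` are hypothetical names, the `again` field stands in for PG_again, and `count` stands in for the page reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct rb_page {
    bool locked;
    bool again;   /* stands in for the PG_again bit */
    int count;    /* stands in for the page reference count */
};

/* Single slot standing in for the radix tree entry. */
struct rb_cache {
    struct rb_page *slot;
};

/* Steps 1 and 2: restore the old entry, mark newpage as stale, and
 * unlock it so that the waiters can run. */
static void rb_rollback(struct rb_cache *c, struct rb_page *oldpage,
                        struct rb_page *newpage)
{
    c->slot = oldpage;
    newpage->again = true;
    newpage->locked = false;
}

/* Step 3: what a waiter does once it is woken up while holding a
 * reference to `page`. A stale page is dropped and the lookup is
 * repeated. Returns the page the waiter ends up using. */
static struct rb_page *rb_waiter_wakeup(struct rb_cache *c,
                                        struct rb_page *page)
{
    while (page->again) {
        page->count--;      /* discard the stale page */
        page = c->slot;     /* look up the entry again */
        page->count++;
    }
    return page;
}

/* Steps 4 and 5: newpage can be freed once only the remapper's own
 * reference remains. */
static bool rb_can_free(const struct rb_page *newpage)
{
    return newpage->again && newpage->count == 1;
}
```

In this model, a remapper that rolls back with two waiters still holding references sees `rb_can_free()` return false until both waiters have gone through `rb_waiter_wakeup()` and moved over to oldpage.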


Patches are against linux-2.6.7.

The main patches.

Older versions.

Continuously remap pages between zones.
Unlinks files while extracting from a tarball. Good VM subsystem exercise.

Link to Takahashi's HugeTLBfs page handling patch.

Link to swapout based hotremoval investigation report.

Known issues

In rare cases, bad page states occur and crash the kernel, possibly because of an interaction among kswapd, remapd, and dirty page writeback. This is under investigation.

PAE is not supported.

IWAMOTO Toshihiro <iwamoto at valinux...>

$Id: mh.html,v 1.14 2004/07/13 02:14:13 toshii Exp $