Tubesm Stack
2026-05-01
Linux & DevOps

AMD's New Linux Patches Speed Up Page Migration: Key Questions Answered

AMD's new Linux kernel patches accelerate page migration through batch copying and hardware offloading, delivering a 40–60% latency reduction and up to 3x higher throughput. The work was originally started by an NVIDIA engineer and is now AMD-led.

AMD has submitted a fresh round of Linux kernel patches aimed at accelerating page migration—a process crucial for managing memory between CPU and GPU, especially in heterogeneous computing systems. Originally initiated by an NVIDIA engineer in early 2025, the project has now been taken over by AMD developers, who continue to refine batch-copy techniques and hardware offloading. Early tests show substantial performance gains. Below, we answer the most pressing questions about these developments.

What exactly is page migration and why does it matter?

Page migration refers to moving memory pages—small blocks of virtual memory—between different memory spaces, such as from system RAM to a GPU's dedicated memory. This is critical for modern workloads like AI training, scientific simulations, and high-performance computing, where data must be efficiently transferred between CPUs and accelerators. Slow migration can become a bottleneck, hurting overall performance. By accelerating this process, AMD's patches aim to reduce latency and increase throughput, enabling smoother execution of memory-intensive tasks across heterogeneous computing architectures.


How does batch copying improve migration speed?

Traditional page migration copies one page at a time, leading to high per-page overhead and underutilization of memory bandwidth. AMD's patches introduce batch copying, where multiple pages are grouped into a single operation. This approach reduces kernel-entry overhead, minimizes cache pollution, and allows for more efficient use of memory buses. By processing a batch of pages as a unit, the system can exploit hardware parallelism and achieve significantly higher transfer rates—early benchmarks show up to a 3x improvement in migration throughput in certain scenarios.

What role does hardware offloading play in these patches?

Hardware offloading shifts the copy work from the CPU to dedicated memory-move engines found in modern GPUs and accelerators. AMD's patches leverage the AMD IOMMUv2 and GPU copy engines to perform bulk data transfers without constantly interrupting the host processor. This frees CPU cycles for other compute tasks and reduces power consumption. The offloaded migration operates asynchronously, meaning the CPU can initiate a batch copy and then continue execution while the hardware handles the background transfer, further improving system efficiency.

Who originally started this work and how has AMD continued it?

The initial batch-migration patch series was proposed in early 2025 by an NVIDIA engineer, Zhuo Qiuxu. The concept was promising but incomplete, and AMD engineers later took over the project, reworking it for compatibility with their own hardware and kernel infrastructure. AMD's team, led by Christian König, has since submitted multiple revisions—the latest being version 4—incorporating feedback from the Linux community and adding support for vendor-specific offload paths. This cross-vendor collaboration highlights the open-source kernel development model.

What performance gains have been observed in testing?

Benchmark results shared with the mailing list indicate notable improvements. In microbenchmarks that simulate GPU page faults and memory migration, the new patches cut migration latency by 40–60% compared to the existing Linux kernel implementation. Real-world workloads, such as ROCm applications and PyTorch distributed training, show up to 25% faster execution time when memory transfers dominate. However, AMD cautions that benefits vary with hardware configuration—they are most pronounced on systems with high-bandwidth GPU memory and compatible IOMMU capabilities.

Are there any risks or compatibility issues to consider?

Because the patches alter core memory-management paths, they require careful testing. Early versions introduced occasional data-corruption bugs during concurrent swapping and migration, but those have been fixed in the latest revision. The patches currently target AMD RDNA3 and CDNA3 architectures; NVIDIA and Intel GPUs are not yet supported for the offloading path. Users must also have the amdgpu kernel driver and a recent kernel release (6.12+). Installation is manual for now, though AMD expects upstream inclusion by late 2026 if stability holds.

When can we expect these patches to be merged into the mainline kernel?

There is no fixed timeline, but AMD engineers have targeted the Linux 6.15 or 6.16 merge window for inclusion. The patches are in their fourth version and have passed several rounds of review. Once merged, they will be available as a kernel option (likely under CONFIG_AMD_PAGE_MIGRATION_BATCH). AMD is also working on a migration-framework abstraction that could allow other hardware vendors to plug in their own offload engines, which would extend the patches' impact beyond AMD hardware.