[PATCH] fix NUMA interleaving for huge pages
Nishanth Aravamudan [Fri, 1 Sep 2006 04:27:53 +0000 (21:27 -0700)]
Since vma->vm_pgoff is in units of smallpages, VMAs for huge pages have the
lower HPAGE_SHIFT - PAGE_SHIFT bits always cleared, which results in badd
offsets to the interleave functions.  Take this difference from small pages
into account when calculating the offset.  This does add a 0-bit shift into
the small-page path (via alloc_page_vma()), but I think that is negligible.
 Also add a BUG_ON to prevent the offset from growing due to a negative
right-shift, which probably shouldn't be allowed anyways.

Tested on an 8-memory node ppc64 NUMA box and got the interleaving I
expected.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

mm/mempolicy.c

index e07e27e..a9963ce 100644 (file)
@@ -1176,7 +1176,15 @@ static inline unsigned interleave_nid(struct mempolicy *pol,
        if (vma) {
                unsigned long off;
 
-               off = vma->vm_pgoff;
+               /*
+                * for small pages, there is no difference between
+                * shift and PAGE_SHIFT, so the bit-shift is safe.
+                * for huge pages, since vm_pgoff is in units of small
+                * pages, we need to shift off the always 0 bits to get
+                * a useful offset.
+                */
+               BUG_ON(shift < PAGE_SHIFT);
+               off = vma->vm_pgoff >> (shift - PAGE_SHIFT);
                off += (addr - vma->vm_start) >> shift;
                return offset_il_node(pol, vma, off);
        } else