oom: task->mm == NULL doesn't mean the memory was freed
Oleg Nesterov [Sat, 30 Jul 2011 14:35:02 +0000 (16:35 +0200)]
exit_mm() sets ->mm = NULL first, and only then calls
mmput()->exit_mmap(), which actually frees the memory.

However, select_bad_process() checks ->mm != NULL before TIF_MEMDIE,
so it skips an oom-killed task that is still freeing its memory and
goes on to kill other tasks.

Change select_bad_process() to check ->mm after TIF_MEMDIE, but skip
tasks which have already passed exit_notify(), so that a zombie with
TIF_MEMDIE set can't block the oom-killer. Alternatively, we could
probably clear TIF_MEMDIE after exit_mmap().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
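
The effect of reordering the checks can be illustrated with a small
userspace model. This is only a sketch: struct model_task, enum verdict,
classify_old(), classify_new() and model_selftest() are hypothetical names
invented for illustration, not kernel code, and the oom_unkillable_task()
and PF_EXITING checks are left out. ABORT_SCAN stands in for the
"return ERR_PTR(-1UL)" path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; this is not the kernel's task_struct. */
struct model_task {
	bool has_mm;      /* stands in for p->mm != NULL */
	int  exit_state;  /* nonzero once the task has passed exit_notify() */
	bool memdie;      /* stands in for TIF_MEMDIE */
};

enum verdict { SKIP_TASK, ABORT_SCAN, CANDIDATE };

/* Check order before the patch: ->mm is tested first. */
static enum verdict classify_old(const struct model_task *p)
{
	if (!p->has_mm)
		return SKIP_TASK;
	if (p->memdie)
		return ABORT_SCAN;
	return CANDIDATE;
}

/* Check order after the patch: exit_state, then TIF_MEMDIE, then ->mm. */
static enum verdict classify_new(const struct model_task *p)
{
	if (p->exit_state)
		return SKIP_TASK;
	if (p->memdie)
		return ABORT_SCAN;
	if (!p->has_mm)
		return SKIP_TASK;
	return CANDIDATE;
}

/* Walks the three cases discussed in the changelog. */
static void model_selftest(void)
{
	/* Oom-killed task between "->mm = NULL" and the end of exit_mmap(). */
	struct model_task freeing = { .has_mm = false, .exit_state = 0, .memdie = true };
	/* Zombie that still has TIF_MEMDIE set. */
	struct model_task zombie  = { .has_mm = false, .exit_state = 1, .memdie = true };
	/* Ordinary live task. */
	struct model_task normal  = { .has_mm = true,  .exit_state = 0, .memdie = false };

	/* Old order: the freeing task is skipped, so the scan picks another victim. */
	assert(classify_old(&freeing) == SKIP_TASK);
	/* New order: the scan is aborted while the victim is still freeing memory. */
	assert(classify_new(&freeing) == ABORT_SCAN);
	/* New order: a zombie with TIF_MEMDIE cannot block the scan. */
	assert(classify_new(&zombie) == SKIP_TASK);
	/* A normal task is a candidate under both orders. */
	assert(classify_old(&normal) == CANDIDATE);
	assert(classify_new(&normal) == CANDIDATE);
}
```

In this model the only behavioural difference for a live task is the
"freeing" case, which is the window the changelog describes.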

mm/oom_kill.c

index eafff89..626303b 100644
@@ -303,7 +303,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
        do_each_thread(g, p) {
                unsigned int points;
 
-               if (!p->mm)
+               if (p->exit_state)
                        continue;
                if (oom_unkillable_task(p, mem, nodemask))
                        continue;
@@ -319,6 +319,8 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
                 */
                if (test_tsk_thread_flag(p, TIF_MEMDIE))
                        return ERR_PTR(-1UL);
+               if (!p->mm)
+                       continue;
 
                if (p->flags & PF_EXITING) {
                        /*