[PATCH] stop_machine() vs. synchronous IPI send deadlock
Kirill Korotaev [Mon, 14 Nov 2005 00:07:30 +0000 (16:07 -0800)]
This fixes a deadlock between stop_machine() and a synchronous IPI send.
The problem is that stop_machine() disables interrupts on the calling CPU
before preemption has been disabled on the other CPUs.  If a task on
another CPU is preempted and something like flush_tlb_all() is then called
there, that CPU deadlocks against the CPU running stop_machine(), which
cannot process the IPI because its IRQs are disabled.

I changed stop_machine() to follow exactly the same sequence on the calling
CPU as on the other CPUs, i.e.  preemption is disabled first on _all_ CPUs,
including the calling one, and only after that are IRQs disabled.
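
The resulting order on the calling CPU can be sketched as a small userspace
model.  This is only an illustration: the enum, step_trace and the model_*
functions are hypothetical names, and each step merely records which kernel
call from the patch it stands for.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only: each step is a stand-in for the kernel call named
 * in its comment.  Nothing here touches real preemption or IRQ state.
 */
enum step {
	PREEMPT_OFF,		/* preempt_disable() */
	STATE_PREPARE,		/* stopmachine_set_state(STOPMACHINE_PREPARE) */
	IRQ_OFF,		/* local_irq_disable() */
	STATE_DISABLE_IRQ,	/* stopmachine_set_state(STOPMACHINE_DISABLE_IRQ) */
	STATE_EXIT,		/* stopmachine_set_state(STOPMACHINE_EXIT) */
	IRQ_ON,			/* local_irq_enable() */
	PREEMPT_ON		/* preempt_enable_no_resched() */
};

#define TRACE_MAX 16
enum step step_trace[TRACE_MAX];
size_t step_trace_len;

/* Append one step to the trace so the ordering can be inspected. */
void record(enum step s)
{
	assert(step_trace_len < TRACE_MAX);
	step_trace[step_trace_len++] = s;
}

/* Order of the calling CPU's steps in stop_machine() after the patch. */
void model_stop_machine(void)
{
	record(PREEMPT_OFF);
	record(STATE_PREPARE);	/* IRQs are still enabled during this step */
	record(IRQ_OFF);
	record(STATE_DISABLE_IRQ);
}

/* Order of the steps in restart_machine() after the patch. */
void model_restart_machine(void)
{
	record(STATE_EXIT);
	record(IRQ_ON);
	record(PREEMPT_ON);
}
```

The point of the model is that IRQ_OFF is recorded only after STATE_PREPARE,
so the calling CPU can still process an IPI while the other CPUs are being
brought to the prepared state.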

Signed-off-by: Kirill Korotaev <dev@sw.ru>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Andrey Savochkin" <saw@sawoct.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

kernel/stop_machine.c

index 84a9d18..b3d4dc8 100644
@@ -119,13 +119,12 @@ static int stop_machine(void)
                return ret;
        }
 
-       /* Don't schedule us away at this point, please. */
-       local_irq_disable();
-
        /* Now they are all started, make them hold the CPUs, ready. */
+       preempt_disable();
        stopmachine_set_state(STOPMACHINE_PREPARE);
 
        /* Make them disable irqs. */
+       local_irq_disable();
        stopmachine_set_state(STOPMACHINE_DISABLE_IRQ);
 
        return 0;
@@ -135,6 +134,7 @@ static void restart_machine(void)
 {
        stopmachine_set_state(STOPMACHINE_EXIT);
        local_irq_enable();
+       preempt_enable_no_resched();
 }
 
 struct stop_machine_data