slub: drop mutex before deleting sysfs entry
Glauber Costa [Tue, 18 Dec 2012 22:23:13 +0000 (14:23 -0800)]
Sasha Levin recently reported a lockdep problem resulting from the new
attribute propagation introduced by kmemcg series.  In short, slab_mutex
will be called from within the sysfs attribute store function.  This will
create a dependency, that will later be held backwards when a cache is
destroyed - since destruction occurs with the slab_mutex held, and then
calls in to the sysfs directory removal function.

In this patch, I propose to adopt a strategy close to what
__kmem_cache_create does before calling sysfs_slab_add, and release the
lock before the call to sysfs_slab_remove.  This is pretty much the last
operation in the kmem_cache_shutdown() path, so we could do better by
splitting this and moving this call alone to later on.  This will fit
nicely when sysfs handling is consistent between all caches, but will look
weird now.

Lockdep info:

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.7.0-rc4-next-20121106-sasha-00008-g353b62f #117 Tainted: G        W
  -------------------------------------------------------
  trinity-child13/6961 is trying to acquire lock:
   (s_active#43){++++.+}, at:  sysfs_addrm_finish+0x31/0x60

  but task is already holding lock:
   (slab_mutex){+.+.+.}, at:  kmem_cache_destroy+0x22/0xe0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:
  -> #1 (slab_mutex){+.+.+.}:
          lock_acquire+0x1aa/0x240
          __mutex_lock_common+0x59/0x5a0
          mutex_lock_nested+0x3f/0x50
          slab_attr_store+0xde/0x110
          sysfs_write_file+0xfa/0x150
          vfs_write+0xb0/0x180
          sys_pwrite64+0x60/0xb0
          tracesys+0xe1/0xe6
  -> #0 (s_active#43){++++.+}:
          __lock_acquire+0x14df/0x1ca0
          lock_acquire+0x1aa/0x240
          sysfs_deactivate+0x122/0x1a0
          sysfs_addrm_finish+0x31/0x60
          sysfs_remove_dir+0x89/0xd0
          kobject_del+0x16/0x40
          __kmem_cache_shutdown+0x40/0x60
          kmem_cache_destroy+0x40/0xe0
          mon_text_release+0x78/0xe0
          __fput+0x122/0x2d0
          ____fput+0x9/0x10
          task_work_run+0xbe/0x100
          do_exit+0x432/0xbd0
          do_group_exit+0x84/0xd0
          get_signal_to_deliver+0x81d/0x930
          do_signal+0x3a/0x950
          do_notify_resume+0x3e/0x90
          int_signal+0x12/0x17

  other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(slab_mutex);
                                 lock(s_active#43);
                                 lock(slab_mutex);
    lock(s_active#43);

   *** DEADLOCK ***

  2 locks held by trinity-child13/6961:
   #0:  (mon_lock){+.+.+.}, at:  mon_text_release+0x25/0xe0
   #1:  (slab_mutex){+.+.+.}, at:  kmem_cache_destroy+0x22/0xe0

  stack backtrace:
  Pid: 6961, comm: trinity-child13 Tainted: G        W    3.7.0-rc4-next-20121106-sasha-00008-g353b62f #117
  Call Trace:
    print_circular_bug+0x1fb/0x20c
    __lock_acquire+0x14df/0x1ca0
    lock_acquire+0x1aa/0x240
    sysfs_deactivate+0x122/0x1a0
    sysfs_addrm_finish+0x31/0x60
    sysfs_remove_dir+0x89/0xd0
    kobject_del+0x16/0x40
    __kmem_cache_shutdown+0x40/0x60
    kmem_cache_destroy+0x40/0xe0
    mon_text_release+0x78/0xe0
    __fput+0x122/0x2d0
    ____fput+0x9/0x10
    task_work_run+0xbe/0x100
    do_exit+0x432/0xbd0
    do_group_exit+0x84/0xd0
    get_signal_to_deliver+0x81d/0x930
    do_signal+0x3a/0x950
    do_notify_resume+0x3e/0x90
    int_signal+0x12/0x17

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/slub.c

index efe2cff..ba2ca53 100644 (file)
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3153,8 +3153,19 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 {
        int rc = kmem_cache_close(s);
 
-       if (!rc)
+       if (!rc) {
+               /*
+                * We do the same lock strategy around sysfs_slab_add, see
+                * __kmem_cache_create. Because this is pretty much the last
+                * operation we do and the lock will be released shortly after
+                * that in slab_common.c, we could just move sysfs_slab_remove
+                * to a later point in common code. We should do that when we
+                * have a common sysfs framework for all allocators.
+                */
+               mutex_unlock(&slab_mutex);
                sysfs_slab_remove(s);
+               mutex_lock(&slab_mutex);
+       }
 
        return rc;
 }