kill-the-BKL/reiserfs: only acquire the write lock once in reiserfs_dirty_inode
Frederic Weisbecker [Tue, 14 Apr 2009 03:34:25 +0000 (05:34 +0200)]
Impact: fix a deadlock

reiserfs_dirty_inode() is the super_operations::dirty_inode() callback
of reiserfs. It can be called from different contexts where the write
lock can be already held.

But this function also grab the write lock (possibly recursively).
Subsequent release of the lock before sleep will actually not release
the lock if the caller of mark_inode_dirty() (which in turn calls
reiserfs_dirty_inode()) already owns the lock.

A typical case:

reiserfs_write_end() {
acquire_write_lock()
mark_inode_dirty() {
reiserfs_dirty_inode() {
reacquire_write_lock() {
journal_begin() {
do_journal_begin_r() {
/*
 * fail to release, still
 * one depth of lock
 */
release_write_lock()
reiserfs_wait_on_write_block() {
wait_event()

The event is usually provided by something which needs the write lock but
it hasn't been released.

We use reiserfs_write_lock_once() here to ensure we only grab the
write lock in one level.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fs/reiserfs/super.c

index e1cfb80..58727b5 100644 (file)
@@ -554,25 +554,28 @@ static void reiserfs_dirty_inode(struct inode *inode)
        struct reiserfs_transaction_handle th;
 
        int err = 0;
+       int lock_depth;
+
        if (inode->i_sb->s_flags & MS_RDONLY) {
                reiserfs_warning(inode->i_sb, "clm-6006",
                                 "writing inode %lu on readonly FS",
                                 inode->i_ino);
                return;
        }
-       reiserfs_write_lock(inode->i_sb);
+       lock_depth = reiserfs_write_lock_once(inode->i_sb);
 
        /* this is really only used for atime updates, so they don't have
         ** to be included in O_SYNC or fsync
         */
        err = journal_begin(&th, inode->i_sb, 1);
-       if (err) {
-               reiserfs_write_unlock(inode->i_sb);
-               return;
-       }
+       if (err)
+               goto out;
+
        reiserfs_update_sd(&th, inode);
        journal_end(&th, inode->i_sb, 1);
-       reiserfs_write_unlock(inode->i_sb);
+
+out:
+       reiserfs_write_unlock_once(inode->i_sb, lock_depth);
 }
 
 #ifdef CONFIG_QUOTA