nv-tegra.nvidia Code Review - linux-3.10.git/log
8 years ago  f2fs: use meta_inode cache to improve roll-forward speed
Jaegeuk Kim [Thu, 11 Sep 2014 20:49:55 +0000 (13:49 -0700)]
f2fs: use meta_inode cache to improve roll-forward speed

Previously, all the dnode pages had to be read during roll-forward recovery.
Even worse, the whole chain was traversed twice.
This patch removes those redundant and costly read operations by using the page
cache of meta_inode, together with the readahead function.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/recovery.c

Change-Id: I8953277ddf6b333a3fd1f112ebf4e6b26ab9a0b4

8 years ago  f2fs: fix double lock for inode page during roll-forward recovery
Jaegeuk Kim [Fri, 12 Sep 2014 15:35:58 +0000 (00:35 +0900)]
f2fs: fix double lock for inode page during roll-forward recovery

If the inode is the same and its data index needs to be truncated, we can fall
into a double lock on its inode page via get_dnode_of_data.

The error case is as follows.

1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut

-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*

This patch should resolve that bug.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/recovery.c

Change-Id: I2447de8eaee8a64d11584be41de3ea07e6057984

8 years ago  f2fs: fix a race condition in next_free_nid
Huang Ying [Fri, 12 Sep 2014 11:21:11 +0000 (19:21 +0800)]
f2fs: fix a race condition in next_free_nid

The nm_i->fcnt check is executed before spin_lock, so if another
thread deletes the last free_nid from the list, the wrong nid may be
returned.  Fix the race condition by moving the nm_i->fcnt check
inside the spin_lock.
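The pattern can be sketched in userspace as below. This is only an illustration under assumptions, not the f2fs code: the struct, the toy_ names and the atomic_flag spinlock standing in for the kernel spin_lock are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_FREE 8

/* Hypothetical stand-in for the free nid bookkeeping. */
struct toy_nm_info {
    atomic_flag lock;                   /* stand-in for the list spin lock */
    unsigned int fcnt;                  /* number of free nids in the list */
    uint32_t free_nids[TOY_MAX_FREE];
};

static void toy_lock(struct toy_nm_info *nm)
{
    while (atomic_flag_test_and_set_explicit(&nm->lock, memory_order_acquire))
        ;                               /* spin until the flag is released */
}

static void toy_unlock(struct toy_nm_info *nm)
{
    atomic_flag_clear_explicit(&nm->lock, memory_order_release);
}

/* Returns true and stores the first free nid, or false if none is left. */
static bool toy_next_free_nid(struct toy_nm_info *nm, uint32_t *nid)
{
    bool found = false;

    toy_lock(nm);
    if (nm->fcnt > 0) {                 /* checked while holding the lock */
        *nid = nm->free_nids[0];
        found = true;
    }
    toy_unlock(nm);
    return found;
}
```

Because the count is read inside the critical section, no other thread can remove the last entry between the check and the read of the list head.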

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: use nm_i->next_scan_nid as default for next_free_nid
Huang Ying [Fri, 12 Sep 2014 12:19:48 +0000 (20:19 +0800)]
f2fs: use nm_i->next_scan_nid as default for next_free_nid

Currently, if there is no free nid in nm_i->free_nid_list, 0 may be saved
into next_free_nid of the checkpoint, which may cause useless scanning on
the next mount.  nm_i->next_scan_nid should be a better default value than
0.
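A minimal sketch of the default selection, assuming a simplified state struct; the toy_ names and fields are hypothetical and only mirror the idea described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the fields involved. */
struct toy_nid_state {
    unsigned int fcnt;          /* free nids currently cached */
    uint32_t first_free_nid;    /* head of the free nid list, valid if fcnt > 0 */
    uint32_t next_scan_nid;     /* where the next free nid scan would start */
};

/* Value to record as next_free_nid in the checkpoint. */
static uint32_t toy_checkpoint_next_free_nid(const struct toy_nid_state *nm)
{
    uint32_t nid = nm->next_scan_nid;   /* default instead of 0 */

    if (nm->fcnt > 0)
        nid = nm->first_free_nid;
    return nid;
}
```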

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: give an option to enable in-place-updates during fsync to users
Jaegeuk Kim [Wed, 10 Sep 2014 23:53:02 +0000 (16:53 -0700)]
f2fs: give an option to enable in-place-updates during fsync to users

If the user writes F2FS_IPU_FSYNC:4 to /sys/fs/f2fs/ipu_policy, in-place-updates
are attempted only from f2fs_sync_file.
And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps the out-of-order manner. Otherwise, it triggers in-place-updates.

This may be useful for storage with very high random write performance.

For example, it can be used when

Seq. writes (Data) + wait + Seq. writes (Node)

is much slower than

Rand. writes (Data)
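
The decision described above can be sketched as a small predicate. This is an assumption-laden illustration: the helper name and its parameters are hypothetical, and only the policy value 4 and the threshold rule come from the message.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_IPU_FSYNC 4     /* policy value quoted in the commit message */

/* Hypothetical helper: should this fsync use in-place updates? */
static bool toy_fsync_use_ipu(int ipu_policy, unsigned int dirty_pages,
                              unsigned int min_fsync_blocks)
{
    if (ipu_policy != TOY_IPU_FSYNC)
        return false;
    /* Over the threshold: keep the normal write path. */
    if (dirty_pages > min_fsync_blocks)
        return false;
    return true;
}
```

With a threshold of 8, an fsync of 2 dirty pages would be updated in place, while one of 9 dirty pages would not.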

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: expand counting dirty pages in the inode page cache
Jaegeuk Kim [Fri, 12 Sep 2014 22:53:45 +0000 (15:53 -0700)]
f2fs: expand counting dirty pages in the inode page cache

Previously f2fs only counted dirty dentry pages, but there is no reason not to
expand the scope.

This patch changes the names used for managing dirty pages and counts
dirty pages in each inode info as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  mm: change invalidatepage prototype to accept length
Lukas Czerner [Wed, 22 May 2013 03:17:23 +0000 (23:17 -0400)]
mm: change invalidatepage prototype to accept length

Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept range to be invalidated and update all the instances
for it.

We also change the block_invalidatepage() in the same way and actually
make a use of the new length argument implementing range invalidation.

Actual file system implementations will follow, except for the file systems
where the changes are really simple and should not change the behaviour
in any way. An implementation of truncate_page_range(), which will be able
to accept page-unaligned ranges, will follow as well.
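
To illustrate what a length-aware invalidatepage would receive, the sketch below splits a byte range into per-page (offset, length) pieces. The struct, the function name and the 4096-byte page size are assumptions made for this example; it is not the mm code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u

/* One call's worth of arguments for a length-aware invalidatepage(). */
struct toy_inval {
    unsigned long index;    /* page index in the file */
    unsigned int offset;    /* first byte to invalidate within the page */
    unsigned int length;    /* number of bytes to invalidate */
};

/*
 * Split the inclusive byte range [start, end] into per-page pieces.
 * Writes at most max entries to out and returns the number written.
 * Assumes end is smaller than the maximum unsigned long long value.
 */
static size_t toy_split_range(unsigned long long start, unsigned long long end,
                              struct toy_inval *out, size_t max)
{
    size_t n = 0;

    while (start <= end && n < max) {
        unsigned int off = (unsigned int)(start % TOY_PAGE_SIZE);
        unsigned long long room = TOY_PAGE_SIZE - off;
        unsigned long long left = end - start + 1;
        unsigned int len = (unsigned int)(left < room ? left : room);

        out[n].index = (unsigned long)(start / TOY_PAGE_SIZE);
        out[n].offset = off;
        out[n].length = len;
        n++;
        start += len;
    }
    return n;
}
```

For the range [100, 5000] this yields a partial first page (offset 100, length 3996) and a partial last page (offset 0, length 905), the case that could not be expressed before the length argument existed.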

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
8 years ago  f2fs: remove lengthy inode->i_ino
Jaegeuk Kim [Wed, 10 Sep 2014 21:58:18 +0000 (14:58 -0700)]
f2fs: remove lengthy inode->i_ino

This patch removes the lengthy name by adding a new variable.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: fix negative value for lseek offset
Jaegeuk Kim [Mon, 8 Sep 2014 17:59:43 +0000 (10:59 -0700)]
f2fs: fix negative value for lseek offset

If an application passes a negative offset to lseek with SEEK_DATA or
SEEK_HOLE, f2fs previously hit a BUG_ON in get_dnode_of_data, as reported
by Tommi Rantala.

He reproduced this with a simple program containing:
lseek(fd, -17595150933902LL, SEEK_DATA);

This patch should resolve that bug.
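
A minimal sketch of the kind of guard involved, assuming a hypothetical front-end function; the real check and its exact location in f2fs are not shown in this log. The whence values match the Linux definitions of SEEK_DATA (3) and SEEK_HOLE (4).

```c
#include <assert.h>
#include <errno.h>

#define TOY_SEEK_DATA 3
#define TOY_SEEK_HOLE 4

/* Hypothetical front end for a SEEK_DATA/SEEK_HOLE lookup. */
static long long toy_seek_block(long long offset, long long isize, int whence)
{
    if (whence != TOY_SEEK_DATA && whence != TOY_SEEK_HOLE)
        return -EINVAL;
    /* Reject offsets outside the file before any block lookup. */
    if (offset < 0 || offset >= isize)
        return -ENXIO;
    /*
     * A real implementation would walk the block mapping here. The toy
     * treats the whole file as data with an implicit hole at the end.
     */
    return whence == TOY_SEEK_DATA ? offset : isize;
}
```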

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
[Jaegeuk Kim: relocate the condition as suggested by Chao]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: avoid node page to be written twice in gc_node_segment
Huang Ying [Sun, 7 Sep 2014 03:05:20 +0000 (11:05 +0800)]
f2fs: avoid node page to be written twice in gc_node_segment

In gc_node_segment, if node page gc is run concurrently with node page
writeback, and check_valid_map and get_node_page run after page locked
and before cur_valid_map is updated as below, it is possible for the
page to be written twice unnecessarily.

                        sync_node_pages
                          try_lock_page
                          ...
check_valid_map           f2fs_write_node_page
                            ...
                            write_node_page
                              do_write_page
                                allocate_data_block
                                  ...
                                  refresh_sit_entry /* update cur_valid_map */
                                  ...
                            ...
                            unlock_page
get_node_page
...
set_page_dirty
...
f2fs_put_page
  unlock_page

This can be solved by calling check_valid_map again after get_node_page.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: use lock-less list(llist) to simplify the flush cmd management
Gu Zheng [Fri, 5 Sep 2014 10:31:00 +0000 (18:31 +0800)]
f2fs: use lock-less list(llist) to simplify the flush cmd management

We use flush cmd control to collect many flush cmds and flush them
together. For this, we use two lists to manage the flush cmds
(collect and dispatch), and one spin lock to protect them.
In fact, the lock-less list (llist) is very suitable for this case,
so we use it to simplify this routine.

-
v2:
-use llist_for_each_entry_safe to fix possible use-after-free issue.
-remove the unused field from struct flush_cmd.
Thanks for Yu's suggestion.
-
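
The idea behind a lock-less list can be sketched in userspace with C11 atomics. This is an illustration only: the toy_ types and functions are hypothetical and merely mirror the behaviour of the kernel's llist_add() and llist_del_all(); they are not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_node {
    struct toy_node *next;
    int cmd_id;                     /* hypothetical payload */
};

struct toy_llist {
    _Atomic(struct toy_node *) first;
};

/* Producer side: push one node with a compare-and-swap loop. */
static void toy_llist_add(struct toy_llist *l, struct toy_node *n)
{
    struct toy_node *old = atomic_load_explicit(&l->first, memory_order_relaxed);

    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(&l->first, &old, n,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Consumer side: detach the whole list in one atomic exchange. */
static struct toy_node *toy_llist_del_all(struct toy_llist *l)
{
    return atomic_exchange_explicit(&l->first, NULL, memory_order_acquire);
}

/* The detached list is newest-first; reverse it to get submission order. */
static struct toy_node *toy_llist_reverse(struct toy_node *head)
{
    struct toy_node *prev = NULL;

    while (head) {
        struct toy_node *next = head->next;

        head->next = prev;
        prev = head;
        head = next;
    }
    return prev;
}
```

Collecting then needs no lock at all, and dispatching takes the whole batch in one step, which is why the separate collect and dispatch lists and their spin lock become unnecessary.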

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: refactor flush_sit_entries codes for reducing SIT writes
Chao Yu [Thu, 4 Sep 2014 10:13:01 +0000 (18:13 +0800)]
f2fs: refactor flush_sit_entries codes for reducing SIT writes

In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as below:

"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
   nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
   journal is full, then flush the left dirty entries to disk without merge
   journaled entries, so these journaled entries may be flushed to disk at next
   checkpoint but lost chance to flushed last time."

Actually, we have the same problem in using SIT journal area.

In this patch, we first update the sit journal with as many dirty entries as
possible. Second, if there is no space in the sit journal, we remove all
entries from the journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in the same SIT block to a sit entry set.
All entry sets are linked to the list sit_entry_set in sm_info, sorted in
ascending order by the count of entries in each set. Later we flush the sets
with the fewest entries into the journal, as many as we can, and then flush the
dense sets with merged entries to disk.
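
The ordering step can be sketched as follows. This is a simplified model under assumptions: the toy_sit_set struct, the journal_space parameter and the function names are hypothetical, and the real code links sets on a list rather than sorting an array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical: dirty SIT entries grouped by the SIT block they live in. */
struct toy_sit_set {
    unsigned int start_segno;   /* first segment covered by this SIT block */
    unsigned int entry_cnt;     /* dirty entries in this set */
    int to_journal;             /* 1: goes to the journal, 0: SIT block is written */
};

static int toy_cmp_cnt(const void *a, const void *b)
{
    const struct toy_sit_set *x = a, *y = b;

    return (x->entry_cnt > y->entry_cnt) - (x->entry_cnt < y->entry_cnt);
}

/*
 * Sort sets by ascending entry count, then send sets to the journal while
 * they still fit. Returns the number of SIT blocks that must be written.
 */
static unsigned int toy_plan_sit_flush(struct toy_sit_set *sets, size_t nr,
                                       unsigned int journal_space)
{
    unsigned int blocks = 0;
    size_t i;

    qsort(sets, nr, sizeof(*sets), toy_cmp_cnt);
    for (i = 0; i < nr; i++) {
        if (sets[i].entry_cnt <= journal_space) {
            sets[i].to_journal = 1;
            journal_space -= sets[i].entry_cnt;
        } else {
            sets[i].to_journal = 0;
            blocks++;
        }
    }
    return blocks;
}
```

Sparse sets are absorbed by the journal, so a block write is spent only on sets where it covers many dirty entries at once.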

In this way we can use the sit journal area more effectively and reduce
SIT updates, resulting in a performance gain and a longer lifetime for the
flash device.

In my testing environment, this patch noticeably reduces SIT block
updates.

virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
            sit page num    cp count    sit pages/cp
based       2006.50         1349.75     1.486
patched     1566.25         1463.25     1.070

Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns)     dirty sit count
36038           2151
49168           2123
37174           2232

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: remove unneeded sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO
Chao Yu [Thu, 4 Sep 2014 10:11:47 +0000 (18:11 +0800)]
f2fs: remove unneeded sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO

sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO is not used, remove it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: need fsck.f2fs if the recovery was failed
Jaegeuk Kim [Wed, 3 Sep 2014 00:19:04 +0000 (17:19 -0700)]
f2fs: need fsck.f2fs if the recovery was failed

If the roll-forward recovery failed, we'd better run fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: handle bug cases by letting fsck.f2fs initiate
Jaegeuk Kim [Tue, 2 Sep 2014 23:24:11 +0000 (16:24 -0700)]
f2fs: handle bug cases by letting fsck.f2fs initiate

This patch handles buggy corner cases by flagging them for fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: add BUG cases to initiate fsck.f2fs
Jaegeuk Kim [Tue, 2 Sep 2014 23:05:00 +0000 (16:05 -0700)]
f2fs: add BUG cases to initiate fsck.f2fs

This patch replaces BUG cases with f2fs_bug_on to retain information for fsck.f2fs.
It also implements some void functions to initiate fsck.f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/segment.c

Change-Id: I3bbaab972ca0782c415dd1e7049df55a8cf801e6

8 years ago  f2fs: need fsck.f2fs when f2fs_bug_on is triggered
Jaegeuk Kim [Tue, 2 Sep 2014 22:52:58 +0000 (15:52 -0700)]
f2fs: need fsck.f2fs when f2fs_bug_on is triggered

If any f2fs_bug_on is triggered, fsck.f2fs is needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/checkpoint.c
fs/f2fs/f2fs.h
fs/f2fs/node.c
fs/f2fs/recovery.c
fs/f2fs/segment.c

Change-Id: Ib717892a05f479e86ec784760e9b2aa47cf4ea6b

8 years ago  f2fs: retain inconsistency information to initiate fsck.f2fs
Jaegeuk Kim [Tue, 2 Sep 2014 22:43:52 +0000 (15:43 -0700)]
f2fs: retain inconsistency information to initiate fsck.f2fs

This patch adds sbi->need_fsck to conduct fsck.f2fs later.
This flag can only be removed by fsck.f2fs.
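
A rough userspace sketch of a sticky "needs fsck" flag that is set by an assertion-like macro instead of stopping the machine. The struct and the toy_bug_on macro are hypothetical stand-ins; the real f2fs_bug_on definition is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, cut-down superblock info. */
struct toy_sb_info {
    bool need_fsck;     /* sticky: nothing in this sketch ever clears it */
};

/* Record an inconsistency and warn, rather than halting. */
#define toy_bug_on(sbi, condition)                                   \
    do {                                                             \
        if (condition) {                                             \
            fprintf(stderr, "toy_bug_on: %s (%s:%d)\n",              \
                    #condition, __FILE__, __LINE__);                 \
            (sbi)->need_fsck = true;                                 \
        }                                                            \
    } while (0)
```

Once the flag is set, later checks that pass do not reset it, which matches the rule that only fsck.f2fs may remove it.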

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB
Jaegeuk Kim [Tue, 2 Sep 2014 22:31:18 +0000 (15:31 -0700)]
f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB

This patch adds three inline functions to clean up messy casting code.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/namei.c
fs/f2fs/node.c

Change-Id: Ib721e70032fdd48c32fcb4c09795ce6a38eb3b14

8 years ago  f2fs: reposition unlock_new_inode to prevent accessing invalid inode
Chao Yu [Sat, 30 Aug 2014 01:52:34 +0000 (09:52 +0800)]
f2fs: reposition unlock_new_inode to prevent accessing invalid inode

Due to a race condition on the inode cache, the following scenario can appear:
[Thread a]                              [Thread b]
->f2fs_mkdir
  ->f2fs_add_link
    ->__f2fs_add_link
      ->init_inode_metadata failed here
                                        ->gc_thread_func
                                          ->f2fs_gc
                                            ->do_garbage_collect
                                              ->gc_data_segment
                                                ->f2fs_iget
                                                  ->iget_locked
                                                    ->wait_on_inode
  ->unlock_new_inode
                                                ->move_data_page
  ->make_bad_inode
  ->iput

When we fail in create/symlink/mkdir/mknod/tmpfile, the newly allocated inode
should be set as bad to avoid being accessed by another thread. But in the above
scenario, f2fs is allowed to access the invalid inode before this inode is set
as bad.
This patch fixes the potential problem. The issue was found by code review.

change log from v1:
 o Add condition judgment in gc_data_segment() suggested by Changman Lee.
 o use iget_failed to simplify code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: fix wrong casting for dentry name
Jaegeuk Kim [Fri, 29 Aug 2014 07:26:50 +0000 (00:26 -0700)]
f2fs: fix wrong casting for dentry name

The dentry name type is unsigned char *.
If we don't match this type, some character codes can be changed by the sign bit.
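
The effect of the sign bit can be shown with a toy byte sum. The two functions below are hypothetical and are not the f2fs hash; they differ only in the element type of the name pointer, and the sketch assumes the usual 8-bit, two's complement char.

```c
#include <assert.h>
#include <stddef.h>

/* Toy byte sum over a name read as unsigned bytes. */
static unsigned int toy_sum_unsigned(const unsigned char *name, size_t len)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum += name[i];                 /* byte 0xE9 contributes 233 */
    return sum;
}

/* Same sum over a name read as signed bytes. */
static unsigned int toy_sum_signed(const signed char *name, size_t len)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum += (unsigned int)name[i];   /* byte 0xE9 is -23, sign-extended */
    return sum;
}
```

Plain ASCII names give the same result either way; only bytes with the top bit set, as in non-ASCII file names, diverge.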

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: simplify by using a literal
Dan Carpenter [Thu, 28 Aug 2014 13:13:21 +0000 (16:13 +0300)]
f2fs: simplify by using a literal

We can make the code a bit simpler because we know that "!retry" is
zero.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: truncate stale block for inline_data
Jaegeuk Kim [Mon, 25 Aug 2014 21:45:59 +0000 (14:45 -0700)]
f2fs: truncate stale block for inline_data

This makes sure that any block allocated at offset[0] is truncated for inline_data.
The root cause has not been figured out, but this is done to be safe.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/node.c

Change-Id: Icd98dbf108de75fb7af7d8d00be7fc6df239792a

8 years ago  f2fs: use macro for code readability
Chao Yu [Fri, 22 Aug 2014 08:17:38 +0000 (16:17 +0800)]
f2fs: use macro for code readability

This patch introduces the DEF_NIDS_PER_INODE/GET_ORPHAN_BLOCKS/F2FS_CP_PACKS
macros in place of numbers in the code for readability.

change log from v1:
 o fix typo pointed out by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: introduce need_do_checkpoint for readability
Chao Yu [Wed, 20 Aug 2014 10:37:35 +0000 (18:37 +0800)]
f2fs: introduce need_do_checkpoint for readability

This patch introduces need_do_checkpoint() to group the numerous conditions
for readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: fix incorrect calculation with total/free inode num
Chao Yu [Wed, 20 Aug 2014 10:36:46 +0000 (18:36 +0800)]
f2fs: fix incorrect calculation with total/free inode num

Theoretically, our total inode number is the same as the total node number, but
three node ids are reserved in f2fs: 0, 1 (node nid), and 2
(meta nid). They should never be used by the user, so the total/free inode
number calculated in ->statfs is wrong.

This patch introduces F2FS_RESERVED_NODE_NUM and then fixes this issue by
recalculating the total/free inode number with the macro.
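
The recalculation amounts to simple arithmetic, sketched here with hypothetical names; the value 3 follows from the three reserved ids listed above, while the struct and helper are assumptions for illustration.

```c
#include <assert.h>

/* nids 0, 1 (node nid) and 2 (meta nid) are never handed to users. */
#define TOY_RESERVED_NODE_NUM 3u

struct toy_inode_counts {
    unsigned long long files;   /* what statfs would report as total inodes */
    unsigned long long ffree;   /* what statfs would report as free inodes */
};

/* Hypothetical helper; assumes total_nodes >= reserved + valid_inodes. */
static struct toy_inode_counts toy_statfs_inodes(unsigned long long total_nodes,
                                                 unsigned long long valid_inodes)
{
    struct toy_inode_counts c;

    c.files = total_nodes - TOY_RESERVED_NODE_NUM;
    c.ffree = c.files - valid_inodes;
    return c;
}
```

For 1000 node ids with 10 inodes in use, this reports 997 total and 987 free inodes instead of 1000 and 990.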

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: skip if inline_data was converted already
Jaegeuk Kim [Mon, 18 Aug 2014 21:41:11 +0000 (14:41 -0700)]
f2fs: skip if inline_data was converted already

This patch checks one more time, under the inode page lock, whether the
inline_data has already been converted.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: remove rewrite_node_page
Jaegeuk Kim [Fri, 15 Aug 2014 16:56:46 +0000 (09:56 -0700)]
f2fs: remove rewrite_node_page

I think we need to let the dirty node pages remain in the page cache instead
of rewriting them in their places.
So, after a successful recovery, write_checkpoint will flush all of them
through the normal write path.
Through this, we can avoid potential error cases in terms of block allocation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: avoid double lock in truncate_blocks
Jaegeuk Kim [Thu, 14 Aug 2014 23:32:54 +0000 (16:32 -0700)]
f2fs: avoid double lock in truncate_blocks

The init_inode_metadata calls truncate_blocks when an error occurs.
The caller holds f2fs_lock_op, so we should not take it again in
truncate_blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/data.c
fs/f2fs/file.c

Change-Id: Ia07c84bbbf04bd5a9fc38e1395245aa1a347ce0f

8 years ago  f2fs: prevent checkpoint during roll-forward
Jaegeuk Kim [Wed, 13 Aug 2014 23:30:46 +0000 (16:30 -0700)]
f2fs: prevent checkpoint during roll-forward

No checkpoint should be done during the core roll-forward procedure.
In particular, this includes error cases too.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/recovery.c

Change-Id: Ic06005da17a41f18bac387b3984996bac41ef278

8 years ago  f2fs: add WARN_ON in f2fs_bug_on
Jaegeuk Kim [Wed, 13 Aug 2014 17:45:41 +0000 (10:45 -0700)]
f2fs: add WARN_ON in f2fs_bug_on

This patch adds WARN_ON when f2fs_bug_on is disabled, so that kernel messages can be seen.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: handle EIO not to break fs consistency
Jaegeuk Kim [Tue, 12 Aug 2014 01:37:46 +0000 (18:37 -0700)]
f2fs: handle EIO not to break fs consistency

There are two rules when EIO occurs.
1. don't write any checkpoint data to preserve the previous checkpoint
2. don't lose the cached dentry/node/meta pages

So, at first, this patch adds set_page_dirty in f2fs_write_end_io's failure.
Then, writing checkpoint/dentry/node blocks is not allowed.

Note that, for the data pages, we can't just throw away by redirtying them.
Otherwise, kworker can fall into infinite loop to flush them.
(Ref. xfstests/019)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/data.c

Change-Id: Ia54fd4792593d023603355cf255cbc8e12e53b93

8 years ago  f2fs: check s_dirty under cp_mutex
Jaegeuk Kim [Tue, 12 Aug 2014 01:37:46 +0000 (18:37 -0700)]
f2fs: check s_dirty under cp_mutex

It needs to check s_dirty under cp_mutex, since s_dirty is reset under that
mutex.
Also, the previous condition was not correct, since we could skip a checkpoint
when a checkpoint had been done and all the node pages were then written back.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: unlock_page when node page is redirtied out
Jaegeuk Kim [Tue, 12 Aug 2014 01:18:36 +0000 (18:18 -0700)]
f2fs: unlock_page when node page is redirtied out

This patch fixes missing unlock_page when a node page is redirtied out.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: introduce f2fs_cp_error for readability
Jaegeuk Kim [Mon, 11 Aug 2014 23:49:25 +0000 (16:49 -0700)]
f2fs: introduce f2fs_cp_error for readability

This patch adds f2fs_cp_error for readability.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: give a chance to mount again when encountering errors
Jaegeuk Kim [Fri, 8 Aug 2014 22:37:41 +0000 (15:37 -0700)]
f2fs: give a chance to mount again when encountering errors

This patch gives the mount process another chance when we encounter an error.
This also affects roll-forward recovery failures.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/super.c

Change-Id: I85fa80945ce678484c4c74e1ad2eba9169029739

8 years ago  f2fs: trigger release_dirty_inode in f2fs_put_super
Jaegeuk Kim [Tue, 19 Aug 2014 16:48:22 +0000 (09:48 -0700)]
f2fs: trigger release_dirty_inode in f2fs_put_super

The generic_shutdown_super calls sync_filesystem, evict_inode, and then
f2fs_put_super. In f2fs_evict_inode, we leave some dirty inode information
behind, so we should release it in f2fs_put_super.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: don't skip checkpoint if there is no dirty node pages
Jaegeuk Kim [Tue, 19 Aug 2014 16:13:01 +0000 (09:13 -0700)]
f2fs: don't skip checkpoint if there is no dirty node pages

This is the erroneous scenario.
1. write data
2. do checkpoint
3. produce some dirty node pages by the gc thread
4. write back dirty node pages
5. f2fs_put_super will skip the checkpoint, since dirty count for node pages is
  zero.

This patch removes that wrong condition check.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: avoid bug_on when error is occurred
Jaegeuk Kim [Fri, 8 Aug 2014 17:18:43 +0000 (10:18 -0700)]
f2fs: avoid bug_on when error is occurred

During the recovery, if an error like EIO or ENOMEM occurs, f2fs_bug_on should be skipped.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/recovery.c

Change-Id: I7a6ac12eea3a626e737d0fa89c55dd818f1a0e4d

8 years ago  f2fs: fix to recover inline_xattr/data and blocks
Jaegeuk Kim [Fri, 8 Aug 2014 06:49:17 +0000 (23:49 -0700)]
f2fs: fix to recover inline_xattr/data and blocks

This patch makes sure xattr recovery is not skipped and fixes the order of
inline xattr/data recovery.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: should clear the inline_xattr flag
Jaegeuk Kim [Fri, 8 Aug 2014 06:45:42 +0000 (23:45 -0700)]
f2fs: should clear the inline_xattr flag

During the recovery, we should clear the inline_xattr flag if its xattr node
block is recovered.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: clear FI_INC_LINK during the recovery
Jaegeuk Kim [Fri, 8 Aug 2014 00:06:18 +0000 (17:06 -0700)]
f2fs: clear FI_INC_LINK during the recovery

If an inode is fsynced multiple times with fsync & dent marks, this inode will
set FI_INC_LINK at find_fsync_dnodes during the recovery.
But, in recover_inode, recover_dentry doesn't clear that flag when multiple hits
occurred.

So this patch removes the flag for further consistency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: fix the initial inode page for recovery
Jaegeuk Kim [Fri, 8 Aug 2014 00:04:24 +0000 (17:04 -0700)]
f2fs: fix the initial inode page for recovery

If a new inode page is needed for recover_dentry, we should assign i_inline
as zero.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/node.c

Change-Id: I989384a5930389bcf8c9243c151c7ad96eaceafe

8 years ago  f2fs: make clear on test condition and return types
Jaegeuk Kim [Thu, 7 Aug 2014 23:57:17 +0000 (16:57 -0700)]
f2fs: make clear on test condition and return types

This patch adds parentheses to make the condition check clear.
It also changes the return type to better convey the meaning.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: should convert inline_data during the mkwrite
Jaegeuk Kim [Thu, 7 Aug 2014 23:32:25 +0000 (16:32 -0700)]
f2fs: should convert inline_data during the mkwrite

If mkwrite is called on an inode having inline_data, it can overwrite the data
index space as NEW_ADDR. (e.g., the first 4 bytes are coincidentally zero)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: use for_each_set_bit to simplify the code
Chao Yu [Mon, 4 Aug 2014 02:10:07 +0000 (10:10 +0800)]
f2fs: use for_each_set_bit to simplify the code

This patch uses for_each_set_bit to simplify some code in f2fs.
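
For readers unfamiliar with the iterator, a userspace imitation is sketched below. The toy_ macro and helpers are hypothetical and only mimic the shape of the kernel's for_each_set_bit(); the kernel version uses an optimized find_next_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Return the index of the first set bit at or after start, or size if none. */
static size_t toy_find_next_bit(const unsigned long *map, size_t size, size_t start)
{
    size_t i;

    for (i = start; i < size; i++)
        if (map[i / TOY_BITS_PER_LONG] & (1UL << (i % TOY_BITS_PER_LONG)))
            return i;
    return size;
}

/* Userspace imitation of the kernel's for_each_set_bit() iterator. */
#define toy_for_each_set_bit(bit, map, size)                     \
    for ((bit) = toy_find_next_bit((map), (size), 0);            \
         (bit) < (size);                                         \
         (bit) = toy_find_next_bit((map), (size), (bit) + 1))

/* Example user: count the set bits in a bitmap. */
static size_t toy_count_set(const unsigned long *map, size_t size)
{
    size_t bit, n = 0;

    toy_for_each_set_bit(bit, map, size)
        n++;
    return n;
}
```

The macro replaces an open-coded loop that tests every bit index by hand, which is the kind of simplification the patch describes.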

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: add f2fs_balance_fs for expand_inode_data
Chao Yu [Mon, 4 Aug 2014 02:11:17 +0000 (10:11 +0800)]
f2fs: add f2fs_balance_fs for expand_inode_data

This patch adds f2fs_balance_fs in expand_inode_data to avoid allocation failure
with segment.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: invalidate xattr node page when evict inode
Chao Yu [Mon, 4 Aug 2014 01:54:58 +0000 (09:54 +0800)]
f2fs: invalidate xattr node page when evict inode

When an inode is evicted, all the page cache belonging to this inode should be
released, including the xattr node page. Previously we didn't do this; this
patch fixes the issue.

v2:
 o reposition invalidate_mapping_pages() to the right place suggested by
Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: avoid skipping recover_inline_xattr after recover_inline_data
Chao Yu [Sat, 2 Aug 2014 07:26:04 +0000 (15:26 +0800)]
f2fs: avoid skipping recover_inline_xattr after recover_inline_data

When we recover the data of an inode in the roll-forward procedure and the inode
has both inline data and inline xattr, we may skip recovering the inline xattr
if we recover inline data from the node page first.
This patch fixes the problem of losing inline xattr data in the above
scenario.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: add tracepoint for f2fs_direct_IO
Chao Yu [Thu, 31 Jul 2014 13:11:22 +0000 (21:11 +0800)]
f2fs: add tracepoint for f2fs_direct_IO

This patch adds a tracepoint for f2fs_direct_IO.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/data.c

Change-Id: Icb2e26e51a4ee52ebfe8136731500b7b944cda27

8 years ago  f2fs: reduce competition among node page writes
Chao Yu [Thu, 3 Jul 2014 10:58:39 +0000 (18:58 +0800)]
f2fs: reduce competition among node page writes

We do not need to block on ->node_write among different node page writers, e.g.
fsync/flush, unless we have a node page writer from write_checkpoint.
So it's better to use an rw_semaphore instead of a mutex for ->node_write to
improve performance.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: fix coding style
Jaegeuk Kim [Thu, 31 Jul 2014 00:25:54 +0000 (17:25 -0700)]
f2fs: fix coding style

This patch fixes wrong coding style.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: remove redundant lines in allocate_data_block
Dongho Sim [Wed, 30 Jul 2014 06:52:41 +0000 (06:52 +0000)]
f2fs: remove redundant lines in allocate_data_block

There are redundant lines in allocate_data_block.

In this function, we call refresh_sit_entry with old seg and old curseg.
After that, we call locate_dirty_segment with old curseg.

But, the new address is always allocated from old curseg and
we call locate_dirty_segment with old curseg in refresh_sit_entry.
So, we do not need to call locate_dirty_segment with old curseg again.

We've discussed like below:

Jaegeuk said:
 "When considering SSR, we need to take care of the following scenario.
  - old segno : X
  - new address : Z
  - old curseg : Y
  This means, a new block is supposed to be written to Z from X.
  And Z is newly allocated in the same path from Y.

  In that case, we should trigger locate_dirty_segment for Y, since
  it was a current_segment and can be dirty owing to SSR.
  But that was not included in the dirty list."

Changman said:
 "We already choosed old curseg(Y) and then we allocate new address(Z) from old
  curseg(Y). After that we call refresh_sit_entry(old address, new address).
  In the funcation, we call locate_dirty_segment with old seg and old curseg.
  So calling locate_dirty_segment after refresh_sit_entry again is redundant."

Jaegeuk said:
 "Right. The new address is always allocated from old_curseg."

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Dongho Sim <dh.sim@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: add tracepoint for f2fs_issue_flush
Jaegeuk Kim [Sat, 26 Jul 2014 00:46:10 +0000 (17:46 -0700)]
f2fs: add tracepoint for f2fs_issue_flush

This patch adds a tracepoint for f2fs_issue_flush.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: avoid retrying wrong recovery routine when error was occurred
Jaegeuk Kim [Fri, 25 Jul 2014 22:47:25 +0000 (15:47 -0700)]
f2fs: avoid retrying wrong recovery routine when error was occurred

This patch eliminates the propagation of recovery errors to the next mount.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/recovery.c

Change-Id: I914547ea612937738a5e7ea9c5e555bfa067540d

8 years ago  f2fs: test before set/clear bits
Jaegeuk Kim [Fri, 25 Jul 2014 22:47:23 +0000 (15:47 -0700)]
f2fs: test before set/clear bits

If the bit is already set, we don't need to set it again, and vice versa.
This avoids making the caches dirty needlessly.
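
A small sketch of the test-before-write idea, using hypothetical helpers on a plain bitmap; the real f2fs helpers and their names are not shown in this log.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Set bit nr only if it is clear; returns true if memory was written. */
static bool toy_set_bit_if_clear(unsigned long *map, unsigned int nr)
{
    unsigned long mask = 1UL << (nr % TOY_BITS_PER_LONG);
    unsigned long *word = &map[nr / TOY_BITS_PER_LONG];

    if (*word & mask)
        return false;       /* already set: read only, no store issued */
    *word |= mask;
    return true;
}

/* Clear bit nr only if it is set; returns true if memory was written. */
static bool toy_clear_bit_if_set(unsigned long *map, unsigned int nr)
{
    unsigned long mask = 1UL << (nr % TOY_BITS_PER_LONG);
    unsigned long *word = &map[nr / TOY_BITS_PER_LONG];

    if (!(*word & mask))
        return false;       /* already clear: read only, no store issued */
    *word &= ~mask;
    return true;
}
```

A read leaves the cache line shared and clean, whereas an unconditional store would mark it dirty even when the value does not change.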

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: fix wrong condition for unlikely
Jaegeuk Kim [Fri, 25 Jul 2014 14:41:43 +0000 (07:41 -0700)]
f2fs: fix wrong condition for unlikely

This patch fixes the wrongly used unlikely condition.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: enable in-place-update for fdatasync
Jaegeuk Kim [Fri, 25 Jul 2014 02:11:43 +0000 (19:11 -0700)]
f2fs: enable in-place-update for fdatasync

This patch enforces in-place-updates only when fdatasync is requested.
If we adopt in-place-updates for fdatasync, we can skip writing the
recovery information.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: skip unnecessary data writes during fsync
Jaegeuk Kim [Fri, 25 Jul 2014 02:08:02 +0000 (19:08 -0700)]
f2fs: skip unnecessary data writes during fsync

This patch intends to improve the fsync performance by skipping the write of the
recovery information, but only when there is no data that we should recover.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: add info of appended or updated data writes
Jaegeuk Kim [Fri, 25 Jul 2014 14:40:59 +0000 (07:40 -0700)]
f2fs: add info of appended or updated data writes

This patch introduces an inode number list that represents inodes having
appended data writes or updated data writes since the last checkpoint.
This will be used at fsync to determine whether the recovery information
should be written or not.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: use radix_tree for ino management
Jaegeuk Kim [Fri, 25 Jul 2014 01:15:17 +0000 (18:15 -0700)]
f2fs: use radix_tree for ino management

For better ino management, this patch changes the data structure from a list
to a radix tree.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago  f2fs: add infra for ino management
Jaegeuk Kim [Fri, 25 Jul 2014 22:47:17 +0000 (15:47 -0700)]
f2fs: add infra for ino management

This patch renames the orphan-related data structures so that they can be used
for globally managed inode numbers.
Later, we can use this facility for managing any inode number lists.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/checkpoint.c

Change-Id: Iba6f73620269080d3fad9bdeeb3b9825b9651719

8 years ago f2fs: punch the core function for inode management
Jaegeuk Kim [Fri, 25 Jul 2014 22:47:16 +0000 (15:47 -0700)]
f2fs: punch the core function for inode management

This patch punches out the core functions to manage the inode numbers.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/checkpoint.c

Change-Id: I5d147dc8bfc9461b85f5426cd83ba4735f46e08b

8 years ago f2fs: add nobarrier mount option
Jaegeuk Kim [Wed, 23 Jul 2014 16:57:31 +0000 (09:57 -0700)]
f2fs: add nobarrier mount option

This patch adds a mount option, nobarrier, to f2fs.
The assumption here is that the file system keeps the IO ordering, but
doesn't care about cache flushes inside the storage device.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/f2fs.h
fs/f2fs/super.c

Change-Id: I52b65987a3353135cbc029221203fdfb65d26212

8 years ago f2fs: fix to put root inode in error path of fill_super
Chao Yu [Fri, 25 Jul 2014 04:55:09 +0000 (12:55 +0800)]
f2fs: fix to put root inode in error path of fill_super

We should put the root inode correctly in the error path of fill_super;
otherwise we may leak the inode resource.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: avoid use invalid mapping of node_inode when evict meta inode
Chao Yu [Fri, 25 Jul 2014 04:00:57 +0000 (12:00 +0800)]
f2fs: avoid use invalid mapping of node_inode when evict meta inode

Andrey Tsyvarev reported:
"Using memory error detector reveals the following use-after-free error
in 3.15.0:

AddressSanitizer: heap-use-after-free in f2fs_evict_inode
Read of size 8 by thread T22279:
  [<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs]
  [<ffffffff812359af>] evict+0x15f/0x290
  [<     inlined    >] iput+0x196/0x280 iput_final
  [<ffffffff812369a6>] iput+0x196/0x280
  [<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs]
  [<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0
  [<ffffffff812105fd>] kill_block_super+0x4d/0xb0
  [<ffffffff81210a86>] deactivate_locked_super+0x66/0x80
  [<ffffffff81211c98>] deactivate_super+0x68/0x80
  [<ffffffff8123cc88>] mntput_no_expire+0x198/0x250
  [<     inlined    >] SyS_umount+0xe9/0x1a0 SYSC_umount
  [<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0
  [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b

Freed by thread T3:
  [<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs]
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks
  [<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930
  [<ffffffff8107cce2>] __do_softirq+0x142/0x380
  [<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50
  [<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280
  [<ffffffff810a8238>] kthread+0x148/0x160
  [<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0

Allocated by thread T22276:
  [<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs]
  [<ffffffff81235e2a>] iget_locked+0x10a/0x230
  [<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs]
  [<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs]
  [<ffffffff81211bce>] mount_bdev+0x1de/0x240
  [<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs]
  [<ffffffff81212a85>] mount_fs+0x55/0x220
  [<ffffffff8123c026>] vfs_kern_mount+0x66/0x200
  [<     inlined    >] do_mount+0x2b4/0x1120 do_new_mount
  [<ffffffff812400d4>] do_mount+0x2b4/0x1120
  [<     inlined    >] SyS_mount+0xb2/0x110 SYSC_mount
  [<ffffffff812414a2>] SyS_mount+0xb2/0x110
  [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b

The buggy address ffff8800587866c8 is located 48 bytes inside
  of 680-byte region [ffff880058786698, ffff880058786940)

Memory state around the buggy address:
  ffff880058786100: ffffffff ffffffff ffffffff ffffffff
  ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr
  ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff
  ffff880058786400: ffffffff ffffffff ffffffff ffffffff
  ffff880058786500: ffffffff ffffffff ffffffff fffffffr
 >ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff
                                                ^
  ffff880058786700: ffffffff ffffffff ffffffff ffffffff
  ffff880058786800: ffffffff ffffffff ffffffff ffffffff
  ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr....
  ffff880058786a00: ........ ........ ........ ........
  ffff880058786b00: ........ ........ ........ ........
Legend:
  f - 8 freed bytes
  r - 8 redzone bytes
  . - 8 allocated bytes
  x=1..7 - x allocated bytes + (8-x) redzone bytes

Investigation shows, that f2fs_evict_inode, when called for
'meta_inode', uses invalidate_mapping_pages() for 'node_inode'.
But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via
iput().

It seems that in common usage scenario this use-after-free is benign,
because 'node_inode' remains partially valid data even after
kmem_cache_free().
But things may change if, while 'meta_inode' is evicted in one f2fs
filesystem, another (mounted) f2fs filesystem requests inode from cache,
and the former 'node_inode' of the first filesystem is returned."

Nids for both meta_inode and node_inode are reserved, so it's not necessary
for us to invalidate pages which will never be allocated.
To fix this issue, let's skip needlessly invalidating pages for
{meta,node}_inode in f2fs_evict_inode.

Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: add f2fs_balance_fs for direct IO
Huang Ying [Sat, 12 Jul 2014 12:10:00 +0000 (20:10 +0800)]
f2fs: add f2fs_balance_fs for direct IO

Otherwise, if a large number of direct IO writes are done, segment
allocation may fail because not enough segments have been garbage-collected.

Changes:

v2: add f2fs_balance_fs into __get_data_block instead of f2fs_direct_IO.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: reduce searching region of segmap when free section
Chao Yu [Mon, 14 Jul 2014 08:45:15 +0000 (16:45 +0800)]
f2fs: reduce searching region of segmap when free section

In __set_test_and_free, when we free one segment we check whether all segments
in its section are free, in order to set the section to free status.
But the search region of segmap runs from the start segno to the last segno of
f2fs, which is not necessary. So let's check only the segment bitmap of the
target section.
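
The narrowed search can be sketched in user-space C. This is only an
illustration under stated assumptions, not the kernel code: the array of
per-segment flags and the names section_is_free and segs_per_sec are
hypothetical, whereas the kernel works on a real bitmap with its own helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: in_use[i] is true when segment i is in use.
 * Only the segments of the section that owns `segno` are scanned,
 * instead of every segment up to the end of the file system.
 */
static bool section_is_free(const bool *in_use, size_t total_segs,
                            size_t segs_per_sec, size_t segno)
{
    assert(segs_per_sec > 0 && segno < total_segs);

    size_t start = (segno / segs_per_sec) * segs_per_sec;
    size_t end = start + segs_per_sec;
    if (end > total_segs)
        end = total_segs;

    for (size_t i = start; i < end; i++)
        if (in_use[i])
            return false;
    return true;
}
```

The cost of the check is then bounded by the section size rather than by the
number of segments in the whole volume.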

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: remove the unused stat_lock
Gu Zheng [Fri, 11 Jul 2014 10:35:44 +0000 (18:35 +0800)]
f2fs: remove the unused stat_lock

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: cleanup the needless return of f2fs_create_root_stats
Gu Zheng [Fri, 11 Jul 2014 10:35:43 +0000 (18:35 +0800)]
f2fs: cleanup the needless return of f2fs_create_root_stats

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: check name_len of dir entry to prevent from deadloop
Chao Yu [Thu, 10 Jul 2014 04:37:46 +0000 (12:37 +0800)]
f2fs: check name_len of dir entry to prevent from deadloop

We assume that a zeroed name_len could result from modification by some special
application, or could be made deliberately by somebody. We will loop forever in
find_in_block when the name_len of a dir entry is zero.

This patch prevents the dead loop in the above scenario.
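
Why a zero name_len stalls the scan can be sketched as follows. This is a
simplified user-space model, not find_in_block itself: SLOT_LEN, dentry_slots
and walk_entries are hypothetical names, the 8-byte slot length is an
assumption, and the sketch returns an error where the patch uses f2fs_bug_on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_LEN 8 /* assumed bytes of name held by one dentry slot */

/* Number of slots a name of `name_len` bytes occupies (rounded up). */
static size_t dentry_slots(uint16_t name_len)
{
    return ((size_t)name_len + SLOT_LEN - 1) / SLOT_LEN;
}

/*
 * Walk `count` slots; name_len[pos] is read only at the first slot of an
 * entry. Returns the number of entries visited, or -1 when a zero name_len
 * is found, because advancing by zero slots would never terminate.
 */
static int walk_entries(const uint16_t *name_len, size_t count)
{
    assert(name_len != NULL || count == 0);

    int visited = 0;
    size_t pos = 0;

    while (pos < count) {
        size_t step = dentry_slots(name_len[pos]);
        if (step == 0)
            return -1;      /* corrupted entry: refuse to spin */
        pos += step;
        visited++;
    }
    return visited;
}
```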

change log from v1:
 o use f2fs_bug_on rather than breaking out of the dir entry search, as
suggested by Jaegeuk Kim.

Jaegeuk describes:
"Well, IMO, it would be good to add f2fs_bug_on() here with a specific comment.
In the current phase of f2fs, it is more important to investigate the file
system bugs, rather than workarounds for any corrupted images.
And, definitely it needs to stop the kernel if any corrupted image was mounted,
so that we can figure out where the bugs are occurred."

Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: use inner macro and function to clean up codes
Chao Yu [Mon, 7 Jul 2014 03:21:59 +0000 (11:21 +0800)]
f2fs: use inner macro and function to clean up codes

In this patch we use the inner macros and function below to clean up the code.
1. ADDRS_PER_PAGE
2. SM_I
3. f2fs_readonly

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: introduce f2fs_write_failed to handle error case when write
Chao Yu [Wed, 2 Jul 2014 05:25:04 +0000 (13:25 +0800)]
f2fs: introduce f2fs_write_failed to handle error case when write

When we fail in ->write_begin()/->direct_IO(), the node blocks we allocated on
disk and in the page cache are still kept, even though they may never be used
again.

This patch introduces f2fs_write_failed() to handle the error case of these two
interfaces; it truncates the page cache and the blocks of the file according to
i_size.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/data.c

Change-Id: I5bd54f04a66671550cd3a192d026a77dc47f6533

8 years ago f2fs: arguments cleanup of finding file flow functions
Gu Zheng [Tue, 24 Jun 2014 10:21:23 +0000 (18:21 +0800)]
f2fs: arguments cleanup of finding file flow functions

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/dir.c

Change-Id: Ie5e28c674b7ddb32dce85f7f4758436b9dcc54f6

8 years ago f2fs: remove the needless point-cast
Gu Zheng [Fri, 27 Jun 2014 09:57:04 +0000 (17:57 +0800)]
f2fs: remove the needless point-cast

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: remove the redundant validation check of acl
Gu Zheng [Tue, 24 Jun 2014 10:18:14 +0000 (18:18 +0800)]
f2fs: remove the redundant validation check of acl

On the kernel side (xx_init_acl), the acl is obtained/cloned from the parent
dir's acl, which is trustworthy. So remove the redundant validation check of
the acl here.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/acl.c

Change-Id: I9701dfbc54a6933677385cf1c63c1065c15e54c1

8 years ago f2fs: reduce region of f2fs_lock_op covered for better concurrency
Chao Yu [Tue, 24 Jun 2014 06:16:24 +0000 (14:16 +0800)]
f2fs: reduce region of f2fs_lock_op covered for better concurrency

In our rename process, the region covered by f2fs_lock_op is too big, as some of
the code, such as f2fs_empty_dir/f2fs_find_entry, does not need to be protected
by this lock.

So in an extreme case, such as doing a checkpoint while we rename an old inode
to an existing inode in a large directory, concurrency could be lowered.

Let's reduce the region of f2fs_lock_op to fix this.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: replace count*size kzalloc by kcalloc
Fabian Frederick [Mon, 23 Jun 2014 16:39:15 +0000 (18:39 +0200)]
f2fs: replace count*size kzalloc by kcalloc

kcalloc manages count*sizeof overflow.
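
The kind of check this refers to can be illustrated in user space. The helper
below is a hypothetical analogue written for this note (checked_zalloc_array
is not a kernel function); it only shows why an unchecked count*size
multiplication is risky.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * User-space analogue of an overflow-checked, zeroing array allocation.
 * Returns NULL instead of allocating a too-small buffer when
 * count * size would wrap around SIZE_MAX.
 */
static void *checked_zalloc_array(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;            /* multiplication would overflow */

    void *p = malloc(count * size);
    if (p)
        memset(p, 0, count * size);
    return p;
}
```

With a plain zeroing allocation of count * size bytes, a wrapped product would
yield a buffer much smaller than the caller expects.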

Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: refactor flush_nat_entries codes for reducing NAT writes
Chao Yu [Tue, 24 Jun 2014 01:18:20 +0000 (09:18 +0800)]
f2fs: refactor flush_nat_entries codes for reducing NAT writes

Although building the NAT journal in cursum reduces the read/write work for NAT
blocks, the previous design gives us lower performance when checkpoints are
written frequently, in these cases:
1. if the journal in cursum is already full, it's a bit of a waste that we
   flush all nat entries to pages for persistence without caching any entries.
2. if the journal in cursum is not full, we fill nat entries into the journal
   until the journal is full, then flush the remaining dirty entries to disk
   without merging the journaled entries, so these journaled entries may be
   flushed to disk at the next checkpoint, having lost the chance to be flushed
   last time.

In this patch we merge dirty entries located in the same NAT block into a nat
entry set, and link all sets into a list sorted in ascending order by the entry
count of each set. Later we flush entries in sparse sets into the journal, as
many as we can, and then flush the merged entries to disk. In this way we not
only gain performance, but also extend the lifetime of the flash device.
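
The grouping and ordering described above can be sketched in user-space C.
Everything here is an assumption made for illustration: the names nat_set and
plan_flush, the 455 entries per NAT block, and the simple rule that a set goes
to the journal only if it fits as a whole. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdlib.h>

#define ENTRIES_PER_BLOCK 455   /* assumed nat entries per NAT block */

struct nat_set {
    unsigned block;     /* NAT block index the entries belong to */
    unsigned count;     /* dirty entries in that block */
};

static int by_count(const void *a, const void *b)
{
    const struct nat_set *x = a, *y = b;
    return (x->count > y->count) - (x->count < y->count);
}

/*
 * Group dirty nids by NAT block, sort the sets by ascending entry count,
 * place the sparsest sets in the journal while they fit, and return how
 * many NAT blocks must still be written. `sets` needs room for n items.
 */
static unsigned plan_flush(const unsigned *nids, size_t n,
                           unsigned journal_room, struct nat_set *sets)
{
    assert(sets != NULL || n == 0);

    size_t nsets = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned block = nids[i] / ENTRIES_PER_BLOCK;
        size_t s = 0;
        while (s < nsets && sets[s].block != block)
            s++;
        if (s == nsets) {           /* first dirty entry of this block */
            sets[nsets].block = block;
            sets[nsets].count = 0;
            nsets++;
        }
        sets[s].count++;
    }

    qsort(sets, nsets, sizeof(*sets), by_count);

    unsigned block_writes = 0;
    for (size_t s = 0; s < nsets; s++) {
        if (sets[s].count <= journal_room)
            journal_room -= sets[s].count;  /* whole set goes to journal */
        else
            block_writes++;                 /* one write covers the set */
    }
    return block_writes;
}
```

Placing the sparsest sets in the journal first means the limited journal space
absorbs as many distinct NAT blocks as possible, so each block that does get
written carries many merged entries.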

In my testing environment, this patch clearly helps to reduce NAT block
writes. In the hard disk test case, the time cost of fsstress is stably
reduced by about 5%.

1. virtual machine + hard disk:
fsstress -p 20 -n 200 -l 5
          node num   cp count   nodes/cp
based       4599.6     1803.0      2.551
patched     2714.6     1829.6      1.483

2. virtual machine + 32g micro SD card:
fsstress -p 20 -n 200 -l 1 -w -f chown=0 -f creat=4 -f dwrite=0
-f fdatasync=4 -f fsync=4 -f link=0 -f mkdir=4 -f mknod=4 -f rename=5
-f rmdir=5 -f symlink=0 -f truncate=4 -f unlink=5 -f write=0 -S

          node num   cp count   nodes/cp
based         84.5       43.7      1.933
patched       49.2       40.0      1.23

The latency of the merge operation is acceptable even in an extreme case such
as merging a great number of dirty nats:
latency(ns)   dirty nat count
3089219       24922
5129423       27422
4000250       24523

change log from v1:
 o fix wrong logic in add_nat_entry when grabbing a new nat entry set.
 o switch to creating the slab cache in create_node_manager_caches.
 o use GFP_ATOMIC instead of GFP_NOFS to avoid potential long latency.

change log from v2:
 o make the comment position more appropriate, as suggested by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: clean up an unused parameter and assignment
Jaegeuk Kim [Sat, 21 Jun 2014 04:44:02 +0000 (21:44 -0700)]
f2fs: clean up an unused parameter and assignment

This patch cleans up some simple unnecessary code.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: introduce f2fs_do_tmpfile for code consistency
Jaegeuk Kim [Sat, 21 Jun 2014 04:37:02 +0000 (21:37 -0700)]
f2fs: introduce f2fs_do_tmpfile for code consistency

This patch adds f2fs_do_tmpfile to eliminate the redundant init_inode_metadata
flow.
Through this, we can provide consistent lock usage, e.g., fi->i_sem, and
this will enable better debugging.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: support ->tmpfile()
Chao Yu [Thu, 19 Jun 2014 08:23:19 +0000 (16:23 +0800)]
f2fs: support ->tmpfile()

Add function f2fs_tmpfile() to support O_TMPFILE file creation, and modify the
logic of init_inode_metadata to enable linkat on the temp file.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: avoid to truncate non-updated page partially
Chao Yu [Thu, 12 Jun 2014 05:31:50 +0000 (13:31 +0800)]
f2fs: avoid to truncate non-updated page partially

After we call find_data_page in truncate_partial_data_page, we cannot
guarantee that the page is up to date, as an error may have occurred in a
lower layer.

We'd better check the status of the page to avoid writing a page that is not
up to date back to the device.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: avoid unneeded SetPageUptodate in f2fs_write_end
Chao Yu [Thu, 12 Jun 2014 05:25:01 +0000 (13:25 +0800)]
f2fs: avoid unneeded SetPageUptodate in f2fs_write_end

We have already set the page up to date in ->write_begin, so we should remove
the redundant SetPageUptodate in ->write_end.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: avoid to access NULL pointer in issue_flush_thread
Chao Yu [Mon, 7 Jul 2014 01:39:32 +0000 (09:39 +0800)]
f2fs: avoid to access NULL pointer in issue_flush_thread

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=75861

Denis 2014-05-10 11:28:59 UTC reported:
"F2FS-fs (mmcblk0p28): mounting..
 Unable to handle kernel NULL pointer dereference at virtual address 00000018
 ...
 [<c0a2f678>] (_raw_spin_lock+0x3c/0x70) from [<c03a0330>] (issue_flush_thread+0x50/0x17c)
 [<c03a0330>] (issue_flush_thread+0x50/0x17c) from [<c01b4064>] (kthread+0x98/0xa4)
 [<c01b4064>] (kthread+0x98/0xa4) from [<c0108060>] (kernel_thread_exit+0x0/0x8)"

This patch assigns cmd_control_info in sm_info before issue_flush_thread is
created, which ensures that the issue flush thread has no chance to access
invalid info in fcc.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: check bdi->dirty_exceeded when trying to skip data writes
Jaegeuk Kim [Fri, 27 Jun 2014 16:00:41 +0000 (01:00 +0900)]
f2fs: check bdi->dirty_exceeded when trying to skip data writes

If we don't check the current backing device status, balance_dirty_pages can
fall into an infinite pausing routine.

This can occur when a lot of directories each produce a small number of dirty
dentry pages including files.

Reported-by: Brian Chadwick <brianchad@westnet.com.au>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: do checkpoint for the renamed inode
Jaegeuk Kim [Mon, 30 Jun 2014 09:09:55 +0000 (18:09 +0900)]
f2fs: do checkpoint for the renamed inode

If an inode is renamed, it should be registered as file_lost_pino to conduct
checkpoint at f2fs_sync_file.
Otherwise, the inode cannot be recovered due to no dent_mark in the following
scenario.

Note that, this scenario is from xfstests/322.

1. create "a"
2. fsync "a"
3. rename "a" to "b"
4. fsync "b"
5. Sudden power-cut

After recovery is done, "b" should be seen.
However, the result shows "a", since the recovery procedure does not enter
recover_dentry due to no dent_mark.

The reason is like below.
- The nid of "a" is checkpointed during #2, f2fs_sync_file.
- The inode page for "b" produced by #3 is written without dent_mark by
sync_node_pages.

So, this patch fixes this bug by assigning file_lost_pino to "a"'s inode.
If the pino is lost, f2fs_sync_file conducts checkpoint, and then recovers
the latest pino and its dentry information for further recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: release new entry page correctly in error path of f2fs_rename
Chao Yu [Tue, 24 Jun 2014 06:13:13 +0000 (14:13 +0800)]
f2fs: release new entry page correctly in error path of f2fs_rename

This patch corrects the release code for new_page to avoid a BUG_ON in the
error path of f2fs_rename.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
fs/f2fs/namei.c

Change-Id: I35fb263de98342bd3031a35b7d978723d315fdf6

8 years ago f2fs: fix error path in init_inode_metadata
Chao Yu [Tue, 24 Jun 2014 02:34:00 +0000 (10:34 +0800)]
f2fs: fix error path in init_inode_metadata

If we fail in this path:
->init_inode_metadata
  ->make_empty_dir
    ->get_new_data_page
      ->grab_cache_page return -ENOMEM

We will hit a BUG_ON in the error path of init_inode_metadata when calling
remove_inode_page, because i_block = 2 (one inode block, which will be released
later, and one dentry block).

We should release the dentry block in init_inode_metadata to avoid this BUG_ON
and to avoid leaking the dentry block, because we never get a second chance to
release that block in ->evict_inode, as the upper error path makes this inode
'bad'.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: check lower bound nid value in check_nid_range
Chao Yu [Thu, 12 Jun 2014 05:23:41 +0000 (13:23 +0800)]
f2fs: check lower bound nid value in check_nid_range

This patch adds lower-bound verification for nid in check_nid_range, so that
reserved nids such as 0, node, and meta passed by the caller can be checked
there.

check_nid_range can then be used in f2fs_nfs_get_inode to simplify the
code.
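
A two-sided range check of this kind might look like the sketch below. The
name nid_in_range, its signature, and the assumption that nids below 3 (0, the
node inode and the meta inode) are reserved are illustrative only; they are not
taken from the patch.

```c
#include <errno.h>

#define ROOT_INO 3u  /* assumed: nids 0, 1 (node) and 2 (meta) are reserved */

/* Return 0 when nid is usable, -EINVAL when it is reserved or too large. */
static int nid_in_range(unsigned nid, unsigned max_nid)
{
    if (nid < ROOT_INO)
        return -EINVAL;     /* lower bound: reserved nids */
    if (nid >= max_nid)
        return -EINVAL;     /* upper bound */
    return 0;
}
```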

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: remove unused variables in f2fs_sm_info
Chao Yu [Wed, 11 Jun 2014 10:32:23 +0000 (18:32 +0800)]
f2fs: remove unused variables in f2fs_sm_info

Remove unused variables in struct f2fs_sm_info.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: fix not to allocate unnecessary blocks during fallocate
Jaegeuk Kim [Fri, 13 Jun 2014 04:07:31 +0000 (13:07 +0900)]
f2fs: fix not to allocate unnecessary blocks during fallocate

This patch fixes the fallocate bug like below. (See xfstests/255)

In fallocate(fd, 0, 20480),
expand_inode_data processes
    for (index = pg_start; index <= pg_end; index++) {
        f2fs_reserve_block();
        ...
    }

So, even though fallocate requests 20480 bytes, i.e. 5 blocks, f2fs allocates
6 blocks, including the one at pg_end.
So, this patch adds one condition to avoid that extra block allocation.
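
The off-by-one can be reproduced with a small user-space model. The names
blocks_reserved, skip_empty_tail and BLK_SIZE are hypothetical, a 4096-byte
block is assumed, and the skip condition is one plausible form of the added
check rather than a copy of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BLK_SIZE 4096u   /* assumed block/page size */

/*
 * Count the blocks an inclusive [pg_start, pg_end] loop would reserve for
 * a request of `len` bytes at `offset`. With skip_empty_tail set, the last
 * index is skipped when the range ends exactly on a block boundary.
 */
static uint64_t blocks_reserved(uint64_t offset, uint64_t len,
                                int skip_empty_tail)
{
    assert(len > 0);

    uint64_t pg_start = offset / BLK_SIZE;
    uint64_t pg_end = (offset + len) / BLK_SIZE;
    uint64_t off_end = (offset + len) % BLK_SIZE;
    uint64_t reserved = 0;

    for (uint64_t index = pg_start; index <= pg_end; index++) {
        if (skip_empty_tail && index == pg_end && off_end == 0)
            continue;       /* nothing of the request lies in this block */
        reserved++;
    }
    return reserved;
}
```

For offset 0 and length 20480 the unguarded loop counts 6 blocks and the
guarded one counts 5, matching the numbers in the message above.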

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: recover fallocated data and its i_size together
Jaegeuk Kim [Fri, 13 Jun 2014 04:05:55 +0000 (13:05 +0900)]
f2fs: recover fallocated data and its i_size together

This patch arranges the f2fs_locks to cover the fallocated data and its i_size.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: fix to report newly allocate region as extent
Jaegeuk Kim [Fri, 13 Jun 2014 04:02:11 +0000 (13:02 +0900)]
f2fs: fix to report newly allocate region as extent

Previously, get_block in f2fs didn't report a newly allocated region that has
NEW_ADDR.
For readers it should not be reported, but fiemap needs it.
So, this patch introduces two get_block variants sharing a core function.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: support f2fs_fiemap
Jaegeuk Kim [Sat, 7 Jun 2014 19:30:14 +0000 (04:30 +0900)]
f2fs: support f2fs_fiemap

This patch links f2fs_fiemap to the generic function using get_block.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: avoid not to call remove_dirty_inode
Jaegeuk Kim [Fri, 6 Jun 2014 18:05:03 +0000 (03:05 +0900)]
f2fs: avoid not to call remove_dirty_inode

There is an erroneous case during the recovery like below.

In recovery_dentry,
 1) dir = f2fs_iget();
 2) mark the dir with FI_DELAY_IPUT
 3) goto unmap_out

After the end of the recovery routine, there are no dirty dentries, so the dir
cannot be released by iput in remove_dirty_dir_inode.

This patch fixes this bug by handling the iget and iput in the
recovery_dentry procedure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: recover fallocated space
Jaegeuk Kim [Thu, 5 Jun 2014 17:12:59 +0000 (02:12 +0900)]
f2fs: recover fallocated space

If a fallocated file is fsynced, we should recover the i_size after a sudden
power cut.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: fix to recover data written by dio
Jaegeuk Kim [Tue, 3 Jun 2014 15:39:42 +0000 (00:39 +0900)]
f2fs: fix to recover data written by dio

If data are overwritten through dio, f2fs previously did not leave the fsync
mark, because there are no additional node writes.

Note that this patch should resolve the xfstests:311.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: large volume support
Changman Lee [Mon, 12 May 2014 03:27:43 +0000 (12:27 +0900)]
f2fs: large volume support

f2fs's cp has one page which consists of struct f2fs_checkpoint and the
version bitmaps of sit and nat. To support lots of segments, we need more
blocks for the sit bitmap. So let's arrange the sit bitmap as follows:
+-----------------+------------+
| f2fs_checkpoint | sit bitmap |
| + nat bitmap    |            |
+-----------------+------------+
0                 4k        N blocks

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: simple code change for readability]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
8 years ago f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
Chao Yu [Tue, 27 May 2014 00:41:07 +0000 (08:41 +0800)]
f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages

Previously we allocated pages with no mapping in ra_sum_pages(), so we could
encounter a crash in the event trace of f2fs_submit_page_mbio, where we access
the mapping data of the page.

We'd better allocate pages in the bd_inode mapping and invalidate these pages
after we restore data from them. This avoids the crash in the above scenario.

Changes from V1
 o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim.

Call Trace:
 [<f1031630>] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs]
 [<f10377bb>] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs]
 [<f103c5da>] restore_node_summary+0x13a/0x280 [f2fs]
 [<f103e22d>] build_curseg+0x2bd/0x620 [f2fs]
 [<f104043b>] build_segment_manager+0x1cb/0x920 [f2fs]
 [<f1032c85>] f2fs_fill_super+0x535/0x8e0 [f2fs]
 [<c115b66a>] mount_bdev+0x16a/0x1a0
 [<f102f63f>] f2fs_mount+0x1f/0x30 [f2fs]
 [<c115c096>] mount_fs+0x36/0x170
 [<c1173635>] vfs_kern_mount+0x55/0xe0
 [<c1175388>] do_mount+0x1e8/0x900
 [<c1175d72>] SyS_mount+0x82/0xc0
 [<c16059cc>] sysenter_do_call+0x12/0x22

Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>