3 years ago f2fs: fix coding style
Jaegeuk Kim [Thu, 31 Jul 2014 00:25:54 +0000]
f2fs: fix coding style

This patch fixes wrong coding style.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: remove redundant lines in allocate_data_block
Dongho Sim [Wed, 30 Jul 2014 06:52:41 +0000]
f2fs: remove redundant lines in allocate_data_block

There are redundant lines in allocate_data_block.

In this function, we call refresh_sit_entry with old seg and old curseg.
After that, we call locate_dirty_segment with old curseg.

But, the new address is always allocated from old curseg and
we call locate_dirty_segment with old curseg in refresh_sit_entry.
So, we do not need to call locate_dirty_segment with old curseg again.

We discussed this as follows:

Jaegeuk said:
 "When considering SSR, we need to take care of the following scenario.
  - old segno : X
  - new address : Z
  - old curseg : Y
  This means, a new block is supposed to be written to Z from X.
  And Z is newly allocated in the same path from Y.

  In that case, we should trigger locate_dirty_segment for Y, since
  it was a current_segment and can be dirty owing to SSR.
  But that was not included in the dirty list."

Changman said:
 "We already chose old curseg(Y) and then we allocate new address(Z) from old
  curseg(Y). After that we call refresh_sit_entry(old address, new address).
  In the function, we call locate_dirty_segment with old seg and old curseg.
  So calling locate_dirty_segment after refresh_sit_entry again is redundant."

Jaegeuk said:
 "Right. The new address is always allocated from old_curseg."

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Dongho Sim <dh.sim@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: add tracepoint for f2fs_issue_flush
Jaegeuk Kim [Sat, 26 Jul 2014 00:46:10 +0000]
f2fs: add tracepoint for f2fs_issue_flush

This patch adds a tracepoint for f2fs_issue_flush.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: avoid retrying wrong recovery routine when error was occurred
Jaegeuk Kim [Fri, 25 Jul 2014 22:47:25 +0000]
f2fs: avoid retrying wrong recovery routine when error was occurred

This patch eliminates the propagation of recovery errors to the next mount.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Conflicts:
fs/f2fs/recovery.c

Change-Id: I914547ea612937738a5e7ea9c5e555bfa067540d

3 years ago f2fs: test before set/clear bits
Jaegeuk Kim [Fri, 25 Jul 2014 22:47:23 +0000]
f2fs: test before set/clear bits

If the bit is already set, we don't need to set it again, and vice versa for
clearing, because there is no reason to make the caches dirty for a no-op.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
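
A minimal user-space sketch of the test-before-set idea described in the commit
above. This is not the kernel code: the helper names set_bit_if_clear and
clear_bit_if_set and the plain unsigned long bitmap are hypothetical. Each helper
reads the word first and stores only when the bit actually changes, so an
already-correct bit causes no write.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: set bit nr only if it is currently clear.
 * Returns true when a store was actually performed. */
static bool set_bit_if_clear(unsigned long *map, size_t nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);
	unsigned long *word = &map[nr / BITS_PER_WORD];

	if (*word & mask)
		return false;	/* already set: read only, no store */
	*word |= mask;
	return true;
}

/* Hypothetical helper: clear bit nr only if it is currently set.
 * Returns true when a store was actually performed. */
static bool clear_bit_if_set(unsigned long *map, size_t nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);
	unsigned long *word = &map[nr / BITS_PER_WORD];

	if (!(*word & mask))
		return false;	/* already clear: read only, no store */
	*word &= ~mask;
	return true;
}
```

The boolean return value is only there to make the skipped store observable; a
caller that does not care can ignore it.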

3 years ago f2fs: fix wrong condition for unlikely
Jaegeuk Kim [Fri, 25 Jul 2014 14:41:43 +0000]
f2fs: fix wrong condition for unlikely

This patch fixes the wrongly used unlikely condition.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: enable in-place-update for fdatasync
Jaegeuk Kim [Fri, 25 Jul 2014 02:11:43 +0000]
f2fs: enable in-place-update for fdatasync

This patch enforces in-place updates only when fdatasync is requested.
If we adopt in-place updates for fdatasync, we can skip writing the
recovery information.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: skip unnecessary data writes during fsync
Jaegeuk Kim [Fri, 25 Jul 2014 02:08:02 +0000]
f2fs: skip unnecessary data writes during fsync

This patch intends to improve fsync performance by skipping the write of the
remaining recovery information, but only when there is no data that we should
recover.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: add info of appended or updated data writes
Jaegeuk Kim [Fri, 25 Jul 2014 14:40:59 +0000]
f2fs: add info of appended or updated data writes

This patch introduces an inode number list that records the inodes having
appended or updated data writes since the last checkpoint.
This will be used at fsync to determine whether the recovery information
should be written or not.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: use radix_tree for ino management
Jaegeuk Kim [Fri, 25 Jul 2014 01:15:17 +0000]
f2fs: use radix_tree for ino management

For better ino management, this patch replaces the data structure from list
to radix tree.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: add infra for ino management
Jaegeuk Kim [Fri, 25 Jul 2014 22:47:17 +0000]
f2fs: add infra for ino management

This patch renames the orphan-related data structures so that they can be used
for globally managed inode numbers.
Later, we can use this facility for managing any inode number list.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Conflicts:
fs/f2fs/checkpoint.c

Change-Id: Iba6f73620269080d3fad9bdeeb3b9825b9651719

3 years ago f2fs: punch the core function for inode management
Jaegeuk Kim [Fri, 25 Jul 2014 22:47:16 +0000]
f2fs: punch the core function for inode management

This patch factors out the core functions that manage the inode numbers.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Conflicts:
fs/f2fs/checkpoint.c

Change-Id: I5d147dc8bfc9461b85f5426cd83ba4735f46e08b

3 years ago f2fs: add nobarrier mount option
Jaegeuk Kim [Wed, 23 Jul 2014 16:57:31 +0000]
f2fs: add nobarrier mount option

This patch adds a mount option, nobarrier, in f2fs.
The assumption here is that the file system keeps the IO ordering, but
doesn't care about cache flushes inside the storage device.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Conflicts:
fs/f2fs/f2fs.h
fs/f2fs/super.c

Change-Id: I52b65987a3353135cbc029221203fdfb65d26212

3 years ago f2fs: fix to put root inode in error path of fill_super
Chao Yu [Fri, 25 Jul 2014 04:55:09 +0000]
f2fs: fix to put root inode in error path of fill_super

We should put the root inode correctly in the error path of fill_super,
otherwise we may leak the inode resource.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: avoid use invalid mapping of node_inode when evict meta inode
Chao Yu [Fri, 25 Jul 2014 04:00:57 +0000]
f2fs: avoid use invalid mapping of node_inode when evict meta inode

Andrey Tsyvarev reported:
"Using memory error detector reveals the following use-after-free error
in 3.15.0:

AddressSanitizer: heap-use-after-free in f2fs_evict_inode
Read of size 8 by thread T22279:
  [<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs]
  [<ffffffff812359af>] evict+0x15f/0x290
  [<     inlined    >] iput+0x196/0x280 iput_final
  [<ffffffff812369a6>] iput+0x196/0x280
  [<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs]
  [<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0
  [<ffffffff812105fd>] kill_block_super+0x4d/0xb0
  [<ffffffff81210a86>] deactivate_locked_super+0x66/0x80
  [<ffffffff81211c98>] deactivate_super+0x68/0x80
  [<ffffffff8123cc88>] mntput_no_expire+0x198/0x250
  [<     inlined    >] SyS_umount+0xe9/0x1a0 SYSC_umount
  [<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0
  [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b

Freed by thread T3:
  [<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs]
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks
  [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks
  [<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930
  [<ffffffff8107cce2>] __do_softirq+0x142/0x380
  [<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50
  [<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280
  [<ffffffff810a8238>] kthread+0x148/0x160
  [<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0

Allocated by thread T22276:
  [<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs]
  [<ffffffff81235e2a>] iget_locked+0x10a/0x230
  [<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs]
  [<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs]
  [<ffffffff81211bce>] mount_bdev+0x1de/0x240
  [<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs]
  [<ffffffff81212a85>] mount_fs+0x55/0x220
  [<ffffffff8123c026>] vfs_kern_mount+0x66/0x200
  [<     inlined    >] do_mount+0x2b4/0x1120 do_new_mount
  [<ffffffff812400d4>] do_mount+0x2b4/0x1120
  [<     inlined    >] SyS_mount+0xb2/0x110 SYSC_mount
  [<ffffffff812414a2>] SyS_mount+0xb2/0x110
  [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b

The buggy address ffff8800587866c8 is located 48 bytes inside
  of 680-byte region [ffff880058786698, ffff880058786940)

Memory state around the buggy address:
  ffff880058786100: ffffffff ffffffff ffffffff ffffffff
  ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr
  ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff
  ffff880058786400: ffffffff ffffffff ffffffff ffffffff
  ffff880058786500: ffffffff ffffffff ffffffff fffffffr
 >ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff
                                                ^
  ffff880058786700: ffffffff ffffffff ffffffff ffffffff
  ffff880058786800: ffffffff ffffffff ffffffff ffffffff
  ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr....
  ffff880058786a00: ........ ........ ........ ........
  ffff880058786b00: ........ ........ ........ ........
Legend:
  f - 8 freed bytes
  r - 8 redzone bytes
  . - 8 allocated bytes
  x=1..7 - x allocated bytes + (8-x) redzone bytes

Investigation shows, that f2fs_evict_inode, when called for
'meta_inode', uses invalidate_mapping_pages() for 'node_inode'.
But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via
iput().

It seems that in common usage scenario this use-after-free is benign,
because 'node_inode' remains partially valid data even after
kmem_cache_free().
But things may change if, while 'meta_inode' is evicted in one f2fs
filesystem, another (mounted) f2fs filesystem requests inode from cache,
and formerly 'node_inode' of the first filesystem is returned."

The nids of both meta_inode and node_inode are reserved, so it's not necessary
for us to invalidate pages which will never be allocated.
To fix this issue, let's skip the needless invalidation of pages for
{meta,node}_inode in f2fs_evict_inode.

Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: add f2fs_balance_fs for direct IO
Huang Ying [Sat, 12 Jul 2014 12:10:00 +0000]
f2fs: add f2fs_balance_fs for direct IO

Otherwise, if a large amount of direct IO writes is done, segment
allocation may fail because not enough segments have been garbage-collected.

Changes:

v2: add f2fs_balance_fs into __get_data_block instead of f2fs_direct_IO.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: reduce searching region of segmap when free section
Chao Yu [Mon, 14 Jul 2014 08:45:15 +0000]
f2fs: reduce searching region of segmap when free section

In __set_test_and_free, when freeing one segment, we check whether all segments
in its section are free, in order to set the section to free status.
But the search region of the segmap spans from the start segno to the last segno
of f2fs, which is unnecessary. So let's check only the segment bitmap of the
target section.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
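
A minimal sketch of the bounded search described in the commit above, assuming a
simplified layout that is not the f2fs one: one byte per segment instead of a
bitmap, and a hypothetical helper name section_is_free. The point is only that
the scan covers the segments of the target section and nothing else.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true when every segment of the section that
 * contains segno is free. in_use[i] != 0 means segment i is allocated.
 * Only [start, start + segs_per_sec) is scanned, not the whole map. */
static bool section_is_free(const unsigned char *in_use,
			    size_t segs_per_sec, size_t segno)
{
	size_t start = (segno / segs_per_sec) * segs_per_sec;
	size_t end = start + segs_per_sec;

	for (size_t i = start; i < end; i++)
		if (in_use[i])
			return false;
	return true;
}
```

The cost of the check is then proportional to the number of segments per
section rather than to the total number of segments in the volume.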

3 years ago f2fs: remove the unused stat_lock
Gu Zheng [Fri, 11 Jul 2014 10:35:44 +0000]
f2fs: remove the unused stat_lock

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: cleanup the needless return of f2fs_create_root_stats
Gu Zheng [Fri, 11 Jul 2014 10:35:43 +0000]
f2fs: cleanup the needless return of f2fs_create_root_stats

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: check name_len of dir entry to prevent from deadloop
Chao Yu [Thu, 10 Jul 2014 04:37:46 +0000]
f2fs: check name_len of dir entry to prevent from deadloop

A zeroed name_len could result from a modification made by some special
application, or it could be crafted deliberately by somebody. In either case we
loop forever in find_in_block when the name_len of a dir entry is zero.

This patch prevents the infinite loop in the above scenario.

change log from v1:
 o use f2fs_bug_on rather than breaking out of the dir entry search, as
   suggested by Jaegeuk Kim.

Jaegeuk explained:
"Well, IMO, it would be good to add f2fs_bug_on() here with a specific comment.
In the current phase of f2fs, it is more important to investigate the file
system bugs, rather than workarounds for any corrupted images.
And, definitely it needs to stop the kernel if any corrupted image was mounted,
so that we can figure out where the bugs are occurred."

Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: use inner macro and function to clean up codes
Chao Yu [Mon, 7 Jul 2014 03:21:59 +0000]
f2fs: use inner macro and function to clean up codes

In this patch we use the following internal macros and function to clean up the code.
1. ADDRS_PER_PAGE
2. SM_I
3. f2fs_readonly

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: introduce f2fs_write_failed to handle error case when write
Chao Yu [Wed, 2 Jul 2014 05:25:04 +0000]
f2fs: introduce f2fs_write_failed to handle error case when write

When we fail in ->write_begin()/->direct_IO(), the node blocks we allocated on
disk and the page cache are still kept, even though they may never be used again.

This patch introduces f2fs_write_failed() to handle the error case of these two
interfaces; it truncates the page cache and the blocks of this file according to
i_size.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Conflicts:
fs/f2fs/data.c

Change-Id: I5bd54f04a66671550cd3a192d026a77dc47f6533

3 years ago f2fs: arguments cleanup of finding file flow functions
Gu Zheng [Tue, 24 Jun 2014 10:21:23 +0000]
f2fs: arguments cleanup of finding file flow functions

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Conflicts:
fs/f2fs/dir.c

Change-Id: Ie5e28c674b7ddb32dce85f7f4758436b9dcc54f6

3 years ago f2fs: remove the needless point-cast
Gu Zheng [Fri, 27 Jun 2014 09:57:04 +0000]
f2fs: remove the needless point-cast

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: remove the redundant validation check of acl
Gu Zheng [Tue, 24 Jun 2014 10:18:14 +0000]
f2fs: remove the redundant validation check of acl

On the kernel side (xx_init_acl), the acl is obtained or cloned from the parent
dir's acl, which is trustworthy. So remove the redundant validation check of the
acl here.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Conflicts:
fs/f2fs/acl.c

Change-Id: I9701dfbc54a6933677385cf1c63c1065c15e54c1

3 years ago f2fs: reduce region of f2fs_lock_op covered for better concurrency
Chao Yu [Tue, 24 Jun 2014 06:16:24 +0000]
f2fs: reduce region of f2fs_lock_op covered for better concurrency

In our rename process, the region covered by f2fs_lock_op is too large, since
some of the code, such as f2fs_empty_dir/f2fs_find_entry, does not need to be
protected by this lock.

So an extreme case, such as running a checkpoint while renaming an old inode
onto an existing inode in a large directory, could lower concurrency.

Let's reduce the region covered by f2fs_lock_op to fix this.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: replace count*size kzalloc by kcalloc
Fabian Frederick [Mon, 23 Jun 2014 16:39:15 +0000]
f2fs: replace count*size kzalloc by kcalloc

kcalloc manages count*sizeof overflow.

Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
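
A user-space sketch of why an array allocator that takes count and size
separately is safer than multiplying by hand, as the commit above notes for
kcalloc. The helper name checked_zalloc_array is hypothetical and this is an
analogue, not the kernel implementation. The standard calloc() already performs
an equivalent check; it is spelled out here only to make it visible.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical analogue of kcalloc: allocate a zeroed array of n elements of
 * the given size, returning NULL if n * size would overflow size_t. */
static void *checked_zalloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size does not fit in size_t */
	return calloc(n, size);
}
```

With a hand-written malloc(n * size), the same overflowing request would wrap
around and silently return a buffer that is far too small.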

3 years ago f2fs: refactor flush_nat_entries codes for reducing NAT writes
Chao Yu [Tue, 24 Jun 2014 01:18:20 +0000]
f2fs: refactor flush_nat_entries codes for reducing NAT writes

Although building the NAT journal in cursum reduces the read/write work for NAT
blocks, the previous design gives lower performance when checkpoints are written
frequently, in these cases:
1. If the journal in cursum is already full, we flush all nat entries to pages
   for persistence without caching any entries, which is a bit of a waste.
2. If the journal in cursum is not full, we fill nat entries into the journal
   until it is full, then flush the remaining dirty entries to disk without
   merging the journaled entries. These journaled entries may then be flushed to
   disk at the next checkpoint, having lost the chance to be flushed this time.

In this patch we merge dirty entries located in the same NAT block into a nat
entry set, and link all sets into a list sorted in ascending order of each set's
entry count. Later we flush entries in sparse sets into the journal, as many as
we can, and then flush the merged entries to disk. In this way we not only gain
performance but also extend the lifetime of the flash device.

In my testing environment, this patch clearly reduces NAT block writes. In the
hard disk test case, the run time of fsstress is stably reduced by about 5%.

1. virtual machine + hard disk:
fsstress -p 20 -n 200 -l 5
          node num   cp count   nodes/cp
based       4599.6     1803.0      2.551
patched     2714.6     1829.6      1.483

2. virtual machine + 32g micro SD card:
fsstress -p 20 -n 200 -l 1 -w -f chown=0 -f creat=4 -f dwrite=0
-f fdatasync=4 -f fsync=4 -f link=0 -f mkdir=4 -f mknod=4 -f rename=5
-f rmdir=5 -f symlink=0 -f truncate=4 -f unlink=5 -f write=0 -S

          node num   cp count   nodes/cp
based         84.5       43.7      1.933
patched       49.2       40.0      1.23

The latency of the merging operation is acceptable even in an extreme case such
as merging a great number of dirty nats:
latency(ns)   dirty nat count
3089219       24922
5129423       27422
4000250       24523

change log from v1:
 o fix wrong logic in add_nat_entry when grabbing a new nat entry set.
 o switch to creating the slab cache in create_node_manager_caches.
 o use GFP_ATOMIC instead of GFP_NOFS to avoid potential long latency.

change log from v2:
 o make the comment position more appropriate, as suggested by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
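
The grouping and ordering step described in the commit above can be sketched in
user space as follows. This is an illustration under stated assumptions, not the
f2fs implementation: struct nat_set, build_sets, and the tiny ENTRIES_PER_BLOCK
value are hypothetical, and a flat array with qsort stands in for the kernel's
list handling. Dirty nids are grouped by the NAT block they fall into, and the
resulting sets are sorted by ascending entry count so the sparsest sets come
first.

```c
#include <stddef.h>
#include <stdlib.h>

#define ENTRIES_PER_BLOCK 4U	/* illustrative value, not the real f2fs constant */

struct nat_set {
	unsigned int block;	/* NAT block index shared by the entries */
	unsigned int count;	/* number of dirty entries in that block */
};

/* Order sets by ascending entry count; break ties by block index. */
static int cmp_set(const void *a, const void *b)
{
	const struct nat_set *x = a, *y = b;

	if (x->count != y->count)
		return x->count < y->count ? -1 : 1;
	return x->block < y->block ? -1 : (x->block > y->block);
}

/* Group n dirty nids by NAT block and sort the sets, sparsest first.
 * sets must have room for n elements. Returns the number of sets built. */
static size_t build_sets(const unsigned int *nids, size_t n,
			 struct nat_set *sets)
{
	size_t nsets = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int block = nids[i] / ENTRIES_PER_BLOCK;
		size_t j;

		for (j = 0; j < nsets; j++)
			if (sets[j].block == block)
				break;
		if (j == nsets) {	/* first dirty entry seen for this block */
			sets[nsets].block = block;
			sets[nsets].count = 0;
			nsets++;
		}
		sets[j].count++;
	}
	qsort(sets, nsets, sizeof(*sets), cmp_set);
	return nsets;
}
```

A flusher could then walk the sorted array from the front, placing entries of
sparse sets into the journal while space remains and writing the denser sets out
as whole NAT blocks.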

3 years ago f2fs: clean up an unused parameter and assignment
Jaegeuk Kim [Sat, 21 Jun 2014 04:44:02 +0000]
f2fs: clean up an unused parameter and assignment

This patch cleans up some simple unnecessary code.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: introduce f2fs_do_tmpfile for code consistency
Jaegeuk Kim [Sat, 21 Jun 2014 04:37:02 +0000]
f2fs: introduce f2fs_do_tmpfile for code consistency

This patch adds f2fs_do_tmpfile to eliminate the redundant init_inode_metadata
flow.
Through this, we can provide consistent lock usage, e.g., fi->i_sem, and
this will enable better debugging.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: support ->tmpfile()
Chao Yu [Thu, 19 Jun 2014 08:23:19 +0000]
f2fs: support ->tmpfile()

Add function f2fs_tmpfile() to support O_TMPFILE file creation, and modify the
logic of init_inode_metadata to allow linkat on the temp file.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: avoid to truncate non-updated page partially
Chao Yu [Thu, 12 Jun 2014 05:31:50 +0000]
f2fs: avoid to truncate non-updated page partially

After we call find_data_page in truncate_partial_data_page, we cannot
guarantee that the page is up to date, as an error may have occurred in a lower
layer.

We'd better check the status of the page to avoid writing this non-updated page
back to the device.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: avoid unneeded SetPageUptodate in f2fs_write_end
Chao Yu [Thu, 12 Jun 2014 05:25:01 +0000]
f2fs: avoid unneeded SetPageUptodate in f2fs_write_end

We have already set the page uptodate in ->write_begin, so we should remove the
redundant SetPageUptodate in ->write_end.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: avoid to access NULL pointer in issue_flush_thread
Chao Yu [Mon, 7 Jul 2014 01:39:32 +0000]
f2fs: avoid to access NULL pointer in issue_flush_thread

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=75861

Denis 2014-05-10 11:28:59 UTC reported:
"F2FS-fs (mmcblk0p28): mounting..
 Unable to handle kernel NULL pointer dereference at virtual address 00000018
 ...
 [<c0a2f678>] (_raw_spin_lock+0x3c/0x70) from [<c03a0330>] (issue_flush_thread+0x50/0x17c)
 [<c03a0330>] (issue_flush_thread+0x50/0x17c) from [<c01b4064>] (kthread+0x98/0xa4)
 [<c01b4064>] (kthread+0x98/0xa4) from [<c0108060>] (kernel_thread_exit+0x0/0x8)"

This patch assigns cmd_control_info in sm_info before issue_flush_thread is
created, which makes sure that the issue flush thread has no chance to access
invalid info in fcc.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: check bdi->dirty_exceeded when trying to skip data writes
Jaegeuk Kim [Fri, 27 Jun 2014 16:00:41 +0000]
f2fs: check bdi->dirty_exceeded when trying to skip data writes

If we don't check the current backing device status, balance_dirty_pages can
fall into an infinite pausing routine.

This can occur when a lot of directories make a small number of dirty
dentry pages including files.

Reported-by: Brian Chadwick <brianchad@westnet.com.au>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: do checkpoint for the renamed inode
Jaegeuk Kim [Mon, 30 Jun 2014 09:09:55 +0000]
f2fs: do checkpoint for the renamed inode

If an inode is renamed, it should be registered as file_lost_pino to conduct
checkpoint at f2fs_sync_file.
Otherwise, the inode cannot be recovered due to no dent_mark in the following
scenario.

Note that, this scenario is from xfstests/322.

1. create "a"
2. fsync "a"
3. rename "a" to "b"
4. fsync "b"
5. Sudden power-cut

After recovery is done, "b" should be seen.
However, the result shows "a", since the recovery procedure does not enter
recover_dentry due to no dent_mark.

The reason is as follows.
- The nid of "a" is checkpointed during #2, f2fs_sync_file.
- The inode page for "b" produced by #3 is written without dent_mark by
sync_node_pages.

So, this patch fixes this bug by assigning file_lost_pino to the "a"'s inode.
If the pino is lost, f2fs_sync_file conducts checkpoint, and then recovers
the latest pino and its dentry information for further recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: release new entry page correctly in error path of f2fs_rename
Chao Yu [Tue, 24 Jun 2014 06:13:13 +0000]
f2fs: release new entry page correctly in error path of f2fs_rename

This patch corrects the releasing code of new_page to avoid a BUG_ON in the
error path of f2fs_rename.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Conflicts:
fs/f2fs/namei.c

Change-Id: I35fb263de98342bd3031a35b7d978723d315fdf6

3 years ago f2fs: fix error path in init_inode_metadata
Chao Yu [Tue, 24 Jun 2014 02:34:00 +0000]
f2fs: fix error path in init_inode_metadata

If we fail in this path:
->init_inode_metadata
  ->make_empty_dir
    ->get_new_data_page
      ->grab_cache_page return -ENOMEM

We will hit a BUG_ON in the error path of init_inode_metadata when calling
remove_inode_page, because i_block = 2 (one inode block, which will be released
later, and one dentry block).

We should release the dentry block in init_inode_metadata to avoid this BUG_ON
and to avoid leaking the dentry block, because we never get a second chance to
release that block in ->evict_inode: the upper error path marks this inode as
'bad'.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: check lower bound nid value in check_nid_range
Chao Yu [Thu, 12 Jun 2014 05:23:41 +0000]
f2fs: check lower bound nid value in check_nid_range

This patch adds lower bound verification for nid in check_nid_range, so that
reserved nids such as 0, node, and meta passed by the caller can be checked there.

check_nid_range can then be used in f2fs_nfs_get_inode to simplify the
code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
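
A minimal sketch of a two-sided nid range check in the spirit of the commit
above. The names nid_in_range and FIRST_VALID_NID are hypothetical, and the
value 3 is an assumption chosen so that nids 0, 1, and 2 stand in for the
reserved "0, node, meta" nids mentioned in the message; the real bounds come
from the f2fs superblock and node manager.

```c
#include <stdbool.h>

/* Assumption for illustration: nids below this value are reserved. */
#define FIRST_VALID_NID 3U

/* Hypothetical helper: accept a nid only if it is at or above the first
 * non-reserved nid and strictly below max_nid. */
static bool nid_in_range(unsigned int nid, unsigned int max_nid)
{
	return nid >= FIRST_VALID_NID && nid < max_nid;
}
```

Checking only the upper bound would let a caller pass a reserved nid through,
which is the gap the lower bound closes.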

3 years ago f2fs: remove unused variables in f2fs_sm_info
Chao Yu [Wed, 11 Jun 2014 10:32:23 +0000]
f2fs: remove unused variables in f2fs_sm_info

Remove unused variables in struct f2fs_sm_info.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: fix not to allocate unnecessary blocks during fallocate
Jaegeuk Kim [Fri, 13 Jun 2014 04:07:31 +0000]
f2fs: fix not to allocate unnecessary blocks during fallocate

This patch fixes the fallocate bug like below. (See xfstests/255)

In fallocate(fd, 0, 20480),
expand_inode_data processes
    for (index = pg_start; index <= pg_end; index++) {
        f2fs_reserve_block();
        ...
    }

So, even though fallocate requests 20480 bytes, i.e. 5 blocks, f2fs allocates
6 blocks, including the one at pg_end.
So, this patch adds one condition to avoid that extra block allocation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
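
The off-by-one described in the commit above can be reproduced with a small
user-space model. This is a sketch, not the f2fs code: blocks_reserved and the
skip_empty_tail flag are hypothetical, a 4 KiB block size is assumed, and the
"skip the last index when the range ends exactly on a block boundary" condition
is one plausible form of the added check.

```c
#define BLK_SHIFT 12			/* assumption: 4 KiB blocks */
#define BLK_SIZE  (1UL << BLK_SHIFT)

/* Hypothetical model: count how many blocks the inclusive loop
 * for (index = pg_start; index <= pg_end; index++) would reserve.
 * With skip_empty_tail set, the block at pg_end is skipped when the
 * requested range ends exactly on a block boundary. */
static unsigned long blocks_reserved(unsigned long offset, unsigned long len,
				     int skip_empty_tail)
{
	unsigned long pg_start = offset >> BLK_SHIFT;
	unsigned long pg_end = (offset + len) >> BLK_SHIFT;
	unsigned long off_end = (offset + len) & (BLK_SIZE - 1);
	unsigned long count = 0;

	for (unsigned long index = pg_start; index <= pg_end; index++) {
		if (skip_empty_tail && index == pg_end && off_end == 0)
			continue;	/* no bytes of the range fall in this block */
		count++;
	}
	return count;
}
```

For offset 0 and length 20480 the model counts 6 blocks without the extra
condition and 5 with it, matching the numbers in the commit message.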

3 years ago f2fs: recover fallocated data and its i_size together
Jaegeuk Kim [Fri, 13 Jun 2014 04:05:55 +0000]
f2fs: recover fallocated data and its i_size together

This patch arranges the f2fs_locks to cover the fallocated data and its i_size.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: fix to report newly allocate region as extent
Jaegeuk Kim [Fri, 13 Jun 2014 04:02:11 +0000]
f2fs: fix to report newly allocate region as extent

The previous get_block in f2fs didn't report a newly allocated region which has
NEW_ADDR.
For a reader, it should not be reported, but fiemap needs it.
So, this patch introduces two get_block variants that share a core function.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: support f2fs_fiemap
Jaegeuk Kim [Sat, 7 Jun 2014 19:30:14 +0000]
f2fs: support f2fs_fiemap

This patch links f2fs_fiemap to the generic fiemap function, using get_block.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: avoid not to call remove_dirty_inode
Jaegeuk Kim [Fri, 6 Jun 2014 18:05:03 +0000]
f2fs: avoid not to call remove_dirty_inode

There is an erroneous case during recovery, as below.

In recovery_dentry,
 1) dir = f2fs_iget();
 2) mark the dir with FI_DELAY_IPUT
 3) goto unmap_out

After the end of the recovery routine, there are no dirty dentries, so the dir
cannot be released by iput in remove_dirty_dir_inode.

This patch fixes this bug by handling the iget and iput in the
recovery_dentry procedure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: recover fallocated space
Jaegeuk Kim [Thu, 5 Jun 2014 17:12:59 +0000]
f2fs: recover fallocated space

If a fallocated file is fsynced, we should recover the i_size after sudden
power cut.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: fix to recover data written by dio
Jaegeuk Kim [Tue, 3 Jun 2014 15:39:42 +0000]
f2fs: fix to recover data written by dio

If data are overwritten through dio, the previous f2fs doesn't leave the fsync
mark, since there are no additional node writes.

Note that this patch should resolve the xfstests:311.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: large volume support
Changman Lee [Mon, 12 May 2014 03:27:43 +0000]
f2fs: large volume support

f2fs's cp has one page which consists of struct f2fs_checkpoint and
version bitmap of sit and nat. To support lots of segments, we need more
blocks for sit bitmap. So let's arrange sit bitmap as following:
+-----------------+------------+
| f2fs_checkpoint | sit bitmap |
| + nat bitmap    |            |
+-----------------+------------+
0                 4k        N blocks

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: simple code change for readability]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
Chao Yu [Tue, 27 May 2014 00:41:07 +0000]
f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages

Previously we allocated pages with no mapping in ra_sum_pages(), so we could
encounter a crash in the event trace of f2fs_submit_page_mbio, where we access
the mapping data of the page.

We'd better allocate pages in the bd_inode mapping and invalidate these pages
after we restore data from them. This avoids the crash in the above scenario.

Changes from V1
 o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim.

Call Trace:
 [<f1031630>] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs]
 [<f10377bb>] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs]
 [<f103c5da>] restore_node_summary+0x13a/0x280 [f2fs]
 [<f103e22d>] build_curseg+0x2bd/0x620 [f2fs]
 [<f104043b>] build_segment_manager+0x1cb/0x920 [f2fs]
 [<f1032c85>] f2fs_fill_super+0x535/0x8e0 [f2fs]
 [<c115b66a>] mount_bdev+0x16a/0x1a0
 [<f102f63f>] f2fs_mount+0x1f/0x30 [f2fs]
 [<c115c096>] mount_fs+0x36/0x170
 [<c1173635>] vfs_kern_mount+0x55/0xe0
 [<c1175388>] do_mount+0x1e8/0x900
 [<c1175d72>] SyS_mount+0x82/0xc0
 [<c16059cc>] sysenter_do_call+0x12/0x22

Suggested-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years ago f2fs: avoid overflow when large directory feathure is enabled
Chao Yu [Wed, 28 May 2014 00:56:09 +0000]
f2fs: avoid overflow when large directory feathure is enabled

When the large directory feature is enabled, there is one case which could cause
an overflow in dir_buckets(), as follows:
special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2.

Here we define MAX_DIR_BUCKETS to limit the return value when the condition
could trigger a potential overflow.

Changes from V1
 o modify description of calculation in f2fs.txt suggested by Changman Lee.

Suggested-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
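
A sketch of the clamping idea from the commit above. The constant values below
are assumptions made for illustration (the real definitions live in the f2fs
headers), and the function body is one plausible shape of such a guard rather
than a copy of the kernel code. The shift is performed only when the combined
level is small enough to be safe; otherwise a fixed upper bound is returned.

```c
/* Assumed values for illustration only. */
#define MAX_DIR_HASH_DEPTH 63U
#define MAX_DIR_BUCKETS (1U << ((MAX_DIR_HASH_DEPTH / 2) - 1))

/* Number of hash buckets at a given level, clamped so that the shift count
 * never reaches the width of unsigned int. */
static unsigned int dir_buckets(unsigned int level, unsigned int dir_level)
{
	if (level + dir_level < MAX_DIR_HASH_DEPTH / 2)
		return 1U << (level + dir_level);
	return MAX_DIR_BUCKETS;
}
```

Without the clamp, a combined level of 32 or more would shift a 32-bit value by
at least its own width, which is undefined behavior in C.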

3 years ago f2fs: fix recursive lock by f2fs_setxattr
Jaegeuk Kim [Sun, 1 Jun 2014 14:24:30 +0000]
f2fs: fix recursive lock by f2fs_setxattr

This patch should resolve the following recursive lock.

[<ffffffff8135a9c3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffffa01749dc>] f2fs_setxattr+0x5c/0xa0 [f2fs]
[<ffffffffa0174c99>] __f2fs_set_acl+0x1b9/0x340 [f2fs]
[<ffffffffa017515a>] f2fs_init_acl+0x4a/0xcb [f2fs]
[<ffffffffa0159abe>] __f2fs_add_link+0x26e/0x780 [f2fs]
[<ffffffffa015d4d8>] f2fs_mkdir+0xb8/0x150 [f2fs]
[<ffffffff811cebd7>] vfs_mkdir+0xb7/0x160
[<ffffffff811cf89b>] SyS_mkdir+0xab/0xe0
[<ffffffff817244bf>] tracesys+0xe1/0xe6
[<ffffffffffffffff>] 0xffffffffffffffff

The call path indicates:
- f2fs_add_link
   : down_write(&fi->i_sem);

 - init_inode_metadata
   - f2fs_init_acl
     - __f2fs_set_acl
       - f2fs_setxattr
         : down_write(&fi->i_sem);

Here we should not call f2fs_setxattr, but __f2fs_setxattr.
But __f2fs_setxattr is a static function in xattr.c, so I found another, more
generic approach that still uses f2fs_setxattr.

In f2fs_setxattr, the page pointer is given only from init_inode_metadata.
So, this patch adds a check on that condition to avoid taking the lock again in
f2fs_setxattr.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3 years agof2fs: use inode_init_owner() to simplify codes
Chao Yu [Thu, 8 May 2014 09:09:30 +0000]
f2fs: use inode_init_owner() to simplify codes

This patch uses the exported inode_init_owner() to simplify the code in
f2fs_new_inode().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

Conflicts:
fs/f2fs/namei.c

Change-Id: I5231bd9f52fedd485b9efb10651c11e84375f1f1

3 years agof2fs: avoid to use slab memory in f2fs_issue_flush for efficiency
Chao Yu [Thu, 8 May 2014 09:00:35 +0000]
f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency

If we use slab memory in f2fs_issue_flush(), we face memory pressure and
added latency caused by racing kmem_cache_{alloc,free} calls.

Let's allocate the memory on the stack instead of from the slab.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: add a tracepoint for f2fs_read_data_page
Chao Yu [Tue, 6 May 2014 08:53:08 +0000]
f2fs: add a tracepoint for f2fs_read_data_page

This patch adds a tracepoint for f2fs_read_data_page to trace when a page is
read by the user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages
Chao Yu [Tue, 6 May 2014 08:51:24 +0000]
f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages

This patch adds a tracepoint for f2fs_write_{meta,node,data}_pages to trace when
pages are being fsynced or flushed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: add a tracepoint for f2fs_write_{meta,node,data}_page
Chao Yu [Tue, 6 May 2014 08:48:26 +0000]
f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page

This patch adds a tracepoint for f2fs_write_{meta,node,data}_page to trace when
a page is being written out.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: add a tracepoint for f2fs_write_end
Chao Yu [Tue, 6 May 2014 08:47:23 +0000]
f2fs: add a tracepoint for f2fs_write_end

This patch adds a tracepoint for f2fs_write_end to trace write op of user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agoRevert "f2fs: Add f2fs write tracing points."
JP Abgrall [Tue, 30 Sep 2014 01:42:57 +0000]
Revert "f2fs: Add f2fs write tracing points."

This reverts commit 23c1f5d32079e28154840b9e5b0243affd91c12d.

Conflicts:
include/trace/events/f2fs.h

Change-Id: Ifc3493cfbca9b15e5a4ee14074a43f6b3ed41d65

3 years agof2fs: add a tracepoint for f2fs_write_begin
Chao Yu [Tue, 6 May 2014 08:46:04 +0000]
f2fs: add a tracepoint for f2fs_write_begin

This patch adds a tracepoint for f2fs_write_begin to trace write op of user.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: fix checkpatch warning
Zhang Zhen [Sun, 4 May 2014 08:37:06 +0000]
f2fs: fix checkpatch warning

fix the following checkpatch warning:
WARNING: do {} while (0) macros should not be semicolon terminated
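
A short illustration of why checkpatch complains, using a hypothetical macro
ADD_TWICE that is not taken from the f2fs sources:

```c
/* Correct form: no trailing semicolon, the caller's own ';' ends the statement. */
#define ADD_TWICE(counter, n)		\
	do {				\
		(counter) += (n);	\
		(counter) += (n);	\
	} while (0)

/*
 * If the macro ended in "while (0);", then "ADD_TWICE(counter, 5);" would
 * expand to two statements and the "else" below would fail to compile.
 */
static int bump(int flag)
{
	int counter = 0;

	if (flag)
		ADD_TWICE(counter, 5);
	else
		ADD_TWICE(counter, 1);
	return counter;
}
```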

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: deactivate inode page if the inode is evicted
Jaegeuk Kim [Wed, 30 Apr 2014 06:04:39 +0000]
f2fs: deactivate inode page if the inode is evicted

If the inode page is clean during its inode eviction, it is better to drop the
page to reduce further memory pressure.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: decrease the lock granularity during write_begin
Jaegeuk Kim [Wed, 30 Apr 2014 00:22:45 +0000]
f2fs: decrease the lock granularity during write_begin

This patch reduces the lock granularity during write_begin.
When the system is under memory pressure, it would be better to reduce
the locking time for the data pages.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: no need to wait on page writeback to meta pages
Jaegeuk Kim [Wed, 30 Apr 2014 00:18:53 +0000]
f2fs: no need to wait on page writeback to meta pages

This patch removes grab_cache_page_write_begin for meta pages.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: avoid grab_cache_page_write_begin for data pages
Jaegeuk Kim [Tue, 29 Apr 2014 08:35:10 +0000]
f2fs: avoid grab_cache_page_write_begin for data pages

We don't need to wait on page writeback for these cases.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: split grab_cache_page and wait_on_page_writeback for node pages
Jaegeuk Kim [Tue, 29 Apr 2014 08:28:32 +0000]
f2fs: split grab_cache_page and wait_on_page_writeback for node pages

This patch splits grab_cache_page_write_begin into grab_cache_page and
wait_on_page_writeback for node pages.

This patch intends to reduce the latency of getting node pages by avoiding
unnecessary wait_on_page_writeback calls.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: fix to truncate inline data in inode page when setattr
Chao Yu [Tue, 29 Apr 2014 01:03:03 +0000]
f2fs: fix to truncate inline data in inode page when setattr

Previously we did not truncate inline data in the inode page on setattr, so the
following sequence could still read inline data that had already been truncated:

1. write inline data
2. ftruncate size to 0
3. ftruncate size to max inline data size
4. read from offset 0

This patch introduces truncate_inline_data() to fix this problem.

change log from v1:
 o fix a bug and do not truncate first page data after truncating inline data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: readahead multi pages of directory for performance
Chao Yu [Mon, 28 Apr 2014 09:59:43 +0000]
f2fs: readahead multi pages of directory for performance

The ->iterate() path has no readahead mechanism like the one in the ->read()
path, which causes low performance when we read a large directory.
This patch adds readahead in f2fs_readdir() for better performance.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: set errno when f2fs_iget failed in recover_dentry
Chao Yu [Mon, 28 Apr 2014 09:58:34 +0000]
f2fs: set errno when f2fs_iget failed in recover_dentry

We should set the error number correctly when we fail in recover_dentry(), so
that the recovery flow stops with the reason the error number indicates instead
of continuing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: consider fallocated space for SEEK_DATA
Jaegeuk Kim [Mon, 28 Apr 2014 09:12:36 +0000]
f2fs: consider fallocated space for SEEK_DATA

If some space is allocated through fallocate and the user writes a small amount
of data within that space, f2fs should return the offset of the data written by
the user when SEEK_DATA is requested.

For example, (N: NEW_ADDR by fallocate, X: NEW_ADDR by user)
1) fallocate 0 ~ 10MB
f -> N N N N N N N N N N N N ... N

2) write 4KB at 5MB offset
f -> N N N N N X N N N N N N ... N

3) SEEK_DATA from 0 should return 5MB offset

So, this patch adds a routine to search the first dirty page to handle that.
Then, the SEEK_DATA flow skips NEW_ADDR offsets until any dirty page is found.
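
The intended lookup can be modelled with a toy block map, as sketched below.
The enum, the 4KB block size and toy_seek_data() are hypothetical and only
mirror the N/X notation of the example; the real code works on node pages and
the page cache.

```c
#include <stddef.h>

/* Per-block state in a toy block map. */
enum blk_state {
	BLK_HOLE,        /* never allocated */
	BLK_NEW_CLEAN,   /* NEW_ADDR reserved by fallocate, nothing written ("N") */
	BLK_NEW_DIRTY,   /* NEW_ADDR with a dirty page written by the user ("X") */
	BLK_ON_DISK      /* block that already has a real on-disk address */
};

#define TOY_BLKSIZE 4096L

/*
 * Returns the byte offset of the first block at or after start_blk that
 * holds user data, or -1 if there is none (lseek would fail with ENXIO).
 */
static long toy_seek_data(const enum blk_state *map, size_t nblks, size_t start_blk)
{
	size_t i;

	for (i = start_blk; i < nblks; i++) {
		if (map[i] == BLK_NEW_DIRTY || map[i] == BLK_ON_DISK)
			return (long)i * TOY_BLKSIZE;
	}
	return -1;
}
```

With 2560 blocks marked BLK_NEW_CLEAN and block 1280 marked BLK_NEW_DIRTY, a
search from block 0 returns the 5MB offset, matching step 3 of the example.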

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: return i_size if the hole is outside of i_size
Jaegeuk Kim [Mon, 28 Apr 2014 08:02:48 +0000]
f2fs: return i_size if the hole is outside of i_size

When SEEK_HOLE is requested, it should return i_size if the hole position is
found outside of i_size.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: introduce f2fs_seek_block to support SEEK_{DATA, HOLE} in llseek
Chao Yu [Wed, 23 Apr 2014 06:10:24 +0000]
f2fs: introduce f2fs_seek_block to support SEEK_{DATA, HOLE} in llseek

In this patch we introduce f2fs_seek_block to support SEEK_{DATA,HOLE} of
lseek(2).

change log from v1:
 o fix bug when lseek from middle of page and fix wrong calculation of
PGOFS_OF_NEXT_DNODE macro.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: introduce help function {create,destroy}_flush_cmd_control
Gu Zheng [Sun, 27 Apr 2014 06:21:33 +0000]
f2fs: introduce help function {create,destroy}_flush_cmd_control

Introduce helper functions {create,destroy}_flush_cmd_control to clean up
the create/destroy flush merge operation.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: introduce struct flush_cmd_control to wrap the flush_merge fields
Gu Zheng [Sun, 27 Apr 2014 06:21:21 +0000]
f2fs: introduce struct flush_cmd_control to wrap the flush_merge fields

Split the flush_merge fields out of sm_i and wrap them in the new struct
flush_cmd_control, so that we can ignore these fields if flush_merge is
disabled; it also makes the structs neater.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: introduce help macro ADDRS_PER_PAGE()
Chao Yu [Sat, 26 Apr 2014 11:59:52 +0000]
f2fs: introduce help macro ADDRS_PER_PAGE()

Introduce the helper macro ADDRS_PER_PAGE() to get the number of address
pointers in a direct node or inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: submit bio at the reclaim path
Jaegeuk Kim [Thu, 24 Apr 2014 00:49:52 +0000]
f2fs: submit bio at the reclaim path

If f2fs_write_data_page is called through the reclaim path, we should submit
the bio right away.

This patch resolves the following issue that Marc Dietrich reported.
"It took me a while to bisect a problem which causes my ARM (tegra2) netbook to
frequently stall for 5-10 seconds when I enable EXA acceleration (opentegra
experimental ddx)."
And this patch fixes that.

Reported-by: Marc Dietrich <marvin24@gmx.de>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: return errors right after checking them
Jaegeuk Kim [Wed, 23 Apr 2014 03:28:18 +0000]
f2fs: return errors right after checking them

This patch adds two error conditions early in the setxattr operations.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: pass flags field to setxattr functions
Jaegeuk Kim [Wed, 23 Apr 2014 03:23:14 +0000]
f2fs: pass flags field to setxattr functions

This patch passes the "flags" field to the low level setxattr functions
to use XATTR_REPLACE in the following patches.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: clean up long variable names
Jaegeuk Kim [Wed, 23 Apr 2014 03:17:25 +0000]
f2fs: clean up long variable names

This patch includes simple clean-ups to shorten unnecessarily long variable names.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: handle inline data independently in f2fs_bmap
Chao Yu [Tue, 22 Apr 2014 05:34:01 +0000]
f2fs: handle inline data independently in f2fs_bmap

We'd better handle inline data case independently in f2fs_bmap().
It can reduce our handling time in f2fs_bmap().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: adjust free mem size to flush dentry blocks
Jaegeuk Kim [Wed, 16 Apr 2014 01:47:06 +0000]
f2fs: adjust free mem size to flush dentry blocks

If many dirty dentry blocks are cached but the flush condition is not reached,
we can fall into a livelock in balance_dirty_pages.
So, let's take the memory size into account in that condition.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: avoid BUG_ON when mounting corrupted image having garbage blocks
Jaegeuk Kim [Fri, 18 Apr 2014 06:21:04 +0000]
f2fs: avoid BUG_ON when mounting corrupted image having garbage blocks

If the disk has some garbage blocks, F2FS can hit a BUG_ON when
recovering direct node blocks.
This patch detects the error case and avoids it before reaching the BUG_ON.

Alexey Khoroshilov addressed the potential security issues as follows.
"An ability to trigger a BUG_ON assert by mounting a crafted image is
usually considered as a local denial of service [1-3]. As far as I
understand, the reason is that some kernel data may become inconsistent
that can lead to further problems.

[1] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3353
[2] http://www.openwall.com/lists/oss-security/2011/06/24/4
[3] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2928
etc."

Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: add available_nids to fix handling max_nid correctly
Jaegeuk Kim [Fri, 18 Apr 2014 02:14:37 +0000]
f2fs: add available_nids to fix handling max_nid correctly

This patch introduces available_nids for alloc_nids() and fixes max_nid for
build_free_nids() and scan_nat_pages().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: add static to get_max_meta_blks
Fabian Frederick [Thu, 17 Apr 2014 15:51:06 +0000]
f2fs: add static to get_max_meta_blks

inline get_max_meta_blks is only used in checkpoint.c
Use standard static inline format.

Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: introduce raw_nat_from_node_info() to simplify codes
Chao Yu [Thu, 17 Apr 2014 02:51:05 +0000]
f2fs: introduce raw_nat_from_node_info() to simplify codes

This patch introduces raw_nat_from_node_info() to simplify some code, and also
uses the existing function node_info_from_raw_nat() to do the same job.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: add the flush_merge handle in the remount flow
Gu Zheng [Fri, 11 Apr 2014 09:50:00 +0000]
f2fs: add the flush_merge handle in the remount flow

Add *remount* handling for the flush_merge option, so that users
can enable flush_merge at runtime, for example when the underlying device
handles the cache_flush command relatively slowly.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: atomically set inode->i_flags in f2fs_set_inode_flags()
Zhang Zhen [Tue, 15 Apr 2014 06:19:38 +0000]
f2fs: atomically set inode->i_flags in f2fs_set_inode_flags()

Use set_mask_bits() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
FS_IMMUTABLE_FL, FS_APPEND_FL, etc. flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.
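
The difference between clear-then-set and a single atomic update can be
sketched in user space with C11 atomics. toy_set_mask_bits() and the TOY_S_*
values are hypothetical; the kernel helper is a macro built on cmpxchg, and
this is only an analogous loop.

```c
#include <stdatomic.h>

/* Hypothetical flag values for the sketch. */
#define TOY_S_IMMUTABLE 0x1u
#define TOY_S_APPEND    0x2u
#define TOY_S_NOATIME   0x4u

/*
 * Atomically replace the bits selected by mask with bits, using a
 * compare-and-swap loop so that no intermediate value with the flags
 * cleared is ever stored.
 */
static unsigned int toy_set_mask_bits(_Atomic unsigned int *flags,
				      unsigned int mask, unsigned int bits)
{
	unsigned int old = atomic_load(flags);
	unsigned int new_val;

	do {
		new_val = (old & ~mask) | bits;
	} while (!atomic_compare_exchange_weak(flags, &old, new_val));
	return new_val;
}
```

A reader that loads the flags concurrently sees either the old or the new
value, never a state in which the immutable bit is briefly missing.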

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: make recover_inline_xattr() static
Jingoo Han [Tue, 15 Apr 2014 08:51:05 +0000]
f2fs: make recover_inline_xattr() static

Make recover_inline_xattr() static, because this function is
used only in this file.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: remove costly dirty_dir_inode operations
Jaegeuk Kim [Tue, 15 Apr 2014 02:19:28 +0000]
f2fs: remove costly dirty_dir_inode operations

This patch removes list operations in handling dirty dir inodes.
Previously, F2FS traversed the whole list of dirty dir inodes to check whether
an inode was already on it, resulting in heavy CPU overhead.

So this patch removes such traversals by adding FI_DIRTY_DIR to
indicate whether the inode is on the list or not.
Through this simple flag, we can remove redundant operations gracefully.
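
A minimal sketch of the flag-based check, with hypothetical toy_* types in
place of the real inode structures:

```c
#include <stddef.h>

struct toy_dir_inode {
	int dirty_dir;                 /* plays the role of FI_DIRTY_DIR */
	struct toy_dir_inode *next;    /* link in the dirty list */
};

struct toy_dirty_list {
	struct toy_dir_inode *head;
	size_t len;
};

/*
 * O(1): the flag already says whether the inode is on the list, so no
 * traversal is needed to avoid a duplicate entry.
 * Returns 1 if the inode was added, 0 if it was already listed.
 */
static int toy_set_dirty_dir(struct toy_dirty_list *l, struct toy_dir_inode *d)
{
	if (d->dirty_dir)
		return 0;
	d->dirty_dir = 1;
	d->next = l->head;
	l->head = d;
	l->len++;
	return 1;
}
```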

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

Conflicts:
fs/f2fs/recovery.c

Change-Id: Idf1896b45b8451b69bb04a07d73c83db09d65910

3 years agof2fs: fix to unlock f2fs_lock at the omitted error case
Jaegeuk Kim [Wed, 16 Apr 2014 05:22:50 +0000]
f2fs: fix to unlock f2fs_lock at the omitted error case

If an error occurs, we should call f2fs_unlock_op.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: call redirty_page_for_writepage
Jaegeuk Kim [Tue, 15 Apr 2014 07:04:15 +0000]
f2fs: call redirty_page_for_writepage

This patch replaces some generic code with redirty_page_for_writepage, which
can be used once additional steps, such as counting dirty pages appropriately,
have been taken into account.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: avoid to conduct roll-forward due to the remained garbage blocks
Jaegeuk Kim [Tue, 15 Apr 2014 04:57:55 +0000]
f2fs: avoid to conduct roll-forward due to the remained garbage blocks

f2fs always scans the next chain of direct node blocks.
But some garbage blocks can remain, either because discard is not supported or
because SSR was triggered.
This occasionally leads to recovering wrong inodes that were in use, or to
BUG_ONs caused by reallocating node ids, as follows.

When mount this f2fs image:
http://linuxtesting.org/downloads/f2fs_fault_image.zip
BUG_ON is triggered in f2fs driver (messages below are generated on
kernel 3.13.2; for other kernels output is similar):

kernel BUG at fs/f2fs/node.c:215!
 Call Trace:
 [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
 [<ffffffff811446e7>] ? __lock_page+0x67/0x70
 [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
 [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
 [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
 [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
 [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
 [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
 [<ffffffff811b8769>] mount_bdev+0x1c9/0x210
 [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
 [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
 [<ffffffff811b9497>] mount_fs+0x47/0x1c0
 [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
 [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
 [<ffffffff811d6763>] do_mount+0x493/0x910
 [<ffffffff811615cb>] ? strndup_user+0x5b/0x80
 [<ffffffff811d6c70>] SyS_mount+0x90/0xe0
 [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b

Found by Linux File System Verification project (linuxtesting.org).

Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: enable flush_merge only in f2fs is not read-only
Gu Zheng [Fri, 11 Apr 2014 09:49:55 +0000]
f2fs: enable flush_merge only in f2fs is not read-only

Enable flush_merge only if f2fs is not read-only, and show the mount option
only in that case as well.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: use __GFP_ZERO to avoid appending set-NULL
Gu Zheng [Fri, 11 Apr 2014 09:49:50 +0000]
f2fs: use __GFP_ZERO to avoid appending set-NULL

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: put the bio when issue_flush completed
Gu Zheng [Fri, 11 Apr 2014 09:49:35 +0000]
f2fs: put the bio when issue_flush completed

Put the bio once the flush command has been issued; this also fixes the
following kmemleak report:
unreferenced object 0xffff8800270c73c0 (size 200):
  comm "f2fs_flush-7:0", pid 27161, jiffies 4312127988 (age 988.503s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 40 07 81 19 01 88 ff ff  ........@.......
    01 00 00 00 00 00 00 f0 11 14 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81559866>] kmemleak_alloc+0x72/0x96
    [<ffffffff81156f7e>] slab_post_alloc_hook+0x28/0x2a
    [<ffffffff811595b1>] kmem_cache_alloc+0xec/0x157
    [<ffffffff8111924d>] mempool_alloc_slab+0x15/0x17
    [<ffffffff81119513>] mempool_alloc+0x71/0x138
    [<ffffffff81193548>] bio_alloc_bioset+0x93/0x18c
    [<ffffffffa040f857>] issue_flush_thread+0x8d/0x145 [f2fs]
    [<ffffffff8107ac16>] kthread+0xba/0xc2
    [<ffffffff81571b2c>] ret_from_fork+0x7c/0xb0
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: fix wrong statistics of inline data
Chao Yu [Mon, 7 Apr 2014 03:18:34 +0000]
f2fs: fix wrong statistics of inline data

If we remove a file that has inline data after mount, our statistics become
inaccurate.

cat /sys/kernel/debug/f2fs/status
  - Inline_data Inode: 4294967295

Let's add stat_inc_inline_inode() to stat inline info of the file when lookup.
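
The value 4294967295 is what an unsigned 32-bit counter holds after being
decremented from zero. A small sketch, assuming a 32-bit counter and using
hypothetical toy_* names:

```c
#include <stdint.h>

struct toy_stat {
	uint32_t inline_inode;   /* count of inodes with inline data */
};

/* Called when an inode with inline data is looked up. */
static void toy_stat_inc(struct toy_stat *s)
{
	s->inline_inode++;
}

/* Called when such an inode is removed; wraps around if never counted. */
static void toy_stat_dec(struct toy_stat *s)
{
	s->inline_inode--;
}
```

Decrementing without a matching increment wraps the counter, while counting the
inode at lookup time keeps the pair balanced.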

Change log from v1:
 o stat in f2fs_lookup() instead of in do_read_inode() for excluding wrong stat.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: check the acl's validity before setting
ZhangZhen [Fri, 4 Apr 2014 01:47:16 +0000]
f2fs: check the acl's validity before setting

Before setting the acl, call posix_acl_valid() to check if it is
valid or not.

Signed-off-by: zhangzhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

Conflicts:
fs/f2fs/acl.c

Change-Id: Ib5aee3107d0c2f434d106864f61db517d8566362

3 years agof2fs: introduce f2fs_issue_flush to avoid redundant flush issue
Jaegeuk Kim [Wed, 2 Apr 2014 06:34:36 +0000]
f2fs: introduce f2fs_issue_flush to avoid redundant flush issue

Some storage devices show relatively high latencies to complete cache_flush
commands, even though their normal IO speed is pretty high. In such
a case, cache_flush commands need to be merged as much as possible to avoid
issuing them redundantly.
So, this patch introduces a mount option, "-o flush_merge", to mitigate this
overhead.

If this option is enabled by user, F2FS merges the cache_flush commands and then
issues just one cache_flush on behalf of them. Once the single command is
finished, F2FS sends a completion signal to all the pending threads.

Note that, this option can be used under a workload consisting of very intensive
concurrent fsync calls, while the storage handles cache_flush commands slowly.
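
The merging idea can be modelled single-threaded as below. This is only a
sketch: the toy_* structures and functions are hypothetical, there is no real
thread, wait queue or device, and the counter stands in for the cache_flush
sent to storage.

```c
#include <stddef.h>

struct toy_flush_cmd {
	int completed;
	int ret;
	struct toy_flush_cmd *next;
};

struct toy_flush_ctl {
	struct toy_flush_cmd *issue_list;   /* requests waiting for a flush */
	int flushes_issued;                 /* cache_flush commands sent */
};

/* An fsync caller queues its request instead of flushing directly. */
static void toy_queue_flush(struct toy_flush_ctl *c, struct toy_flush_cmd *cmd)
{
	cmd->completed = 0;
	cmd->next = c->issue_list;
	c->issue_list = cmd;
}

/*
 * The flusher takes every pending request, issues a single cache_flush,
 * and completes all of them with that one result.
 * Returns the number of requests completed.
 */
static int toy_dispatch(struct toy_flush_ctl *c)
{
	struct toy_flush_cmd *cmd = c->issue_list;
	int n = 0;

	if (!cmd)
		return 0;
	c->issue_list = NULL;
	c->flushes_issued++;        /* stands in for the device flush */
	while (cmd) {
		struct toy_flush_cmd *next = cmd->next;

		cmd->ret = 0;
		cmd->completed = 1;
		cmd = next;
		n++;
	}
	return n;
}
```

Five queued requests followed by one toy_dispatch() call result in a single
flush and five completions.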

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

Conflicts:
Documentation/filesystems/f2fs.txt
fs/f2fs/f2fs.h
fs/f2fs/super.c

Change-Id: I5c413bdc265ac35f4b1067857480c7c779b9b08b

3 years agof2fs: fix to cover io->bio with io_rwsem
Jaegeuk Kim [Wed, 2 Apr 2014 00:04:42 +0000]
f2fs: fix to cover io->bio with io_rwsem

In the f2fs_wait_on_page_writeback, io->bio should be covered by io_rwsem.
Otherwise, the bio pointer can become a dangling pointer due to data races.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: fix error path when fail to read inline data
Chao Yu [Sat, 29 Mar 2014 07:30:40 +0000]
f2fs: fix error path when fail to read inline data

We should unlock page in ->readpage() path and also should unlock & release page
in error path of ->write_begin() to avoid deadlock or memory leak.
So let's add release code to fix the problem when we fail to read inline data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

3 years agof2fs: use list_for_each_entry{_safe} for simplifying code
Chao Yu [Sat, 29 Mar 2014 03:33:17 +0000]
f2fs: use list_for_each_entry{_safe} for simplifying code

This patch uses list_for_each_entry{_safe} instead of list_for_each{_safe} to
simplify the code.
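
The simplification can be seen in a user-space sketch with cut-down list
macros. The toy_* macros below only imitate the kernel's list helpers, rely on
the GNU __typeof__ extension, and are not the kernel definitions.

```c
#include <stddef.h>

struct toy_list_head {
	struct toy_list_head *next, *prev;
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Older style: iterate raw list nodes and convert by hand in the body. */
#define toy_list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

/* Newer style: the cursor is the containing entry itself. */
#define toy_list_for_each_entry(pos, head, member)				\
	for (pos = toy_container_of((head)->next, __typeof__(*pos), member);	\
	     &pos->member != (head);						\
	     pos = toy_container_of(pos->member.next, __typeof__(*pos), member))

struct toy_item {
	int val;
	struct toy_list_head link;
};

static void toy_list_add_tail(struct toy_list_head *n, struct toy_list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static int sum_old_style(struct toy_list_head *head)
{
	struct toy_list_head *p;
	int sum = 0;

	toy_list_for_each(p, head) {
		struct toy_item *it = toy_container_of(p, struct toy_item, link);

		sum += it->val;
	}
	return sum;
}

static int sum_entry_style(struct toy_list_head *head)
{
	struct toy_item *it;
	int sum = 0;

	toy_list_for_each_entry(it, head, link)
		sum += it->val;
	return sum;
}
```

Both loops visit the same entries; the entry-style loop drops the temporary
node pointer and the explicit conversion.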

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>