Commits · 976e4c50aea111bc7193b48950a3b0c8bc0a25ff · nexedi / linux

23 Sep, 2014 8 commits

f2fs: update i_size when __allocate_data_block · 976e4c50

Jaegeuk Kim authored Sep 15, 2014

The f2fs_direct_IO uses __allocate_data_block, but inside the allocation path,
we should update i_size at the changed time to update its inode page.
Otherwise, we can get wrong i_size after roll-forward recovery.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

976e4c50

f2fs: use MAX_BIO_BLOCKS(sbi) · 90a893c7

Jaegeuk Kim authored Sep 22, 2014

This patch cleans up a simple macro.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

90a893c7

f2fs: remove redundant operation during roll-forward recovery · c52e1b10

Jaegeuk Kim authored Sep 11, 2014

If same data is updated multiple times, we don't need to redo whole the
operations.
Let's just update the lastest one.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c52e1b10

f2fs: do not skip latest inode information · 19c9c466

Jaegeuk Kim authored Sep 10, 2014

In f2fs_sync_file, if there is no written appended writes, it skips
to write its node blocks.
But, if there is up-to-date inode page, we should write it to update
its metadata during the roll-forward recovery.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

19c9c466

f2fs: fix roll-forward missing scenarios · 441ac5cb

Jaegeuk Kim authored Sep 15, 2014

We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Recover to the latest dnode(F), and drop the last inode(x)

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> The inode(DF) was missing. Should drop this dnode(F).

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next to find inode(DF).
   But it will fail due to no inode(DF).

So, this patch adds some missing points such as #1, #5, #7, and #8.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

441ac5cb

f2fs: fix conditions to remain recovery information in f2fs_sync_file · 88bd02c9

Jaegeuk Kim authored Sep 15, 2014

This patch revisited whole the recovery information during the f2fs_sync_file.

In this patch, there are three information to make a decision.

a) IS_CHECKPOINTED,	/* is it checkpointed before? */
b) HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
c) HAS_LAST_FSYNC,	/* has the latest node fsync mark? */

And, the scenarios for our rule are based on:

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
2. inode(x) | CP | inode(F) | dnode(F)
3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
4. inode(x) | CP | dnode(F) | inode(F)
5. CP | inode(x) | dnode(F) | inode(DF)
6. CP | inode(DF) | dnode(F)
7. CP | dnode(F) | inode(DF)
8. CP | dnode(F) | inode(x) | inode(DF)

For example, #3, the three conditions should be changed as follows.

   inode(x) | CP | dnode(F) | inode(x) | inode(F)
a)    x       o      o          o          o
b)    x       x      x          x          o
c)    x       o      o          x          o

If f2fs_sync_file stops   ------^,
 it should write inode(F)    --------------^

So, the need_inode_block_update should return true, since
 c) get_nat_flag(e, HAS_LAST_FSYNC), is false.

For example, #8,
      CP | alloc | dnode(F) | inode(x) | inode(DF)
a)    o      x        x          x          x
b)    x               x          x          o
c)    o               o          x          o

If f2fs_sync_file stops   -------^,
 it should write inode(DF)    --------------^

Note that, the roll-forward policy should follow this rule, which means,
if there are any missing blocks, we doesn't need to recover that inode.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

88bd02c9

f2fs: introduce a flag to represent each nat entry information · 7ef35e3b

Jaegeuk Kim authored Sep 15, 2014

This patch introduces a flag in the nat entry structure to merge various
information such as checkpointed and fsync_done marks.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

7ef35e3b

f2fs: use meta_inode cache to improve roll-forward speed · 4c521f49

Jaegeuk Kim authored Sep 11, 2014

Previously, all the dnode pages should be read during the roll-forward recovery.
Even worsely, whole the chain was traversed twice.
This patch removes that redundant and costly read operations by using page cache
of meta_inode and readahead function as well.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

4c521f49

16 Sep, 2014 5 commits

f2fs: fix double lock for inode page during roll-foward recovery · 60979115

Jaegeuk Kim authored Sep 13, 2014

If the inode is same and its data index are needed to truncate, we can fall into
double lock for its inode page via get_dnode_of_data.

Error case is like this.

1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut

-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*

This patch should resolve that bug.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

60979115

f2fs: fix a race condition in next_free_nid · c6e48930

Huang Ying authored Sep 12, 2014

The nm_i->fcnt checking is executed before spin_lock, so if another
thread delete the last free_nid from the list, the wrong nid may be
gotten.  So fix the race condition by moving the nm_i->fnct checking
into spin_lock.
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c6e48930

f2fs: use nm_i->next_scan_nid as default for next_free_nid · 77041823

Huang Ying authored Sep 12, 2014

Now, if there is no free nid in nm_i->free_nid_list, 0 may be saved
into next_free_nid of checkpoint, this may cause useless scanning for
next mount.  nm_i->next_scan_nid should be a better default value than
0.
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

77041823

f2fs: give an option to enable in-place-updates during fsync to users · c1ce1b02

Jaegeuk Kim authored Sep 10, 2014

If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file
only starts to try in-place-updates.
And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps out-of-order manner. Otherwise, it triggers in-place-updates.

This may be used by storage showing very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is pretty much slower than,

Rand. writes (Data)
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c1ce1b02

f2fs: expand counting dirty pages in the inode page cache · a7ffdbe2

Jaegeuk Kim authored Sep 12, 2014

Previously f2fs only counts dirty dentry pages, but there is no reason not to
expand the scope.

This patch changes the names on the management of dirty pages and to count
dirty pages in each inode info as well.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

a7ffdbe2

11 Sep, 2014 1 commit

f2fs: remove lengthy inode->i_ino · 2403c155

Jaegeuk Kim authored Sep 10, 2014

This patch is to remove lengthy name by adding a new variable.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

2403c155

09 Sep, 2014 10 commits

f2fs: fix negative value for lseek offset · 0b4c5afd

Jaegeuk Kim authored Sep 08, 2014

If application throws negative value of lseek with SEEK_DATA|SEEK_HOLE,
previous f2fs went into BUG_ON in get_dnode_of_data, which was reported
by Tommi Rantala.

He could make a simple code to detect this having:
	lseek(fd, -17595150933902LL, SEEK_DATA);

This patch should resolve that bug.
Reported-by: Tommi Rentala <tt.rantala@gmail.com>
[Jaegeuk Kim: relocate the condition as suggested by Chao]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

0b4c5afd

f2fs: avoid node page to be written twice in gc_node_segment · 9a01b56b

Huang Ying authored Sep 07, 2014

In gc_node_segment, if node page gc is run concurrently with node page
writeback, and check_valid_map and get_node_page run after page locked
and before cur_valid_map is updated as below, it is possible for the
page to be written twice unnecessarily.

			sync_node_pages
			  try_lock_page
			  ...
check_valid_map		  f2fs_write_node_page
			    ...
			    write_node_page
			      do_write_page
			        allocate_data_block
				  ...
				  refresh_sit_entry /* update cur_valid_map */
				  ...
			    ...
			    unlock_page
get_node_page
...
set_page_dirty
...
f2fs_put_page
  unlock_page

This can be solved via calling check_valid_map after get_node_page again.
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

9a01b56b

f2fs: use lock-less list(llist) to simplify the flush cmd management · 721bd4d5

Gu Zheng authored Sep 05, 2014

We use flush cmd control to collect many flush cmds, and flush them
together. In this case, we use two list to manage the flush cmds
(collect and dispatch), and one spin lock is used to protect this.
In fact, the lock-less list(llist) is very suitable to this case,
and we use simplify this routine.

-
v2:
-use llist_for_each_entry_safe to fix possible use-after-free issue.
-remove the unused field from struct flush_cmd.
Thanks for Yu's suggestion.
-
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

721bd4d5

f2fs: refactor flush_sit_entries codes for reducing SIT writes · 184a5cd2

Chao Yu authored Sep 04, 2014

In commit aec71382 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we descripte the issue as below:

"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
   nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
   journal is full, then flush the left dirty entries to disk without merge
   journaled entries, so these journaled entries may be flushed to disk at next
   checkpoint but lost chance to flushed last time."

Actually, we have the same problem in using SIT journal area.

In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.

In this way we can use sit journal area more effectively, also we will reduce
SIT update, result in gaining in performance and saving lifetime of flash
device.

In my testing environment, it shows this patch can help to reduce SIT block
update obviously.

virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
		sit page num	cp count	sit pages/cp
based		2006.50		1349.75		1.486
patched		1566.25		1463.25		1.070

Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns)	dirty sit count
36038		2151
49168		2123
37174		2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

184a5cd2

f2fs: remove unneeded sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO · d3a14afd

Chao Yu authored Sep 04, 2014

sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO is not used, remove it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

d3a14afd

f2fs: need fsck.f2fs if the recovery was failed · b0c44f05

Jaegeuk Kim authored Sep 02, 2014

If the roll-forward recovery was failed, we'd better conduct fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

b0c44f05

f2fs: handle bug cases by letting fsck.f2fs initiate · ec325b52

Jaegeuk Kim authored Sep 02, 2014

This patch adds to handle corner buggy cases for fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

ec325b52

f2fs: add BUG cases to initiate fsck.f2fs · 05796763

Jaegeuk Kim authored Sep 02, 2014

This patch replaces BUG cases with f2fs_bug_on to remain fsck.f2fs information.
And it implements some void functions to initiate fsck.f2fs too.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

05796763

f2fs: need fsck.f2fs when f2fs_bug_on is triggered · 9850cf4a

Jaegeuk Kim authored Sep 02, 2014

If any f2fs_bug_on is triggered, fsck.f2fs is needed.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

9850cf4a

f2fs: retain inconsistency information to initiate fsck.f2fs · 2ae4c673

Jaegeuk Kim authored Sep 02, 2014

This patch adds sbi->need_fsck to conduct fsck.f2fs later.
This flag can only be removed by fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

2ae4c673

04 Sep, 2014 1 commit

f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB · 4081363f

Jaegeuk Kim authored Sep 02, 2014

This patch adds three inline functions to clean up dirty casting codes.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

4081363f

03 Sep, 2014 10 commits

Merge tag 'for-f2fs-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 70c8038d

Linus Torvalds authored Sep 03, 2014

Pull f2fs bug fixes from Jaegeuk Kim:
 "This series includes patches to:

   - fix recovery routines
   - fix bugs related to inline_data/xattr
   - fix when casting the dentry names
   - handle EIO or ENOMEM correctly
   - fix memory leak
   - fix lock coverage"

* tag 'for-f2fs-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (28 commits)
  f2fs: reposition unlock_new_inode to prevent accessing invalid inode
  f2fs: fix wrong casting for dentry name
  f2fs: simplify by using a literal
  f2fs: truncate stale block for inline_data
  f2fs: use macro for code readability
  f2fs: introduce need_do_checkpoint for readability
  f2fs: fix incorrect calculation with total/free inode num
  f2fs: remove rename and use rename2
  f2fs: skip if inline_data was converted already
  f2fs: remove rewrite_node_page
  f2fs: avoid double lock in truncate_blocks
  f2fs: prevent checkpoint during roll-forward
  f2fs: add WARN_ON in f2fs_bug_on
  f2fs: handle EIO not to break fs consistency
  f2fs: check s_dirty under cp_mutex
  f2fs: unlock_page when node page is redirtied out
  f2fs: introduce f2fs_cp_error for readability
  f2fs: give a chance to mount again when encountering errors
  f2fs: trigger release_dirty_inode in f2fs_put_super
  f2fs: don't skip checkpoint if there is no dirty node pages
  ...

70c8038d

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · 5a147c9f

Linus Torvalds authored Sep 03, 2014

Pull key subsystem fixes from James Morris:
 "Fixes for the keys subsystem, one of which addresses a use-after-free
  bug"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  PEFILE: Relax the check on the length of the PKCS#7 cert
  KEYS: Fix use-after-free in assoc_array_gc()
  KEYS: Fix public_key asymmetric key subtype name
  KEYS: Increase root_maxkeys and root_maxbytes sizes

5a147c9f

ARC: [mm] Fix compilation breakage · 014018e0

Noam Camus authored Sep 03, 2014

Structure name and variable name were erroneously interchanged
Signed-off-by: Noam Camus <noamc@ezchip.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
[ Also removed pointless cast from "void *".  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

014018e0

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 955837d8

Linus Torvalds authored Sep 03, 2014

Pull more arm64 fixes from Will Deacon:
 "Another handful of arm64 fixes here.  They address some issues found
  by running smatch on the arch code (ignoring the false positives) and
  also stop 32-bit Android from losing track of its stack.

  There's one additional irq migration fix in the pipeline, but it came
  in after I'd tagged and tested this set.

   - a few fixes for real issues found by smatch (after Dan's talk at KS)

   - revert the /proc/cpuinfo changes merged during the merge window.
     We've opened a can of worms here, so we need to find out where we
     stand before we change this interface.

   - implement KSTK_ESP for compat tasks, otherwise 32-bit Android gets
     confused wondering where its [stack] has gone

   - misc fixes (fpsimd context handling, crypto, ...)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  Revert "arm64: cpuinfo: print info for all CPUs"
  arm64: fix bug for reloading FPSIMD state after cpu power off
  arm64: report correct stack pointer in KSTK_ESP for compat tasks
  arm64: Add brackets around user_stack_pointer()
  arm64: perf: don't rely on layout of pt_regs when grabbing sp or pc
  arm64: ptrace: fix compat reg getter/setter return values
  arm64: ptrace: fix compat hardware watchpoint reporting
  arm64: Remove unused variable in head.S
  arm64/crypto: remove redundant update of data

955837d8

Merge tag 'pci-v3.17-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · f16c15a0

Linus Torvalds authored Sep 03, 2014

Pull PCI fix from Bjorn Helgaas:
 "This fixes an ARM allmodconfig build problem:

  Remove module option for ST Microelectronics SPEAr13xx"

* tag 'pci-v3.17-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: spear: Remove module option

f16c15a0

Merge branch 'leds-fixes-for-3.17' of... · 51fe4d3b

Linus Torvalds authored Sep 03, 2014

Merge branch 'leds-fixes-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds

Pull LED fix from Bryan Wu:
 "Hugh, Jiri and many other people found a kernel oops due to a LED
  change merged recently.  Now the right fix might just revert it and
  avoid the kernel oops"

* 'leds-fixes-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
  Revert "leds: convert blink timer to workqueue"

51fe4d3b

PEFILE: Relax the check on the length of the PKCS#7 cert · 0aa04094

David Howells authored Sep 02, 2014

Relax the check on the length of the PKCS#7 cert as it appears that the PE
file wrapper size gets rounded up to the nearest 8.

The debugging output looks like this:

	PEFILE: ==> verify_pefile_signature()
	PEFILE: ==> pefile_parse_binary()
	PEFILE: checksum @ 110
	PEFILE: header size = 200
	PEFILE: cert = 968 @547be0 [68 09 00 00 00 02 02 00 30 82 09 56 ]
	PEFILE: sig wrapper = { 968, 200, 2 }
	PEFILE: Signature data not PKCS#7

The wrapper is the first 8 bytes of the hex dump inside [].  This indicates a
length of 0x968 bytes, including the wrapper header - so 0x960 bytes of
payload.

The ASN.1 wrapper begins [ ... 30 82 09 56 ].  That indicates an object of size
0x956 - a four byte discrepency, presumably just padding for alignment
purposes.

So we just check that the ASN.1 container is no bigger than the payload and
reduce the recorded size appropriately.

Whilst we're at it, allow shorter PKCS#7 objects that manage to squeeze within
127 or 255 bytes.  It's just about conceivable if no X.509 certs are included
in the PKCS#7 message.
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Peter Jones <pjones@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>

0aa04094

KEYS: Fix use-after-free in assoc_array_gc() · 27419604

David Howells authored Sep 02, 2014

An edit script should be considered inaccessible by a function once it has
called assoc_array_apply_edit() or assoc_array_cancel_edit().

However, assoc_array_gc() is accessing the edit script just after the
gc_complete: label.
Reported-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
cc: shemming@brocade.com
cc: paulmck@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Signed-off-by: James Morris <james.l.morris@oracle.com>

27419604

KEYS: Fix public_key asymmetric key subtype name · 876c6e3e

David Howells authored Sep 02, 2014

The length of the name of an asymmetric key subtype must be stored in struct
asymmetric_key_subtype::name_len so that it can be matched by a search for
"<subkey_name>:<partial_fingerprint>".  Fix the public_key subtype to have
name_len set.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>

876c6e3e

KEYS: Increase root_maxkeys and root_maxbytes sizes · 738c5d19

Steve Dickson authored Sep 02, 2014

Now that NFS client uses the kernel key ring facility to store the NFSv4
id/gid mappings, the defaults for root_maxkeys and root_maxbytes need to be
substantially increased.

These values have been soak tested:

	https://bugzilla.redhat.com/show_bug.cgi?id=1033708#c73Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>

738c5d19

02 Sep, 2014 2 commits

Revert "leds: convert blink timer to workqueue" · 9067359f

Jiri Kosina authored Sep 02, 2014

This reverts commit 8b37e1be.

It's broken as it changes led_blink_set() in a way that it can now sleep
(while synchronously waiting for workqueue to be cancelled). That's a
problem, because it's possible that this function gets called from atomic
context (tpt_trig_timer() takes a readlock and thus disables preemption).

This has been brought up 3 weeks ago already [1] but no proper fix has
materialized, and I keep seeing the problem since 3.17-rc1.

[1] https://lkml.org/lkml/2014/8/16/128

 BUG: sleeping function called from invalid context at kernel/workqueue.c:2650
 in_atomic(): 1, irqs_disabled(): 0, pid: 2335, name: wpa_supplicant
 5 locks held by wpa_supplicant/2335:
  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff814c7c92>] rtnl_lock+0x12/0x20
  #1:  (&wdev->mtx){+.+.+.}, at: [<ffffffffc06e649c>] cfg80211_mgd_wext_siwessid+0x5c/0x180 [cfg80211]
  #2:  (&local->mtx){+.+.+.}, at: [<ffffffffc0817dea>] ieee80211_prep_connection+0x17a/0x9a0 [mac80211]
  #3:  (&local->chanctx_mtx){+.+.+.}, at: [<ffffffffc08081ed>] ieee80211_vif_use_channel+0x5d/0x2a0 [mac80211]
  #4:  (&trig->leddev_list_lock){.+.+..}, at: [<ffffffffc081e68c>] tpt_trig_timer+0xec/0x170 [mac80211]
 CPU: 0 PID: 2335 Comm: wpa_supplicant Not tainted 3.17.0-rc3 #1
 Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
  ffff8800360b5a50 ffff8800751f76d8 ffffffff8159e97f ffff8800360b5a30
  ffff8800751f76e8 ffffffff810739a5 ffff8800751f77b0 ffffffff8106862f
  ffffffff810685d0 0aa2209200000000 ffff880000000004 ffff8800361c59d0
 Call Trace:
  [<ffffffff8159e97f>] dump_stack+0x4d/0x66
  [<ffffffff810739a5>] __might_sleep+0xe5/0x120
  [<ffffffff8106862f>] flush_work+0x5f/0x270
  [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
  [<ffffffff810945ca>] ? mark_held_locks+0x6a/0x90
  [<ffffffff81068a5f>] ? __cancel_work_timer+0x6f/0x100
  [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
  [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
  [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
  [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
  [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
  [<ffffffffc081ecdd>] ieee80211_mod_tpt_led_trig+0x9d/0x160 [mac80211]
  [<ffffffffc07e4278>] __ieee80211_recalc_idle+0x98/0x140 [mac80211]
  [<ffffffffc07e59ce>] ieee80211_idle_off+0xe/0x10 [mac80211]
  [<ffffffffc0804e5b>] ieee80211_add_chanctx+0x3b/0x220 [mac80211]
  [<ffffffffc08062e4>] ieee80211_new_chanctx+0x44/0xf0 [mac80211]
  [<ffffffffc080838a>] ieee80211_vif_use_channel+0x1fa/0x2a0 [mac80211]
  [<ffffffffc0817df8>] ieee80211_prep_connection+0x188/0x9a0 [mac80211]
  [<ffffffffc081c246>] ieee80211_mgd_auth+0x256/0x2e0 [mac80211]
  [<ffffffffc07eab33>] ieee80211_auth+0x13/0x20 [mac80211]
  [<ffffffffc06cb006>] cfg80211_mlme_auth+0x106/0x270 [cfg80211]
  [<ffffffffc06ce085>] cfg80211_conn_do_work+0x155/0x3b0 [cfg80211]
  [<ffffffffc06cf670>] cfg80211_connect+0x3f0/0x540 [cfg80211]
  [<ffffffffc06e6148>] cfg80211_mgd_wext_connect+0x158/0x1f0 [cfg80211]
  [<ffffffffc06e651e>] cfg80211_mgd_wext_siwessid+0xde/0x180 [cfg80211]
  [<ffffffffc06e36c0>] ? cfg80211_wext_giwessid+0x50/0x50 [cfg80211]
  [<ffffffffc06e36dd>] cfg80211_wext_siwessid+0x1d/0x40 [cfg80211]
  [<ffffffff81584d0c>] ioctl_standard_iw_point+0x14c/0x3e0
  [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
  [<ffffffff8158502a>] ioctl_standard_call+0x8a/0xd0
  [<ffffffff81584fa0>] ? ioctl_standard_iw_point+0x3e0/0x3e0
  [<ffffffff81584b76>] wireless_process_ioctl.constprop.10+0xb6/0x100
  [<ffffffff8158521d>] wext_handle_ioctl+0x5d/0xb0
  [<ffffffff814cfb29>] dev_ioctl+0x329/0x620
  [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
  [<ffffffff8149c7f2>] sock_ioctl+0x142/0x2e0
  [<ffffffff811b0140>] do_vfs_ioctl+0x300/0x520
  [<ffffffff815a67fb>] ? sysret_check+0x1b/0x56
  [<ffffffff810946ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0
  [<ffffffff811b03e1>] SyS_ioctl+0x81/0xa0
  [<ffffffff815a67d6>] system_call_fastpath+0x1a/0x1f
 wlan0: send auth to 00:0b:6b:3c:8c:e4 (try 1/3)
 wlan0: authenticated
 wlan0: associate with 00:0b:6b:3c:8c:e4 (try 1/3)
 wlan0: RX AssocResp from 00:0b:6b:3c:8c:e4 (capab=0x431 status=0 aid=2)
 wlan0: associated
 IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
 cfg80211: Calling CRDA for country: NA
 wlan0: Limiting TX power to 27 (27 - 0) dBm as advertised by 00:0b:6b:3c:8c:e4

 =================================
 [ INFO: inconsistent lock state ]
 3.17.0-rc3 #1 Not tainted
 ---------------------------------
 inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
 swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
  ((&(&led_cdev->blink_work)->work)){+.?...}, at: [<ffffffff810685d0>] flush_work+0x0/0x270
 {SOFTIRQ-ON-W} state was registered at:
   [<ffffffff81094dbe>] __lock_acquire+0x30e/0x1a30
   [<ffffffff81096c81>] lock_acquire+0x91/0x110
   [<ffffffff81068608>] flush_work+0x38/0x270
   [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
   [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
   [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
   [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
   [<ffffffffc081ecdd>] ieee80211_mod_tpt_led_trig+0x9d/0x160 [mac80211]
   [<ffffffffc07e4278>] __ieee80211_recalc_idle+0x98/0x140 [mac80211]
   [<ffffffffc07e59ce>] ieee80211_idle_off+0xe/0x10 [mac80211]
   [<ffffffffc0804e5b>] ieee80211_add_chanctx+0x3b/0x220 [mac80211]
   [<ffffffffc08062e4>] ieee80211_new_chanctx+0x44/0xf0 [mac80211]
   [<ffffffffc080838a>] ieee80211_vif_use_channel+0x1fa/0x2a0 [mac80211]
   [<ffffffffc0817df8>] ieee80211_prep_connection+0x188/0x9a0 [mac80211]
   [<ffffffffc081c246>] ieee80211_mgd_auth+0x256/0x2e0 [mac80211]
   [<ffffffffc07eab33>] ieee80211_auth+0x13/0x20 [mac80211]
   [<ffffffffc06cb006>] cfg80211_mlme_auth+0x106/0x270 [cfg80211]
   [<ffffffffc06ce085>] cfg80211_conn_do_work+0x155/0x3b0 [cfg80211]
   [<ffffffffc06cf670>] cfg80211_connect+0x3f0/0x540 [cfg80211]
   [<ffffffffc06e6148>] cfg80211_mgd_wext_connect+0x158/0x1f0 [cfg80211]
   [<ffffffffc06e651e>] cfg80211_mgd_wext_siwessid+0xde/0x180 [cfg80211]
   [<ffffffffc06e36dd>] cfg80211_wext_siwessid+0x1d/0x40 [cfg80211]
   [<ffffffff81584d0c>] ioctl_standard_iw_point+0x14c/0x3e0
   [<ffffffff8158502a>] ioctl_standard_call+0x8a/0xd0
   [<ffffffff81584b76>] wireless_process_ioctl.constprop.10+0xb6/0x100
   [<ffffffff8158521d>] wext_handle_ioctl+0x5d/0xb0
   [<ffffffff814cfb29>] dev_ioctl+0x329/0x620
   [<ffffffff8149c7f2>] sock_ioctl+0x142/0x2e0
   [<ffffffff811b0140>] do_vfs_ioctl+0x300/0x520
   [<ffffffff811b03e1>] SyS_ioctl+0x81/0xa0
   [<ffffffff815a67d6>] system_call_fastpath+0x1a/0x1f
 irq event stamp: 493416
 hardirqs last  enabled at (493416): [<ffffffff81068a5f>] __cancel_work_timer+0x6f/0x100
 hardirqs last disabled at (493415): [<ffffffff81067e9f>] try_to_grab_pending+0x1f/0x160
 softirqs last  enabled at (493408): [<ffffffff81053ced>] _local_bh_enable+0x1d/0x50
 softirqs last disabled at (493409): [<ffffffff81054c75>] irq_exit+0xa5/0xb0

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock((&(&led_cdev->blink_work)->work));
   <Interrupt>
     lock((&(&led_cdev->blink_work)->work));

  *** DEADLOCK ***

 2 locks held by swapper/0/0:
  #0:  (((&tpt_trig->timer))){+.-...}, at: [<ffffffff810b4c50>] call_timer_fn+0x0/0x180
  #1:  (&trig->leddev_list_lock){.+.?..}, at: [<ffffffffc081e68c>] tpt_trig_timer+0xec/0x170 [mac80211]

 stack backtrace:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.0-rc3 #1
 Hardware name: LENOVO 7470BN2/7470BN2, BIOS 6DET38WW (2.02 ) 12/19/2008
  ffffffff8246eb30 ffff88007c203b00 ffffffff8159e97f ffffffff81a194c0
  ffff88007c203b50 ffffffff81599c29 0000000000000001 ffffffff00000001
  ffff880000000000 0000000000000006 ffffffff81a194c0 ffffffff81093ad0
 Call Trace:
  <IRQ>  [<ffffffff8159e97f>] dump_stack+0x4d/0x66
  [<ffffffff81599c29>] print_usage_bug+0x1f4/0x205
  [<ffffffff81093ad0>] ? check_usage_backwards+0x140/0x140
  [<ffffffff810944d3>] mark_lock+0x223/0x2b0
  [<ffffffff81094d60>] __lock_acquire+0x2b0/0x1a30
  [<ffffffff81096c81>] lock_acquire+0x91/0x110
  [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
  [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
  [<ffffffff81068608>] flush_work+0x38/0x270
  [<ffffffff810685d0>] ? mod_delayed_work_on+0x80/0x80
  [<ffffffff810945ca>] ? mark_held_locks+0x6a/0x90
  [<ffffffff81068a5f>] ? __cancel_work_timer+0x6f/0x100
  [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
  [<ffffffff8109469d>] ? trace_hardirqs_on_caller+0xad/0x1c0
  [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
  [<ffffffff81068a6b>] __cancel_work_timer+0x7b/0x100
  [<ffffffff81068b0e>] cancel_delayed_work_sync+0xe/0x10
  [<ffffffff8147cf3b>] led_blink_set+0x1b/0x40
  [<ffffffffc081e6b0>] tpt_trig_timer+0x110/0x170 [mac80211]
  [<ffffffff810b4cc5>] call_timer_fn+0x75/0x180
  [<ffffffff810b4c50>] ? process_timeout+0x10/0x10
  [<ffffffffc081e5a0>] ? __ieee80211_get_rx_led_name+0x10/0x10 [mac80211]
  [<ffffffff810b50ac>] run_timer_softirq+0x1fc/0x2f0
  [<ffffffff81054805>] __do_softirq+0x115/0x2e0
  [<ffffffff81054c75>] irq_exit+0xa5/0xb0
  [<ffffffff810049b3>] do_IRQ+0x53/0xf0
  [<ffffffff815a74af>] common_interrupt+0x6f/0x6f
  <EOI>  [<ffffffff8147b56e>] ? cpuidle_enter_state+0x6e/0x180
  [<ffffffff8147b732>] cpuidle_enter+0x12/0x20
  [<ffffffff8108bba0>] cpu_startup_entry+0x330/0x360
  [<ffffffff8158fb51>] rest_init+0xc1/0xd0
  [<ffffffff8158fa90>] ? csum_partial_copy_generic+0x170/0x170
  [<ffffffff81af3ff2>] start_kernel+0x44f/0x45a
  [<ffffffff81af399c>] ? set_init_arg+0x53/0x53
  [<ffffffff81af35ad>] x86_64_start_reservations+0x2a/0x2c
  [<ffffffff81af36a0>] x86_64_start_kernel+0xf1/0xf4

Cc: Vincent Donnefort <vdonnefort@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Bryan Wu <cooloney@gmail.com>

9067359f

f2fs: reposition unlock_new_inode to prevent accessing invalid inode · b73e5282

Chao Yu authored Aug 30, 2014

As the race condition on the inode cache, following scenario can appear:
[Thread a]				[Thread b]
					->f2fs_mkdir
					  ->f2fs_add_link
					    ->__f2fs_add_link
					      ->init_inode_metadata failed here
->gc_thread_func
  ->f2fs_gc
    ->do_garbage_collect
      ->gc_data_segment
        ->f2fs_iget
          ->iget_locked
            ->wait_on_inode
					  ->unlock_new_inode
        ->move_data_page
					  ->make_bad_inode
					  ->iput

When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated inode
should be set as bad to avoid being accessed by other thread. But in above
scenario, it allows f2fs to access the invalid inode before this inode was set
as bad.
This patch fix the potential problem, and this issue was found by code review.

change log from v1:
 o Add condition judgment in gc_data_segment() suggested by Changman Lee.
 o use iget_failed to simplify code.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

b73e5282

01 Sep, 2014 3 commits

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7505ceaf

Linus Torvalds authored Sep 01, 2014

Pull irq handling fixlet from Thomas Gleixner:
 "Just an export for an interrupt flow handler which is now used in gpio
  modules"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irq: Export handle_fasteoi_irq

7505ceaf

Revert "arm64: cpuinfo: print info for all CPUs" · 5e39977e

Will Deacon authored Sep 01, 2014

It turns out that vendors are relying on the format of /proc/cpuinfo,
and we've even spotted out-of-tree hacks attempting to make it look
identical to the format used by arch/arm/. That means we can't afford to
churn this interface in mainline, so revert the recent reformatting of
the file for arm64 pending discussions on the list to find out what
people actually want.

This reverts commit d7a49086.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

5e39977e

arm64: fix bug for reloading FPSIMD state after cpu power off · 7c68a9cc

Leo Yan authored Sep 01, 2014

Now arm64 defers reloading FPSIMD state, but this optimization also
introduces the bug after cpu resume back from low power mode.

The reason is after the cpu has been powered off, s/w need set the
cpu's fpsimd_last_state to NULL so that it will force to reload
FPSIMD state for the thread, otherwise there has the chance to meet
the condition for both the task's fpsimd_state.cpu field contains the
id of the current cpu, and the cpu's fpsimd_last_state per-cpu variable
points to the task's fpsimd_state, so finally kernel will skip to reload
the context during it return back to userland.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Leo Yan <leoy@marvell.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

7c68a9cc