lab.nexedi.com will be down from Thursday, 20 March 2025, 07:30:00 UTC for a duration of approximately 2 hours

  1. 16 Jun, 2014 12 commits
    • Xufeng Zhang's avatar
      sctp: reset flowi4_oif parameter on route lookup · ffd51a72
      Xufeng Zhang authored
      [ Upstream commit 85350871 ]
      
      commit 813b3b5d (ipv4: Use caller's on-stack flowi as-is
      in output route lookups.) introduces another regression which
      is very similar to the problem of commit e6b45241 (ipv4: reset
      flowi parameters on route connect) wants to fix:
      Before we call ip_route_output_key() in sctp_v4_get_dst() to
      get a dst that matches a bind address as the source address,
      we have already called this function previously and the flowi
      parameters have been initialized including flowi4_oif, so when
      we call this function again, the process in __ip_route_output_key()
      will be different because of the setting of flowi4_oif, and we'll
      get a networking device which corresponds to the inputted flowi4_oif
      as the output device, this is wrong because we'll never hit this
      place if the previously returned source address of dst match one
      of the bound addresses.
      
      To reproduce this problem, a vlan setting is enough:
        # ifconfig eth0 up
        # route del default
        # vconfig add eth0 2
        # vconfig add eth0 3
        # ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0
        # route add default gw 10.0.1.254 dev eth0.2
        # ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0
        # ip rule add from 10.0.0.14 table 4
        # ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3
        # sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I
      You'll detect that all the flow are routed to eth0.2(10.0.1.254).
      Signed-off-by: default avatarXufeng Zhang <xufeng.zhang@windriver.com>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ffd51a72
    • Toshiaki Makita's avatar
      bridge: Handle IFLA_ADDRESS correctly when creating bridge device · 7a2d2c22
      Toshiaki Makita authored
      [ Upstream commit 30313a3d ]
      
      When bridge device is created with IFLA_ADDRESS, we are not calling
      br_stp_change_bridge_id(), which leads to incorrect local fdb
      management and bridge id calculation, and prevents us from receiving
      frames on the bridge device.
      Reported-by: default avatarTom Gundersen <teg@jklm.no>
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7a2d2c22
    • Kumar Sundararajan's avatar
      ipv6: fib: fix fib dump restart · e28e060c
      Kumar Sundararajan authored
      [ Upstream commit 1c265854 ]
      
      When the ipv6 fib changes during a table dump, the walk is
      restarted and the number of nodes dumped are skipped. But the existing
      code doesn't advance to the next node after a node is skipped. This can
      cause the dump to loop or produce lots of duplicates when the fib
      is modified during the dump.
      
      This change advances the walk to the next node if the current node is
      skipped after a restart.
      Signed-off-by: default avatarKumar Sundararajan <kumar@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e28e060c
    • David Gibson's avatar
      rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is set · acefd01b
      David Gibson authored
      [ Upstream commit c53864fd ]
      
      Since 115c9b81 (rtnetlink: Fix problem with
      buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST
      attribute if they were solicited by a GETLINK message containing an
      IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag.
      
      That was done because some user programs broke when they received more data
      than expected - because IFLA_VFINFO_LIST contains information for each VF
      it can become large if there are many VFs.
      
      However, the IFLA_VF_PORTS attribute, supplied for devices which implement
      ndo_get_vf_port (currently the 'enic' driver only), has the same problem.
      It supplies per-VF information and can therefore become large, but it is
      not currently conditional on the IFLA_EXT_MASK value.
      
      Worse, it interacts badly with the existing EXT_MASK handling.  When
      IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at
      NLMSG_GOODSIZE.  If the information for IFLA_VF_PORTS exceeds this, then
      rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet.
      netlink_dump() will misinterpret this as having finished the listing and
      omit data for this interface and all subsequent ones.  That can cause
      getifaddrs(3) to enter an infinite loop.
      
      This patch addresses the problem by only supplying IFLA_VF_PORTS when
      IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      acefd01b
    • David Gibson's avatar
      rtnetlink: Warn when interface's information won't fit in our packet · d2daaf59
      David Gibson authored
      [ Upstream commit 973462bb ]
      
      Without IFLA_EXT_MASK specified, the information reported for a single
      interface in response to RTM_GETLINK is expected to fit within a netlink
      packet of NLMSG_GOODSIZE.
      
      If it doesn't, however, things will go badly wrong,  When listing all
      interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first
      message in a packet as the end of the listing and omit information for
      that interface and all subsequent ones.  This can cause getifaddrs(3) to
      enter an infinite loop.
      
      This patch won't fix the problem, but it will WARN_ON() making it easier to
      track down what's going wrong.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: default avatarJiri Pirko <jpirko@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d2daaf59
    • Vlad Yasevich's avatar
      net: sctp: cache auth_enable per endpoint · 5051b7c7
      Vlad Yasevich authored
      [ Upstream commit b14878cc ]
      
      Currently, it is possible to create an SCTP socket, then switch
      auth_enable via sysctl setting to 1 and crash the system on connect:
      
      Oops[#1]:
      CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
      task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
      [...]
      Call Trace:
      [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
      [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
      [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
      [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
      [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
      [<ffffffff8043af68>] sctp_rcv+0x588/0x630
      [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
      [<ffffffff803acb50>] ip6_input+0x2c0/0x440
      [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
      [<ffffffff80310650>] process_backlog+0xb4/0x18c
      [<ffffffff80313cbc>] net_rx_action+0x12c/0x210
      [<ffffffff80034254>] __do_softirq+0x17c/0x2ac
      [<ffffffff800345e0>] irq_exit+0x54/0xb0
      [<ffffffff800075a4>] ret_from_irq+0x0/0x4
      [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
      [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
      [<ffffffff805a88b0>] start_kernel+0x37c/0x398
      Code: dd0900b8  000330f8  0126302d <dcc60000> 50c0fff1  0047182a  a48306a0
      03e00008  00000000
      ---[ end trace b530b0551467f2fd ]---
      Kernel panic - not syncing: Fatal exception in interrupt
      
      What happens while auth_enable=0 in that case is, that
      ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
      when endpoint is being created.
      
      After that point, if an admin switches over to auth_enable=1,
      the machine can crash due to NULL pointer dereference during
      reception of an INIT chunk. When we enter sctp_process_init()
      via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
      the INIT verification succeeds and while we walk and process
      all INIT params via sctp_process_param() we find that
      net->sctp.auth_enable is set, therefore do not fall through,
      but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
      dereference what we have set to NULL during endpoint
      initialization phase.
      
      The fix is to make auth_enable immutable by caching its value
      during endpoint initialization, so that its original value is
      being carried along until destruction. The bug seems to originate
      from the very first days.
      
      Fix in joint work with Daniel Borkmann.
      Reported-by: default avatarJoshua Kinard <kumba@gentoo.org>
      Signed-off-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Tested-by: default avatarJoshua Kinard <kumba@gentoo.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5051b7c7
    • Ivan Vecera's avatar
      tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled · 18a3a3a1
      Ivan Vecera authored
      commit ba67b510 upstream.
      
      The patch fixes a problem with dropped jumbo frames after usage of
      'ethtool -G ... rx'.
      
      Scenario:
      1. ip link set eth0 up
      2. ethtool -G eth0 rx N # <- This zeroes rx-jumbo
      3. ip link set mtu 9000 dev eth0
      
      The ethtool command set rx_jumbo_pending to zero so any received jumbo
      packets are dropped and you need to use 'ethtool -G eth0 rx-jumbo N'
      to workaround the issue.
      The patch changes the logic so rx_jumbo_pending value is changed only if
      jumbo frames are enabled (MTU > 1500).
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Acked-by: default avatarMichael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      18a3a3a1
    • dingtianhong's avatar
      vlan: Fix lockdep warning when vlan dev handle notification · 55087f4c
      dingtianhong authored
      [ Upstream commit dc8eaaa0 ]
      
      When I open the LOCKDEP config and run these steps:
      
      modprobe 8021q
      vconfig add eth2 20
      vconfig add eth2.20 30
      ifconfig eth2 xx.xx.xx.xx
      
      then the Call Trace happened:
      
      [32524.386288] =============================================
      [32524.386293] [ INFO: possible recursive locking detected ]
      [32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G           O
      [32524.386302] ---------------------------------------------
      [32524.386306] ifconfig/3103 is trying to acquire lock:
      [32524.386310]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
      [32524.386326]
      [32524.386326] but task is already holding lock:
      [32524.386330]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
      [32524.386341]
      [32524.386341] other info that might help us debug this:
      [32524.386345]  Possible unsafe locking scenario:
      [32524.386345]
      [32524.386350]        CPU0
      [32524.386352]        ----
      [32524.386354]   lock(&vlan_netdev_addr_lock_key/1);
      [32524.386359]   lock(&vlan_netdev_addr_lock_key/1);
      [32524.386364]
      [32524.386364]  *** DEADLOCK ***
      [32524.386364]
      [32524.386368]  May be due to missing lock nesting notation
      [32524.386368]
      [32524.386373] 2 locks held by ifconfig/3103:
      [32524.386376]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81431d42>] rtnl_lock+0x12/0x20
      [32524.386387]  #1:  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
      [32524.386398]
      [32524.386398] stack backtrace:
      [32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G           O 3.14.0-rc2-0.7-default+ #35
      [32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [32524.386414]  ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8
      [32524.386421]  ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0
      [32524.386428]  000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000
      [32524.386435] Call Trace:
      [32524.386441]  [<ffffffff814f68a2>] dump_stack+0x6a/0x78
      [32524.386448]  [<ffffffff810a35fb>] __lock_acquire+0x7ab/0x1940
      [32524.386454]  [<ffffffff810a323a>] ? __lock_acquire+0x3ea/0x1940
      [32524.386459]  [<ffffffff810a4874>] lock_acquire+0xe4/0x110
      [32524.386464]  [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
      [32524.386471]  [<ffffffff814fc07a>] _raw_spin_lock_nested+0x2a/0x40
      [32524.386476]  [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
      [32524.386481]  [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
      [32524.386489]  [<ffffffffa0500cab>] vlan_dev_set_rx_mode+0x2b/0x50 [8021q]
      [32524.386495]  [<ffffffff8141addf>] __dev_set_rx_mode+0x5f/0xb0
      [32524.386500]  [<ffffffff8141af8b>] dev_set_rx_mode+0x2b/0x40
      [32524.386506]  [<ffffffff8141b3cf>] __dev_open+0xef/0x150
      [32524.386511]  [<ffffffff8141b177>] __dev_change_flags+0xa7/0x190
      [32524.386516]  [<ffffffff8141b292>] dev_change_flags+0x32/0x80
      [32524.386524]  [<ffffffff8149ca56>] devinet_ioctl+0x7d6/0x830
      [32524.386532]  [<ffffffff81437b0b>] ? dev_ioctl+0x34b/0x660
      [32524.386540]  [<ffffffff814a05b0>] inet_ioctl+0x80/0xa0
      [32524.386550]  [<ffffffff8140199d>] sock_do_ioctl+0x2d/0x60
      [32524.386558]  [<ffffffff81401a52>] sock_ioctl+0x82/0x2a0
      [32524.386568]  [<ffffffff811a7123>] do_vfs_ioctl+0x93/0x590
      [32524.386578]  [<ffffffff811b2705>] ? rcu_read_lock_held+0x45/0x50
      [32524.386586]  [<ffffffff811b39e5>] ? __fget_light+0x105/0x110
      [32524.386594]  [<ffffffff811a76b1>] SyS_ioctl+0x91/0xb0
      [32524.386604]  [<ffffffff815057e2>] system_call_fastpath+0x16/0x1b
      
      ========================================================================
      
      The reason is that all of the addr_lock_key for vlan dev have the same class,
      so if we change the status for vlan dev, the vlan dev and its real dev will
      hold the same class of addr_lock_key together, so the warning happened.
      
      we should distinguish the lock depth for vlan dev and its real dev.
      
      v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which
      	could support to add 8 vlan id on a same vlan dev, I think it is enough for current
      	scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id,
      	and the vlan dev would not meet the same class key with its real dev.
      
      	The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan
      	dev could get a suitable class key.
      
      v2->v3: According David's suggestion, I use the subclass to distinguish the lock key for vlan dev
      	and its real dev, but it make no sense, because the difference for subclass in the
      	lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth
      	to distinguish the different depth for every vlan dev, the same depth of the vlan dev
      	could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h,
      	I think it is enough here, the lockdep should never exceed that value.
      
      v3->v4: Add a huge array of locking keys will waste static kernel memory and is not a appropriate method,
      	we could use _nested() variants to fix the problem, calculate the depth for every vlan dev,
      	and use the depth as the subclass for addr_lock_key.
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      55087f4c
    • Nicolas Dichtel's avatar
      ip6_gre: don't allow to remove the fb_tunnel_dev · 62aee6a4
      Nicolas Dichtel authored
      [ Upstream commit 54d63f78 ]
      
      It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but
      this is unsafe, the module always supposes that this device exists. For example,
      ip6gre_tunnel_lookup() may use it unconditionally.
      
      Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we
      let ip6gre_destroy_tunnels() do the job).
      
      Introduced by commit c12b395a ("gre: Support GRE over IPv6").
      
      CC: Dmitry Kozlov <xeb@mail.ru>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      62aee6a4
    • Mathias Krause's avatar
      filter: prevent nla extensions to peek beyond the end of the message · 9b687f2a
      Mathias Krause authored
      [ Upstream commit 05ab8f26 ]
      
      The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check
      for a minimal message length before testing the supplied offset to be
      within the bounds of the message. This allows the subtraction of the nla
      header to underflow and therefore -- as the data type is unsigned --
      allowing far to big offset and length values for the search of the
      netlink attribute.
      
      The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is
      also wrong. It has the minuend and subtrahend mixed up, therefore
      calculates a huge length value, allowing to overrun the end of the
      message while looking for the netlink attribute.
      
      The following three BPF snippets will trigger the bugs when attached to
      a UNIX datagram socket and parsing a message with length 1, 2 or 3.
      
       ,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]--
       | ld	#0x87654321
       | ldx	#42
       | ld	#nla
       | ret	a
       `---
      
       ,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]--
       | ld	#0x87654321
       | ldx	#42
       | ld	#nlan
       | ret	a
       `---
      
       ,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]--
       | ; (needs a fake netlink header at offset 0)
       | ld	#0
       | ldx	#42
       | ld	#nlan
       | ret	a
       `---
      
      Fix the first issue by ensuring the message length fulfills the minimal
      size constrains of a nla header. Fix the second bug by getting the math
      for the remainder calculation right.
      
      Fixes: 4738c1db ("[SKFILTER]: Add SKF_ADF_NLATTR instruction")
      Fixes: d214c753 ("filter: add SKF_AD_NLATTR_NEST to look for nested..")
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9b687f2a
    • Julian Anastasov's avatar
      ipv4: return valid RTA_IIF on ip route get · bf063234
      Julian Anastasov authored
      [ Upstream commit 91146153 ]
      
      Extend commit 13378cad
      ("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid
      RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0
      which is displayed as 'iif *'.
      
      inet_iif is not appropriate to use because skb_iif is not set.
      Use the skb->dev->ifindex instead.
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bf063234
    • Wang, Xiaoming's avatar
      net: ipv4: current group_info should be put after using. · eb80dee7
      Wang, Xiaoming authored
      [ Upstream commit b04c4619 ]
      
      Plug a group_info refcount leak in ping_init.
      group_info is only needed during initialization and
      the code failed to release the reference on exit.
      While here move grabbing the reference to a place
      where it is actually needed.
      Signed-off-by: default avatarChuansheng Liu <chuansheng.liu@intel.com>
      Signed-off-by: default avatarZhang Dongxing <dongxing.zhang@intel.com>
      Signed-off-by: default avatarxiaoming wang <xiaoming.wang@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      eb80dee7
  2. 13 Jun, 2014 15 commits
  3. 12 Jun, 2014 13 commits
    • Miao Xie's avatar
      Btrfs: fix inode caching vs tree log · 90a397de
      Miao Xie authored
      commit 1c70d8fb upstream.
      
      Currently, with inode cache enabled, we will reuse its inode id immediately
      after unlinking file, we may hit something like following:
      
      |->iput inode
      |->return inode id into inode cache
      |->create dir,fsync
      |->power off
      
      An easy way to reproduce this problem is:
      
      mkfs.btrfs -f /dev/sdb
      mount /dev/sdb /mnt -o inode_cache,commit=100
      dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
      inode_id=`ls -i /mnt/data | awk '{print $1}'`
      rm -f /mnt/data
      
      i=1
      while [ 1 ]
      do
              mkdir /mnt/dir_$i
              test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'`
              if [ $test1 -eq $inode_id ]
              then
      		dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
      		echo b > /proc/sysrq-trigger
      	fi
      	sleep 1
              i=$(($i+1))
      done
      
      mount /dev/sdb /mnt
      umount /dev/sdb
      btrfs check /dev/sdb
      
      We fix this problem by adding unlinked inode's id into pinned tree,
      and we can not reuse them until committing transaction.
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      [ kamal: backport to 3.8-stable: context ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      90a397de
    • Johan Hovold's avatar
      USB: serial: fix sysfs-attribute removal deadlock · 73f9ecd1
      Johan Hovold authored
      commit 10164c2a upstream.
      
      Fix driver new_id sysfs-attribute removal deadlock by making sure to
      not hold any locks that the attribute operations grab when removing the
      attribute.
      
      Specifically, usb_serial_deregister holds the table mutex when
      deregistering the driver, which includes removing the new_id attribute.
      This can lead to a deadlock as writing to new_id increments the
      attribute's active count before trying to grab the same mutex in
      usb_serial_probe.
      
      The deadlock can easily be triggered by inserting a sleep in
      usb_serial_deregister and writing the id of an unbound device to new_id
      during module unload.
      
      As the table mutex (in this case) is used to prevent subdriver unload
      during probe, it should be sufficient to only hold the lock while
      manipulating the usb-serial driver list during deregister. A racing
      probe will then either fail to find a matching subdriver or fail to get
      the corresponding module reference.
      
      Since v3.15-rc1 this also triggers the following lockdep warning:
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.15.0-rc2 #123 Tainted: G        W
      -------------------------------------------------------
      modprobe/190 is trying to acquire lock:
       (s_active#4){++++.+}, at: [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
      
      but task is already holding lock:
       (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (table_lock){+.+.+.}:
             [<c0075f84>] __lock_acquire+0x1694/0x1ce4
             [<c0076de8>] lock_acquire+0xb4/0x154
             [<c03af3cc>] _raw_spin_lock+0x4c/0x5c
             [<c02bbc24>] usb_store_new_id+0x14c/0x1ac
             [<bf007eb4>] new_id_store+0x68/0x70 [usbserial]
             [<c025f568>] drv_attr_store+0x30/0x3c
             [<c01690e0>] sysfs_kf_write+0x5c/0x60
             [<c01682c0>] kernfs_fop_write+0xd4/0x194
             [<c010881c>] vfs_write+0xbc/0x198
             [<c0108e4c>] SyS_write+0x4c/0xa0
             [<c000f880>] ret_fast_syscall+0x0/0x48
      
      -> #0 (s_active#4){++++.+}:
             [<c03a7a28>] print_circular_bug+0x68/0x2f8
             [<c0076218>] __lock_acquire+0x1928/0x1ce4
             [<c0076de8>] lock_acquire+0xb4/0x154
             [<c0166b70>] __kernfs_remove+0x254/0x310
             [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
             [<c0169fb8>] remove_files.isra.1+0x48/0x84
             [<c016a2fc>] sysfs_remove_group+0x58/0xac
             [<c016a414>] sysfs_remove_groups+0x34/0x44
             [<c02623b8>] driver_remove_groups+0x1c/0x20
             [<c0260e9c>] bus_remove_driver+0x3c/0xe4
             [<c026235c>] driver_unregister+0x38/0x58
             [<bf007fb4>] usb_serial_bus_deregister+0x84/0x88 [usbserial]
             [<bf004db4>] usb_serial_deregister+0x6c/0x78 [usbserial]
             [<bf005330>] usb_serial_deregister_drivers+0x2c/0x4c [usbserial]
             [<bf016618>] usb_serial_module_exit+0x14/0x1c [sierra]
             [<c009d6cc>] SyS_delete_module+0x184/0x210
             [<c000f880>] ret_fast_syscall+0x0/0x48
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(table_lock);
                                     lock(s_active#4);
                                     lock(table_lock);
        lock(s_active#4);
      
       *** DEADLOCK ***
      
      1 lock held by modprobe/190:
       #0:  (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]
      
      stack backtrace:
      CPU: 0 PID: 190 Comm: modprobe Tainted: G        W     3.15.0-rc2 #123
      [<c0015e10>] (unwind_backtrace) from [<c0013728>] (show_stack+0x20/0x24)
      [<c0013728>] (show_stack) from [<c03a9a54>] (dump_stack+0x24/0x28)
      [<c03a9a54>] (dump_stack) from [<c03a7cac>] (print_circular_bug+0x2ec/0x2f8)
      [<c03a7cac>] (print_circular_bug) from [<c0076218>] (__lock_acquire+0x1928/0x1ce4)
      [<c0076218>] (__lock_acquire) from [<c0076de8>] (lock_acquire+0xb4/0x154)
      [<c0076de8>] (lock_acquire) from [<c0166b70>] (__kernfs_remove+0x254/0x310)
      [<c0166b70>] (__kernfs_remove) from [<c0167aa0>] (kernfs_remove_by_name_ns+0x4c/0x94)
      [<c0167aa0>] (kernfs_remove_by_name_ns) from [<c0169fb8>] (remove_files.isra.1+0x48/0x84)
      [<c0169fb8>] (remove_files.isra.1) from [<c016a2fc>] (sysfs_remove_group+0x58/0xac)
      [<c016a2fc>] (sysfs_remove_group) from [<c016a414>] (sysfs_remove_groups+0x34/0x44)
      [<c016a414>] (sysfs_remove_groups) from [<c02623b8>] (driver_remove_groups+0x1c/0x20)
      [<c02623b8>] (driver_remove_groups) from [<c0260e9c>] (bus_remove_driver+0x3c/0xe4)
      [<c0260e9c>] (bus_remove_driver) from [<c026235c>] (driver_unregister+0x38/0x58)
      [<c026235c>] (driver_unregister) from [<bf007fb4>] (usb_serial_bus_deregister+0x84/0x88 [usbserial])
      [<bf007fb4>] (usb_serial_bus_deregister [usbserial]) from [<bf004db4>] (usb_serial_deregister+0x6c/0x78 [usbserial])
      [<bf004db4>] (usb_serial_deregister [usbserial]) from [<bf005330>] (usb_serial_deregister_drivers+0x2c/0x4c [usbserial])
      [<bf005330>] (usb_serial_deregister_drivers [usbserial]) from [<bf016618>] (usb_serial_module_exit+0x14/0x1c [sierra])
      [<bf016618>] (usb_serial_module_exit [sierra]) from [<c009d6cc>] (SyS_delete_module+0x184/0x210)
      [<c009d6cc>] (SyS_delete_module) from [<c000f880>] (ret_fast_syscall+0x0/0x48)
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      73f9ecd1
    • Liu Hua's avatar
      ARM: 8030/1: ARM : kdump : add arch_crash_save_vmcoreinfo · e24726d3
      Liu Hua authored
      commit 56b700fd upstream.
      
      For vmcore generated by LPAE enabled kernel, user space
      utility such as crash needs additional infomation to
      parse.
      
      So this patch add arch_crash_save_vmcoreinfo as what PAE enabled
      i386 linux does.
      Reviewed-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarLiu Hua <sdu.liu@huawei.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e24726d3
    • Xiangyu Lu's avatar
      ARM: 8027/1: fix do_div() bug in big-endian systems · f79031c4
      Xiangyu Lu authored
      commit 80bb3ef1 upstream.
      
      In big-endian systems, "%1" get the most significant part of the value, cause the instruction to get the wrong result.
      
      When viewing ftrace record in big-endian ARM systems, we found that
      the timestamp errors:
      
      swapper-0   [001] 1325.970000:   0:120:R ==> [001]    16:120:R events/1
      events/1-16 [001] 1325.970000:   16:120:S ==> [001]    0:120:R swapper
      swapper-0   [000] 1325.1000000:  0:120:R   + [000]    15:120:R events/0
      swapper-0   [000] 1325.1000000:  0:120:R ==> [000]    15:120:R events/0
      swapper-0   [000] 1326.030000:   0:120:R   + [000]  1150:120:R sshd
      swapper-0   [000] 1326.030000:   0:120:R ==> [000]  1150:120:R sshd
      
      When viewed ftrace records, it will call the do_div(n, base) function, which achieved arch/arm/include/asm/div64.h in. When n = 10000000, base = 1000000, in do_div(n, base) will execute "umull %Q0, %R0, %1, %Q2".
      Reviewed-by: default avatarDave Martin <Dave.Martin@arm.com>
      Reviewed-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarAlex Wu <wuquanming@huawei.com>
      Signed-off-by: default avatarXiangyu Lu <luxiangyu@huawei.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f79031c4
    • Linus Torvalds's avatar
      mm: make fixup_user_fault() check the vma access rights too · ab790124
      Linus Torvalds authored
      commit 1b17844b upstream.
      
      fixup_user_fault() is used by the futex code when the direct user access
      fails, and the futex code wants it to either map in the page in a usable
      form or return an error.  It relied on handle_mm_fault() to map the
      page, and correctly checked the error return from that, but while that
      does map the page, it doesn't actually guarantee that the page will be
      mapped with sufficient permissions to be then accessed.
      
      So do the appropriate tests of the vma access rights by hand.
      
      [ Side note: arguably handle_mm_fault() could just do that itself, but
        we have traditionally done it in the caller, because some callers -
        notably get_user_pages() - have been able to access pages even when
        they are mapped with PROT_NONE.  Maybe we should re-visit that design
        decision, but in the meantime this is the minimal patch. ]
      
      Found by Dave Jones running his trinity tool.
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ab790124
    • Alex Deucher's avatar
      drm/radeon: fix ATPX detection on non-VGA GPUs · 0428c851
      Alex Deucher authored
      commit e9a4099a upstream.
      
      Some newer PX laptops have the pci device class
      set to DISPLAY_OTHER rather than DISPLAY_VGA.  This
      properly detects ATPX on those laptops.
      
      Based on a patch from: Pali Rohár <pali.rohar@gmail.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: airlied@gmail.com
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0428c851
    • Hans de Goede's avatar
      Input: synaptics - add min/max quirk for ThinkPad T431s, L440, L540, S1 Yoga and X1 · 8d412189
      Hans de Goede authored
      commit 46a2986e upstream.
      
      We expect that all the Haswell series will need such quirks, sigh.
      
      The T431s seems to be T430 hardware in a T440s case, using the T440s touchpad,
      with the same min/max issue.
      
      The X1 Carbon 3rd generation name says 2nd while it is a 3rd generation.
      
      The X1 and T431s share a PnPID with the T540p, but the reported ranges are
      closer to those of the T440s.
      
      HdG: Squashed 5 quirk patches into one. T431s + L440 + L540 are written by me,
      S1 Yoga and X1 are written by Benjamin Tissoires.
      
      Hdg: Standardized S1 Yoga and X1 values, Yoga uses the same touchpad as the
      X240, X1 uses the same touchpad as the T440.
      Signed-off-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8d412189
    • Dan Williams's avatar
      libata/ahci: accommodate tag ordered controllers · 21c6ff1b
      Dan Williams authored
      commit 8a4aeec8 upstream.
      
      The AHCI spec allows implementations to issue commands in tag order
      rather than FIFO order:
      
      	5.3.2.12 P:SelectCmd
      	HBA sets pSlotLoc = (pSlotLoc + 1) mod (CAP.NCS + 1)
      	or HBA selects the command to issue that has had the
      	PxCI bit set to '1' longer than any other command
      	pending to be issued.
      
      The result is that commands posted sequentially (time-wise) may play out
      of sequence when issued by hardware.
      
      This behavior has likely been hidden by drives that arrange for commands
      to complete in issue order.  However, it appears recent drives (two from
      different vendors that we have found so far) inflict out-of-order
      completions as a matter of course.  So, we need to take care to maintain
      ordered submission, otherwise we risk triggering a drive to fall out of
      sequential-io automation and back to random-io processing, which incurs
      large latency and degrades throughput.
      
      This issue was found in simple benchmarks where QD=2 seq-write
      performance was 30-50% *greater* than QD=32 seq-write performance.
      
      Tagging for -stable and making the change globally since it has a low
      risk-to-reward ratio.  Also, word is that recent versions of an unnamed
      OS also does it this way now.  So, drives in the field are already
      experienced with this tag ordering scheme.
      
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ed Ciechanowski <ed.ciechanowski@intel.com>
      Reviewed-by: default avatarMatthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      21c6ff1b
    • Jeff Layton's avatar
      nfsd: set timeparms.to_maxval in setup_callback_client · e44dbc8c
      Jeff Layton authored
      commit 3758cf7e upstream.
      
      ...otherwise the logic in the timeout handling doesn't work correctly.
      Spotted-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e44dbc8c
    • Thomas Gleixner's avatar
      genirq: Allow forcing cpu affinity of interrupts · d8e6babd
      Thomas Gleixner authored
      commit 01f8fa4f upstream.
      
      The current implementation of irq_set_affinity() refuses rightfully to
      route an interrupt to an offline cpu.
      
      But there is a special case, where this is actually desired. Some of
      the ARM SoCs have per cpu timers which require setting the affinity
      during cpu startup where the cpu is not yet in the online mask.
      
      If we can't do that, then the local timer interrupt for the about to
      become online cpu is routed to some random online cpu.
      
      The developers of the affected machines tried to work around that
      issue, but that results in a massive mess in that timer code.
      
      We have a yet unused argument in the set_affinity callbacks of the irq
      chips, which I added back then for a similar reason. It was never
      required so it got not used. But I'm happy that I never removed it.
      
      That allows us to implement a sane handling of the above scenario. So
      the affected SoC drivers can add the required force handling to their
      interrupt chip, switch the timer code to irq_force_affinity() and
      things just work.
      
      This does not affect any existing user of irq_set_affinity().
      
      Tagged for stable to allow a simple fix of the affected SoC clock
      event drivers.
      Reported-and-tested-by: default avatarKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Tomasz Figa <t.figa@samsung.com>,
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>,
      Cc: Kukjin Kim <kgene.kim@samsung.com>
      Cc: linux-arm-kernel@lists.infradead.org,
      Link: http://lkml.kernel.org/r/20140416143315.717251504@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d8e6babd
    • Jeff Layton's avatar
      locks: allow __break_lease to sleep even when break_time is 0 · d69bf9da
      Jeff Layton authored
      commit f1c6bb2c upstream.
      
      A fl->fl_break_time of 0 has a special meaning to the lease break code
      that basically means "never break the lease". knfsd uses this to ensure
      that leases don't disappear out from under it.
      
      Unfortunately, the code in __break_lease can end up passing this value
      to wait_event_interruptible as a timeout, which prevents it from going
      to sleep at all. This makes __break_lease to spin in a tight loop and
      causes soft lockups.
      
      Fix this by ensuring that we pass a minimum value of 1 as a timeout
      instead.
      
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Reported-by: default avatarTerry Barnaby <terry1@beam.ltd.uk>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d69bf9da
    • Theodore Ts'o's avatar
      ext4: use i_size_read in ext4_unaligned_aio() · b8299364
      Theodore Ts'o authored
      commit 6e6358fc upstream.
      
      We haven't taken i_mutex yet, so we need to use i_size_read().
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b8299364
    • Jan Kara's avatar
      ext4: fix jbd2 warning under heavy xattr load · 02ff3c9c
      Jan Kara authored
      commit ec4cb1aa upstream.
      
      When heavily exercising xattr code the assertion that
      jbd2_journal_dirty_metadata() shouldn't return error was triggered:
      
      WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
      jbd2_journal_dirty_metadata+0x1ba/0x260()
      
      CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G    W 3.10.0-ceph-00049-g68d04c9 #1
      Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
       ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
       ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
       0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
      Call Trace:
       [<ffffffff816311b0>] dump_stack+0x19/0x1b
       [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
       [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
       [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
       [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
       [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
       [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
       [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
       [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
       [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
       [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
       [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
       [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
       [<ffffffff8119421e>] setxattr+0x13e/0x1e0
       [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
       [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
       [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
       [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
       [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
       [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
       [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
      
      The reason for the warning is that buffer_head passed into
      jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
      caused by the following race of two ext4_xattr_release_block() calls:
      
      CPU1                                CPU2
      ext4_xattr_release_block()          ext4_xattr_release_block()
      lock_buffer(bh);
      /* False */
      if (BHDR(bh)->h_refcount == cpu_to_le32(1))
      } else {
        le32_add_cpu(&BHDR(bh)->h_refcount, -1);
        unlock_buffer(bh);
                                          lock_buffer(bh);
                                          /* True */
                                          if (BHDR(bh)->h_refcount == cpu_to_le32(1))
                                            get_bh(bh);
                                            ext4_free_blocks()
                                              ...
                                              jbd2_journal_forget()
                                                jbd2_journal_unfile_buffer()
                                                -> JH is gone
        error = ext4_handle_dirty_xattr_block(handle, inode, bh);
        -> triggers the warning
      
      We fix the problem by moving ext4_handle_dirty_xattr_block() under the
      buffer lock. Sadly this cannot be done in nojournal mode as that
      function can call sync_dirty_buffer() which would deadlock. Luckily in
      nojournal mode the race is harmless (we only dirty already freed buffer)
      and thus for nojournal mode we leave the dirtying outside of the buffer
      lock.
      Reported-by: default avatarSage Weil <sage@inktank.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      02ff3c9c