1. 13 Mar, 2017 26 commits
  2. 11 Mar, 2017 2 commits
    • Marcelo Ricardo Leitner's avatar
      sctp: deny peeloff operation on asocs with threads sleeping on it · 8aee8e6c
      Marcelo Ricardo Leitner authored
      commit dfcb9f4f upstream.
      
      commit 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      attempted to avoid a BUG_ON call when the association being used for a
      sendmsg() is blocked waiting for more sndbuf and another thread did a
      peeloff operation on such asoc, moving it to another socket.
      
      As Ben Hutchings noticed, then in such case it would return without
      locking back the socket and would cause two unlocks in a row.
      
      Further analysis also revealed that it could allow a double free if the
      application managed to peeloff the asoc that is created during the
      sendmsg call, because then sctp_sendmsg() would try to free the asoc
      that was created only for that call.
      
      This patch takes another approach. It will deny the peeloff operation
      if there is a thread sleeping on the asoc, so this situation doesn't
      exist anymore. This avoids the issues described above and also honors
      the syscalls that are already being handled (it can be multiple sendmsg
      calls).
      
      Joint work with Xin Long.
      
      Fixes: 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8aee8e6c
    • colyli@suse.de's avatar
      md linear: fix a race between linear_add() and linear_congested() · d93cf670
      colyli@suse.de authored
      commit 03a9e24e upstream.
      
      Recently I receive a bug report that on Linux v3.0 based kerenl, hot add
      disk to a md linear device causes kernel crash at linear_congested(). From
      the crash image analysis, I find in linear_congested(), mddev->raid_disks
      contains value N, but conf->disks[] only has N-1 pointers available. Then
      a NULL pointer deference crashes the kernel.
      
      There is a race between linear_add() and linear_congested(), RCU stuffs
      used in these two functions cannot avoid the race. Since Linuv v4.0
      RCU code is replaced by introducing mddev_suspend().  After checking the
      upstream code, it seems linear_congested() is not called in
      generic_make_request() code patch, so mddev_suspend() cannot provent it
      from being called. The possible race still exists.
      
      Here I explain how the race still exists in current code.  For a machine
      has many CPUs, on one CPU, linear_add() is called to add a hard disk to a
      md linear device; at the same time on other CPU, linear_congested() is
      called to detect whether this md linear device is congested before issuing
      an I/O request onto it.
      
      Now I use a possible code execution time sequence to demo how the possible
      race happens,
      
      seq    linear_add()                linear_congested()
       0                                 conf=mddev->private
       1   oldconf=mddev->private
       2   mddev->raid_disks++
       3                              for (i=0; i<mddev->raid_disks;i++)
       4                                bdev_get_queue(conf->disks[i].rdev->bdev)
       5   mddev->private=newconf
      
      In linear_add() mddev->raid_disks is increased in time seq 2, and on
      another CPU in linear_congested() the for-loop iterates conf->disks[i] by
      the increased mddev->raid_disks in time seq 3,4. But conf with one more
      element (which is a pointer to struct dev_info type) to conf->disks[] is
      not updated yet, accessing its structure member in time seq 4 will cause a
      NULL pointer deference fault.
      
      To fix this race, there are 2 parts of modification in the patch,
       1) Add 'int raid_disks' in struct linear_conf, as a copy of
          mddev->raid_disks. It is initialized in linear_conf(), always being
          consistent with pointers number of 'struct dev_info disks[]'. When
          iterating conf->disks[] in linear_congested(), use conf->raid_disks to
          replace mddev->raid_disks in the for-loop, then NULL pointer deference
          will not happen again.
       2) RCU stuffs are back again, and use kfree_rcu() in linear_add() to
          free oldconf memory. Because oldconf may be referenced as mddev->private
          in linear_congested(), kfree_rcu() makes sure that its memory will not
          be released until no one uses it any more.
      Also some code comments are added in this patch, to make this modification
      to be easier understandable.
      
      This patch can be applied for kernels since v4.0 after commit:
      3be260cc ("md/linear: remove rcu protections in favour of
      suspend/resume"). But this bug is reported on Linux v3.0 based kernel, for
      people who maintain kernels before Linux v4.0, they need to do some back
      back port to this patch.
      
      Changelog:
       - V3: add 'int raid_disks' in struct linear_conf, and use kfree_rcu() to
             replace rcu_call() in linear_add().
       - v2: add RCU stuffs by suggestion from Shaohua and Neil.
       - v1: initial effort.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Neil Brown <neilb@suse.com>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d93cf670
  3. 06 Mar, 2017 1 commit
  4. 03 Mar, 2017 11 commits
    • Johan Hovold's avatar
      USB: cdc-acm: fix failed open not being detected · e6b7fdc0
      Johan Hovold authored
      commit 8727bf68 upstream.
      
      Fix errors during open not being returned to userspace. Specifically,
      failed control-line manipulations or control or read urb submissions
      would not be detected.
      
      Fixes: 7fb57a01 ("USB: cdc-acm: Fix potential deadlock (lockdep
      warning)")
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e6b7fdc0
    • Johan Hovold's avatar
      USB: cdc-acm: fix open and suspend race · 0bb6506d
      Johan Hovold authored
      commit 703df329 upstream.
      
      We must not do the usb_autopm_put_interface() before submitting the read
      urbs or we might end up doing I/O to a suspended device.
      
      Fixes: 088c64f8 ("USB: cdc-acm: re-write read processing")
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0bb6506d
    • Alexey Khoroshilov's avatar
      USB: cdc-acm: fix double usb_autopm_put_interface() in acm_port_activate() · bdd76f45
      Alexey Khoroshilov authored
      commit 070c0b17 upstream.
      
      If acm_submit_read_urbs() fails in acm_port_activate(), error handling
      code calls usb_autopm_put_interface() while it is already called
      before acm_submit_read_urbs(). The patch reorganizes error handling code
      to avoid double decrement of USB interface's PM-usage counter.
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarAlexey Khoroshilov <khoroshilov@ispras.ru>
      Acked-by: default avatarOliver Neukum <oliver@neukum.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      bdd76f45
    • Heiko Carstens's avatar
      net: filter: s390: fix JIT address randomization · 5cb402ee
      Heiko Carstens authored
      commit e84d2f8d upstream.
      
      This is the s390 variant of Alexei's JIT bug fix.
      (patch description below stolen from Alexei's patch)
      
      bpf_alloc_binary() adds 128 bytes of room to JITed program image
      and rounds it up to the nearest page size. If image size is close
      to page size (like 4000), it is rounded to two pages:
      round_up(4000 + 4 + 128) == 8192
      then 'hole' is computed as 8192 - (4000 + 4) = 4188
      If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header)
      then kernel will crash during bpf_jit_free():
      
      kernel BUG at arch/x86/mm/pageattr.c:887!
      Call Trace:
       [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460
       [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50
       [<ffffffff810378ff>] set_memory_rw+0x2f/0x40
       [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60
       [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0
       [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0
       [<ffffffff8106c90c>] worker_thread+0x11c/0x370
      
      since bpf_jit_free() does:
        unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
        struct bpf_binary_header *header = (void *)addr;
      to compute start address of 'bpf_binary_header'
      and header->pages will pass junk to:
        set_memory_rw(addr, header->pages);
      
      Fix it by making sure that &header->image[prandom_u32() % hole] and &header
      are in the same page.
      
      Fixes: aa2d2c73 ("s390/bpf,jit: address randomize and write protect jit code")
      Reported-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5cb402ee
    • Alexei Starovoitov's avatar
      net: filter: x86: fix JIT address randomization · ba6f34fe
      Alexei Starovoitov authored
      commit 773cd38f upstream.
      
      bpf_alloc_binary() adds 128 bytes of room to JITed program image
      and rounds it up to the nearest page size. If image size is close
      to page size (like 4000), it is rounded to two pages:
      round_up(4000 + 4 + 128) == 8192
      then 'hole' is computed as 8192 - (4000 + 4) = 4188
      If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header)
      then kernel will crash during bpf_jit_free():
      
      kernel BUG at arch/x86/mm/pageattr.c:887!
      Call Trace:
       [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460
       [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50
       [<ffffffff810378ff>] set_memory_rw+0x2f/0x40
       [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60
       [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0
       [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0
       [<ffffffff8106c90c>] worker_thread+0x11c/0x370
      
      since bpf_jit_free() does:
        unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
        struct bpf_binary_header *header = (void *)addr;
      to compute start address of 'bpf_binary_header'
      and header->pages will pass junk to:
        set_memory_rw(addr, header->pages);
      
      Fix it by making sure that &header->image[prandom_u32() % hole] and &header
      are in the same page
      
      Fixes: 314beb9b ("x86: bpf_jit_comp: secure bpf jit against spraying attacks")
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ba6f34fe
    • Claudiu Manoil's avatar
      gianfar: Check if phydev present on ethtool -A · b24cd48c
      Claudiu Manoil authored
      commit 98a46d46 upstream.
      
      This fixes a seg fault on 'ethtool -A' entry if the
      interface is down.  Obviously we need to have the
      phy device initialized / "connected" (see of_phy_connect())
      to be able to advertise pause frame capabilities.
      
      Fixes: 23402bddSigned-off-by: default avatarClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b24cd48c
    • Thomas Petazzoni's avatar
      PCI: mvebu: split PCIe BARs into multiple MBus windows when needed · 42e99f1c
      Thomas Petazzoni authored
      commit 398f5d5e upstream.
      
      MBus windows are used on Marvell platforms to map certain peripherals
      in the physical address space. In the PCIe context, MBus windows are
      needed to map PCIe I/O and memory regions in the physical address.
      
      However, those MBus windows can only have power of two sizes, while
      PCIe BAR do not necessarily guarantee this. For this reason, the
      current pci-mvebu breaks on platforms where PCIe devices have BARs
      that don't sum up to a power of two size at the emulated bridge level.
      
      This commit fixes this by allowing the pci-mvebu driver to create
      multiple contiguous MBus windows (each having a power of two size) to
      cover a given PCIe BAR.
      
      To achieve this, two functions are added: mvebu_pcie_add_windows() and
      mvebu_pcie_del_windows() to respectively add and remove all the MBus
      windows that are needed to map the provided PCIe region base and
      size. The emulated PCI bridge code now calls those functions, instead
      of directly calling the mvebu-mbus driver functions.
      
      Fixes: 45361a4f ('pci: PCIe driver for Marvell Armada 370/XP systems')
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Link: https://lkml.kernel.org/r/1397823593-1932-8-git-send-email-thomas.petazzoni@free-electrons.comTested-by: default avatarNeil Greatorex <neil@fatboyfat.co.uk>
      Acked-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarJason Cooper <jason@lakedaemon.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      42e99f1c
    • Jingoo Han's avatar
      PCI: mvebu: Use max_t() instead of max(resource_size_t,) · 9921463b
      Jingoo Han authored
      commit 06489002 upstream.
      
      Use max_t() instead of max(resource_size_t,) in order to fix
      the following checkpatch warning.
      
        WARNING: max() should probably be max_t(resource_size_t, SZ_64K, size)
        WARNING: max() should probably be max_t(resource_size_t, SZ_1M, size)
      Signed-off-by: default avatarJingoo Han <jg1.han@samsung.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarJason Cooper <jason@lakedaemon.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9921463b
    • Steffen Klassert's avatar
      vti4: Don't count header length twice. · ca5fbde4
      Steffen Klassert authored
      commit a3245236 upstream.
      
      We currently count the size of LL_MAX_HEADER and struct iphdr
      twice for vti4 devices, this leads to a wrong device mtu.
      The size of LL_MAX_HEADER and struct iphdr is already counted in
      ip_tunnel_bind_dev(), so don't do it again in vti_tunnel_init().
      
      Fixes: b9959fd3 ("vti: switch to new ip tunnel code")
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ca5fbde4
    • Daniel Borkmann's avatar
      net: sctp: rework multihoming retransmission path selection to rfc4960 · d2c2bdcd
      Daniel Borkmann authored
      commit 4c47af4d upstream.
      
      Problem statement: 1) both paths (primary path1 and alternate
      path2) are up after the association has been established i.e.,
      HB packets are normally exchanged, 2) path2 gets inactive after
      path_max_retrans * max_rto timed out (i.e. path2 is down completely),
      3) now, if a transmission times out on the only surviving/active
      path1 (any ~1sec network service impact could cause this like
      a channel bonding failover), then the retransmitted packets are
      sent over the inactive path2; this happens with partial failover
      and without it.
      
      Besides not being optimal in the above scenario, a small failure
      or timeout in the only existing path has the potential to cause
      long delays in the retransmission (depending on RTO_MAX) until
      the still active path is reselected. Further, when the T3-timeout
      occurs, we have active_patch == retrans_path, and even though the
      timeout occurred on the initial transmission of data, not a
      retransmit, we end up updating retransmit path.
      
      RFC4960, section 6.4. "Multi-Homed SCTP Endpoints" states under
      6.4.1. "Failover from an Inactive Destination Address" the
      following:
      
        Some of the transport addresses of a multi-homed SCTP endpoint
        may become inactive due to either the occurrence of certain
        error conditions (see Section 8.2) or adjustments from the
        SCTP user.
      
        When there is outbound data to send and the primary path
        becomes inactive (e.g., due to failures), or where the SCTP
        user explicitly requests to send data to an inactive
        destination transport address, before reporting an error to
        its ULP, the SCTP endpoint should try to send the data to an
        alternate __active__ destination transport address if one
        exists.
      
        When retransmitting data that timed out, if the endpoint is
        multihomed, it should consider each source-destination address
        pair in its retransmission selection policy. When retransmitting
        timed-out data, the endpoint should attempt to pick the most
        divergent source-destination pair from the original
        source-destination pair to which the packet was transmitted.
      
        Note: Rules for picking the most divergent source-destination
        pair are an implementation decision and are not specified
        within this document.
      
      So, we should first reconsider to take the current active
      retransmission transport if we cannot find an alternative
      active one. If all of that fails, we can still round robin
      through unkown, partial failover, and inactive ones in the
      hope to find something still suitable.
      
      Commit 4141ddc0 ("sctp: retran_path update bug fix") broke
      that behaviour by selecting the next inactive transport when
      no other active transport was found besides the current assoc's
      peer.retran_path. Before commit 4141ddc0, we would have
      traversed through the list until we reach our peer.retran_path
      again, and in case that is still in state SCTP_ACTIVE, we would
      take it and return. Only if that is not the case either, we
      take the next inactive transport.
      
      Besides all that, another issue is that transports in state
      SCTP_UNKNOWN could be preferred over transports in state
      SCTP_ACTIVE in case a SCTP_ACTIVE transport appears after
      SCTP_UNKNOWN in the transport list yielding a weaker transport
      state to be used in retransmission.
      
      This patch mostly reverts 4141ddc0, but also rewrites
      this function to introduce more clarity and strictness into
      the code. A strict priority of transport states is enforced
      in this patch, hence selection is active > unkown > partial
      failover > inactive.
      
      Fixes: 4141ddc0 ("sctp: retran_path update bug fix")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Acked-by: default avatarVlad Yasevich <yasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d2c2bdcd
    • Hannes Frederic Sowa's avatar
      ipv6: simplify detection of first operational link-local address on interface · 6a484057
      Hannes Frederic Sowa authored
      commit 11ffff75 upstream.
      
      In commit 1ec047eb ("ipv6: introduce per-interface counter for
      dad-completed ipv6 addresses") I build the detection of the first
      operational link-local address much to complex. Additionally this code
      now has a race condition.
      
      Replace it with a much simpler variant, which just scans the address
      list when duplicate address detection completes, to check if this is
      the first valid link local address and send RS and MLD reports then.
      
      Fixes: 1ec047eb ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses")
      Reported-by: default avatarJiri Pirko <jiri@resnulli.us>
      Cc: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: default avatarFlavio Leitner <fbl@redhat.com>
      Acked-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6a484057