1. 24 May, 2015 37 commits
    • tcp: avoid looping in tcp_send_fin() · f944afb2
      Eric Dumazet authored
      [ Upstream commit 845704a5 ]
      
      The presence of an unbounded loop in tcp_send_fin() had always been hard
      to explain when analyzing crash dumps involving gigantic dying processes
      with millions of sockets.

      Let's try a different strategy:

      In case of memory pressure, try to add the FIN flag to the last packet
      in the write queue, even if that packet was already sent. The TCP stack will
      be able to deliver this FIN after a timeout event. Note that since this
      FIN is delivered by a retransmit, it also carries a Push flag
      given our current implementation.

      By checking sk_under_memory_pressure(), we anticipate that cooking
      many FIN packets might deplete TCP memory.

      If we could not allocate a packet, even with a __GFP_WAIT
      allocation, then not sending a FIN seems quite reasonable if it allows us
      to get rid of this socket, free memory, and not block the process from
      eventually doing other useful work.
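      As a rough illustration of the bounded strategy described above, here is a hypothetical userspace model of the decision tcp_send_fin() now makes. The enum, the function and its boolean inputs are illustrative assumptions, not kernel API; the unsent_data case assumes the stack keeps coalescing the FIN onto queued, unsent data as it already did. The point is that there is exactly one allocation attempt and no retry loop.

```c
#include <stdbool.h>

/* Hypothetical model of the bounded FIN decision; names are illustrative. */
enum fin_action {
	FIN_COALESCE,	/* set FIN on the last skb already in the write queue */
	FIN_NEW_SKB,	/* send the FIN in a freshly allocated skb */
	FIN_GIVE_UP	/* nothing to reuse and allocation failed: skip the FIN */
};

/*
 * have_tail:    the write queue has a last skb that could carry the FIN
 * unsent_data:  queued data has not been sent yet (assumed to coalesce,
 *               as the stack already did before this change)
 * mem_pressure: TCP is under memory pressure
 * alloc_ok:     the single allocation attempt succeeded
 */
static enum fin_action decide_fin(bool have_tail, bool unsent_data,
				  bool mem_pressure, bool alloc_ok)
{
	if (have_tail && (unsent_data || mem_pressure))
		return FIN_COALESCE;
	if (alloc_ok)
		return FIN_NEW_SKB;
	/* No retry loop: fall back to the tail skb, or give up. */
	return have_tail ? FIN_COALESCE : FIN_GIVE_UP;
}
```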
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Drop inapplicable change to sk_forced_wmem_schedule()
       - s/sk_under_memory_pressure(sk)/tcp_memory_pressure/]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 82241580)
      [wt: backported to 2.6.32: s/TCPHDR_FIN/TCPCB_FLAG_FIN/]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ip_forward: Drop frames with attached skb->sk · b19feb6e
      Sebastian Pöhn authored
      [ Upstream commit 2ab95749 ]
      
      Initial discussion was:
      [FYI] xfrm: Don't lookup sk_policy for timewait sockets
      
      Forwarded frames should not have a socket attached. Timewait (tw)
      sockets in particular will lead to panics later on in the stack.

      This was observed with TPROXY assigning a tw socket and broken
      (misconfigured) policy routing. As a result, the frame enters the
      forwarding path instead of the input path. We cannot solve this in
      TPROXY as it cannot know that policy routing is broken.
      
      v2:
      Remove useless comment
      Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit fccb908d)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • tcp: make connect() mem charging friendly · b49fbe0a
      Eric Dumazet authored
      [ Upstream commit 355a901e ]
      
      While working on sk_forward_alloc problems reported by Denys
      Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
      sk_wmem_schedule() for the SYN packet (and/or the SYN/DATA packet), so
      sk_forward_alloc is negative while connect is in progress.

      We can fix this by calling the regular sk_stream_alloc_skb() both for the
      SYN packet (in tcp_connect()) and for the syn_data packet in
      tcp_send_syn_data().

      Then, tcp_send_syn_data() can avoid copying syn_data, as we can simply
      manipulate syn_data->cb[] to remove the SYN flag (and increment seq).

      Instead of open coding memcpy_fromiovecend(), simply use this helper.

      This leaves clean fast clone skbs in the socket write queue.
      
      This was tested against our fastopen packetdrill tests.
      Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Drop the Fast Open changes
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 3e2eb894)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg() · 876846f7
      Al Viro authored
      [ Upstream commit 7d985ed1 ]
      
      [I would really like an ACK on that one from dhowells; it appears to be
      quite straightforward, but...]
      
      MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as a matter of
      fact, neither the kernel users of rxrpc nor the syscalls ever set that bit
      in there. It gets passed via flags; in fact, another such check in the same
      function is done correctly, as flags & MSG_PEEK.
      
      It had been that way (effectively disabled) for 8 years, though, so the patch
      needs beating up - that case had never been tested.  If it is correct, it's
      -stable fodder.
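      The difference between the two checks can be sketched in userspace C. The helper names below are hypothetical; only struct msghdr and MSG_PEEK come from <sys/socket.h>. On input, a peek request arrives in the flags argument, so testing msg->msg_flags for it is effectively dead code.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Buggy form: MSG_PEEK is never set in msg->msg_flags on input. */
static bool wants_peek_buggy(const struct msghdr *msg, int flags)
{
	(void)flags;
	return (msg->msg_flags & MSG_PEEK) != 0;
}

/* Fixed form: test the flags argument, like the other check in the function. */
static bool wants_peek_fixed(const struct msghdr *msg, int flags)
{
	(void)msg;
	return (flags & MSG_PEEK) != 0;
}
```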
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 10c82cd7)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • rds: avoid potential stack overflow · 71372d0e
      Arnd Bergmann authored
      [ Upstream commit f862e07c ]
      
      The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
      on the stack in order to pass a pair of addresses. This happens to just
      fit within the 1024 byte stack size warning limit on x86, but just
      exceeds that limit on ARM, which gives us this warning:
      
      net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      As the use of this large variable is basically bogus, we can rearrange
      the code to not do that. Instead of passing an rds socket into
      rds_iw_get_device, we now just pass the two addresses that we have
      available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
      to create two address structures on the stack there.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 3fe2d645)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net: sysctl_net_core: check SNDBUF and RCVBUF for min length · c5424cf0
      Alexey Kodanev authored
      [ Upstream commit b1cb59cf ]
      
      sysctl has net.core.rmem_*/wmem_* parameters which can be
      set to incorrect values. Given that 'struct sk_buff' allocates from
      rcvbuf, an incorrectly set buffer length could result in memory
      allocation failures. For example, set them as follows:

          # sysctl net.core.rmem_default=64
            net.core.rmem_default = 64
          # sysctl net.core.wmem_default=64
            net.core.wmem_default = 64
          # ping localhost -s 1024 -i 0 > /dev/null

      This could result in the following failure:
      
      skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32
      head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL>
      kernel BUG at net/core/skbuff.c:102!
      invalid opcode: 0000 [#1] SMP
      ...
      task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000
      RIP: 0010:[<ffffffff8155fbd1>]  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
      RSP: 0018:ffff88003ae8bc68  EFLAGS: 00010296
      RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000
      RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8
      RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900
      FS:  00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0
      Stack:
       ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d
       ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550
       ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00
      Call Trace:
       [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470
       [<ffffffff81556f56>] sock_write_iter+0x146/0x160
       [<ffffffff811d9612>] new_sync_write+0x92/0xd0
       [<ffffffff811d9cd6>] vfs_write+0xd6/0x180
       [<ffffffff811da499>] SyS_write+0x59/0xd0
       [<ffffffff81651532>] system_call_fastpath+0x12/0x17
      Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00
            00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b
            eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
      RIP  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
      RSP <ffff88003ae8bc68>
      Kernel panic - not syncing: Fatal exception
      
      Moreover, the possible minimum is 1, so we can get another kernel panic:
      ...
      BUG: unable to handle kernel paging request at ffff88013caee5c0
      IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0
      ...
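      The fix gives these sysctls a lower bound so that too-small values are refused instead of stored. The sketch below models that kind of min/max validation in userspace, loosely following the reject-on-out-of-range behaviour of the kernel's proc_dointvec_minmax() handler; set_buf_sysctl() and HYPOTHETICAL_MIN_BUF are illustrative assumptions standing in for the real handler and the SOCK_MIN_SNDBUF/SOCK_MIN_RCVBUF floors, whose exact values are not given here.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical floor; the kernel's real minimums differ by version/arch. */
#define HYPOTHETICAL_MIN_BUF 2048

/*
 * Store val into *knob only if min <= val <= max.
 * Out-of-range values are rejected and the old setting is kept.
 */
static int set_buf_sysctl(int *knob, long val, int min, int max)
{
	if (val < min || val > max)
		return -EINVAL;
	*knob = (int)val;
	return 0;
}
```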
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: delete now-unused 'one' variable]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 2d6dfb10)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net: avoid to hang up on sending due to sysctl configuration overflow. · 0f8a4ca1
      bingtian.ly@taobao.com authored
      commit cdda8891 upstream.
      
          I found that if we write a value larger than 4GB to some sysctl
      variables, the sending syscall will hang forever, because these
      variables are 32 bits wide, and such large values make them overflow
      to 0 or a negative number.

          This patch tries to fix the overflow or prevent a zero value from
      being set for the following sysctl variables:
      
      net.core.wmem_default
      net.core.rmem_default
      
      net.core.rmem_max
      net.core.wmem_max
      
      net.ipv4.udp_rmem_min
      net.ipv4.udp_wmem_min
      
      net.ipv4.tcp_wmem
      net.ipv4.tcp_rmem
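      The overflow itself is easy to reproduce in plain C: converting a 64-bit value to a 32-bit unsigned type keeps only the low 32 bits, so 4 GiB (2^32) becomes 0. The checked setter below is a hypothetical illustration of the "prevent overflow or zero" idea, not the code from this patch; it accepts only values in 1..INT_MAX.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* What a 32-bit variable keeps of a 64-bit input (reduced modulo 2^32). */
static uint32_t truncate_to_32(uint64_t val)
{
	return (uint32_t)val;
}

/*
 * Hypothetical checked setter: refuse 0 and anything above INT_MAX, so
 * a huge value can neither wrap to 0 nor turn negative in an int.
 */
static int set_checked(int *knob, uint64_t val)
{
	if (val < 1 || val > INT_MAX)
		return -EINVAL;
	*knob = (int)val;
	return 0;
}
```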
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Li Yu <raise.sail@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2:
       - Adjust context
       - Delete now-unused 'zero' variable]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 98eee187)
      [wt: backported to 2.6.32: set strategy to sysctl_intvec where relevant]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • udp: only allow UFO for packets from SOCK_DGRAM sockets · 65f26669
      Michal Kubeček authored
      [ Upstream commit acf8dd0a ]
      
      If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
      UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
      CHECKSUM_PARTIAL unconditionally, as all GSO code assumes the transport
      layer checksum is to be computed on segmentation. However, in this case,
      skb->csum_start and skb->csum_offset are never set, as the raw socket
      transmit path bypasses udp_send_skb() where they are usually set. As a
      result, the driver may access invalid memory when trying to calculate the
      checksum and store the result (as observed in the virtio_net driver).

      Moreover, the very idea of modifying the userspace-provided UDP header
      is IMHO against raw socket semantics (I wasn't able to find a document
      clearly stating this or the opposite, though). And while allowing
      CHECKSUM_NONE in the UFO case would be more efficient, it would be too
      intrusive a change just to handle a corner case like this. Therefore,
      disallowing UFO for packets from sockets other than SOCK_DGRAM seems to
      be the best option.
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 332640b2)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv4: Don't use ufo handling on later transformed packets · 417b2efd
      Steffen Klassert authored
      We might call ip_ufo_append_data() for packets that will be IPsec
      transformed later. This function should be used only for real
      UDP packets. So we check rt->dst.header_len, which is only
      nonzero during IPsec handling, and call ip_ufo_append_data() only
      if rt->dst.header_len is zero.
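      The extra condition can be sketched as a predicate. struct route_info and ufo_allowed() are hypothetical simplifications; the only behaviour taken from the commit is that a nonzero header_len (IPsec) keeps a packet off the UFO path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the route fields involved. */
struct route_info {
	unsigned int header_len;	/* nonzero only when IPsec will transform the packet */
	bool dev_has_ufo;		/* egress device advertises UFO */
};

static bool ufo_allowed(const struct route_info *rt, unsigned int length,
			unsigned int mtu)
{
	/* Packets that will be transformed later must not use UFO. */
	return length > mtu && rt->dev_has_ufo && rt->header_len == 0;
}
```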
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit c146066a)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net: reject creation of netdev names with colons · a7357ca5
      Matthew Thode authored
      [ Upstream commit a4176a93 ]
      
      Colons are used as a separator in netdev device lookup in dev_ioctl.c.

      Specifically affected are the SIOCGIFTXQLEN, SIOCETHTOOL and SIOCSIFNAME ioctls.
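      A userspace sketch of the kind of name check involved follows. valid_ifname() is a hypothetical helper, not the kernel function; apart from the colon rule described in this commit, its other rejections (empty names, ".", "..", '/', whitespace, names that do not fit in IF_NAMESIZE bytes) are assumptions about what such a validator usually covers.

```c
#include <ctype.h>
#include <net/if.h>	/* IF_NAMESIZE (the kernel's IFNAMSIZ, 16 on Linux) */
#include <stdbool.h>
#include <string.h>

static bool valid_ifname(const char *name)
{
	if (*name == '\0' || strlen(name) >= IF_NAMESIZE)
		return false;
	if (!strcmp(name, ".") || !strcmp(name, ".."))
		return false;
	for (; *name; name++) {
		/* ':' is rejected because ioctl lookups treat it as a separator. */
		if (*name == '/' || *name == ':' || isspace((unsigned char)*name))
			return false;
	}
	return true;
}
```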
      Signed-off-by: Matthew Thode <mthode@mthode.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit d501ebeb)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ematch: Fix auto-loading of ematch modules. · 20914ec4
      Ignacy Gawędzki authored
      [ Upstream commit 34eea79e ]
      
      In tcf_em_validate(), after calling request_module() to load the
      kind-specific module, set em->ops to NULL before returning -EAGAIN, so
      that module_put() is not called again by tcf_em_tree_destroy().
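      The reference-count bug can be modelled with a plain counter. Everything below is hypothetical: struct em_ops, validate_model() and destroy_model() only mimic the roles of the ematch ops, tcf_em_validate() and tcf_em_tree_destroy(). Once the reference taken by the lookup has been dropped, clearing the pointer is what stops the destroy path from dropping it a second time.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins: a module's ops with its reference count. */
struct em_ops { int refcnt; };
struct ematch { struct em_ops *ops; };

static void module_put_model(struct em_ops *ops)
{
	ops->refcnt--;
}

/* After loading the module, lookup succeeds, but the caller must replay. */
static int validate_model(struct ematch *em, struct em_ops *ops)
{
	em->ops = ops;
	ops->refcnt++;			/* reference taken by the lookup */
	module_put_model(em->ops);	/* dropped before asking for a replay */
	em->ops = NULL;			/* the fix: nothing left for destroy to put */
	return -EAGAIN;
}

static void destroy_model(struct ematch *em)
{
	if (em->ops)
		module_put_model(em->ops);
}
```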
      Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 9405be73)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ppp: deflate: never return len larger than output buffer · 221956a2
      Florian Westphal authored
      [ Upstream commit e2a4800e ]
      
      When we've run out of space in the output buffer to store more data, we
      will call zlib_deflate with a NULL output buffer until we've consumed
      remaining input.
      
      When this happens, olen contains the size the output buffer would have
      consumed if we had had enough room.
      
      This can later cause skb_over_panic when ppp_generic skb_put()s
      the returned length.
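      The guard this implies can be sketched as a small length decision. compressed_len() is a hypothetical helper, and treating a return value of 0 as "send the packet uncompressed" is an assumption of this model: the compressed length is only used when it is smaller than the input and also fits in the output buffer.

```c
#include <stddef.h>

/*
 * olen:  bytes zlib reports as produced (may exceed osize, see above)
 * isize: length of the uncompressed input
 * osize: size of the output buffer actually available
 */
static size_t compressed_len(size_t olen, size_t isize, size_t osize)
{
	if (olen < isize && olen <= osize)
		return olen;	/* safe to skb_put() this many bytes */
	return 0;		/* assumed to mean: send uncompressed */
}
```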
      Reported-by: Iain Douglas <centos@1n6.org.uk>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 8bcd6442)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland. · 964a5909
      Ani Sinha authored
      commit 6a2a2b3a upstream.
      
      The Linux man pages for recvmsg and sendmsg do not explicitly mention setting msg_namelen to 0 when
      msg_name is passed as NULL. When developers don't set the msg_namelen member in msghdr, it might contain a garbage
      value which will fail the validation check, and the kernel will then fail sendmsg and recvmsg calls with EINVAL. This
      breaks old binaries and any code whose source is not available.
      To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland.
      Signed-off-by: Ani Sinha <ani@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit d29f1f53)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • fs: take i_mutex during prepare_binprm for set[ug]id executables · 0c5d4221
      Jann Horn authored
      commit 8b01fc86 upstream.
      
      This prevents a race between chown() and execve(), where chowning a
      setuid-user binary to root would momentarily make the binary setuid
      root.
      
      This patch was mostly written by Linus Torvalds.
      Signed-off-by: Jann Horn <jann@thejh.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2:
       - Drop the task_no_new_privs() and user namespace checks
       - Open-code file_inode()
       - s/READ_ONCE/ACCESS_ONCE/
       - Adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 470e517b)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ipv6: Don't reduce hop limit for an interface · daacd26b
      D.S. Ljungmark authored
      commit 6fd99094 upstream.
      
      A local route may have a lower hop_limit set than global routes do.
      
      RFC 3756, Section 4.2.7, "Parameter Spoofing"
      
      >   1.  The attacker includes a Current Hop Limit of one or another small
      >       number which the attacker knows will cause legitimate packets to
      >       be dropped before they reach their destination.
      
      >   As an example, one possible approach to mitigate this threat is to
      >   ignore very small hop limits.  The nodes could implement a
      >   configurable minimum hop limit, and ignore attempts to set it below
      >   said limit.
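      A sketch of the mitigation, assuming a hypothetical apply_ra_hop_limit() helper: a Router Advertisement's Cur Hop Limit of 0 means "unspecified" (RFC 4861), and any advertised value that is not higher than the interface's current hop limit is ignored rather than applied.

```c
#include <stdint.h>

/* Returns the hop limit the interface should use after processing an RA. */
static uint8_t apply_ra_hop_limit(uint8_t current, uint8_t advertised)
{
	if (advertised == 0)
		return current;		/* unspecified by the router */
	if (advertised > current)
		return advertised;	/* raising is allowed */
	return current;			/* attempts to lower it are ignored */
}
```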
      Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust ND_PRINTK() usage]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit f10f7d2a)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net: rds: use correct size for max unacked packets and bytes · 6246ff96
      Sasha Levin authored
      commit db27ebb1 upstream.
      
      Max unacked packets/bytes is an int, while sizeof(long) was used in the
      sysctl table.

      This means that when they were getting read we'd also leak kernel memory
      to userspace along with the actual values.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 3760b67b)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net: llc: use correct size for sysctl timeout entries · 716fff2a
      Sasha Levin authored
      commit 6b8d9117 upstream.
      
      The timeout entries are sizeof(int) rather than sizeof(long), which
      means that when they were getting read we'd also leak kernel memory
      to userspace along with the timeout values.
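      The mismatch is between the type of the variable and the maxlen recorded next to it: on LP64 targets such as x86-64 Linux, long is 8 bytes while int is 4, so reading sizeof(long) bytes copies 4 bytes that do not belong to the variable. The sketch below uses a hypothetical, trimmed-down struct ctl_entry to show the usual safeguard of deriving maxlen from the variable itself.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for a sysctl table entry. */
struct ctl_entry {
	const char *procname;
	void *data;
	size_t maxlen;	/* bytes the handler may copy from *data */
};

static int ack_timeout = 300;	/* illustrative value */

/* sizeof(variable) stays correct even if the variable's type changes. */
static const struct ctl_entry timeout_entry = {
	.procname = "ack_timeout",
	.data = &ack_timeout,
	.maxlen = sizeof(ack_timeout),
};
```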
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 88fe14be)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic · e402af7b
      Shachar Raindel authored
      commit 8494057a upstream.
      
      Properly verify that the resulting page aligned end address is larger
      than both the start address and the length of the memory area requested.
      
      Both the start and length arguments for ib_umem_get are controlled by
      the user. A misbehaving user can provide values which will cause an
      integer overflow when calculating the page aligned end address.
      
      This overflow can also cause a miscalculation of the number of pages
      mapped, and additional logic issues.
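      The check described above can be sketched in userspace. SKETCH_PAGE_SIZE (4 KiB) and check_umem_range() are assumptions for illustration: the page-aligned end must be strictly larger than both the start address and the length, otherwise either the addition or the rounding wrapped around.

```c
#include <errno.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

static unsigned long page_align(unsigned long x)
{
	return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Reject (addr, size) pairs whose page-aligned end address wrapped. */
static int check_umem_range(unsigned long addr, unsigned long size)
{
	unsigned long end = page_align(addr + size);

	if (end <= size || end <= addr)
		return -EINVAL;
	return 0;
}
```

For example, addr = ~0UL - 100 with size = 200 wraps to a small end address and is rejected, while addr = 0x1000 with size = 0x2000 passes.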
      
      Addresses: CVE-2014-8159
      Signed-off-by: Shachar Raindel <raindel@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 485f16b7)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net: sctp: fix slab corruption from use after free on INIT collisions · 1b9143bd
      Daniel Borkmann authored
      commit 600ddd68 upstream
      
      When hitting an INIT collision case during the 4WHS with AUTH enabled, as
      already described in detail in commit 1be9a950 ("net: sctp: inherit
      auth_capable on INIT collisions"), it can happen that we occasionally
      still remotely trigger the following panic on the server side, which seems
      to have been uncovered after the fix from commit 1be9a950 ...
      
      [  533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
      [  533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
      [  533.940559] PGD 5030f2067 PUD 0
      [  533.957104] Oops: 0000 [#1] SMP
      [  533.974283] Modules linked in: sctp mlx4_en [...]
      [  534.939704] Call Trace:
      [  534.951833]  [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
      [  534.984213]  [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
      [  535.015025]  [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
      [  535.045661]  [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
      [  535.074593]  [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
      [  535.105239]  [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
      [  535.138606]  [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
      [  535.166848]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      ... or, depending on the application, for example this one:
      
      [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
      [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
      [ 1370.054568] PGD 633c94067 PUD 0
      [ 1370.070446] Oops: 0000 [#1] SMP
      [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
      [ 1370.963431] Call Trace:
      [ 1370.974632]  [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
      [ 1371.000863]  [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
      [ 1371.027154]  [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
      [ 1371.054679]  [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
      [ 1371.080183]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      With slab debugging enabled, we can see that the poison has been overwritten:
      
      [  669.826368] BUG kmalloc-128 (Tainted: G        W     ): Poison overwritten
      [  669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
      [  669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
      [  669.826424]  __slab_alloc+0x4bf/0x566
      [  669.826433]  __kmalloc+0x280/0x310
      [  669.826453]  sctp_auth_create_key+0x23/0x50 [sctp]
      [  669.826471]  sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
      [  669.826488]  sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
      [  669.826505]  sctp_do_sm+0x29d/0x17c0 [sctp] [...]
      [  669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
      [  669.826635]  __slab_free+0x39/0x2a8
      [  669.826643]  kfree+0x1d6/0x230
      [  669.826650]  kzfree+0x31/0x40
      [  669.826666]  sctp_auth_key_put+0x19/0x20 [sctp]
      [  669.826681]  sctp_assoc_update+0x1ee/0x2d0 [sctp]
      [  669.826695]  sctp_do_sm+0x674/0x17c0 [sctp]
      
      Since this only triggers in some collision-cases with AUTH, the problem at
      heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
      when having refcnt 1, once directly in sctp_assoc_update() and yet again
      from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
      the already kzfree'd memory, which is also consistent with the observation
      of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
      at a later point in time when poison is checked on new allocation).
      
      Reference counting of auth keys revisited:
      
      Shared keys for AUTH chunks are stored in endpoints and associations
      in the endpoint_shared_keys list. On endpoint creation, a null key is
      added; on association creation, all endpoint shared keys are cached
      and thus cloned over to the association. struct sctp_shared_key only holds
      a pointer to the actual key bytes, that is, struct sctp_auth_bytes, which
      keeps track of users internally through refcounting. Naturally, on assoc
      or endpoint destruction, the sctp_shared_key structures are destroyed
      directly and the reference on sctp_auth_bytes is dropped.

      User space can add keys to either list via setsockopt(2) through struct
      sctp_authkey and by passing that to sctp_auth_set_key(), which replaces or
      adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
      with refcount 1 and, in case of replacement, drops the reference on the old
      sctp_auth_bytes. A key can be set active from user space through setsockopt()
      on the id via sctp_auth_set_active_key(), which iterates through the
      endpoint's or the association's endpoint_shared_keys list and, in the case
      of an assoc, invokes sctp_auth_asoc_init_active_key() (one of several call
      sites).

      sctp_auth_asoc_init_active_key() computes the actual secret from local's
      and peer's random, hmac and shared key parameters and returns a new key
      directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
      the reference if there was a previous one. The secret, on which we
      eventually double drop the ref, comes from sctp_auth_asoc_set_secret() with
      an initial refcount of 1, which also stays unchanged eventually in
      sctp_assoc_update(). This key is later used by the crypto layer to
      set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().
      
      To close the loop: asoc->asoc_shared_key is freshly allocated secret
      material and independent of the sctp_shared_key management, which keeps
      track of only shared keys in endpoints and assocs. Hence, commit 4184b2a7
      ("net: sctp: fix memory leak in auth key management") is also independent
      of this bug, since it concerns a different layer (though the same structures
      are used eventually). The reference on asoc->asoc_shared_key is dropped
      correctly on assoc destruction in sctp_association_free() and, when active
      keys are replaced in sctp_auth_asoc_init_active_key(), it always has a
      refcount of 1. Hence, it's freed prematurely in sctp_assoc_update(). The
      simple fix is to remove that sctp_auth_key_put() from there, which fixes
      these panics.
      
      Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • net: sctp: fix memory leak in auth key management · f014e54c
      Daniel Borkmann authored
      commit 4184b2a7 upstream.
      
      A very minimal and simple user space application allocating an SCTP
      socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
      the socket again will leak the memory containing the authentication
      key from user space:
      
      unreferenced object 0xffff8800837047c0 (size 16):
        comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
        hex dump (first 16 bytes):
          01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811c88d8>] __kmalloc+0xe8/0x270
          [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
          [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
          [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
          [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
          [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
          [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This is bad for two reasons: we can bring down a machine from
      user space when auth_enable=1, and we also leave security-sensitive
      keying material in memory without clearing it after use. The issue is
      that sctp_auth_create_key() already sets the refcount to 1, but after
      allocation sctp_auth_set_key() takes an additional reference on it,
      thus leaving it around when we free the socket.
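      The intended ownership can be sketched with a hypothetical refcounted key; struct auth_bytes, create_key(), key_put() and set_key() are illustrative, not the SCTP functions. The creator's single reference is handed over to the list slot, so one put on socket close frees the key. Taking an extra reference inside set_key() would leave live_keys at 1 after close, which is the leak described above.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct sctp_auth_bytes. */
struct auth_bytes { int refcnt; };

static int live_keys;	/* allocations not yet freed */

static struct auth_bytes *create_key(void)
{
	struct auth_bytes *k = calloc(1, sizeof(*k));

	if (k) {
		k->refcnt = 1;	/* the creator's reference */
		live_keys++;
	}
	return k;
}

static void key_put(struct auth_bytes *k)
{
	if (k && --k->refcnt == 0) {
		free(k);	/* the kernel clears key material here (kzfree) */
		live_keys--;
	}
}

/* Fixed behaviour: store the new key without taking a second reference. */
static void set_key(struct auth_bytes **slot, struct auth_bytes *k)
{
	key_put(*slot);	/* drop the replaced key, if any */
	*slot = k;	/* the creator's reference now belongs to the slot */
}
```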
      
      Fixes: 65b07e5d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 3af10169)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • isofs: Fix unchecked printing of ER records · 09b5d759
      Jan Kara authored
      commit 4e202462 upstream
      
      We didn't check the length of Rock Ridge ER records before printing them.
      Thus a corrupted isofs image can cause us to access and print some memory
      beyond the buffer, with obvious consequences.
      Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • isofs: Fix infinite looping over CE entries · 08313e26
      Jan Kara authored
      commit f54e18f1 upstream
      
      Rock Ridge extensions define so-called Continuation Entries (CE), which
      specify where further Rock Ridge data is stored. A corrupted isofs
      image can contain an arbitrarily long chain of these, including one
      containing a loop, thus causing the kernel to end up in an infinite
      loop when traversing these entries.
      
      Limit the traversal to 32 entries, which should be more than enough
      to hold all the Rock Ridge data.
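
      Bounding the walk makes even a cyclic chain terminate. A minimal sketch that
      models the chain as an array of next-indices; this representation and the
      function name are hypothetical, since real CE entries point to a location,
      offset and length on disk:

```c
#include <assert.h>

#define MAX_CE_ENTRIES 32 /* limit taken from the commit message */

/* next[i] is the index of the continuation that follows entry i, or -1.
 * Returns the number of entries visited, or -1 if the chain is too long
 * (which includes any chain that loops back on itself). */
static int walk_ce_chain(const int *next, int start)
{
    int hops = 0;
    for (int cur = start; cur != -1; cur = next[cur]) {
        if (++hops > MAX_CE_ENTRIES)
            return -1; /* give up instead of looping forever */
    }
    return hops;
}
```
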
      Reported-by: P J P <ppandit@redhat.com>
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • netfilter: conntrack: disable generic tracking for known protocols · 8e421332
      Florian Westphal authored
      commit db29a950 upstream
      
      Given following iptables ruleset:
      
      -P FORWARD DROP
      -A FORWARD -m sctp --dport 9 -j ACCEPT
      -A FORWARD -p tcp --dport 80 -j ACCEPT
      -A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT
      
      One would assume that this allows SCTP on port 9 and TCP on port 80.
      Unfortunately, if the SCTP conntrack module is not loaded, this allows
      *all* SCTP communication to pass through, i.e. it behaves like
      -p sctp -j ACCEPT, which we think is a security issue.
      
      This is because on the first SCTP packet on port 9, we create a dummy
      "generic l4" conntrack entry without any port information (since
      conntrack doesn't know how to extract this information).
      
      All subsequent packets that are unknown will then be in established
      state, since they fall back to proto_generic and match the
      'generic' entry.
      
      Our originally proposed version [1] completely disabled generic protocol
      tracking, but Jozsef suggested not tracking protocols for which a more
      suitable helper is available. Hence we now mitigate the issue only for
      in-tree known ct protocol helpers, so that at least NAT and direction
      information will still be preserved for others.
      
       [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html
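
      The mitigation amounts to a per-protocol decision made before the generic
      tracker creates an entry. A hypothetical sketch: the function name is made
      up, and the listed protocols (SCTP, DCCP, UDP-Lite, GRE) are an assumption
      standing in for "in-tree known ct protocol helpers". A real implementation
      would presumably refuse only when the dedicated tracker exists but is not
      in use, which this sketch does not model:

```c
#include <assert.h>
#include <stdbool.h>
#include <netinet/in.h>  /* IPPROTO_* constants */

/* Should the generic L4 tracker create an entry for this protocol? */
static bool generic_should_track(int proto)
{
    switch (proto) {
    case IPPROTO_SCTP:
    case IPPROTO_DCCP:
    case IPPROTO_UDPLITE:
    case IPPROTO_GRE:
        return false;   /* a dedicated tracker knows the port semantics */
    default:
        return true;    /* keep NAT and direction info for the rest */
    }
}
```
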
      
      Joint work with Daniel Borkmann.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      [bwh: Backported to 2.6.32: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • splice: Apply generic position and size checks to each write · 7e6536a2
      Ben Hutchings authored
      We need to check the position and size of file writes against various
      limits, using generic_write_checks().  This was not being done for
      the splice write path.  It was fixed upstream by commit 8d020765
      ("->splice_write() via ->write_iter()") but we can't apply that.
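
      Each splice write now goes through the same kind of position and size
      validation as an ordinary write. As a rough illustration only, the
      hypothetical check_write() below clamps a write of *count bytes at pos
      against a single limit; it is a simplification and not the kernel's
      generic_write_checks(), which enforces several limits and has a different
      signature:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* A write of *count bytes at pos must not extend beyond limit (think of a
 * file-size rlimit or the filesystem's maximum file size). Shortens *count
 * when the write straddles the limit. */
static int check_write(int64_t pos, uint64_t *count, int64_t limit)
{
    if (pos < 0)
        return -EINVAL;
    if (pos >= limit)
        return *count ? -EFBIG : 0;   /* nothing at all may be written */
    if (*count > (uint64_t)(limit - pos))
        *count = (uint64_t)(limit - pos);
    return 0;
}
```
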
      
      CVE-2014-7822
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • serial: samsung: wait for transfer completion before clock disable · 17f35338
      Robert Baldyga authored
      This patch makes the driver wait until the transmit buffer and shifter
      are empty before disabling the clock.
      
      Without this fix, the clock can be disabled while data has not yet
      been transmitted, which leaves the TX line in an improper state and
      causes problems in subsequent data transfers.
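
      Waiting for the transmitter to drain is a poll on a status bit. The sketch
      below simulates the hardware; the register layout, TX_EMPTY_BIT,
      struct fake_uart and the bounded poll count are all assumptions for
      illustration, not the Samsung driver's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_EMPTY_BIT (1u << 2) /* hypothetical "buffer and shifter empty" flag */

/* Simulated UART: the transmitter drains one byte per status read. */
struct fake_uart { int bytes_pending; };

static uint32_t read_status(struct fake_uart *u)
{
    if (u->bytes_pending > 0) {
        u->bytes_pending--;
        return 0;
    }
    return TX_EMPTY_BIT;
}

/* Poll until the transmitter is empty; only then is it safe to gate the
 * clock. Returns false if it did not drain within max_polls reads. */
static bool wait_tx_empty(struct fake_uart *u, int max_polls)
{
    while (max_polls-- > 0) {
        if (read_status(u) & TX_EMPTY_BIT)
            return true;
    }
    return false;
}
```
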
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      (cherry picked from commit 1ff383a4)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86: Conditionally update time when ack-ing pending irqs · d7522130
      Shai Fultheim authored
      commit 42fa4250 upstream.
      
      In virtual environments, apic_read() can take a long time. As a
      result, under certain conditions the loop that acks pending irqs may
      exit with no queued irqs left, but only after more than one second.
      A warning is then printed needlessly.
      
      If the loop is about to exit regardless of max_loops, don't
      update it.
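
      A simplified model of the fix, assuming a fixed per-iteration cost in place
      of the real time measurement; the function and parameter names are
      hypothetical. The only difference between the two variants is whether time
      is charged on the final pass, when nothing is queued any more:

```c
#include <assert.h>
#include <stdbool.h>

/* Each pass acks one pending irq and costs `cost` units of the time
 * budget (think of a slow apic_read under virtualization). Returns true
 * if the "budget exhausted" warning would fire. */
static bool ack_loop_warns(int pending, long budget, long cost, bool fixed)
{
    bool queued;

    do {
        queued = pending > 0;
        if (queued)
            pending--;                  /* ack one interrupt */
        /* The buggy variant charges time even on the final, empty pass. */
        if (queued || !fixed)
            budget -= cost;
    } while (queued && budget > 0);

    return budget <= 0;                 /* models the warning condition */
}
```

      With one pending irq and a budget that covers one slow read but not two,
      only the buggy variant warns; when irqs really stay queued, both do.
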
      Signed-off-by: Shai Fultheim <shai@scalemp.com>
      [ rebased and reworded the commit message]
      Signed-off-by: Ido Yariv <ido@wizery.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-ido@wizery.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit c9f1417b)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization · c069a4f2
      Andy Lutomirski authored
      commit 956421fb upstream.
      
      'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
      the related state make sense for 'ret_from_sys_call'.  This is
      entirely the wrong check.  TS_COMPAT would make a little more
      sense, but there's really no point in keeping this optimization
      at all.
      
      This fixes a return to the wrong user CS if we came from int
      0x80 in a 64-bit task.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
      [ Backported from tip:x86/asm. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 159891c0)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86, cpu, amd: Add workaround for family 16h, erratum 793 · ebcbe139
      Borislav Petkov authored
      commit 3b564968 upstream
      
      This adds the workaround for erratum 793 as a precaution in case not
      every BIOS implements it.  This addresses CVE-2013-6885.
      
      Erratum text:
      
      [Revision Guide for AMD Family 16h Models 00h-0Fh Processors,
      document 51810 Rev. 3.04 November 2013]
      
      793 Specific Combination of Writes to Write Combined Memory Types and
      Locked Instructions May Cause Core Hang
      
      Description
      
      Under a highly specific and detailed set of internal timing
      conditions, a locked instruction may trigger a timing sequence whereby
      the write to a write combined memory type is not flushed, causing the
      locked instruction to stall indefinitely.
      
      Potential Effect on System
      
      Processor core hang.
      
      Suggested Workaround
      
      BIOS should set MSR C001_1020[15] = 1b.
      
      Fix Planned
      
      No fix planned
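
      The suggested workaround is a read-modify-write of a single MSR bit. A
      minimal sketch of just the bit arithmetic follows; the macro and function
      names are hypothetical, and actually accessing the MSR (which, per the
      backport note below, must go through the safe accessors so it cannot crash
      under KVM) is deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit [15] of MSR C001_1020, as named in the erratum text. */
#define ERRATUM_793_BIT (1ULL << 15)

/* Given the current MSR value, return the value to write back and
 * report whether a write is needed at all (skip it if already set). */
static uint64_t erratum_793_fixup(uint64_t val, bool *needs_write)
{
    *needs_write = !(val & ERRATUM_793_BIT);
    return val | ERRATUM_793_BIT;
}
```
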
      
      [ hpa: updated description, fixed typo in MSR name ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic
      Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      [bwh: Backported to 3.2:
       - Adjust filename
       - Venkatesh Srinivas pointed out we should use {rd,wr}msrl_safe() to
         avoid crashing on KVM.  This was fixed upstream by commit 8f86a737
         ("x86, AMD: Convert to the new bit access MSR accessors") but that's too
         much trouble to backport.  Here we must use {rd,wr}msrl_amd_safe().]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ASLR: fix stack randomization on 64-bit systems · ba455d81
      Hector Marco-Gisbert authored
      commit 4e7c22d4 upstream
      
      The issue is that the stack for processes is not properly randomized on
      64-bit architectures due to an integer overflow.
      
      The affected function is randomize_stack_top() in file "fs/binfmt_elf.c":
      
      static unsigned long randomize_stack_top(unsigned long stack_top)
      {
               unsigned int random_variable = 0;
      
               if ((current->flags & PF_RANDOMIZE) &&
                       !(current->personality & ADDR_NO_RANDOMIZE)) {
                       random_variable = get_random_int() & STACK_RND_MASK;
                       random_variable <<= PAGE_SHIFT;
               }
      #ifdef CONFIG_STACK_GROWSUP
               return PAGE_ALIGN(stack_top) + random_variable;
      #else
               return PAGE_ALIGN(stack_top) - random_variable;
      #endif
      }
      
      Note that it declares "random_variable" as "unsigned int". Shifting a
      value masked with STACK_RND_MASK (0x3fffff on x86_64, 22 bits) left by
      PAGE_SHIFT (12 on x86_64):
      
      random_variable <<= PAGE_SHIFT;
      
      produces a 22 + 12 = 34-bit result, so the two leftmost bits are dropped
      when it is stored in "random_variable". This variable must be at least
      34 bits wide to hold the result.
      
      These two dropped bits reduce the entropy of the process stack.
      Concretely, the total stack entropy is reduced by a factor of four: from
      2^30 to 2^28 (one fourth of the expected entropy).
      
      This patch restores the entropy by correcting the types involved in the
      operations in the functions randomize_stack_top() and stack_maxrandom_size().
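
      The truncation is easy to reproduce outside the kernel. A minimal sketch
      that reuses the x86_64 constants quoted above; the function names are
      hypothetical, and uint32_t stands in for unsigned int, which is 32 bits
      wide on x86_64:

```c
#include <assert.h>
#include <stdint.h>

#define STACK_RND_MASK 0x3fffffUL  /* x86_64 value quoted above: 22 bits */
#define PAGE_SHIFT     12

/* Mirrors the buggy code: the shift happens in a 32-bit variable,
 * so bits 32 and 33 of the 34-bit result are lost. */
static uint64_t stack_offset_buggy(uint32_t rnd)
{
    uint32_t v = (uint32_t)(rnd & STACK_RND_MASK);
    v <<= PAGE_SHIFT;
    return v;
}

/* Fixed: the variable is wide enough for all 22 + 12 = 34 bits. */
static uint64_t stack_offset_fixed(uint32_t rnd)
{
    uint64_t v = rnd & STACK_RND_MASK;
    v <<= PAGE_SHIFT;
    return v;
}
```

      Random inputs that differ only in the top two masked bits (for example
      0x000000 and 0x300000) map to the same buggy offset, which is where the
      factor of four comes from.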
      
      The fix can be tested with:
      $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
      7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
      7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
      7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
      7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
      ...
      
      Once corrected, the leading bytes should be between 7ffc and 7fff, rather
      than always being 7fff.
      
      CVE-2015-1593
      Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
      Signed-off-by: Ismael Ripoll <iripoll@upv.es>
      [kees: rebase, fix 80 char, clean up commit message, add test example, cve]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86_64, vdso: Fix the vdso address randomization algorithm · 7f36f1ac
      Andy Lutomirski authored
      commit 394f56fe upstream
      
      The theory behind vdso randomization is that it's mapped at a random
      offset above the top of the stack.  To avoid wasting a page of
      memory for an extra page table, the vdso isn't supposed to extend
      past the lowest PMD into which it can fit.  Other than that, the
      address should be a uniformly distributed address that meets all of
      the alignment requirements.
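
      A minimal sketch of the constraint described above: choose, uniformly at
      random, a page-aligned address at or above start such that the mapping
      does not cross the first PMD boundary at or above start + len. It assumes
      4 KiB pages and 2 MiB PMDs, treats rnd as the random number, omits any
      clamp against the top of the user address space, and uses a hypothetical
      function name; it is not the kernel's vdso_addr():

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL
#define PMD_SIZE  (2ULL << 20)   /* 2 MiB on x86_64 */

static uint64_t page_align(uint64_t x)
{
    return (x + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

static uint64_t pick_vdso_addr(uint64_t start, uint64_t len, uint32_t rnd)
{
    start = page_align(start);
    /* Round the lowest possible end address up to a PMD boundary, then
     * step back by len to get the highest allowed start address. */
    uint64_t end = (start + len + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
    end -= len;
    if (end <= start)
        return start;
    /* Every page-aligned slot in [start, end] is equally likely. */
    uint64_t slots = (end - start) / PAGE_SIZE + 1;
    return start + (rnd % slots) * PAGE_SIZE;
}
```

      With a two-page mapping (an assumption), the highest slot ends exactly on
      the PMD boundary and so starts at an address ending in ...fe000.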
      
      The current algorithm is buggy: the vdso has about a 50% probability
      of being at the very end of a PMD.  The current algorithm also has a
      decent chance of failing outright due to incorrect handling of the
      case where the top of the stack is near the top of its PMD.
      
      This fixes the implementation.  The paxtest estimate of vdso
      "randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
      don't know what the paxtest code is actually calculating.)
      
      It's worth noting that this algorithm is inherently biased: the vdso
      is more likely to end up near the end of its PMD than near the
      beginning.  Ideally we would either nix the PMD sharing requirement
      or jointly randomize the vdso and the stack to reduce the bias.
      
      In the meantime, this is a considerable improvement with basically
      no risk of compatibility issues, since the allowed outputs of the
      algorithm are unchanged.
      
      As an easy test, doing this:
      
      for i in `seq 10000`
        do grep -P vdso /proc/self/maps |cut -d- -f1
      done |sort |uniq -d
      
      used to produce lots of output (1445 lines on my most recent run).
      A tiny subset looks like this:
      
      7fffdfffe000
      7fffe01fe000
      7fffe05fe000
      7fffe07fe000
      7fffe09fe000
      7fffe0bfe000
      7fffe0dfe000
      
      Note the suspicious fe000 endings.  With the fix, I get a much more
      palatable 76 repeated addresses.
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      [bwh: Backported to 2.6.32:
       - The whole file is only built for x86_64; adjust context and comment for this
       - We don't have align_vdso_addr()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit · b23c0b06
      Andy Lutomirski authored
      commit 29fa6825 upstream
      
      paravirt_enabled has the following effects:
      
       - Disables the F00F bug workaround warning.  There is no F00F bug
         workaround any more because Linux's standard IDT handling already
         works around the F00F bug, but the warning still exists.  This
         is only cosmetic, and, in any event, there is no such thing as
         KVM on a CPU with the F00F bug.
      
       - Disables 32-bit APM BIOS detection.  On a KVM paravirt system,
         there should be no APM BIOS anyway.
      
       - Disables tboot.  I think that the tboot code should check the
         CPUID hypervisor bit directly if it matters.
      
       - paravirt_enabled disables espfix32.  espfix32 should *not* be
         disabled under KVM paravirt.
      
      The last point is the purpose of this patch.  It fixes a leak of the
      high 16 bits of the kernel stack address on 32-bit KVM paravirt
      guests.  Fixes CVE-2014-8134.
      Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      [bwh: Backported to 2.6.32: adjust indentation, context]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86/tls: Don't validate lm in set_thread_area() after all · c52fdba6
      Andy Lutomirski authored
      commit 3fb2f423 upstream.
      
      It turns out that there's a lurking ABI issue.  GCC, when
      compiling this in a 32-bit program:
      
      struct user_desc desc = {
      	.entry_number    = idx,
      	.base_addr       = base,
      	.limit           = 0xfffff,
      	.seg_32bit       = 1,
      	.contents        = 0, /* Data, grow-up */
      	.read_exec_only  = 0,
      	.limit_in_pages  = 1,
      	.seg_not_present = 0,
      	.useable         = 0,
      };
      
      will leave .lm uninitialized.  This means that anything in the
      kernel that reads user_desc.lm for 32-bit tasks is unreliable.
      
      Revert the .lm check in set_thread_area().  The value never did
      anything in the first place.
      
      Fixes: 0e58af4e ("x86/tls: Disallow unusual TLS segments")
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit c759a579)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86/tls: Disallow unusual TLS segments · 18cb16aa
      Andy Lutomirski authored
      commit 0e58af4e upstream.
      
      Users have no business installing custom code segments into the
      GDT, and segments that are not present but are otherwise valid
      are a historical source of interesting attacks.
      
      For completeness, block attempts to set the L bit.  (Prior to
      this patch, the L bit would have been silently dropped.)
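
      Taken together with the 16-bit check from "x86/tls: Validate TLS entries
      to protect espfix", the rules can be sketched as one validation function.
      struct tls_desc is a hypothetical mirror of struct user_desc and the
      function names are made up. The L bit check this commit also adds is left
      out, since "x86/tls: Don't validate lm in set_thread_area() after all"
      reverted it:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the struct user_desc fields involved. */
struct tls_desc {
    unsigned int base_addr, limit;
    unsigned int seg_32bit:1, contents:2, read_exec_only:1,
                 limit_in_pages:1, seg_not_present:1, useable:1;
};

/* The conventional "empty" encoding used to clear a slot. */
static bool desc_empty(const struct tls_desc *d)
{
    return d->base_addr == 0 && d->limit == 0 && d->contents == 0 &&
           d->read_exec_only == 1 && d->seg_32bit == 0 &&
           d->limit_in_pages == 0 && d->seg_not_present == 1 &&
           d->useable == 0;
}

/* Clearing is always allowed; otherwise only present 32-bit data
 * segments are accepted. */
static bool tls_desc_ok(const struct tls_desc *d)
{
    if (desc_empty(d))
        return true;
    if (!d->seg_32bit)          /* 16-bit segments would defeat espfix */
        return false;
    if (d->contents > 1)        /* contents 2 and 3 encode code segments */
        return false;
    if (d->seg_not_present)     /* not present but otherwise valid */
        return false;
    return true;
}
```

      An all-zero descriptor fails this check because seg_32bit is 0; that is
      the regression the all-zero struct user_desc commit addresses.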
      
      This is an ABI break.  I've checked glibc, musl, and Wine, and
      none of them look like they'll have any trouble.
      
      Note to stable maintainers: this is a hardening patch that fixes
      no known bugs.  Given the possibility of ABI issues, this
      probably shouldn't be backported quickly.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: security@kernel.org <security@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit fbc3c534)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86, tls: Interpret an all-zero struct user_desc as "no segment" · 1f50d3c7
      Andy Lutomirski authored
      commit 3669ef9f upstream.
      
      The Witcher 2 did something like this to allocate a TLS segment index:
      
              struct user_desc u_info;
              bzero(&u_info, sizeof(u_info));
              u_info.entry_number = (uint32_t)-1;
      
              syscall(SYS_set_thread_area, &u_info);
      
      Strictly speaking, this code was never correct.  It should have set
      read_exec_only and seg_not_present to 1 to indicate that it wanted
      to find a free slot without putting anything there, or it should
      have put something sensible in the TLS slot if it wanted to allocate
      a TLS entry for real.  The actual effect of this code was to
      allocate a bogus segment that could be used to exploit espfix.
      
      The set_thread_area hardening patches changed the behavior, causing
      set_thread_area to return -EINVAL and crashing the game.
      
      This changes set_thread_area to interpret this as a request to find
      a free slot and to leave it empty, which isn't *quite* what the game
      expects but should be close enough to keep it working.  In
      particular, using the code above to allocate two segments will
      allocate the same segment both times.
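
      The new interpretation can be sketched as a predicate that accepts both
      the explicit "empty" encoding and an all-zero descriptor. struct tls_desc
      and the helper names are hypothetical; how the kernel then finds a free
      slot is not modelled:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the struct user_desc fields involved. */
struct tls_desc {
    unsigned int entry_number, base_addr, limit;
    unsigned int seg_32bit:1, contents:2, read_exec_only:1,
                 limit_in_pages:1, seg_not_present:1, useable:1;
};

/* All-zero descriptor, as produced by bzero() in the game's code. */
static bool desc_zero(const struct tls_desc *d)
{
    return d->base_addr == 0 && d->limit == 0 && d->contents == 0 &&
           d->read_exec_only == 0 && d->seg_32bit == 0 &&
           d->limit_in_pages == 0 && d->seg_not_present == 0 &&
           d->useable == 0;
}

/* Explicit "clear this slot" encoding. */
static bool desc_empty(const struct tls_desc *d)
{
    return d->base_addr == 0 && d->limit == 0 && d->contents == 0 &&
           d->read_exec_only == 1 && d->seg_32bit == 0 &&
           d->limit_in_pages == 0 && d->seg_not_present == 1 &&
           d->useable == 0;
}

/* Both encodings mean "no segment"; entry_number is ignored here. */
static bool means_no_segment(const struct tls_desc *d)
{
    return desc_zero(d) || desc_empty(d);
}
```
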
      
      According to FrostbittenKing on Github, this fixes The Witcher 2.
      
      If this somehow still causes problems, we could instead allocate
      a limit==0 32-bit data segment, but that seems rather ugly to me.
      
      Fixes: 41bdc785 ("x86/tls: Validate TLS entries to protect espfix")
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 3175b4cb)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86, tls, ldt: Stop checking lm in LDT_empty · 598b6280
      Andy Lutomirski authored
      commit e30ab185 upstream.
      
      32-bit programs don't have an lm bit in their ABI, so they can't
      reliably cause LDT_empty to return true without resorting to memset.
      They shouldn't need to do this.
      
      This should fix a longstanding, if minor, issue in all 64-bit kernels
      as well as a potential regression in the TLS hardening code.
      
      Fixes: 41bdc785 ("x86/tls: Validate TLS entries to protect espfix")
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit f62570cb)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86/tls: Validate TLS entries to protect espfix · 85e01300
      Andy Lutomirski authored
      commit 41bdc785 upstream
      
      Installing a 16-bit RW data segment into the GDT defeats espfix.
      AFAICT this will not affect glibc, Wine, or dosemu at all.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: security@kernel.org <security@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs · f0d8cc6f
      Andy Lutomirski authored
      commit 7ddc6a21 upstream.
      
      These functions can be executed on the int3 stack, so kprobes
      are dangerous. Tracing is probably a bad idea, too.
      
      Fixes: b645af2d ("x86_64, traps: Rework bad_iret")
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      [bwh: Backported to 3.2:
       - Use __kprobes instead of NOKPROBE_SYMBOL()
       - Don't use __visible]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      (cherry picked from commit 8ea4c465)
      Signed-off-by: Willy Tarreau <w@1wt.eu>
  2. 13 Dec, 2014 3 commits