1. 06 Feb, 2019 6 commits
    • Stefano Brivio's avatar
      ipv6: Check available headroom in ip6_xmit() even without options · 901de1ce
      Stefano Brivio authored
      BugLink: https://bugs.launchpad.net/bugs/1811080
      
      [ Upstream commit 66033f47 ]
      
      Even if we send an IPv6 packet without options, MAX_HEADER might not be
      enough to account for the additional headroom required by alignment of
      hardware headers.
      
      On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
      sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
      100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
      bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().
      
      Those would be enough to append our 14 bytes header, but we're going to
      align that to 16 bytes, and write 2 bytes out of the allocated slab in
      neigh_hh_output().
      
      KASan says:
      
      [  264.967848] ==================================================================
      [  264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70
      [  264.967866] Write of size 16 at addr 000000006af1c7fe by task netperf/6201
      [  264.967870]
      [  264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
      [  264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
      [  264.967887] Call Trace:
      [  264.967896] ([<00000000001347d6>] show_stack+0x56/0xa0)
      [  264.967903]  [<00000000017e379c>] dump_stack+0x23c/0x290
      [  264.967912]  [<00000000007bc594>] print_address_description+0xf4/0x290
      [  264.967919]  [<00000000007bc8fc>] kasan_report+0x13c/0x240
      [  264.967927]  [<000000000162f5e4>] ip6_finish_output2+0x1aec/0x1c70
      [  264.967935]  [<000000000163f890>] ip6_finish_output+0x430/0x7f0
      [  264.967943]  [<000000000163fe44>] ip6_output+0x1f4/0x580
      [  264.967953]  [<000000000163882a>] ip6_xmit+0xfea/0x1ce8
      [  264.967963]  [<00000000017396e2>] inet6_csk_xmit+0x282/0x3f8
      [  264.968033]  [<000003ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
      [  264.968037]  [<000003ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
      [  264.968041]  [<0000000001220020>] dev_hard_start_xmit+0x268/0x928
      [  264.968069]  [<0000000001330e8e>] sch_direct_xmit+0x7ae/0x1350
      [  264.968071]  [<000000000122359c>] __dev_queue_xmit+0x2b7c/0x3478
      [  264.968075]  [<00000000013d2862>] ip_finish_output2+0xce2/0x11a0
      [  264.968078]  [<00000000013d9b14>] ip_finish_output+0x56c/0x8c8
      [  264.968081]  [<00000000013ddd1e>] ip_output+0x226/0x4c0
      [  264.968083]  [<00000000013dbd6c>] __ip_queue_xmit+0x894/0x1938
      [  264.968100]  [<000003ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp]
      [  264.968116]  [<000003ff80b7bf68>] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
      [  264.968131]  [<000003ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp]
      [  264.968146]  [<000003ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp]
      [  264.968161]  [<000003ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp]
      [  264.968177]  [<000003ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
      [  264.968192]  [<000003ff80b93328>] __sctp_connect+0x830/0xc20 [sctp]
      [  264.968208]  [<000003ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp]
      [  264.968212]  [<0000000001197942>] __sys_connect+0x21a/0x450
      [  264.968215]  [<000000000119aff8>] sys_socketcall+0x3d0/0xb08
      [  264.968218]  [<000000000184ea7a>] system_call+0x2a2/0x2c0
      
      [...]
      
      Just like ip_finish_output2() does for IPv4, check that we have enough
      headroom in ip6_xmit(), and reallocate it if we don't.
      
      This issue is older than git history.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarJuerg Haefliger <juergh@canonical.com>
      Signed-off-by: default avatarStefan Bader <stefan.bader@canonical.com>
      901de1ce
    • Daniel Axtens's avatar
      UBUNTU: SAUCE: bcache: never writeback a discard operation · ea1ae9de
      Daniel Axtens authored
      BugLink: https://bugs.launchpad.net/bugs/1793901
      
      Some users see panics like the following when performing fstrim on a bcached volume:
      
      [  529.803060] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [  530.183928] #PF error: [normal kernel read fault]
      [  530.412392] PGD 8000001f42163067 P4D 8000001f42163067 PUD 1f42168067 PMD 0
      [  530.750887] Oops: 0000 [#1] SMP PTI
      [  530.920869] CPU: 10 PID: 4167 Comm: fstrim Kdump: loaded Not tainted 5.0.0-rc1+ #3
      [  531.290204] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
      [  531.693137] RIP: 0010:blk_queue_split+0x148/0x620
      [  531.922205] Code: 60 38 89 55 a0 45 31 db 45 31 f6 45 31 c9 31 ff 89 4d 98 85 db 0f 84 7f 04 00 00 44 8b 6d 98 4c 89 ee 48 c1 e6 04 49 03 70 78 <8b> 46 08 44 8b 56 0c 48
      8b 16 44 29 e0 39 d8 48 89 55 a8 0f 47 c3
      [  532.838634] RSP: 0018:ffffb9b708df39b0 EFLAGS: 00010246
      [  533.093571] RAX: 00000000ffffffff RBX: 0000000000046000 RCX: 0000000000000000
      [  533.441865] RDX: 0000000000000200 RSI: 0000000000000000 RDI: 0000000000000000
      [  533.789922] RBP: ffffb9b708df3a48 R08: ffff940d3b3fdd20 R09: 0000000000000000
      [  534.137512] R10: ffffb9b708df3958 R11: 0000000000000000 R12: 0000000000000000
      [  534.485329] R13: 0000000000000000 R14: 0000000000000000 R15: ffff940d39212020
      [  534.833319] FS:  00007efec26e3840(0000) GS:ffff940d1f480000(0000) knlGS:0000000000000000
      [  535.224098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  535.504318] CR2: 0000000000000008 CR3: 0000001f4e256004 CR4: 00000000001606e0
      [  535.851759] Call Trace:
      [  535.970308]  ? mempool_alloc_slab+0x15/0x20
      [  536.174152]  ? bch_data_insert+0x42/0xd0 [bcache]
      [  536.403399]  blk_mq_make_request+0x97/0x4f0
      [  536.607036]  generic_make_request+0x1e2/0x410
      [  536.819164]  submit_bio+0x73/0x150
      [  536.980168]  ? submit_bio+0x73/0x150
      [  537.149731]  ? bio_associate_blkg_from_css+0x3b/0x60
      [  537.391595]  ? _cond_resched+0x1a/0x50
      [  537.573774]  submit_bio_wait+0x59/0x90
      [  537.756105]  blkdev_issue_discard+0x80/0xd0
      [  537.959590]  ext4_trim_fs+0x4a9/0x9e0
      [  538.137636]  ? ext4_trim_fs+0x4a9/0x9e0
      [  538.324087]  ext4_ioctl+0xea4/0x1530
      [  538.497712]  ? _copy_to_user+0x2a/0x40
      [  538.679632]  do_vfs_ioctl+0xa6/0x600
      [  538.853127]  ? __do_sys_newfstat+0x44/0x70
      [  539.051951]  ksys_ioctl+0x6d/0x80
      [  539.212785]  __x64_sys_ioctl+0x1a/0x20
      [  539.394918]  do_syscall_64+0x5a/0x110
      [  539.568674]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      We have observed it where both:
      1) LVM/devmapper is involved (bcache backing device is LVM volume) and
      2) writeback cache is involved (bcache cache_mode is writeback)
      
      On one machine, we can reliably reproduce it with:
      
       # echo writeback > /sys/block/bcache0/bcache/cache_mode # not sure if this is required
       # mount /dev/bcache0 /test
       # for i in {0..10}; do file="$(mktemp /test/zero.XXX)"; dd if=/dev/zero of="$file" bs=1M count=256; sync; rm $file; done; fstrim -v /test
      
      Observing this with tracepoints on, we see the following writes:
      
      fstrim-18019 [022] .... 91107.302026: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4260112 + 196352 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302050: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4456464 + 262144 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302075: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 4718608 + 81920 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302094: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5324816 + 180224 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302121: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5505040 + 262144 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.302145: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 5767184 + 81920 hit 0 bypass 1
      fstrim-18019 [022] .... 91107.308777: bcache_write: 73f95583-561c-408f-a93a-4cbd2498f5c8 inode 0  DS 6373392 + 180224 hit 1 bypass 0
      <crash>
      
      Note the final one has different hit/bypass flags.
      
      This is because in should_writeback(), we were hitting a case where
      the partial stripe condition was returning true and so
      should_writeback() was returning true early.
      
      If that hadn't been the case, it would have hit the would_skip test, and
      as would_skip == s->iop.bypass == true, should_writeback() would have
      returned false.
      
      Looking at the git history from 72c27061 ("bcache: Write out full
      stripes"), it looks like the idea was to optimise for raid5/6:
      
             * If a stripe is already dirty, force writes to that stripe to
      	 writeback mode - to help build up full stripes of dirty data
      
      To fix this issue, make sure that should_writeback() on a discard op
      never returns true.
      
      More details of debugging: https://www.spinics.net/lists/linux-bcache/msg06996.html
      
      Previous reports:
       - https://bugzilla.kernel.org/show_bug.cgi?id=201051
       - https://bugzilla.kernel.org/show_bug.cgi?id=196103
       - https://www.spinics.net/lists/linux-bcache/msg06885.html
      
      Cc: Kent Overstreet <koverstreet@google.com>
      Fixes: 72c27061 ("bcache: Write out full stripes")
      Signed-off-by: default avatarDaniel Axtens <dja@axtens.net>
      (backported from linux-bcache mailing list:
       https://www.spinics.net/lists/linux-bcache/msg06997.html
       Expected to land in v5.1:
       https://www.spinics.net/lists/linux-bcache/msg06998.html
       backport posted on-list at:
       https://www.spinics.net/lists/linux-bcache/msg07024.html)
      Signed-off-by: default avatarDaniel Axtens <daniel.axtens@canonical.com>
      Acked-by: default avatarKhalid Elmously <khalid.elmously@canonical.com>
      Acked-by: default avatarStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: default avatarKhalid Elmously <khalid.elmously@canonical.com>
      ea1ae9de
    • Dmitry Safonov's avatar
      tty: Don't hold ldisc lock in tty_reopen() if ldisc present · e00cb1b9
      Dmitry Safonov authored
      BugLink: https://bugs.launchpad.net/bugs/1813873
      
      Try to get reference for ldisc during tty_reopen().
      If ldisc present, we don't need to do tty_ldisc_reinit() and lock the
      write side for line discipline semaphore.
      Effectively, it optimizes fast-path for tty_reopen(), but more
      importantly it won't interrupt ongoing IO on the tty as no ldisc change
      is needed.
      Fixes user-visible issue when tty_reopen() interrupted login process for
      user with a long password, observed and reported by Lukas.
      
      Fixes: c96cf923 ("tty: Don't block on IO when ldisc change is pending")
      Fixes: 83d817f4 ("tty: Hold tty_ldisc_lock() during tty_reopen()")
      Cc: Jiri Slaby <jslaby@suse.com>
      Reported-by: default avatarLukas F. Hartmann <lukas@mntmn.com>
      Tested-by: default avatarLukas F. Hartmann <lukas@mntmn.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarDmitry Safonov <dima@arista.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit d3736d82)
      Signed-off-by: default avatarStefan Bader <stefan.bader@canonical.com>
      Acked-by: default avatarKhalid Elmously <khalid.elmously@canonical.com>
      Acked-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarKhalid Elmously <khalid.elmously@canonical.com>
      e00cb1b9
    • David Herrmann's avatar
      fork: record start_time late · b6d1e4fe
      David Herrmann authored
      This changes the fork(2) syscall to record the process start_time after
      initializing the basic task structure but still before making the new
      process visible to user-space.
      
      Technically, we could record the start_time anytime during fork(2).  But
      this might lead to scenarios where a start_time is recorded long before
      a process becomes visible to user-space.  For instance, with
      userfaultfd(2) and TLS, user-space can delay the execution of fork(2)
      for an indefinite amount of time (and will, if this causes network
      access, or similar).
      
      By recording the start_time late, it much closer reflects the point in
      time where the process becomes live and can be observed by other
      processes.
      
      Lastly, this makes it much harder for user-space to predict and control
      the start_time they get assigned.  Previously, user-space could fork a
      process and stall it in copy_thread_tls() before its pid is allocated,
      but after its start_time is recorded.  This can be misused to later-on
      cycle through PIDs and resume the stalled fork(2) yielding a process
      that has the same pid and start_time as a process that existed before.
      This can be used to circumvent security systems that identify processes
      by their pid+start_time combination.
      
      Even though user-space was always aware that start_time recording is
      flaky (but several projects are known to still rely on start_time-based
      identification), changing the start_time to be recorded late will help
      mitigate existing attacks and make it much harder for user-space to
      control the start_time a process gets assigned.
      Reported-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarTom Gundersen <teg@jklm.no>
      Signed-off-by: default avatarDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      CVE-2019-6133
      
      (cherry picked from commit 7b558513)
      Signed-off-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Acked-by: default avatarKhalid Elmously <khalid.elmously@canonical.com>
      Acked-by: default avatarThadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Signed-off-by: default avatarKhalid Elmously <khalid.elmously@canonical.com>
      b6d1e4fe
    • Juerg Haefliger's avatar
      UBUNTU: SAUCE: fan: Fix NULL pointer dereference · cd399450
      Juerg Haefliger authored
      BugLink: https://bugs.launchpad.net/bugs/1811803
      
      Fix a NULL pointer dereference in fan code that can easily be triggered
      by running:
      $ sudo ip link add foo type ipip
      
      Which leads to:
      [    1.330067] BUG: unable to handle kernel NULL pointer dereference at 0000000000000108
      [    1.330792] IP: [<ffffffff817e8132>] ipip_netlink_fan.isra.7+0x12/0x280
      [    1.331399] PGD 800000003fb94067 PUD 3fb93067 PMD 0
      [    1.331882] Oops: 0000 [#1] SMP
      [    1.332200] Modules linked in:
      [    1.332492] CPU: 0 PID: 137 Comm: ip Not tainted 4.4.167+ #5
      [    1.333001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1ubuntu1 04/01/2014
      [    1.333740] task: ffff88003c38a640 ti: ffff88003fb5c000 task.ti: ffff88003fb5c000
      [    1.334375] RIP: 0010:[<ffffffff817e8132>]  [<ffffffff817e8132>] ipip_netlink_fan.isra.7+0x12/0x280
      [    1.335193] RSP: 0018:ffff88003fb5f778  EFLAGS: 00010246
      [    1.335671] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [    1.336305] RDX: ffff88003fb5f7f0 RSI: ffff88003fa3f840 RDI: 0000000000000000
      [    1.336940] RBP: ffff88003fb5f7a0 R08: 000000000000000a R09: 0000000000000092
      [    1.337587] R10: 0000000000000000 R11: 00000000000001ad R12: ffff88003fa3f000
      [    1.338267] R13: ffff88003fb5f9d0 R14: ffff88003fa3f840 R15: ffffffff81f4b240
      [    1.338904] FS:  00007f535979b700(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000
      [    1.339590] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    1.340066] CR2: 0000000000000108 CR3: 000000003fb60000 CR4: 0000000000000670
      [    1.340750] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    1.341341] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    1.341909] Stack:
      [    1.342080]  0000000000000000 ffff88003fa3f000 ffff88003fb5f9d0 ffff88003fa3f840
      [    1.342725]  ffffffff81f4b240 ffff88003fb5f828 ffffffff817e8515 0000000381356f0e
      [    1.343334]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [    1.343943] Call Trace:
      [    1.344141]  [<ffffffff817e8515>] ipip_newlink+0xa5/0xc0
      [    1.344553]  [<ffffffff81782f5b>] ? __netlink_ns_capable+0x3b/0x40
      [    1.345029]  [<ffffffff817651fd>] rtnl_newlink+0x6fd/0x8b0
      [    1.345699]  [<ffffffff811f92b1>] ? kmem_cache_alloc+0x1a1/0x1f0
      [    1.346165]  [<ffffffff8119abd5>] ? mempool_alloc_slab+0x15/0x20
      [    1.346630]  [<ffffffff81436463>] ? validate_nla+0x93/0x1a0
      [    1.347060]  [<ffffffff81436680>] ? nla_parse+0xa0/0x100
      [    1.347474]  [<ffffffff81436732>] ? nla_strlcpy+0x52/0x60
      [    1.347891]  [<ffffffff81762099>] ? rtnl_link_ops_get+0x39/0x50
      [    1.348347]  [<ffffffff81764c76>] ? rtnl_newlink+0x176/0x8b0
      [    1.348784]  [<ffffffff8176373c>] rtnetlink_rcv_msg+0xec/0x230
      [    1.349237]  [<ffffffff811fce3b>] ? __kmalloc_node_track_caller+0x24b/0x310
      [    1.349774]  [<ffffffff8173e397>] ? __alloc_skb+0x87/0x1d0
      [    1.350198]  [<ffffffff81763650>] ? rtnetlink_rcv+0x30/0x30
      [    1.350628]  [<ffffffff81786da6>] netlink_rcv_skb+0xa6/0xc0
      [    1.351059]  [<ffffffff81763648>] rtnetlink_rcv+0x28/0x30
      [    1.351476]  [<ffffffff81786770>] netlink_unicast+0x190/0x240
      [    1.351919]  [<ffffffff81786b5a>] netlink_sendmsg+0x33a/0x3b0
      [    1.352363]  [<ffffffff813af211>] ? aa_sock_msg_perm+0x61/0x150
      [    1.352820]  [<ffffffff81734bde>] sock_sendmsg+0x3e/0x50
      [    1.353235]  [<ffffffff817356a7>] ___sys_sendmsg+0x287/0x2a0
      [    1.353672]  [<ffffffff8120ed2b>] ? mem_cgroup_try_charge+0x6b/0x1e0
      [    1.354162]  [<ffffffff811cb9ed>] ? handle_mm_fault+0xecd/0x1b80
      [    1.354625]  [<ffffffff81239fc7>] ? __alloc_fd+0xc7/0x190
      [    1.355044]  [<ffffffff81736021>] __sys_sendmsg+0x51/0x90
      [    1.355525]  [<ffffffff81736072>] SyS_sendmsg+0x12/0x20
      [    1.355933]  [<ffffffff81866e1b>] entry_SYSCALL_64_fastpath+0x22/0xcb
      [    1.356426] Code: 50 01 00 00 01 eb d3 49 8d 94 24 b8 08 00 00 eb ac e8 83 cf 89 ff 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 <48> 8b 9f 08 01 00 00 48 85 db 74 1e 8b 02 85 c0 75 25 44 0f b7
      [    1.358557] RIP  [<ffffffff817e8132>] ipip_netlink_fan.isra.7+0x12/0x280
      [    1.359086]  RSP <ffff88003fb5f778>
      [    1.359359] CR2: 0000000000000108
      [    1.359637] ---[ end trace 7820fbc7ced5dd6e ]---
      Signed-off-by: default avatarJuerg Haefliger <juergh@canonical.com>
      Acked-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Acked-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarKhalid Elmously <khalid.elmously@canonical.com>
      cd399450
    • Juerg Haefliger's avatar
      UBUNTU: Start new release · 10e7710d
      Juerg Haefliger authored
      Ignore: yes
      Signed-off-by: default avatarJuerg Haefliger <juergh@canonical.com>
      10e7710d
  2. 16 Jan, 2019 34 commits