1. 10 Jul, 2015 6 commits
    • Alexander Sverdlin's avatar
      sctp: Fix race between OOTB responce and route removal · 59a460c3
      Alexander Sverdlin authored
      [ Upstream commit 29c4afc4 ]
      
      There is NULL pointer dereference possible during statistics update if the route
      used for OOTB responce is removed at unfortunate time. If the route exists when
      we receive OOTB packet and we finally jump into sctp_packet_transmit() to send
      ABORT, but in the meantime route is removed under our feet, we take "no_route"
      path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).
      
      But sctp_ootb_pkt_new() used to prepare responce packet doesn't call
      sctp_transport_set_owner() and therefore there is no asoc associated with this
      packet. Probably temporary asoc just for OOTB responces is overkill, so just
      introduce a check like in all other places in sctp_packet_transmit(), where
      "asoc" is dereferenced.
      
      To reproduce this, one needs to
      0. ensure that sctp module is loaded (otherwise ABORT is not generated)
      1. remove default route on the machine
      2. while true; do
           ip route del [interface-specific route]
           ip route add [interface-specific route]
         done
      3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT
         responce
      
      On x86_64 the crash looks like this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
      PGD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: ...
      CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.0.5-1-ARCH #1
      Hardware name: ...
      task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
      RIP: 0010:[<ffffffffa05ec9ac>]  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
      RSP: 0018:ffff880127c037b8  EFLAGS: 00010296
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
      RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
      RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
      R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
      R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
      FS:  0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
      Stack:
       ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
       ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
       0000000000000000 0000000000000001 0000000000000000 0000000000000000
      Call Trace:
       <IRQ>
       [<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
       [<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp]
       [<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp]
       [<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210
       [<ffffffff810e0329>] ? update_process_times+0x59/0x60
       [<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0
       [<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0
       [<ffffffff8101f599>] ? read_tsc+0x9/0x10
       [<ffffffff8116d4b5>] ? put_page+0x55/0x60
       [<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100
       [<ffffffff81462b68>] ? skb_free_head+0x58/0x80
       [<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic]
       [<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0
       [<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
       [<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp]
       [<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp]
       [<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
       [<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp]
       [<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30
       [<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150
       [<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210
       [<ffffffff814b2e15>] ip_local_deliver+0x35/0x90
       [<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370
       [<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0
       [<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50
       [<ffffffff81476c28>] __netif_receive_skb+0x18/0x60
       [<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0
       [<ffffffff814776c8>] napi_gro_receive+0xe8/0x120
       [<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169]
       [<ffffffff8147896a>] net_rx_action+0x21a/0x360
       [<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0
       [<ffffffff8107912d>] irq_exit+0xad/0xb0
       [<ffffffff8157d158>] do_IRQ+0x58/0xf0
       [<ffffffff8157b06d>] common_interrupt+0x6d/0x6d
       <EOI>
       [<ffffffff810e1218>] ? hrtimer_start+0x18/0x20
       [<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
       [<ffffffff81020c50>] ? mwait_idle+0x60/0xa0
       [<ffffffff810216ef>] arch_cpu_idle+0xf/0x20
       [<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480
       [<ffffffff8156b365>] rest_init+0x85/0x90
       [<ffffffff818eb035>] start_kernel+0x48b/0x4ac
       [<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120
       [<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c
       [<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184
      Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
      RIP  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
       RSP <ffff880127c037b8>
      CR2: 0000000000000020
      ---[ end trace 5aec7fd2dc983574 ]---
      Kernel panic - not syncing: Fatal exception in interrupt
      Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
      drm_kms_helper: panic occurred, switching back to text console
      ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      Signed-off-by: default avatarAlexander Sverdlin <alexander.sverdlin@nokia.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59a460c3
    • Willem de Bruijn's avatar
      packet: avoid out of bounds read in round robin fanout · 4552ddd0
      Willem de Bruijn authored
      [ Upstream commit 468479e6 ]
      
      PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
      f->num_members. It returns the old value unconditionally, but
      f->num_members may have changed since the last store. Ensure
      that the return value is always < num.
      
      When modifying the logic, simplify it further by replacing the loop
      with an unconditional atomic increment.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4552ddd0
    • Eric Dumazet's avatar
      packet: read num_members once in packet_rcv_fanout() · 7eee92bf
      Eric Dumazet authored
      [ Upstream commit f98f4514 ]
      
      We need to tell compiler it must not read f->num_members multiple
      times. Otherwise testing if num is not zero is flaky, and we could
      attempt an invalid divide by 0 in fanout_demux_cpu()
      
      Note bug was present in packet_rcv_fanout_hash() and
      packet_rcv_fanout_lb() but final 3.1 had a simple location
      after commit 95ec3eb4 ("packet: Add 'cpu' fanout policy.")
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7eee92bf
    • Nikolay Aleksandrov's avatar
      bridge: fix br_stp_set_bridge_priority race conditions · 019b1332
      Nikolay Aleksandrov authored
      [ Upstream commit 2dab80a8 ]
      
      After the ->set() spinlocks were removed br_stp_set_bridge_priority
      was left running without any protection when used via sysfs. It can
      race with port add/del and could result in use-after-free cases and
      corrupted lists. Tested by running port add/del in a loop with stp
      enabled while setting priority in a loop, crashes are easily
      reproducible.
      The spinlocks around sysfs ->set() were removed in commit:
      14f98f25 ("bridge: range check STP parameters")
      There's also a race condition in the netlink priority support that is
      fixed by this change, but it was introduced recently and the fixes tag
      covers it, just in case it's needed the commit is:
      af615762 ("bridge: add ageing_time, stp_state, priority over netlink")
      Signed-off-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Fixes: 14f98f25 ("bridge: range check STP parameters")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      019b1332
    • Nikolay Aleksandrov's avatar
      bridge: fix multicast router rlist endless loop · 80cbc4a3
      Nikolay Aleksandrov authored
      [ Upstream commit 1a040eac ]
      
      Since the addition of sysfs multicast router support if one set
      multicast_router to "2" more than once, then the port would be added to
      the hlist every time and could end up linking to itself and thus causing an
      endless loop for rlist walkers.
      So to reproduce just do:
      echo 2 > multicast_router; echo 2 > multicast_router;
      in a bridge port and let some igmp traffic flow, for me it hangs up
      in br_multicast_flood().
      Fix this by adding a check in br_multicast_add_router() if the port is
      already linked.
      The reason this didn't happen before the addition of multicast_router
      sysfs entries is because there's a !hlist_unhashed check that prevents
      it.
      Signed-off-by: default avatarNikolay Aleksandrov <razor@blackwall.org>
      Fixes: 0909e117 ("bridge: Add multicast_router sysfs entries")
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80cbc4a3
    • Sowmini Varadhan's avatar
      sparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context · dc911b31
      Sowmini Varadhan authored
      Upstream commit 671d7732
      
      Since it is possible for vnet_event_napi to end up doing
      vnet_control_pkt_engine -> ... -> vnet_send_attr ->
      vnet_port_alloc_tx_ring -> ldc_alloc_exp_dring -> kzalloc()
      (i.e., in softirq context), kzalloc() should be called with
      GFP_ATOMIC from ldc_alloc_exp_dring.
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc911b31
  2. 04 Jul, 2015 23 commits
  3. 29 Jun, 2015 6 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.10.82 · b3d78448
      Greg Kroah-Hartman authored
      b3d78448
    • James Smart's avatar
      lpfc: Add iotag memory barrier · 17c06c69
      James Smart authored
      commit 27f344eb upstream.
      
      Add a memory barrier to ensure the valid bit is read before
      any of the cqe payload is read. This fixes an issue seen
      on Power where the cqe payload was getting loaded before
      the valid bit. When this occurred, we saw an iotag out of
      range error when a command completed, but since the iotag
      looked invalid the command didn't get completed to scsi core.
      Later we hit the command timeout, attempted to abort the command,
      then waited for the aborted command to get returned. Since the
      adapter already returned the command, we timeout waiting,
      and end up escalating EEH all the way to host reset. This
      patch fixes this issue.
      Signed-off-by: default avatarBrian King <brking@linux.vnet.ibm.com>
      Signed-off-by: default avatarJames Smart <james.smart@emulex.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      17c06c69
    • Ben Hutchings's avatar
      pipe: iovec: Fix memory corruption when retrying atomic copy as non-atomic · 14f81062
      Ben Hutchings authored
      pipe_iov_copy_{from,to}_user() may be tried twice with the same iovec,
      the first time atomically and the second time not.  The second attempt
      needs to continue from the iovec position, pipe buffer offset and
      remaining length where the first attempt failed, but currently the
      pipe buffer offset and remaining length are reset.  This will corrupt
      the piped data (possibly also leading to an information leak between
      processes) and may also corrupt kernel memory.
      
      This was fixed upstream by commits f0d1bec9 ("new helper:
      copy_page_from_iter()") and 637b58c2 ("switch pipe_read() to
      copy_page_to_iter()"), but those aren't suitable for stable.  This fix
      for older kernel versions was made by Seth Jennings for RHEL and I
      have extracted it from their update.
      
      CVE-2015-1805
      
      References: https://bugzilla.redhat.com/show_bug.cgi?id=1202855Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      14f81062
    • Adam Jackson's avatar
      drm/mgag200: Reject non-character-cell-aligned mode widths · 97d905e8
      Adam Jackson authored
      commit 25161084 upstream.
      
      Turns out 1366x768 does not in fact work on this hardware.
      Signed-off-by: default avatarAdam Jackson <ajax@redhat.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97d905e8
    • Steven Rostedt's avatar
      tracing: Have filter check for balanced ops · 63dec311
      Steven Rostedt authored
      commit 2cf30dc1 upstream.
      
      When the following filter is used it causes a warning to trigger:
      
       # cd /sys/kernel/debug/tracing
       # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
      -bash: echo: write error: Invalid argument
       # cat events/ext4/ext4_truncate_exit/filter
      ((dev==1)blocks==2)
      ^
      parse_error: No error
      
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
       Modules linked in: bnep lockd grace bluetooth  ...
       CPU: 3 PID: 1223 Comm: bash Tainted: G        W       4.1.0-rc3-test+ #450
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
        0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
        0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
        ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
       Call Trace:
        [<ffffffff816ed4f9>] dump_stack+0x4c/0x6e
        [<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0
        [<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80
        [<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff81159065>] replace_preds+0x3c5/0x990
        [<ffffffff811596b2>] create_filter+0x82/0xb0
        [<ffffffff81159944>] apply_event_filter+0xd4/0x180
        [<ffffffff81152bbf>] event_filter_write+0x8f/0x120
        [<ffffffff811db2a8>] __vfs_write+0x28/0xe0
        [<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0
        [<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0
        [<ffffffff811dc408>] vfs_write+0xb8/0x1b0
        [<ffffffff811dc72f>] SyS_write+0x4f/0xb0
        [<ffffffff816f5217>] system_call_fastpath+0x12/0x6a
       ---[ end trace e11028bd95818dcd ]---
      
      Worse yet, reading the error message (the filter again) it says that
      there was no error, when there clearly was. The issue is that the
      code that checks the input does not check for balanced ops. That is,
      having an op between a closed parenthesis and the next token.
      
      This would only cause a warning, and fail out before doing any real
      harm, but it should still not caues a warning, and the error reported
      should work:
      
       # cd /sys/kernel/debug/tracing
       # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
      -bash: echo: write error: Invalid argument
       # cat events/ext4/ext4_truncate_exit/filter
      ((dev==1)blocks==2)
      ^
      parse_error: Meaningless filter expression
      
      And give no kernel warning.
      
      Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Tested-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      [ luis: backported to 3.16:
        - unconditionally decrement cnt as the OP_NOT logic was introduced only
          by e12c09cf ("tracing: Add NOT to filtering logic") ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63dec311
    • Steve Cornelius's avatar
      crypto: caam - fix RNG buffer cache alignment · b6d2d39a
      Steve Cornelius authored
      commit 412c98c1 upstream.
      
      The hwrng output buffers (2) are cast inside of a a struct (caam_rng_ctx)
      allocated in one DMA-tagged region. While the kernel's heap allocator
      should place the overall struct on a cacheline aligned boundary, the 2
      buffers contained within may not necessarily align. Consenquently, the ends
      of unaligned buffers may not fully flush, and if so, stale data will be left
      behind, resulting in small repeating patterns.
      
      This fix aligns the buffers inside the struct.
      
      Note that not all of the data inside caam_rng_ctx necessarily needs to be
      DMA-tagged, only the buffers themselves require this. However, a fix would
      incur the expense of error-handling bloat in the case of allocation failure.
      Signed-off-by: default avatarSteve Cornelius <steve.cornelius@freescale.com>
      Signed-off-by: default avatarVictoria Milhoan <vicki.milhoan@freescale.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b6d2d39a
  4. 22 Jun, 2015 5 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.10.81 · 28114597
      Greg Kroah-Hartman authored
      28114597
    • Jeff Mahoney's avatar
      btrfs: cleanup orphans while looking up default subvolume · 9272a6ce
      Jeff Mahoney authored
      commit 727b9784 upstream.
      
      Orphans in the fs tree are cleaned up via open_ctree and subvolume
      orphans are cleaned via btrfs_lookup_dentry -- except when a default
      subvolume is in use.  The name for the default subvolume uses a manual
      lookup that doesn't trigger orphan cleanup and needs to trigger it
      manually as well. This doesn't apply to the remount case since the
      subvolumes are cleaned up by walking the root radix tree.
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9272a6ce
    • Chengyu Song's avatar
      btrfs: incorrect handling for fiemap_fill_next_extent return · 8d1529ce
      Chengyu Song authored
      commit 26e726af upstream.
      
      fiemap_fill_next_extent returns 0 on success, -errno on error, 1 if this was
      the last extent that will fit in user array. If 1 is returned, the return
      value may eventually returned to user space, which should not happen, according
      to manpage of ioctl.
      Signed-off-by: default avatarChengyu Song <csong84@gatech.edu>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d1529ce
    • Johannes Berg's avatar
      cfg80211: wext: clear sinfo struct before calling driver · 98d94f20
      Johannes Berg authored
      commit 9c5a18a3 upstream.
      
      Until recently, mac80211 overwrote all the statistics it could
      provide when getting called, but it now relies on the struct
      having been zeroed by the caller. This was always the case in
      nl80211, but wext used a static struct which could even cause
      values from one device leak to another.
      
      Using a static struct is OK (as even documented in a comment)
      since the whole usage of this function and its return value is
      always locked under RTNL. Not clearing the struct for calling
      the driver has always been wrong though, since drivers were
      free to only fill values they could report, so calling this
      for one device and then for another would always have leaked
      values from one to the other.
      
      Fix this by initializing the structure in question before the
      driver method call.
      
      This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691Reported-by: default avatarGerrit Renker <gerrit@erg.abdn.ac.uk>
      Reported-by: default avatarAlexander Kaltsas <alexkaltsas@gmail.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      98d94f20
    • Gu Zheng's avatar
      mm/memory_hotplug.c: set zone->wait_table to null after freeing it · 31c6d4e4
      Gu Zheng authored
      commit 85bd8399 upstream.
      
      Izumi found the following oops when hot re-adding a node:
      
          BUG: unable to handle kernel paging request at ffffc90008963690
          IP: __wake_up_bit+0x20/0x70
          Oops: 0000 [#1] SMP
          CPU: 68 PID: 1237 Comm: rs:main Q:Reg Not tainted 4.1.0-rc5 #80
          Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 1.87 04/28/2015
          task: ffff880838df8000 ti: ffff880017b94000 task.ti: ffff880017b94000
          RIP: 0010:[<ffffffff810dff80>]  [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
          RSP: 0018:ffff880017b97be8  EFLAGS: 00010246
          RAX: ffffc90008963690 RBX: 00000000003c0000 RCX: 000000000000a4c9
          RDX: 0000000000000000 RSI: ffffea101bffd500 RDI: ffffc90008963648
          RBP: ffff880017b97c08 R08: 0000000002000020 R09: 0000000000000000
          R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a0797c73800
          R13: ffffea101bffd500 R14: 0000000000000001 R15: 00000000003c0000
          FS:  00007fcc7ffff700(0000) GS:ffff880874800000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffffc90008963690 CR3: 0000000836761000 CR4: 00000000001407e0
          Call Trace:
            unlock_page+0x6d/0x70
            generic_write_end+0x53/0xb0
            xfs_vm_write_end+0x29/0x80 [xfs]
            generic_perform_write+0x10a/0x1e0
            xfs_file_buffered_aio_write+0x14d/0x3e0 [xfs]
            xfs_file_write_iter+0x79/0x120 [xfs]
            __vfs_write+0xd4/0x110
            vfs_write+0xac/0x1c0
            SyS_write+0x58/0xd0
            system_call_fastpath+0x12/0x76
          Code: 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8d 47 48 <48> 39 47 48 48 c7 45 e8 00 00 00 00 48 c7 45 f0 00 00 00 00 48
          RIP  [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
           RSP <ffff880017b97be8>
          CR2: ffffc90008963690
      
      Reproduce method (re-add a node)::
        Hot-add nodeA --> remove nodeA --> hot-add nodeA (panic)
      
      This seems an use-after-free problem, and the root cause is
      zone->wait_table was not set to *NULL* after free it in
      try_offline_node.
      
      When hot re-add a node, we will reuse the pgdat of it, so does the zone
      struct, and when add pages to the target zone, it will init the zone
      first (including the wait_table) if the zone is not initialized.  The
      judgement of zone initialized is based on zone->wait_table:
      
      	static inline bool zone_is_initialized(struct zone *zone)
      	{
      		return !!zone->wait_table;
      	}
      
      so if we do not set the zone->wait_table to *NULL* after free it, the
      memory hotplug routine will skip the init of new zone when hot re-add
      the node, and the wait_table still points to the freed memory, then we
      will access the invalid address when trying to wake up the waiting
      people after the i/o operation with the page is done, such as mentioned
      above.
      Signed-off-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: default avatarTaku Izumi <izumi.taku@jp.fujitsu.com>
      Reviewed by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      31c6d4e4