1. 29 Apr, 2018 36 commits
    • Willem de Bruijn's avatar
      packet: fix bitfield update race · 183f20fb
      Willem de Bruijn authored
      
      [ Upstream commit a6361f0c ]
      
      Updates to the bitfields in struct packet_sock are not atomic.
      Serialize these read-modify-write cycles.
      
      Move po->running into a separate variable. Its writes are protected by
      po->bind_lock (except for one startup case at packet_create). Also
      replace a textual precondition warning with lockdep annotation.
      
      All others are set only in packet_setsockopt. Serialize these
      updates by holding the socket lock. Analogous to other field updates,
      also hold the lock when testing whether a ring is active (pg_vec).
      
      Fixes: 8dc41944 ("[PACKET]: Add optional checksum computation for recvmsg")
      Reported-by: default avatarDaeRyong Jeong <threeearcat@gmail.com>
      Reported-by: default avatarByoungyoung Lee <byoungyoung@purdue.edu>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      183f20fb
    • Xin Long's avatar
      team: fix netconsole setup over team · c3da49b8
      Xin Long authored
      
      [ Upstream commit 9cf2f437 ]
      
      The same fix in Commit dbe17307 ("bridge: fix netconsole
      setup over bridge") is also needed for team driver.
      
      While at it, remove the unnecessary parameter *team from
      team_port_enable_netpoll().
      
      v1->v2:
        - fix it in a better way, as does bridge.
      
      Fixes: 0fb52a27 ("team: cleanup netpoll clode")
      Reported-by: default avatarJoão Avelino Bellomo Filho <jbellomo@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3da49b8
    • Paolo Abeni's avatar
      team: avoid adding twice the same option to the event list · 8e6805ab
      Paolo Abeni authored
      
      [ Upstream commit 4fb0534f ]
      
      When parsing the options provided by the user space,
      team_nl_cmd_options_set() insert them in a temporary list to send
      multiple events with a single message.
      While each option's attribute is correctly validated, the code does
      not check for duplicate entries before inserting into the event
      list.
      
      Exploiting the above, the syzbot was able to trigger the following
      splat:
      
      kernel BUG at lib/list_debug.c:31!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 4466 Comm: syzkaller556835 Not tainted 4.16.0+ #17
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:__list_add_valid+0xaa/0xb0 lib/list_debug.c:29
      RSP: 0018:ffff8801b04bf248 EFLAGS: 00010286
      RAX: 0000000000000058 RBX: ffff8801c8fc7a90 RCX: 0000000000000000
      RDX: 0000000000000058 RSI: ffffffff815fbf41 RDI: ffffed0036097e3f
      RBP: ffff8801b04bf260 R08: ffff8801b0b2a700 R09: ffffed003b604f90
      R10: ffffed003b604f90 R11: ffff8801db027c87 R12: ffff8801c8fc7a90
      R13: ffff8801c8fc7a90 R14: dffffc0000000000 R15: 0000000000000000
      FS:  0000000000b98880(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000043fc30 CR3: 00000001afe8e000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        __list_add include/linux/list.h:60 [inline]
        list_add include/linux/list.h:79 [inline]
        team_nl_cmd_options_set+0x9ff/0x12b0 drivers/net/team/team.c:2571
        genl_family_rcv_msg+0x889/0x1120 net/netlink/genetlink.c:599
        genl_rcv_msg+0xc6/0x170 net/netlink/genetlink.c:624
        netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
        genl_rcv+0x28/0x40 net/netlink/genetlink.c:635
        netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
        netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
        netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
        sock_sendmsg_nosec net/socket.c:629 [inline]
        sock_sendmsg+0xd5/0x120 net/socket.c:639
        ___sys_sendmsg+0x805/0x940 net/socket.c:2117
        __sys_sendmsg+0x115/0x270 net/socket.c:2155
        SYSC_sendmsg net/socket.c:2164 [inline]
        SyS_sendmsg+0x29/0x30 net/socket.c:2162
        do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x4458b9
      RSP: 002b:00007ffd1d4a7278 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 000000000000001b RCX: 00000000004458b9
      RDX: 0000000000000010 RSI: 0000000020000d00 RDI: 0000000000000004
      RBP: 00000000004a74ed R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000213 R12: 00007ffd1d4a7348
      R13: 0000000000402a60 R14: 0000000000000000 R15: 0000000000000000
      Code: 75 e8 eb a9 48 89 f7 48 89 75 e8 e8 d1 85 7b fe 48 8b 75 e8 eb bb 48
      89 f2 48 89 d9 4c 89 e6 48 c7 c7 a0 84 d8 87 e8 ea 67 28 fe <0f> 0b 0f 1f
      40 00 48 b8 00 00 00 00 00 fc ff df 55 48 89 e5 41
      RIP: __list_add_valid+0xaa/0xb0 lib/list_debug.c:29 RSP: ffff8801b04bf248
      
      This changeset addresses the avoiding list_add() if the current
      option is already present in the event list.
      
      Reported-and-tested-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Fixes: 2fcdb2c9 ("team: allow to send multiple set events in one message")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e6805ab
    • Jann Horn's avatar
      tcp: don't read out-of-bounds opsize · 09a37b36
      Jann Horn authored
      
      [ Upstream commit 7e5a206a ]
      
      The old code reads the "opsize" variable from out-of-bounds memory (first
      byte behind the segment) if a broken TCP segment ends directly after an
      opcode that is neither EOL nor NOP.
      
      The result of the read isn't used for anything, so the worst thing that
      could theoretically happen is a pagefault; and since the physmap is usually
      mostly contiguous, even that seems pretty unlikely.
      
      The following C reproducer triggers the uninitialized read - however, you
      can't actually see anything happen unless you put something like a
      pr_warn() in tcp_parse_md5sig_option() to print the opsize.
      
      ====================================
      #define _GNU_SOURCE
      #include <arpa/inet.h>
      #include <stdlib.h>
      #include <errno.h>
      #include <stdarg.h>
      #include <net/if.h>
      #include <linux/if.h>
      #include <linux/ip.h>
      #include <linux/tcp.h>
      #include <linux/in.h>
      #include <linux/if_tun.h>
      #include <err.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <string.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/ioctl.h>
      #include <assert.h>
      
      void systemf(const char *command, ...) {
        char *full_command;
        va_list ap;
        va_start(ap, command);
        if (vasprintf(&full_command, command, ap) == -1)
          err(1, "vasprintf");
        va_end(ap);
        printf("systemf: <<<%s>>>\n", full_command);
        system(full_command);
      }
      
      char *devname;
      
      int tun_alloc(char *name) {
        int fd = open("/dev/net/tun", O_RDWR);
        if (fd == -1)
          err(1, "open tun dev");
        static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI };
        strcpy(req.ifr_name, name);
        if (ioctl(fd, TUNSETIFF, &req))
          err(1, "TUNSETIFF");
        devname = req.ifr_name;
        printf("device name: %s\n", devname);
        return fd;
      }
      
      #define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24))
      
      void sum_accumulate(unsigned int *sum, void *data, int len) {
        assert((len&2)==0);
        for (int i=0; i<len/2; i++) {
          *sum += ntohs(((unsigned short *)data)[i]);
        }
      }
      
      unsigned short sum_final(unsigned int sum) {
        sum = (sum >> 16) + (sum & 0xffff);
        sum = (sum >> 16) + (sum & 0xffff);
        return htons(~sum);
      }
      
      void fix_ip_sum(struct iphdr *ip) {
        unsigned int sum = 0;
        sum_accumulate(&sum, ip, sizeof(*ip));
        ip->check = sum_final(sum);
      }
      
      void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) {
        unsigned int sum = 0;
        struct {
          unsigned int saddr;
          unsigned int daddr;
          unsigned char pad;
          unsigned char proto_num;
          unsigned short tcp_len;
        } fakehdr = {
          .saddr = ip->saddr,
          .daddr = ip->daddr,
          .proto_num = ip->protocol,
          .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4)
        };
        sum_accumulate(&sum, &fakehdr, sizeof(fakehdr));
        sum_accumulate(&sum, tcp, tcp->doff*4);
        tcp->check = sum_final(sum);
      }
      
      int main(void) {
        int tun_fd = tun_alloc("inject_dev%d");
        systemf("ip link set %s up", devname);
        systemf("ip addr add 192.168.42.1/24 dev %s", devname);
      
        struct {
          struct iphdr ip;
          struct tcphdr tcp;
          unsigned char tcp_opts[20];
        } __attribute__((packed)) syn_packet = {
          .ip = {
            .ihl = sizeof(struct iphdr)/4,
            .version = 4,
            .tot_len = htons(sizeof(syn_packet)),
            .ttl = 30,
            .protocol = IPPROTO_TCP,
            /* FIXUP check */
            .saddr = IPADDR(192,168,42,2),
            .daddr = IPADDR(192,168,42,1)
          },
          .tcp = {
            .source = htons(1),
            .dest = htons(1337),
            .seq = 0x12345678,
            .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4,
            .syn = 1,
            .window = htons(64),
            .check = 0 /*FIXUP*/
          },
          .tcp_opts = {
            /* INVALID: trailing MD5SIG opcode after NOPs */
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 19
          }
        };
        fix_ip_sum(&syn_packet.ip);
        fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp);
        while (1) {
          int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet));
          if (write_res != sizeof(syn_packet))
            err(1, "packet write failed");
        }
      }
      ====================================
      
      Fixes: cfb6eeb4 ("[TCP]: MD5 Signature Option (RFC2385) support.")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09a37b36
    • Cong Wang's avatar
      llc: delete timers synchronously in llc_sk_free() · 6ebd6a11
      Cong Wang authored
      
      [ Upstream commit b905ef9a ]
      
      The connection timers of an llc sock could be still flying
      after we delete them in llc_sk_free(), and even possibly
      after we free the sock. We could just wait synchronously
      here in case of troubles.
      
      Note, I leave other call paths as they are, since they may
      not have to wait, at least we can change them to synchronously
      when needed.
      
      Also, move the code to net/llc/llc_conn.c, which is apparently
      a better place.
      
      Reported-by: <syzbot+f922284c18ea23a8e457@syzkaller.appspotmail.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ebd6a11
    • Eric Dumazet's avatar
      net: validate attribute sizes in neigh_dump_table() · d6e78baf
      Eric Dumazet authored
      
      [ Upstream commit 7dd07c14 ]
      
      Since neigh_dump_table() calls nlmsg_parse() without giving policy
      constraints, attributes can have arbirary size that we must validate
      
      Reported by syzbot/KMSAN :
      
      BUG: KMSAN: uninit-value in neigh_master_filtered net/core/neighbour.c:2292 [inline]
      BUG: KMSAN: uninit-value in neigh_dump_table net/core/neighbour.c:2348 [inline]
      BUG: KMSAN: uninit-value in neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438
      CPU: 1 PID: 3575 Comm: syzkaller268891 Not tainted 4.16.0+ #83
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       neigh_master_filtered net/core/neighbour.c:2292 [inline]
       neigh_dump_table net/core/neighbour.c:2348 [inline]
       neigh_dump_info+0x1af0/0x2250 net/core/neighbour.c:2438
       netlink_dump+0x9ad/0x1540 net/netlink/af_netlink.c:2225
       __netlink_dump_start+0x1167/0x12a0 net/netlink/af_netlink.c:2322
       netlink_dump_start include/linux/netlink.h:214 [inline]
       rtnetlink_rcv_msg+0x1435/0x1560 net/core/rtnetlink.c:4598
       netlink_rcv_skb+0x355/0x5f0 net/netlink/af_netlink.c:2447
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4653
       netlink_unicast_kernel net/netlink/af_netlink.c:1311 [inline]
       netlink_unicast+0x1672/0x1750 net/netlink/af_netlink.c:1337
       netlink_sendmsg+0x1048/0x1310 net/netlink/af_netlink.c:1900
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
       __sys_sendmsg net/socket.c:2080 [inline]
       SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
       SyS_sendmsg+0x54/0x80 net/socket.c:2087
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x43fed9
      RSP: 002b:00007ffddbee2798 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fed9
      RDX: 0000000000000000 RSI: 0000000020005000 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401800
      R13: 0000000000401890 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
       slab_post_alloc_hook mm/slab.h:445 [inline]
       slab_alloc_node mm/slub.c:2737 [inline]
       __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:984 [inline]
       netlink_alloc_large_skb net/netlink/af_netlink.c:1183 [inline]
       netlink_sendmsg+0x9a6/0x1310 net/netlink/af_netlink.c:1875
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
       __sys_sendmsg net/socket.c:2080 [inline]
       SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091
       SyS_sendmsg+0x54/0x80 net/socket.c:2087
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 21fdd092 ("net: Add support for filtering neigh dump by master device")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6e78baf
    • Guillaume Nault's avatar
      l2tp: check sockaddr length in pppol2tp_connect() · ddecae86
      Guillaume Nault authored
      
      [ Upstream commit eb1c28c0 ]
      
      Check sockaddr_len before dereferencing sp->sa_protocol, to ensure that
      it actually points to valid data.
      
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Reported-by: syzbot+a70ac890b23b1bf29f5c@syzkaller.appspotmail.com
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ddecae86
    • Eric Biggers's avatar
      KEYS: DNS: limit the length of option strings · 153e9cdb
      Eric Biggers authored
      
      [ Upstream commit 9c438d7a ]
      
      Adding a dns_resolver key whose payload contains a very long option name
      resulted in that string being printed in full.  This hit the WARN_ONCE()
      in set_precision() during the printk(), because printk() only supports a
      precision of up to 32767 bytes:
      
          precision 1000000 too large
          WARNING: CPU: 0 PID: 752 at lib/vsprintf.c:2189 vsnprintf+0x4bc/0x5b0
      
      Fix it by limiting option strings (combined name + value) to a much more
      reasonable 128 bytes.  The exact limit is arbitrary, but currently the
      only recognized option is formatted as "dnserror=%lu" which fits well
      within this limit.
      
      Also ratelimit the printks.
      
      Reproducer:
      
          perl -e 'print "#", "A" x 1000000, "\x00"' | keyctl padd dns_resolver desc @s
      
      This bug was found using syzkaller.
      Reported-by: default avatarMark Rutland <mark.rutland@arm.com>
      Fixes: 4a2d7892 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      153e9cdb
    • Xin Long's avatar
      bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave · 654fecae
      Xin Long authored
      [ Upstream commit ddea788c ]
      
      After Commit 8a8efa22 ("bonding: sync netpoll code with bridge"), it
      would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev
      if bond->dev->npinfo was set.
      
      However now slave_dev npinfo is set with bond->dev->npinfo before calling
      slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called
      in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup().
      It causes that the lower dev of this slave dev can't set its npinfo.
      
      One way to reproduce it:
      
        # modprobe bonding
        # brctl addbr br0
        # brctl addif br0 eth1
        # ifconfig bond0 192.168.122.1/24 up
        # ifenslave bond0 eth2
        # systemctl restart netconsole
        # ifenslave bond0 br0
        # ifconfig eth2 down
        # systemctl restart netconsole
      
      The netpoll won't really work.
      
      This patch is to remove that slave_dev npinfo setting in bond_enslave().
      
      Fixes: 8a8efa22 ("bonding: sync netpoll code with bridge")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      654fecae
    • Martin Schwidefsky's avatar
      s390: correct module section names for expoline code revert · 01d5df54
      Martin Schwidefsky authored
      [ Upstream commit 6cf09958 ]
      
      The main linker script vmlinux.lds.S for the kernel image merges
      the expoline code patch tables into two section ".nospec_call_table"
      and ".nospec_return_table". This is *not* done for the modules,
      there the sections retain their original names as generated by gcc:
      ".s390_indirect_call", ".s390_return_mem" and ".s390_return_reg".
      
      The module_finalize code has to check for the compiler generated
      section names, otherwise no code patching is done. This slows down
      the module code in case of "spectre_v2=off".
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01d5df54
    • Martin Schwidefsky's avatar
      s390: correct nospec auto detection init order · c2d46e7b
      Martin Schwidefsky authored
      [ Upstream commit 6a3d1e81 ]
      
      With CONFIG_EXPOLINE_AUTO=y the call of spectre_v2_auto_early() via
      early_initcall is done *after* the early_param functions. This
      overwrites any settings done with the nobp/no_spectre_v2/spectre_v2
      parameters. The code patching for the kernel is done after the
      evaluation of the early parameters but before the early_initcall
      is done. The end result is a kernel image that is patched correctly
      but the kernel modules are not.
      
      Make sure that the nospec auto detection function is called before the
      early parameters are evaluated and before the code patching is done.
      
      Fixes: 6e179d64 ("s390: add automatic detection of the spectre defense")
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c2d46e7b
    • Martin Schwidefsky's avatar
      s390: add sysfs attributes for spectre · 6aa300a0
      Martin Schwidefsky authored
      [ Upstream commit d424986f ]
      
      Set CONFIG_GENERIC_CPU_VULNERABILITIES and provide the two functions
      cpu_show_spectre_v1 and cpu_show_spectre_v2 to report the spectre
      mitigations.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6aa300a0
    • Martin Schwidefsky's avatar
      s390: report spectre mitigation via syslog · 6fbb47b1
      Martin Schwidefsky authored
      [ Upstream commit bc035599 ]
      
      Add a boot message if either of the spectre defenses is active.
      The message is
          "Spectre V2 mitigation: execute trampolines."
      or  "Spectre V2 mitigation: limited branch prediction."
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6fbb47b1
    • Martin Schwidefsky's avatar
      s390: add automatic detection of the spectre defense · 768da41f
      Martin Schwidefsky authored
      [ Upstream commit 6e179d64 ]
      
      Automatically decide between nobp vs. expolines if the spectre_v2=auto
      kernel parameter is specified or CONFIG_EXPOLINE_AUTO=y is set.
      
      The decision made at boot time due to CONFIG_EXPOLINE_AUTO=y being set
      can be overruled with the nobp, nospec and spectre_v2 kernel parameters.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      768da41f
    • Martin Schwidefsky's avatar
      s390: move nobp parameter functions to nospec-branch.c · 5298e6f9
      Martin Schwidefsky authored
      [ Upstream commit b2e2f43a ]
      
      Keep the code for the nobp parameter handling with the code for
      expolines. Both are related to the spectre v2 mitigation.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5298e6f9
    • Christian Borntraeger's avatar
      s390/entry.S: fix spurious zeroing of r0 · ec440c15
      Christian Borntraeger authored
      [ Upstream commit d3f46896 ]
      
      when a system call is interrupted we might call the critical section
      cleanup handler that re-does some of the operations. When we are between
      .Lsysc_vtime and .Lsysc_do_svc we might also redo the saving of the
      problem state registers r0-r7:
      
      .Lcleanup_system_call:
      [...]
      0:      # update accounting time stamp
              mvc     __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER
              # set up saved register r11
              lg      %r15,__LC_KERNEL_STACK
              la      %r9,STACK_FRAME_OVERHEAD(%r15)
              stg     %r9,24(%r11)            # r11 pt_regs pointer
              # fill pt_regs
              mvc     __PT_R8(64,%r9),__LC_SAVE_AREA_SYNC
      --->    stmg    %r0,%r7,__PT_R0(%r9)
      
      The problem is now, that we might have already zeroed out r0.
      The fix is to move the zeroing of r0 after sysc_do_svc.
      Reported-by: default avatarFarhan Ali <alifm@linux.vnet.ibm.com>
      Fixes: 7041d281 ("s390: scrub registers on kernel entry and KVM exit")
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec440c15
    • Martin Schwidefsky's avatar
      s390: do not bypass BPENTER for interrupt system calls · bbf89d78
      Martin Schwidefsky authored
      [ Upstream commit d5feec04 ]
      
      The system call path can be interrupted before the switch back to the
      standard branch prediction with BPENTER has been done. The critical
      section cleanup code skips forward to .Lsysc_do_svc and bypasses the
      BPENTER. In this case the kernel and all subsequent code will run with
      the limited branch prediction.
      
      Fixes: eacf67eb9b32 ("s390: run user space and KVM guests with modified branch prediction")
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bbf89d78
    • Eugeniu Rosca's avatar
      s390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*) · f3303788
      Eugeniu Rosca authored
      [ Upstream commit 2cb370d6 ]
      
      I've accidentally stumbled upon the IS_ENABLED(EXPOLINE_*) lines, which
      obviously always evaluate to false. Fix this.
      
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Signed-off-by: default avatarEugeniu Rosca <erosca@de.adit-jv.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3303788
    • Martin Schwidefsky's avatar
      s390: introduce execute-trampolines for branches · d6e925ec
      Martin Schwidefsky authored
      [ Upstream commit f19fbd5e ]
      
      Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
      -mfunction_return= compiler options to create a kernel fortified against
      the specte v2 attack.
      
      With CONFIG_EXPOLINE=y all indirect branches will be issued with an
      execute type instruction. For z10 or newer the EXRL instruction will
      be used, for older machines the EX instruction. The typical indirect
      call
      
      	basr	%r14,%r1
      
      is replaced with a PC relative call to a new thunk
      
      	brasl	%r14,__s390x_indirect_jump_r1
      
      The thunk contains the EXRL/EX instruction to the indirect branch
      
      __s390x_indirect_jump_r1:
      	exrl	0,0f
      	j	.
      0:	br	%r1
      
      The detour via the execute type instruction has a performance impact.
      To get rid of the detour the new kernel parameter "nospectre_v2" and
      "spectre_v2=[on,off,auto]" can be used. If the parameter is specified
      the kernel and module code will be patched at runtime.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6e925ec
    • Martin Schwidefsky's avatar
      s390: run user space and KVM guests with modified branch prediction · 2d863392
      Martin Schwidefsky authored
      [ Upstream commit 6b73044b ]
      
      Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary
      plumbing in entry.S to be able to run user space and KVM guests with
      limited branch prediction.
      
      To switch a user space process to limited branch prediction the
      s390_isolate_bp() function has to be call, and to run a vCPU of a KVM
      guest associated with the current task with limited branch prediction
      call s390_isolate_bp_guest().
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d863392
    • Martin Schwidefsky's avatar
      s390: add options to change branch prediction behaviour for the kernel · 76c45c1b
      Martin Schwidefsky authored
      [ Upstream commit d768bd89 ]
      
      Add the PPA instruction to the system entry and exit path to switch
      the kernel to a different branch prediction behaviour. The instructions
      are added via CPU alternatives and can be disabled with the "nospec"
      or the "nobp=0" kernel parameter. If the default behaviour selected
      with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
      used to enable the changed kernel branch prediction.
      Acked-by: default avatarCornelia Huck <cohuck@redhat.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76c45c1b
    • Martin Schwidefsky's avatar
      s390/alternative: use a copy of the facility bit mask · bc650e56
      Martin Schwidefsky authored
      [ Upstream commit cf148998 ]
      
      To be able to switch off specific CPU alternatives with kernel parameters
      make a copy of the facility bit mask provided by STFLE and use the copy
      for the decision to apply an alternative.
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc650e56
    • Martin Schwidefsky's avatar
      s390: add optimized array_index_mask_nospec · cf6bc919
      Martin Schwidefsky authored
      [ Upstream commit e2dd8333 ]
      
      Add an optimized version of the array_index_mask_nospec function for
      s390 based on a compare and a subtract with borrow.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf6bc919
    • Martin Schwidefsky's avatar
      s390: scrub registers on kernel entry and KVM exit · e24d6c4b
      Martin Schwidefsky authored
      [ Upstream commit 7041d281 ]
      
      Clear all user space registers on entry to the kernel and all KVM guest
      registers on KVM guest exit if the register does not contain either a
      parameter or a result value.
      Reviewed-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e24d6c4b
    • Christian Borntraeger's avatar
      KVM: s390: wire up bpb feature · 1482b96a
      Christian Borntraeger authored
      [ Upstream commit 35b3fde6 ]
      
      The new firmware interfaces for branch prediction behaviour changes
      are transparently available for the guest. Nevertheless, there is
      new state attached that should be migrated and properly resetted.
      Provide a mechanism for handling reset, migration and VSIE.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      [Changed capability number to 152. - Radim]
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1482b96a
    • Heiko Carstens's avatar
      s390: enable CPU alternatives unconditionally · 243f3abc
      Heiko Carstens authored
      [ Upstream commit 049a2c2d ]
      
      Remove the CPU_ALTERNATIVES config option and enable the code
      unconditionally. The config option was only added to avoid a conflict
      with the named saved segment support. Since that code is gone there is
      no reason to keep the CPU_ALTERNATIVES config option.
      
      Just enable it unconditionally to also reduce the number of config
      options and make it less likely that something breaks.
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      243f3abc
    • Vasily Gorbik's avatar
      s390: introduce CPU alternatives · 4eaf8dac
      Vasily Gorbik authored
      [ Upstream commit 686140a1 ]
      
      Implement CPU alternatives, which allows to optionally patch newer
      instructions at runtime, based on CPU facilities availability.
      
      A new kernel boot parameter "noaltinstr" disables patching.
      
      Current implementation is derived from x86 alternatives. Although
      ideal instructions padding (when altinstr is longer then oldinstr)
      is added at compile time, and no oldinstr nops optimization has to be
      done at runtime. Also couple of compile time sanity checks are done:
      1. oldinstr and altinstr must be <= 254 bytes long,
      2. oldinstr and altinstr must not have an odd length.
      
      alternative(oldinstr, altinstr, facility);
      alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2);
      
      Both compile time and runtime padding consists of either 6/4/2 bytes nop
      or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes.
      
      .altinstructions and .altinstr_replacement sections are part of
      __init_begin : __init_end region and are freed after initialization.
      Signed-off-by: default avatarVasily Gorbik <gor@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4eaf8dac
    • Karthikeyan Periyasamy's avatar
      Revert "ath10k: send (re)assoc peer command when NSS changed" · 8cd93573
      Karthikeyan Periyasamy authored
      commit 55cc11da upstream.
      
      This reverts commit 55884c04.
      
      When Ath10k is in AP mode and an unassociated STA sends a VHT action frame
      (Operating Mode Notification for the NSS change) periodically to AP this causes
      ath10k to call ath10k_station_assoc() which sends WMI_PEER_ASSOC_CMDID during
      NSS update. Over the time (with a certain client it can happen within 15 mins
      when there are over 500 of these VHT action frames) continuous calls of
      WMI_PEER_ASSOC_CMDID cause firmware to assert due to resource exhaust.
      
      To my knowledge setting WMI_PEER_NSS peer param itself enough to handle NSS
      updates and no need to call ath10k_station_assoc(). So revert the original
      commit from 2014 as it's unclear why the change was really needed.
      Now the firmware assert doesn't happen anymore.
      
      Issue observed in QCA9984 platform with firmware version:10.4-3.5.3-00053.
      This Change tested in QCA9984 with firmware version: 10.4-3.5.3-00053 and
      QCA988x platform with firmware version: 10.2.4-1.0-00036.
      
      Firmware Assert log:
      
      ath10k_pci 0002:01:00.0: firmware crashed! (guid e61f1274-9acd-4c5b-bcca-e032ea6e723c)
      ath10k_pci 0002:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
      ath10k_pci 0002:01:00.0: kconfig debug 1 debugfs 1 tracing 0 dfs 1 testmode 1
      ath10k_pci 0002:01:00.0: firmware ver 10.4-3.5.3-00053 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast crc32 4c56a386
      ath10k_pci 0002:01:00.0: board_file api 2 bmi_id 0:4 crc32 c2271344
      ath10k_pci 0002:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal otp max-sta 512 raw 0 hwcrypto 1
      ath10k_pci 0002:01:00.0: firmware register dump:
      ath10k_pci 0002:01:00.0: [00]: 0x0000000A 0x000015B3 0x00981E5F 0x00975B31
      ath10k_pci 0002:01:00.0: [04]: 0x00981E5F 0x00060530 0x00000011 0x00446C60
      ath10k_pci 0002:01:00.0: [08]: 0x0042F1FC 0x00458080 0x00000017 0x00000000
      ath10k_pci 0002:01:00.0: [12]: 0x00000009 0x00000000 0x00973ABC 0x00973AD2
      ath10k_pci 0002:01:00.0: [16]: 0x00973AB0 0x00960E62 0x009606CA 0x00000000
      ath10k_pci 0002:01:00.0: [20]: 0x40981E5F 0x004066DC 0x00400000 0x00981E34
      ath10k_pci 0002:01:00.0: [24]: 0x80983B48 0x0040673C 0x000000C0 0xC0981E5F
      ath10k_pci 0002:01:00.0: [28]: 0x80993DEB 0x0040676C 0x00431AB8 0x0045D0C4
      ath10k_pci 0002:01:00.0: [32]: 0x80993E5C 0x004067AC 0x004303C0 0x0045D0C4
      ath10k_pci 0002:01:00.0: [36]: 0x80994AAB 0x004067DC 0x00000000 0x0045D0C4
      ath10k_pci 0002:01:00.0: [40]: 0x809971A0 0x0040681C 0x004303C0 0x00441B00
      ath10k_pci 0002:01:00.0: [44]: 0x80991904 0x0040688C 0x004303C0 0x0045D0C4
      ath10k_pci 0002:01:00.0: [48]: 0x80963AD3 0x00406A7C 0x004303C0 0x009918FC
      ath10k_pci 0002:01:00.0: [52]: 0x80960E80 0x00406A9C 0x0000001F 0x00400000
      ath10k_pci 0002:01:00.0: [56]: 0x80960E51 0x00406ACC 0x00400000 0x00000000
      ath10k_pci 0002:01:00.0: Copy Engine register dump:
      ath10k_pci 0002:01:00.0: index: addr: sr_wr_idx: sr_r_idx: dst_wr_idx: dst_r_idx:
      ath10k_pci 0002:01:00.0: [00]: 0x0004a000 15 15 3 3
      ath10k_pci 0002:01:00.0: [01]: 0x0004a400 17 17 212 213
      ath10k_pci 0002:01:00.0: [02]: 0x0004a800 21 21 20 21
      ath10k_pci 0002:01:00.0: [03]: 0x0004ac00 25 25 27 25
      ath10k_pci 0002:01:00.0: [04]: 0x0004b000 515 515 144 104
      ath10k_pci 0002:01:00.0: [05]: 0x0004b400 28 28 155 156
      ath10k_pci 0002:01:00.0: [06]: 0x0004b800 12 12 12 12
      ath10k_pci 0002:01:00.0: [07]: 0x0004bc00 1 1 1 1
      ath10k_pci 0002:01:00.0: [08]: 0x0004c000 0 0 127 0
      ath10k_pci 0002:01:00.0: [09]: 0x0004c400 1 1 1 1
      ath10k_pci 0002:01:00.0: [10]: 0x0004c800 0 0 0 0
      ath10k_pci 0002:01:00.0: [11]: 0x0004cc00 0 0 0 0
      ath10k_pci 0002:01:00.0: CE[1] write_index 212 sw_index 213 hw_index 0 nentries_mask 0x000001ff
      ath10k_pci 0002:01:00.0: CE[2] write_index 20 sw_index 21 hw_index 0 nentries_mask 0x0000007f
      ath10k_pci 0002:01:00.0: CE[5] write_index 155 sw_index 156 hw_index 0 nentries_mask 0x000001ff
      ath10k_pci 0002:01:00.0: DMA addr: nbytes: meta data: byte swap: gather:
      ath10k_pci 0002:01:00.0: [455]: 0x580c0042 0 0 0 0
      ath10k_pci 0002:01:00.0: [456]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [457]: 0x580c0042 0 0 0 0
      ath10k_pci 0002:01:00.0: [458]: 0x594a0038 0 0 0 1
      ath10k_pci 0002:01:00.0: [459]: 0x580c0a42 0 0 0 0
      ath10k_pci 0002:01:00.0: [460]: 0x594a0060 0 0 0 1
      ath10k_pci 0002:01:00.0: [461]: 0x580c0c42 0 0 0 0
      ath10k_pci 0002:01:00.0: [462]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [463]: 0x580c0c42 0 0 0 0
      ath10k_pci 0002:01:00.0: [464]: 0x594a0038 0 0 0 1
      ath10k_pci 0002:01:00.0: [465]: 0x580c0a42 0 0 0 0
      ath10k_pci 0002:01:00.0: [466]: 0x594a0060 0 0 0 1
      ath10k_pci 0002:01:00.0: [467]: 0x580c0042 0 0 0 0
      ath10k_pci 0002:01:00.0: [468]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [469]: 0x580c1c42 0 0 0 0
      ath10k_pci 0002:01:00.0: [470]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [471]: 0x580c1c42 0 0 0 0
      ath10k_pci 0002:01:00.0: [472]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [473]: 0x580c1c42 0 0 0 0
      ath10k_pci 0002:01:00.0: [474]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [475]: 0x580c0642 0 0 0 0
      ath10k_pci 0002:01:00.0: [476]: 0x594a0038 0 0 0 1
      ath10k_pci 0002:01:00.0: [477]: 0x580c0842 0 0 0 0
      ath10k_pci 0002:01:00.0: [478]: 0x594a0060 0 0 0 1
      ath10k_pci 0002:01:00.0: [479]: 0x580c0042 0 0 0 0
      ath10k_pci 0002:01:00.0: [480]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [481]: 0x580c0042 0 0 0 0
      ath10k_pci 0002:01:00.0: [482]: 0x594a0038 0 0 0 1
      ath10k_pci 0002:01:00.0: [483]: 0x580c0842 0 0 0 0
      ath10k_pci 0002:01:00.0: [484]: 0x594a0060 0 0 0 1
      ath10k_pci 0002:01:00.0: [485]: 0x580c0642 0 0 0 0
      ath10k_pci 0002:01:00.0: [486]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [487]: 0x580c0642 0 0 0 0
      ath10k_pci 0002:01:00.0: [488]: 0x594a0038 0 0 0 1
      ath10k_pci 0002:01:00.0: [489]: 0x580c0842 0 0 0 0
      ath10k_pci 0002:01:00.0: [490]: 0x594a0060 0 0 0 1
      ath10k_pci 0002:01:00.0: [491]: 0x580c0042 0 0 0 0
      ath10k_pci 0002:01:00.0: [492]: 0x58174040 0 1 0 0
      ath10k_pci 0002:01:00.0: [493]: 0x5a946040 0 1 0 0
      ath10k_pci 0002:01:00.0: [494]: 0x59909040 0 1 0 0
      ath10k_pci 0002:01:00.0: [495]: 0x5ae5a040 0 1 0 0
      ath10k_pci 0002:01:00.0: [496]: 0x58096040 0 1 0 0
      ath10k_pci 0002:01:00.0: [497]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [498]: 0x580c0642 0 0 0 0
      ath10k_pci 0002:01:00.0: [499]: 0x5c1e0040 0 1 0 0
      ath10k_pci 0002:01:00.0: [500]: 0x58153040 0 1 0 0
      ath10k_pci 0002:01:00.0: [501]: 0x58129040 0 1 0 0
      ath10k_pci 0002:01:00.0: [502]: 0x5952f040 0 1 0 0
      ath10k_pci 0002:01:00.0: [503]: 0x59535040 0 1 0 0
      ath10k_pci 0002:01:00.0: [504]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [505]: 0x580c0042 0 0 0 0
      ath10k_pci 0002:01:00.0: [506]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [507]: 0x580c0042 0 0 0 0
      ath10k_pci 0002:01:00.0: [508]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [509]: 0x580c0042 0 0 0 0
      ath10k_pci 0002:01:00.0: [510]: 0x594a0010 0 0 0 1
      ath10k_pci 0002:01:00.0: [511]: 0x580c0042 0 0 0 0
      ath10k_pci 0002:01:00.0: [512]: 0x5adcc040 0 1 0 0
      ath10k_pci 0002:01:00.0: [513]: 0x5cf3d040 0 1 0 0
      ath10k_pci 0002:01:00.0: [514]: 0x5c1e9040 64 1 0 0
      ath10k_pci 0002:01:00.0: [515]: 0x00000000 0 0 0 0
      Signed-off-by: default avatarKarthikeyan Periyasamy <periyasa@codeaurora.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8cd93573
    • Sahitya Tummala's avatar
      jbd2: fix use after free in kjournald2() · 87dfe99e
      Sahitya Tummala authored
      commit dbfcef6b upstream.
      
      Below is the synchronization issue between unmount and kjournald2
      contexts, which results into use after free issue in kjournald2().
      Fix this issue by using journal->j_state_lock to synchronize the
      wait_event() done in journal_kill_thread() and the wake_up() done
      in kjournald2().
      
      TASK 1:
      umount cmd:
         |--jbd2_journal_destroy() {
             |--journal_kill_thread() {
                  write_lock(&journal->j_state_lock);
      	    journal->j_flags |= JBD2_UNMOUNT;
      	    ...
      	    write_unlock(&journal->j_state_lock);
      	    wake_up(&journal->j_wait_commit);	   TASK 2 wakes up here:
      	    					   kjournald2() {
      						     ...
      						     checks JBD2_UNMOUNT flag and calls goto end-loop;
      						     ...
      						     end_loop:
      						       write_unlock(&journal->j_state_lock);
      						       journal->j_task = NULL; --> If this thread gets
      						       pre-empted here, then TASK 1 wait_event will
      						       exit even before this thread is completely
      						       done.
      	    wait_event(journal->j_wait_done_commit, journal->j_task == NULL);
      	    ...
      	    write_lock(&journal->j_state_lock);
      	    write_unlock(&journal->j_state_lock);
      	  }
             |--kfree(journal);
           }
      }
      						       wake_up(&journal->j_wait_done_commit); --> this step
      						       now results into use after free issue.
      						   }
      Signed-off-by: default avatarSahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: Amit Pundir <amit.pundir@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87dfe99e
    • Felix Fietkau's avatar
      ath9k_hw: check if the chip failed to wake up · 9441c6de
      Felix Fietkau authored
      commit a34d0a0d upstream.
      
      In an RFC patch, Sven Eckelmann and Simon Wunderlich reported:
      
      "QCA 802.11n chips (especially AR9330/AR9340) sometimes end up in a
      state in which a read of AR_CFG always returns 0xdeadbeef.
      This should not happen when when the power_mode of the device is
      ATH9K_PM_AWAKE."
      
      Include the check for the default register state in the existing MAC
      hang check.
      Signed-off-by: default avatarFelix Fietkau <nbd@nbd.name>
      Signed-off-by: default avatarKalle Valo <kvalo@qca.qualcomm.com>
      Cc: Amit Pundir <amit.pundir@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9441c6de
    • Dmitry Torokhov's avatar
      Input: drv260x - fix initializing overdrive voltage · 666f5e34
      Dmitry Torokhov authored
      commit 74c82dae upstream.
      
      We were accidentally initializing haptics->rated_voltage twice, and did not
      initialize overdrive voltage.
      Acked-by: default avatarDan Murphy <dmurphy@ti.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Amit Pundir <amit.pundir@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      666f5e34
    • Grant Grundler's avatar
      r8152: add Linksys USB3GIGV1 id · 79658ce8
      Grant Grundler authored
      commit 90841047 upstream.
      
      This linksys dongle by default comes up in cdc_ether mode.
      This patch allows r8152 to claim the device:
         Bus 002 Device 002: ID 13b1:0041 Linksys
      Signed-off-by: default avatarGrant Grundler <grundler@chromium.org>
      Reviewed-by: default avatarDouglas Anderson <dianders@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [krzk: Rebase on v4.4]
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79658ce8
    • Chen Feng's avatar
      staging: ion : Donnot wakeup kswapd in ion system alloc · e1df9302
      Chen Feng authored
      commit 2ef23053 upstream.
      
      Since ion alloc can be called by userspace,eg gralloc.
      When it is called frequently, the efficiency of kswapd is
      to low. And the reclaimed memory is too lower. In this way,
      the kswapd can use to much cpu resources.
      
      With 3.5GB DMA Zone and 0.5 Normal Zone.
      
      pgsteal_kswapd_dma 9364140
      pgsteal_kswapd_normal 7071043
      pgscan_kswapd_dma 10428250
      pgscan_kswapd_normal 37840094
      
      With this change the reclaim ratio has greatly improved
      18.9% -> 72.5%
      Signed-off-by: default avatarChen Feng <puck.chen@hisilicon.com>
      Signed-off-by: default avatarLu bing <albert.lubing@hisilicon.com>
      Reviewed-by: default avatarLaura Abbott <labbott@redhat.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1df9302
    • Jiri Olsa's avatar
      perf: Return proper values for user stack errors · 585af47e
      Jiri Olsa authored
      commit 78b562fb upstream.
      
      Return immediately when we find issue in the user stack checks. The
      error value could get overwritten by following check for
      PERF_SAMPLE_REGS_INTR.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: syzkaller-bugs@googlegroups.com
      Cc: x86@kernel.org
      Fixes: 60e2364e ("perf: Add ability to sample machine state on interrupt")
      Link: http://lkml.kernel.org/r/20180415092352.12403-1-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      585af47e
    • Xiaoming Gao's avatar
      x86/tsc: Prevent 32bit truncation in calc_hpet_ref() · 624786b9
      Xiaoming Gao authored
      commit d3878e16 upstream.
      
      The TSC calibration code uses HPET as reference. The conversion normalizes
      the delta of two HPET timestamps:
      
          hpetref = ((tshpet1 - tshpet2) * HPET_PERIOD) / 1e6
      
      and then divides the normalized delta of the corresponding TSC timestamps
      by the result to calulate the TSC frequency.
      
          tscfreq = ((tstsc1 - tstsc2 ) * 1e6) / hpetref
      
      This uses do_div() which takes an u32 as the divisor, which worked so far
      because the HPET frequency was low enough that 'hpetref' never exceeded
      32bit.
      
      On Skylake machines the HPET frequency increased so 'hpetref' can exceed
      32bit. do_div() truncates the divisor, which causes the calibration to
      fail.
      
      Use div64_u64() to avoid the problem.
      
      [ tglx: Fixes whitespace mangled patch and rewrote changelog ]
      Signed-off-by: default avatarXiaoming Gao <newtongao@tencent.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: peterz@infradead.org
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/38894564-4fc9-b8ec-353f-de702839e44e@gmail.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      624786b9
    • Steve French's avatar
      cifs: do not allow creating sockets except with SMB1 posix exensions · 5f3a3e86
      Steve French authored
      commit 1d0cffa6 upstream.
      
      RHBZ: 1453123
      
      Since at least the 3.10 kernel and likely a lot earlier we have
      not been able to create unix domain sockets in a cifs share
      when mounted using the SFU mount option (except when mounted
      with the cifs unix extensions to Samba e.g.)
      Trying to create a socket, for example using the af_unix command from
      xfstests will cause :
      BUG: unable to handle kernel NULL pointer dereference at 00000000
      00000040
      
      Since no one uses or depends on being able to create unix domains sockets
      on a cifs share the easiest fix to stop this vulnerability is to simply
      not allow creation of any other special files than char or block devices
      when sfu is used.
      
      Added update to Ronnie's patch to handle a tcon link leak, and
      to address a buf leak noticed by Gustavo and Colin.
      Acked-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      CC:  Colin Ian King <colin.king@canonical.com>
      Reviewed-by: default avatarPavel Shilovsky <pshilov@microsoft.com>
      Reported-by: default avatarEryu Guan <eguan@redhat.com>
      Signed-off-by: default avatarRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f3a3e86
  2. 24 Apr, 2018 4 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.4.129 · 8e2def05
      Greg Kroah-Hartman authored
      8e2def05
    • Greg Thelen's avatar
      writeback: safer lock nesting · 6f051f89
      Greg Thelen authored
      commit 2e898e4c upstream.
      
      lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
      the page's memcg is undergoing move accounting, which occurs when a
      process leaves its memcg for a new one that has
      memory.move_charge_at_immigrate set.
      
      unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
      the given inode is switching writeback domains.  Switches occur when
      enough writes are issued from a new domain.
      
      This existing pattern is thus suspicious:
          lock_page_memcg(page);
          unlocked_inode_to_wb_begin(inode, &locked);
          ...
          unlocked_inode_to_wb_end(inode, locked);
          unlock_page_memcg(page);
      
      If both inode switch and process memcg migration are both in-flight then
      unlocked_inode_to_wb_end() will unconditionally enable interrupts while
      still holding the lock_page_memcg() irq spinlock.  This suggests the
      possibility of deadlock if an interrupt occurs before unlock_page_memcg().
      
          truncate
          __cancel_dirty_page
          lock_page_memcg
          unlocked_inode_to_wb_begin
          unlocked_inode_to_wb_end
          <interrupts mistakenly enabled>
                                          <interrupt>
                                          end_page_writeback
                                          test_clear_page_writeback
                                          lock_page_memcg
                                          <deadlock>
          unlock_page_memcg
      
      Due to configuration limitations this deadlock is not currently possible
      because we don't mix cgroup writeback (a cgroupv2 feature) and
      memory.move_charge_at_immigrate (a cgroupv1 feature).
      
      If the kernel is hacked to always claim inode switching and memcg
      moving_account, then this script triggers lockup in less than a minute:
      
        cd /mnt/cgroup/memory
        mkdir a b
        echo 1 > a/memory.move_charge_at_immigrate
        echo 1 > b/memory.move_charge_at_immigrate
        (
          echo $BASHPID > a/cgroup.procs
          while true; do
            dd if=/dev/zero of=/mnt/big bs=1M count=256
          done
        ) &
        while true; do
          sync
        done &
        sleep 1h &
        SLEEP=$!
        while true; do
          echo $SLEEP > a/cgroup.procs
          echo $SLEEP > b/cgroup.procs
        done
      
      The deadlock does not seem possible, so it's debatable if there's any
      reason to modify the kernel.  I suggest we should to prevent future
      surprises.  And Wang Long said "this deadlock occurs three times in our
      environment", so there's more reason to apply this, even to stable.
      Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
      see "[PATCH for-4.4] writeback: safer lock nesting"
      https://lkml.org/lkml/2018/4/11/146
      
      Wang Long said "this deadlock occurs three times in our environment"
      
      [gthelen@google.com: v4]
        Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
      [akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
      Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
      Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
      Fixes: 682aa8e1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Reported-by: default avatarWang Long <wanglong19@meituan.com>
      Acked-by: default avatarWang Long <wanglong19@meituan.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>	[v4.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [natechancellor: Applied to 4.4 based on Greg's backport on lkml.org]
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f051f89
    • Amir Goldstein's avatar
      fanotify: fix logic of events on child · 87d7ccbf
      Amir Goldstein authored
      commit 54a307ba upstream.
      
      When event on child inodes are sent to the parent inode mark and
      parent inode mark was not marked with FAN_EVENT_ON_CHILD, the event
      will not be delivered to the listener process. However, if the same
      process also has a mount mark, the event to the parent inode will be
      delivered regadless of the mount mark mask.
      
      This behavior is incorrect in the case where the mount mark mask does
      not contain the specific event type. For example, the process adds
      a mark on a directory with mask FAN_MODIFY (without FAN_EVENT_ON_CHILD)
      and a mount mark with mask FAN_CLOSE_NOWRITE (without FAN_ONDIR).
      
      A modify event on a file inside that directory (and inside that mount)
      should not create a FAN_MODIFY event, because neither of the marks
      requested to get that event on the file.
      
      Fixes: 1968f5ee ("fanotify: use both marks when possible")
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      [natechancellor: Fix small conflict due to lack of 3cd5eca8]
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87d7ccbf
    • wangguang's avatar
      ext4: bugfix for mmaped pages in mpage_release_unused_pages() · a529f29a
      wangguang authored
      commit 4e800c03 upstream.
      
      Pages clear buffers after ext4 delayed block allocation failed,
      However, it does not clean its pte_dirty flag.
      if the pages unmap ,in cording to the pte_dirty ,
      unmap_page_range may try to call __set_page_dirty,
      
      which may lead to the bugon at
      mpage_prepare_extent_to_map:head = page_buffers(page);.
      
      This patch just call clear_page_dirty_for_io to clean pte_dirty
      at mpage_release_unused_pages for pages mmaped.
      
      Steps to reproduce the bug:
      
      (1) mmap a file in ext4
      	addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED,
      	       	            fd, 0);
      	memset(addr, 'i', 4096);
      
      (2) return EIO at
      
      	ext4_writepages->mpage_map_and_submit_extent->mpage_map_one_extent
      
      which causes this log message to be print:
      
                      ext4_msg(sb, KERN_CRIT,
                              "Delayed block allocation failed for "
                              "inode %lu at logical offset %llu with"
                              " max blocks %u with error %d",
                              inode->i_ino,
                              (unsigned long long)map->m_lblk,
                              (unsigned)map->m_len, -err);
      
      (3)Unmap the addr cause warning at
      
      	__set_page_dirty:WARN_ON_ONCE(warn && !PageUptodate(page));
      
      (4) wait for a minute,then bugon happen.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarwangguang <wangguang03@zte.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      [@nathanchance: Resolved conflict from lack of 09cbfeaf]
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a529f29a