1. 24 Apr, 2018 18 commits
  2. 23 Apr, 2018 12 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 77621f02
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains Netfilter/IPVS fixes for your net tree,
      they are:
      
      1) Fix SIP conntrack with phones sending session descriptions for different
         media types but same port numbers, from Florian Westphal.
      
      2) Fix incorrect rtnl_lock mutex logic from IPVS sync thread, from Julian
         Anastasov.
      
      3) Skip compat array allocation in ebtables if there is no entries, also
         from Florian.
      
      4) Do not lose left/right bits when shifting marks from xt_connmark, from
         Jack Ma.
      
      5) Silence false positive memleak in conntrack extensions, from Cong Wang.
      
      6) Fix CONFIG_NF_REJECT_IPV6=m link problems, from Arnd Bergmann.
      
      7) Cannot kfree rule that is already in list in nf_tables, switch order
         so this error handling is not required, from Florian Westphal.
      
      8) Release set name in error path, from Florian.
      
      9) include kmemleak.h in nf_conntrack_extend.c, from Stepheh Rothwell.
      
      10) NAT chain and extensions depend on NF_TABLES.
      
      11) Out of bound access when renaming chains, from Taehee Yoo.
      
      12) Incorrect casting in xt_connmark leads to wrong bitshifting.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      77621f02
    • Eric Dumazet's avatar
      ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy · aa8f8778
      Eric Dumazet authored
      KMSAN reported use of uninit-value that I tracked to lack
      of proper size check on RTA_TABLE attribute.
      
      I also believe RTA_PREFSRC lacks a similar check.
      
      Fixes: 86872cb5 ("[IPv6] route: FIB6 configuration using struct fib6_config")
      Fixes: c3968a85 ("ipv6: RTA_PREFSRC support for ipv6 route source address selection")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aa8f8778
    • Xin Long's avatar
      bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave · ddea788c
      Xin Long authored
      After Commit 8a8efa22 ("bonding: sync netpoll code with bridge"), it
      would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev
      if bond->dev->npinfo was set.
      
      However now slave_dev npinfo is set with bond->dev->npinfo before calling
      slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called
      in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup().
      It causes that the lower dev of this slave dev can't set its npinfo.
      
      One way to reproduce it:
      
        # modprobe bonding
        # brctl addbr br0
        # brctl addif br0 eth1
        # ifconfig bond0 192.168.122.1/24 up
        # ifenslave bond0 eth2
        # systemctl restart netconsole
        # ifenslave bond0 br0
        # ifconfig eth2 down
        # systemctl restart netconsole
      
      The netpoll won't really work.
      
      This patch is to remove that slave_dev npinfo setting in bond_enslave().
      
      Fixes: 8a8efa22 ("bonding: sync netpoll code with bridge")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ddea788c
    • Jann Horn's avatar
      tcp: don't read out-of-bounds opsize · 7e5a206a
      Jann Horn authored
      The old code reads the "opsize" variable from out-of-bounds memory (first
      byte behind the segment) if a broken TCP segment ends directly after an
      opcode that is neither EOL nor NOP.
      
      The result of the read isn't used for anything, so the worst thing that
      could theoretically happen is a pagefault; and since the physmap is usually
      mostly contiguous, even that seems pretty unlikely.
      
      The following C reproducer triggers the uninitialized read - however, you
      can't actually see anything happen unless you put something like a
      pr_warn() in tcp_parse_md5sig_option() to print the opsize.
      
      ====================================
      #define _GNU_SOURCE
      #include <arpa/inet.h>
      #include <stdlib.h>
      #include <errno.h>
      #include <stdarg.h>
      #include <net/if.h>
      #include <linux/if.h>
      #include <linux/ip.h>
      #include <linux/tcp.h>
      #include <linux/in.h>
      #include <linux/if_tun.h>
      #include <err.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <string.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/ioctl.h>
      #include <assert.h>
      
      void systemf(const char *command, ...) {
        char *full_command;
        va_list ap;
        va_start(ap, command);
        if (vasprintf(&full_command, command, ap) == -1)
          err(1, "vasprintf");
        va_end(ap);
        printf("systemf: <<<%s>>>\n", full_command);
        system(full_command);
      }
      
      char *devname;
      
      int tun_alloc(char *name) {
        int fd = open("/dev/net/tun", O_RDWR);
        if (fd == -1)
          err(1, "open tun dev");
        static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI };
        strcpy(req.ifr_name, name);
        if (ioctl(fd, TUNSETIFF, &req))
          err(1, "TUNSETIFF");
        devname = req.ifr_name;
        printf("device name: %s\n", devname);
        return fd;
      }
      
      #define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24))
      
      void sum_accumulate(unsigned int *sum, void *data, int len) {
        assert((len&2)==0);
        for (int i=0; i<len/2; i++) {
          *sum += ntohs(((unsigned short *)data)[i]);
        }
      }
      
      unsigned short sum_final(unsigned int sum) {
        sum = (sum >> 16) + (sum & 0xffff);
        sum = (sum >> 16) + (sum & 0xffff);
        return htons(~sum);
      }
      
      void fix_ip_sum(struct iphdr *ip) {
        unsigned int sum = 0;
        sum_accumulate(&sum, ip, sizeof(*ip));
        ip->check = sum_final(sum);
      }
      
      void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) {
        unsigned int sum = 0;
        struct {
          unsigned int saddr;
          unsigned int daddr;
          unsigned char pad;
          unsigned char proto_num;
          unsigned short tcp_len;
        } fakehdr = {
          .saddr = ip->saddr,
          .daddr = ip->daddr,
          .proto_num = ip->protocol,
          .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4)
        };
        sum_accumulate(&sum, &fakehdr, sizeof(fakehdr));
        sum_accumulate(&sum, tcp, tcp->doff*4);
        tcp->check = sum_final(sum);
      }
      
      int main(void) {
        int tun_fd = tun_alloc("inject_dev%d");
        systemf("ip link set %s up", devname);
        systemf("ip addr add 192.168.42.1/24 dev %s", devname);
      
        struct {
          struct iphdr ip;
          struct tcphdr tcp;
          unsigned char tcp_opts[20];
        } __attribute__((packed)) syn_packet = {
          .ip = {
            .ihl = sizeof(struct iphdr)/4,
            .version = 4,
            .tot_len = htons(sizeof(syn_packet)),
            .ttl = 30,
            .protocol = IPPROTO_TCP,
            /* FIXUP check */
            .saddr = IPADDR(192,168,42,2),
            .daddr = IPADDR(192,168,42,1)
          },
          .tcp = {
            .source = htons(1),
            .dest = htons(1337),
            .seq = 0x12345678,
            .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4,
            .syn = 1,
            .window = htons(64),
            .check = 0 /*FIXUP*/
          },
          .tcp_opts = {
            /* INVALID: trailing MD5SIG opcode after NOPs */
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 19
          }
        };
        fix_ip_sum(&syn_packet.ip);
        fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp);
        while (1) {
          int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet));
          if (write_res != sizeof(syn_packet))
            err(1, "packet write failed");
        }
      }
      ====================================
      
      Fixes: cfb6eeb4 ("[TCP]: MD5 Signature Option (RFC2385) support.")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e5a206a
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 986e54cd
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-04-21
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix a deadlock between mm->mmap_sem and bpf_event_mutex when
         one task is detaching a BPF prog via perf_event_detach_bpf_prog()
         and another one dumping through bpf_prog_array_copy_info(). For
         the latter we move the copy_to_user() out of the bpf_event_mutex
         lock to fix it, from Yonghong.
      
      2) Fix test_sock and test_sock_addr.sh failures. The former was
         hitting rlimit issues and the latter required ping to specify
         the address family, from Yonghong.
      
      3) Remove a dead check in sockmap's sock_map_alloc(), from Jann.
      
      4) Add generated files to BPF kselftests gitignore that were previously
         missed, from Anders.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      986e54cd
    • Thomas Falcon's avatar
      ibmvnic: Clean actual number of RX or TX pools · 660e309d
      Thomas Falcon authored
      Avoid using value stored in the login response buffer when
      cleaning TX and RX buffer pools since these could be inconsistent
      depending on the device state. Instead use the field in the driver's
      private data that tracks the number of active pools.
      Signed-off-by: default avatarThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      660e309d
    • David S. Miller's avatar
      Merge branch 'net-sched-ife-malformed-ife-packet-fixes' · 906cce04
      David S. Miller authored
      Alexander Aring says:
      
      ====================
      net: sched: ife: malformed ife packet fixes
      
      As promised at netdev 2.2 tc workshop I am working on adding scapy support for
      tdc testing. It is still work in progress. I will submit the patches to tdc
      later (they are not in good shape yet). The good news is I have been able to
      find bugs which normal packet testing would not be able to find.
      With fuzzy testing I was able to craft certain malformed packets that IFE
      action was not able to deal with. This patch set fixes those bugs.
      
      changes since v4:
       - use pskb_may_pull before pointer assign
      
      changes since v3:
       - use pskb_may_pull
      
      changes since v2:
       - remove inline from __ife_tlv_meta_valid
       - add const to cast to meta_tlvhdr
       - add acked and reviewed tags
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      906cce04
    • Alexander Aring's avatar
      net: sched: ife: check on metadata length · d57493d6
      Alexander Aring authored
      This patch checks if sk buffer is available to dererence ife header. If
      not then NULL will returned to signal an malformed ife packet. This
      avoids to crashing the kernel from outside.
      Signed-off-by: default avatarAlexander Aring <aring@mojatatu.com>
      Reviewed-by: default avatarYotam Gigi <yotam.gi@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d57493d6
    • Alexander Aring's avatar
      net: sched: ife: handle malformed tlv length · cc74eddd
      Alexander Aring authored
      There is currently no handling to check on a invalid tlv length. This
      patch adds such handling to avoid killing the kernel with a malformed
      ife packet.
      Signed-off-by: default avatarAlexander Aring <aring@mojatatu.com>
      Reviewed-by: default avatarYotam Gigi <yotam.gi@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc74eddd
    • Alexander Aring's avatar
      net: sched: ife: signal not finding metaid · f6cd1453
      Alexander Aring authored
      We need to record stats for received metadata that we dont know how
      to process. Have find_decode_metaid() return -ENOENT to capture this.
      Signed-off-by: default avatarAlexander Aring <aring@mojatatu.com>
      Reviewed-by: default avatarYotam Gigi <yotam.gi@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f6cd1453
    • Doron Roberts-Kedes's avatar
      strparser: Do not call mod_delayed_work with a timeout of LONG_MAX · 7c5aba21
      Doron Roberts-Kedes authored
      struct sock's sk_rcvtimeo is initialized to
      LONG_MAX/MAX_SCHEDULE_TIMEOUT in sock_init_data. Calling
      mod_delayed_work with a timeout of LONG_MAX causes spurious execution of
      the work function. timer->expires is set equal to jiffies + LONG_MAX.
      When timer_base->clk falls behind the current value of jiffies,
      the delta between timer_base->clk and jiffies + LONG_MAX causes the
      expiration to be in the past. Returning early from strp_start_timer if
      timeo == LONG_MAX solves this problem.
      
      Found while testing net/tls_sw recv path.
      
      Fixes: 43a0c675 ("strparser: Stream parser for messages")
      Reviewed-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarDoron Roberts-Kedes <doronrk@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c5aba21
    • Ahmed Abdelsalam's avatar
      ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pkts · a957fa19
      Ahmed Abdelsalam authored
      In case of seg6 in encap mode, seg6_do_srh_encap() calls set_tun_src()
      in order to set the src addr of outer IPv6 header.
      
      The net_device is required for set_tun_src(). However calling ip6_dst_idev()
      on dst_entry in case of IPv4 traffic results on the following bug.
      
      Using just dst->dev should fix this BUG.
      
      [  196.242461] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [  196.242975] PGD 800000010f076067 P4D 800000010f076067 PUD 10f060067 PMD 0
      [  196.243329] Oops: 0000 [#1] SMP PTI
      [  196.243468] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd input_leds glue_helper led_class pcspkr serio_raw mac_hid video autofs4 hid_generic usbhid hid e1000 i2c_piix4 ahci pata_acpi libahci
      [  196.244362] CPU: 2 PID: 1089 Comm: ping Not tainted 4.16.0+ #1
      [  196.244606] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  196.244968] RIP: 0010:seg6_do_srh_encap+0x1ac/0x300
      [  196.245236] RSP: 0018:ffffb2ce00b23a60 EFLAGS: 00010202
      [  196.245464] RAX: 0000000000000000 RBX: ffff8c7f53eea300 RCX: 0000000000000000
      [  196.245742] RDX: 0000f10000000000 RSI: ffff8c7f52085a6c RDI: ffff8c7f41166850
      [  196.246018] RBP: ffffb2ce00b23aa8 R08: 00000000000261e0 R09: ffff8c7f41166800
      [  196.246294] R10: ffffdce5040ac780 R11: ffff8c7f41166828 R12: ffff8c7f41166808
      [  196.246570] R13: ffff8c7f52085a44 R14: ffffffffb73211c0 R15: ffff8c7e69e44200
      [  196.246846] FS:  00007fc448789700(0000) GS:ffff8c7f59d00000(0000) knlGS:0000000000000000
      [  196.247286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  196.247526] CR2: 0000000000000000 CR3: 000000010f05a000 CR4: 00000000000406e0
      [  196.247804] Call Trace:
      [  196.247972]  seg6_do_srh+0x15b/0x1c0
      [  196.248156]  seg6_output+0x3c/0x220
      [  196.248341]  ? prandom_u32+0x14/0x20
      [  196.248526]  ? ip_idents_reserve+0x6c/0x80
      [  196.248723]  ? __ip_select_ident+0x90/0x100
      [  196.248923]  ? ip_append_data.part.50+0x6c/0xd0
      [  196.249133]  lwtunnel_output+0x44/0x70
      [  196.249328]  ip_send_skb+0x15/0x40
      [  196.249515]  raw_sendmsg+0x8c3/0xac0
      [  196.249701]  ? _copy_from_user+0x2e/0x60
      [  196.249897]  ? rw_copy_check_uvector+0x53/0x110
      [  196.250106]  ? _copy_from_user+0x2e/0x60
      [  196.250299]  ? copy_msghdr_from_user+0xce/0x140
      [  196.250508]  sock_sendmsg+0x36/0x40
      [  196.250690]  ___sys_sendmsg+0x292/0x2a0
      [  196.250881]  ? _cond_resched+0x15/0x30
      [  196.251074]  ? copy_termios+0x1e/0x70
      [  196.251261]  ? _copy_to_user+0x22/0x30
      [  196.251575]  ? tty_mode_ioctl+0x1c3/0x4e0
      [  196.251782]  ? _cond_resched+0x15/0x30
      [  196.251972]  ? mutex_lock+0xe/0x30
      [  196.252152]  ? vvar_fault+0xd2/0x110
      [  196.252337]  ? __do_fault+0x1f/0xc0
      [  196.252521]  ? __handle_mm_fault+0xc1f/0x12d0
      [  196.252727]  ? __sys_sendmsg+0x63/0xa0
      [  196.252919]  __sys_sendmsg+0x63/0xa0
      [  196.253107]  do_syscall_64+0x72/0x200
      [  196.253305]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [  196.253530] RIP: 0033:0x7fc4480b0690
      [  196.253715] RSP: 002b:00007ffde9f252f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  196.254053] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007fc4480b0690
      [  196.254331] RDX: 0000000000000000 RSI: 000000000060a360 RDI: 0000000000000003
      [  196.254608] RBP: 00007ffde9f253f0 R08: 00000000002d1e81 R09: 0000000000000002
      [  196.254884] R10: 00007ffde9f250c0 R11: 0000000000000246 R12: 0000000000b22070
      [  196.255205] R13: 20c49ba5e353f7cf R14: 431bde82d7b634db R15: 00007ffde9f278fe
      [  196.255484] Code: a5 0f b6 45 c0 41 88 41 28 41 0f b6 41 2c 48 c1 e0 04 49 8b 54 01 38 49 8b 44 01 30 49 89 51 20 49 89 41 18 48 8b 83 b0 00 00 00 <48> 8b 30 49 8b 86 08 0b 00 00 48 8b 40 20 48 8b 50 08 48 0b 10
      [  196.256190] RIP: seg6_do_srh_encap+0x1ac/0x300 RSP: ffffb2ce00b23a60
      [  196.256445] CR2: 0000000000000000
      [  196.256676] ---[ end trace 71af7d093603885c ]---
      
      Fixes: 8936ef76 ("ipv6: sr: fix NULL pointer dereference when setting encap source address")
      Signed-off-by: default avatarAhmed Abdelsalam <amsalam20@gmail.com>
      Acked-by: default avatarDavid Lebrun <dlebrun@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a957fa19
  3. 22 Apr, 2018 10 commits
    • Cong Wang's avatar
      llc: fix NULL pointer deref for SOCK_ZAPPED · 3a04ce71
      Cong Wang authored
      For SOCK_ZAPPED socket, we don't need to care about llc->sap,
      so we should just skip these refcount functions in this case.
      
      Fixes: f7e43672 ("llc: hold llc_sap before release_sock()")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3a04ce71
    • Ivan Khoronzhuk's avatar
      net: ethernet: ti: cpsw: fix tx vlan priority mapping · 5e391dc5
      Ivan Khoronzhuk authored
      The CPDMA_TX_PRIORITY_MAP in real is vlan pcp field priority mapping
      register and basically replaces vlan pcp field for tagged packets.
      So, set it to be 1:1 mapping. Otherwise, it will cause unexpected
      change of egress vlan tagged packets, like prio 2 -> prio 5.
      
      Fixes: e05107e6 ("net: ethernet: ti: cpsw: add multi queue support")
      Reviewed-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarIvan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5e391dc5
    • Cong Wang's avatar
      llc: delete timers synchronously in llc_sk_free() · b905ef9a
      Cong Wang authored
      The connection timers of an llc sock could be still flying
      after we delete them in llc_sk_free(), and even possibly
      after we free the sock. We could just wait synchronously
      here in case of troubles.
      
      Note, I leave other call paths as they are, since they may
      not have to wait, at least we can change them to synchronously
      when needed.
      
      Also, move the code to net/llc/llc_conn.c, which is apparently
      a better place.
      
      Reported-by: <syzbot+f922284c18ea23a8e457@syzkaller.appspotmail.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b905ef9a
    • Guillaume Nault's avatar
      l2tp: fix {pppol2tp, l2tp_dfs}_seq_stop() in case of seq_file overflow · 5411b618
      Guillaume Nault authored
      Commit 0e0c3fee ("l2tp: hold reference on tunnels printed in pppol2tp proc file")
      assumed that if pppol2tp_seq_stop() was called with non-NULL private
      data (the 'v' pointer), then pppol2tp_seq_start() would not be called
      again. It turns out that this isn't guaranteed, and overflowing the
      seq_file's buffer in pppol2tp_seq_show() is a way to get into this
      situation.
      
      Therefore, pppol2tp_seq_stop() needs to reset pd->tunnel, so that
      pppol2tp_seq_start() won't drop a reference again if it gets called.
      We also have to clear pd->session, because the rest of the code expects
      a non-NULL tunnel when pd->session is set.
      
      The l2tp_debugfs module has the same issue. Fix it in the same way.
      
      Fixes: 0e0c3fee ("l2tp: hold reference on tunnels printed in pppol2tp proc file")
      Fixes: f726214d ("l2tp: hold reference on tunnels printed in l2tp/tunnels debugfs file")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5411b618
    • David S. Miller's avatar
      Merge branch 's390-qeth-fixes' · 353697e6
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2018-04-19
      
      Please apply the following qeth fixes for 4.17. The common theme
      seems to be error handling improvements in various areas of cmd IO.
      
      Patches 1-3 should also go back to stable.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      353697e6
    • Julian Wiedmann's avatar
      s390/qeth: use Read device to query hypervisor for MAC · b7493e91
      Julian Wiedmann authored
      For z/VM NICs, qeth needs to consider which of the three CCW devices in
      an MPC group it uses for requesting a managed MAC address.
      
      On the Base device, the hypervisor returns a default MAC which is
      pre-assigned when creating the NIC (this MAC is also returned by the
      READ MAC primitive). Querying any other device results in the allocation
      of an additional MAC address.
      
      For consistency with READ MAC and to avoid using up more addresses than
      necessary, it is preferable to use the NIC's default MAC. So switch the
      the diag26c over to using a NIC's Read device, which should always be
      identical to the Base device.
      
      Fixes: ec61bd2f ("s390/qeth: use diag26c to get MAC address on L2")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b7493e91
    • Julian Wiedmann's avatar
      s390/qeth: fix request-side race during cmd IO timeout · db71bbbd
      Julian Wiedmann authored
      Submitting a cmd IO request (usually on the WRITE device, but for IDX
      also on the READ device) is currently done with ccw_device_start()
      and a manual timeout in the caller.
      On timeout, the caller cleans up the related resources (eg. IO buffer).
      But 1) the IO might still be active and utilize those resources, and
          2) when the IO completes, qeth_irq() will attempt to clean up the
             same resources again.
      
      Instead of introducing additional resource locking, switch to
      ccw_device_start_timeout() to ensure IO termination after timeout, and
      let the IRQ handler alone deal with cleaning up after a request.
      
      This also removes a stray write->irq_pending reset from
      clear_ipacmd_list(). The routine doesn't terminate any pending IO on
      the WRITE device, so this should be handled properly via IO timeout
      in the IRQ handler.
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      db71bbbd
    • Julian Wiedmann's avatar
      s390/qeth: fix MAC address update sequence · bcacfcbc
      Julian Wiedmann authored
      When changing the MAC address on a L2 qeth device, current code first
      unregisters the old address, then registers the new one.
      If HW rejects the new address (or the IO fails), the device ends up with
      no operable address at all.
      
      Re-order the code flow so that the old address only gets dropped if the
      new address was registered successfully. While at it, add logic to catch
      some corner-cases.
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bcacfcbc
    • Julian Wiedmann's avatar
      s390/qeth: handle failure on workqueue creation · a936b1ef
      Julian Wiedmann authored
      Creating the global workqueue during driver init may fail, deal with it.
      Also, destroy the created workqueue on any subsequent error.
      
      Fixes: 0f54761d ("qeth: Support VEPA mode")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a936b1ef
    • Julian Wiedmann's avatar
      s390/qeth: avoid control IO completion stalls · 901e3f49
      Julian Wiedmann authored
      For control IO, qeth currently tracks the index of the buffer that it
      expects to complete the next IO on each qeth_channel. If the channel
      presents an IRQ while this buffer has not yet completed, no completion
      processing for _any_ completed buffer takes place.
      So if the 'next buffer' is skipped for any sort of reason* (eg. when it
      is released due to error conditions, before the IO is started), the
      buffer obviously won't switch to PROCESSED until it is eventually
      allocated for a _different_ IO and completes.
      Until this happens, all completion processing on that channel stalls
      and pending requests possibly time out.
      
      As a fix, remove the whole 'next buffer' logic and simply process any
      IO buffer right when it completes. A channel will never have more than
      one IO pending, so there's no risk of processing out-of-sequence.
      
      *Note: currently just one location in the code really handles this problem,
             by advancing the 'next' index manually.
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      901e3f49