1. 16 Aug, 2017 3 commits
  2. 10 Aug, 2017 25 commits
    • David S. Miller's avatar
      Merge branch 'sparc64-M7-memcpy' · fa5dc772
      David S. Miller authored
      Babu Moger says:
      
      ====================
      sparc64: Update memcpy, memset etc. for M7/M8 architectures
      
      This series of patches updates the memcpy, memset, copy_to_user, copy_from_user
      etc for SPARC M7/M8 architecture.
      
      New algorithm here takes advantage of the M7/M8 block init store ASIs, with much
      more optimized way to improve the performance. More detail are in code comments.
      
      Tested and compared the latency measured in ticks(NG4memcpy vs new M7memcpy).
      
      1. Memset numbers(Aligned memset)
      
      No.of bytes   NG4memset	   M7memset    	Delta ((B-A)/A)*100
      	     (Avg.Ticks A) (Avg.Ticks B) (latency reduction)
        3		77		25		-67.53
        7		43		33		-23.25
        32		72		68		 -5.55
        128		164		44		-73.17
        256		335		68		-79.70
        512		511		220		-56.94
        1024		1552		627		-59.60
        2048		3515		1322		-62.38
        4096		6303		2472		-60.78
        8192		13118		4867		-62.89
        16384		26206		10371		-60.42
        32768		52501		18569		-64.63
        65536		100219		35899		-64.17
      
      2. Memcpy numbers(Aligned memcpy)
      
      No.of bytes   NG4memcpy	   M7memcpy    	Delta ((B-A)/A)*100
      	     (Avg.Ticks A) (Avg.Ticks B) (latency reduction)
        3		20		19		-5
        7		29		27		-6.89
        32		30		28		-6.66
        128		89		69		-22.47
        256		142		143		 0.70
        512		341		283		-17.00
        1024		1588		655		-58.75
        2048		3553		1357		-61.80
        4096		7218		2590		-64.11
        8192		13701		5231		-61.82
        16384		28304		10716		-62.13
        32768		56516		22995		-59.31
        65536		115443		50840		-55.96
      
      3. Memset numbers(un-aligned memset)
      
      No.of bytes   NG4memset	   M7memset    	Delta ((B-A)/A)*100
      	     (Avg.Ticks A) (Avg.Ticks B) (latency reduction)
        3		40		31		-22.5
        7		52		29		-44.2307692308
        32		89		86		-3.3707865169
        128		201		74		-63.184079602
        256		340		154		-54.7058823529
        512		961		335		-65.1404786681
        1024		1799		686		-61.8677042802
        2048		3575		1260		-64.7552447552
        4096		6560		2627		-59.9542682927
        8192		13161		6018		-54.273991338
        16384		26465		10439		-60.5554505951
        32768		52119		18649		-64.2184232238
        65536		101593		35724		-64.8361599717
      
      4. Memcpy numbers(un-aligned memcpy)
      
      No.of bytes   NG4memcpy	   M7memcpy    	Delta ((B-A)/A)*100
      	     (Avg.Ticks A) (Avg.Ticks B) (latency reduction)
        3		26		19		-26.9230769231
        7		48		45		-6.25
        32		52		49		-5.7692307692
        128		284		334		17.6056338028
        256		430		482		12.0930232558
        512		646		690		6.8111455108
        1024		1051		1016		-3.3301617507
        2048		1787		1818		1.7347509793
        4096		3309		3376		2.0247809006
        8192		8151		7444		-8.673782358
        16384		34222		34556		0.9759803635
        32768		87851		95044		8.1877269468
        65536		158331		159572		0.7838010244
      
      There is not much difference in numbers with Un-aligned copies
      between NG4memcpy and M7memcpy because they both mostly use the
      same algorithems.
      
      v2:
       1. Fixed indentation issues found by David Miller
       2. Used ENTRY and ENDPROC for the labels in M7patch.S as suggested by David Miller
       3. Now M8 also will use M7memcpy. Also tested on M8 config.
       4. These patches are created on top of below M8 patches
          https://patchwork.ozlabs.org/patch/792661/
          https://patchwork.ozlabs.org/patch/792662/
          However, I did not see these patches in sparc-next tree. It may be in queue now.
          It is possible these patches might cause some build problems. It will resolve
          once all M8 patches are in sparc-next tree.
      
      v0: Initial version
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fa5dc772
    • Babu Moger's avatar
      arch/sparc: Add accurate exception reporting in M7memcpy · 34060b8f
      Babu Moger authored
      Add accurate exception reporting in M7memcpy
      Signed-off-by: default avatarBabu Moger <babu.moger@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      34060b8f
    • Babu Moger's avatar
      arch/sparc: Optimized memcpy, memset, copy_to_user, copy_from_user for M7/M8 · b3a04ed5
      Babu Moger authored
      New algorithm that takes advantage of the M7/M8 block init store
      ASI, ie, overlapping pipelines and miss buffer filling.
      Full details in code comments.
      Signed-off-by: default avatarBabu Moger <babu.moger@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b3a04ed5
    • Babu Moger's avatar
      arch/sparc: Rename exception handlers · 1ab32693
      Babu Moger authored
      Rename exception handlers to memcpy_xxx as these
      are going to be used by new memcpy routines and these
      handlers are not exclusive to NG4memcpy anymore.
      Signed-off-by: default avatarBabu Moger <babu.moger@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1ab32693
    • Babu Moger's avatar
      arch/sparc: Separate the exception handlers from NG4memcpy · de5c073e
      Babu Moger authored
      Separate the exception handlers from NG4memcpy so that it can be
      used with new memcpy routines. Make a separate file for all these handlers.
      Signed-off-by: default avatarBabu Moger <babu.moger@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      de5c073e
    • Sam Ravnborg's avatar
      sparc64: update comments in U3memcpy · 061273f9
      Sam Ravnborg authored
      Update comments about the range the different
      parts of the code copies, the original comments were wrong.
      
      Introduce a few descriptive labels too.
      
      No functional changes.
      Signed-off-by: default avatarSam Ravnborg <sam@ravnborg.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      061273f9
    • David S. Miller's avatar
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 26273939
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix handling of initial STATE message in TIPC, from Jon Paul Maloy.
      
       2) Fix stats handling in bcm_sysport_get_stats(), from Florian
          Fainelli.
      
       3) Reject 16777215 VNI value in geneve_validate(), from Girish
          Moodalbail.
      
       4) Fix initial IGMP sysctl setting regression, from Nikolay Borisov.
      
       5) Once a UFO fragmented frame is treated as UFO, we should continue
          doing so. Likewise once a frame has been segmented, we should
          continue doing that and not try to convert it to a UFO frame. From
          Willem de Bruijn.
      
       6) Test the AF_PACKET RX/TX ring pg_vec state under the socket lock to
          prevent races. From Willem de Bruijn.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        packet: fix tp_reserve race in packet_set_ring
        udp: consistently apply ufo or fragmentation
        net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
        igmp: Fix regression caused by igmp sysctl namespace code.
        geneve: maximum value of VNI cannot be used
        net: systemport: Fix software statistics for SYSTEMPORT Lite
        tipc: remove premature ESTABLISH FSM event at link synchronization
      26273939
    • Willem de Bruijn's avatar
      packet: fix tp_reserve race in packet_set_ring · c27927e3
      Willem de Bruijn authored
      Updates to tp_reserve can race with reads of the field in
      packet_set_ring. Avoid this by holding the socket lock during
      updates in setsockopt PACKET_RESERVE.
      
      This bug was discovered by syzkaller.
      
      Fixes: 8913336a ("packet: add PACKET_RESERVE sockopt")
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c27927e3
    • Willem de Bruijn's avatar
      udp: consistently apply ufo or fragmentation · 85f1bd9a
      Willem de Bruijn authored
      When iteratively building a UDP datagram with MSG_MORE and that
      datagram exceeds MTU, consistently choose UFO or fragmentation.
      
      Once skb_is_gso, always apply ufo. Conversely, once a datagram is
      split across multiple skbs, do not consider ufo.
      
      Sendpage already maintains the first invariant, only add the second.
      IPv6 does not have a sendpage implementation to modify.
      
      A gso skb must have a partial checksum, do not follow sk_no_check_tx
      in udp_send_skb.
      
      Found by syzkaller.
      
      Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach")
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      85f1bd9a
    • David S. Miller's avatar
      sparc64: Revert 16GB huge page support. · 4d9fbf53
      David S. Miller authored
      It overflows the amount of space available in the initial .text section
      of trap handler assembler in some configurations, resulting in build
      failures.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d9fbf53
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · f213ad38
      Linus Torvalds authored
      Pull sparc updates from David Miller:
      
       1) Recognize M8 cpus, just basic chip ID matching, from Allen Pais.
      
       2) Prevent crashes when bringing up sunvdc virtual block devices in
          some environments. From Jim Quigley.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain
        sparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8.
        sparc64: recognize and support sparc M8 cpu type
        sparc64: properly name the cpu constants
      f213ad38
    • Xin Long's avatar
      net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target · 96d97030
      Xin Long authored
      Commit 55917a21 ("netfilter: x_tables: add context to know if
      extension runs from nft_compat") introduced a member nft_compat to
      xt_tgchk_param structure.
      
      But it didn't set it's value for ipt_init_target. With unexpected
      value in par.nft_compat, it may return unexpected result in some
      target's checkentry.
      
      This patch is to set all it's fields as 0 and only initialize the
      non-zero fields in ipt_init_target.
      
      v1->v2:
        As Wang Cong's suggestion, fix it by setting all it's fields as
        0 and only initializing the non-zero fields.
      
      Fixes: 55917a21 ("netfilter: x_tables: add context to know if extension runs from nft_compat")
      Suggested-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      96d97030
    • Nikolay Borisov's avatar
      igmp: Fix regression caused by igmp sysctl namespace code. · 1714020e
      Nikolay Borisov authored
      Commit dcd87999 ("igmp: net: Move igmp namespace init to correct file")
      moved the igmp sysctls initialization from tcp_sk_init to igmp_net_init. This
      function is only called as part of per-namespace initialization, only if
      CONFIG_IP_MULTICAST is defined, otherwise igmp_mc_init() call in ip_init is
      compiled out, casuing the igmp pernet ops to not be registerd and those sysctl
      being left initialized with 0. However, there are certain functions, such as
      ip_mc_join_group which are always compiled and make use of some of those
      sysctls. Let's do a partial revert of the aforementioned commit and move the
      sysctl initialization into inet_init_net, that way they will always have
      sane values.
      
      Fixes: dcd87999 ("igmp: net: Move igmp namespace init to correct file")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196595Reported-by: default avatarGerardo Exequiel Pozzi <vmlinuz386@gmail.com>
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1714020e
    • Girish Moodalbail's avatar
      geneve: maximum value of VNI cannot be used · 04db70d9
      Girish Moodalbail authored
      Geneve's Virtual Network Identifier (VNI) is 24 bit long, so the range
      of values for it would be from 0 to 16777215 (2^24 -1).  However, one
      cannot create a geneve device with VNI set to 16777215. This patch fixes
      this issue.
      Signed-off-by: default avatarGirish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      04db70d9
    • Florian Fainelli's avatar
      net: systemport: Fix software statistics for SYSTEMPORT Lite · 50ddfbaf
      Florian Fainelli authored
      With SYSTEMPORT Lite we have holes in our statistics layout that make us
      skip over the hardware MIB counters, bcm_sysport_get_stats() was not
      taking that into account resulting in reporting 0 for all SW-maintained
      statistics, fix this by skipping accordingly.
      
      Fixes: 44a4524c ("net: systemport: Add support for SYSTEMPORT Lite")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      50ddfbaf
    • Jon Paul Maloy's avatar
      tipc: remove premature ESTABLISH FSM event at link synchronization · ed43594a
      Jon Paul Maloy authored
      When a link between two nodes come up, both endpoints will initially
      send out a STATE message to the peer, to increase the probability that
      the peer endpoint also is up when the first traffic message arrives.
      Thereafter, if the establishing link is the second link between two
      nodes, this first "traffic" message is a TUNNEL_PROTOCOL/SYNCH message,
      helping the peer to perform initial synchronization between the two
      links.
      
      However, the initial STATE message may be lost, in which case the SYNCH
      message will be the first one arriving at the peer. This should also
      work, as the SYNCH message itself will be used to take up the link
      endpoint before  initializing synchronization.
      
      Unfortunately the code for this case is broken. Currently, the link is
      brought up through a tipc_link_fsm_evt(ESTABLISHED) when a SYNCH
      arrives, whereupon __tipc_node_link_up() is called to distribute the
      link slots and take the link into traffic. But, __tipc_node_link_up() is
      itself starting with a test for whether the link is up, and if true,
      returns without action. Clearly, the tipc_link_fsm_evt(ESTABLISHED) call
      is unnecessary, since tipc_node_link_up() is itself issuing such an
      event, but also harmful, since it inhibits tipc_node_link_up() to
      perform the test of its tasks, and the link endpoint in question hence
      is never taken into traffic.
      
      This problem has been exposed when we set up dual links between pre-
      and post-4.4 kernels, because the former ones don't send out the
      initial STATE message described above.
      
      We fix this by removing the unnecessary event call.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ed43594a
    • Jim Quigley's avatar
      sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain · 3ee70591
      Jim Quigley authored
      Using mpgroup to define multiple paths for a virtual disk causes multiple
      virtual-device-port ports to be created for that virtual device.
      Each virtual-device-port port then gets a vdisk created for it by the Linux
      sunvdc driver. As mpgroup is not supported by the Linux sunvdc driver it
      cannot handle multiple ports for a single vdisk, leading to a kernel panic
      at startup.
      
      This fix prevents more than one vdisk per virtual-device-port being created
      until full virtual disk multipathing (mpgroup) support is implemented.
      Signed-off-by: default avatarJim Quigley <Jim.Quigley@oracle.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Reviewed-by: default avatarShannon Nelson <shannon.nelson@oracle.com>
      Reviewed-by: default avatarAlexandre Chartre <alexandre.chartre@oracle.com>
      Reviewed-by: default avatarAaron Young <aaron.young@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3ee70591
    • David S. Miller's avatar
      Merge branch 'sparc64-Use-low-latency-path-to-resume-idle-cpu' · 5389e239
      David S. Miller authored
      Vijay Kumar says:
      
      ====================
      sparc64: Use low latency path to resume idle cpu
      
      CPU_POKE is a low latency path to resume the target cpu if suspended
      using CPU_YIELD. Use CPU_POKE to resume cpu if supported by hypervisor.
      
      	     Hackbench results (lower is better):
      Number of
      Process:		w/o fix		with fix
      1  			0.012		 0.010
      10			0.021		 0.019
      100			0.151		 0.148
      
      Changelog:
      v2:
        - Fixed comments and spacing (2/2)
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5389e239
    • Vijay Kumar's avatar
      sparc64: Use CPU_POKE to resume idle cpu · 8536e02e
      Vijay Kumar authored
      Use CPU_POKE hypervisor call to resume idle cpu if supported.
      Signed-off-by: default avatarVijay Kumar <vijay.ac.kumar@oracle.com>
      Reviewed-by: default avatarAnthony Yznaga <anthony.yznaga@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8536e02e
    • Vijay Kumar's avatar
      sparc64: Add a new hypercall CPU_POKE · 28d43de7
      Vijay Kumar authored
      This adds a new hypercall CPU_POKE for quickly waking up an idle CPU.
      CPU_POKE should only be sent to valid non-local CPUs.
      Signed-off-by: default avatarRob Gardner <rob.gardner@oracle.com>
      Signed-off-by: default avatarVijay Kumar <vijay.ac.kumar@oracle.com>
      Reviewed-by: default avatarAnthony Yznaga <anthony.yznaga@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28d43de7
    • David S. Miller's avatar
      Merge branch 'sparc64-Add-16GB-hugepage-support' · 99274b81
      David S. Miller authored
      Nitin Gupta says:
      
      ====================
      sparc64: Add 16GB hugepage support
      
      SPARC architecture supports 16G hugepages but the kernel did not
      support these. This patch series adds support for it and also cleanes
      up some page walk/alloc functions.
      
      Patch 1/3: Core changes needed to add 16G hugepage support: To map a
        single 16G hugepage, two PUD entries are used. Each PUD entry maps
        8G portion of a 16G page. This page table encoding scheme is same as
        that used for hugepages at PMD level (8M, 256M and 2G pages) where
        each PMD entry points successively to 8M regions within a page.  No
        page table entries below the PUD level are allocated for 16G
        hugepage since those are not required.
      
        TSB entries for a 16G page are created at every 4M boundary since
        the HUGE_TSB is used for these pages which is configured with page
        size of 4M.  When walking page tables (on a TSB miss), bits [32:22]
        are transferred from vaddr to PUD to resolve addresses at 4M
        boundary. The resolved address mapping is then stored in HUGE_TSB.
      
      Patch 2/3: get_user_pages() etc. are used for direct IO. These
        functions were not aware of hugepages at the PUD level and would try
        to continue walking page tables beyond the PUD level. Since 16G
        hugepages have page tables allocated till PUD level only, these
        accesses would result in invalid access. This patch adds the case
        for PUD huge pages to these functions.
      
      Patch 3/3: Patch 1 added the case of PUD entry being huge in page
        table walk and alloc functions. This new case further increased
        nesting in these functions and made them harder to follow. This
        patch flattens these functions for better readability.
      
      Cc: sparclinux@vger.kernel.org
      
      Changelog v5 vs v4:
       - Checking at PUD level for hugepage entry during page table walk is
         patched out if 16GB hugepages are not being used.
      Changelog v4 vs v3:
       - Added cover letter (patch 0/4) for patch series.
      Changelog v3 vs v2:
       - Fixed email headers so the subject shows up correctly.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      99274b81
    • Nitin Gupta's avatar
      sparc64: Cleanup hugepage table walk functions · 76379784
      Nitin Gupta authored
      Flatten out nested code structure in huge_pte_offset()
      and huge_pte_alloc().
      Signed-off-by: default avatarNitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      76379784
    • Nitin Gupta's avatar
      sparc64: Add 16GB hugepage support · f10bb007
      Nitin Gupta authored
      Adds support for 16GB hugepage size. To use this page size
      use kernel parameters as:
      
      default_hugepagesz=16G hugepagesz=16G hugepages=10
      
      Testing:
      
      Tested with the stream benchmark which allocates 48G of
      arrays backed by 16G hugepages and does RW operation on
      them in parallel.
      
      Orabug: 25362942
      Signed-off-by: default avatarNitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f10bb007
    • Nitin Gupta's avatar
      sparc64: Support huge PUD case in get_user_pages · c9a844c5
      Nitin Gupta authored
      get_user_pages() is used to do direct IO. It already
      handles the case where the address range is backed
      by PMD huge pages. This patch now adds the case where
      the range could be backed by PUD huge pages.
      Signed-off-by: default avatarNitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c9a844c5
  3. 09 Aug, 2017 12 commits
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 8d31f80e
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "These are the pin control fixes I have gathered since the return from
        my vacation. They boiled in -next a while so let's get them in.
      
        Apart from the documentation build it is purely driver fixes. Which is
        nice. The Intel fixes seem kind of important.
      
         - Fix the documentation build as the docs were moved
      
         - Correct the UART pin list on the Intel Merrifield
      
         - Fix pin assignment and number of pins on the Marvell Armada 37xx
           pin controller
      
         - Cover the Setzer models in the Chromebook DMI quirk in the Intel
           cheryview driver so they start working
      
         - Add the missing "sim" function to the sunxi driver
      
         - Fix USB pin definitions on Uniphier Pro4
      
         - Smatch fix for invalid reference in the zx pin control driver"
      
      * tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: generic: update references to Documentation/pinctrl.txt
        pinctrl: intel: merrifield: Correct UART pin lists
        pinctrl: armada-37xx: Fix number of pin in south bridge
        pinctrl: armada-37xx: Fix the pin 23 on south bridge
        pinctrl: cherryview: Add Setzer models to the Chromebook DMI quirk
        pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
        pinctrl: uniphier: fix USB3 pin assignment for Pro4
        pinctrl: zte: fix dereference of 'data' in zx_set_mux()
      8d31f80e
    • Mel Gorman's avatar
      futex: Remove unnecessary warning from get_futex_key · 48fb6f4d
      Mel Gorman authored
      Commit 65d8fc77 ("futex: Remove requirement for lock_page() in
      get_futex_key()") removed an unnecessary lock_page() with the
      side-effect that page->mapping needed to be treated very carefully.
      
      Two defensive warnings were added in case any assumption was missed and
      the first warning assumed a correct application would not alter a
      mapping backing a futex key.  Since merging, it has not triggered for
      any unexpected case but Mark Rutland reported the following bug
      triggering due to the first warning.
      
        kernel BUG at kernel/futex.c:679!
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
        Hardware name: linux,dummy-virt (DT)
        task: ffff80001e271780 task.stack: ffff000010908000
        PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145
      
      The fact that it's a bug instead of a warning was due to an unrelated
      arm64 problem, but the warning itself triggered because the underlying
      mapping changed.
      
      This is an application issue but from a kernel perspective it's a
      recoverable situation and the warning is unnecessary so this patch
      removes the warning.  The warning may potentially be triggered with the
      following test program from Mark although it may be necessary to adjust
      NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
      system.
      
          #include <linux/futex.h>
          #include <pthread.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <sys/mman.h>
          #include <sys/syscall.h>
          #include <sys/time.h>
          #include <unistd.h>
      
          #define NR_FUTEX_THREADS 16
          pthread_t threads[NR_FUTEX_THREADS];
      
          void *mem;
      
          #define MEM_PROT  (PROT_READ | PROT_WRITE)
          #define MEM_SIZE  65536
      
          static int futex_wrapper(int *uaddr, int op, int val,
                                   const struct timespec *timeout,
                                   int *uaddr2, int val3)
          {
              syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
          }
      
          void *poll_futex(void *unused)
          {
              for (;;) {
                  futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
              }
          }
      
          int main(int argc, char *argv[])
          {
              int i;
      
              mem = mmap(NULL, MEM_SIZE, MEM_PROT,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      
              printf("Mapping @ %p\n", mem);
      
              printf("Creating futex threads...\n");
      
              for (i = 0; i < NR_FUTEX_THREADS; i++)
                  pthread_create(&threads[i], NULL, poll_futex, NULL);
      
              printf("Flipping mapping...\n");
              for (;;) {
                  mmap(mem, MEM_SIZE, MEM_PROT,
                       MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
              }
      
              return 0;
          }
      Reported-and-tested-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org # 4.7+
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      48fb6f4d
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 358f8c26
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "The main thing is to allow empty id_tables for ACPI to make some
        drivers get probed again. It looks a bit bigger than usual because it
        needs some internal renaming, too.
      
        Other than that, there is a fix for broken DSTDs, a super simple
        enablement for ARM MPS, and two documentation fixes which I'd like to
        see in v4.13 already"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: rephrase explanation of I2C_CLASS_DEPRECATED
        i2c: allow i2c-versatile for ARM MPS platforms
        i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz
        i2c: designware: Print clock freq on invalid clock freq error
        i2c: core: Allow empty id_table in ACPI case as well
        i2c: mux: pinctrl: mention correct module name in Kconfig help text
      358f8c26
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 31cf92f3
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Three patches that should go into this release.
      
        Two of them are from Paolo and fix up some corner cases with BFQ, and
        the last patch is from Ming and fixes up a potential usage count
        imbalance regression due to the recent NOWAIT work"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed
        block, bfq: consider also in_service_entity to state whether an entity is active
        block, bfq: reset in_service_entity if it becomes idle
      31cf92f3
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · d555eb6b
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "Fix two regressions in the inside-secure driver with respect to
        hmac(sha1)"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: inside-secure - fix the sha state length in hmac_sha1_setkey
        crypto: inside-secure - fix invalidation check in hmac_sha1_setkey
      d555eb6b
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4530cca1
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "The pull requests are getting smaller, that's progress I suppose :-)
      
         1) Fix infinite loop in CIPSO option parsing, from Yujuan Qi.
      
         2) Fix remote checksum handling in VXLAN and GUE tunneling drivers,
            from Koichiro Den.
      
         3) Missing u64_stats_init() calls in several drivers, from Florian
            Fainelli.
      
         4) TCP can set the congestion window to an invalid ssthresh value
            after congestion window reductions, from Yuchung Cheng.
      
         5) Fix BPF jit branch generation on s390, from Daniel Borkmann.
      
         6) Correct MIPS ebpf JIT merge, from David Daney.
      
         7) Correct byte order test in BPF test_verifier.c, from Daniel
            Borkmann.
      
         8) Fix various crashes and leaks in ASIX driver, from Dean Jenkins.
      
         9) Handle SCTP checksums properly in mlx4 driver, from Davide
            Caratti.
      
        10) We can potentially enter tcp_connect() with a cached route
            already, due to fastopen, so we have to explicitly invalidate it.
      
        11) skb_warn_bad_offload() can bark in legitimate situations, fix from
            Willem de Bruijn"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
        net: avoid skb_warn_bad_offload false positives on UFO
        qmi_wwan: fix NULL deref on disconnect
        ppp: fix xmit recursion detection on ppp channels
        rds: Reintroduce statistics counting
        tcp: fastopen: tcp_connect() must refresh the route
        net: sched: set xt_tgchk_param par.net properly in ipt_init_target
        net: dsa: mediatek: add adjust link support for user ports
        net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
        qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()'
        hysdn: fix to a race condition in put_log_buffer
        s390/qeth: fix L3 next-hop in xmit qeth hdr
        asix: Fix small memory leak in ax88772_unbind()
        asix: Ensure asix_rx_fixup_info members are all reset
        asix: Add rx->ax_skb = NULL after usbnet_skb_return()
        bpf: fix selftest/bpf/test_pkt_md_access on s390x
        netvsc: fix race on sub channel creation
        bpf: fix byte order test in test_verifier
        xgene: Always get clk source, but ignore if it's missing for SGMII ports
        MIPS: Add missing file for eBPF JIT.
        bpf, s390: fix build for libbpf and selftest suite
        ...
      4530cca1
    • Willem de Bruijn's avatar
      net: avoid skb_warn_bad_offload false positives on UFO · 8d63bee6
      Willem de Bruijn authored
      skb_warn_bad_offload triggers a warning when an skb enters the GSO
      stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
      checksum offload set.
      
      Commit b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      observed that SKB_GSO_DODGY producers can trigger the check and
      that passing those packets through the GSO handlers will fix it
      up. But, the software UFO handler will set ip_summed to
      CHECKSUM_NONE.
      
      When __skb_gso_segment is called from the receive path, this
      triggers the warning again.
      
      Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
      Tx these two are equivalent. On Rx, this better matches the
      skb state (checksum computed), as CHECKSUM_NONE here means no
      checksum computed.
      
      See also this thread for context:
      http://patchwork.ozlabs.org/patch/799015/
      
      Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d63bee6
    • Bjørn Mork's avatar
      qmi_wwan: fix NULL deref on disconnect · bbae08e5
      Bjørn Mork authored
      qmi_wwan_disconnect is called twice when disconnecting devices with
      separate control and data interfaces.  The first invocation will set
      the interface data to NULL for both interfaces to flag that the
      disconnect has been handled.  But the matching NULL check was left
      out when qmi_wwan_disconnect was added, resulting in this oops:
      
        usb 2-1.4: USB disconnect, device number 4
        qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister 'qmi_wwan' usb-0000:00:1d.0-1.4, WWAN/QMI device
        BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
        IP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
        PGD 0
        P4D 0
        Oops: 0000 [#1] SMP
        Modules linked in: <stripped irrelevant module list>
        CPU: 2 PID: 33 Comm: kworker/2:1 Tainted: G            E   4.12.3-nr44-normandy-r1500619820+ #1
        Hardware name: LENOVO 4291LR7/4291LR7, BIOS CBET4000 4.6-810-g50522254fb 07/21/2017
        Workqueue: usb_hub_wq hub_event [usbcore]
        task: ffff8c882b716040 task.stack: ffffb8e800d84000
        RIP: 0010:qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
        RSP: 0018:ffffb8e800d87b38 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff8c8824f3f1d0 RDI: ffff8c8824ef6400
        RBP: ffff8c8824ef6400 R08: 0000000000000000 R09: 0000000000000000
        R10: ffffb8e800d87780 R11: 0000000000000011 R12: ffffffffc07ea0e8
        R13: ffff8c8824e2e000 R14: ffff8c8824e2e098 R15: 0000000000000000
        FS:  0000000000000000(0000) GS:ffff8c8835300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000000000e0 CR3: 0000000229ca5000 CR4: 00000000000406e0
        Call Trace:
         ? usb_unbind_interface+0x71/0x270 [usbcore]
         ? device_release_driver_internal+0x154/0x210
         ? qmi_wwan_unbind+0x6d/0xc0 [qmi_wwan]
         ? usbnet_disconnect+0x6c/0xf0 [usbnet]
         ? qmi_wwan_disconnect+0x87/0xc0 [qmi_wwan]
         ? usb_unbind_interface+0x71/0x270 [usbcore]
         ? device_release_driver_internal+0x154/0x210
      Reported-and-tested-by: default avatarNathaniel Roach <nroach44@gmail.com>
      Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
      Cc: Daniele Palmas <dnlplm@gmail.com>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bbae08e5
    • Guillaume Nault's avatar
      ppp: fix xmit recursion detection on ppp channels · 0a0e1a85
      Guillaume Nault authored
      Commit e5dadc65 ("ppp: Fix false xmit recursion detect with two ppp
      devices") dropped the xmit_recursion counter incrementation in
      ppp_channel_push() and relied on ppp_xmit_process() for this task.
      But __ppp_channel_push() can also send packets directly (using the
      .start_xmit() channel callback), in which case the xmit_recursion
      counter isn't incremented anymore. If such packets get routed back to
      the parent ppp unit, ppp_xmit_process() won't notice the recursion and
      will call ppp_channel_push() on the same channel, effectively creating
      the deadlock situation that the xmit_recursion mechanism was supposed
      to prevent.
      
      This patch re-introduces the xmit_recursion counter incrementation in
      ppp_channel_push(). Since the xmit_recursion variable is now part of
      the parent ppp unit, incrementation is skipped if the channel doesn't
      have any. This is fine because only packets routed through the parent
      unit may enter the channel recursively.
      
      Finally, we have to ensure that pch->ppp is not going to be modified
      while executing ppp_channel_push(). Instead of taking this lock only
      while calling ppp_xmit_process(), we now have to hold it for the full
      ppp_channel_push() execution. This respects the ppp locks ordering
      which requires locking ->upl before ->downl.
      
      Fixes: e5dadc65 ("ppp: Fix false xmit recursion detect with two ppp devices")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0a0e1a85
    • Håkon Bugge's avatar
      rds: Reintroduce statistics counting · 05bfd7db
      Håkon Bugge authored
      In commit 7e3f2952 ("rds: don't let RDS shutdown a connection
      while senders are present"), refilling the receive queue was removed
      from rds_ib_recv(), along with the increment of
      s_ib_rx_refill_from_thread.
      
      Commit 73ce4317 ("RDS: make sure we post recv buffers")
      re-introduces filling the receive queue from rds_ib_recv(), but does
      not add the statistics counter. rds_ib_recv() was later renamed to
      rds_ib_recv_path().
      
      This commit reintroduces the statistics counting of
      s_ib_rx_refill_from_thread and s_ib_rx_refill_from_cq.
      Signed-off-by: default avatarHåkon Bugge <haakon.bugge@oracle.com>
      Reviewed-by: default avatarKnut Omang <knut.omang@oracle.com>
      Reviewed-by: default avatarWei Lin Guay <wei.lin.guay@oracle.com>
      Reviewed-by: default avatarShamir Rabinovitch <shamir.rabinovitch@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      05bfd7db
    • Eric Dumazet's avatar
      tcp: fastopen: tcp_connect() must refresh the route · 8ba60924
      Eric Dumazet authored
      With new TCP_FASTOPEN_CONNECT socket option, there is a possibility
      to call tcp_connect() while socket sk_dst_cache is either NULL
      or invalid.
      
       +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
       +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
       +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
       +0 connect(4, ..., ...) = 0
      
      << sk->sk_dst_cache becomes obsolete, or even set to NULL >>
      
       +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000
      
      We need to refresh the route otherwise bad things can happen,
      especially when syzkaller is running on the host :/
      
      Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ba60924
    • Xin Long's avatar
      net: sched: set xt_tgchk_param par.net properly in ipt_init_target · ec0acb09
      Xin Long authored
      Now xt_tgchk_param par in ipt_init_target is a local varibale,
      par.net is not initialized there. Later when xt_check_target
      calls target's checkentry in which it may access par.net, it
      would cause kernel panic.
      
      Jaroslav found this panic when running:
      
        # ip link add TestIface type dummy
        # tc qd add dev TestIface ingress handle ffff:
        # tc filter add dev TestIface parent ffff: u32 match u32 0 0 \
          action xt -j CONNMARK --set-mark 4
      
      This patch is to pass net param into ipt_init_target and set
      par.net with it properly in there.
      
      v1->v2:
        As Wang Cong pointed, I missed ipt_net_id != xt_net_id, so fix
        it by also passing net_id to __tcf_ipt_init.
      v2->v3:
        Missed the fixes tag, so add it.
      
      Fixes: ecb2421b ("netfilter: add and use nf_ct_netns_get/put")
      Reported-by: default avatarJaroslav Aster <jaster@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ec0acb09