1. 16 Aug, 2017 16 commits
  2. 10 Aug, 2017 24 commits
    • Merge branch 'sparc64-M7-memcpy' · fa5dc772
      David S. Miller authored
      Babu Moger says:
      
      ====================
      sparc64: Update memcpy, memset etc. for M7/M8 architectures
      
      This series of patches updates memcpy, memset, copy_to_user, copy_from_user,
      etc. for the SPARC M7/M8 architectures.
      
      The new algorithm takes advantage of the M7/M8 block init store ASIs in a much
      more optimized way to improve performance. More details are in the code comments.
      
      Tested and compared the latency, measured in ticks (NG4memcpy vs. the new M7memcpy).
      
      1. Memset numbers (aligned memset)
      
      No. of bytes   NG4memset       M7memset        Delta ((B-A)/A)*100
                     (Avg. ticks A)  (Avg. ticks B)  (latency reduction)
      3              77              25              -67.53
      7              43              33              -23.25
      32             72              68              -5.55
      128            164             44              -73.17
      256            335             68              -79.70
      512            511             220             -56.94
      1024           1552            627             -59.60
      2048           3515            1322            -62.38
      4096           6303            2472            -60.78
      8192           13118           4867            -62.89
      16384          26206           10371           -60.42
      32768          52501           18569           -64.63
      65536          100219          35899           -64.17
      
      2. Memcpy numbers (aligned memcpy)
      
      No. of bytes   NG4memcpy       M7memcpy        Delta ((B-A)/A)*100
                     (Avg. ticks A)  (Avg. ticks B)  (latency reduction)
      3              20              19              -5.00
      7              29              27              -6.89
      32             30              28              -6.66
      128            89              69              -22.47
      256            142             143              0.70
      512            341             283             -17.00
      1024           1588            655             -58.75
      2048           3553            1357            -61.80
      4096           7218            2590            -64.11
      8192           13701           5231            -61.82
      16384          28304           10716           -62.13
      32768          56516           22995           -59.31
      65536          115443          50840           -55.96
      
      3. Memset numbers (unaligned memset)
      
      No. of bytes   NG4memset       M7memset        Delta ((B-A)/A)*100
                     (Avg. ticks A)  (Avg. ticks B)  (latency reduction)
      3              40              31              -22.5
      7              52              29              -44.2307692308
      32             89              86              -3.3707865169
      128            201             74              -63.184079602
      256            340             154             -54.7058823529
      512            961             335             -65.1404786681
      1024           1799            686             -61.8677042802
      2048           3575            1260            -64.7552447552
      4096           6560            2627            -59.9542682927
      8192           13161           6018            -54.273991338
      16384          26465           10439           -60.5554505951
      32768          52119           18649           -64.2184232238
      65536          101593          35724           -64.8361599717
      
      4. Memcpy numbers (unaligned memcpy)
      
      No. of bytes   NG4memcpy       M7memcpy        Delta ((B-A)/A)*100
                     (Avg. ticks A)  (Avg. ticks B)  (latency reduction)
      3              26              19              -26.9230769231
      7              48              45              -6.25
      32             52              49              -5.7692307692
      128            284             334              17.6056338028
      256            430             482              12.0930232558
      512            646             690              6.8111455108
      1024           1051            1016            -3.3301617507
      2048           1787            1818             1.7347509793
      4096           3309            3376             2.0247809006
      8192           8151            7444            -8.673782358
      16384          34222           34556            0.9759803635
      32768          87851           95044            8.1877269468
      65536          158331          159572           0.7838010244
      
      There is not much difference between NG4memcpy and M7memcpy for
      unaligned copies because both mostly use the same algorithms.
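
The Delta column in the tables above follows the formula given in the headers, ((B-A)/A)*100, where A is the NG4 tick count and B is the M7 tick count. As a minimal sketch of that arithmetic (the function name `latency_delta_percent` is hypothetical and not part of the patch series), two rows can be recomputed like this:

```python
def latency_delta_percent(ticks_a: float, ticks_b: float) -> float:
    """Return ((B - A) / A) * 100; a negative result means B took fewer ticks."""
    if ticks_a <= 0:
        raise ValueError("baseline tick count must be positive")
    return (ticks_b - ticks_a) / ticks_a * 100.0


# Two rows copied from the tables above: (label, NG4 ticks A, M7 ticks B).
rows = [
    ("aligned memset, 128 bytes", 164, 44),
    ("unaligned memcpy, 128 bytes", 284, 334),
]

for label, ticks_a, ticks_b in rows:
    print(f"{label}: {latency_delta_percent(ticks_a, ticks_b):+.2f}%")
```

A negative delta is a latency reduction for the M7 routine; a positive delta means the M7 routine took more ticks than NG4 for that size.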
      
      v2:
       1. Fixed indentation issues found by David Miller
       2. Used ENTRY and ENDPROC for the labels in M7patch.S as suggested by David Miller
       3. M8 now also uses M7memcpy. Also tested on an M8 config.
       4. These patches are created on top of the M8 patches below:
          https://patchwork.ozlabs.org/patch/792661/
          https://patchwork.ozlabs.org/patch/792662/
          However, I did not see these patches in the sparc-next tree. They may be
          queued now. It is possible that this series causes some build problems,
          which will resolve once all M8 patches are in the sparc-next tree.
      
      v0: Initial version
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arch/sparc: Add accurate exception reporting in M7memcpy · 34060b8f
      Babu Moger authored
      Add accurate exception reporting in M7memcpy
      Signed-off-by: Babu Moger <babu.moger@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arch/sparc: Optimized memcpy, memset, copy_to_user, copy_from_user for M7/M8 · b3a04ed5
      Babu Moger authored
      New algorithm that takes advantage of the M7/M8 block init store
      ASI, i.e., overlapping pipelines and miss buffer filling.
      Full details in code comments.
      Signed-off-by: Babu Moger <babu.moger@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arch/sparc: Rename exception handlers · 1ab32693
      Babu Moger authored
      Rename exception handlers to memcpy_xxx as these
      are going to be used by new memcpy routines and these
      handlers are not exclusive to NG4memcpy anymore.
      Signed-off-by: Babu Moger <babu.moger@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arch/sparc: Separate the exception handlers from NG4memcpy · de5c073e
      Babu Moger authored
      Separate the exception handlers from NG4memcpy so that they can be
      used with the new memcpy routines. Make a separate file for all these handlers.
      Signed-off-by: Babu Moger <babu.moger@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: update comments in U3memcpy · 061273f9
      Sam Ravnborg authored
      Update the comments about the ranges that the different
      parts of the code copy; the original comments were wrong.
      
      Introduce a few descriptive labels too.
      
      No functional changes.
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 26273939
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix handling of initial STATE message in TIPC, from Jon Paul Maloy.
      
       2) Fix stats handling in bcm_sysport_get_stats(), from Florian
          Fainelli.
      
       3) Stop rejecting the maximum VNI value, 16777215, in geneve_validate(),
          from Girish Moodalbail.
      
       4) Fix initial IGMP sysctl setting regression, from Nikolay Borisov.
      
       5) Once a UFO fragmented frame is treated as UFO, we should continue
          doing so. Likewise once a frame has been segmented, we should
          continue doing that and not try to convert it to a UFO frame. From
          Willem de Bruijn.
      
       6) Test the AF_PACKET RX/TX ring pg_vec state under the socket lock to
          prevent races. From Willem de Bruijn.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        packet: fix tp_reserve race in packet_set_ring
        udp: consistently apply ufo or fragmentation
        net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
        igmp: Fix regression caused by igmp sysctl namespace code.
        geneve: maximum value of VNI cannot be used
        net: systemport: Fix software statistics for SYSTEMPORT Lite
        tipc: remove premature ESTABLISH FSM event at link synchronization
    • packet: fix tp_reserve race in packet_set_ring · c27927e3
      Willem de Bruijn authored
      Updates to tp_reserve can race with reads of the field in
      packet_set_ring. Avoid this by holding the socket lock during
      updates in setsockopt PACKET_RESERVE.
      
      This bug was discovered by syzkaller.
      
      Fixes: 8913336a ("packet: add PACKET_RESERVE sockopt")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: consistently apply ufo or fragmentation · 85f1bd9a
      Willem de Bruijn authored
      When iteratively building a UDP datagram with MSG_MORE and that
      datagram exceeds MTU, consistently choose UFO or fragmentation.
      
      Once skb_is_gso, always apply ufo. Conversely, once a datagram is
      split across multiple skbs, do not consider ufo.
      
      Sendpage already maintains the first invariant, only add the second.
      IPv6 does not have a sendpage implementation to modify.
      
      A gso skb must have a partial checksum, do not follow sk_no_check_tx
      in udp_send_skb.
      
      Found by syzkaller.
      
      Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Revert 16GB huge page support. · 4d9fbf53
      David S. Miller authored
      It overflows the amount of space available in the initial .text section
      of trap handler assembler in some configurations, resulting in build
      failures.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · f213ad38
      Linus Torvalds authored
      Pull sparc updates from David Miller:
      
       1) Recognize M8 cpus, just basic chip ID matching, from Allen Pais.
      
       2) Prevent crashes when bringing up sunvdc virtual block devices in
          some environments. From Jim Quigley.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain
        sparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8.
        sparc64: recognize and support sparc M8 cpu type
        sparc64: properly name the cpu constants
    • net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target · 96d97030
      Xin Long authored
      Commit 55917a21 ("netfilter: x_tables: add context to know if
      extension runs from nft_compat") introduced a member nft_compat to
      xt_tgchk_param structure.
      
      But it didn't set its value for ipt_init_target. With an unexpected
      value in par.nft_compat, some targets' checkentry may return an
      unexpected result.
      
      This patch sets all its fields to 0 and only initializes the
      non-zero fields in ipt_init_target.
      
      v1->v2:
        Following Wang Cong's suggestion, fix it by setting all its fields to
        0 and only initializing the non-zero fields.
      
      Fixes: 55917a21 ("netfilter: x_tables: add context to know if extension runs from nft_compat")
      Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • igmp: Fix regression caused by igmp sysctl namespace code. · 1714020e
      Nikolay Borisov authored
      Commit dcd87999 ("igmp: net: Move igmp namespace init to correct file")
      moved the igmp sysctls initialization from tcp_sk_init to igmp_net_init. This
      function is only called as part of per-namespace initialization, and only if
      CONFIG_IP_MULTICAST is defined; otherwise the igmp_mc_init() call in ip_init is
      compiled out, causing the igmp pernet ops to not be registered and those sysctls
      to be left initialized with 0. However, there are certain functions, such as
      ip_mc_join_group, which are always compiled and make use of some of those
      sysctls. Let's do a partial revert of the aforementioned commit and move the
      sysctl initialization into inet_init_net, so that they will always have
      sane values.
      
      Fixes: dcd87999 ("igmp: net: Move igmp namespace init to correct file")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=196595
      Reported-by: Gerardo Exequiel Pozzi <vmlinuz386@gmail.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • geneve: maximum value of VNI cannot be used · 04db70d9
      Girish Moodalbail authored
      Geneve's Virtual Network Identifier (VNI) is 24 bits long, so the range
      of values for it would be from 0 to 16777215 (2^24 - 1).  However, one
      cannot create a geneve device with VNI set to 16777215. This patch fixes
      this issue.
      Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
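
The geneve commit message above gives the valid VNI range as 0 to 16777215 (2^24 - 1) but does not show the check that was changed. The following is a minimal sketch of an inclusive range check, with an off-by-one variant shown only as an assumed illustration of how the top value could end up rejected; the names `VNI_BITS`, `VNI_MAX`, `vni_is_valid` and `vni_is_valid_off_by_one` are hypothetical and are not the kernel's identifiers.

```python
VNI_BITS = 24
VNI_MAX = (1 << VNI_BITS) - 1  # 16777215, the largest 24-bit value


def vni_is_valid(vni: int) -> bool:
    """Inclusive check: every 24-bit value, including the maximum, is accepted."""
    return 0 <= vni <= VNI_MAX


def vni_is_valid_off_by_one(vni: int) -> bool:
    """Assumed shape of the mistake: a strict upper bound excludes the maximum."""
    return 0 <= vni < VNI_MAX


for candidate in (0, VNI_MAX - 1, VNI_MAX, VNI_MAX + 1):
    print(candidate, vni_is_valid(candidate), vni_is_valid_off_by_one(candidate))
```

The two functions disagree only for 16777215, which is the value the commit message says could not be used before the fix.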
    • net: systemport: Fix software statistics for SYSTEMPORT Lite · 50ddfbaf
      Florian Fainelli authored
      With SYSTEMPORT Lite we have holes in our statistics layout that make us
      skip over the hardware MIB counters. bcm_sysport_get_stats() was not
      taking that into account, which resulted in reporting 0 for all
      SW-maintained statistics. Fix this by skipping accordingly.
      
      Fixes: 44a4524c ("net: systemport: Add support for SYSTEMPORT Lite")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: remove premature ESTABLISH FSM event at link synchronization · ed43594a
      Jon Paul Maloy authored
      When a link between two nodes comes up, both endpoints will initially
      send out a STATE message to the peer, to increase the probability that
      the peer endpoint also is up when the first traffic message arrives.
      Thereafter, if the establishing link is the second link between two
      nodes, this first "traffic" message is a TUNNEL_PROTOCOL/SYNCH message,
      helping the peer to perform initial synchronization between the two
      links.
      
      However, the initial STATE message may be lost, in which case the SYNCH
      message will be the first one arriving at the peer. This should also
      work, as the SYNCH message itself will be used to take up the link
      endpoint before initializing synchronization.
      
      Unfortunately the code for this case is broken. Currently, the link is
      brought up through a tipc_link_fsm_evt(ESTABLISHED) when a SYNCH
      arrives, whereupon __tipc_node_link_up() is called to distribute the
      link slots and take the link into traffic. However, __tipc_node_link_up()
      itself starts with a test for whether the link is up and, if so,
      returns without action. Clearly, the tipc_link_fsm_evt(ESTABLISHED) call
      is unnecessary, since tipc_node_link_up() itself issues such an
      event, but it is also harmful, since it prevents tipc_node_link_up() from
      performing the rest of its tasks, and the link endpoint in question hence
      is never taken into traffic.
      
      This problem has been exposed when we set up dual links between pre-
      and post-4.4 kernels, because the former ones don't send out the
      initial STATE message described above.
      
      We fix this by removing the unnecessary event call.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
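
The failure described in the tipc commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Link` class and the functions `fsm_establish`, `node_link_up`, `on_synch_before_fix` and `on_synch_after_fix` are hypothetical stand-ins for the ESTABLISH event and the link-up routine, not the kernel code.

```python
from dataclasses import dataclass


@dataclass
class Link:
    established: bool = False  # FSM considers the link endpoint up
    in_traffic: bool = False   # link slot assigned and carrying traffic


def fsm_establish(link: Link) -> None:
    """Stand-in for issuing the ESTABLISH FSM event."""
    link.established = True


def node_link_up(link: Link) -> None:
    """Stand-in for the link-up routine: returns early if already up."""
    if link.established:
        return
    fsm_establish(link)
    link.in_traffic = True


def on_synch_before_fix(link: Link) -> None:
    """SYNCH arrives first: the premature event hides the link from node_link_up."""
    fsm_establish(link)
    node_link_up(link)


def on_synch_after_fix(link: Link) -> None:
    """SYNCH arrives first: only the link-up routine issues the event."""
    node_link_up(link)


before, after = Link(), Link()
on_synch_before_fix(before)
on_synch_after_fix(after)
print("before fix:", before)
print("after fix: ", after)
```

In this model the endpoint handled by `on_synch_before_fix` ends up established but never in traffic, which mirrors the symptom in the commit message; dropping the extra event lets the link-up routine complete.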
    • sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain · 3ee70591
      Jim Quigley authored
      Using mpgroup to define multiple paths for a virtual disk causes multiple
      virtual-device-port ports to be created for that virtual device.
      Each virtual-device-port port then gets a vdisk created for it by the Linux
      sunvdc driver. As mpgroup is not supported by the Linux sunvdc driver it
      cannot handle multiple ports for a single vdisk, leading to a kernel panic
      at startup.
      
      This fix prevents more than one vdisk per virtual-device-port being created
      until full virtual disk multipathing (mpgroup) support is implemented.
      Signed-off-by: Jim Quigley <Jim.Quigley@oracle.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
      Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
      Reviewed-by: Aaron Young <aaron.young@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sparc64-Use-low-latency-path-to-resume-idle-cpu' · 5389e239
      David S. Miller authored
      Vijay Kumar says:
      
      ====================
      sparc64: Use low latency path to resume idle cpu
      
      CPU_POKE is a low latency path to resume the target cpu if suspended
      using CPU_YIELD. Use CPU_POKE to resume cpu if supported by hypervisor.
      
      Hackbench results (lower is better):
      
      Number of
      processes   w/o fix   with fix
      1           0.012     0.010
      10          0.021     0.019
      100         0.151     0.148
      
      Changelog:
      v2:
        - Fixed comments and spacing (2/2)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Use CPU_POKE to resume idle cpu · 8536e02e
      Vijay Kumar authored
      Use CPU_POKE hypervisor call to resume idle cpu if supported.
      Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
      Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Add a new hypercall CPU_POKE · 28d43de7
      Vijay Kumar authored
      This adds a new hypercall CPU_POKE for quickly waking up an idle CPU.
      CPU_POKE should only be sent to valid non-local CPUs.
      Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
      Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
      Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sparc64-Add-16GB-hugepage-support' · 99274b81
      David S. Miller authored
      Nitin Gupta says:
      
      ====================
      sparc64: Add 16GB hugepage support
      
      The SPARC architecture supports 16G hugepages, but the kernel did not
      support them. This patch series adds that support and also cleans
      up some page walk/alloc functions.
      
      Patch 1/3: Core changes needed to add 16G hugepage support: To map a
        single 16G hugepage, two PUD entries are used. Each PUD entry maps
        an 8G portion of a 16G page. This page table encoding scheme is the
        same as that used for hugepages at the PMD level (8M, 256M and 2G
        pages), where each PMD entry points successively to 8M regions
        within a page.  No page table entries below the PUD level are
        allocated for a 16G hugepage since those are not required.
      
        TSB entries for a 16G page are created at every 4M boundary since
        the HUGE_TSB is used for these pages, which is configured with a page
        size of 4M.  When walking page tables (on a TSB miss), bits [32:22]
        are transferred from vaddr to PUD to resolve addresses at a 4M
        boundary. The resolved address mapping is then stored in HUGE_TSB.
      
      Patch 2/3: get_user_pages() etc. are used for direct IO. These
        functions were not aware of hugepages at the PUD level and would try
        to continue walking page tables beyond the PUD level. Since 16G
        hugepages have page tables allocated only down to the PUD level,
        these accesses would be invalid. This patch adds the case
        for PUD huge pages to these functions.
      
      Patch 3/3: Patch 1 added the case of PUD entry being huge in page
        table walk and alloc functions. This new case further increased
        nesting in these functions and made them harder to follow. This
        patch flattens these functions for better readability.
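
The sizes quoted in the Patch 1/3 description above fit together arithmetically: a 16G page spans two 8G PUD entries, and bits [32:22] of the virtual address form an 11-bit index of 4M chunks, which covers exactly 8G. The following is a minimal sketch of that arithmetic only; the constant and function names are hypothetical, and how the kernel actually merges these bits into the PUD value is not shown in the source.

```python
MB = 1 << 20
GB = 1 << 30

HUGEPAGE_SIZE = 16 * GB   # one 16G hugepage
PUD_ENTRY_SPAN = 8 * GB   # portion mapped by each PUD entry
TSB_PAGE_SIZE = 4 * MB    # page size the HUGE_TSB is configured with

pud_entries_per_hugepage = HUGEPAGE_SIZE // PUD_ENTRY_SPAN
tsb_entries_per_hugepage = HUGEPAGE_SIZE // TSB_PAGE_SIZE


def vaddr_bits_32_22(vaddr: int) -> int:
    """Extract bits [32:22] of vaddr: an 11-bit index of the 4M chunk in an 8G span."""
    return (vaddr >> 22) & ((1 << 11) - 1)


def offset_within_pud_span(vaddr: int) -> int:
    """Byte offset of the 4M-aligned chunk inside the 8G span of one PUD entry."""
    return vaddr_bits_32_22(vaddr) * TSB_PAGE_SIZE


print("PUD entries per 16G page:", pud_entries_per_hugepage)
print("4M TSB entries per 16G page:", tsb_entries_per_hugepage)
print("chunk index for vaddr 8G + 4M:", vaddr_bits_32_22(8 * GB + 4 * MB))
```

Because 2^11 chunks of 4M equal 8G, the 11 extracted bits address every 4M boundary within one PUD entry's portion, and bit 33 and above select which PUD entry is used.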
      
      Cc: sparclinux@vger.kernel.org
      
      Changelog v5 vs v4:
       - Checking at PUD level for hugepage entry during page table walk is
         patched out if 16GB hugepages are not being used.
      Changelog v4 vs v3:
       - Added cover letter (patch 0/4) for patch series.
      Changelog v3 vs v2:
       - Fixed email headers so the subject shows up correctly.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Cleanup hugepage table walk functions · 76379784
      Nitin Gupta authored
      Flatten out nested code structure in huge_pte_offset()
      and huge_pte_alloc().
      Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Add 16GB hugepage support · f10bb007
      Nitin Gupta authored
      Adds support for the 16GB hugepage size. To use this page size,
      pass kernel parameters such as:
      
      default_hugepagesz=16G hugepagesz=16G hugepages=10
      
      Testing:
      
      Tested with the stream benchmark, which allocates 48G of
      arrays backed by 16G hugepages and performs read/write operations
      on them in parallel.
      
      Orabug: 25362942
      Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>