1. 11 Nov, 2018 3 commits
    • act_mirred: clear skb->tstamp on redirect · 7236ead1
      Eric Dumazet authored
      If sch_fq is used at ingress, skbs that might have been
      timestamped by net_timestamp_set() (if a packet capture
      is requesting timestamps) could be delayed by an arbitrary
      amount of time, since the sch_fq time base is MONOTONIC.
      
      Fix this problem by moving code from sch_netem.c to act_mirred.c.
      
      Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Fix clearing of stats counters · a9049ff9
      Andrew Lunn authored
      The mv88e6161 would sometimes fail to probe with a timeout waiting for
      the switch to complete an operation. This operation is supposed to
      clear the statistics counters. However, because of a read/modify/write
      without the needed mask, the operation actually carried out was
      effectively random, with invalid parameters, resulting in the switch
      not responding. We need to preserve the histogram mode bits, so apply
      a mask to keep them.

      Reported-by: Chris Healy <Chris.Healy@zii.aero>
      Fixes: 40cff8fc ("net: dsa: mv88e6xxx: Fix stats histogram mode")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
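The read/modify/write problem described in the mv88e6xxx commit can be sketched in a few lines of userspace C. The register layout, bit positions, and function names here are hypothetical and chosen only for illustration; they are not taken from the driver or the switch datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
#define STATS_OP_BUSY      0x8000u /* operation in progress */
#define STATS_OP_FLUSH_ALL 0x1000u /* opcode: clear all counters */
#define STATS_OP_HIST_MASK 0x0c00u /* histogram mode bits to preserve */

/* Buggy variant: ORs the opcode into whatever the register last held,
 * so stale opcode or parameter bits leak into the new command. */
static uint16_t stats_flush_cmd_unmasked(uint16_t reg)
{
	return (uint16_t)(reg | STATS_OP_BUSY | STATS_OP_FLUSH_ALL);
}

/* Fixed variant: keep only the histogram mode bits, then add the opcode. */
static uint16_t stats_flush_cmd_masked(uint16_t reg)
{
	reg &= STATS_OP_HIST_MASK;
	return (uint16_t)(reg | STATS_OP_BUSY | STATS_OP_FLUSH_ALL);
}
```

With a previous register value of 0x0c3f, the unmasked variant produces 0x9c3f (stale low bits included), while the masked variant produces 0x9c00.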
    • tipc: fix link re-establish failure · 7ab412d3
      Jon Maloy authored
      When a link failure is detected locally, the link is reset, the flag
      link->in_session is set to false, and a RESET_MSG with the 'stopping'
      bit set is sent to the peer.
      
      The purpose of this bit is to inform the peer that this endpoint is
      just going down, and that the peer should handle the reception of this
      particular RESET message as a local failure. This forces the peer to
      accept another RESET or ACTIVATE message from this endpoint before it
      can re-establish the link. This again is necessary to ensure that
      link session numbers are properly exchanged before the link comes up
      again.
      
      If a failure is detected locally at the peer endpoint at the same time,
      the peer will do the same, which is also correct behavior.
      
      However, when receiving such messages, the endpoints will not
      distinguish between 'stopping' RESETs and ordinary ones when it comes
      to updating session numbers. Both endpoints will copy the received
      session number and set their 'in_session' flags to true at the
      reception, while they are still expecting another RESET from the
      peer before they can go ahead and re-establish. This is contradictory,
      since, after applying the validation check referred to below, the
      'in_session' flag will cause rejection of all such messages, and the
      link will never come up again.
      
      We now fix this by not only handling received RESET/STOPPING messages
      as a local failure, but also by omitting to set a new session number
      and the 'in_session' flag in such cases.
      
      Fixes: 7ea817f4 ("tipc: check session number before accepting link protocol messages")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
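The behavior the tipc fix describes can be sketched as a tiny state model. This is not the TIPC code: the struct, its fields (apart from the 'in_session' name used in the commit message), and the function are hypothetical, and the model assumes that treating a message as a local failure clears 'in_session', as the first paragraph of the commit message says happens on a link reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of one link endpoint. */
struct link_model {
	bool in_session;       /* true once a peer session number is accepted */
	uint16_t peer_session; /* last accepted peer session number */
	bool awaiting_reset;   /* another RESET/ACTIVATE is needed before link-up */
};

/* Handle a received RESET message. */
static void link_rcv_reset(struct link_model *l, uint16_t session, bool stopping)
{
	if (stopping) {
		/* Treat as a local failure: do not adopt the session number
		 * and do not mark the session as established. */
		l->in_session = false;
		l->awaiting_reset = true;
		return;
	}
	/* Ordinary RESET: adopt the peer's session number. */
	l->peer_session = session;
	l->in_session = true;
	l->awaiting_reset = false;
}
```

In this model a 'stopping' RESET leaves the stored session number untouched, so the follow-up RESET from the peer is not rejected by the session check.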
  2. 10 Nov, 2018 4 commits
    • net: sched: cls_flower: validate nested enc_opts_policy to avoid warning · 63c82997
      Jakub Kicinski authored
      TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can currently
      only contain further nested attributes, which are parsed by hand, so
      the policy is never actually used, resulting in a W=1 build warning:
      
      net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=]
       enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = {
      
      Add the validation anyway to avoid potential bugs when other
      attributes are added and to make the attribute structure slightly
      clearer.  Validation will also set extack to point to the bad
      attribute on error.
      
      Fixes: 0a6e7778 ("net/sched: allow flower to match tunnel options")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: correct typo · fbd1d524
      Alexandre Belloni authored
      The reserved variable should be named reserved1.

      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: do not dissect l4 ports for fragments · 62230715
      배석진 authored
      Only the first fragment has the sport/dport information,
      not the following ones.
      
      If we want a consistent hash for all fragments, we need to
      ignore ports even for the first fragment.
      
      This bug is visible for IPv6 traffic, if incoming fragments
      do not have a flow label, since skb_get_hash() will give
      different results for the first fragment and the following ones.
      
      It is also visible if any routing rule wants dissection
      of sport or dport.
      
      See commit 5e5d6fed ("ipv6: route: dissect flow
      in input path if fib rules need it") for details.
      
      [edumazet] rewrote the changelog completely.
      
      Fixes: 06635a35 ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
      Signed-off-by: 배석진 <soukjin.bae@samsung.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
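The idea behind the flow_dissector change, that every fragment of a datagram should hash identically, can be sketched as follows. The packet struct and the toy hash are hypothetical and stand in for the real dissector and skb_get_hash(); only the rule "ignore L4 ports for any fragment, including the first" comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the fields a flow hash might use. */
struct pkt_model {
	uint32_t saddr, daddr;
	uint16_t sport, dport; /* only meaningful in the first fragment */
	bool is_fragment;      /* true for every fragment, first included */
};

/* Toy hash for illustration; the kernel uses a different function. */
static uint32_t flow_hash(const struct pkt_model *p)
{
	uint32_t h = p->saddr * 31u + p->daddr;

	/* Skip ports for all fragments so first and later ones agree. */
	if (!p->is_fragment)
		h = h * 31u + (((uint32_t)p->sport << 16) | p->dport);
	return h;
}
```

A first fragment carrying ports 1234/80 and a later fragment carrying no ports then land in the same flow, while unfragmented packets are still distinguished by port.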
    • net: qualcomm: rmnet: Fix incorrect assignment of real_dev · d02854dc
      Subash Abhinov Kasiviswanathan authored
      A null dereference was observed when a sysctl was being set
      from userspace and rmnet was stuck trying to complete some actions
      in the NETDEV_REGISTER callback. This is because the real_dev is set
      only after the device registration handler completes.
      
      sysctl call stack -
      
      <6> Unable to handle kernel NULL pointer dereference at
          virtual address 00000108
      <2> pc : rmnet_vnd_get_iflink+0x1c/0x28
      <2> lr : dev_get_iflink+0x2c/0x40
      <2>  rmnet_vnd_get_iflink+0x1c/0x28
      <2>  inet6_fill_ifinfo+0x15c/0x234
      <2>  inet6_ifinfo_notify+0x68/0xd4
      <2>  ndisc_ifinfo_sysctl_change+0x1b8/0x234
      <2>  proc_sys_call_handler+0xac/0x100
      <2>  proc_sys_write+0x3c/0x4c
      <2>  __vfs_write+0x54/0x14c
      <2>  vfs_write+0xcc/0x188
      <2>  SyS_write+0x60/0xc0
      <2>  el0_svc_naked+0x34/0x38
      
      device register call stack -
      
      <2>  notifier_call_chain+0x84/0xbc
      <2>  raw_notifier_call_chain+0x38/0x48
      <2>  call_netdevice_notifiers_info+0x40/0x70
      <2>  call_netdevice_notifiers+0x38/0x60
      <2>  register_netdevice+0x29c/0x3d8
      <2>  rmnet_vnd_newlink+0x68/0xe8
      <2>  rmnet_newlink+0xa0/0x160
      <2>  rtnl_newlink+0x57c/0x6c8
      <2>  rtnetlink_rcv_msg+0x1dc/0x328
      <2>  netlink_rcv_skb+0xac/0x118
      <2>  rtnetlink_rcv+0x24/0x30
      <2>  netlink_unicast+0x158/0x1f0
      <2>  netlink_sendmsg+0x32c/0x338
      <2>  sock_sendmsg+0x44/0x60
      <2>  SyS_sendto+0x150/0x1ac
      <2>  el0_svc_naked+0x34/0x38
      
      Fixes: b752eff5 ("net: qualcomm: rmnet: Implement ndo_get_iflink")
      Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
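The ordering problem in the rmnet commit can be modeled in userspace C. All names below are hypothetical, and the sketch assumes the fix amounts to assigning real_dev before registration rather than after it. observe_iflink() stands in for any reader (such as the sysctl path in the first call stack) that runs before the registration call returns; it yields -1 where the real code would dereference a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

struct dev_model { int ifindex; };
struct priv_model { struct dev_model *real_dev; };

/* Stand-in for a reader of the lower device's ifindex.
 * Returns -1 where kernel code would hit a NULL dereference. */
static int observe_iflink(const struct priv_model *priv)
{
	return priv->real_dev ? priv->real_dev->ifindex : -1;
}

/* Stand-in for registration, during which readers may already run. */
static int register_model(const struct priv_model *priv)
{
	return observe_iflink(priv);
}

/* Buggy order: real_dev is assigned only after registration completes. */
static int newlink_buggy(struct priv_model *priv, struct dev_model *real_dev)
{
	int seen = register_model(priv);

	priv->real_dev = real_dev;
	return seen;
}

/* Fixed order: real_dev is valid before anyone can observe the device. */
static int newlink_fixed(struct priv_model *priv, struct dev_model *real_dev)
{
	priv->real_dev = real_dev;
	return register_model(priv);
}
```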
  3. 09 Nov, 2018 15 commits
  4. 08 Nov, 2018 8 commits
  5. 07 Nov, 2018 3 commits
  6. 06 Nov, 2018 7 commits
    • igb: shorten maximum PHC timecounter update interval · 4c9b658e
      Miroslav Lichvar authored
      The timecounter needs to be updated at least once per ~550 seconds in
      order to prevent a 40-bit SYSTIM timestamp from being misinterpreted
      as an old timestamp.
      
      Since commit 500462a9 ("timers: Switch to a non-cascading wheel"),
      scheduling of delayed work seems to be less accurate and a requested
      delay of 540 seconds may actually be longer than 550 seconds. Also, the
      PHC may be adjusted to run up to 6% faster than real time and the system
      clock up to 10% slower. Shorten the delay to 360 seconds to be sure the
      timecounter is updated in time.
      
      This fixes an issue with HW timestamps on 82580/I350/I354 being off by
      ~1100 seconds for a few seconds every ~9 minutes.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Acked-by: Jacob Keller <jacob.e.keller@intel.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
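The numbers in the igb commit can be checked with a short calculation. This sketch assumes the 40-bit SYSTIM value counts nanoseconds, which is consistent with the ~1100 second error quoted in the message, and that the ~550 second limit is half of the wrap period. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Seconds until a nanosecond counter of the given bit width wraps. */
static double wrap_seconds(unsigned int bits)
{
	return (double)(1ull << bits) / (double)NSEC_PER_SEC;
}

/* Worst-case PHC time elapsed during a requested delay, when the PHC
 * runs phc_fast (fraction) fast and the system clock sys_slow slow. */
static double worst_case_elapsed(double delay_s, double phc_fast,
				 double sys_slow)
{
	return delay_s * (1.0 + phc_fast) / (1.0 - sys_slow);
}
```

Under these assumptions the wrap period is about 1099.5 s, so the limit is about 549.8 s. A 540 s delay can stretch to 540 * 1.06 / 0.90 = 636 s of PHC time, which exceeds the limit, whereas 360 s stretches to about 424 s, which leaves margin for inaccurate scheduling of the delayed work.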
    • ice: Fix the bytecount sent to netdev_tx_sent_queue · d944b469
      Brett Creeley authored
      Currently if the driver does a TSO offload the bytecount sent to
      netdev_tx_sent_queue will be incorrect. This is because in ice_tso we
      overwrite the initial value that we set in ice_tx_map. This creates a
      mismatch between the Tx and Tx clean flow. In the Tx clean flow we
      calculate the bytecount (called total_bytes) as we clean the
      descriptors so the value used in the Tx clean path is correct. Fix this
      by using += in ice_tso instead of =. This fixes the mismatch in
      bytecount mentioned above.

      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
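The difference between '=' and '+=' in the bytecount commit can be illustrated with a small model. The struct and function names are hypothetical, and the sketch assumes that the TSO step contributes the header bytes replicated on each extra segment; the commit message itself only states that the assignment overwrote the initial value.

```c
#include <assert.h>

/* Hypothetical model of the first Tx buffer's accounting fields. */
struct tx_buf_model {
	unsigned int bytecount; /* bytes reported to queue accounting */
	unsigned int gso_segs;  /* number of segments the hardware emits */
};

/* Buggy: overwrites the initial byte count set earlier in the Tx path. */
static void tso_account_buggy(struct tx_buf_model *first, unsigned int hdr_len)
{
	first->bytecount = (first->gso_segs - 1) * hdr_len;
}

/* Fixed: adds the replicated header bytes on top of the initial count. */
static void tso_account_fixed(struct tx_buf_model *first, unsigned int hdr_len)
{
	first->bytecount += (first->gso_segs - 1) * hdr_len;
}
```

For an initial count of 3000 bytes, 3 segments, and a 54-byte header, the buggy variant reports 108 bytes and the fixed one 3108 bytes, which is what the clean path would later count.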
    • ice: Fix tx_timeout in PF driver · c585ea42
      Brett Creeley authored
      Prior to this commit the driver was running into tx_timeouts when a
      queue was stressed enough. This was happening because the HW tail
      and SW tail (NTU) were incorrectly out of sync. Consequently this was
      causing the HW head to collide with the HW tail, which to the hardware
      means that all descriptors posted for Tx have been processed.
      
      Due to the Tx logic used in the driver SW tail and HW tail are allowed
      to be out of sync. This is done as an optimization because it allows the
      driver to write HW tail as infrequently as possible, while still
      updating the SW tail index to keep track. However, there are situations
      where this results in the tail never getting updated, resulting in Tx
      timeouts.
      
      Tx HW tail write condition:
      	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more)
      		writel(sw_tail, tx_ring->tail);
      
      An issue was found in the Tx logic that was causing the aforementioned
      condition for updating HW tail to never happen, causing tx_timeouts.
      
      In ice_xmit_frame_ring we calculate how many descriptors we need for the
      Tx transaction based on the skb the kernel hands us. This is then passed
      into ice_maybe_stop_tx along with some extra padding to determine if we
      have enough descriptors available for this transaction. If we don't then
      we return -EBUSY to the stack, otherwise we move on and eventually
      prepare the Tx descriptors accordingly in ice_tx_map and set
      next_to_watch. In ice_tx_map we make another call to ice_maybe_stop_tx
      with a value of MAX_SKB_FRAGS + 4. The key here is that this value is
      possibly less than the value we sent in the first call to
      ice_maybe_stop_tx in ice_xmit_frame_ring. Now, if the number of unused
      descriptors is between MAX_SKB_FRAGS + 4 and the value used in the first
      call to ice_maybe_stop_tx in ice_xmit_frame_ring then we do not update
      the HW tail because of the "Tx HW tail write condition" above. This is
      because ice_maybe_stop_tx returns success instead of calling
      __ice_maybe_stop_tx and subsequently netif_stop_subqueue, which sets
      the __QUEUE_STATE_DEV_XOFF bit. This bit is checked in the "Tx HW tail
      write condition" by calling netif_xmit_stopped, and HW tail is updated
      if the aforementioned bit is set.
      
      In ice_clean_tx_irq, if next_to_watch is not NULL, we end up cleaning
      the descriptors on which HW has set the DD bit, as long as we have the
      budget. The HW head will eventually run into the HW tail as a result
      of the situation described in the paragraph above.
      
      The next time through ice_xmit_frame_ring we make the initial call to
      ice_maybe_stop_tx with another skb from the stack. This time we do not
      have enough descriptors available and we return NETDEV_TX_BUSY to the
      stack and end up setting next_to_watch to NULL.
      
      This is where we are stuck. In ice_clean_tx_irq we never clean anything
      because next_to_watch is always NULL and in ice_xmit_frame_ring we never
      update HW tail because we already return NETDEV_TX_BUSY to the stack and
      eventually we hit a tx_timeout.
      
      This issue was fixed by making sure that the second call to
      ice_maybe_stop_tx in ice_tx_map is passed a value that is >= the value
      that was used on the initial call to ice_maybe_stop_tx in
      ice_xmit_frame_ring. This was done by adding the following defines to
      make the logic more clear and to reduce the chance of mucking this up
      again:
      
      ICE_CACHE_LINE_BYTES		64
      ICE_DESCS_PER_CACHE_LINE	(ICE_CACHE_LINE_BYTES / \
      				 sizeof(struct ice_tx_desc))
      ICE_DESCS_FOR_CTX_DESC		1
      ICE_DESCS_FOR_SKB_DATA_PTR	1
      
      The ICE_CACHE_LINE_BYTES being 64 is an assumption being made so we
      don't have to figure this out on every pass through the Tx path. Instead
      I added a sanity check in ice_probe to verify cache line size and print
      a message if it's not 64 Bytes. This will make it easier to file issues
      if they are seen when the cache line size is not 64 Bytes when reading
      from the GLPCI_CNF2 register.

      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
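The threshold mismatch in the tx_timeout commit can be sketched with a simplified ring model. This is not the driver code: the struct, the function names, and the example thresholds are hypothetical, and the real ice_maybe_stop_tx does more than this (for example, rechecking after stopping the queue).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a Tx ring's state. */
struct ring_model {
	unsigned int unused; /* free descriptors */
	bool stopped;        /* models the __QUEUE_STATE_DEV_XOFF bit */
};

/* Stop the queue if fewer than 'needed' descriptors are free.
 * Returns true if the queue was stopped. */
static bool maybe_stop_tx(struct ring_model *r, unsigned int needed)
{
	if (r->unused >= needed)
		return false;
	r->stopped = true;
	return true;
}

/* Models the "Tx HW tail write condition" from the commit message. */
static bool should_write_tail(const struct ring_model *r, bool xmit_more)
{
	return r->stopped || !xmit_more;
}
```

Suppose the first check in the transmit path asks for 30 descriptors and 25 remain after mapping. If the second check only asks for 21, the queue is not stopped, and with xmit_more set the tail is not written, yet the next frame cannot pass the first check. If the second check asks for at least 30, the queue is stopped and the tail write happens.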
    • ice: Fix napi delete calls for remove · 25525b69
      Dave Ertman authored
      In the remove path, the vsi->netdev is being set to NULL before the call
      to free vectors. This is causing the netif_napi_del call to never be made.
      
      Add a call to ice_napi_del to the same location as the calls to
      unregister_netdev and just prior to them. This will use the reverse flow
      as the register and netif_napi_add calls.

      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix typo in error message · 31082519
      Anirudh Venkataramanan authored
      Print should say "Enabling" instead of "Enaabling"

      Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Fix flags for port VLAN · 58297dd1
      Md Fahad Iqbal Polash authored
      According to the spec, whenever insert PVID field is set, the VLAN
      driver insertion mode should be set to 01b which isn't done currently.
      Fix it.

      Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Remove duplicate addition of VLANs in replay path · 9ecd25c2
      Anirudh Venkataramanan authored
      ice_restore_vlan and active_vlans were originally put in place to
      reprogram VLAN filters in the replay path. This is now done as part
      of the much broader VSI rebuild/replay framework, so remove both
      ice_restore_vlan and active_vlans.

      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>