1. 06 Apr, 2017 17 commits
    • afs: Fix directory read/modify race · 928f45ef
      David Howells authored
      Because parsing of the directory wasn't being done under any sort of lock,
      the pages holding the directory content can get invalidated whilst the
      parsing is ongoing.
      
      Further, the directory page check function gets called outside of the page
      lock, so if the page gets cleared or updated, this may return reports of
      bad magic numbers in the directory page.
      
      Also, the directory may change size whilst checking and parsing are
      ongoing, so more care needs to be taken here.
      
      Fix this by:
      
       (1) Performing the page check from the page filling function before we
           set PageUptodate and drop the page lock.
      
       (2) Checking for the file having shrunk and the page having been
           abandoned before checking the page contents.
      
       (3) Locking the page whilst parsing it for the directory iterator.
      
      Whilst we're at it, add a tracepoint to report check failure.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Don't call afs_sync_file() from afs_write_begin() · df44355f
      David Howells authored
      Don't call afs_sync_file() from afs_write_begin() as this will end up with
      a deadlock because the caller of afs_write_begin() holds the inode mutex.
      
      Instead, flush out the single conflicting writeback directly.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix error code returned from wait · fcb871a2
      David Howells authored
      If a client operation for which we're waiting gets interrupted, we need to try
      aborting it and then call rxrpc_kernel_recv_data() to find out how the call
      actually completed (we could race with an incoming abort, for example).
      
      If we did manage to abort the call, we also need to log the fact that the call
      is now complete.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Rewrite writeback handling · 20181273
      David Howells authored
      Rewrite the writeback handling to make the writeback records refcounted
      separately from the completion management so that a ref can be taken on one
      without preventing completion from happening.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Trace the sending of pages · f357dbca
      David Howells authored
      Add a pair of tracepoints to log the sending of pages for an FS.StoreData
      or FS.StoreData64 operation.
      
      Tracepoint afs_send_pages notes each set of pages added to the operation.
      There may be several of these per operation as we get at most 8
      contiguous pages in one go because the bvec we're using is on the stack.
      
      Tracepoint afs_sent_pages notes the end of adding data from a whole run of
      pages to the operation and the completion of the request phase.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Trace the initiation and completion of client calls · bdfd1105
      David Howells authored
      Add tracepoints to trace the initiation and completion of client calls
      within the kafs filesystem.
      
      The afs_make_vl_call tracepoint watches calls to the volume location
      database server.
      
      The afs_make_fs_call tracepoint watches calls to the file server.
      
      The afs_call_done tracepoint watches for call completion.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: afs_fsync() does two flushes, one of which is redundant · 80f5be5a
      David Howells authored
      afs_fsync() calls filemap_write_and_wait_range() and then does a walk
      through the writeback records and flushes those - which should achieve
      exactly the same thing.
      
      Get rid of the filemap_write_and_wait_range() since that's uninterruptible,
      whereas the wait for the writeback records is interruptible.
      
      Further, we can at least contract the inode-locked region to just the
      afs_writeback_call().
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Don't pass vnode pointer around in afs_writeback struct · f67bf645
      David Howells authored
      Don't pass the vnode pointer around in the afs_writeback struct as the
      struct may get freed, yet we still need the pointer.  Pass it around
      separately.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Trace client call connection · 89ca6948
      David Howells authored
      Add a tracepoint (rxrpc_connect_call) to log the combination of rxrpc_call
      pointer, afs_call pointer/user data and wire call parameters to make it
      easier to match the tracebuffer contents to captured network packets.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Trace changes in a call's receive window size · 740586d2
      David Howells authored
      Add a tracepoint (rxrpc_rx_rwind_change) to log changes in a call's receive
      window size as imposed by the peer through an ACK packet.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Trace received aborts · 005ede28
      David Howells authored
      Add a tracepoint (rxrpc_rx_abort) to record received aborts.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Trace protocol errors in received packets · fb46f6ee
      David Howells authored
      Add a tracepoint (rxrpc_rx_proto) to record protocol errors in received
      packets.  The following changes are made:
      
       (1) Add a function, __rxrpc_abort_eproto(), to note a protocol error on a
           call and mark the call aborted.  This is wrapped by
           rxrpc_abort_eproto() that makes the why string usable in trace.
      
       (2) Add trace_rxrpc_rx_proto() or rxrpc_abort_eproto() to protocol error
           generation points, replacing rxrpc_abort_call() with the latter.
      
       (3) Only send an abort packet in rxkad_verify_packet*() if we actually
           managed to abort the call.
      
      Note that a trace event is also emitted if a kernel user (e.g. afs) tries
      to send data through a call when it's not in the transmission phase, though
      it's not technically a receive event.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Handle temporary errors better in rxkad security · ef68622d
      David Howells authored
      In the rxkad security module, when we encounter a temporary error (such as
      ENOMEM) from which we could conceivably recover, don't abort the
      connection, but rather permit retransmission of the relevant packets to
      induce a retry.
      
      Note that I'm leaving some places that could be merged together to insert
      tracing in the next patch.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
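The distinction drawn in the rxkad commit above can be sketched as a small classifier. This is only an illustration under stated assumptions: the enum, the function name `demo_classify_error()` and the choice to treat only `-ENOMEM` as temporary are hypothetical (the message names ENOMEM as one example and does not list the others), and none of this is the actual rxkad code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcomes for a packet handed to the security module. */
enum demo_verdict {
    DEMO_PACKET_OK,            /* processing succeeded */
    DEMO_WAIT_FOR_RETRANSMIT,  /* temporary failure: discard and let the
                                * peer's retransmission trigger a retry */
    DEMO_ABORT_CONNECTION      /* failure we cannot recover from */
};

/*
 * Classify a negative-errno result.  Only -ENOMEM is treated as temporary
 * here because it is the one example the commit message gives; which other
 * codes qualify is an assumption left open.
 */
static enum demo_verdict demo_classify_error(int err)
{
    assert(err <= 0);

    if (err == 0)
        return DEMO_PACKET_OK;
    if (err == -ENOMEM)
        return DEMO_WAIT_FOR_RETRANSMIT;
    return DEMO_ABORT_CONNECTION;
}
```

A caller would abort the connection only for the last verdict and otherwise leave the connection state untouched.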
    • rxrpc: Note a successfully aborted kernel operation · 84a4c09c
      David Howells authored
      Make rxrpc_kernel_abort_call() return an indication as to whether it
      actually aborted the operation or not so that kafs can trace the failure of
      the operation.  Note that 'success' in this context means changing the
      state of the call, not necessarily successfully transmitting an ABORT
      packet.
      Signed-off-by: David Howells <dhowells@redhat.com>
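The "did we actually abort it" return value described above can be illustrated with a minimal sketch. The struct, the state names and `demo_abort_call()` are hypothetical stand-ins, not the real rxrpc_kernel_abort_call() signature, which the message does not show. As in the commit, success here means the call state changed, not that an ABORT packet reached the peer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum demo_call_state { DEMO_CALL_IN_PROGRESS, DEMO_CALL_COMPLETE };

/* Hypothetical, much-reduced stand-in for a call record. */
struct demo_abortable_call {
    enum demo_call_state state;
    uint32_t abort_code;   /* code that would go on the wire */
    int error;             /* negative errno recorded for the caller */
};

/*
 * Try to abort the call.  Returns true only if this invocation moved the
 * call to the completed state; returns false if the call had already
 * completed (for example through a racing incoming abort), in which case
 * the recorded completion is left untouched.
 */
static bool demo_abort_call(struct demo_abortable_call *call,
                            uint32_t abort_code, int error)
{
    assert(call != NULL);

    if (call->state == DEMO_CALL_COMPLETE)
        return false;

    call->state = DEMO_CALL_COMPLETE;
    call->abort_code = abort_code;
    call->error = error;
    return true;
}
```

A caller such as a filesystem client could log call completion only when the function returns true, and otherwise go on to read how the call really ended.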
    • rxrpc: Use negative error codes in rxrpc_call struct · 3a92789a
      David Howells authored
      Use negative error codes in struct rxrpc_call::error because that's what
      the kernel normally deals with and to make the code consistent.  We only
      turn them positive when transcribing into a cmsg for userspace recvmsg.
      Signed-off-by: David Howells <dhowells@redhat.com>
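The sign convention described above can be sketched as follows. The struct and helper names are hypothetical; only the convention itself (negative codes stored internally, flipped to positive at the point where the value is handed to userspace) comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the error field of a call record. */
struct demo_call {
    int error;   /* 0 on success, negative errno on failure */
};

/* Record a failure using the kernel-style negative errno convention. */
static void demo_call_fail(struct demo_call *call, int err)
{
    assert(call != NULL);
    assert(err < 0);
    call->error = err;
}

/*
 * Value as it would be transcribed into a control message for userspace:
 * the sign is flipped here and nowhere else.
 */
static int demo_error_for_userspace(const struct demo_call *call)
{
    assert(call != NULL);
    return -call->error;
}
```

Keeping the negation in a single helper means every internal comparison can be written against the usual `-ECODE` form.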
    • bonding: attempt to better support longer hw addresses · faeeb317
      Jarod Wilson authored
      People are using bonding over Infiniband IPoIB connections, and who knows
      what else. Infiniband has a hardware address length of 20 octets
      (INFINIBAND_ALEN), and the network core defines a MAX_ADDR_LEN of 32.
      Various places in the bonding code are currently hard-wired to 6 octets
      (ETH_ALEN), such as the 3ad code, which I've left untouched here. Besides,
      only alb is currently possible on Infiniband links anyway, due
      to commit 1533e773, so the alb code is where most of the changes are.
      
      One major component of this change is the addition of a bond_hw_addr_copy
      function that takes a length argument, instead of using ether_addr_copy
      everywhere that hardware addresses need to be copied about. The other
      major component of this change is converting the bonding code from using
      struct sockaddr for address storage to struct sockaddr_storage, as the
      former has an address storage space of only 14, while the latter is 128
      minus a few, which is necessary to support bonding over devices with up to
      MAX_ADDR_LEN octet hardware addresses. Additionally, this probably fixes
      up some memory corruption issues with the current code, where it's
      possible to write an infiniband hardware address into a sockaddr declared
      on the stack.
      
      Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet
      hardware address now:
      
      $ cat /proc/net/bonding/bond0
      Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
      
      Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
      Primary Slave: mlx4_ib0 (primary_reselect always)
      Currently Active Slave: mlx4_ib0
      MII Status: up
      MII Polling Interval (ms): 100
      Up Delay (ms): 100
      Down Delay (ms): 100
      
      Slave Interface: mlx4_ib0
      MII Status: up
      Speed: Unknown
      Duplex: Unknown
      Link Failure Count: 0
      Permanent HW addr: 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01
      Slave queue ID: 0
      
      Slave Interface: mlx4_ib1
      MII Status: up
      Speed: Unknown
      Duplex: Unknown
      Link Failure Count: 0
      Permanent HW addr: 80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02
      Slave queue ID: 0
      
      Also tested with a standard 1Gbps NIC bonding setup (with a mix of
      e1000 and e1000e cards), running LNST's bonding tests.
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
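The two components of the bonding change can be illustrated in userspace C. This is a sketch under assumptions: `demo_hw_addr_copy()` only mirrors the idea of a length-taking copy helper (the real bond_hw_addr_copy() signature is not shown in the message), the `DEMO_*` constants restate the lengths quoted above, and the size checks use the userspace `struct sockaddr` and `struct sockaddr_storage` from `<sys/socket.h>`, whose layout may differ in detail from the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define DEMO_ETH_ALEN         6   /* Ethernet hardware address length */
#define DEMO_INFINIBAND_ALEN 20   /* Infiniband hardware address length */
#define DEMO_MAX_ADDR_LEN    32   /* upper bound quoted in the commit */

/* Copy a hardware address of an explicit length instead of assuming 6. */
static void demo_hw_addr_copy(unsigned char *dst, const unsigned char *src,
                              size_t len)
{
    assert(len <= DEMO_MAX_ADDR_LEN);
    memcpy(dst, src, len);
}

/* Does an address of this length fit in the 14-byte sa_data area? */
static bool demo_fits_in_sockaddr(size_t len)
{
    return len <= sizeof(((struct sockaddr *)0)->sa_data);
}

/* Rough bound: everything in sockaddr_storage after the family field. */
static bool demo_fits_in_sockaddr_storage(size_t len)
{
    return len <= sizeof(struct sockaddr_storage) - sizeof(sa_family_t);
}
```

With these definitions a 6-octet address fits in either structure, whereas a 20-octet Infiniband address fits only in `struct sockaddr_storage`, which is the overflow risk the message mentions for a `struct sockaddr` declared on the stack.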
    • sfc: don't insert mc_list on low-latency firmware if it's too long · 148cbab6
      Edward Cree authored
      If the mc_list is longer than 256 addresses, we enter mc_promisc mode.
      If we're in mc_promisc mode and the firmware doesn't support cascaded
      multicast, normally we also insert our mc_list, to prevent stealing by
      another VI.  However, if the mc_list was too long, this isn't really
      helpful - the MC groups that didn't fit in the list can still get
      stolen, and having only some of them stealable will probably cause
      more confusing behaviour than having them all stealable.  Since
      inserting 256 multicast filters takes a long time and can lead to MCDI
      state machine timeouts, just skip the mc_list insert in this overflow
      condition.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
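The decision described in the sfc commit can be written out as a small pure function. This is a hedged sketch, not the driver's code: the struct, the function name and its parameters are hypothetical, and the branch for firmware that does support cascaded multicast (no separate list insert while promiscuous) is inferred from the wording of the message rather than stated in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MC_LIST_MAX 256   /* list length limit quoted in the commit */

struct demo_mc_decision {
    bool mc_promisc;       /* run in multicast-promiscuous mode */
    bool insert_mc_list;   /* also insert the per-address filters */
};

/*
 * promisc_requested: promiscuous mode was asked for by other means.
 * mc_count:          number of addresses in the multicast list.
 * fw_cascaded:       firmware supports cascaded multicast.
 */
static struct demo_mc_decision demo_mc_decide(bool promisc_requested,
                                              size_t mc_count,
                                              bool fw_cascaded)
{
    struct demo_mc_decision d;
    bool overflow = mc_count > DEMO_MC_LIST_MAX;

    d.mc_promisc = promisc_requested || overflow;

    if (!d.mc_promisc)
        d.insert_mc_list = true;    /* ordinary filtering */
    else if (fw_cascaded)
        d.insert_mc_list = false;   /* assumed: list not needed here */
    else
        d.insert_mc_list = !overflow;   /* skip the insert on overflow */

    return d;
}
```

The last branch is the change the commit makes: on firmware without cascaded multicast, the list is still inserted alongside promiscuous mode unless the list itself overflowed.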
  2. 05 Apr, 2017 23 commits