1. 23 Jan, 2016 26 commits
  2. 15 Dec, 2015 14 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.3.3 · 35483418
      Greg Kroah-Hartman authored
      35483418
    • Filipe Manana's avatar
      Btrfs: fix regression running delayed references when using qgroups · e0aa614d
      Filipe Manana authored
      commit b06c4bf5 upstream.
      
      In the kernel 4.2 merge window we had a big changes to the implementation
      of delayed references and qgroups which made the no_quota field of delayed
      references not used anymore. More specifically the no_quota field is not
      used anymore as of:
      
        commit 0ed4792a ("btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.")
      
      Leaving the no_quota field actually prevents delayed references from
      getting merged, which in turn cause the following BUG_ON(), at
      fs/btrfs/extent-tree.c, to be hit when qgroups are enabled:
      
        static int run_delayed_tree_ref(...)
        {
           (...)
           BUG_ON(node->ref_mod != 1);
           (...)
        }
      
      This happens on a scenario like the following:
      
        1) Ref1 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.
      
        2) Ref2 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
           It's not merged with Ref1 because Ref1->no_quota != Ref2->no_quota.
      
        3) Ref3 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.
           It's not merged with the reference at the tail of the list of refs
           for bytenr X because the reference at the tail, Ref2 is incompatible
           due to Ref2->no_quota != Ref3->no_quota.
      
        4) Ref4 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
           It's not merged with the reference at the tail of the list of refs
           for bytenr X because the reference at the tail, Ref3 is incompatible
           due to Ref3->no_quota != Ref4->no_quota.
      
        5) We run delayed references, trigger merging of delayed references,
           through __btrfs_run_delayed_refs() -> btrfs_merge_delayed_refs().
      
        6) Ref1 and Ref3 are merged as Ref1->no_quota = Ref3->no_quota and
           all other conditions are satisfied too. So Ref1 gets a ref_mod
           value of 2.
      
        7) Ref2 and Ref4 are merged as Ref2->no_quota = Ref4->no_quota and
           all other conditions are satisfied too. So Ref2 gets a ref_mod
           value of 2.
      
        8) Ref1 and Ref2 aren't merged, because they have different values
           for their no_quota field.
      
        9) Delayed reference Ref1 is picked for running (select_delayed_ref()
           always prefers references with an action == BTRFS_ADD_DELAYED_REF).
           So run_delayed_tree_ref() is called for Ref1 which triggers the
           BUG_ON because Ref1->red_mod != 1 (equals 2).
      
      So fix this by removing the no_quota field, as it's not used anymore as
      of commit 0ed4792a ("btrfs: qgroup: Switch to new extent-oriented
      qgroup mechanism.").
      
      The use of no_quota was also buggy in at least two places:
      
      1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting
         no_quota to 0 instead of 1 when the following condition was true:
         is_fstree(ref_root) || !fs_info->quota_enabled
      
      2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to
         reset a node's no_quota when the condition "!is_fstree(root_objectid)
         || !root->fs_info->quota_enabled" was true but we did it only in
         an unused local stack variable, that is, we never reset the no_quota
         value in the node itself.
      
      This fixes the remainder of problems several people have been having when
      running delayed references, mostly while a balance is running in parallel,
      on a 4.2+ kernel.
      
      Very special thanks to Stéphane Lesimple for helping debugging this issue
      and testing this fix on his multi terabyte filesystem (which took more
      than one day to balance alone, plus fsck, etc).
      
      Also, this fixes deadlock issue when using the clone ioctl with qgroups
      enabled, as reported by Elias Probst in the mailing list. The deadlock
      happens because after calling btrfs_insert_empty_item we have our path
      holding a write lock on a leaf of the fs/subvol tree and then before
      releasing the path we called check_ref() which did backref walking, when
      qgroups are enabled, and tried to read lock the same leaf. The trace for
      this case is the following:
      
        INFO: task systemd-nspawn:6095 blocked for more than 120 seconds.
        (...)
        Call Trace:
          [<ffffffff86999201>] schedule+0x74/0x83
          [<ffffffff863ef64c>] btrfs_tree_read_lock+0xc0/0xea
          [<ffffffff86137ed7>] ? wait_woken+0x74/0x74
          [<ffffffff8639f0a7>] btrfs_search_old_slot+0x51a/0x810
          [<ffffffff863a129b>] btrfs_next_old_leaf+0xdf/0x3ce
          [<ffffffff86413a00>] ? ulist_add_merge+0x1b/0x127
          [<ffffffff86411688>] __resolve_indirect_refs+0x62a/0x667
          [<ffffffff863ef546>] ? btrfs_clear_lock_blocking_rw+0x78/0xbe
          [<ffffffff864122d3>] find_parent_nodes+0xaf3/0xfc6
          [<ffffffff86412838>] __btrfs_find_all_roots+0x92/0xf0
          [<ffffffff864128f2>] btrfs_find_all_roots+0x45/0x65
          [<ffffffff8639a75b>] ? btrfs_get_tree_mod_seq+0x2b/0x88
          [<ffffffff863e852e>] check_ref+0x64/0xc4
          [<ffffffff863e9e01>] btrfs_clone+0x66e/0xb5d
          [<ffffffff863ea77f>] btrfs_ioctl_clone+0x48f/0x5bb
          [<ffffffff86048a68>] ? native_sched_clock+0x28/0x77
          [<ffffffff863ed9b0>] btrfs_ioctl+0xabc/0x25cb
        (...)
      
      The problem goes away by eleminating check_ref(), which no longer is
      needed as its purpose was to get a value for the no_quota field of
      a delayed reference (this patch removes the no_quota field as mentioned
      earlier).
      Reported-by: default avatarStéphane Lesimple <stephane_btrfs@lesimple.fr>
      Tested-by: default avatarStéphane Lesimple <stephane_btrfs@lesimple.fr>
      Reported-by: default avatarElias Probst <mail@eliasprobst.eu>
      Reported-by: default avatarPeter Becker <floyd.net@gmail.com>
      Reported-by: default avatarMalte Schröder <malte@tnxip.de>
      Reported-by: default avatarDerek Dongray <derek@valedon.co.uk>
      Reported-by: default avatarErkki Seppala <flux-btrfs@inside.org>
      Cc: stable@vger.kernel.org  # 4.2+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      e0aa614d
    • Hans Verkuil's avatar
      cobalt: fix Kconfig dependency · 4f2347e6
      Hans Verkuil authored
      commit fc88dd16 upstream.
      
      The cobalt driver should depend on VIDEO_V4L2_SUBDEV_API.
      
      This fixes this kbuild error:
      
      tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
      head:   99bc7215
      commit: 85756a06 [media] cobalt: add new driver
      config: x86_64-randconfig-s0-09201514 (attached as .config)
      reproduce:
        git checkout 85756a06
        # save the attached .config to linux build tree
        make ARCH=x86_64
      
      All error/warnings (new ones prefixed by >>):
      
         drivers/media/i2c/adv7604.c: In function 'adv76xx_get_format':
      >> drivers/media/i2c/adv7604.c:1853:9: error: implicit declaration of function 'v4l2_subdev_get_try_format' [-Werror=implicit-function-declaration]
            fmt = v4l2_subdev_get_try_format(sd, cfg, format->pad);
                  ^
         drivers/media/i2c/adv7604.c:1853:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
            fmt = v4l2_subdev_get_try_format(sd, cfg, format->pad);
                ^
         drivers/media/i2c/adv7604.c: In function 'adv76xx_set_format':
         drivers/media/i2c/adv7604.c:1882:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
            fmt = v4l2_subdev_get_try_format(sd, cfg, format->pad);
                ^
         cc1: some warnings being treated as errors
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f2347e6
    • Lu, Han's avatar
      ALSA: hda/hdmi - apply Skylake fix-ups to Broxton display codec · de48652f
      Lu, Han authored
      commit e2656412 upstream.
      
      Broxton and Skylake have the same behavior on display audio. So this patch
      applys Skylake fix-ups to Broxton.
      Signed-off-by: default avatarLu, Han <han.lu@intel.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de48652f
    • Dan Williams's avatar
      ALSA: pci: depend on ZONE_DMA · 4ba2bd2c
      Dan Williams authored
      commit 2db1a579 upstream.
      
      There are several sound drivers that 'select ZONE_DMA'.  This is
      backwards as ZONE_DMA is an architecture capability exported to drivers.
      Switch the polarity of the dependency to disable these drivers when the
      architecture does not support ZONE_DMA.  This was discovered in the
      context of testing/enabling devm_memremap_pages() which depends on
      ZONE_DEVICE.  ZONE_DEVICE in turn depends on !ZONE_DMA.
      Reported-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ba2bd2c
    • Arnd Bergmann's avatar
      ceph: fix message length computation · 4bf8855e
      Arnd Bergmann authored
      commit 777d738a upstream.
      
      create_request_message() computes the maximum length of a message,
      but uses the wrong type for the time stamp: sizeof(struct timespec)
      may be 8 or 16 depending on the architecture, while sizeof(struct
      ceph_timespec) is always 8, and that is what gets put into the
      message.
      
      Found while auditing the uses of timespec for y2038 problems.
      
      Fixes: b8e69066 ("ceph: include time stamp in every MDS request")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarYan, Zheng <zyan@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4bf8855e
    • Ming Lei's avatar
      block: fix segment split · 278b54df
      Ming Lei authored
      commit 578270bf upstream.
      
      Inside blk_bio_segment_split(), previous bvec pointer(bvprvp)
      always points to the iterator local variable, which is obviously
      wrong, so fix it by pointing to the local variable of 'bvprv'.
      
      Fixes: 5014c311(block: fix bogus compiler warnings in blk-merge.c)
      Reported-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reported-by: default avatarMark Salter <msalter@redhat.com>
      Tested-by: default avatarLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Tested-by: default avatarMark Salter <msalter@redhat.com>
      Signed-off-by: default avatarMing Lei <ming.lei@canonical.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      278b54df
    • Junxiao Bi's avatar
      ocfs2: fix umask ignored issue · eeec50cb
      Junxiao Bi authored
      commit 8f1eb487 upstream.
      
      New created file's mode is not masked with umask, and this makes umask not
      work for ocfs2 volume.
      
      Fixes: 702e5bc6 ("ocfs2: use generic posix ACL infrastructure")
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Gang He <ghe@suse.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eeec50cb
    • Jeff Layton's avatar
      nfs: if we have no valid attrs, then don't declare the attribute cache valid · 3994c2de
      Jeff Layton authored
      commit c812012f upstream.
      
      If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
      (correctly) not update any of the attributes, but it then clears the
      NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
      up to date. Don't clear the flag if the fattr struct has no valid
      attrs to apply.
      Reviewed-by: default avatarSteve French <steve.french@primarydata.com>
      Signed-off-by: default avatarJeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3994c2de
    • Jeff Layton's avatar
      nfs4: resend LAYOUTGET when there is a race that changes the seqid · 4ffcc6f9
      Jeff Layton authored
      commit 4f2e9dce upstream.
      
      pnfs_layout_process will check the returned layout stateid against what
      the kernel has in-core. If it turns out that the stateid we received is
      older, then we should resend the LAYOUTGET instead of falling back to
      MDS I/O.
      Signed-off-by: default avatarJeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ffcc6f9
    • Benjamin Coddington's avatar
      nfs4: start callback_ident at idr 1 · 9a08c6f6
      Benjamin Coddington authored
      commit c68a027c upstream.
      
      If clp->cl_cb_ident is zero, then nfs_cb_idr_remove_locked() skips removing
      it when the nfs_client is freed.  A decoding or server bug can then find
      and try to put that first nfs_client which would lead to a crash.
      Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Fixes: d6870312 ("nfs4client: convert to idr_alloc()")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a08c6f6
    • Daniel Borkmann's avatar
      debugfs: fix refcount imbalance in start_creating · 84897330
      Daniel Borkmann authored
      commit 0ee9608c upstream.
      
      In debugfs' start_creating(), we pin the file system to safely access
      its root. When we failed to create a file, we unpin the file system via
      failed_creating() to release the mount count and eventually the reference
      of the vfsmount.
      
      However, when we run into an error during lookup_one_len() when still
      in start_creating(), we only release the parent's mutex but not so the
      reference on the mount. Looks like it was done in the past, but after
      splitting portions of __create_file() into start_creating() and
      end_creating() via 190afd81 ("debugfs: split the beginning and the
      end of __create_file() off"), this seemed missed. Noticed during code
      review.
      
      Fixes: 190afd81 ("debugfs: split the beginning and the end of __create_file() off")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84897330
    • Andrew Elble's avatar
      nfsd: eliminate sending duplicate and repeated delegations · 4e9f4529
      Andrew Elble authored
      commit 34ed9872 upstream.
      
      We've observed the nfsd server in a state where there are
      multiple delegations on the same nfs4_file for the same client.
      The nfs client does attempt to DELEGRETURN these when they are presented to
      it - but apparently under some (unknown) circumstances the client does not
      manage to return all of them. This leads to the eventual
      attempt to CB_RECALL more than one delegation with the same nfs
      filehandle to the same client. The first recall will succeed, but the
      next recall will fail with NFS4ERR_BADHANDLE. This leads to the server
      having delegations on cl_revoked that the client has no way to FREE
      or DELEGRETURN, with resulting inability to recover. The state manager
      on the server will continually assert SEQ4_STATUS_RECALLABLE_STATE_REVOKED,
      and the state manager on the client will be looping unable to satisfy
      the server.
      
      List discussion also reports a race between OPEN and DELEGRETURN that
      will be avoided by only sending the delegation once to the
      client. This is also logically in accordance with RFC5561 9.1.1 and 10.2.
      
      So, let's:
      
      1.) Not hand out duplicate delegations.
      2.) Only send them to the client once.
      
      RFC 5561:
      
      9.1.1:
      "Delegations and layouts, on the other hand, are not associated with a
      specific owner but are associated with the client as a whole
      (identified by a client ID)."
      
      10.2:
      "...the stateid for a delegation is associated with a client ID and may be
      used on behalf of all the open-owners for the given client.  A
      delegation is made to the client as a whole and not to any specific
      process or thread of control within it."
      Reported-by: default avatarEric Meddaugh <etmsys@rit.edu>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Olga Kornievskaia <aglo@umich.edu>
      Signed-off-by: default avatarAndrew Elble <aweits@rit.edu>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e9f4529
    • Jeff Layton's avatar
      nfsd: serialize state seqid morphing operations · 9a4a7278
      Jeff Layton authored
      commit 35a92fe8 upstream.
      
      Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were
      running in parallel. The server would receive the OPEN_DOWNGRADE first
      and check its seqid, but then an OPEN would race in and bump it. The
      OPEN_DOWNGRADE would then complete and bump the seqid again.  The result
      was that the OPEN_DOWNGRADE would be applied after the OPEN, even though
      it should have been rejected since the seqid changed.
      
      The only recourse we have here I think is to serialize operations that
      bump the seqid in a stateid, particularly when we're given a seqid in
      the call. To address this, we add a new rw_semaphore to the
      nfs4_ol_stateid struct. We do a down_write prior to checking the seqid
      after looking up the stateid to ensure that nothing else is going to
      bump it while we're operating on it.
      
      In the case of OPEN, we do a down_read, as the call doesn't contain a
      seqid. Those can run in parallel -- we just need to serialize them when
      there is a concurrent OPEN_DOWNGRADE or CLOSE.
      
      LOCK and LOCKU however always take the write lock as there is no
      opportunity for parallelizing those.
      Reported-and-Tested-by: default avatarAndrew W Elble <aweits@rit.edu>
      Signed-off-by: default avatarJeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a4a7278