1. 30 Dec, 2012 5 commits
    • Eric Dumazet's avatar
      veth: extend device features · 8093315a
      Eric Dumazet authored
      veth is lacking most modern facilities, like SG, checksums, TSO.
      
      It makes sense to extend dev->features to get them, or GRO aggregation
      is defeated by a forced segmentation.
      Reported-by: default avatarAndrew Vagin <avagin@parallels.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8093315a
    • Eric Dumazet's avatar
      veth: reduce stat overhead · 2681128f
      Eric Dumazet authored
      veth stats are a bit bloated. There is no need to account transmit
      and receive stats, since they are absolutely symmetric.
      
      Also use a per device atomic64_t for the dropped counter, as it
      should never be used in fast path.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2681128f
    • Flavio Leitner's avatar
      team: implement carrier change · 4cafe373
      Flavio Leitner authored
      The user space teamd daemon may need to control the
      master's carrier state depending on the selected mode.
      Signed-off-by: default avatarFlavio Leitner <fbl@redhat.com>
      Acked-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4cafe373
    • stephen hemminger's avatar
      bridge: respect RFC2863 operational state · 576eb625
      stephen hemminger authored
      The bridge link detection should follow the operational state
      of the lower device, rather than the carrier bit. This allows devices
      like tunnels that are controlled by userspace control plane to work
      with bridge STP link management.
      Signed-off-by: default avatarStephen Hemminger <shemminger@vyatta.com>
      Reviewed-by: default avatarFlavio Leitner <fbl@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      576eb625
    • Daniel Borkmann's avatar
      net: filter: return -EINVAL if BPF_S_ANC* operation is not supported · aa1113d9
      Daniel Borkmann authored
      Currently, we return -EINVAL for malformed or wrong BPF filters.
      However, this is not done for BPF_S_ANC* operations, which makes it
      more difficult to detect if it's actually supported or not by the
      BPF machine. Therefore, we should also return -EINVAL if K is within
      the SKF_AD_OFF universe and the ancillary operation did not match.
      
      Why exactly is it needed? If tools such as libpcap/tcpdump want to
      make use of new ancillary operations (like filtering VLAN in kernel
      space), there is currently no sane way to test if this feature /
      BPF_S_ANC* op is present or not, since no error is returned. This
      patch will make life easier for that and allow for a proper usage
      for user space applications.
      
      There was concern, if this patch will break userland. Short answer: Yes
      and no. Long answer: It will "break" only for code that calls ...
      
        { BPF_LD | BPF_(W|H|B) | BPF_ABS, 0, 0, <K> },
      
      ... where <K> is in [0xfffff000, 0xffffffff] _and_ <K> is *not* an
      ancillary. And here comes the BUT: assuming some *old* code will have
      such an instruction where <K> is between [0xfffff000, 0xffffffff] and
      it doesn't know ancillary operations, then this will give a
      non-expected / unwanted behavior as well (since we do not return the
      BPF machine with 0 after a failed load_pointer(), which was the case
      before introducing ancillary operations, but load sth. into the
      accumulator instead, and continue with the next instruction, for
      instance). Thus, user space code would already have been broken by
      introducing ancillary operations into the BPF machine per se. Code
      that does such a direct load, e.g. "load word at packet offset
      0xffffffff into accumulator" ("ld [0xffffffff]") is quite broken,
      isn't it? The whole assumption of ancillary operations is that no-one
      intentionally calls things like "ld [0xffffffff]" and expect this
      word to be loaded from such a packet offset. Hence, we can also safely
      make use of this feature testing patch and facilitate application
      development. Therefore, at least from this patch onwards, we have
      *for sure* a check whether current or in future implemented BPF_S_ANC*
      ops are supported in the kernel. Patch was tested on x86_64.
      
      (Thanks to Eric for the previous review.)
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reported-by: default avatarAni Sinha <ani@aristanetworks.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aa1113d9
  2. 29 Dec, 2012 3 commits
  3. 28 Dec, 2012 6 commits
  4. 27 Dec, 2012 3 commits
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 101e5c74
      Linus Torvalds authored
      Pull hwmon fixes from Guenter Roeck:
      
       - Report i2c errors to userspace in lm73 driver
      
       - Fix problem with DIV_ROUND_CLOSEST and unsigned divisors in emc6w201
         driver
      
      * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (emc6w201) Fix DIV_ROUND_CLOSEST problem with unsigned divisors
        hwmon: (lm73} Detect and report i2c bus errors
      101e5c74
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · ddf75ae3
      Linus Torvalds authored
      Pull namespace fixes from Eric Biederman:
       "This tree includes two bug fixes for problems Oleg spotted on his
        review of the recent pid namespace work.  A small fix to not enable
        bottom halves with irqs disabled, and a trivial build fix for f2fs
        with user namespaces enabled."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        f2fs: Don't assign e_id in f2fs_acl_from_disk
        proc: Allow proc_free_inum to be called from any context
        pidns: Stop pid allocation when init dies
        pidns: Outlaw thread creation after unshare(CLONE_NEWPID)
      ddf75ae3
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7fd83b47
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
      1) GRE tunnel drivers don't set the transport header properly, they also
         blindly deref the inner protocol ipv4 and needs some checks.  Fixes
         from Isaku Yamahata.
      
      2) Fix sleeps while atomic in netdevice rename code, from Eric Dumazet.
      
      3) Fix double-spinlock in solos-pci driver, from Dan Carpenter.
      
      4) More ARP bug fixes.  Fix lockdep splat in arp_solicit() and then the
         bug accidentally added by that fix.  From Eric Dumazet and Cong Wang.
      
      5) Remove some __dev* annotations that slipped back in, as well as all
         HOTPLUG references.  From Greg KH
      
      6) RDS protocol uses wrong interfaces to access scatter-gather elements,
         causing a regression.  From Mike Marciniszyn.
      
      7) Fix build error in cpts driver, from Richard Cochran.
      
      8) Fix arithmetic in packet scheduler, from Stefan Hasko.
      
      9) Similarly, fix association during calculation of random backoff in
         batman-adv.  From Akinobu Mita.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
        ipv6/ip6_gre: set transport header correctly
        ipv4/ip_gre: set transport header correctly to gre header
        IB/rds: suppress incompatible protocol when version is known
        IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len
        net/vxlan: Use the underlying device index when joining/leaving multicast groups
        tcp: should drop incoming frames without ACK flag set
        netprio_cgroup: define sk_cgrp_prioidx only if NETPRIO_CGROUP is enabled
        cpts: fix a run time warn_on.
        cpts: fix build error by removing useless code.
        batman-adv: fix random jitter calculation
        arp: fix a regression in arp_solicit()
        net: sched: integer overflow fix
        CONFIG_HOTPLUG removal from networking core
        Drivers: network: more __dev* removal
        bridge: call br_netpoll_disable in br_add_if
        ipv4: arp: fix a lockdep splat in arp_solicit()
        tuntap: dont use a private kmem_cache
        net: devnet_rename_seq should be a seqcount
        ip_gre: fix possible use after free
        ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally
        ...
      7fd83b47
  5. 26 Dec, 2012 14 commits
    • Isaku Yamahata's avatar
      ipv6/ip6_gre: set transport header correctly · ae782bb1
      Isaku Yamahata authored
      ip6gre_xmit2() incorrectly sets transport header to inner payload
      instead of GRE header. It seems copy-and-pasted from ipip.c.
      Set transport header to gre header.
      (In ipip case the transport header is the inner ip header, so that's
      correct.)
      
      Found by inspection. In practice the incorrect transport header
      doesn't matter because the skb usually is sent to another net_device
      or socket, so the transport header isn't referenced.
      Signed-off-by: default avatarIsaku Yamahata <yamahata@valinux.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ae782bb1
    • Isaku Yamahata's avatar
      ipv4/ip_gre: set transport header correctly to gre header · 861aa6d5
      Isaku Yamahata authored
      ipgre_tunnel_xmit() incorrectly sets transport header to inner payload
      instead of GRE header. It seems copy-and-pasted from ipip.c.
      So set transport header to gre header.
      (In ipip case the transport header is the inner ip header, so that's
      correct.)
      
      Found by inspection. In practice the incorrect transport header
      doesn't matter because the skb usually is sent to another net_device
      or socket, so the transport header isn't referenced.
      Signed-off-by: default avatarIsaku Yamahata <yamahata@valinux.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      861aa6d5
    • Marciniszyn, Mike's avatar
      IB/rds: suppress incompatible protocol when version is known · a4967598
      Marciniszyn, Mike authored
      Add an else to only print the incompatible protocol message
      when version hasn't been established.
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a4967598
    • Marciniszyn, Mike's avatar
      IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len · f2e9bd70
      Marciniszyn, Mike authored
      0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs")
      added uses of sg_dma_len() and sg_dma_address(). This makes
      RDS DOA with the qib driver.
      
      IB ulps should use ib_sg_dma_len() and ib_sg_dma_address
      respectively since some HCAs overload ib_sg_dma* operations.
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f2e9bd70
    • Yan Burman's avatar
      net/vxlan: Use the underlying device index when joining/leaving multicast groups · af9b078e
      Yan Burman authored
      The socket calls from vxlan to join/leave multicast group aren't
      using the index of the underlying device, as a result the stack uses
      the first interface that is up. This results in vxlan being non functional
      over a device which isn't the 1st to be up.
      Fix this by providing the iflink field to the vxlan instance
      to the multicast calls.
      Signed-off-by: default avatarYan Burman <yanb@mellanox.com>
      Acked-by: default avatarStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af9b078e
    • Eric Dumazet's avatar
      tcp: should drop incoming frames without ACK flag set · c3ae62af
      Eric Dumazet authored
      In commit 96e0bf4b (tcp: Discard segments that ack data not yet
      sent) John Dykstra enforced a check against ack sequences.
      
      In commit 354e4aa3 (tcp: RFC 5961 5.2 Blind Data Injection Attack
      Mitigation) I added more safety tests.
      
      But we missed fact that these tests are not performed if ACK bit is
      not set.
      
      RFC 793 3.9 mandates TCP should drop a frame without ACK flag set.
      
      " fifth check the ACK field,
            if the ACK bit is off drop the segment and return"
      
      Not doing so permits an attacker to only guess an acceptable sequence
      number, evading stronger checks.
      
      Many thanks to Zhiyun Qian for bringing this issue to our attention.
      
      See :
      http://web.eecs.umich.edu/~zhiyunq/pub/ccs12_TCP_sequence_number_inference.pdfReported-by: default avatarZhiyun Qian <zhiyunq@umich.edu>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: John Dykstra <john.dykstra1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3ae62af
    • Christoffer Dall's avatar
      mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED · ad4b3fb7
      Christoffer Dall authored
      Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and
      (PageHead) is true, for tail pages.  If this is indeed the intended
      behavior, which I doubt because it breaks cache cleaning on some ARM
      systems, then the nomenclature is highly problematic.
      
      This patch makes sure PageHead is only true for head pages and PageTail
      is only true for tail pages, and neither is true for non-compound pages.
      
      [ This buglet seems ancient - seems to have been introduced back in Apr
        2008 in commit 6a1e7f77: "pageflags: convert to the use of new
        macros".  And the reason nobody noticed is because the PageHead()
        tests are almost all about just sanity-checking, and only used on
        pages that are actual page heads.  The fact that the old code returned
        true for tail pages too was thus not really noticeable.   - Linus ]
      Signed-off-by: default avatarChristoffer Dall <cdall@cs.columbia.edu>
      Acked-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Will Deacon <Will.Deacon@arm.com>
      Cc: Steve Capper <Steve.Capper@arm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: stable@kernel.org  # 2.6.26+
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ad4b3fb7
    • Li Zefan's avatar
      netprio_cgroup: define sk_cgrp_prioidx only if NETPRIO_CGROUP is enabled · 3d0dcfbd
      Li Zefan authored
      sock->sk_cgrp_prioidx won't be used at all if CONFIG_NETPRIO_CGROUP=n.
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d0dcfbd
    • Richard Cochran's avatar
      cpts: fix a run time warn_on. · ccb6e984
      Richard Cochran authored
      This patch fixes a warning in clk_enable by calling clk_prepare_enable
      instead.
      Signed-off-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ccb6e984
    • Richard Cochran's avatar
      cpts: fix build error by removing useless code. · cbc44dbe
      Richard Cochran authored
      The cpts driver tries to obtain the input clock frequency by calling the
      clock's internal 'recalc' method. Since <plat/clock.h> has been removed,
      this code can no longer compile.
      
      However, the driver never makes use of the frequency value, so this patch
      fixes the issue by removing the offending code altogether.
      Signed-off-by: default avatarRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cbc44dbe
    • Akinobu Mita's avatar
      batman-adv: fix random jitter calculation · 143cdd8f
      Akinobu Mita authored
      batadv_iv_ogm_emit_send_time() attempts to calculates a random integer
      in the range of 'orig_interval +- BATADV_JITTER' by the below lines.
      
              msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER;
              msecs += (random32() % 2 * BATADV_JITTER);
      
      But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER'
      because '%' and '*' have same precedence and associativity is
      left-to-right.
      
      This adds the parentheses at the appropriate position so that it matches
      original intension.
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Acked-by: default avatarAntonio Quartulli <ordex@autistici.org>
      Cc: Marek Lindner <lindner_marek@yahoo.de>
      Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
      Cc: Antonio Quartulli <ordex@autistici.org>
      Cc: b.a.t.m.a.n@lists.open-mesh.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      143cdd8f
    • Eric W. Biederman's avatar
      f2fs: Don't assign e_id in f2fs_acl_from_disk · 48c6d121
      Eric W. Biederman authored
      With user namespaces enabled building f2fs fails with:
      
       CC      fs/f2fs/acl.o
      fs/f2fs/acl.c: In function ‘f2fs_acl_from_disk’:
      fs/f2fs/acl.c:85:21: error: ‘struct posix_acl_entry’ has no member named ‘e_id’
      make[2]: *** [fs/f2fs/acl.o] Error 1
      make[2]: Target `__build' not remade because of errors.
      
      e_id is a backwards compatibility field only used for file systems
      that haven't been converted to use kuids and kgids.  When the posix
      acl tag field is neither ACL_USER nor ACL_GROUP assigning e_id is
      unnecessary.  Remove the assignment so f2fs will build with user
      namespaces enabled.
      
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Amit Sahrawat <a.sahrawat@samsung.com>
      Acked-by: default avatarJaegeuk Kim <jaegeuk.kim@samsung.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      48c6d121
    • Eric W. Biederman's avatar
      proc: Allow proc_free_inum to be called from any context · dfb2ea45
      Eric W. Biederman authored
      While testing the pid namespace code I hit this nasty warning.
      
      [  176.262617] ------------[ cut here ]------------
      [  176.263388] WARNING: at /home/eric/projects/linux/linux-userns-devel/kernel/softirq.c:160 local_bh_enable_ip+0x7a/0xa0()
      [  176.265145] Hardware name: Bochs
      [  176.265677] Modules linked in:
      [  176.266341] Pid: 742, comm: bash Not tainted 3.7.0userns+ #18
      [  176.266564] Call Trace:
      [  176.266564]  [<ffffffff810a539f>] warn_slowpath_common+0x7f/0xc0
      [  176.266564]  [<ffffffff810a53fa>] warn_slowpath_null+0x1a/0x20
      [  176.266564]  [<ffffffff810ad9ea>] local_bh_enable_ip+0x7a/0xa0
      [  176.266564]  [<ffffffff819308c9>] _raw_spin_unlock_bh+0x19/0x20
      [  176.266564]  [<ffffffff8123dbda>] proc_free_inum+0x3a/0x50
      [  176.266564]  [<ffffffff8111d0dc>] free_pid_ns+0x1c/0x80
      [  176.266564]  [<ffffffff8111d195>] put_pid_ns+0x35/0x50
      [  176.266564]  [<ffffffff810c608a>] put_pid+0x4a/0x60
      [  176.266564]  [<ffffffff8146b177>] tty_ioctl+0x717/0xc10
      [  176.266564]  [<ffffffff810aa4d5>] ? wait_consider_task+0x855/0xb90
      [  176.266564]  [<ffffffff81086bf9>] ? default_spin_lock_flags+0x9/0x10
      [  176.266564]  [<ffffffff810cab0a>] ? remove_wait_queue+0x5a/0x70
      [  176.266564]  [<ffffffff811e37e8>] do_vfs_ioctl+0x98/0x550
      [  176.266564]  [<ffffffff810b8a0f>] ? recalc_sigpending+0x1f/0x60
      [  176.266564]  [<ffffffff810b9127>] ? __set_task_blocked+0x37/0x80
      [  176.266564]  [<ffffffff810ab95b>] ? sys_wait4+0xab/0xf0
      [  176.266564]  [<ffffffff811e3d31>] sys_ioctl+0x91/0xb0
      [  176.266564]  [<ffffffff810a95f0>] ? task_stopped_code+0x50/0x50
      [  176.266564]  [<ffffffff81939199>] system_call_fastpath+0x16/0x1b
      [  176.266564] ---[ end trace 387af88219ad6143 ]---
      
      It turns out that spin_unlock_bh(proc_inum_lock) is not safe when
      put_pid is called with another spinlock held and irqs disabled.
      
      For now take the easy path and use spin_lock_irqsave(proc_inum_lock)
      in proc_free_inum and spin_loc_irq in proc_alloc_inum(proc_inum_lock).
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      dfb2ea45
    • Eric W. Biederman's avatar
      pidns: Stop pid allocation when init dies · c876ad76
      Eric W. Biederman authored
      Oleg pointed out that in a pid namespace the sequence.
      - pid 1 becomes a zombie
      - setns(thepidns), fork,...
      - reaping pid 1.
      - The injected processes exiting.
      
      Can lead to processes attempting access their child reaper and
      instead following a stale pointer.
      
      That waitpid for init can return before all of the processes in
      the pid namespace have exited is also unfortunate.
      
      Avoid these problems by disabling the allocation of new pids in a pid
      namespace when init dies, instead of when the last process in a pid
      namespace is reaped.
      Pointed-out-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      c876ad76
  6. 25 Dec, 2012 2 commits
  7. 23 Dec, 2012 3 commits
  8. 22 Dec, 2012 4 commits