1. 06 Dec, 2014 24 commits
  2. 22 Nov, 2014 2 commits
  3. 21 Nov, 2014 14 commits
    • Matthew Leach's avatar
      net: socket: error on a negative msg_namelen · 4e073683
      Matthew Leach authored
      When copying in a struct msghdr from the user, if the user has set the
      msg_namelen parameter to a negative value it gets clamped to a valid
      size due to a comparison between signed and unsigned values.
      
      Ensure the syscall errors when the user passes in a negative value.
      Signed-off-by: default avatarMatthew Leach <matthew.leach@arm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit dbb490b9)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4e073683
    • Vlad Yasevich's avatar
      net: core: Always propagate flag changes to interfaces · 66e1bc6b
      Vlad Yasevich authored
      The following commit:
          b6c40d68
          net: only invoke dev->change_rx_flags when device is UP
      
      tried to fix a problem with VLAN devices and promiscuouse flag setting.
      The issue was that VLAN device was setting a flag on an interface that
      was down, thus resulting in bad promiscuity count.
      This commit blocked flag propagation to any device that is currently
      down.
      
      A later commit:
          deede2fa
          vlan: Don't propagate flag changes on down interfaces
      
      fixed VLAN code to only propagate flags when the VLAN interface is up,
      thus fixing the same issue as above, only localized to VLAN.
      
      The problem we have now is that if we have create a complex stack
      involving multiple software devices like bridges, bonds, and vlans,
      then it is possible that the flags would not propagate properly to
      the physical devices.  A simple examle of the scenario is the
      following:
      
        eth0----> bond0 ----> bridge0 ---> vlan50
      
      If bond0 or eth0 happen to be down at the time bond0 is added to
      the bridge, then eth0 will never have promisc mode set which is
      currently required for operation as part of the bridge.  As a
      result, packets with vlan50 will be dropped by the interface.
      
      The only 2 devices that implement the special flag handling are
      VLAN and DSA and they both have required code to prevent incorrect
      flag propagation.  As a result we can remove the generic solution
      introduced in b6c40d68 and leave
      it to the individual devices to decide whether they will block
      flag propagation or not.
      Reported-by: default avatarStefan Priebe <s.priebe@profihost.ag>
      Suggested-by: default avatarVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: default avatarVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit d2615bf4)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      66e1bc6b
    • Dan Carpenter's avatar
      net: clamp ->msg_namelen instead of returning an error · a6ac08ab
      Dan Carpenter authored
      If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the
      original code that would lead to memory corruption in the kernel if you
      had audit configured.  If you didn't have audit configured it was
      harmless.
      
      There are some programs such as beta versions of Ruby which use too
      large of a buffer and returning an error code breaks them.  We should
      clamp the ->msg_namelen value instead.
      
      Fixes: 1661bf36 ("net: heap overflow in __audit_sockaddr()")
      Reported-by: default avatarEric Wong <normalperson@yhbt.net>
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Tested-by: default avatarEric Wong <normalperson@yhbt.net>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit db31c55a)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a6ac08ab
    • Roman Gushchin's avatar
      net: check net.core.somaxconn sysctl values · 3f5d7e1d
      Roman Gushchin authored
      It's possible to assign an invalid value to the net.core.somaxconn
      sysctl variable, because there is no checks at all.
      
      The sk_max_ack_backlog field of the sock structure is defined as
      unsigned short. Therefore, the backlog argument in inet_listen()
      shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall
      is truncated to the somaxconn value. So, the somaxconn value shouldn't
      exceed 65535 (USHRT_MAX).
      Also, negative values of somaxconn are meaningless.
      
      before:
      $ sysctl -w net.core.somaxconn=256
      net.core.somaxconn = 256
      $ sysctl -w net.core.somaxconn=65536
      net.core.somaxconn = 65536
      $ sysctl -w net.core.somaxconn=-100
      net.core.somaxconn = -100
      
      after:
      $ sysctl -w net.core.somaxconn=256
      net.core.somaxconn = 256
      $ sysctl -w net.core.somaxconn=65536
      error: "Invalid argument" setting key "net.core.somaxconn"
      $ sysctl -w net.core.somaxconn=-100
      error: "Invalid argument" setting key "net.core.somaxconn"
      
      Based on a prior patch from Changli Gao.
      Signed-off-by: default avatarRoman Gushchin <klamm@yandex-team.ru>
      Reported-by: default avatarChangli Gao <xiaosuo@gmail.com>
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 5f671d6b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      3f5d7e1d
    • Michal Tesar's avatar
      sysctl net: Keep tcp_syn_retries inside the boundary · f71a8f4c
      Michal Tesar authored
      Limit the min/max value passed to the
      /proc/sys/net/ipv4/tcp_syn_retries.
      Signed-off-by: default avatarMichal Tesar <mtesar@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 651e9271)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f71a8f4c
    • Nikola Pajkovsky's avatar
      crypto: api - Fix race condition in larval lookup · e323f565
      Nikola Pajkovsky authored
      crypto_larval_lookup should only return a larval if it created one.
      Any larval created by another entity must be processed through
      crypto_larval_wait before being returned.
      
      Otherwise this will lead to a larval being killed twice, which
      will most likely lead to a crash.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarKees Cook <keescook@chromium.org>
      Tested-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      
      (cherry picked from commit 77dbd7a9)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e323f565
    • Julian Anastasov's avatar
      ipvs: fix CHECKSUM_PARTIAL for TCP, UDP · 9ad4a0fa
      Julian Anastasov authored
       	Fix CHECKSUM_PARTIAL handling. Tested for IPv4 TCP,
      UDP not tested because it needs network card with HW CSUM support.
      May be fixes problem where IPVS can not be used in virtual boxes.
      Problem appears with DNAT to local address when the local stack
      sends reply in CHECKSUM_PARTIAL mode.
      
       	Fix tcp_dnat_handler and udp_dnat_handler to provide
      vaddr and daddr in right order (old and new IP) when calling
      tcp_partial_csum_update/udp_partial_csum_update (CHECKSUM_PARTIAL).
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarSimon Horman <horms@verge.net.au>
      
      (cherry picked from commit 5bc9068e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9ad4a0fa
    • Willy Tarreau's avatar
      x86, ptrace: fix build breakage with gcc 4.7 (second try) · 084375ba
      Willy Tarreau authored
      syscall_trace_enter() and syscall_trace_leave() are only called from
      within asm code and do not need to be declared in the .c at all.
      Removing their reference fixes the build issue that was happening
      with gcc 4.7.
      
      Both Sven-Haegar Koch and Christoph Biedl confirmed this patch
      addresses their respective build issues.
      
      Cc: Sven-Haegar Koch <haegar@sdinet.de>
      Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      (cherry picked from commit 40c74e0d)
      
      (cherry picked from commit HEAD)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      084375ba
    • Weiping Pan's avatar
      rds: set correct msg_namelen · 94cc7d0e
      Weiping Pan authored
      Jay Fenlason (fenlason@redhat.com) found a bug,
      that recvfrom() on an RDS socket can return the contents of random kernel
      memory to userspace if it was called with a address length larger than
      sizeof(struct sockaddr_in).
      rds_recvmsg() also fails to set the addr_len paramater properly before
      returning, but that's just a bug.
      There are also a number of cases wher recvfrom() can return an entirely bogus
      address. Anything in rds_recvmsg() that returns a non-negative value but does
      not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
      at the end of the while(1) loop will return up to 128 bytes of kernel memory
      to userspace.
      
      And I write two test programs to reproduce this bug, you will see that in
      rds_server, fromAddr will be overwritten and the following sock_fd will be
      destroyed.
      Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
      better to make the kernel copy the real length of address to user space in
      such case.
      
      How to run the test programs ?
      I test them on 32bit x86 system, 3.5.0-rc7.
      
      1 compile
      gcc -o rds_client rds_client.c
      gcc -o rds_server rds_server.c
      
      2 run ./rds_server on one console
      
      3 run ./rds_client on another console
      
      4 you will see something like:
      server is waiting to receive data...
      old socket fd=3
      server received data from client:data from client
      msg.msg_namelen=32
      new socket fd=-1067277685
      sendmsg()
      : Bad file descriptor
      
      /***************** rds_client.c ********************/
      
      int main(void)
      {
      	int sock_fd;
      	struct sockaddr_in serverAddr;
      	struct sockaddr_in toAddr;
      	char recvBuffer[128] = "data from client";
      	struct msghdr msg;
      	struct iovec iov;
      
      	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
      	if (sock_fd < 0) {
      		perror("create socket error\n");
      		exit(1);
      	}
      
      	memset(&serverAddr, 0, sizeof(serverAddr));
      	serverAddr.sin_family = AF_INET;
      	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
      	serverAddr.sin_port = htons(4001);
      
      	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
      		perror("bind() error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	memset(&toAddr, 0, sizeof(toAddr));
      	toAddr.sin_family = AF_INET;
      	toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
      	toAddr.sin_port = htons(4000);
      	msg.msg_name = &toAddr;
      	msg.msg_namelen = sizeof(toAddr);
      	msg.msg_iov = &iov;
      	msg.msg_iovlen = 1;
      	msg.msg_iov->iov_base = recvBuffer;
      	msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
      	msg.msg_control = 0;
      	msg.msg_controllen = 0;
      	msg.msg_flags = 0;
      
      	if (sendmsg(sock_fd, &msg, 0) == -1) {
      		perror("sendto() error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	printf("client send data:%s\n", recvBuffer);
      
      	memset(recvBuffer, '\0', 128);
      
      	msg.msg_name = &toAddr;
      	msg.msg_namelen = sizeof(toAddr);
      	msg.msg_iov = &iov;
      	msg.msg_iovlen = 1;
      	msg.msg_iov->iov_base = recvBuffer;
      	msg.msg_iov->iov_len = 128;
      	msg.msg_control = 0;
      	msg.msg_controllen = 0;
      	msg.msg_flags = 0;
      	if (recvmsg(sock_fd, &msg, 0) == -1) {
      		perror("recvmsg() error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	printf("receive data from server:%s\n", recvBuffer);
      
      	close(sock_fd);
      
      	return 0;
      }
      
      /***************** rds_server.c ********************/
      
      int main(void)
      {
      	struct sockaddr_in fromAddr;
      	int sock_fd;
      	struct sockaddr_in serverAddr;
      	unsigned int addrLen;
      	char recvBuffer[128];
      	struct msghdr msg;
      	struct iovec iov;
      
      	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
      	if(sock_fd < 0) {
      		perror("create socket error\n");
      		exit(0);
      	}
      
      	memset(&serverAddr, 0, sizeof(serverAddr));
      	serverAddr.sin_family = AF_INET;
      	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
      	serverAddr.sin_port = htons(4000);
      	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
      		perror("bind error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	printf("server is waiting to receive data...\n");
      	msg.msg_name = &fromAddr;
      
      	/*
      	 * I add 16 to sizeof(fromAddr), ie 32,
      	 * and pay attention to the definition of fromAddr,
      	 * recvmsg() will overwrite sock_fd,
      	 * since kernel will copy 32 bytes to userspace.
      	 *
      	 * If you just use sizeof(fromAddr), it works fine.
      	 * */
      	msg.msg_namelen = sizeof(fromAddr) + 16;
      	/* msg.msg_namelen = sizeof(fromAddr); */
      	msg.msg_iov = &iov;
      	msg.msg_iovlen = 1;
      	msg.msg_iov->iov_base = recvBuffer;
      	msg.msg_iov->iov_len = 128;
      	msg.msg_control = 0;
      	msg.msg_controllen = 0;
      	msg.msg_flags = 0;
      
      	while (1) {
      		printf("old socket fd=%d\n", sock_fd);
      		if (recvmsg(sock_fd, &msg, 0) == -1) {
      			perror("recvmsg() error\n");
      			close(sock_fd);
      			exit(1);
      		}
      		printf("server received data from client:%s\n", recvBuffer);
      		printf("msg.msg_namelen=%d\n", msg.msg_namelen);
      		printf("new socket fd=%d\n", sock_fd);
      		strcat(recvBuffer, "--data from server");
      		if (sendmsg(sock_fd, &msg, 0) == -1) {
      			perror("sendmsg()\n");
      			close(sock_fd);
      			exit(1);
      		}
      	}
      
      	close(sock_fd);
      	return 0;
      }
      Signed-off-by: default avatarWeiping Pan <wpan@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 06b6a1cf)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      94cc7d0e
    • Mathias Krause's avatar
      llc: Fix missing msg_namelen update in llc_ui_recvmsg() · d66e6f36
      Mathias Krause authored
      For stream sockets the code misses to update the msg_namelen member
      to 0 and therefore makes net/socket.c leak the local, uninitialized
      sockaddr_storage variable to userland -- 128 bytes of kernel stack
      memory. The msg_namelen update is also missing for datagram sockets
      in case the socket is shutting down during receive.
      
      Fix both issues by setting msg_namelen to 0 early. It will be
      updated later if we're going to fill the msg_name member.
      
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit c77a4b9c)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d66e6f36
    • Mathias Krause's avatar
      atm: update msg_namelen in vcc_recvmsg() · acca86d7
      Mathias Krause authored
      The current code does not fill the msg_name member in case it is set.
      It also does not set the msg_namelen member to 0 and therefore makes
      net/socket.c leak the local, uninitialized sockaddr_storage variable
      to userland -- 128 bytes of kernel stack memory.
      
      Fix that by simply setting msg_namelen to 0 as obviously nobody cared
      about vcc_recvmsg() not filling the msg_name in case it was set.
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 9b3e617f)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      acca86d7
    • Eric Dumazet's avatar
      softirq: reduce latencies · 04288af7
      Eric Dumazet authored
      In various network workloads, __do_softirq() latencies can be up
      to 20 ms if HZ=1000, and 200 ms if HZ=100.
      
      This is because we iterate 10 times in the softirq dispatcher,
      and some actions can consume a lot of cycles.
      
      This patch changes the fallback to ksoftirqd condition to :
      
      - A time limit of 2 ms.
      - need_resched() being set on current task
      
      When one of this condition is met, we wakeup ksoftirqd for further
      softirq processing if we still have pending softirqs.
      
      Using need_resched() as the only condition can trigger RCU stalls,
      as we can keep BH disabled for too long.
      
      I ran several benchmarks and got no significant difference in
      throughput, but a very significant reduction of latencies (one order
      of magnitude) :
      
      In following bench, 200 antagonist "netperf -t TCP_RR" are started in
      background, using all available cpus.
      
      Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
      IRQ (hard+soft)
      
      Before patch :
      
      # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
      RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
      to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
      RT_LATENCY=550110.424
      MIN_LATENCY=146858
      MAX_LATENCY=997109
      P50_LATENCY=305000
      P90_LATENCY=550000
      P99_LATENCY=710000
      MEAN_LATENCY=376989.12
      STDDEV_LATENCY=184046.92
      
      After patch :
      
      # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
      RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
      to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
      RT_LATENCY=40545.492
      MIN_LATENCY=9834
      MAX_LATENCY=78366
      P50_LATENCY=33583
      P90_LATENCY=59000
      P99_LATENCY=69000
      MEAN_LATENCY=38364.67
      STDDEV_LATENCY=12865.26
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit c10d7367)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      04288af7
    • Eric Dumazet's avatar
      net: reduce net_rx_action() latency to 2 HZ · 1626cfc6
      Eric Dumazet authored
      We should use time_after_eq() to get maximum latency of two ticks,
      instead of three.
      
      Bug added in commit 24f8b238 (net: increase receive packet quantum)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit d1f41b67)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1626cfc6
    • Mathias Krause's avatar
      Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg() · 7ecb3f41
      Mathias Krause authored
      If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns
      early with 0 without updating the possibly set msg_namelen member. This,
      in turn, leads to a 128 byte kernel stack leak in net/socket.c.
      
      Fix this by updating msg_namelen in this case. For all other cases it
      will be handled in bt_sock_stream_recvmsg().
      
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Johan Hedberg <johan.hedberg@gmail.com>
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit e11e0455)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7ecb3f41