1. 05 Mar, 2020 3 commits
    • selftests/bpf: Add send_signal_sched_switch test · c4ef2f32
      Yonghong Song authored
      Added one test, send_signal_sched_switch, to test the bpf_send_signal()
      helper when triggered by the sched/sched_switch tracepoint. This test can
      be used to verify the kernel deadlock fix in the previous commit. The test
      itself borrows heavily from Commit eac9153f ("bpf/stackmap: Fix deadlock
      with rq_lock in bpf_get_stack()").
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Cc: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200304191105.2796601-1-yhs@fb.com
    • bpf: Fix deadlock with rq_lock in bpf_send_signal() · 1bc7896e
      Yonghong Song authored
      When experimenting with bpf_send_signal() helper in our production
      environment (5.2 based), we experienced a deadlock in NMI mode:
         #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
         #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
         #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
         #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
         #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
        #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
        #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
        #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
        #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
        #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
        #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
        #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
        #17 [ffffc9002219fc90] schedule at ffffffff81a3e219
        #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
        #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
        #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
        #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
        #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
        #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068
      
      The above call stack is actually very similar to an issue
      reported by Commit eac9153f ("bpf/stackmap: Fix deadlock with
      rq_lock in bpf_get_stack()") by Song Liu. The only difference is
      bpf_send_signal() helper instead of bpf_get_stack() helper.
      
      The above deadlock is triggered with a perf_sw_event.
      Similar to Commit eac9153f, the almost identical reproducer below
      uses the sched/sched_switch tracepoint so the issue can be easily caught.
        /* stress_test.c */
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <unistd.h>
        #include <sys/mman.h>
        #include <pthread.h>
        #include <sys/types.h>
        #include <sys/stat.h>
        #include <fcntl.h>
      
        #define THREAD_COUNT 1000
        char *filename;
        void *worker(void *p)
        {
              void *ptr;
              int fd;
              char *pptr;
      
              fd = open(filename, O_RDONLY);
              if (fd < 0)
                      return NULL;
              while (1) {
                      struct timespec ts = {0, 1000 + rand() % 2000};
      
                      ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                      usleep(1);
                      if (ptr == MAP_FAILED) {
                              printf("failed to mmap\n");
                              break;
                      }
                      munmap(ptr, 4096 * 64);
                      usleep(1);
                      pptr = malloc(1);
                      usleep(1);
                      pptr[0] = 1;
                      usleep(1);
                      free(pptr);
                      usleep(1);
                      nanosleep(&ts, NULL);
              }
              close(fd);
              return NULL;
        }
      
        int main(int argc, char *argv[])
        {
              void *ptr;
              int i;
              pthread_t threads[THREAD_COUNT];
      
              if (argc < 2)
                      return 0;
      
              filename = argv[1];
      
              for (i = 0; i < THREAD_COUNT; i++) {
                      if (pthread_create(threads + i, NULL, worker, NULL)) {
                              fprintf(stderr, "Error creating thread\n");
                              return 0;
                      }
              }
      
              for (i = 0; i < THREAD_COUNT; i++)
                      pthread_join(threads[i], NULL);
              return 0;
        }
      and the following steps:
        1. run `stress_test /bin/ls` in one window
        2. hack bcc trace.py with the following change:
           --- a/tools/trace.py
           +++ b/tools/trace.py
           @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s);
                    __data.tgid = __tgid;
                    __data.pid = __pid;
                    bpf_get_current_comm(&__data.comm, sizeof(__data.comm));
           +        bpf_send_signal(10);
            %s
            %s
                    %s.perf_submit(%s, &__data, sizeof(__data));
        3. in a different window run
           ./trace.py -p $(pidof stress_test) t:sched:sched_switch
      
      The deadlock can be reproduced in our production system.
      
      Similar to Song's fix, the fix is to delay sending the signal if
      irqs are disabled, to avoid deadlocks involving rq_lock.
      With this change, the above stress test no longer causes a
      deadlock in our production system.
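
      The deferral described above can be sketched as a small userspace model.
      This is a minimal sketch, not the kernel patch: sim_irqs_disabled,
      send_signal_safe(), run_deferred() and the -1 "busy" return value are
      hypothetical stand-ins for the kernel's interrupt state and its
      deferred-work machinery.

```c
#include <stdbool.h>

/* Userspace model only; every name here is hypothetical. */
struct deferred_signal {
        bool pending;   /* a signal is queued for later delivery */
        int sig;        /* signal number to deliver */
};

static bool sim_irqs_disabled;          /* stand-in for "irqs are disabled" */
static int delivered_sig;               /* last signal actually delivered */
static struct deferred_signal work;     /* single deferred-work slot */

/* Stand-in for the real delivery path, which may take scheduler locks. */
static void deliver(int sig)
{
        delivered_sig = sig;
}

/* Deliver now when it is safe, otherwise queue the signal for later. */
static int send_signal_safe(int sig)
{
        if (sim_irqs_disabled) {
                if (work.pending)
                        return -1;      /* slot busy: assumed error value */
                work.pending = true;
                work.sig = sig;
                return 0;
        }
        deliver(sig);
        return 0;
}

/* Called later, from a context where taking the locks is safe. */
static void run_deferred(void)
{
        if (work.pending) {
                work.pending = false;
                deliver(work.sig);
        }
}
```

      In this model the caller that runs with irqs disabled only records the
      request; delivery happens in run_deferred(), outside the context that
      already holds the contended lock.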
      
      I also implemented a scaled-down version of the reproducer in the
      selftest (a subsequent commit). With the latest bpf-next,
      lockdep complains about the following potential deadlock.
        [   32.832450] -> #1 (&p->pi_lock){-.-.}:
        [   32.833100]        _raw_spin_lock_irqsave+0x44/0x80
        [   32.833696]        task_rq_lock+0x2c/0xa0
        [   32.834182]        task_sched_runtime+0x59/0xd0
        [   32.834721]        thread_group_cputime+0x250/0x270
        [   32.835304]        thread_group_cputime_adjusted+0x2e/0x70
        [   32.835959]        do_task_stat+0x8a7/0xb80
        [   32.836461]        proc_single_show+0x51/0xb0
        ...
        [   32.839512] -> #0 (&(&sighand->siglock)->rlock){....}:
        [   32.840275]        __lock_acquire+0x1358/0x1a20
        [   32.840826]        lock_acquire+0xc7/0x1d0
        [   32.841309]        _raw_spin_lock_irqsave+0x44/0x80
        [   32.841916]        __lock_task_sighand+0x79/0x160
        [   32.842465]        do_send_sig_info+0x35/0x90
        [   32.842977]        bpf_send_signal+0xa/0x10
        [   32.843464]        bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000
        [   32.844301]        trace_call_bpf+0x115/0x270
        [   32.844809]        perf_trace_run_bpf_submit+0x4a/0xc0
        [   32.845411]        perf_trace_sched_switch+0x10f/0x180
        [   32.846014]        __schedule+0x45d/0x880
        [   32.846483]        schedule+0x5f/0xd0
        ...
      
        [   32.853148] Chain exists of:
        [   32.853148]   &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock
        [   32.853148]
        [   32.854451]  Possible unsafe locking scenario:
        [   32.854451]
        [   32.855173]        CPU0                    CPU1
        [   32.855745]        ----                    ----
        [   32.856278]   lock(&rq->lock);
        [   32.856671]                                lock(&p->pi_lock);
        [   32.857332]                                lock(&rq->lock);
        [   32.857999]   lock(&(&sighand->siglock)->rlock);
      
        Deadlock happens on CPU0 when it tries to acquire &sighand->siglock,
        which is held by CPU1, while CPU1 tries to grab &rq->lock
        and cannot get it.

        This is not exactly the call stack seen in our production environment,
        but the symptom is similar: both locks are acquired with
        spin_lock_irqsave(), and both cases involve rq_lock. The fix to delay
        sending the signal when irqs are disabled also fixes this issue.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com
    • mailmap: Update email address · 52e7c083
      Quentin Monnet authored
      My Netronome address is no longer active. I am no maintainer, but
      get_maintainer.pl sometimes returns my name for a small number of
      files (BPF-related). Add an entry to .mailmap for good measure.
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200226171353.18982-1-quentin@isovalent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 04 Mar, 2020 6 commits
    • drivers/of/of_mdio.c:fix of_mdiobus_register() · 209c65b6
      Dajun Jin authored
      After a phy_device is registered successfully, the loop should
      terminate; otherwise the same phy_device would also be registered at
      other addresses. This goes wrong when there are multiple PHYs without
      reg properties.
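
      The early exit described above can be sketched with a plain array
      standing in for the bus. This is an illustrative model under stated
      assumptions, not the of_mdio code: MAX_ADDR, register_first_free() and
      the occupied[] array are hypothetical names.

```c
#define MAX_ADDR 4      /* hypothetical bus size for this sketch */

/*
 * occupied[addr] != 0 means a device is already registered at addr.
 * Returns the address used, or -1 if the bus is full.
 */
static int register_first_free(int occupied[MAX_ADDR], int phy_id)
{
        int addr;

        for (addr = 0; addr < MAX_ADDR; addr++) {
                if (occupied[addr])
                        continue;       /* address already taken */
                occupied[addr] = phy_id;
                /*
                 * Stop after the first success. Without this early exit
                 * the same PHY would also claim every remaining free
                 * address, leaving none for the next PHY.
                 */
                return addr;
        }
        return -1;
}
```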
      Signed-off-by: Dajun Jin <adajunjin@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: fix checks for max queues to allocate · 116ca924
      Vishal Kulkarni authored
      Hardware can support more than the 8 queues currently allowed by
      netif_get_num_default_rss_queues(). So, rework and fix the checks for the
      max number of queues to allocate. The checks should be based on how many
      are actually supported by hardware, or the number of online cpus,
      whichever is lower.
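
      The "whichever is lower" rule above amounts to a simple minimum. A
      minimal sketch follows; max_queues_to_alloc(), hw_max_queues and
      online_cpus are hypothetical names for illustration and are not taken
      from the cxgb4 driver.

```c
/* Hypothetical helper for illustration; not the cxgb4 driver's code. */
static unsigned int max_queues_to_alloc(unsigned int hw_max_queues,
                                        unsigned int online_cpus)
{
        /* Never allocate more queues than hardware or CPUs can back. */
        return hw_max_queues < online_cpus ? hw_max_queues : online_cpus;
}
```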
      
      Fixes: 5952dde7 ("cxgb4: set maximal number of default RSS queues")
      Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • phylink: Improve error message when validate failed · 20d8bb0d
      Hauke Mehrtens authored
      This should improve the error message when the PHY validation in the MAC
      driver fails. I ran into this problem multiple times: I put wrong
      interface values into the device tree and then had to search for why it
      was failing with -22 (-EINVAL). This should make it easier to spot the
      problem.
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: bcm63xx: fix OOPS due to missing driver name · 43de81b0
      Jonas Gorski authored
      719655a1 ("net: phy: Replace phy driver features u32 with link_mode
      bitmap") was a bit over-eager and also removed the second phy driver's
      name, resulting in a nasty OOPS on registration:
      
      [    1.319854] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 804dd50c, ra == 804dd4f0
      [    1.330859] Oops[#1]:
      [    1.333138] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.22 #0
      [    1.339217] $ 0   : 00000000 00000001 87ca7f00 805c1874
      [    1.344590] $ 4   : 00000000 00000047 00585000 8701f800
      [    1.349965] $ 8   : 8701f800 804f4a5c 00000003 64726976
      [    1.355341] $12   : 00000001 00000000 00000000 00000114
      [    1.360718] $16   : 87ca7f80 00000000 00000000 80639fe4
      [    1.366093] $20   : 00000002 00000000 806441d0 80b90000
      [    1.371470] $24   : 00000000 00000000
      [    1.376847] $28   : 87c1e000 87c1fda0 80b90000 804dd4f0
      [    1.382224] Hi    : d1c8f8da
      [    1.385180] Lo    : 5518a480
      [    1.388182] epc   : 804dd50c kset_find_obj+0x3c/0x114
      [    1.393345] ra    : 804dd4f0 kset_find_obj+0x20/0x114
      [    1.398530] Status: 10008703 KERNEL EXL IE
      [    1.402833] Cause : 00800008 (ExcCode 02)
      [    1.406952] BadVA : 00000000
      [    1.409913] PrId  : 0002a075 (Broadcom BMIPS4350)
      [    1.414745] Modules linked in:
      [    1.417895] Process swapper/0 (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
      [    1.426214] Stack : 87cec000 80630000 80639370 80640658 80640000 80049af4 80639fe4 8063a0d8
      [    1.434816]         8063a0d8 802ef078 00000002 00000000 806441d0 80b90000 8063a0d8 802ef114
      [    1.443417]         87cea0de 87c1fde0 00000000 804de488 87cea000 8063a0d8 8063a0d8 80334e48
      [    1.452018]         80640000 8063984c 80639bf4 00000000 8065de48 00000001 8063a0d8 80334ed0
      [    1.460620]         806441d0 80b90000 80b90000 802ef164 8065dd70 80620000 80b90000 8065de58
      [    1.469222]         ...
      [    1.471734] Call Trace:
      [    1.474255] [<804dd50c>] kset_find_obj+0x3c/0x114
      [    1.479141] [<802ef078>] driver_find+0x1c/0x44
      [    1.483665] [<802ef114>] driver_register+0x74/0x148
      [    1.488719] [<80334e48>] phy_driver_register+0x9c/0xd0
      [    1.493968] [<80334ed0>] phy_drivers_register+0x54/0xe8
      [    1.499345] [<8001061c>] do_one_initcall+0x7c/0x1f4
      [    1.504374] [<80644ed8>] kernel_init_freeable+0x1d4/0x2b4
      [    1.509940] [<804f4e24>] kernel_init+0x10/0xf8
      [    1.514502] [<80018e68>] ret_from_kernel_thread+0x14/0x1c
      [    1.520040] Code: 1060000c  02202025  90650000 <90810000> 24630001  14250004  24840001  14a0fffb  90650000
      [    1.530061]
      [    1.531698] ---[ end trace d52f1717cd29bdc8 ]---
      
      Fix it by readding the name.
      
      Fixes: 719655a1 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
      Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: remove trigger command from devlink-region.rst · 70751834
      Jacob Keller authored
      The devlink trigger command does not exist. While rewriting the
      documentation for devlink into the reStructuredText format,
      documentation for the trigger command was accidentally merged in. This
      occurred because the author was also working on a potential extension to
      devlink regions which included this trigger command, and accidentally
      squashed the documentation incorrectly.
      
      Further review eventually settled on using the previously unused "new"
      command instead of creating a new trigger command.
      
      Fix this by removing mention of the trigger command from the
      documentation.
      
      Fixes: 0b0f945f ("devlink: add a file documenting devlink regions", 2020-01-10)
      Noticed-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 03 Mar, 2020 22 commits
  4. 02 Mar, 2020 4 commits
  5. 01 Mar, 2020 5 commits
    • net: dsa: sja1105: Don't destroy not-yet-created xmit_worker · 52c0d4e3
      Vladimir Oltean authored
      Fixes the following NULL pointer dereference on PHY connect error path
      teardown:
      
      [    2.291010] sja1105 spi0.1: Probed switch chip: SJA1105T
      [    2.310044] sja1105 spi0.1: Enabled switch tagging
      [    2.314970] fsl-gianfar soc:ethernet@2d90000 eth2: error -19 setting up slave phy
      [    2.322463] 8<--- cut here ---
      [    2.325497] Unable to handle kernel NULL pointer dereference at virtual address 00000018
      [    2.333555] pgd = (ptrval)
      [    2.336241] [00000018] *pgd=00000000
      [    2.339797] Internal error: Oops: 5 [#1] SMP ARM
      [    2.344384] Modules linked in:
      [    2.347420] CPU: 1 PID: 64 Comm: kworker/1:1 Not tainted 5.5.0-rc5 #1
      [    2.353820] Hardware name: Freescale LS1021A
      [    2.358070] Workqueue: events deferred_probe_work_func
      [    2.363182] PC is at kthread_destroy_worker+0x4/0x74
      [    2.368117] LR is at sja1105_teardown+0x70/0xb4
      [    2.372617] pc : [<c036cdd4>]    lr : [<c0b89238>]    psr: 60000013
      [    2.378845] sp : eeac3d30  ip : eeab1900  fp : eef45480
      [    2.384036] r10: eef4549c  r9 : 00000001  r8 : 00000000
      [    2.389227] r7 : eef527c0  r6 : 00000034  r5 : ed8ddd0c  r4 : ed8ddc40
      [    2.395714] r3 : 00000000  r2 : 00000000  r1 : eef4549c  r0 : 00000000
      [    2.402204] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      [    2.409297] Control: 10c5387d  Table: 8020406a  DAC: 00000051
      [    2.415008] Process kworker/1:1 (pid: 64, stack limit = 0x(ptrval))
      [    2.421237] Stack: (0xeeac3d30 to 0xeeac4000)
      [    2.612635] [<c036cdd4>] (kthread_destroy_worker) from [<c0b89238>] (sja1105_teardown+0x70/0xb4)
      [    2.621379] [<c0b89238>] (sja1105_teardown) from [<c10717fc>] (dsa_switch_teardown.part.1+0x48/0x74)
      [    2.630467] [<c10717fc>] (dsa_switch_teardown.part.1) from [<c1072438>] (dsa_register_switch+0x8b0/0xbf4)
      [    2.639984] [<c1072438>] (dsa_register_switch) from [<c0b89c30>] (sja1105_probe+0x2ac/0x464)
      [    2.648378] [<c0b89c30>] (sja1105_probe) from [<c0b11a5c>] (spi_drv_probe+0x7c/0xa0)
      [    2.656081] [<c0b11a5c>] (spi_drv_probe) from [<c0a26ab8>] (really_probe+0x208/0x480)
      [    2.663871] [<c0a26ab8>] (really_probe) from [<c0a26f0c>] (driver_probe_device+0x78/0x1c4)
      [    2.672093] [<c0a26f0c>] (driver_probe_device) from [<c0a24c48>] (bus_for_each_drv+0x80/0xc4)
      [    2.680574] [<c0a24c48>] (bus_for_each_drv) from [<c0a26810>] (__device_attach+0xd0/0x168)
      [    2.688794] [<c0a26810>] (__device_attach) from [<c0a259d8>] (bus_probe_device+0x84/0x8c)
      [    2.696927] [<c0a259d8>] (bus_probe_device) from [<c0a25f24>] (deferred_probe_work_func+0x84/0xc4)
      [    2.705842] [<c0a25f24>] (deferred_probe_work_func) from [<c03667b0>] (process_one_work+0x22c/0x560)
      [    2.714926] [<c03667b0>] (process_one_work) from [<c0366d8c>] (worker_thread+0x2a8/0x5d4)
      [    2.723059] [<c0366d8c>] (worker_thread) from [<c036cf94>] (kthread+0x150/0x154)
      [    2.730416] [<c036cf94>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
      
      Checking for NULL pointer is correct because the per-port xmit kernel
      threads are created in sja1105_probe immediately after calling
      dsa_register_switch.
      
      Fixes: a68578c2 ("net: dsa: Make deferred_xmit private to sja1105")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: use configured metric when add peer route · 07758eb9
      Hangbin Liu authored
      When we add a peer address with a metric configured, IPv4 sets the dest
      metric correctly, but IPv6 does not, e.g.
      
      ]# ip addr add 192.0.2.1 peer 192.0.2.2/32 dev eth1 metric 20
      ]# ip route show dev eth1
      192.0.2.2 proto kernel scope link src 192.0.2.1 metric 20
      ]# ip addr add 2001:db8::1 peer 2001:db8::2/128 dev eth1 metric 20
      ]# ip -6 route show dev eth1
      2001:db8::1 proto kernel metric 20 pref medium
      2001:db8::2 proto kernel metric 256 pref medium
      
      Fix this by using configured metric instead of default one.
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Fixes: 8308f3ff ("net/ipv6: Add support for specifying metric of connected routes")
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: fix lockup on warm boot · 0395823b
      Russell King authored
      If the switch is not hardware reset on a warm boot, interrupts can be
      left enabled, and possibly pending. This will cause us to enter an
      infinite loop trying to service an interrupt we are unable to handle,
      thereby preventing the kernel from booting.
      
      Ensure that the global 2 interrupt sources are disabled before we claim
      the parent interrupt.
      
      Observed on the ZII development revision B and C platforms with
      reworked serdes support, and using reboot -f to reboot the platform.
      
      Fixes: dc30c35b ("net: dsa: mv88e6xxx: Implement interrupt support.")
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: nicstar: fix if-statement empty body warning · 8a171c5c
      Randy Dunlap authored
      When debugging via PRINTK() is not enabled, make the PRINTK()
      macro be an empty do-while block.
      
      This fixes a gcc warning when -Wextra is set:
      ../drivers/atm/nicstar.c:1819:23: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body]
      
      I have verified that there is no object code change (with gcc 7.5.0).
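
      The empty do-while idiom described above can be sketched as follows.
      This is a minimal illustration, not the nicstar source: the
      NS_DEBUG_PRINTK switch and the classify() function are hypothetical,
      and the macro here uses C99 variadic arguments.

```c
#include <stdio.h>

/* NS_DEBUG_PRINTK is a hypothetical switch used only in this sketch. */
#ifdef NS_DEBUG_PRINTK
#define PRINTK(...) printf(__VA_ARGS__)
#else
/* Expands to one complete statement, so an else-branch is never empty. */
#define PRINTK(...) do { } while (0)
#endif

/* Returns 1 for positive values, 0 otherwise. */
static int classify(int value)
{
        int positive = 0;

        if (value > 0)
                positive = 1;
        else
                PRINTK("non-positive value %d\n", value);

        return positive;
}
```

      If the macro expanded to nothing, the else branch would reduce to a
      bare semicolon, which is the construct the warning points at.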
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Chas Williams <3chas3@gmail.com>
      Cc: linux-atm-general@lists.sourceforge.net
      Cc: netdev@vger.kernel.org
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: Use netlink header as base to calculate bad attribute offset · 84b32680
      Pablo Neira Ayuso authored
      Userspace might send a batch that is composed of several netlink
      messages. The netlink_ack() function must use the pointer to the netlink
      header as base to calculate the bad attribute offset.
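
      Why the base pointer matters can be sketched with a plain byte buffer.
      This is an illustrative model only: bad_attr_offset() and the byte
      positions used in the usage note are hypothetical, and no real netlink
      structures are involved.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: offset of the offending attribute measured from
 * the start of the message that contains it.
 */
static size_t bad_attr_offset(const uint8_t *msg_hdr, const uint8_t *bad_attr)
{
        return (size_t)(bad_attr - msg_hdr);
}
```

      For a batch whose second message starts at byte 24 and whose bad
      attribute sits at byte 44 of the buffer, measuring from the message
      header gives 20, while measuring from the start of the batch gives 44,
      which does not locate the attribute within its own message.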
      
      Fixes: 2d4bc933 ("netlink: extended ACK reporting")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>