1. 07 Apr, 2023 2 commits
  2. 06 Apr, 2023 15 commits
  3. 05 Apr, 2023 19 commits
    • Daniel Vetter's avatar
      Merge tag 'drm-misc-fixes-2023-04-05' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · 3dfa8926
      Daniel Vetter authored
      Short summary of fixes pull:
      
       * ivpu: DMA fence and suspend fixes
       * nouveau: Color-depth fixes
       * panfrost: Fix mmap error handling
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      From: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230405182855.GA1551@linux-uq9g
      3dfa8926
    • Ard Biesheuvel's avatar
      arm64: compat: Work around uninitialized variable warning · 32d85999
      Ard Biesheuvel authored
      Dan reports that smatch complains about a potential uninitialized
      variable being used in the compat alignment fixup code.
      
      The logic is not wrong per se, but we do end up using an uninitialized
      variable if reading the instruction that triggered the alignment fault
      from user space faults, even if the fault ensures that the uninitialized
      value doesn't propagate any further.
      
      Given that we just give up and return 1 if any fault occurs when reading
      the instruction, let's get rid of the 'success handling' pattern that
      captures the fault in a variable and aborts later, and instead, just
      return 1 immediately if any of the get_user() calls result in an
      exception.
      
      Fixes: 3fc24ef3 ("arm64: compat: Implement misalignment fixups for multiword loads")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Reported-by: default avatarDan Carpenter <error27@gmail.com>
      Link: https://lore.kernel.org/r/202304021214.gekJ8yRc-lkp@intel.com/Signed-off-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Link: https://lore.kernel.org/r/20230404103625.2386382-1-ardb@kernel.orgSigned-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      32d85999
    • Linus Torvalds's avatar
      Merge tag 'trace-v6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 99ddf225
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix timerlat notification, as it was not triggering the notify to
         users when a new max latency was hit.
      
       - Do not trigger max latency if the tracing is off.
      
         When tracing is off, the ring buffer is not updated, it does not make
         sense to notify when there's a new max latency detected by the
         tracer, as why that latency happened is not available. The tracing
         logic still runs when the ring buffer is disabled, but it should not
         be triggering notifications.
      
       - Fix race on freeing the synthetic event "last_cmd" variable by adding
         a mutex around it.
      
       - Fix race between reader and writer of the ring buffer by adding
         memory barriers. When the writer is still on the reader page it must
         have its content visible on the buffer before it moves the commit
         index that the reader uses to know how much content is on the page.
      
       - Make get_lock_parent_ip() always inlined, as it uses _THIS_IP_ and
         _RET_IP_, which gets broken if it is not inlined.
      
       - Make __field(int, arr[5]) in a TRACE_EVENT() macro fail to build.
      
         The field formats of trace events are calculated by using
         sizeof(type) and other means by what is passed into the structure
         macros like __field(). The __field() macro is only meant for atom
         types like int, long, short, pointer, etc. It is not meant for
         arrays.
      
         The code will currently compile with arrays, but then the format
         produced will be inaccurate, and user space parsing tools will break.
      
         Two bugs have already been fixed, now add code that will make the
         kernel fail to build if another trace event includes this buggy field
         format.
      
       - Fix boot up snapshot code:
      
         Boot snapshots were triggering when not even asked for on the kernel
         command line. This was caused by two bugs:
      
          1) It would trigger a snapshot on any instance if one was created
             from the kernel command line.
      
          2) The error handling would only affect the top level instance.
             So the fact that a snapshot was done on a instance that didn't
             allocate a buffer triggered a warning written into the top level
             buffer, and worse yet, disabled the top level buffer.
      
       - Fix memory leak that was caused when an error was logged in a trace
         buffer instance, and then the buffer instance was removed.
      
         The allocated error log messages still needed to be freed.
      
      * tag 'trace-v6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Free error logs of tracing instances
        tracing: Fix ftrace_boot_snapshot command line logic
        tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
        tracing: Error if a trace event has an array for a __field()
        tracing/osnoise: Fix notify new tracing_max_latency
        tracing/timerlat: Notify new max thread latency
        ftrace: Mark get_lock_parent_ip() __always_inline
        ring-buffer: Fix race while reader and writer are on the same page
        tracing/synthetic: Fix races on freeing last_cmd
      99ddf225
    • Steven Rostedt (Google)'s avatar
      tracing: Free error logs of tracing instances · 3357c6e4
      Steven Rostedt (Google) authored
      When a tracing instance is removed, the error messages that hold errors
      that occurred in the instance needs to be freed. The following reports a
      memory leak:
      
       # cd /sys/kernel/tracing
       # mkdir instances/foo
       # echo 'hist:keys=x' > instances/foo/events/sched/sched_switch/trigger
       # cat instances/foo/error_log
       [  117.404795] hist:sched:sched_switch: error: Couldn't find field
         Command: hist:keys=x
                            ^
       # rmdir instances/foo
      
      Then check for memory leaks:
      
       # echo scan > /sys/kernel/debug/kmemleak
       # cat /sys/kernel/debug/kmemleak
      unreferenced object 0xffff88810d8ec700 (size 192):
        comm "bash", pid 869, jiffies 4294950577 (age 215.752s)
        hex dump (first 32 bytes):
          60 dd 68 61 81 88 ff ff 60 dd 68 61 81 88 ff ff  `.ha....`.ha....
          a0 30 8c 83 ff ff ff ff 26 00 0a 00 00 00 00 00  .0......&.......
        backtrace:
          [<00000000dae26536>] kmalloc_trace+0x2a/0xa0
          [<00000000b2938940>] tracing_log_err+0x277/0x2e0
          [<000000004a0e1b07>] parse_atom+0x966/0xb40
          [<0000000023b24337>] parse_expr+0x5f3/0xdb0
          [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560
          [<00000000293a9645>] trigger_process_regex+0x135/0x1a0
          [<000000005c22b4f2>] event_trigger_write+0x87/0xf0
          [<000000002cadc509>] vfs_write+0x162/0x670
          [<0000000059c3b9be>] ksys_write+0xca/0x170
          [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0
          [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
      unreferenced object 0xffff888170c35a00 (size 32):
        comm "bash", pid 869, jiffies 4294950577 (age 215.752s)
        hex dump (first 32 bytes):
          0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 68 69 73 74  .  Command: hist
          3a 6b 65 79 73 3d 78 0a 00 00 00 00 00 00 00 00  :keys=x.........
        backtrace:
          [<000000006a747de5>] __kmalloc+0x4d/0x160
          [<000000000039df5f>] tracing_log_err+0x29b/0x2e0
          [<000000004a0e1b07>] parse_atom+0x966/0xb40
          [<0000000023b24337>] parse_expr+0x5f3/0xdb0
          [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560
          [<00000000293a9645>] trigger_process_regex+0x135/0x1a0
          [<000000005c22b4f2>] event_trigger_write+0x87/0xf0
          [<000000002cadc509>] vfs_write+0x162/0x670
          [<0000000059c3b9be>] ksys_write+0xca/0x170
          [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0
          [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      The problem is that the error log needs to be freed when the instance is
      removed.
      
      Link: https://lore.kernel.org/lkml/76134d9f-a5ba-6a0d-37b3-28310b4a1e91@alu.unizg.hr/
      Link: https://lore.kernel.org/linux-trace-kernel/20230404194504.5790b95f@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Fixes: 2f754e77 ("tracing: Have the error logs show up in the proper instances")
      Reported-by: default avatarMirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Tested-by: default avatarMirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      3357c6e4
    • Oliver Hartkopp's avatar
      can: isotp: fix race between isotp_sendsmg() and isotp_release() · 05173743
      Oliver Hartkopp authored
      As discussed with Dae R. Jeong and Hillf Danton here [1] the sendmsg()
      function in isotp.c might get into a race condition when restoring the
      former tx.state from the old_state.
      
      Remove the old_state concept and implement proper locking for the
      ISOTP_IDLE transitions in isotp_sendmsg(), inspired by a
      simplification idea from Hillf Danton.
      
      Introduce a new tx.state ISOTP_SHUTDOWN and use the same locking
      mechanism from isotp_release() which resolves a potential race between
      isotp_sendsmg() and isotp_release().
      
      [1] https://lore.kernel.org/linux-can/ZB%2F93xJxq%2FBUqAgG@dragonet
      
      v1: https://lore.kernel.org/all/20230331102114.15164-1-socketcan@hartkopp.net
      v2: https://lore.kernel.org/all/20230331123600.3550-1-socketcan@hartkopp.net
          take care of signal interrupts for wait_event_interruptible() in
          isotp_release()
      v3: https://lore.kernel.org/all/20230331130654.9886-1-socketcan@hartkopp.net
          take care of signal interrupts for wait_event_interruptible() in
          isotp_sendmsg() in the wait_tx_done case
      v4: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
          take care of signal interrupts for wait_event_interruptible() in
          isotp_sendmsg() in ALL cases
      
      Cc: Dae R. Jeong <threeearcat@gmail.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Signed-off-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Fixes: 4f027cba ("can: isotp: split tx timer into transmission and timeout")
      Link: https://lore.kernel.org/all/20230331131935.21465-1-socketcan@hartkopp.net
      Cc: stable@vger.kernel.org
      [mkl: rephrase commit message]
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      05173743
    • Daniel Vetter's avatar
      Merge tag 'drm-intel-fixes-2023-04-05' of... · 1a4edef8
      Daniel Vetter authored
      Merge tag 'drm-intel-fixes-2023-04-05' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      
      drm/i915 fixes for v6.3-rc6:
      - Fix DP MST DSC M/N calculation to use compressed bpp
      - Fix racy use-after-free in perf ioctl
      - Fix context runtime accounting
      - Fix handling of GT reset during HuC loading
      - Fix use of unsigned vm_fault_t for error values
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      From: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/87zg7mzomz.fsf@intel.com
      1a4edef8
    • Michal Sojka's avatar
      can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events · 79e19fa7
      Michal Sojka authored
      When using select()/poll()/epoll() with a non-blocking ISOTP socket to
      wait for when non-blocking write is possible, a false EPOLLOUT event
      is sometimes returned. This can happen at least after sending a
      message which must be split to multiple CAN frames.
      
      The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
      not equal to ISOTP_IDLE and this behavior is not reflected in
      datagram_poll(), which is used in isotp_ops.
      
      This is fixed by introducing ISOTP-specific poll function, which
      suppresses the EPOLLOUT events in that case.
      
      v2: https://lore.kernel.org/all/20230302092812.320643-1-michal.sojka@cvut.cz
      v1: https://lore.kernel.org/all/20230224010659.48420-1-michal.sojka@cvut.cz
          https://lore.kernel.org/all/b53a04a2-ba1f-3858-84c1-d3eb3301ae15@hartkopp.netSigned-off-by: default avatarMichal Sojka <michal.sojka@cvut.cz>
      Reported-by: default avatarJakub Jira <jirajak2@fel.cvut.cz>
      Tested-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Fixes: e057dd3f ("can: add ISO 15765-2:2016 transport protocol")
      Link: https://lore.kernel.org/all/20230331125511.372783-1-michal.sojka@cvut.cz
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      79e19fa7
    • Oliver Hartkopp's avatar
      can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos · 0145462f
      Oliver Hartkopp authored
      isotp.c was still using sock_recv_timestamp() which does not provide
      control messages to detect dropped PDUs in the receive path.
      
      Fixes: e057dd3f ("can: add ISO 15765-2:2016 transport protocol")
      Signed-off-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/all/20230330170248.62342-1-socketcan@hartkopp.net
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      0145462f
    • Oleksij Rempel's avatar
      can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access · b45193cb
      Oleksij Rempel authored
      In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access
      could occur during the memcpy() operation if the size of skb->cb is
      larger than the size of struct j1939_sk_buff_cb. This is because the
      memcpy() operation uses the size of skb->cb, leading to a read beyond
      the struct j1939_sk_buff_cb.
      
      Updated the memcpy() operation to use the size of struct
      j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the
      memcpy() operation only reads the memory within the bounds of struct
      j1939_sk_buff_cb, preventing out-of-bounds memory access.
      
      Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb
      is greater than or equal to the size of struct j1939_sk_buff_cb. This
      ensures that the skb->cb buffer is large enough to hold the
      j1939_sk_buff_cb structure.
      
      Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
      Reported-by: default avatarShuangpeng Bai <sjb7183@psu.edu>
      Tested-by: default avatarShuangpeng Bai <sjb7183@psu.edu>
      Signed-off-by: default avatarOleksij Rempel <o.rempel@pengutronix.de>
      Link: https://groups.google.com/g/syzkaller/c/G_LL-C3plRs/m/-8xCi6dCAgAJ
      Link: https://lore.kernel.org/all/20230404073128.3173900-1-o.rempel@pengutronix.de
      Cc: stable@vger.kernel.org
      [mkl: rephrase commit message]
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      b45193cb
    • Jacek Lawrynowicz's avatar
      accel/ivpu: Fix S3 system suspend when not idle · 0ec86718
      Jacek Lawrynowicz authored
      Wait for VPU to be idle in ivpu_pm_suspend_cb() before powering off
      the device, so jobs are not lost and TDRs are not triggered after
      resume.
      
      Fixes: 852be13f ("accel/ivpu: Add PM support")
      Signed-off-by: default avatarStanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
      Reviewed-by: default avatarJeffrey Hugo <quic_jhugo@quicinc.com>
      Signed-off-by: default avatarJacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230331113603.2802515-3-stanislaw.gruszka@linux.intel.com
      0ec86718
    • Karol Wachowski's avatar
      accel/ivpu: Add dma fence to command buffers only · 774e7cb5
      Karol Wachowski authored
      Currently job->done_fence is added to every BO handle within a job. If job
      handle (command buffer) is shared between multiple submits, KMD will add
      the fence in each of them. Then bo_wait_ioctl() executed on command buffer
      will exit only when all jobs containing that handle are done.
      
      This creates deadlock scenario for user mode driver in case when job handle
      is added as dependency of another job, because bo_wait_ioctl() of first job
      will wait until second job finishes, and second job can not finish before
      first one.
      
      Having fences added only to job buffer handle allows user space to execute
      bo_wait_ioctl() on the job even if it's handle is submitted with other job.
      
      Fixes: cd727221 ("accel/ivpu: Add command buffer submission logic")
      Signed-off-by: default avatarKarol Wachowski <karol.wachowski@linux.intel.com>
      Signed-off-by: default avatarStanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
      Reviewed-by: default avatarJeffrey Hugo <quic_jhugo@quicinc.com>
      Signed-off-by: default avatarJacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230331113603.2802515-2-stanislaw.gruszka@linux.intel.com
      774e7cb5
    • Steven Rostedt (Google)'s avatar
      tracing: Fix ftrace_boot_snapshot command line logic · e9489164
      Steven Rostedt (Google) authored
      The kernel command line ftrace_boot_snapshot by itself is supposed to
      trigger a snapshot at the end of boot up of the main top level trace
      buffer. A ftrace_boot_snapshot=foo will do the same for an instance called
      foo that was created by trace_instance=foo,...
      
      The logic was broken where if ftrace_boot_snapshot was by itself, it would
      trigger a snapshot for all instances that had tracing enabled, regardless
      if it asked for a snapshot or not.
      
      When a snapshot is requested for a buffer, the buffer's
      tr->allocated_snapshot is set to true. Use that to know if a trace buffer
      wants a snapshot at boot up or not.
      
      Since the top level buffer is part of the ftrace_trace_arrays list,
      there's no reason to treat it differently than the other buffers. Just
      iterate the list if ftrace_boot_snapshot was specified.
      
      Link: https://lkml.kernel.org/r/20230405022341.895334039@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ross Zwisler <zwisler@google.com>
      Fixes: 9c1c251d ("tracing: Allow boot instances to have snapshot buffers")
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      e9489164
    • Steven Rostedt (Google)'s avatar
      tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance · 9d52727f
      Steven Rostedt (Google) authored
      If a trace instance has a failure with its snapshot code, the error
      message is to be written to that instance's buffer. But currently, the
      message is written to the top level buffer. Worse yet, it may also disable
      the top level buffer and not the instance that had the issue.
      
      Link: https://lkml.kernel.org/r/20230405022341.688730321@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ross Zwisler <zwisler@google.com>
      Fixes: 2824f503 ("tracing: Make the snapshot trigger work with instances")
      Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
      9d52727f
    • Shailend Chand's avatar
      gve: Secure enough bytes in the first TX desc for all TCP pkts · 3ce93455
      Shailend Chand authored
      Non-GSO TCP packets whose SKBs' linear portion did not include the
      entire TCP header were not populating the first Tx descriptor with
      as many bytes as the vNIC expected. This change ensures that all
      TCP packets populate the first descriptor with the correct number of
      bytes.
      
      Fixes: 893ce44d ("gve: Add basic driver framework for Compute Engine Virtual NIC")
      Signed-off-by: default avatarShailend Chand <shailend@google.com>
      Link: https://lore.kernel.org/r/20230403172809.2939306-1-shailend@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3ce93455
    • Eric Dumazet's avatar
      netlink: annotate lockless accesses to nlk->max_recvmsg_len · a1865f2e
      Eric Dumazet authored
      syzbot reported a data-race in data-race in netlink_recvmsg() [1]
      
      Indeed, netlink_recvmsg() can be run concurrently,
      and netlink_dump() also needs protection.
      
      [1]
      BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
      
      read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0:
      netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988
      sock_recvmsg_nosec net/socket.c:1017 [inline]
      sock_recvmsg net/socket.c:1038 [inline]
      __sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194
      __do_sys_recvfrom net/socket.c:2212 [inline]
      __se_sys_recvfrom net/socket.c:2208 [inline]
      __x64_sys_recvfrom+0x78/0x90 net/socket.c:2208
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1:
      netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989
      sock_recvmsg_nosec net/socket.c:1017 [inline]
      sock_recvmsg net/socket.c:1038 [inline]
      ____sys_recvmsg+0x156/0x310 net/socket.c:2720
      ___sys_recvmsg net/socket.c:2762 [inline]
      do_recvmmsg+0x2e5/0x710 net/socket.c:2856
      __sys_recvmmsg net/socket.c:2935 [inline]
      __do_sys_recvmmsg net/socket.c:2958 [inline]
      __se_sys_recvmmsg net/socket.c:2951 [inline]
      __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x0000000000000000 -> 0x0000000000001000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
      
      Fixes: 9063e21f ("netlink: autosize skb lengthes")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a1865f2e
    • Andy Roulin's avatar
      ethtool: reset #lanes when lanes is omitted · e847c767
      Andy Roulin authored
      If the number of lanes was forced and then subsequently the user
      omits this parameter, the ksettings->lanes is reset. The driver
      should then reset the number of lanes to the device's default
      for the specified speed.
      
      However, although the ksettings->lanes is set to 0, the mod variable
      is not set to true to indicate the driver and userspace should be
      notified of the changes.
      
      The consequence is that the same ethtool operation will produce
      different results based on the initial state.
      
      If the initial state is:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 2
              Duplex: Full
              Auto-negotiation: on
      
      then executing 'ethtool -s swp1 speed 50000 autoneg off' will yield:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 2
              Duplex: Full
              Auto-negotiation: off
      
      While if the initial state is:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 1
              Duplex: Full
              Auto-negotiation: off
      
      executing the same 'ethtool -s swp1 speed 50000 autoneg off' results in:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 1
              Duplex: Full
              Auto-negotiation: off
      
      This patch fixes this behavior. Omitting lanes will always results in
      the driver choosing the default lane width for the chosen speed. In this
      scenario, regardless of the initial state, the end state will be, e.g.,
      
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 2
              Duplex: Full
              Auto-negotiation: off
      
      Fixes: 012ce4dd ("ethtool: Extend link modes settings uAPI with lanes")
      Signed-off-by: default avatarAndy Roulin <aroulin@nvidia.com>
      Reviewed-by: default avatarDanielle Ratson <danieller@nvidia.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/ac238d6b-8726-8156-3810-6471291dbc7f@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e847c767
    • Jakub Kicinski's avatar
      Merge branch 'raw-ping-fix-locking-in-proc-net-raw-icmp' · 95fac540
      Jakub Kicinski authored
      Kuniyuki Iwashima says:
      
      ====================
      raw/ping: Fix locking in /proc/net/{raw,icmp}.
      
      The first patch fixes a NULL deref for /proc/net/raw and second one fixes
      the same issue for ping sockets.
      
      The first patch also converts hlist_nulls to hlist, but this is because
      the current code uses sk_nulls_for_each() for lockless readers, instead
      of sk_nulls_for_each_rcu() which adds memory barrier, but raw sockets
      does not use the nulls marker nor SLAB_TYPESAFE_BY_RCU in the first place.
      
      OTOH, the ping sockets already uses sk_nulls_for_each_rcu(), and such
      conversion can be posted later for net-next.
      ====================
      
      Link: https://lore.kernel.org/r/20230403194959.48928-1-kuniyu@amazon.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      95fac540
    • Kuniyuki Iwashima's avatar
      ping: Fix potentail NULL deref for /proc/net/icmp. · ab5fb73f
      Kuniyuki Iwashima authored
      After commit dbca1596 ("ping: convert to RCU lookups, get rid
      of rwlock"), we use RCU for ping sockets, but we should use spinlock
      for /proc/net/icmp to avoid a potential NULL deref mentioned in
      the previous patch.
      
      Let's go back to using spinlock there.
      
      Note we can convert ping sockets to use hlist instead of hlist_nulls
      because we do not use SLAB_TYPESAFE_BY_RCU for ping sockets.
      
      Fixes: dbca1596 ("ping: convert to RCU lookups, get rid of rwlock")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ab5fb73f
    • Kuniyuki Iwashima's avatar
      raw: Fix NULL deref in raw_get_next(). · 0a78cf72
      Kuniyuki Iwashima authored
      Dae R. Jeong reported a NULL deref in raw_get_next() [0].
      
      It seems that the repro was running these sequences in parallel so
      that one thread was iterating on a socket that was being freed in
      another netns.
      
        unshare(0x40060200)
        r0 = syz_open_procfs(0x0, &(0x7f0000002080)='net/raw\x00')
        socket$inet_icmp_raw(0x2, 0x3, 0x1)
        pread64(r0, &(0x7f0000000000)=""/10, 0xa, 0x10000000007f)
      
      After commit 0daf07e5 ("raw: convert raw sockets to RCU"), we
      use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW
      sockets.  However, we should use spinlock for slow paths to avoid
      the NULL deref.
      
      Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object
      is not reused during iteration in the grace period.  In fact, the
      lockless readers do not check the nulls marker with get_nulls_value().
      So, SOCK_RAW should use hlist instead of hlist_nulls.
      
      Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(),
      let's convert hlist_nulls to hlist and use sk_for_each_rcu() for
      fast paths and sk_for_each() and spinlock for /proc/net/raw.
      
      [0]:
      general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
      CPU: 2 PID: 20952 Comm: syz-executor.0 Not tainted 6.2.0-g048ec869bafd-dirty #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
      RIP: 0010:sock_net include/net/sock.h:649 [inline]
      RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
      RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
      RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
      Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
      RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
      RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
      RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
      RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
      R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
      R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
      FS:  00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055bb9614b35f CR3: 000000003c672000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       seq_read_iter+0x4c6/0x10f0 fs/seq_file.c:225
       seq_read+0x224/0x320 fs/seq_file.c:162
       pde_read fs/proc/inode.c:316 [inline]
       proc_reg_read+0x23f/0x330 fs/proc/inode.c:328
       vfs_read+0x31e/0xd30 fs/read_write.c:468
       ksys_pread64 fs/read_write.c:665 [inline]
       __do_sys_pread64 fs/read_write.c:675 [inline]
       __se_sys_pread64 fs/read_write.c:672 [inline]
       __x64_sys_pread64+0x1e9/0x280 fs/read_write.c:672
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x478d29
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f843ae8dbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
      RAX: ffffffffffffffda RBX: 0000000000791408 RCX: 0000000000478d29
      RDX: 000000000000000a RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000f477909a R08: 0000000000000000 R09: 0000000000000000
      R10: 000010000000007f R11: 0000000000000246 R12: 0000000000791740
      R13: 0000000000791414 R14: 0000000000791408 R15: 00007ffc2eb48a50
       </TASK>
      Modules linked in:
      ---[ end trace 0000000000000000 ]---
      RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
      RIP: 0010:sock_net include/net/sock.h:649 [inline]
      RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
      RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
      RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
      Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
      RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
      RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
      RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
      RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
      R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
      R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
      FS:  00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f92ff166000 CR3: 000000003c672000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 0daf07e5 ("raw: convert raw sockets to RCU")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reported-by: default avatarDae R. Jeong <threeearcat@gmail.com>
      Link: https://lore.kernel.org/netdev/ZCA2mGV_cmq7lIfV@dragonet/Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0a78cf72
  4. 04 Apr, 2023 4 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 76f598ba
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "PPC:
         - Hide KVM_CAP_IRQFD_RESAMPLE if XIVE is enabled
      
        s390:
         - Fix handling of external interrupts in protected guests
      
        x86:
         - Resample the pending state of IOAPIC interrupts when unmasking them
      
         - Fix usage of Hyper-V "enlightened TLB" on AMD
      
         - Small fixes to real mode exceptions
      
         - Suppress pending MMIO write exits if emulator detects exception
      
        Documentation:
         - Fix rST syntax"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        docs: kvm: x86: Fix broken field list
        KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent
        KVM: s390: pv: fix external interruption loop not always detected
        KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode
        KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection
        KVM: x86: Suppress pending MMIO write exits if emulator detects exception
        KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
        KVM: irqfd: Make resampler_list an RCU list
        KVM: SVM: Flush Hyper-V TLB when required
      76f598ba
    • Linus Torvalds's avatar
      Merge tag 'nfsd-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · ceeea1b7
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
      
       - Fix a crash and a resource leak in NFSv4 COMPOUND processing
      
       - Fix issues with AUTH_SYS credential handling
      
       - Try again to address an NFS/NFSD/SUNRPC build dependency regression
      
      * tag 'nfsd-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        NFSD: callback request does not use correct credential for AUTH_SYS
        NFS: Remove "select RPCSEC_GSS_KRB5
        sunrpc: only free unix grouplist after RCU settles
        nfsd: call op_release, even when op_func returns an error
        NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL
      ceeea1b7
    • Takahiro Itazuri's avatar
      docs: kvm: x86: Fix broken field list · fb5015bc
      Takahiro Itazuri authored
      Add a missing ":" to fix a broken field list.
      Signed-off-by: default avatarTakahiro Itazuri <itazur@amazon.com>
      Fixes: ba7bb663 ("KVM: x86: Provide per VM capability for disabling PMU virtualization")
      Message-Id: <20230331093116.99820-1-itazur@amazon.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fb5015bc
    • Arnd Bergmann's avatar
      asm-generic: avoid __generic_cmpxchg_local warnings · 656e9007
      Arnd Bergmann authored
      Code that passes a 32-bit constant into cmpxchg() produces a harmless
      sparse warning because of the truncation in the branch that is not taken:
      
      fs/erofs/zdata.c: note: in included file (through /home/arnd/arm-soc/arch/arm/include/asm/cmpxchg.h, /home/arnd/arm-soc/arch/arm/include/asm/atomic.h, /home/arnd/arm-soc/include/linux/atomic.h, ...):
      include/asm-generic/cmpxchg-local.h:29:33: warning: cast truncates bits from constant value (5f0ecafe becomes fe)
      include/asm-generic/cmpxchg-local.h:33:34: warning: cast truncates bits from constant value (5f0ecafe becomes cafe)
      include/asm-generic/cmpxchg-local.h:29:33: warning: cast truncates bits from constant value (5f0ecafe becomes fe)
      include/asm-generic/cmpxchg-local.h:30:42: warning: cast truncates bits from constant value (5f0edead becomes ad)
      include/asm-generic/cmpxchg-local.h:33:34: warning: cast truncates bits from constant value (5f0ecafe becomes cafe)
      include/asm-generic/cmpxchg-local.h:34:44: warning: cast truncates bits from constant value (5f0edead becomes dead)
      
      This was reported as a regression to Matt's recent __generic_cmpxchg_local
      patch, though this patch only added more warnings on top of the ones
      that were already there.
      
      Rewording the truncation to use an explicit bitmask instead of a cast
      to a smaller type avoids the warning but otherwise leaves the code
      unchanged.
      
      I had another look at why the cast is even needed for atomic_cmpxchg(),
      and as Matt describes the problem here is that atomic_t contains a
      signed 'int', but cmpxchg() takes an 'unsigned long' argument, and
      converting between the two leads to a 64-bit sign-extension of
      negative 32-bit atomics.
      
      I checked the other implementations of arch_cmpxchg() and did not find
      any others that run into the same problem as __generic_cmpxchg_local(),
      but it's easy to be on the safe side here and always convert the
      signed int into an unsigned int when calling arch_cmpxchg(), as this
      will work even when any of the arch_cmpxchg() implementations run
      into the same problem.
      
      Fixes: 62465415 ("locking/atomic: cmpxchg: Make __generic_cmpxchg_local compare against zero-extended 'old' value")
      Reviewed-by: default avatarMatt Evans <mev@rivosinc.com>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      656e9007