Commit 8430557f authored by Peter Xu, committed by Andrew Morton

mm/page_table_check: support userfault wr-protect entries

Allow the page_table_check hooks to check userfaultfd wr-protect criteria
upon pgtable updates.  The rule is that no writable flag may coexist with
the userfault wr-protect flag.
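The rule can be sketched as a plain C userspace model. This is not kernel code: the sk_ names, the sk_pte_t type and the bit positions are hypothetical stand-ins for pte_t, pte_present(), pte_write() and pte_uffd_wp(), whose real encodings are architecture specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bit layout, for illustration only: real PTE bit positions
 * are architecture specific (on x86, _PAGE_UFFD_WP is a software bit).
 */
#define SK_PRESENT	(UINT64_C(1) << 0)
#define SK_WRITE	(UINT64_C(1) << 1)
#define SK_UFFD_WP	(UINT64_C(1) << 2)

typedef struct { uint64_t val; } sk_pte_t;

/*
 * The rule from the commit message: a present entry may carry the
 * userfaultfd wr-protect bit or a writable bit, but never both.
 */
static bool sk_pte_flags_illegal(sk_pte_t pte)
{
	bool present = pte.val & SK_PRESENT;
	bool uffd_wp = pte.val & SK_UFFD_WP;
	bool writable = pte.val & SK_WRITE;

	return present && uffd_wp && writable;
}

/* Example usage: only the writable + wr-protected combination is flagged. */
static void sk_pte_flags_example(void)
{
	assert(!sk_pte_flags_illegal((sk_pte_t){ SK_PRESENT | SK_WRITE }));
	assert(!sk_pte_flags_illegal((sk_pte_t){ SK_PRESENT | SK_UFFD_WP }));
	assert(sk_pte_flags_illegal((sk_pte_t){ SK_PRESENT | SK_WRITE | SK_UFFD_WP }));
}
```

Non-present entries are deliberately left out of this model; in the patch they are handled separately through their swap entry.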

This should be better than c2da319c, where we used to sanitize such issues
only during a pgtable walk.  When hitting such an issue there, we have no
good way to know where the writable bit came from [1], so even though the
pgtable walk exposes a kernel bug (which is still helpful for triaging),
the bug is not easy to track down and debug.

Now we switch to tracking the source.  This is also much easier with the
recent introduction of page table check.
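A minimal userspace sketch of why checking at update time helps: a hook on the setter sees every write as it happens and can report who made it (in the kernel, WARN_ON_ONCE() prints a backtrace; here __func__ stands in for that), while a later walk can only report where the bad entry sits. The table, the sk_ helpers and the buggy caller are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_WRITE	(UINT64_C(1) << 1)
#define SK_UFFD_WP	(UINT64_C(1) << 2)
#define SK_NR_ENTRIES	8

static uint64_t sk_table[SK_NR_ENTRIES];

/* Who installed the first bad entry; only a set-time hook can know this. */
static const char *sk_bad_setter;

static bool sk_illegal(uint64_t v)
{
	return (v & SK_WRITE) && (v & SK_UFFD_WP);
}

/* Set-time hook: runs on every update, so it can record the caller. */
static void sk_set_entry(size_t idx, uint64_t v, const char *caller)
{
	assert(idx < SK_NR_ENTRIES);
	if (sk_illegal(v) && !sk_bad_setter)
		sk_bad_setter = caller;
	sk_table[idx] = v;
}
#define SK_SET_ENTRY(idx, v) sk_set_entry((idx), (v), __func__)

/* Walk-time check: finds the bad entry later, but not who wrote it. */
static long sk_walk_find_bad(void)
{
	for (size_t i = 0; i < SK_NR_ENTRIES; i++)
		if (sk_illegal(sk_table[i]))
			return (long)i;
	return -1;
}

/* Hypothetical buggy caller: wr-protects without clearing the write bit. */
static void sk_buggy_wrprotect(size_t idx)
{
	SK_SET_ENTRY(idx, sk_table[idx] | SK_UFFD_WP);
}

static void sk_source_tracking_example(void)
{
	SK_SET_ENTRY(3, SK_WRITE);	/* legal: plain writable entry */
	sk_buggy_wrprotect(3);		/* illegal: writable + uffd-wp */
	assert(sk_walk_find_bad() == 3);	/* the walk only sees "where" */
	assert(strcmp(sk_bad_setter, "sk_buggy_wrprotect") == 0); /* the hook sees "who" */
}
```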

There are some limitations to using the page table check here for
userfaultfd wr-protect purposes:

  - It is only enabled when the page table check configs and/or boot
  parameters are explicitly enabled, but that should be good enough to
  track at least syzbot issues, as syzbot should enable
  PAGE_TABLE_CHECK[_ENFORCED] for x86 [1].  We used to have DEBUG_VM, but
  it is now off for most distros, and distros also do not normally enable
  PAGE_TABLE_CHECK[_ENFORCED], so the situation is similar.

  - It only conditionally works with the ptep_modify_prot API.  It is
  bypassed when, e.g., XEN PV is enabled, but it still works in most other
  scenarios, which should be the common cases, so it should be good
  enough.

  - The hugetlb check is a bit hairy, as the page table check cannot tell a
  hugetlb pte from a normal pte when trapping at set_pte_at(), because of
  the current design where hugetlb maps every level to pte_t.  For example,
  the default set_huge_pte_at() can invoke set_pte_at() directly and lose
  the hugetlb context, treating it the same as a normal pte_t.  So far this
  is fine because huge_pte_uffd_wp() always equals pte_uffd_wp() wherever
  it is supported (x86 only).  It will become a bigger problem once
  _PAGE_UFFD_WP is defined differently at various pgtable levels, because
  a single per-arch huge_pte_uffd_wp() will then stop making sense.  For
  now, we can leave this for later too.
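The hugetlb limitation in the last point can be modeled as follows, assuming a generic set_huge_pte_at() that simply forwards to set_pte_at(). The check only knows the pte-level uffd-wp bit, so it catches the conflict while the huge-level and pte-level bits coincide (the x86 situation described above) and would silently miss it under a hypothetical layout that uses a different bit for huge mappings. All sk_ names and bit positions are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define SK_WRITE		(UINT64_C(1) << 1)
#define SK_PTE_UFFD_WP		(UINT64_C(1) << 2)
/* Today's case: the huge-level bit is the same bit as the pte-level one. */
#define SK_HUGE_UFFD_WP_SAME	SK_PTE_UFFD_WP
/* Hypothetical future layout where huge mappings use a different bit. */
#define SK_HUGE_UFFD_WP_OTHER	(UINT64_C(1) << 5)

static int sk_warnings;

/* The pte-level check only knows the pte-level uffd-wp bit. */
static void sk_set_pte_at(uint64_t *ptep, uint64_t pte)
{
	if ((pte & SK_PTE_UFFD_WP) && (pte & SK_WRITE))
		sk_warnings++;
	*ptep = pte;
}

/* Generic fallback: forwards to set_pte_at(), losing the hugetlb context. */
static void sk_set_huge_pte_at(uint64_t *ptep, uint64_t pte)
{
	sk_set_pte_at(ptep, pte);
}

static void sk_hugetlb_example(void)
{
	uint64_t entry = 0;

	sk_set_huge_pte_at(&entry, SK_WRITE | SK_HUGE_UFFD_WP_SAME);
	assert(sk_warnings == 1);	/* caught: the bits coincide */

	sk_set_huge_pte_at(&entry, SK_WRITE | SK_HUGE_UFFD_WP_OTHER);
	assert(sk_warnings == 1);	/* missed: the check tests the wrong bit */
}
```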

This patch also removes commit c2da319c altogether, as we have something
better now.

[1] https://lore.kernel.org/all/000000000000dce0530615c89210@google.com/

Link: https://lkml.kernel.org/r/20240417212549.2766883-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
parent 3ccae1dc
@@ -14,7 +14,7 @@ Page table check performs extra verifications at the time when new pages become
 accessible from the userspace by getting their page table entries (PTEs PMDs
 etc.) added into the table.

-In case of detected corruption, the kernel is crashed. There is a small
+In case of most detected corruption, the kernel is crashed. There is a small
 performance and memory overhead associated with the page table check. Therefore,
 it is disabled by default, but can be optionally enabled on systems where the
 extra hardening outweighs the performance costs. Also, because page table check
@@ -22,6 +22,13 @@ is synchronous, it can help with debugging double map memory corruption issues,
 by crashing kernel at the time wrong mapping occurs instead of later which is
 often the case with memory corruptions bugs.

+It can also be used to do page table entry checks over various flags, dump
+warnings when illegal combinations of entry flags are detected. Currently,
+userfaultfd is the only user of such to sanity check wr-protect bit against
+any writable flags. Illegal flag combinations will not directly cause data
+corruption in this case immediately, but that will cause read-only data to
+be writable, leading to corrupt when the page content is later modified.
+
 Double mapping detection logic
 ==============================
......
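The added documentation says the flag checks dump warnings rather than crash. A rough userspace approximation of the WARN_ON_ONCE() behaviour the new checks rely on: the condition is reported the first time it is true at a given call site, execution continues (absent settings such as panic_on_warn), and the condition is returned to the caller. SK_WARN_ON_ONCE and its counter are hypothetical, and the macro uses the GNU C statement-expression extension supported by GCC and Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int sk_warn_count;

/* Warn once per call site, keep running, and evaluate to the condition. */
#define SK_WARN_ON_ONCE(cond) ({					\
	static bool sk_warned;						\
	bool sk_ret = !!(cond);						\
	if (sk_ret && !sk_warned) {					\
		sk_warned = true;					\
		sk_warn_count++;					\
		fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
	}								\
	sk_ret;								\
})

/* A single call site, so repeated violations share one "warned" flag. */
static bool sk_check_flags(bool writable, bool uffd_wp)
{
	return SK_WARN_ON_ONCE(writable && uffd_wp);
}

static void sk_warn_once_example(void)
{
	assert(sk_check_flags(true, true));	/* warns, execution continues */
	assert(sk_check_flags(true, true));	/* same site: no second warning */
	assert(!sk_check_flags(true, false));
	assert(sk_warn_count == 1);
}
```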
@@ -388,23 +388,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pte_uffd_wp(pte_t pte)
 {
-	bool wp = pte_flags(pte) & _PAGE_UFFD_WP;
-
-#ifdef CONFIG_DEBUG_VM
-	/*
-	 * Having write bit for wr-protect-marked present ptes is fatal,
-	 * because it means the uffd-wp bit will be ignored and write will
-	 * just go through.
-	 *
-	 * Use any chance of pgtable walking to verify this (e.g., when
-	 * page swapped out or being migrated for all purposes). It means
-	 * something is already wrong. Tell the admin even before the
-	 * process crashes. We also nail it with wrong pgtable setup.
-	 */
-	WARN_ON_ONCE(wp && pte_write(pte));
-#endif
-
-	return wp;
+	return pte_flags(pte) & _PAGE_UFFD_WP;
 }

 static inline pte_t pte_mkuffd_wp(pte_t pte)
......
@@ -7,6 +7,8 @@
 #include <linux/kstrtox.h>
 #include <linux/mm.h>
 #include <linux/page_table_check.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>

 #undef pr_fmt
 #define pr_fmt(fmt) "page_table_check: " fmt
@@ -182,6 +184,22 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);

+/* Whether the swap entry cached writable information */
+static inline bool swap_cached_writable(swp_entry_t entry)
+{
+	return is_writable_device_exclusive_entry(entry) ||
+	       is_writable_device_private_entry(entry) ||
+	       is_writable_migration_entry(entry);
+}
+
+static inline void page_table_check_pte_flags(pte_t pte)
+{
+	if (pte_present(pte) && pte_uffd_wp(pte))
+		WARN_ON_ONCE(pte_write(pte));
+	else if (is_swap_pte(pte) && pte_swp_uffd_wp(pte))
+		WARN_ON_ONCE(swap_cached_writable(pte_to_swp_entry(pte)));
+}
+
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 		unsigned int nr)
 {
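The logic of swap_cached_writable() and page_table_check_pte_flags() can be modeled with a simplified entry type: a present entry is checked against its write bit, while a non-present (swap-style) entry is checked against whether its entry type remembers write permission. The enum, the struct and the sk_ names are hypothetical simplifications of swp_entry_t and its helpers; the sketch does not model empty pte_none() entries, which is_swap_pte() excludes in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified swap entry types. */
enum sk_swp_type {
	SK_SWP_SWAPFILE,		/* ordinary swapped-out page */
	SK_SWP_MIGRATION_READ,
	SK_SWP_MIGRATION_WRITE,
	SK_SWP_DEVICE_PRIVATE_READ,
	SK_SWP_DEVICE_PRIVATE_WRITE,
	SK_SWP_DEVICE_EXCLUSIVE_READ,
	SK_SWP_DEVICE_EXCLUSIVE_WRITE,
};

struct sk_pte {
	bool present;
	bool write;			/* meaningful only when present */
	bool uffd_wp;			/* pte_uffd_wp() or pte_swp_uffd_wp() */
	enum sk_swp_type type;		/* meaningful only when !present */
};

/* Mirrors swap_cached_writable(): entry types that remember write access. */
static bool sk_swap_cached_writable(enum sk_swp_type type)
{
	return type == SK_SWP_DEVICE_EXCLUSIVE_WRITE ||
	       type == SK_SWP_DEVICE_PRIVATE_WRITE ||
	       type == SK_SWP_MIGRATION_WRITE;
}

/* Mirrors page_table_check_pte_flags(): true where the kernel would warn. */
static bool sk_pte_flags_bad(struct sk_pte pte)
{
	if (pte.present && pte.uffd_wp)
		return pte.write;
	if (!pte.present && pte.uffd_wp)
		return sk_swap_cached_writable(pte.type);
	return false;
}

static void sk_swap_flags_example(void)
{
	struct sk_pte mig_w = { .present = false, .uffd_wp = true,
				.type = SK_SWP_MIGRATION_WRITE };
	struct sk_pte mig_r = { .present = false, .uffd_wp = true,
				.type = SK_SWP_MIGRATION_READ };

	assert(sk_pte_flags_bad(mig_w));	/* writable + uffd-wp remembered together */
	assert(!sk_pte_flags_bad(mig_r));
}
```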
@@ -190,6 +208,8 @@ void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 	if (&init_mm == mm)
 		return;

+	page_table_check_pte_flags(pte);
+
 	for (i = 0; i < nr; i++)
 		__page_table_check_pte_clear(mm, ptep_get(ptep + i));
 	if (pte_user_accessible_page(pte))
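The flag hook runs once per __page_table_check_ptes_set() call, before the loop over the nr entries. A sketch of why one call can suffice, assuming (as with set_ptes()) that a batch maps consecutive pfns with identical flags: only the pfn differs between entries, so checking the flags of the first value covers the whole batch. The pfn/flag encoding and the sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: low 12 bits hold flags, the rest holds the pfn. */
#define SK_FLAG_MASK	UINT64_C(0xfff)
#define SK_WRITE	(UINT64_C(1) << 1)
#define SK_UFFD_WP	(UINT64_C(1) << 2)
#define SK_PFN_SHIFT	12

static int sk_flag_checks, sk_flag_warnings;

/* Stand-in for the flag hook: counts calls and would-be warnings. */
static void sk_check_pte_flags(uint64_t pte)
{
	sk_flag_checks++;
	if ((pte & SK_WRITE) && (pte & SK_UFFD_WP))
		sk_flag_warnings++;
}

/*
 * Install nr entries mapping consecutive pfns with identical flags;
 * the flags are checked once, on the first value.
 */
static void sk_set_ptes(uint64_t *ptep, uint64_t pte, unsigned int nr)
{
	assert(nr > 0);
	sk_check_pte_flags(pte);
	for (unsigned int i = 0; i < nr; i++)
		ptep[i] = pte + ((uint64_t)i << SK_PFN_SHIFT);
}

static void sk_set_ptes_example(void)
{
	uint64_t table[4] = { 0 };

	sk_set_ptes(table, (UINT64_C(100) << SK_PFN_SHIFT) | SK_WRITE, 4);
	assert(sk_flag_checks == 1 && sk_flag_warnings == 0);
	assert((table[3] >> SK_PFN_SHIFT) == 103);	/* pfn advanced */
	assert((table[3] & SK_FLAG_MASK) == SK_WRITE);	/* flags unchanged */
}
```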
@@ -197,11 +217,21 @@ void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 }
 EXPORT_SYMBOL(__page_table_check_ptes_set);

+static inline void page_table_check_pmd_flags(pmd_t pmd)
+{
+	if (pmd_present(pmd) && pmd_uffd_wp(pmd))
+		WARN_ON_ONCE(pmd_write(pmd));
+	else if (is_swap_pmd(pmd) && pmd_swp_uffd_wp(pmd))
+		WARN_ON_ONCE(swap_cached_writable(pmd_to_swp_entry(pmd)));
+}
+
 void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd)
 {
 	if (&init_mm == mm)
 		return;

+	page_table_check_pmd_flags(pmd);
+
 	__page_table_check_pmd_clear(mm, *pmdp);
 	if (pmd_user_accessible_page(pmd)) {
 		page_table_check_set(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT,
......