1. 13 Apr, 2010 4 commits
    • anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma · ea90002b
      Linus Torvalds authored
      Otherwise we might be mapping in a page in a new mapping, but that page
      (through the swapcache) would later be mapped into an old mapping too.
      The page->mapping must be the case that works for everybody, not just
      the mapping that happened to page it in first.
      
      Here's the scenario:
      
       - page gets allocated/mapped by process A. Let's call the anon_vma we
         associate the page with 'A' to keep it easy to track.
      
       - Process A forks, creating process B. The anon_vma in B is 'B', and has
         a chain that looks like 'B' -> 'A'. Everything is fine.
      
       - Swapping happens. The page (with mapping pointing to 'A') gets swapped
         out (perhaps not to disk - it's enough to assume that it's just not
         mapped any more, and lives entirely in the swap-cache)
      
       - Process B pages it in, which goes like this:
      
              do_swap_page ->
                page = lookup_swap_cache(entry);
               ...
                set_pte_at(mm, address, page_table, pte);
                page_add_anon_rmap(page, vma, address);
      
         And think about what happens here!
      
         In particular, what happens is that this will now be the "first"
         mapping of that page, so page_add_anon_rmap() used to do
      
              if (first)
                      __page_set_anon_rmap(page, vma, address);
      
         and notice what anon_vma it will use? It will use the anon_vma for
         process B!
      
         What happens then? Trivial: process 'A' also pages it in (nothing
         happens, it's not the first mapping), and then process 'B' execve's
         or exits or unmaps, making anon_vma B go away.
      
         End result: process A has a page that points to anon_vma B, but
         anon_vma B does not exist any more.  This can go on forever.  Forget
         about RCU grace periods, forget about locking, forget anything like
         that.  The bug is simply that page->mapping points to an anon_vma
         that was correct at one point, but was _not_ the one that was shared
         by all users of that possible mapping.
      
      Changing it to always use the deepest anon_vma in the anonvma chain gets
      us to the safest model.
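
The scenario above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel code: `chain_link`, `oldest_anon_vma()`, `set_mapping_first_mapper()` and `set_mapping_oldest()` are hypothetical names, and the chain is simplified to a singly linked list whose head is the vma's own (most specific) anon_vma and whose last entry is the oldest one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct anon_vma { const char *name; };

struct chain_link {
    struct anon_vma *anon_vma;
    struct chain_link *next;        /* towards older (parent) anon_vmas */
};

struct vma { struct chain_link *chain; };   /* head = most specific anon_vma */

struct page { struct anon_vma *mapping; };

/* Walk to the end of the chain: the last entry is the oldest anon_vma. */
struct anon_vma *oldest_anon_vma(const struct vma *vma)
{
    const struct chain_link *link = vma->chain;

    if (!link)
        return NULL;
    while (link->next)
        link = link->next;
    return link->anon_vma;
}

/* Model of the old behaviour: use the anon_vma of whoever maps the page first. */
void set_mapping_first_mapper(struct page *page, const struct vma *vma)
{
    page->mapping = vma->chain ? vma->chain->anon_vma : NULL;
}

/* Model of the fix: always use the oldest anon_vma in the chain. */
void set_mapping_oldest(struct page *page, const struct vma *vma)
{
    page->mapping = oldest_anon_vma(vma);
}
```

With process B's chain 'B' -> 'A', the first function leaves the page pointing at 'B', which can go away while A still maps the page. The second leaves it pointing at 'A', which every user of the page has in its chain.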
      
      This can be improved in certain cases: if we know the page is private to
      just this particular mapping (for example, it's a new page, or it is the
      only swapcache entry), we could pick the top (most specific) anon_vma.
      
      But that's a future optimization. Make it _work_ reliably first.

      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Borislav Petkov <bp@alien8.de> [ "What do you know, I think you fixed it!" ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • anon_vma: clone the anon_vma chain in the right order · 646d87b4
      Linus Torvalds authored
      We want to walk the chain in reverse order when cloning it, so that the
      order of the result chain will be the same as the order in the source
      chain.  When we add entries to the chain, they go at the head of the
      chain, so we want to add the source head last.
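
The ordering argument can be checked with a small userspace sketch. It assumes a hand-rolled doubly linked list rather than the kernel's list helpers. `chain_add_head()`, `clone_forward()` and `clone_reverse()` are hypothetical names, and the caller supplies the storage for the copied entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal chain: entries are identified by an integer id. */
struct link { int id; struct link *prev, *next; };
struct chain { struct link *head, *tail; };

/* New entries always go at the head of the chain. */
void chain_add_head(struct chain *c, struct link *l)
{
    l->prev = NULL;
    l->next = c->head;
    if (c->head)
        c->head->prev = l;
    else
        c->tail = l;
    c->head = l;
}

/* Walking the source head-to-tail while adding at the head reverses the order. */
void clone_forward(struct chain *dst, const struct chain *src, struct link *storage)
{
    for (const struct link *l = src->head; l; l = l->next) {
        storage->id = l->id;
        chain_add_head(dst, storage++);
    }
}

/* Walking the source tail-to-head adds the source head last, preserving the order. */
void clone_reverse(struct chain *dst, const struct chain *src, struct link *storage)
{
    for (const struct link *l = src->tail; l; l = l->prev) {
        storage->id = l->id;
        chain_add_head(dst, storage++);
    }
}
```

For a source chain 1 -> 2 -> 3, `clone_forward()` yields 3 -> 2 -> 1, while `clone_reverse()` yields 1 -> 2 -> 3.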

      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Borislav Petkov <bp@alien8.de> [ "No, it still oopses" ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vma_adjust: fix the copying of anon_vma chains · 287d97ac
      Linus Torvalds authored
      When we move the boundaries between two vma's due to things like
      mprotect, we need to make sure that the anon_vma of the pages that got
      moved from one vma to another gets properly copied around.  And that was
      not always the case, in this rather hard-to-follow code sequence.
      
      Clarify the code, and fix it so that it copies the anon_vma from the
      right source.

      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Borislav Petkov <bp@alien8.de> [ "Yeah, not so much this one either" ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Simplify and comment on anon_vma re-use for anon_vma_prepare() · d0e9fe17
      Linus Torvalds authored
      This changes the anon_vma reuse case to require that we only reuse
      simple anon_vma's - ie the case when the vma only has a single anon_vma
      associated with it.
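
The "simple anon_vma" condition can be illustrated with a small userspace sketch. It is not the kernel implementation: `chain_is_single()` and `reusable_anon_vma_sketch()` are hypothetical names, the chain is simplified to a singly linked list, and the sketch leaves out any other compatibility checks the real code may perform on the two vmas.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct anon_vma { int id; };

struct chain_link {
    struct anon_vma *anon_vma;
    struct chain_link *next;
};

struct vma {
    struct anon_vma *anon_vma;   /* the vma's own anon_vma, if any */
    struct chain_link *chain;    /* every anon_vma the vma is associated with */
};

/* True only when the chain has exactly one entry. */
bool chain_is_single(const struct vma *vma)
{
    return vma->chain != NULL && vma->chain->next == NULL;
}

/* Offer the neighbour's anon_vma for reuse only in the simple case. */
struct anon_vma *reusable_anon_vma_sketch(const struct vma *neighbour)
{
    if (neighbour->anon_vma && chain_is_single(neighbour))
        return neighbour->anon_vma;
    return NULL;
}
```

A neighbour whose chain holds more than one anon_vma (for example, one inherited across a fork) is rejected, so a reused anon_vma always comes with a single-entry chain.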
      
      This means that a reuse of an anon_vma from an adjacent vma will always
      guarantee that both vma's are associated not only with the same
      anon_vma, they will also have the same anon_vma chain (of just a single
      entry in this case).
      
      And since anon_vma re-use was the only case where the same anon_vma
      might be associated with different chains of anon_vma's, we now have the
      case that every vma that shares the same anon_vma will always also have
      the same chain.  That makes it much easier to think about merging vma's
      that share the same anon_vma's: you can always just drop the other
      anon_vma chain in anon_vma_merge() since you know that they are always
      identical.
      
      This also splits up the function to validate the anon_vma re-use, and
      adds a lot of commentary about the possible races.

      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Borislav Petkov <bp@alien8.de> [ "That didn't fix it" ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 09 Apr, 2010 36 commits