1. 24 May, 2014 3 commits
  2. 23 May, 2014 6 commits
  3. 22 May, 2014 4 commits
  4. 19 May, 2014 1 commit
  5. 18 May, 2014 6 commits
  6. 17 May, 2014 7 commits
  7. 16 May, 2014 1 commit
  8. 14 May, 2014 4 commits
  9. 13 May, 2014 1 commit
      Issue 21469: Mitigate risk of false positives with robotparser. · a5413c49
      Raymond Hettinger authored
      * Repair the broken link to norobots-rfc.txt.
      
      * HTTP response codes >= 500 are treated as a failed read rather than as
      not found.  Not found means that we can assume the entire site is allowed.
      A 5xx server error tells us nothing.
      
      * A successful read() or parse() updates the mtime (which is defined to be "the
        time the robots.txt file was last fetched").
      
      * The can_fetch() method returns False unless we've had a read() with a 2xx or
      4xx response.  This avoids false positives in the case where a user calls
      can_fetch() before calling read().
      
      * I don't see any easy way to test this patch without hitting internet
      resources that might change or without use of mock objects that wouldn't
      provide much reassurance.
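The behaviour described in this commit message can be checked offline with a minimal sketch like the one below. It assumes a Python 3 standard library that includes this change (`urllib.robotparser`); the `example.com` URLs and the two robots.txt rules are illustrative only and do not come from the commit. Calling `parse()` directly stands in for a successful `read()`, so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical URLs, used only for illustration.
PUBLIC_URL = "http://example.com/page"
PRIVATE_URL = "http://example.com/private/x"

rp = RobotFileParser()

# Nothing has been read or parsed yet, so the parser has no basis for
# allowing anything: mtime() is still 0 and can_fetch() is False.
mtime_before = rp.mtime()
allowed_before = rp.can_fetch("*", PUBLIC_URL)

# A successful parse() records the fetch time and loads the rules.
rp.parse(["User-agent: *", "Disallow: /private/"])

mtime_after = rp.mtime()
allowed_public = rp.can_fetch("*", PUBLIC_URL)
allowed_private = rp.can_fetch("*", PRIVATE_URL)

print(allowed_before, allowed_public, allowed_private)
```

The 5xx case is not exercised here because it requires a server response; as the commit message notes, it is hard to test without live resources or mocks.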
  10. 12 May, 2014 2 commits
  11. 11 May, 2014 5 commits