"""An extensible library for opening URLs using a variety of protocols

The simplest way to use this module is to call the urlopen function,
which accepts a string containing a URL or a Request object (described
below).  It opens the URL and returns the results as a file-like
object; the returned object has some extra methods described below.

The OpenerDirector manages a collection of Handler objects that do
all the actual work.  Each Handler implements a particular protocol or
option.  The OpenerDirector is a composite object that invokes the
Handlers needed to open the requested URL.  For example, the
HTTPHandler performs HTTP GET and POST requests and deals with
non-error returns.  The HTTPRedirectHandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
deals with digest authentication.
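
The dispatch described above can be modelled in a few lines.  The
sketch below is a simplified, hypothetical model.  MiniDirector,
DeclineHandler and EchoHandler are illustrative names, not part of this
module.  A handler advertises support for a protocol by defining a
method named <protocol>_open.  The director then tries each candidate in
turn until one returns something other than None, roughly as
OpenerDirector.add_handler() and _call_chain() do.

```python
class MiniDirector(object):
    # Hypothetical, simplified stand-in for OpenerDirector.
    def __init__(self):
        self.handle_open = {}   # protocol -> list of handlers

    def add_handler(self, handler):
        # Register the handler under every '<protocol>_open' method it defines.
        for name in dir(handler):
            if name.endswith('_open') and not name.startswith('_'):
                protocol = name[:-len('_open')]
                self.handle_open.setdefault(protocol, []).append(handler)

    def open(self, url):
        protocol = url.split(':', 1)[0]
        # Chain of responsibility: the first non-None result wins.
        for handler in self.handle_open.get(protocol, []):
            result = getattr(handler, protocol + '_open')(url)
            if result is not None:
                return result
        return None


class DeclineHandler(object):
    def http_open(self, url):
        return None          # "not me, let another handler try"


class EchoHandler(object):
    def http_open(self, url):
        return 'echo: ' + url


director = MiniDirector()
director.add_handler(DeclineHandler())
director.add_handler(EchoHandler())
```

Returning None means "I can't handle this, but someone else might".
The real OpenerDirector also tries default_open before the
protocol-specific step and unknown_open after it.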

urlopen(url, data=None) -- basic usage is the same as the original
urllib.  Pass the URL and, optionally, data to post to an HTTP URL, and
get a file-like object back.  One difference is that you can also pass
a Request instance instead of a URL.  Raises a URLError (subclass of
IOError); for HTTP errors, raises an HTTPError, which can also be
treated as a valid response.
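
HTTPError is both an exception and a file-like response, so a caller
can catch it and still read the error body.  The sketch below models
that dual role.  The ResponseError class and the fetch() and get_body()
functions are hypothetical and are not part of this module.
io.BytesIO stands in for a socket file.

```python
import io


class ResponseError(Exception):
    # Hypothetical: an exception that also behaves like a readable response.
    def __init__(self, code, body):
        Exception.__init__(self, code)
        self.code = code
        self.fp = io.BytesIO(body)

    def read(self):
        return self.fp.read()


def fetch(code):
    # Hypothetical fetch that raises an error carrying a page for non-200 codes.
    if code != 200:
        raise ResponseError(code, b'error page')
    return io.BytesIO(b'ok page')


def get_body(code):
    try:
        return 200, fetch(code).read()
    except ResponseError as err:
        # Use the error both as an event (err.code) and as a response (read()).
        return err.code, err.read()
```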

build_opener -- function that creates a new OpenerDirector instance.
It will install the default handlers.  It accepts one or more Handlers
as arguments, either instances or Handler classes that it will
instantiate.  If one of the arguments is a subclass of a default
handler, the argument will be installed instead of the default.
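
The replacement rule can be sketched as follows.  DefaultHTTP,
DefaultFTP, MyHTTP and choose_handlers() are hypothetical names.  The
function follows the same idea as build_opener(): it drops every default
class that a caller-supplied class or instance specializes, then
instantiates what remains.

```python
import inspect


class DefaultHTTP(object):
    pass


class DefaultFTP(object):
    pass


class MyHTTP(DefaultHTTP):
    # A user customization of one of the default handler classes.
    pass


def choose_handlers(defaults, extra):
    # Drop every default that some extra handler (class or instance) extends.
    skip = set()
    for klass in defaults:
        for check in extra:
            if inspect.isclass(check):
                if issubclass(check, klass):
                    skip.add(klass)
            elif isinstance(check, klass):
                skip.add(klass)
    kept = [klass for klass in defaults if klass not in skip]
    # Instantiate classes; pass instances through unchanged.
    extras = [h() if inspect.isclass(h) else h for h in extra]
    return [klass() for klass in kept] + extras
```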

install_opener -- installs a new opener as the default opener.
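
urlopen() and install_opener() share one module-level opener that is
built lazily on first use.  Below is a minimal sketch of that pattern.
It assumes a hypothetical make_opener() factory in place of
build_opener() and uses plain callables instead of OpenerDirector
objects.  sketch_urlopen() and sketch_install_opener() are illustrative
names.

```python
_opener = None


def make_opener():
    # Hypothetical stand-in for build_opener(); returns a callable "opener".
    return lambda url: 'default: ' + url


def sketch_urlopen(url):
    global _opener
    if _opener is None:          # build the default opener on first use only
        _opener = make_opener()
    return _opener(url)


def sketch_install_opener(opener):
    # Replace the default; later sketch_urlopen() calls use the new opener.
    global _opener
    _opener = opener
```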

objects of interest:
OpenerDirector -- manages a collection of Handlers and invokes the ones
needed to open the requested URL.

Request -- an object that encapsulates the state of a request.  The
state can be as simple as the URL.  It can also include extra HTTP
headers, e.g. a User-Agent.
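
Request normalizes header names with str.capitalize().  Headers that
must not be resent after a redirect are kept in a separate dictionary.
header_items() merges the two, and regular headers take precedence.
Below is a simplified, hypothetical model of that bookkeeping.
HeaderStore is an illustrative name, and the result is sorted only to
make it deterministic.

```python
class HeaderStore(object):
    # Hypothetical model of Request's two header dictionaries.
    def __init__(self):
        self.headers = {}
        self.unredirected_hdrs = {}

    def add_header(self, key, val):
        self.headers[key.capitalize()] = val      # 'user-agent' -> 'User-agent'

    def add_unredirected_header(self, key, val):
        self.unredirected_hdrs[key.capitalize()] = val

    def header_items(self):
        merged = self.unredirected_hdrs.copy()
        merged.update(self.headers)               # regular headers win
        return sorted(merged.items())             # sorted for determinism
```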

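BaseHandler -- base class for Handlers; provides add_parent() and the
handler_order attribute used to sort Handlers.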

exceptions:
URLError -- a subclass of IOError; individual protocols have their own
specific subclasses

HTTPError -- also a valid HTTP response, so you can treat an HTTP error
as an exceptional event or as a valid response

internals:
BaseHandler and parent
_call_chain conventions
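
Handlers are kept ordered by their handler_order attribute, and lower
values run first.  ProxyHandler uses 100, BaseHandler defaults to 500
and HTTPErrorProcessor uses 1000.  Every registered handler also
receives a parent back-reference, so it can call back into the
director, e.g. self.parent.open() or self.parent.error().  Below is a
hypothetical sketch of that ordering using bisect.insort().  Handler,
EarlyHandler, LateHandler and Director are illustrative names.

```python
import bisect


class Handler(object):
    handler_order = 500              # default position, as in BaseHandler

    def add_parent(self, parent):
        self.parent = parent         # back-reference used to re-enter the director

    def __lt__(self, other):
        return self.handler_order < other.handler_order


class EarlyHandler(Handler):
    handler_order = 100              # e.g. proxies must run first


class LateHandler(Handler):
    handler_order = 1000             # e.g. response post-processing


class Director(object):
    def __init__(self):
        self.handlers = []

    def add_handler(self, handler):
        bisect.insort(self.handlers, handler)   # keep the list ordered
        handler.add_parent(self)
```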

Example usage:

import urllib2

# set up authentication info
authinfo = urllib2.HTTPBasicAuthHandler()
authinfo.add_password('realm', 'host', 'username', 'password')

proxy_support = urllib2.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds proxy, authentication and caching FTP handlers
opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)

# install it
urllib2.install_opener(opener)

f = urllib2.urlopen('http://www.python.org/')


"""

# XXX issues:
# If an authentication error handler tries to perform
# authentication but fails for some reason, how should the error be
# signalled?  The client needs to know the HTTP error code.  But if
# the handler knows that the problem was, e.g., that it didn't know
# the hash algorithm requested in the challenge, it would be good to
# pass that information along to the client, too.

# XXX to do:
# name!
# documentation (getting there)
# complex proxies
# abstract factory for opener
# ftp errors aren't handled cleanly
# gopher can return a socket.error
# check digest against correct (i.e. non-apache) implementation

import base64
import ftplib
import gopherlib
import httplib
import inspect
import md5
import mimetypes
import mimetools
import os
import posixpath
import random
import re
import sha
import socket
import sys
import time
import urlparse
import bisect
import cookielib

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

# not sure how many of these need to be gotten rid of
from urllib import (unwrap, unquote, splittype, splithost,
     addinfourl, splitport, splitgophertype, splitquery,
     splitattr, ftpwrapper, noheaders, splituser, splitpasswd, splitvalue)

# support for FileHandler, proxies via environment variables
from urllib import localhost, url2pathname, getproxies

__version__ = "2.4"

_opener = None
def urlopen(url, data=None):
    global _opener
    if _opener is None:
        _opener = build_opener()
    return _opener.open(url, data)

def install_opener(opener):
    global _opener
    _opener = opener

# do these error classes make sense?
# make sure all of the IOError stuff is overridden.  we just want to be
# subtypes.

class URLError(IOError):
    # URLError is a sub-type of IOError, but it doesn't share any of
    # the implementation.  need to override __init__ and __str__.
    # It sets self.args for compatibility with other EnvironmentError
    # subclasses, but args doesn't have the typical format with errno in
    # slot 0 and strerror in slot 1.  This may be better than nothing.
    def __init__(self, reason):
        self.args = reason,
        self.reason = reason

    def __str__(self):
        return '<urlopen error %s>' % self.reason

class HTTPError(URLError, addinfourl):
    """Raised when an HTTP error occurs, but also acts like a non-error return"""
    __super_init = addinfourl.__init__

    def __init__(self, url, code, msg, hdrs, fp):
        self.code = code
        self.msg = msg
        self.hdrs = hdrs
        self.fp = fp
        self.filename = url
        # The addinfourl classes depend on fp being a valid file
        # object.  In some cases, the HTTPError may not have a valid
        # file object.  If this happens, the simplest workaround is to
        # not initialize the base classes.
        if fp is not None:
            self.__super_init(fp, hdrs, url)

    def __str__(self):
        return 'HTTP Error %s: %s' % (self.code, self.msg)

class GopherError(URLError):
    pass


class Request:

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False):
        # unwrap('<URL:type://host/path>') --> 'type://host/path'
        self.__original = unwrap(url)
        self.type = None
        # self.__r_type is what's left after doing the splittype
        self.host = None
        self.port = None
        self.data = data
        self.headers = {}
        for key, value in headers.items():
            self.add_header(key, value)
        self.unredirected_hdrs = {}
        if origin_req_host is None:
            origin_req_host = cookielib.request_host(self)
        self.origin_req_host = origin_req_host
        self.unverifiable = unverifiable

    def __getattr__(self, attr):
        # XXX this is a fallback mechanism to guard against these
        # methods getting called in a non-standard order.  this may be
        # too complicated and/or unnecessary.
        # XXX should the __r_XXX attributes be public?
        if attr[:12] == '_Request__r_':
            name = attr[12:]
            if hasattr(Request, 'get_' + name):
                getattr(self, 'get_' + name)()
                return getattr(self, attr)
        raise AttributeError, attr

    def get_method(self):
        if self.has_data():
            return "POST"
        else:
            return "GET"

    # XXX these helper methods are lame

    def add_data(self, data):
        self.data = data

    def has_data(self):
        return self.data is not None

    def get_data(self):
        return self.data

    def get_full_url(self):
        return self.__original

    def get_type(self):
        if self.type is None:
            self.type, self.__r_type = splittype(self.__original)
            if self.type is None:
                raise ValueError, "unknown url type: %s" % self.__original
        return self.type

    def get_host(self):
        if self.host is None:
            self.host, self.__r_host = splithost(self.__r_type)
            if self.host:
                self.host = unquote(self.host)
        return self.host

    def get_selector(self):
        return self.__r_host

    def set_proxy(self, host, type):
        self.host, self.type = host, type
        self.__r_host = self.__original

    def get_origin_req_host(self):
        return self.origin_req_host

    def is_unverifiable(self):
        return self.unverifiable

    def add_header(self, key, val):
        # useful for something like authentication
        self.headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        # will not be added to a redirected request
        self.unredirected_hdrs[key.capitalize()] = val

    def has_header(self, header_name):
        return (header_name in self.headers or
                header_name in self.unredirected_hdrs)

    def get_header(self, header_name, default=None):
        return self.headers.get(
            header_name,
            self.unredirected_hdrs.get(header_name, default))

    def header_items(self):
        hdrs = self.unredirected_hdrs.copy()
        hdrs.update(self.headers)
        return hdrs.items()

class OpenerDirector:
    def __init__(self):
        client_version = "Python-urllib/%s" % __version__
        self.addheaders = [('User-agent', client_version)]
        # manage the individual handlers
        self.handlers = []
        self.handle_open = {}
        self.handle_error = {}
        self.process_response = {}
        self.process_request = {}

    def add_handler(self, handler):
        added = False
        for meth in dir(handler):
            i = meth.find("_")
            protocol = meth[:i]
            condition = meth[i+1:]

            if condition.startswith("error"):
                j = condition.find("_") + i + 1
                kind = meth[j+1:]
                try:
                    kind = int(kind)
                except ValueError:
                    pass
                lookup = self.handle_error.get(protocol, {})
                self.handle_error[protocol] = lookup
            elif condition == "open":
                kind = protocol
                lookup = self.handle_open
            elif condition == "response":
                kind = protocol
                lookup = self.process_response
            elif condition == "request":
                kind = protocol
                lookup = self.process_request
            else:
                continue

            handlers = lookup.setdefault(kind, [])
            if handlers:
                bisect.insort(handlers, handler)
            else:
                handlers.append(handler)
            added = True

        if added:
            # XXX why does self.handlers need to be sorted?
            bisect.insort(self.handlers, handler)
            handler.add_parent(self)

    def close(self):
        # Only exists for backwards compatibility.
        pass

    def _call_chain(self, chain, kind, meth_name, *args):
        # XXX raise an exception if no one else should try to handle
        # this url.  return None if you can't but someone else could.
        handlers = chain.get(kind, ())
        for handler in handlers:
            func = getattr(handler, meth_name)

            result = func(*args)
            if result is not None:
                return result

    def open(self, fullurl, data=None):
        # accept a URL or a Request object
        if isinstance(fullurl, basestring):
            req = Request(fullurl, data)
        else:
            req = fullurl
            if data is not None:
                req.add_data(data)

        protocol = req.get_type()

        # pre-process request
        meth_name = protocol+"_request"
        for processor in self.process_request.get(protocol, []):
            meth = getattr(processor, meth_name)
            req = meth(req)

        response = self._open(req, data)

        # post-process response
        meth_name = protocol+"_response"
        for processor in self.process_response.get(protocol, []):
            meth = getattr(processor, meth_name)
            response = meth(req, response)

        return response

    def _open(self, req, data=None):
        result = self._call_chain(self.handle_open, 'default',
                                  'default_open', req)
        if result:
            return result

        protocol = req.get_type()
        result = self._call_chain(self.handle_open, protocol, protocol +
                                  '_open', req)
        if result:
            return result

        return self._call_chain(self.handle_open, 'unknown',
                                'unknown_open', req)

    def error(self, proto, *args):
        if proto in ('http', 'https'):
            # XXX http[s] protocols are special-cased
            dict = self.handle_error['http'] # https is not different than http
            proto = args[2]  # YUCK!
            meth_name = 'http_error_%s' % proto
            http_err = 1
            orig_args = args
        else:
            dict = self.handle_error
            meth_name = proto + '_error'
            http_err = 0
        args = (dict, proto, meth_name) + args
        result = self._call_chain(*args)
        if result:
            return result

        if http_err:
            args = (dict, 'default', 'http_error_default') + orig_args
            return self._call_chain(*args)

# XXX probably also want an abstract factory that knows when it makes
# sense to skip a superclass in favor of a subclass and when it might
# make sense to include both

def build_opener(*handlers):
    """Create an opener object from a list of handlers.

    The opener will use several default handlers, including support
    for HTTP and FTP.

    If any of the handlers passed as arguments are subclasses (or
    instances of subclasses) of the default handlers, the default
    handlers will not be used.
    """

    opener = OpenerDirector()
    default_classes = [ProxyHandler, UnknownHandler, HTTPHandler,
                       HTTPDefaultErrorHandler, HTTPRedirectHandler,
                       FTPHandler, FileHandler, HTTPErrorProcessor]
    if hasattr(httplib, 'HTTPS'):
        default_classes.append(HTTPSHandler)
    # a set, so a default matched by several arguments is removed only once
    skip = set()
    for klass in default_classes:
        for check in handlers:
            if inspect.isclass(check):
                if issubclass(check, klass):
                    skip.add(klass)
            elif isinstance(check, klass):
                skip.add(klass)
    for klass in skip:
        default_classes.remove(klass)

    for klass in default_classes:
        opener.add_handler(klass())

    for h in handlers:
        if inspect.isclass(h):
            h = h()
        opener.add_handler(h)
    return opener

class BaseHandler:
    handler_order = 500

    def add_parent(self, parent):
        self.parent = parent

    def close(self):
        # Only exists for backwards compatibility
        pass

    def __lt__(self, other):
        if not hasattr(other, "handler_order"):
            # Try to preserve the old behavior of having custom classes
            # inserted after default ones (works only for custom user
            # classes which are not aware of handler_order).
            return True
        return self.handler_order < other.handler_order


class HTTPErrorProcessor(BaseHandler):
    """Process HTTP error responses."""
    handler_order = 1000  # after all other processing

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        if code not in (200, 206):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)

        return response

    https_response = http_response

class HTTPDefaultErrorHandler(BaseHandler):
    def http_error_default(self, req, fp, code, msg, hdrs):
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

class HTTPRedirectHandler(BaseHandler):
    # maximum number of redirections to any single URL
    # this is needed because of the state that cookies introduce
    max_repeats = 4
    # maximum total number of redirections (regardless of URL) before
    # assuming we're in a loop
    max_redirections = 10

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        """Return a Request or None in response to a redirect.

        This is called by the http_error_30x methods when a
        redirection response is received.  If a redirection should
        take place, return a new Request to allow http_error_30x to
        perform the redirect.  Otherwise, raise HTTPError if no-one
        else should try to handle this url.  Return None if you can't
        but another Handler might.
        """
        m = req.get_method()
        if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
            or code in (301, 302, 303) and m == "POST"):
            # Strictly (according to RFC 2616), 301 or 302 in response
            # to a POST MUST NOT cause a redirection without confirmation
            # from the user (of urllib2, in this case).  In practice,
            # essentially all clients do redirect in this case, so we
            # do the same.
            return Request(newurl,
                           headers=req.headers,
                           origin_req_host=req.get_origin_req_host(),
                           unverifiable=True)
        else:
            raise HTTPError(req.get_full_url(), code, msg, headers, fp)

    # Implementation note: To avoid the server sending us into an
    # infinite loop, the request object needs to track what URLs we
    # have already seen.  Do this by adding a handler-specific
    # attribute to the Request object.
    def http_error_302(self, req, fp, code, msg, headers):
        # Some servers (incorrectly) return multiple Location headers
        # (so probably same goes for URI).  Use first header.
        if 'location' in headers:
            newurl = headers.getheaders('location')[0]
        elif 'uri' in headers:
            newurl = headers.getheaders('uri')[0]
        else:
            return
        newurl = urlparse.urljoin(req.get_full_url(), newurl)

        # XXX Probably want to forget about the state of the current
        # request, although that might interact poorly with other
        # handlers that also use handler-specific request attributes
        new = self.redirect_request(req, fp, code, msg, headers, newurl)
        if new is None:
            return

        # loop detection
        # .redirect_dict has a key url if url was previously visited.
        if hasattr(req, 'redirect_dict'):
            visited = new.redirect_dict = req.redirect_dict
            if (visited.get(newurl, 0) >= self.max_repeats or
                len(visited) >= self.max_redirections):
                raise HTTPError(req.get_full_url(), code,
                                self.inf_msg + msg, headers, fp)
        else:
            visited = new.redirect_dict = req.redirect_dict = {}
        visited[newurl] = visited.get(newurl, 0) + 1

        # Don't close the fp until we are sure that we won't use it
        # with HTTPError.
        fp.read()
        fp.close()

        return self.parent.open(new)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

    inf_msg = "The HTTP server returned a redirect error that would " \
              "lead to an infinite loop.\n" \
              "The last 30x error message was:\n"

class ProxyHandler(BaseHandler):
    # Proxies must be in front
    handler_order = 100

    def __init__(self, proxies=None):
        if proxies is None:
            proxies = getproxies()
        assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
        self.proxies = proxies
        for type, url in proxies.items():
            setattr(self, '%s_open' % type,
                    lambda r, proxy=url, type=type, meth=self.proxy_open: \
                    meth(r, proxy, type))

    def proxy_open(self, req, proxy, type):
        orig_type = req.get_type()
        type, r_type = splittype(proxy)
        host, XXX = splithost(r_type)
        if '@' in host:
            user_pass, host = host.split('@', 1)
            if ':' in user_pass:
                user, password = user_pass.split(':', 1)
                user_pass = base64.encodestring('%s:%s' % (unquote(user),
                                                unquote(password))).strip()
                req.add_header('Proxy-authorization', 'Basic ' + user_pass)
        host = unquote(host)
        req.set_proxy(host, type)
        if orig_type == type:
            # let other handlers take care of it
            # XXX this only makes sense if the proxy is before the
            # other handlers
            return None
        else:
            # need to start over, because the other handlers don't
            # grok the proxy's URL type
            return self.parent.open(req)

# feature suggested by Duncan Booth
# XXX custom is not a good name
class CustomProxy:
    # either pass a function to the constructor or override handle
    def __init__(self, proto, func=None, proxy_addr=None):
        self.proto = proto
        self.func = func
        self.addr = proxy_addr

    def handle(self, req):
        if self.func and self.func(req):
            return 1

    def get_proxy(self):
        return self.addr

class CustomProxyHandler(BaseHandler):
    # Proxies must be in front
    handler_order = 100

    def __init__(self, *proxies):
        self.proxies = {}
        # register any CustomProxy objects passed to the constructor
        for cpo in proxies:
            self.add_proxy(cpo)

    def proxy_open(self, req):
        proto = req.get_type()
        try:
            proxies = self.proxies[proto]
        except KeyError:
            return None
        for p in proxies:
            if p.handle(req):
                # Request.set_proxy() needs both the proxy host and the URL type
                req.set_proxy(p.get_proxy(), proto)
                return self.parent.open(req)
        return None

    def do_proxy(self, p, req):
        return self.parent.open(req)

    def add_proxy(self, cpo):
        if cpo.proto in self.proxies:
            self.proxies[cpo.proto].append(cpo)
        else:
            self.proxies[cpo.proto] = [cpo]

class HTTPPasswordMgr:
    def __init__(self):
Fred Drake's avatar
Fred Drake committed
648
        self.passwd = {}
Jeremy Hylton's avatar
Jeremy Hylton committed
649 650

    def add_password(self, realm, uri, user, passwd):
Fred Drake's avatar
Fred Drake committed
651
        # uri could be a single URI or a sequence
652
        if isinstance(uri, basestring):
Fred Drake's avatar
Fred Drake committed
653 654
            uri = [uri]
        uri = tuple(map(self.reduce_uri, uri))
655
        if not realm in self.passwd:
Fred Drake's avatar
Fred Drake committed
656 657
            self.passwd[realm] = {}
        self.passwd[realm][uri] = (user, passwd)
Jeremy Hylton's avatar
Jeremy Hylton committed
658 659

    def find_user_password(self, realm, authuri):
Fred Drake's avatar
Fred Drake committed
660 661
        domains = self.passwd.get(realm, {})
        authuri = self.reduce_uri(authuri)
662
        for uris, authinfo in domains.iteritems():
Fred Drake's avatar
Fred Drake committed
663 664 665 666
            for uri in uris:
                if self.is_suburi(uri, authuri):
                    return authinfo
        return None, None
Jeremy Hylton's avatar
Jeremy Hylton committed
667 668

    def reduce_uri(self, uri):
Fred Drake's avatar
Fred Drake committed
669 670 671 672 673 674
        """Accept netloc or URI and extract only the netloc and path"""
        parts = urlparse.urlparse(uri)
        if parts[1]:
            return parts[1], parts[2] or '/'
        else:
            return parts[2], '/'
Jeremy Hylton's avatar
Jeremy Hylton committed
675 676

    def is_suburi(self, base, test):
Fred Drake's avatar
Fred Drake committed
677 678 679 680 681
        """Check if test is below base in a URI tree

        Both args must be URIs in reduced form.
        """
        if base == test:
682
            return True
Fred Drake's avatar
Fred Drake committed
683
        if base[0] != test[0]:
            return False
        common = posixpath.commonprefix((base[1], test[1]))
        if len(common) == len(base[1]):
            return True
        return False


class HTTPPasswordMgrWithDefaultRealm(HTTPPasswordMgr):

    def find_user_password(self, realm, authuri):
        user, password = HTTPPasswordMgr.find_user_password(self, realm,
                                                            authuri)
        if user is not None:
            return user, password
        return HTTPPasswordMgr.find_user_password(self, None, authuri)
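

# Illustrative sketch only: a hypothetical standalone helper, not part of
# the urllib2 API.  It condenses the lookup done by HTTPPasswordMgr and
# HTTPPasswordMgrWithDefaultRealm into one function.  Every URI is reduced
# to (netloc, path); a registered URI matches a request when the netlocs
# are equal and the registered path is a prefix of the requested one.
# The realm None is tried last, as the default realm.  For simplicity the
# table is assumed to be {realm: {uri: (user, passwd)}}, keyed by single
# URIs rather than the tuples of URIs that add_password() stores.
# posixpath.commonprefix compares characters, not path segments, so a
# password registered for /docs also matches /docs-private.
def _sketch_find_user_password(table, realm, authuri):
    import posixpath
    try:
        from urlparse import urlparse          # Python 2
    except ImportError:
        from urllib.parse import urlparse      # Python 3

    def reduce_uri(uri):
        # Same rule as HTTPPasswordMgr.reduce_uri: a bare "host" is
        # treated as a netloc with the root path.
        parts = urlparse(uri)
        if parts[1]:
            return parts[1], parts[2] or '/'
        return parts[2], '/'

    target = reduce_uri(authuri)
    for r in (realm, None):
        for uri, authinfo in table.get(r, {}).items():
            base = reduce_uri(uri)
            if base[0] != target[0]:
                continue
            common = posixpath.commonprefix((base[1], target[1]))
            if len(common) == len(base[1]):
                return authinfo
    return None, None

# Example: with table = {'Secure': {'http://example.com/docs/': ('alice', 'pw')}},
# a lookup for 'http://example.com/docs/api' in realm 'Secure' returns
# ('alice', 'pw'), and one for 'http://example.com/other' returns (None, None).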


class AbstractBasicAuthHandler:

    rx = re.compile('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', re.I)

    # XXX there can actually be multiple auth-schemes in a
    # www-authenticate header.  should probably be a lot more careful
    # in parsing them to extract multiple alternatives

    def __init__(self, password_mgr=None):
        if password_mgr is None:
            password_mgr = HTTPPasswordMgr()
        self.passwd = password_mgr
        self.add_password = self.passwd.add_password

    def http_error_auth_reqed(self, authreq, host, req, headers):
        # XXX could be multiple headers
        authreq = headers.get(authreq, None)
        if authreq:
            mo = AbstractBasicAuthHandler.rx.search(authreq)
            if mo:
                scheme, realm = mo.groups()
                if scheme.lower() == 'basic':
                    return self.retry_http_basic_auth(host, req, realm)

    def retry_http_basic_auth(self, host, req, realm):
        # TODO(jhylton): Remove the host argument? It depends on whether
        # retry_http_basic_auth() is considered part of the public API.
        # It probably is.
        user, pw = self.passwd.find_user_password(realm, req.get_full_url())
        if pw is not None:
            raw = "%s:%s" % (user, pw)
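            # b64encode() never inserts newlines, whereas encodestring()
            # wraps its output every 76 characters and would corrupt the
            # header for long credentials.
            auth = 'Basic %s' % base64.b64encode(raw)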
            if req.headers.get(self.auth_header, None) == auth:
                return None
            req.add_header(self.auth_header, auth)
            return self.parent.open(req)
        else:
            return None

class HTTPBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):

    auth_header = 'Authorization'

    def http_error_401(self, req, fp, code, msg, headers):
        host = urlparse.urlparse(req.get_full_url())[1]
        return self.http_error_auth_reqed('www-authenticate',
                                          host, req, headers)


class ProxyBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):

    auth_header = 'Proxy-authorization'

    def http_error_407(self, req, fp, code, msg, headers):
        host = req.get_host()
        return self.http_error_auth_reqed('proxy-authenticate',
                                          host, req, headers)
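

# Illustrative sketch only: hypothetical helpers, not part of urllib2.
# They isolate the two string operations AbstractBasicAuthHandler
# performs: pulling (scheme, realm) out of a WWW-Authenticate or
# Proxy-Authenticate value with the same regular expression as
# AbstractBasicAuthHandler.rx, and building the "Basic ..." credentials
# value (base64 of "user:password", per RFC 2617).  Latin-1 is assumed
# for any non-ASCII characters in the credentials.
def _sketch_parse_basic_challenge(value):
    import re
    rx = re.compile('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', re.I)
    mo = rx.search(value)
    if mo is None:
        return None
    return mo.groups()

def _sketch_basic_credentials(user, passwd):
    import base64
    raw = ("%s:%s" % (user, passwd)).encode('latin-1')
    # b64encode() emits a single line; base64.encodestring() would insert
    # a newline every 76 output characters.
    return 'Basic ' + base64.b64encode(raw).decode('ascii')

# Example: _sketch_parse_basic_challenge('Basic realm="WallyWorld"') gives
# ('Basic', 'WallyWorld'), and _sketch_basic_credentials('Aladdin',
# 'open sesame') builds the credentials from the RFC 2617 example.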


def randombytes(n):
    """Return n random bytes."""
    # Use /dev/urandom if it is available.  Fall back to random module
    # if not.  It might be worthwhile to extend this function to use
    # other platform-specific mechanisms for getting random bytes.
    if os.path.exists("/dev/urandom"):
        f = open("/dev/urandom", "rb")
        s = f.read(n)
        f.close()
        return s
    else:
        L = [chr(random.randrange(0, 256)) for i in range(n)]
        return "".join(L)
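

# Illustrative sketch only (hypothetical helper): os.urandom(), added in
# Python 2.4, is one of the "platform-specific mechanisms" the comment in
# randombytes() has in mind.  It reads /dev/urandom on Unix-like systems
# and uses the operating system's random source on Windows, and it raises
# NotImplementedError rather than silently falling back to the
# non-cryptographic random module when no source is available.
def _sketch_randombytes(n):
    import os
    return os.urandom(n)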

class AbstractDigestAuthHandler:
    # Digest authentication is specified in RFC 2617.

    # XXX The client does not inspect the Authentication-Info header
    # in a successful response.

    # XXX It should be possible to test this implementation against
    # a mock server that just generates a static set of challenges.

    # XXX qop="auth-int" support is shaky

    def __init__(self, passwd=None):
        if passwd is None:
            passwd = HTTPPasswordMgr()
        self.passwd = passwd
        self.add_password = self.passwd.add_password
        self.retried = 0
        self.nonce_count = 0

    def reset_retry_count(self):
        self.retried = 0

    def http_error_auth_reqed(self, auth_header, host, req, headers):
        authreq = headers.get(auth_header, None)
        if self.retried > 5:
            # Don't fail endlessly - if we failed once, we'll probably
            # fail a second time. Hm. Unless the Password Manager is
            # prompting for the information. Crap. This isn't great
            # but it's better than the current 'repeat until recursion
            # depth exceeded' approach <wink>
            raise HTTPError(req.get_full_url(), 401, "digest auth failed",
                            headers, None)
        else:
            self.retried += 1
        if authreq:
            scheme = authreq.split()[0]
            if scheme.lower() == 'digest':
                return self.retry_http_digest_auth(req, authreq)
            else:
                raise ValueError("AbstractDigestAuthHandler doesn't know "
                                 "about %s"%(scheme))

    def retry_http_digest_auth(self, req, auth):
        token, challenge = auth.split(' ', 1)
        chal = parse_keqv_list(parse_http_list(challenge))
        auth = self.get_authorization(req, chal)
        if auth:
            auth_val = 'Digest %s' % auth
            if req.headers.get(self.auth_header, None) == auth_val:
                return None
            req.add_header(self.auth_header, auth_val)
            resp = self.parent.open(req)
            return resp

    def get_cnonce(self, nonce):
        # The cnonce-value is an opaque
        # quoted string value provided by the client and used by both client
        # and server to avoid chosen plaintext attacks, to provide mutual
        # authentication, and to provide some message integrity protection.
        # This isn't a fabulous effort, but it's probably Good Enough.
        dig = sha.new("%s:%s:%s:%s" % (self.nonce_count, nonce, time.ctime(),
                                       randombytes(8))).hexdigest()
        return dig[:16]

    def get_authorization(self, req, chal):
        try:
            realm = chal['realm']
            nonce = chal['nonce']
            qop = chal.get('qop')
            algorithm = chal.get('algorithm', 'MD5')
            # mod_digest doesn't send an opaque, even though it isn't
            # supposed to be optional
            opaque = chal.get('opaque', None)
        except KeyError:
            return None

        H, KD = self.get_algorithm_impls(algorithm)
        if H is None:
            return None

        user, pw = self.passwd.find_user_password(realm, req.get_full_url())
        if user is None:
            return None

        # XXX not implemented yet
        if req.has_data():
            entdig = self.get_entity_digest(req.get_data(), chal)
        else:
            entdig = None

        A1 = "%s:%s:%s" % (user, realm, pw)
        A2 = "%s:%s" % (req.get_method(),
                        # XXX selector: what about proxies and full urls
                        req.get_selector())
        if qop == 'auth':
            self.nonce_count += 1
            ncvalue = '%08x' % self.nonce_count
            cnonce = self.get_cnonce(nonce)
            noncebit = "%s:%s:%s:%s:%s" % (nonce, ncvalue, cnonce, qop, H(A2))
            respdig = KD(H(A1), noncebit)
        elif qop is None:
            respdig = KD(H(A1), "%s:%s" % (nonce, H(A2)))
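        else:
            # XXX handle auth-int.  Without this raise, respdig would be
            # unbound below and the handler would fail with a NameError.
            raise URLError("qop '%s' is not supported." % qop)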

        # XXX should the partial digests be encoded too?

        base = 'username="%s", realm="%s", nonce="%s", uri="%s", ' \
               'response="%s"' % (user, realm, nonce, req.get_selector(),
                                  respdig)
        if opaque:
            base += ', opaque="%s"' % opaque
        if entdig:
            base += ', digest="%s"' % entdig
        base += ', algorithm="%s"' % algorithm
        if qop:
            base += ', qop=auth, nc=%s, cnonce="%s"' % (ncvalue, cnonce)
        return base

    def get_algorithm_impls(self, algorithm):
        # lambdas assume digest modules are imported at the top level
        H = None  # unknown algorithm; get_authorization() checks for None
        if algorithm == 'MD5':
            H = lambda x: md5.new(x).hexdigest()
        elif algorithm == 'SHA':
            H = lambda x: sha.new(x).hexdigest()
        # XXX MD5-sess
        KD = lambda s, d: H("%s:%s" % (s, d))
        return H, KD

    def get_entity_digest(self, data, chal):
        # XXX not implemented yet
        return None
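

# Illustrative sketch only: a hypothetical standalone helper, not part of
# urllib2.  It restates the qop="auth" branch of get_authorization() with
# hashlib (Python 2.5+) in place of the md5 module, so the RFC 2617
# arithmetic can be checked in isolation:
#   H(x)     = hex MD5 digest of x
#   KD(s, d) = H(s + ":" + d)
#   response = KD(H(A1), nonce:nc:cnonce:qop:H(A2))
# with A1 = user:realm:password and A2 = method:uri.
def _sketch_digest_response(user, realm, passwd, method, uri,
                            nonce, nc, cnonce, qop='auth'):
    import hashlib

    def H(x):
        return hashlib.md5(x.encode('utf-8')).hexdigest()

    def KD(secret, data):
        return H("%s:%s" % (secret, data))

    A1 = "%s:%s:%s" % (user, realm, passwd)
    A2 = "%s:%s" % (method, uri)
    ncvalue = '%08x' % nc      # nonce count as 8 hex digits, e.g. 00000001
    noncebit = "%s:%s:%s:%s:%s" % (nonce, ncvalue, cnonce, qop, H(A2))
    return KD(H(A1), noncebit)

# The inputs of the RFC 2617 section 3.5 example can serve as a check:
# _sketch_digest_response('Mufasa', 'testrealm@host.com', 'Circle Of Life',
#                         'GET', '/dir/index.html',
#                         'dcd98b7102dd2f0e8b11d0f600bfb0c093', 1, '0a4f113b')
# should reproduce the response value printed in the RFC.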


class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
    """An authentication protocol defined by RFC 2069 (obsoleted by RFC 2617)

    Digest authentication improves on basic authentication because it
    does not transmit passwords in the clear.
    """

    auth_header = 'Authorization'

    def http_error_401(self, req, fp, code, msg, headers):
        host = urlparse.urlparse(req.get_full_url())[1]
        retry = self.http_error_auth_reqed('www-authenticate',
                                           host, req, headers)
        self.reset_retry_count()
        return retry


class ProxyDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):

    auth_header = 'Proxy-Authorization'

    def http_error_407(self, req, fp, code, msg, headers):
        host = req.get_host()
        retry = self.http_error_auth_reqed('proxy-authenticate',
                                           host, req, headers)
        self.reset_retry_count()
        return retry

class AbstractHTTPHandler(BaseHandler):

    def __init__(self, debuglevel=0):
        self._debuglevel = debuglevel

    def set_http_debuglevel(self, level):
        self._debuglevel = level

    def do_request_(self, request):
        host = request.get_host()
        if not host:
            raise URLError('no host given')

        if request.has_data():  # POST
            data = request.get_data()
            if not request.has_header('Content-type'):
                request.add_unredirected_header(
                    'Content-type',
                    'application/x-www-form-urlencoded')
            if not request.has_header('Content-length'):
                request.add_unredirected_header(
                    'Content-length', '%d' % len(data))

        scheme, sel = splittype(request.get_selector())
        sel_host, sel_path = splithost(sel)
        if not request.has_header('Host'):
            request.add_unredirected_header('Host', sel_host or host)
        for name, value in self.parent.addheaders:
            name = name.capitalize()
            if not request.has_header(name):
                request.add_unredirected_header(name, value)

        return request

    def do_open(self, http_class, req):
        """Return an addinfourl object for the request, using http_class.

        http_class must implement the HTTPConnection API from httplib.
        The addinfourl return value is a file-like object.  It also
        has methods and attributes including:
            - info(): return a mimetools.Message object for the headers
            - geturl(): return the original request URL
            - code: HTTP status code
        """
        host = req.get_host()
        if not host:
            raise URLError('no host given')

        h = http_class(host) # will parse host:port
        h.set_debuglevel(self._debuglevel)

        headers = dict(req.headers)
        headers.update(req.unredirected_hdrs)
        # We want to make an HTTP/1.1 request, but the addinfourl
        # class isn't prepared to deal with a persistent connection.
        # It will try to read all remaining data from the socket,
        # which will block while the server waits for the next request.
        # So make sure the connection gets closed after the (only)
        # request.
        headers["Connection"] = "close"
        try:
            h.request(req.get_method(), req.get_selector(), req.data, headers)
            r = h.getresponse()
        except socket.error, err: # XXX what error?
            raise URLError(err)

        # Pick apart the HTTPResponse object to get the addinfourl
        # object initialized properly.

        # Wrap the HTTPResponse object in socket's file object adapter
        # for Windows.  That adapter calls recv(), so delegate recv()
        # to read().  This weird wrapping allows the returned object to
        # have readline() and readlines() methods.

        # XXX It might be better to extract the read buffering code
        # out of socket._fileobject() and into a base class.

        r.recv = r.read
        fp = socket._fileobject(r)

        resp = addinfourl(fp, r.msg, req.get_full_url())
        resp.code = r.status
        resp.msg = r.reason
        return resp


class HTTPHandler(AbstractHTTPHandler):

    def http_open(self, req):
        return self.do_open(httplib.HTTPConnection, req)

    http_request = AbstractHTTPHandler.do_request_

if hasattr(httplib, 'HTTPS'):
    class HTTPSHandler(AbstractHTTPHandler):

        def https_open(self, req):
            return self.do_open(httplib.HTTPSConnection, req)

        https_request = AbstractHTTPHandler.do_request_

class HTTPCookieProcessor(BaseHandler):
    def __init__(self, cookiejar=None):
        if cookiejar is None:
            cookiejar = cookielib.CookieJar()
        self.cookiejar = cookiejar

    def http_request(self, request):
        self.cookiejar.add_cookie_header(request)
        return request

    def http_response(self, request, response):
        self.cookiejar.extract_cookies(response, request)
        return response

    https_request = http_request
    https_response = http_response

class UnknownHandler(BaseHandler):
    def unknown_open(self, req):
        type = req.get_type()
        raise URLError('unknown url type: %s' % type)

def parse_keqv_list(l):
    """Parse list of key=value strings where keys are not duplicated."""
    parsed = {}
    for elt in l:
        k, v = elt.split('=', 1)
        if v[0] == '"' and v[-1] == '"':
            v = v[1:-1]
        parsed[k] = v
    return parsed

def parse_http_list(s):
    """Parse lists as described by RFC 2068 Section 2.
    
    In particular, parse comma-separated lists where the elements of
    the list may include quoted-strings.  A quoted-string could
    contain a comma.  A non-quoted string could have quotes in the
    middle.  Neither commas nor quotes count if they are escaped.
    Only double-quotes count, not single-quotes.
    """
    res = []
    part = ''

    escape = quote = False
    for cur in s:
        if escape:
            part += cur
            escape = False
            continue
        if quote:
            if cur == '\\':
                escape = True
                continue
            elif cur == '"':
                quote = False
            part += cur
            continue

        if cur == ',':
            res.append(part)
            part = ''
            continue

        if cur == '"':
            quote = True
        
        part += cur

    # append last part
    if part:
        res.append(part)

    return [part.strip() for part in res]

class FileHandler(BaseHandler):
    # Use local file or FTP depending on form of URL
    def file_open(self, req):
        url = req.get_selector()
        if url[:2] == '//' and url[2:3] != '/':
            req.type = 'ftp'
            return self.parent.open(req)
        else:
            return self.open_local_file(req)

    # names for the localhost
    names = None
    def get_names(self):
        if FileHandler.names is None:
            FileHandler.names = (socket.gethostbyname('localhost'),
                                 socket.gethostbyname(socket.gethostname()))
        return FileHandler.names

    # not entirely sure what the rules are here
    def open_local_file(self, req):
        import email.Utils
        host = req.get_host()
        file = req.get_selector()
        localfile = url2pathname(file)
        stats = os.stat(localfile)
        size = stats.st_size
        modified = email.Utils.formatdate(stats.st_mtime, usegmt=True)
        mtype = mimetypes.guess_type(file)[0]
        headers = mimetools.Message(StringIO(
            'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
            (mtype or 'text/plain', size, modified)))
        if host:
            host, port = splitport(host)
        if not host or \
           (not port and socket.gethostbyname(host) in self.get_names()):
            return addinfourl(open(localfile, 'rb'),
                              headers, 'file:'+file)
        raise URLError('file not on local host')
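

# Illustrative sketch only (hypothetical helper; assumes Python 2.5+ for
# the lowercase email.utils module): the three headers that
# FileHandler.open_local_file() synthesizes for a local file, returned as
# a plain dict instead of a mimetools.Message.  Types that mimetypes
# cannot guess fall back to text/plain, as in the handler.
def _sketch_local_file_headers(path):
    import os
    import mimetypes
    from email.utils import formatdate

    stats = os.stat(path)
    mtype = mimetypes.guess_type(path)[0] or 'text/plain'
    return {
        'Content-type': mtype,
        'Content-length': '%d' % stats.st_size,
        # RFC 1123 date in GMT, in the form "Mon, 20 Nov 1995 19:12:08 GMT"
        'Last-modified': formatdate(stats.st_mtime, usegmt=True),
    }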

class FTPHandler(BaseHandler):
    def ftp_open(self, req):
        host = req.get_host()
        if not host:
            raise IOError, ('ftp error', 'no host given')
        host, port = splitport(host)
        if port is None:
            port = ftplib.FTP_PORT
        else:
            port = int(port)

        # username/password handling
        user, host = splituser(host)
        if user:
            user, passwd = splitpasswd(user)
        else:
            passwd = None
        host = unquote(host)
        user = unquote(user or '')
        passwd = unquote(passwd or '')

        try:
            host = socket.gethostbyname(host)
        except socket.error, msg:
            raise URLError(msg)
        path, attrs = splitattr(req.get_selector())
        dirs = path.split('/')
        dirs = map(unquote, dirs)
        dirs, file = dirs[:-1], dirs[-1]
        if dirs and not dirs[0]:
            dirs = dirs[1:]
        try:
            fw = self.connect_ftp(user, passwd, host, port, dirs)
            type = file and 'I' or 'D'
            for attr in attrs:
                attr, value = splitvalue(attr)
                if attr.lower() == 'type' and \
                   value in ('a', 'A', 'i', 'I', 'd', 'D'):
                    type = value.upper()
            fp, retrlen = fw.retrfile(file, type)
            headers = ""
            mtype = mimetypes.guess_type(req.get_full_url())[0]
            if mtype:
                headers += "Content-type: %s\n" % mtype
            if retrlen is not None and retrlen >= 0:
                headers += "Content-length: %d\n" % retrlen
            sf = StringIO(headers)
            headers = mimetools.Message(sf)
            return addinfourl(fp, headers, req.get_full_url())
        except ftplib.all_errors, msg:
            raise IOError, ('ftp error', msg), sys.exc_info()[2]

    def connect_ftp(self, user, passwd, host, port, dirs):
        fw = ftpwrapper(user, passwd, host, port, dirs)
##        fw.ftp.set_debuglevel(1)
        return fw
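

# Illustrative sketch only (hypothetical helper): how ftp_open() chooses
# the transfer type.  A path that names a file defaults to binary ('I'),
# a path ending in '/' defaults to a directory listing ('D'), and an
# RFC 1738 ";type=a", ";type=i" or ";type=d" attribute overrides either.
# Attribute parsing is simplified here to str.split/str.partition
# (Python 2.5+) instead of splitattr() and splitvalue().
def _sketch_ftp_transfer_type(selector):
    parts = selector.split(';')
    path, attrs = parts[0], parts[1:]
    filename = path.split('/')[-1]
    transfer_type = filename and 'I' or 'D'
    for attr in attrs:
        name, _, value = attr.partition('=')
        if name.lower() == 'type' and value in ('a', 'A', 'i', 'I', 'd', 'D'):
            transfer_type = value.upper()
    return transfer_type

# Example: '/pub/README' -> 'I', '/pub/' -> 'D', '/pub/README;type=a' -> 'A'.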

class CacheFTPHandler(FTPHandler):
    # XXX would be nice to have pluggable cache strategies
    # XXX this stuff is definitely not thread safe
    def __init__(self):
        self.cache = {}
        self.timeout = {}
        self.soonest = 0
        self.delay = 60
        self.max_conns = 16

    def setTimeout(self, t):
        self.delay = t

    def setMaxConns(self, m):
        self.max_conns = m

    def connect_ftp(self, user, passwd, host, port, dirs):
        key = user, host, port, '/'.join(dirs)
        if key in self.cache:
            self.timeout[key] = time.time() + self.delay
        else:
            self.cache[key] = ftpwrapper(user, passwd, host, port, dirs)
            self.timeout[key] = time.time() + self.delay
        self.check_cache()
        return self.cache[key]

    def check_cache(self):
        # first check for old ones
        t = time.time()
        if self.soonest <= t:
            for k, v in self.timeout.items():
                if v < t:
                    self.cache[k].close()
                    del self.cache[k]
                    del self.timeout[k]
        self.soonest = min(self.timeout.values())

        # then check the size
        if len(self.cache) == self.max_conns:
            for k, v in self.timeout.items():
                if v == self.soonest:
                    del self.cache[k]
                    del self.timeout[k]
                    break
            self.soonest = min(self.timeout.values())
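

# Illustrative sketch only: a hypothetical class, not part of urllib2.
# It restates the idea behind CacheFTPHandler (entries expire `delay`
# seconds after last use, and the entry closest to expiry is evicted when
# the cache is full) with an injectable clock, so the policy can be
# exercised without FTP servers.  Two deliberate differences: room is made
# before inserting, so at most max_conns entries are kept, and evicted
# values are passed to a close callback in both cases, whereas
# check_cache() closes only expired connections.
class _SketchTimedCache(object):
    def __init__(self, delay=60, max_conns=16, clock=None, close=None):
        import time
        self.delay = delay
        self.max_conns = max_conns
        self.clock = clock or time.time
        self.close = close or (lambda value: None)
        self.cache = {}
        self.timeout = {}

    def get(self, key, factory):
        """Return the cached value for key, creating it with factory()."""
        now = self.clock()
        if key not in self.cache:
            self._evict(now)
            self.cache[key] = factory()
        self.timeout[key] = now + self.delay
        return self.cache[key]

    def _evict(self, now):
        # First drop everything that has expired.
        for k, expiry in list(self.timeout.items()):
            if expiry < now:
                self._drop(k)
        # Then, if still full, drop the entry that would expire soonest.
        if self.cache and len(self.cache) >= self.max_conns:
            soonest = min(self.timeout, key=self.timeout.get)
            self._drop(soonest)

    def _drop(self, key):
        self.close(self.cache.pop(key))
        del self.timeout[key]

# In CacheFTPHandler terms, key would be (user, host, port, '/'.join(dirs))
# and factory a callable that builds the ftpwrapper.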

class GopherHandler(BaseHandler):
    def gopher_open(self, req):
        host = req.get_host()
        if not host:
            raise GopherError('no host given')
        host = unquote(host)
        selector = req.get_selector()
        type, selector = splitgophertype(selector)
        selector, query = splitquery(selector)
        selector = unquote(selector)
        if query:
            query = unquote(query)
            fp = gopherlib.send_query(selector, query, host)
        else:
            fp = gopherlib.send_selector(selector, host)
        return addinfourl(fp, noheaders(), req.get_full_url())

#bleck! don't use this yet
class OpenerFactory:

    default_handlers = [UnknownHandler, HTTPHandler,
                        HTTPDefaultErrorHandler, HTTPRedirectHandler,
                        FTPHandler, FileHandler]
    handlers = []
    replacement_handlers = []

    def add_handler(self, h):
        self.handlers = self.handlers + [h]

    def replace_handler(self, h):
        pass

    def build_opener(self):
        opener = OpenerDirector()
        for ph in self.default_handlers:
            if inspect.isclass(ph):
                ph = ph()
            opener.add_handler(ph)
        return opener

# Mapping status codes to official W3C names
httpresponses = {
    100: 'Continue',
    101: 'Switching Protocols',

    200: 'OK',
    201: 'Created',
    202: 'Accepted',
    203: 'Non-Authoritative Information',
    204: 'No Content',
    205: 'Reset Content',
    206: 'Partial Content',

    300: 'Multiple Choices',
    301: 'Moved Permanently',
    302: 'Found',
    303: 'See Other',
    304: 'Not Modified',
    305: 'Use Proxy',
    306: '(Unused)',
    307: 'Temporary Redirect',

    400: 'Bad Request',
    401: 'Unauthorized',
    402: 'Payment Required',
    403: 'Forbidden',
    404: 'Not Found',
    405: 'Method Not Allowed',
    406: 'Not Acceptable',
    407: 'Proxy Authentication Required',
    408: 'Request Timeout',
    409: 'Conflict',
    410: 'Gone',
    411: 'Length Required',
    412: 'Precondition Failed',
    413: 'Request Entity Too Large',
    414: 'Request-URI Too Long',
    415: 'Unsupported Media Type',
    416: 'Requested Range Not Satisfiable',
    417: 'Expectation Failed',

    500: 'Internal Server Error',
    501: 'Not Implemented',
    502: 'Bad Gateway',
    503: 'Service Unavailable',
    504: 'Gateway Timeout',
    505: 'HTTP Version Not Supported',
}