Commit 6208369f authored by Barry Warsaw

get_param(): Update the docstring to explain how CHARSET and LANGUAGE
can be None, and what to do in that situation.

get_filename(), get_boundary(), get_content_charset(): Make sure these
handle RFC 2231 headers without a CHARSET field.

Backport candidate (as was the Utils.py 1.25 change) to both Python
2.3.1 and 2.2.4 -- will do momentarily.
parent 0b6f0d88
@@ -571,13 +571,16 @@ class Message:
         Parameter keys are always compared case insensitively. The return
         value can either be a string, or a 3-tuple if the parameter was RFC
         2231 encoded. When it's a 3-tuple, the elements of the value are of
-        the form (CHARSET, LANGUAGE, VALUE), where LANGUAGE may be the empty
-        string. Your application should be prepared to deal with these, and
-        can convert the parameter to a Unicode string like so:
+        the form (CHARSET, LANGUAGE, VALUE). Note that both CHARSET and
+        LANGUAGE can be None, in which case you should consider VALUE to be
+        encoded in the us-ascii charset. You can usually ignore LANGUAGE.
+
+        Your application should be prepared to deal with 3-tuple return
+        values, and can convert the parameter to a Unicode string like so:

             param = msg.get_param('foo')
             if isinstance(param, tuple):
-                param = unicode(param[2], param[0])
+                param = unicode(param[2], param[0] or 'us-ascii')

         In any case, the parameter value (either the returned string, or the
         VALUE item in the 3-tuple) is always unquoted, unless unquote is set
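The docstring's recipe is Python 2 code. Below is a minimal Python 3 transliteration of it. The helper name `param_to_text` is hypothetical, and it assumes VALUE arrives as raw bytes, the way a Python 2 `str` would; Python 3's own `email` package instead hands VALUE back as a `str` and provides `email.utils.collapse_rfc2231_value()` for this conversion. Note that `charset or 'us-ascii'` also covers an empty CHARSET string, not only None.

```python
def param_to_text(param):
    """Convert a get_param()-style result to text (hypothetical helper).

    `param` is either a plain str or an RFC 2231 (CHARSET, LANGUAGE, VALUE)
    tuple; VALUE is assumed to be bytes, mirroring Python 2's str.
    """
    if isinstance(param, tuple):
        charset, _language, value = param  # LANGUAGE can usually be ignored
        # CHARSET may be None (or empty): fall back to us-ascii, as the
        # patched docstring recommends.
        return value.decode(charset or 'us-ascii')
    return param


utf8_text = param_to_text(('utf-8', 'en', b'caf\xc3\xa9'))  # 'café'
ascii_text = param_to_text((None, None, b'plain'))          # 'plain'
as_is = param_to_text('already text')                        # 'already text'
```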
@@ -708,7 +711,7 @@ class Message:
         if isinstance(filename, TupleType):
             # It's an RFC 2231 encoded parameter
             newvalue = _unquotevalue(filename)
-            return unicode(newvalue[2], newvalue[0])
+            return unicode(newvalue[2], newvalue[0] or 'us-ascii')
         else:
             newvalue = _unquotevalue(filename.strip())
             return newvalue
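For comparison, Python 3's `email` package still returns a 3-tuple from `get_param()` for RFC 2231 parameters, and its `Message.get_filename()` passes that tuple through `email.utils.collapse_rfc2231_value()`, which falls back to us-ascii when CHARSET is None. The sketch below checks that behavior on made-up headers; the helper name `filename_of` is hypothetical, and only ASCII filenames are used.

```python
from email import message_from_string


def filename_of(disposition):
    """Return get_filename() for a message with the given Content-Disposition."""
    msg = message_from_string(
        'Content-Disposition: ' + disposition + '\n\nbody\n')
    return msg.get_filename()


# RFC 2231 syntax ("filename*=") but no CHARSET'LANGUAGE' prefix.
print(filename_of("attachment; filename*=report.txt"))  # report.txt
# Full RFC 2231 form with CHARSET, LANGUAGE, and percent-encoding.
print(filename_of("attachment; filename*=us-ascii'en'my%20report.txt"))  # my report.txt
```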
@@ -725,7 +728,8 @@ class Message:
             return failobj
         if isinstance(boundary, TupleType):
             # RFC 2231 encoded, so decode. It better end up as ascii
-            return unicode(boundary[2], boundary[0]).encode('us-ascii')
+            charset = boundary[0] or 'us-ascii'
+            return unicode(boundary[2], charset).encode('us-ascii')
         return _unquotevalue(boundary.strip())

     def set_boundary(self, boundary):
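Before this change, a None CHARSET was passed straight to `unicode()` as its encoding argument, which fails. The patched branch decodes with a us-ascii fallback and then insists the result is pure ASCII. Here is a Python 3 sketch of that branch; the helper name `decode_boundary` is hypothetical, VALUE is assumed to be bytes (as a Python 2 `str` would be), and it returns text rather than an encoded byte string.

```python
def decode_boundary(param):
    """Sketch of the patched get_boundary() branch (hypothetical helper).

    `param` is a str or an RFC 2231 (CHARSET, LANGUAGE, VALUE) tuple whose
    VALUE is assumed to be bytes. Unquoting of plain strings is omitted.
    """
    if not isinstance(param, tuple):
        return param.strip()
    charset = param[0] or 'us-ascii'   # the fix: tolerate a missing CHARSET
    text = param[2].decode(charset)
    # A MIME boundary must be ASCII; this raises UnicodeEncodeError if not.
    text.encode('us-ascii')
    return text


print(decode_boundary((None, None, b'==boundary42==')))  # ==boundary42==
```

A non-ASCII boundary such as `('latin-1', '', b'caf\xe9')` decodes fine but is then rejected by the ASCII check, matching the "It better end up as ascii" comment in the diff.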
@@ -792,7 +796,8 @@ class Message:
             return failobj
         if isinstance(charset, TupleType):
             # RFC 2231 encoded, so decode it, and it better end up as ascii.
-            charset = unicode(charset[2], charset[0]).encode('us-ascii')
+            pcharset = charset[0] or 'us-ascii'
+            charset = unicode(charset[2], pcharset).encode('us-ascii')
         # RFC 2046, $4.1.2 says charsets are not case sensitive
         return charset.lower()
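Python 3's `Message.get_content_charset()` still applies the same `charset[0] or 'us-ascii'` fallback. The sketch below uses a made-up message whose `charset*=` parameter has no CHARSET'LANGUAGE' prefix, to show the raw `get_param()` tuple next to the normalized, lowercased result.

```python
from email import message_from_string

# Made-up message: RFC 2231 syntax ("charset*=") with no CHARSET'LANGUAGE'
# prefix, so the parsed tuple has None in both of those slots.
msg = message_from_string('Content-Type: text/plain; charset*=UTF-8\n\nhello\n')

raw = msg.get_param('charset')
print(raw)                        # (None, None, 'UTF-8')
print(msg.get_content_charset())  # utf-8
```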