Commit e61e3fda authored by R David Murray

#18891: Complete new provisional email API.

This adds EmailMessage and MIMEPart subclasses of Message
with new API methods, and a ContentManager class used by
the new methods.  Also a new policy setting, content_manager.

Patch was reviewed by Stephen J. Turnbull and Serhiy Storchaka,
and reflects their feedback.

Ideally I will add some examples of using the new API to the
documentation before the final release.
parent 57822dfd
......@@ -33,10 +33,11 @@ Here are the methods of the :class:`Message` class:
.. class:: Message(policy=compat32)
The *policy* argument determines the :mod:`~email.policy` that will be used
to update the message model. The default value, :class:`compat32
<email.policy.Compat32>` maintains backward compatibility with the
Python 3.2 version of the email package. For more information see the
If *policy* is specified (it must be an instance of a :mod:`~email.policy`
class) use the rules it specifies to update and serialize the representation
of the message. If *policy* is not set, use the :class:`compat32
<email.policy.Compat32>` policy, which maintains backward compatibility with
the Python 3.2 version of the email package. For more information see the
:mod:`~email.policy` documentation.
.. versionchanged:: 3.3 The *policy* keyword argument was added.
......@@ -465,7 +466,8 @@ Here are the methods of the :class:`Message` class:
to ``False``.
.. method:: set_param(param, value, header='Content-Type', requote=True, charset=None, language='')
.. method:: set_param(param, value, header='Content-Type', requote=True,
charset=None, language='', replace=False)
Set a parameter in the :mailheader:`Content-Type` header. If the
parameter already exists in the header, its value will be replaced with
......@@ -482,6 +484,12 @@ Here are the methods of the :class:`Message` class:
language, defaulting to the empty string. Both *charset* and *language*
should be strings.
If *replace* is ``False`` (the default) the header is moved to the
end of the list of headers. If *replace* is ``True``, the header
will be updated in place.
.. versionchanged:: 3.4 ``replace`` keyword was added.
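The effect of *replace* on header order can be sketched as follows. This is a minimal illustration, assuming Python 3.4 or later; the `build` helper and the header values are hypothetical and exist only for the demonstration.

```python
from email.message import Message


def build():
    # Hypothetical helper: a message whose Content-Type comes first.
    msg = Message()
    msg['Content-Type'] = 'text/plain; charset="us-ascii"'
    msg['Subject'] = 'demo'
    return msg


# Default (replace=False): the header is deleted and re-added,
# so Content-Type ends up after Subject.
moved = build()
moved.set_param('charset', 'utf-8')

# replace=True: the header is rewritten where it already sits.
in_place = build()
in_place.set_param('charset', 'utf-8', replace=True)

print(moved.keys())
print(in_place.keys())
```

Both messages end up with the same parameter value; only the position of the header in the header list differs.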
.. method:: del_param(param, header='content-type', requote=True)
......
......@@ -371,7 +371,7 @@ added matters. To illustrate::
to) :rfc:`5322`, :rfc:`2047`, and the current MIME RFCs.
This policy adds new header parsing and folding algorithms. Instead of
simple strings, headers are custom objects with custom attributes depending
simple strings, headers are ``str`` subclasses with attributes that depend
on the type of the field. The parsing and folding algorithms fully implement
:rfc:`2047` and :rfc:`5322`.
......@@ -408,6 +408,20 @@ added matters. To illustrate::
fields are treated as unstructured. This list will be completed before
the extension is marked stable.)
.. attribute:: content_manager
An object with at least two methods: get_content and set_content. When
the :meth:`~email.message.Message.get_content` or
:meth:`~email.message.Message.set_content` method of a
:class:`~email.message.Message` object is called, it calls the
corresponding method of this object, passing it the message object as its
first argument, and any arguments or keywords that were passed to it as
additional arguments. By default ``content_manager`` is set to
:data:`~email.contentmanager.raw_data_manager`.
.. versionadded:: 3.4
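The delegation described above can be sketched with a toy content manager. This is only an illustration: `UpperCaseManager` is a hypothetical class, not part of the email package, and the example uses `EmailMessage` because, in the code of this commit, `get_content` and `set_content` are defined on `MIMEPart`.

```python
from email import policy
from email.message import EmailMessage


class UpperCaseManager:
    """Hypothetical manager: stores text as-is, returns it upper-cased."""

    def get_content(self, msg, *args, **kw):
        # The message object arrives as the first argument.
        return msg.get_payload().upper()

    def set_content(self, msg, text, *args, **kw):
        msg.set_payload(text)


# Install the manager through the policy setting.
custom = policy.default.clone(content_manager=UpperCaseManager())

msg = EmailMessage(policy=custom)
msg.set_content('hello')       # routed to UpperCaseManager.set_content
result = msg.get_content()     # routed to UpperCaseManager.get_content
print(result)
```

A manager can also be supplied for a single call through the `content_manager` keyword of `get_content` or `set_content`, which takes precedence over the policy setting.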
The class provides the following concrete implementations of the abstract
methods of :class:`Policy`:
......@@ -427,7 +441,7 @@ added matters. To illustrate::
The name is returned unchanged. If the input value has a ``name``
attribute and it matches *name* ignoring case, the value is returned
unchanged. Otherwise the *name* and *value* are passed to
``header_factory``, and the resulting custom header object is returned as
``header_factory``, and the resulting header object is returned as
the value. In this case a ``ValueError`` is raised if the input value
contains CR or LF characters.
......@@ -435,7 +449,7 @@ added matters. To illustrate::
If the value has a ``name`` attribute, it is returned unmodified.
Otherwise the *name*, and the *value* with any CR or LF characters
removed, are passed to the ``header_factory``, and the resulting custom
removed, are passed to the ``header_factory``, and the resulting
header object is returned. Any surrogateescaped bytes get turned into
the unicode unknown-character glyph.
......@@ -445,9 +459,9 @@ added matters. To illustrate::
A value is considered to be a 'source value' if and only if it does not
have a ``name`` attribute (having a ``name`` attribute means it is a
header object of some sort). If a source value needs to be refolded
according to the policy, it is converted into a custom header object by
according to the policy, it is converted into a header object by
passing the *name* and the *value* with any CR and LF characters removed
to the ``header_factory``. Folding of a custom header object is done by
to the ``header_factory``. Folding of a header object is done by
calling its ``fold`` method with the current policy.
Source values are split into lines using :meth:`~str.splitlines`. If
......@@ -502,23 +516,23 @@ With all of these :class:`EmailPolicies <.EmailPolicy>`, the effective API of
the email package is changed from the Python 3.2 API in the following ways:
* Setting a header on a :class:`~email.message.Message` results in that
header being parsed and a custom header object created.
header being parsed and a header object created.
* Fetching a header value from a :class:`~email.message.Message` results
in that header being parsed and a custom header object created and
in that header being parsed and a header object created and
returned.
* Any custom header object, or any header that is refolded due to the
* Any header object, or any header that is refolded due to the
policy settings, is folded using an algorithm that fully implements the
RFC folding algorithms, including knowing where encoded words are required
and allowed.
From the application view, this means that any header obtained through the
:class:`~email.message.Message` is a custom header object with custom
:class:`~email.message.Message` is a header object with extra
attributes, whose string value is the fully decoded unicode value of the
header. Likewise, a header may be assigned a new value, or a new header
created, using a unicode string, and the policy will take care of converting
the unicode string into the correct RFC encoded form.
The custom header objects and their attributes are described in
The header objects and their attributes are described in
:mod:`~email.headerregistry`.
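A short sketch of what "header object" means from the application side, assuming Python 3.6 or later (where `EmailMessage` uses the `default` policy); the address is made up for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()
# Assign a plain unicode string; the policy parses it into a header object.
msg['To'] = 'Éric <eric@example.com>'

to = msg['To']
is_str = isinstance(to, str)           # header objects are str subclasses
first = to.addresses[0]                # extra attribute on address headers
display_name = first.display_name
addr_spec = first.addr_spec

# Serializing converts the unicode value into RFC 2047 encoded form.
wire = msg.as_bytes()
print(is_str, display_name, addr_spec)
print(wire)
```

The string value of the header stays fully decoded, while the serialized bytes carry an encoded word for the non-ASCII display name.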
......@@ -53,6 +53,7 @@ Contents of the :mod:`email` package documentation:
email.generator.rst
email.policy.rst
email.headerregistry.rst
email.contentmanager.rst
email.mime.rst
email.header.rst
email.charset.rst
......
......@@ -280,6 +280,21 @@ result: a bytes object containing the fully formatted message.
(Contributed by R. David Murray in :issue:`18600`.)
A pair of new subclasses of :class:`~email.message.Message` have been added,
along with a new sub-module, :mod:`~email.contentmanager`. All documentation
is currently in the new module, which is being added as part of the new
:term:`provisional <provisional package>` email API. These classes provide a
number of new methods that make extracting content from and inserting content
into email messages much easier. See the :mod:`~email.contentmanager`
documentation for details.
These API additions complete the bulk of the work that was planned as part of
the email6 project. The currently provisional API is scheduled to become final
in Python 3.5 (possibly with a few minor additions in the area of error
handling).
(Contributed by R. David Murray in :issue:`18891`.)
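As a rough illustration of the new methods, the following sketch builds a message with a text body, an HTML alternative, and a binary attachment, then reads the pieces back. It assumes Python 3.6 or later; the subject, body text, and file name are invented for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'Report'

# Inserting content: each call restructures the message as needed.
msg.set_content('See attached.')
msg.add_alternative('<p>See attached.</p>', subtype='html')
msg.add_attachment(b'\x00\x01', maintype='application',
                   subtype='octet-stream', filename='data.bin')

# Extracting content.
plain = msg.get_body(preferencelist=('plain',))
html = msg.get_body(preferencelist=('html', 'plain'))
attachments = list(msg.iter_attachments())

print(msg.get_content_type())
print(plain.get_content())
print(html.get_content_type())
print([part.get_filename() for part in attachments])
```

The resulting structure is a multipart/mixed whose first part is the multipart/alternative holding the two body variants, followed by the attachment.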
functools
---------
......
......@@ -8,8 +8,6 @@ __all__ = ['Message']
import re
import uu
import base64
import binascii
from io import BytesIO, StringIO
# Intrapackage imports
......@@ -679,7 +677,7 @@ class Message:
return failobj
def set_param(self, param, value, header='Content-Type', requote=True,
charset=None, language=''):
charset=None, language='', replace=False):
"""Set a parameter in the Content-Type header.
If the parameter already exists in the header, its value will be
......@@ -723,8 +721,11 @@ class Message:
else:
ctype = SEMISPACE.join([ctype, append_param])
if ctype != self.get(header):
del self[header]
self[header] = ctype
if replace:
self.replace_header(header, ctype)
else:
del self[header]
self[header] = ctype
def del_param(self, param, header='content-type', requote=True):
"""Remove the given parameter completely from the Content-Type header.
......@@ -905,3 +906,208 @@ class Message:
# I.e. def walk(self): ...
from email.iterators import walk
class MIMEPart(Message):
def __init__(self, policy=None):
if policy is None:
from email.policy import default
policy = default
Message.__init__(self, policy)
@property
def is_attachment(self):
c_d = self.get('content-disposition')
if c_d is None:
return False
return c_d.lower() == 'attachment'
def _find_body(self, part, preferencelist):
if part.is_attachment:
return
maintype, subtype = part.get_content_type().split('/')
if maintype == 'text':
if subtype in preferencelist:
yield (preferencelist.index(subtype), part)
return
if maintype != 'multipart':
return
if subtype != 'related':
for subpart in part.iter_parts():
yield from self._find_body(subpart, preferencelist)
return
if 'related' in preferencelist:
yield (preferencelist.index('related'), part)
candidate = None
start = part.get_param('start')
if start:
for subpart in part.iter_parts():
if subpart['content-id'] == start:
candidate = subpart
break
if candidate is None:
subparts = part.get_payload()
candidate = subparts[0] if subparts else None
if candidate is not None:
yield from self._find_body(candidate, preferencelist)
def get_body(self, preferencelist=('related', 'html', 'plain')):
"""Return best candidate mime part for display as 'body' of message.
Do a depth first search, starting with self, looking for the first part
matching each of the items in preferencelist, and return the part
corresponding to the first item that has a match, or None if no items
have a match. If 'related' is not included in preferencelist, consider
the root part of any multipart/related encountered as a candidate
match. Ignore parts with 'Content-Disposition: attachment'.
"""
best_prio = len(preferencelist)
body = None
for prio, part in self._find_body(self, preferencelist):
if prio < best_prio:
best_prio = prio
body = part
if prio == 0:
break
return body
_body_types = {('text', 'plain'),
('text', 'html'),
('multipart', 'related'),
('multipart', 'alternative')}
def iter_attachments(self):
"""Return an iterator over the non-main parts of a multipart.
Skip the first of each occurrence of text/plain, text/html,
multipart/related, or multipart/alternative in the multipart (unless
they have a 'Content-Disposition: attachment' header) and include all
remaining subparts in the returned iterator. When applied to a
multipart/related, return all parts except the root part. Return an
empty iterator when applied to a multipart/alternative or a
non-multipart.
"""
maintype, subtype = self.get_content_type().split('/')
if maintype != 'multipart' or subtype == 'alternative':
return
parts = self.get_payload()
if maintype == 'multipart' and subtype == 'related':
# For related, we treat everything but the root as an attachment.
# The root may be indicated by 'start'; if there's no start or we
# can't find the named start, treat the first subpart as the root.
start = self.get_param('start')
if start:
found = False
attachments = []
for part in parts:
if part.get('content-id') == start:
found = True
else:
attachments.append(part)
if found:
yield from attachments
return
parts.pop(0)
yield from parts
return
# Otherwise we more or less invert the remaining logic in get_body.
# This only really works in edge cases (ex: non-text relateds or
# alternatives) if the sending agent sets content-disposition.
seen = [] # Only skip the first example of each candidate type.
for part in parts:
maintype, subtype = part.get_content_type().split('/')
if ((maintype, subtype) in self._body_types and
not part.is_attachment and subtype not in seen):
seen.append(subtype)
continue
yield part
def iter_parts(self):
"""Return an iterator over all immediate subparts of a multipart.
Return an empty iterator for a non-multipart.
"""
if self.get_content_maintype() == 'multipart':
yield from self.get_payload()
def get_content(self, *args, content_manager=None, **kw):
if content_manager is None:
content_manager = self.policy.content_manager
return content_manager.get_content(self, *args, **kw)
def set_content(self, *args, content_manager=None, **kw):
if content_manager is None:
content_manager = self.policy.content_manager
content_manager.set_content(self, *args, **kw)
def _make_multipart(self, subtype, disallowed_subtypes, boundary):
if self.get_content_maintype() == 'multipart':
existing_subtype = self.get_content_subtype()
disallowed_subtypes = disallowed_subtypes + (subtype,)
if existing_subtype in disallowed_subtypes:
raise ValueError("Cannot convert {} to {}".format(
existing_subtype, subtype))
keep_headers = []
part_headers = []
for name, value in self._headers:
if name.lower().startswith('content-'):
part_headers.append((name, value))
else:
keep_headers.append((name, value))
if part_headers:
# There is existing content, move it to the first subpart.
part = type(self)(policy=self.policy)
part._headers = part_headers
part._payload = self._payload
self._payload = [part]
else:
self._payload = []
self._headers = keep_headers
self['Content-Type'] = 'multipart/' + subtype
if boundary is not None:
self.set_param('boundary', boundary)
def make_related(self, boundary=None):
self._make_multipart('related', ('alternative', 'mixed'), boundary)
def make_alternative(self, boundary=None):
self._make_multipart('alternative', ('mixed',), boundary)
def make_mixed(self, boundary=None):
self._make_multipart('mixed', (), boundary)
def _add_multipart(self, _subtype, *args, _disp=None, **kw):
if (self.get_content_maintype() != 'multipart' or
self.get_content_subtype() != _subtype):
getattr(self, 'make_' + _subtype)()
part = type(self)(policy=self.policy)
part.set_content(*args, **kw)
if _disp and 'content-disposition' not in part:
part['Content-Disposition'] = _disp
self.attach(part)
def add_related(self, *args, **kw):
self._add_multipart('related', *args, _disp='inline', **kw)
def add_alternative(self, *args, **kw):
self._add_multipart('alternative', *args, **kw)
def add_attachment(self, *args, **kw):
self._add_multipart('mixed', *args, _disp='attachment', **kw)
def clear(self):
self._headers = []
self._payload = None
def clear_content(self):
self._headers = [(n, v) for n, v in self._headers
if not n.lower().startswith('content-')]
self._payload = None
class EmailMessage(MIMEPart):
def set_content(self, *args, **kw):
super().set_content(*args, **kw)
if 'MIME-Version' not in self:
self['MIME-Version'] = '1.0'
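The conversion rules in `_make_multipart` and the behavior of `clear_content` can be exercised with a small sketch like the one below, assuming Python 3.6 or later; the subject and body text are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'demo'
msg.set_content('plain body')

# Converting to multipart/mixed moves the existing content
# (the Content-* headers and the payload) into a first subpart.
msg.make_mixed()
parts = list(msg.iter_parts())

# A mixed container cannot be turned into an alternative one.
try:
    msg.make_alternative()
    error = None
except ValueError as exc:
    error = str(exc)

# clear_content drops the payload and the Content-* headers only.
msg.clear_content()

print(parts[0].get_content_type())
print(error)
print(msg.keys())
```

Non-content headers such as Subject survive both the conversion and `clear_content`, which is what distinguishes `clear_content` from `clear`.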
......@@ -5,6 +5,7 @@ code that adds all the email6 features.
from email._policybase import Policy, Compat32, compat32, _extend_docstrings
from email.utils import _has_surrogates
from email.headerregistry import HeaderRegistry as HeaderRegistry
from email.contentmanager import raw_data_manager
__all__ = [
'Compat32',
......@@ -58,10 +59,22 @@ class EmailPolicy(Policy):
special treatment, while all other fields are
treated as unstructured. This list will be
completed before the extension is marked stable.)
content_manager -- an object with at least two methods: get_content
and set_content. When the get_content or
set_content method of a Message object is called,
it calls the corresponding method of this object,
passing it the message object as its first argument,
and any arguments or keywords that were passed to
it as additional arguments. The default
content_manager is
:data:`~email.contentmanager.raw_data_manager`.
"""
refold_source = 'long'
header_factory = HeaderRegistry()
content_manager = raw_data_manager
def __init__(self, **kw):
# Ensure that each new instance gets a unique header factory
......
......@@ -68,9 +68,13 @@ def _has_surrogates(s):
# How to deal with a string containing bytes before handing it to the
# application through the 'normal' interface.
def _sanitize(string):
# Turn any escaped bytes into unicode 'unknown' char.
original_bytes = string.encode('ascii', 'surrogateescape')
return original_bytes.decode('ascii', 'replace')
# Turn any escaped bytes into unicode 'unknown' char. If the escaped
# bytes happen to be utf-8 they will instead get decoded, even if they
# were invalid in the charset the source was supposed to be in. This
# seems like it is not a bad thing; a defect was still registered.
original_bytes = string.encode('utf-8', 'surrogateescape')
return original_bytes.decode('utf-8', 'replace')
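The difference between the old and new `_sanitize` bodies can be reproduced outside the package. The sketch below uses two local functions, `sanitize_ascii` and `sanitize_utf8`, which are stand-ins written for this example rather than the package's private helper.

```python
def sanitize_ascii(string):
    # Old behavior: every escaped byte becomes U+FFFD.
    original_bytes = string.encode('ascii', 'surrogateescape')
    return original_bytes.decode('ascii', 'replace')


def sanitize_utf8(string):
    # New behavior: escaped bytes that form valid UTF-8 are decoded.
    original_bytes = string.encode('utf-8', 'surrogateescape')
    return original_bytes.decode('utf-8', 'replace')


# Bytes that are valid UTF-8 (a right double quotation mark) ...
valid = b'utf-8\xe2\x80\x9d'.decode('ascii', 'surrogateescape')
# ... and bytes that are not valid UTF-8.
invalid = b'utf-8\xf1\xf2\xf3'.decode('ascii', 'surrogateescape')

print(ascii(sanitize_ascii(valid)))
print(ascii(sanitize_utf8(valid)))
print(ascii(sanitize_utf8(invalid)))
```

The first input is the one used by the `rfc2231_utf_8_in_supposedly_ascii_charset_parameter_value` test case, the second by the updated `rfc2231_bad_character_in_charset_parameter_value` case.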
# Helpers
......
......@@ -2,6 +2,7 @@ import os
import sys
import unittest
import test.support
import collections
import email
from email.message import Message
from email._policybase import compat32
......@@ -42,6 +43,8 @@ class TestEmailBase(unittest.TestCase):
# here we make minimal changes in the test_email tests compared to their
# pre-3.3 state.
policy = compat32
# Likewise, the default message object is Message.
message = Message
def __init__(self, *args, **kw):
super().__init__(*args, **kw)
......@@ -54,11 +57,23 @@ class TestEmailBase(unittest.TestCase):
with openfile(filename) as fp:
return email.message_from_file(fp, policy=self.policy)
def _str_msg(self, string, message=Message, policy=None):
def _str_msg(self, string, message=None, policy=None):
if policy is None:
policy = self.policy
if message is None:
message = self.message
return email.message_from_string(string, message, policy=policy)
def _bytes_msg(self, bytestring, message=None, policy=None):
if policy is None:
policy = self.policy
if message is None:
message = self.message
return email.message_from_bytes(bytestring, message, policy=policy)
def _make_message(self):
return self.message(policy=self.policy)
def _bytes_repr(self, b):
return [repr(x) for x in b.splitlines(keepends=True)]
......@@ -123,6 +138,7 @@ def parameterize(cls):
"""
paramdicts = {}
testers = collections.defaultdict(list)
for name, attr in cls.__dict__.items():
if name.endswith('_params'):
if not hasattr(attr, 'keys'):
......@@ -134,7 +150,15 @@ def parameterize(cls):
d[n] = x
attr = d
paramdicts[name[:-7] + '_as_'] = attr
if '_as_' in name:
testers[name.split('_as_')[0] + '_as_'].append(name)
testfuncs = {}
for name in paramdicts:
if name not in testers:
raise ValueError("No tester found for {}".format(name))
for name in testers:
if name not in paramdicts:
raise ValueError("No params found for {}".format(name))
for name, attr in cls.__dict__.items():
for paramsname, paramsdict in paramdicts.items():
if name.startswith(paramsname):
......
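The new consistency check in `parameterize` can be illustrated in isolation. The function below is a simplified stand-in written for this note (it only performs the name matching, not the test generation), and the two example classes are hypothetical.

```python
import collections


def check_params_and_testers(cls):
    """Simplified stand-in for the name check added to parameterize."""
    paramdicts = set()
    testers = collections.defaultdict(list)
    for name in cls.__dict__:
        if name.endswith('_params'):
            # 'foo_params' pairs with methods named 'foo_as_*'.
            paramdicts.add(name[:-7] + '_as_')
        if '_as_' in name:
            testers[name.split('_as_')[0] + '_as_'].append(name)
    for name in paramdicts:
        if name not in testers:
            raise ValueError("No tester found for {}".format(name))
    for name in testers:
        if name not in paramdicts:
            raise ValueError("No params found for {}".format(name))
    return cls


class Good:
    foo_params = {'one': (1,)}

    def foo_as_number(self, value):
        pass


class Orphaned:
    bar_params = {'one': (1,)}   # no bar_as_* method consumes this


check_params_and_testers(Good)
try:
    check_params_and_testers(Orphaned)
    problem = None
except ValueError as exc:
    problem = str(exc)
print(problem)
```

The point of the check is that a misspelled params dictionary or tester name fails loudly at class creation instead of silently generating no tests.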
......@@ -661,7 +661,7 @@ class TestContentTypeHeader(TestHeaderBase):
'text/plain; name="ascii_is_the_default"'),
'rfc2231_bad_character_in_charset_parameter_value': (
"text/plain; charset*=ascii''utf-8%E2%80%9D",
"text/plain; charset*=ascii''utf-8%F1%F2%F3",
'text/plain',
'text',
'plain',
......@@ -669,6 +669,18 @@ class TestContentTypeHeader(TestHeaderBase):
[errors.UndecodableBytesDefect],
'text/plain; charset="utf-8\uFFFD\uFFFD\uFFFD"'),
'rfc2231_utf_8_in_supposedly_ascii_charset_parameter_value': (
"text/plain; charset*=ascii''utf-8%E2%80%9D",
'text/plain',
'text',
'plain',
{'charset': 'utf-8”'},
[errors.UndecodableBytesDefect],
'text/plain; charset="utf-8”"',
),
# XXX: if the above were *re*folded, it would get tagged as utf-8
# instead of ascii in the param, since it now contains non-ASCII.
'rfc2231_encoded_then_unencoded_segments': (
('application/x-foo;'
'\tname*0*="us-ascii\'en-us\'My";'
......
......@@ -30,6 +30,7 @@ class PolicyAPITests(unittest.TestCase):
'raise_on_defect': False,
'header_factory': email.policy.EmailPolicy.header_factory,
'refold_source': 'long',
'content_manager': email.policy.EmailPolicy.content_manager,
})
# For each policy under test, we give here what we expect the defaults to
......
......@@ -42,6 +42,9 @@ Core and Builtins
Library
-------
- Issue #18891: Completed the new email package (provisional) API additions
by adding new classes EmailMessage, MIMEPart, and ContentManager.
- Issue #18468: The re.split, re.findall, and re.sub functions and the group()
and groups() methods of match object now always return a string or a bytes
object.
......