Commit 8451c4b6 authored Oct 23, 2010 by R. David Murray
#1349106: add linesep argument to generator.flatten and header.encode.
Parent: 29aad000

Showing 7 changed files with 164 additions and 49 deletions:

Doc/library/email.generator.rst  +73 -11
Doc/library/email.header.rst  +8 -1
Lib/email/generator.py  +43 -31
Lib/email/header.py  +12 -4
Lib/email/test/data/msg_26.txt  +2 -1
Lib/email/test/test_email.py  +23 -1
Misc/NEWS  +3 -0
Doc/library/email.generator.rst
...
...
@@ -56,7 +56,7 @@ Here are the public methods of the :class:`Generator` class, imported from the
    The other public :class:`Generator` methods are:
-   .. method:: flatten(msg, unixfrom=False)
+   .. method:: flatten(msg, unixfrom=False, linesep='\\n')
       Print the textual representation of the message object structure rooted at
       *msg* to the output file specified when the :class:`Generator` instance
...
...
@@ -71,12 +71,20 @@ Here are the public methods of the :class:`Generator` class, imported from the
       Note that for subparts, no envelope header is ever printed.
+
+      Optional *linesep* specifies the line separator characters used to
+      terminate lines in the output.  It defaults to ``\n`` because that is
+      the most useful value for Python application code (other library packages
+      expect ``\n`` separated lines).  ``linesep='\r\n'`` can be used to
+      generate output with RFC-compliant line separators.
+
       Messages parsed with a Bytes parser that have a
       :mailheader:`Content-Transfer-Encoding` of 8bit will be converted to
       use a 7bit Content-Transfer-Encoding.  Any other non-ASCII bytes in the
       message structure will be converted to '?' characters.
-      .. versionchanged:: 3.2 added support for re-encoding 8bit message bodies.
+      .. versionchanged:: 3.2
+         added support for re-encoding 8bit message bodies, and the linesep
+         argument
    .. method:: clone(fp)
...
...
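The documented *linesep* argument of :meth:`Generator.flatten` can be exercised roughly as follows. This is a minimal sketch, assuming a current Python 3 where the API described in the hunk above is available; the message contents and the helper name `flatten_to_str` are made up for illustration.

```python
from email.generator import Generator
from email.message import Message
from io import StringIO

# Illustrative message; the header values are placeholders.
msg = Message()
msg['Subject'] = 'linesep demo'
msg['To'] = 'someone@example.com'
msg.set_payload('first line\nsecond line\n')


def flatten_to_str(message, linesep='\n'):
    """Hypothetical helper: flatten a message into a str."""
    buf = StringIO()
    Generator(buf).flatten(message, linesep=linesep)
    return buf.getvalue()


lf_text = flatten_to_str(msg)                     # default: '\n' terminated lines
crlf_text = flatten_to_str(msg, linesep='\r\n')   # RFC-style line endings
```

The default stays at ``\n`` so that existing callers see no change; only code that writes to the wire or to a CRLF-based store needs to pass ``'\r\n'``.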
@@ -97,16 +105,70 @@ formatted string representation of a message object. For more detail, see
 .. class:: BytesGenerator(outfp, mangle_from_=True, maxheaderlen=78)
-   This class has the same API as the :class:`Generator` class, except that
-   *outfp* must be a file like object that will accept :class:`bytes` input to
-   its ``write`` method.  If the message object structure contains non-ASCII
-   bytes, this generator's :meth:`~BytesGenerator.flatten` method will produce
-   them as-is, including preserving parts with a
-   :mailheader:`Content-Transfer-Encoding` of ``8bit``.
+   The constructor for the :class:`BytesGenerator` class takes a binary
+   :term:`file-like object` called *outfp* for an argument.  *outfp* must
+   support a :meth:`write` method that accepts binary data.
+
+   Optional *mangle_from_* is a flag that, when ``True``, puts a ``>``
+   character in front of any line in the body that starts exactly as ``From``,
+   i.e. ``From`` followed by a space at the beginning of the line.  This is the
+   only guaranteed portable way to avoid having such lines be mistaken for a
+   Unix mailbox format envelope header separator (see `WHY THE CONTENT-LENGTH
+   FORMAT IS BAD <http://www.jwz.org/doc/content-length.html>`_ for details).
+   *mangle_from_* defaults to ``True``, but you might want to set this to
+   ``False`` if you are not writing Unix mailbox format files.
+
+   Optional *maxheaderlen* specifies the longest length for a non-continued
+   header.  When a header line is longer than *maxheaderlen* (in characters,
+   with tabs expanded to 8 spaces), the header will be split as defined in the
+   :class:`~email.header.Header` class.  Set to zero to disable header
+   wrapping.  The default is 78, as recommended (but not required) by
+   :rfc:`2822`.
+
+   The other public :class:`BytesGenerator` methods are:
+
+   .. method:: flatten(msg, unixfrom=False, linesep='\n')
+
+      Print the textual representation of the message object structure rooted
+      at *msg* to the output file specified when the :class:`BytesGenerator`
+      instance was created.  Subparts are visited depth-first and the resulting
+      text will be properly MIME encoded.  If the input that created the *msg*
+      contained bytes with the high bit set and those bytes have not been
+      modified, they will be copied faithfully to the output, even if doing so
+      is not strictly RFC compliant.  (To produce strictly RFC compliant
+      output, use the :class:`Generator` class.)
+
+      Messages parsed with a Bytes parser that have a
+      :mailheader:`Content-Transfer-Encoding` of 8bit will be reconstructed
+      as 8bit if they have not been modified.
+
+      Optional *unixfrom* is a flag that forces the printing of the envelope
+      header delimiter before the first :rfc:`2822` header of the root message
+      object.  If the root object has no envelope header, a standard one is
+      crafted.  By default, this is set to ``False`` to inhibit the printing of
+      the envelope delimiter.
+
+      Note that for subparts, no envelope header is ever printed.
+
+      Optional *linesep* specifies the line separator characters used to
+      terminate lines in the output.  It defaults to ``\n`` because that is
+      the most useful value for Python application code (other library packages
+      expect ``\n`` separated lines).  ``linesep='\r\n'`` can be used to
+      generate output with RFC-compliant line separators.
+
+   .. method:: clone(fp)
+
+      Return an independent clone of this :class:`BytesGenerator` instance with
+      the exact same options.
+
+   .. method:: write(s)
+
-   Note that even the :meth:`write` method API is identical:  it expects
-   strings as input, and converts them to bytes by encoding them using
-   the ASCII codec.
+      Write the string *s* to the underlying file object.  *s* is encoded using
+      the ``ASCII`` codec and written to the *write* method of the *outfp*
+      passed to the :class:`BytesGenerator`'s constructor.  This
+      provides just enough file-like API for :class:`BytesGenerator` instances
+      to be used in the :func:`print` function.
    .. versionadded:: 3.2
...
...
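The :class:`BytesGenerator` behaviour described above (8bit bodies copied through, *linesep* chosen by the caller) might be tried like this. It is a sketch under the assumption of a current Python 3; the sample message in `raw` is invented for the example and is not part of the commit.

```python
from email import message_from_bytes
from email.generator import BytesGenerator
from io import BytesIO

# Invented 8bit message using CRLF line endings, as it would arrive off the wire.
raw = (b'Subject: bytes demo\r\n'
       b'Content-Type: text/plain; charset="latin-1"\r\n'
       b'Content-Transfer-Encoding: 8bit\r\n'
       b'\r\n'
       b'caf\xe9\r\n')

msg = message_from_bytes(raw)

# outfp must accept bytes; linesep is still passed as a str.
out = BytesIO()
BytesGenerator(out).flatten(msg, linesep='\r\n')
flattened = out.getvalue()
```

Note that *linesep* is a string even for the bytes generator; the generator is responsible for encoding it before it touches the binary output.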
Doc/library/email.header.rst
...
...
@@ -104,7 +104,7 @@ Here is the :class:`Header` class description:
       :func:`ustr.encode` call, and defaults to "strict".
-   .. method:: encode(splitchars=';, \\t', maxlinelen=None)
+   .. method:: encode(splitchars=';, \\t', maxlinelen=None, linesep='\\n')
       Encode a message header into an RFC-compliant format, possibly wrapping
       long lines and encapsulating non-ASCII parts in base64 or quoted-printable
...
...
@@ -115,6 +115,13 @@ Here is the :class:`Header` class description:
       *maxlinelen*, if given, overrides the instance's value for the maximum
       line length.
+
+      *linesep* specifies the characters used to separate the lines of the
+      folded header.  It defaults to the most useful value for Python
+      application code (``\n``), but ``\r\n`` can be specified in order
+      to produce headers with RFC-compliant line separators.
+
+      .. versionchanged:: 3.2 added the linesep argument
+
    The :class:`Header` class also provides a number of methods to support
    standard operators and built-in functions.
...
...
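A short illustration of the *linesep* argument to :meth:`Header.encode` documented above. This is a sketch, assuming a current Python 3; the subject text and the small `maxlinelen` are chosen only to force folding.

```python
from email.header import Header

# A long ASCII value and a small maxlinelen, so the header must be folded.
subject = Header(' '.join(['word'] * 30), header_name='Subject', maxlinelen=40)

folded_lf = subject.encode()                  # lines joined with '\n'
folded_crlf = subject.encode(linesep='\r\n')  # lines joined with '\r\n'

# The fold points are the same in both; only the separator differs.
lf_lines = folded_lf.split('\n')
crlf_lines = folded_crlf.split('\r\n')
```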
Lib/email/generator.py
...
...
@@ -17,7 +17,7 @@ from email.header import Header
 from email.message import _has_surrogates
 UNDERSCORE = '_'
-NL = '\n'
+NL = '\n'  # XXX: no longer used by the code below.
 fcre = re.compile(r'^From ', re.MULTILINE)
...
...
@@ -58,7 +58,7 @@ class Generator:
         # Just delegate to the file object
         self._fp.write(s)
-    def flatten(self, msg, unixfrom=False):
+    def flatten(self, msg, unixfrom=False, linesep='\n'):
         """Print the message object tree rooted at msg to the output file
         specified when the Generator instance was created.
...
...
@@ -68,12 +68,23 @@ class Generator:
         is False to inhibit the printing of any From_ delimiter.
         Note that for subobjects, no From_ line is printed.
+
+        linesep specifies the characters used to indicate a new line in
+        the output.
         """
+        # We use the _XXX constants for operating on data that comes directly
+        # from the msg, and _encoded_XXX constants for operating on data that
+        # has already been converted (to bytes in the BytesGenerator) and
+        # inserted into a temporary buffer.
+        self._NL = linesep
+        self._encoded_NL = self._encode(linesep)
+        self._EMPTY = ''
+        self._encoded_EMPTY = self._encode('')
         if unixfrom:
             ufrom = msg.get_unixfrom()
             if not ufrom:
                 ufrom = 'From nobody ' + time.ctime(time.time())
-            self.write(ufrom + NL)
+            self.write(ufrom + self._NL)
         self._write(msg)
     def clone(self, fp):
...
...
@@ -93,20 +104,18 @@ class Generator:
     # it has already transformed the input; but, since this whole thing is a
     # hack anyway this seems good enough.
-    # We use these class constants when we need to manipulate data that has
-    # already been written to a buffer (ex: constructing a re to check the
-    # boundary), and the module level NL constant when adding new output to a
-    # buffer via self.write, because 'write' always takes strings.
-    # Having write always take strings makes the code simpler, but there are
-    # a few occasions when we need to write previously created data back
-    # to the buffer or to a new buffer; for those cases we use self._fp.write.
-    _NL = NL
-    _EMPTY = ''
+    # Similarly, we have _XXX and _encoded_XXX attributes that are used on
+    # source and buffer data, respectively.
+    _encoded_EMPTY = ''
     def _new_buffer(self):
         # BytesGenerator overrides this to return BytesIO.
         return StringIO()
+    def _encode(self, s):
+        # BytesGenerator overrides this to encode strings to bytes.
+        return s
+
     def _write(self, msg):
         # We can't write the headers yet because of the following scenario:
         # say a multipart message includes the boundary string somewhere in
...
@@ -158,14 +167,15 @@ class Generator:
         for h, v in msg.items():
             self.write('%s: ' % h)
             if isinstance(v, Header):
-                self.write(v.encode(maxlinelen=self._maxheaderlen) + NL)
+                self.write(v.encode(
+                    maxlinelen=self._maxheaderlen, linesep=self._NL) + self._NL)
             else:
                 # Header's got lots of smarts, so use it.
                 header = Header(v, maxlinelen=self._maxheaderlen,
                                 header_name=h)
-                self.write(header.encode() + NL)
+                self.write(header.encode(linesep=self._NL) + self._NL)
         # A blank line always separates headers from body
-        self.write(NL)
+        self.write(self._NL)
     #
     # Handlers for writing types and subtypes
 ...
...
@@ -208,11 +218,11 @@ class Generator:
         for part in subparts:
             s = self._new_buffer()
             g = self.clone(s)
-            g.flatten(part, unixfrom=False)
+            g.flatten(part, unixfrom=False, linesep=self._NL)
             msgtexts.append(s.getvalue())
         # Now make sure the boundary we've selected doesn't appear in any of
         # the message texts.
-        alltext = self._NL.join(msgtexts)
+        alltext = self._encoded_NL.join(msgtexts)
         # BAW: What about boundaries that are wrapped in double-quotes?
         boundary = msg.get_boundary(failobj=self._make_boundary(alltext))
         # If we had to calculate a new boundary because the body text
 ...
...
@@ -225,9 +235,9 @@ class Generator:
             msg.set_boundary(boundary)
         # If there's a preamble, write it out, with a trailing CRLF
         if msg.preamble is not None:
-            self.write(msg.preamble + NL)
+            self.write(msg.preamble + self._NL)
         # dash-boundary transport-padding CRLF
-        self.write('--' + boundary + NL)
+        self.write('--' + boundary + self._NL)
         # body-part
         if msgtexts:
             self._fp.write(msgtexts.pop(0))
...
...
@@ -236,13 +246,13 @@ class Generator:
         # --> CRLF body-part
         for body_part in msgtexts:
             # delimiter transport-padding CRLF
-            self.write('\n--' + boundary + NL)
+            self.write(self._NL + '--' + boundary + self._NL)
             # body-part
             self._fp.write(body_part)
         # close-delimiter transport-padding
-        self.write('\n--' + boundary + '--')
+        self.write(self._NL + '--' + boundary + '--')
         if msg.epilogue is not None:
-            self.write(NL)
+            self.write(self._NL)
             self.write(msg.epilogue)
     def _handle_multipart_signed(self, msg):
...
...
@@ -266,16 +276,16 @@ class Generator:
             g = self.clone(s)
             g.flatten(part, unixfrom=False)
             text = s.getvalue()
-            lines = text.split(self._NL)
+            lines = text.split(self._encoded_NL)
             # Strip off the unnecessary trailing empty line
-            if lines and lines[-1] == self._EMPTY:
-                blocks.append(self._NL.join(lines[:-1]))
+            if lines and lines[-1] == self._encoded_EMPTY:
+                blocks.append(self._encoded_NL.join(lines[:-1]))
             else:
                 blocks.append(text)
         # Now join all the blocks with an empty line.  This has the lovely
         # effect of separating each block with an empty line, but not adding
         # an extra one after the last one.
-        self._fp.write(self._NL.join(blocks))
+        self._fp.write(self._encoded_NL.join(blocks))
     def _handle_message(self, msg):
         s = self._new_buffer()
 ...
...
@@ -333,10 +343,9 @@ class BytesGenerator(Generator):
     The outfp object must accept bytes in its write method.
     """
-    # Bytes versions of these constants for use in manipulating data from
+    # Bytes versions of this constant for use in manipulating data from
     # the BytesIO buffer.
-    _NL = NL.encode('ascii')
-    _EMPTY = b''
+    _encoded_EMPTY = b''
     def write(self, s):
         self._fp.write(s.encode('ascii', 'surrogateescape'))
...
...
@@ -344,6 +353,9 @@ class BytesGenerator(Generator):
     def _new_buffer(self):
         return BytesIO()
+    def _encode(self, s):
+        return s.encode('ascii')
+
     def _write_headers(self, msg):
         # This is almost the same as the string version, except for handling
         # strings with 8bit bytes.
...
...
@@ -363,9 +375,9 @@ class BytesGenerator(Generator):
                 # Header's got lots of smarts and this string is safe...
                 header = Header(v, maxlinelen=self._maxheaderlen,
                                 header_name=h)
-                self.write(header.encode() + NL)
+                self.write(header.encode(linesep=self._NL) + self._NL)
         # A blank line always separates headers from body
-        self.write(NL)
+        self.write(self._NL)
     def _handle_text(self, msg):
         # If the string has surrogates the original source was bytes, so
...
...
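The `_NL` / `_encoded_NL` split introduced in this file can be illustrated outside the email package. The sketch below is not the library's code: `TextJoiner`, `BytesJoiner` and `join_parts` are hypothetical names, and only the `_new_buffer` / `_encode` hooks mirror the ones in the diff. It shows why data read back from a temporary buffer has to be joined with the encoded separator.

```python
from io import BytesIO, StringIO


class TextJoiner:
    """Hypothetical stand-in for Generator: temporary buffers hold str."""

    def _new_buffer(self):
        return StringIO()

    def _encode(self, s):
        # Identity for text output; the bytes subclass overrides this.
        return s

    def join_parts(self, parts, linesep='\n'):
        # _NL is used with source (str) data, _encoded_NL with data that
        # has already been written to a buffer.
        self._NL = linesep
        self._encoded_NL = self._encode(linesep)
        rendered = []
        for part in parts:
            buf = self._new_buffer()
            buf.write(self._encode(part + self._NL))
            rendered.append(buf.getvalue())
        # Buffer contents may be bytes, so join with the encoded separator.
        return self._encoded_NL.join(rendered)


class BytesJoiner(TextJoiner):
    """Hypothetical stand-in for BytesGenerator: buffers hold bytes."""

    def _new_buffer(self):
        return BytesIO()

    def _encode(self, s):
        return s.encode('ascii')


text_result = TextJoiner().join_parts(['a', 'b'], linesep='\r\n')
bytes_result = BytesJoiner().join_parts(['a', 'b'], linesep='\r\n')
```

Because the separator is now chosen per call rather than fixed at class level, the encoded form has to be computed inside the call, which is what the `_encode` hook is for.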
Lib/email/header.py
...
...
@@ -272,7 +272,7 @@ class Header:
         output_string = input_bytes.decode(output_charset, errors)
         self._chunks.append((output_string, charset))
-    def encode(self, splitchars=';, \t', maxlinelen=None):
+    def encode(self, splitchars=';, \t', maxlinelen=None, linesep='\n'):
         """Encode a message header into an RFC-compliant format.
         There are many issues involved in converting a given string for use in
 ...
...
@@ -293,6 +293,11 @@ class Header:
         Optional splitchars is a string containing characters to split long
         ASCII lines on, in rough support of RFC 2822's `highest level
         syntactic breaks'.  This doesn't affect RFC 2047 encoded lines.
+
+        Optional linesep is a string to be used to separate the lines of
+        the value.  The default value is the most useful for typical
+        Python applications, but it can be set to \r\n to produce RFC-compliant
+        line separators when needed.
         """
         self._normalize()
         if maxlinelen is None:
...
@@ -311,7 +316,7 @@ class Header:
             if len(lines) > 1:
                 formatter.newline()
             formatter.add_transition()
-        return str(formatter)
+        return formatter._str(linesep)
     def _normalize(self):
         # Step 1: Normalize the chunks so that all runs of identical charsets
 ...
...
@@ -342,9 +347,12 @@ class _ValueFormatter:
         self._lines = []
         self._current_line = _Accumulator(headerlen)
-    def __str__(self):
+    def _str(self, linesep):
         self.newline()
-        return NL.join(self._lines)
+        return linesep.join(self._lines)
+
+    def __str__(self):
+        return self._str(NL)
     def newline(self):
         end_of_line = self._current_line.pop()
...
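The `_str(linesep)` / `__str__` change in `_ValueFormatter` follows a common pattern: move the work into a parameterised helper and keep `__str__` as a thin wrapper with the old default. A reduced sketch of that pattern, using a hypothetical `LineFormatter` class rather than the real `_ValueFormatter`:

```python
NL = '\n'


class LineFormatter:
    """Hypothetical, reduced version of the pattern used by _ValueFormatter."""

    def __init__(self):
        self._lines = []

    def add(self, line):
        self._lines.append(line)

    def _str(self, linesep):
        # Parameterised rendering: the caller picks the separator.
        return linesep.join(self._lines)

    def __str__(self):
        # str() keeps its previous behaviour by delegating with the default.
        return self._str(NL)


formatter = LineFormatter()
formatter.add('Subject: a folded')
formatter.add(' header value')

default_text = str(formatter)
crlf_text = formatter._str('\r\n')
```

Existing callers of `str()` are unaffected, while `Header.encode` can pass its own *linesep* through.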
Lib/email/test/data/msg_26.txt
...
...
@@ -24,7 +24,8 @@ Simple email with attachment.
 --1618492860--2051301190--113853680
-Content-Type: application/riscos; name="clock.bmp,69c"; type=BMP; load=&fff69c4b; exec=&355dd4d1; access=&03
+Content-Type: application/riscos; name="clock.bmp,69c"; type=BMP;
+ load=&fff69c4b; exec=&355dd4d1; access=&03
 Content-Disposition: attachment; filename="clock.bmp"
 Content-Transfer-Encoding: base64
...
...
Lib/email/test/test_email.py
...
...
@@ -77,7 +77,7 @@ class TestMessageAPI(TestEmailBase):
         eq(msg.get_all('cc'), ['ccc@zzz.org', 'ddd@zzz.org', 'eee@zzz.org'])
         eq(msg.get_all('xx', 'n/a'), 'n/a')
-    def test_getset_charset(self):
+    def TEst_getset_charset(self):
         eq = self.assertEqual
         msg = Message()
         eq(msg.get_charset(), None)
...
@@ -2600,6 +2600,18 @@ Here's the message body
         part2 = msg.get_payload(1)
         eq(part2.get_content_type(), 'application/riscos')
+    def test_crlf_flatten(self):
+        # Using newline='\n' preserves the crlfs in this input file.
+        with openfile('msg_26.txt', newline='\n') as fp:
+            text = fp.read()
+        msg = email.message_from_string(text)
+        s = StringIO()
+        g = Generator(s)
+        g.flatten(msg, linesep='\r\n')
+        self.assertEqual(s.getvalue(), text)
+
+    maxDiff = None
+
     def test_multipart_digest_with_extra_mime_headers(self):
         eq = self.assertEqual
         neq = self.ndiffAssertEqual
...
@@ -2931,6 +2943,16 @@ class Test8BitBytesHandling(unittest.TestCase):
         m = bfp.close()
         self.assertEqual(str(m), self.latin_bin_msg_as7bit)
+    def test_crlf_flatten(self):
+        with openfile('msg_26.txt', 'rb') as fp:
+            text = fp.read()
+        msg = email.message_from_bytes(text)
+        s = BytesIO()
+        g = email.generator.BytesGenerator(s)
+        g.flatten(msg, linesep='\r\n')
+        self.assertEqual(s.getvalue(), text)
+
+    maxDiff = None
+
 class TestBytesGeneratorIdempotent(TestIdempotent):
...
...
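The two `test_crlf_flatten` tests above depend on the `msg_26.txt` data file. A similar check can be sketched without it, assuming a current Python 3; the multipart message and the helper `only_crlf` below are invented for illustration and are not part of the test suite.

```python
import email
from email.generator import Generator
from io import StringIO

# Invented CRLF-terminated multipart message standing in for msg_26.txt.
text = '\r\n'.join([
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="BOUNDARY"',
    '',
    '--BOUNDARY',
    'Content-Type: text/plain',
    '',
    'part one',
    '--BOUNDARY',
    'Content-Type: text/plain',
    '',
    'part two',
    '--BOUNDARY--',
    '',
])


def only_crlf(s):
    """Hypothetical helper: True if every line break in s is a CRLF pair."""
    remainder = s.replace('\r\n', '')
    return '\r' not in remainder and '\n' not in remainder


msg = email.message_from_string(text)
out = StringIO()
Generator(out).flatten(msg, linesep='\r\n')
flattened = out.getvalue()
```

This exercises the boundary-writing paths changed in `_handle_multipart`, where a hard-coded ``'\n'`` would otherwise leak into CRLF output.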
Misc/NEWS
...
...
@@ -48,6 +48,9 @@ Core and Builtins
 Library
 -------
+- Issue #1349106: Generator (and BytesGenerator) flatten method and Header
+  encode method now support a 'linesep' argument.
+
 - Issue #5639: Add a *server_hostname* argument to ``SSLContext.wrap_socket``
   in order to support the TLS SNI extension.  ``HTTPSConnection`` and
   ``urlopen()`` also use this argument, so that HTTPS virtual hosts are now
...
...