Commit 43851a20 authored by Stefan Behnel's avatar Stefan Behnel Committed by GitHub

bpo-36673: Implement comment/PI parsing support for the TreeBuilder in ElementTree. (#12883)

* bpo-36673: Implement comment/PI parsing support for the TreeBuilder in ElementTree.

* bpo-36673: Rewrite the comment/PI factory handling for the TreeBuilder in "_elementtree" to make it use the same factories as the ElementTree module, and to make it explicit when the comments/PIs are inserted into the tree and when they are not (which is the default).
parent 3d37ea25
......@@ -523,8 +523,9 @@ Functions
Parses an XML section into an element tree incrementally, and reports what's
going on to the user. *source* is a filename or :term:`file object`
containing XML data. *events* is a sequence of events to report back. The
supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and
``"end-ns"`` (the "ns" events are used to get detailed namespace
supported events are the strings ``"start"``, ``"end"``, ``"comment"``,
``"pi"``, ``"start-ns"`` and ``"end-ns"``
(the "ns" events are used to get detailed namespace
information). If *events* is omitted, only ``"end"`` events are reported.
*parser* is an optional parser instance. If not given, the standard
:class:`XMLParser` parser is used. *parser* must be a subclass of
......@@ -549,6 +550,10 @@ Functions
.. deprecated:: 3.4
The *parser* argument.
.. versionchanged:: 3.8
The ``comment`` and ``pi`` events were added.
.. function:: parse(source, parser=None)
Parses an XML section into an element tree. *source* is a filename or file
......@@ -1021,14 +1026,24 @@ TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^
.. class:: TreeBuilder(element_factory=None)
.. class:: TreeBuilder(element_factory=None, *, comment_factory=None, \
pi_factory=None, insert_comments=False, insert_pis=False)
Generic element structure builder. This builder converts a sequence of
start, data, and end method calls to a well-formed element structure. You
can use this class to build an element structure using a custom XML parser,
or a parser for some other XML-like format. *element_factory*, when given,
must be a callable accepting two positional arguments: a tag and
a dict of attributes. It is expected to return a new element instance.
start, data, end, comment and pi method calls to a well-formed element
structure. You can use this class to build an element structure using
a custom XML parser, or a parser for some other XML-like format.
*element_factory*, when given, must be a callable accepting two positional
arguments: a tag and a dict of attributes. It is expected to return a new
element instance.
The *comment_factory* and *pi_factory* functions, when given, should behave
like the :func:`Comment` and :func:`ProcessingInstruction` functions to
create comments and processing instructions. When not given, the default
factories will be used. When *insert_comments* and/or *insert_pis* is true,
comments/pis will be inserted into the tree if they appear within the root
element (but not outside of it).
.. method:: close()
......@@ -1054,6 +1069,22 @@ TreeBuilder Objects
containing element attributes. Returns the opened element.
.. method:: comment(text)
Creates a comment with the given *text*. If ``insert_comments`` is true,
this will also add it to the tree.
.. versionadded:: 3.8
.. method:: pi(target, text)
Creates a comment with the given *target* name and *text*. If
``insert_pis`` is true, this will also add it to the tree.
.. versionadded:: 3.8
In addition, a custom :class:`TreeBuilder` object can provide the
following method:
......@@ -1150,9 +1181,9 @@ XMLPullParser Objects
callback target, :class:`XMLPullParser` collects an internal list of parsing
events and lets the user read from it. *events* is a sequence of events to
report back. The supported events are the strings ``"start"``, ``"end"``,
``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed
namespace information). If *events* is omitted, only ``"end"`` events are
reported.
``"comment"``, ``"pi"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" events
are used to get detailed namespace information). If *events* is omitted,
only ``"end"`` events are reported.
.. method:: feed(data)
......@@ -1171,7 +1202,13 @@ XMLPullParser Objects
data fed to the
parser. The iterator yields ``(event, elem)`` pairs, where *event* is a
string representing the type of event (e.g. ``"end"``) and *elem* is the
encountered :class:`Element` object.
encountered :class:`Element` object, or other context value as follows.
* ``start``, ``end``: the current Element.
* ``comment``, ``pi``: the current comment / processing instruction
* ``start-ns``: a tuple ``(prefix, uri)`` naming the declared namespace
mapping.
* ``end-ns``: :const:`None` (this may change in a future version)
Events provided in a previous call to :meth:`read_events` will not be
yielded again. Events are consumed from the internal queue only when
......@@ -1191,6 +1228,10 @@ XMLPullParser Objects
.. versionadded:: 3.4
.. versionchanged:: 3.8
The ``comment`` and ``pi`` events were added.
Exceptions
^^^^^^^^^^
......
......@@ -1194,6 +1194,12 @@ class XMLPullParserTest(unittest.TestCase):
for i in range(0, len(data), chunk_size):
parser.feed(data[i:i+chunk_size])
def assert_events(self, parser, expected):
self.assertEqual(
[(event, (elem.tag, elem.text))
for event, elem in parser.read_events()],
expected)
def assert_event_tags(self, parser, expected):
events = parser.read_events()
self.assertEqual([(action, elem.tag) for action, elem in events],
......@@ -1276,8 +1282,10 @@ class XMLPullParserTest(unittest.TestCase):
self.assert_event_tags(parser, [])
parser = ET.XMLPullParser(events=('start', 'end'))
self._feed(parser, "<!-- comment -->\n")
self.assert_event_tags(parser, [])
self._feed(parser, "<!-- text here -->\n")
self.assert_events(parser, [])
parser = ET.XMLPullParser(events=('start', 'end'))
self._feed(parser, "<root>\n")
self.assert_event_tags(parser, [('start', 'root')])
self._feed(parser, "<element key='value'>text</element")
......@@ -1314,6 +1322,33 @@ class XMLPullParserTest(unittest.TestCase):
self._feed(parser, "</root>")
self.assertIsNone(parser.close())
def test_events_comment(self):
parser = ET.XMLPullParser(events=('start', 'comment', 'end'))
self._feed(parser, "<!-- text here -->\n")
self.assert_events(parser, [('comment', (ET.Comment, ' text here '))])
self._feed(parser, "<!-- more text here -->\n")
self.assert_events(parser, [('comment', (ET.Comment, ' more text here '))])
self._feed(parser, "<root-tag>text")
self.assert_event_tags(parser, [('start', 'root-tag')])
self._feed(parser, "<!-- inner comment-->\n")
self.assert_events(parser, [('comment', (ET.Comment, ' inner comment'))])
self._feed(parser, "</root-tag>\n")
self.assert_event_tags(parser, [('end', 'root-tag')])
self._feed(parser, "<!-- outer comment -->\n")
self.assert_events(parser, [('comment', (ET.Comment, ' outer comment '))])
parser = ET.XMLPullParser(events=('comment',))
self._feed(parser, "<!-- text here -->\n")
self.assert_events(parser, [('comment', (ET.Comment, ' text here '))])
def test_events_pi(self):
parser = ET.XMLPullParser(events=('start', 'pi', 'end'))
self._feed(parser, "<?pitarget?>\n")
self.assert_events(parser, [('pi', (ET.PI, 'pitarget'))])
parser = ET.XMLPullParser(events=('pi',))
self._feed(parser, "<?pitarget some text ?>\n")
self.assert_events(parser, [('pi', (ET.PI, 'pitarget some text '))])
def test_events_sequence(self):
# Test that events can be some sequence that's not just a tuple or list
eventset = {'end', 'start'}
......@@ -1333,7 +1368,6 @@ class XMLPullParserTest(unittest.TestCase):
self._feed(parser, "<foo>bar</foo>")
self.assert_event_tags(parser, [('start', 'foo'), ('end', 'foo')])
def test_unknown_event(self):
with self.assertRaises(ValueError):
ET.XMLPullParser(events=('start', 'end', 'bogus'))
......@@ -2741,6 +2775,33 @@ class TreeBuilderTest(unittest.TestCase):
parser.feed(self.sample1)
self.assertIsNone(parser.close())
def test_treebuilder_comment(self):
b = ET.TreeBuilder()
self.assertEqual(b.comment('ctext').tag, ET.Comment)
self.assertEqual(b.comment('ctext').text, 'ctext')
b = ET.TreeBuilder(comment_factory=ET.Comment)
self.assertEqual(b.comment('ctext').tag, ET.Comment)
self.assertEqual(b.comment('ctext').text, 'ctext')
b = ET.TreeBuilder(comment_factory=len)
self.assertEqual(b.comment('ctext'), len('ctext'))
def test_treebuilder_pi(self):
b = ET.TreeBuilder()
self.assertEqual(b.pi('target', None).tag, ET.PI)
self.assertEqual(b.pi('target', None).text, 'target')
b = ET.TreeBuilder(pi_factory=ET.PI)
self.assertEqual(b.pi('target').tag, ET.PI)
self.assertEqual(b.pi('target').text, "target")
self.assertEqual(b.pi('pitarget', ' text ').tag, ET.PI)
self.assertEqual(b.pi('pitarget', ' text ').text, "pitarget text ")
b = ET.TreeBuilder(pi_factory=lambda target, text: (len(target), text))
self.assertEqual(b.pi('target'), (len('target'), None))
self.assertEqual(b.pi('pitarget', ' text '), (len('pitarget'), ' text '))
def test_treebuilder_elementfactory_none(self):
parser = ET.XMLParser(target=ET.TreeBuilder(element_factory=None))
parser.feed(self.sample1)
......@@ -2761,6 +2822,21 @@ class TreeBuilderTest(unittest.TestCase):
e = parser.close()
self._check_sample1_element(e)
def test_subclass_comment_pi(self):
class MyTreeBuilder(ET.TreeBuilder):
def foobar(self, x):
return x * 2
tb = MyTreeBuilder(comment_factory=ET.Comment, pi_factory=ET.PI)
self.assertEqual(tb.foobar(10), 20)
parser = ET.XMLParser(target=tb)
parser.feed(self.sample1)
parser.feed('<!-- a comment--><?and a pi?>')
e = parser.close()
self._check_sample1_element(e)
def test_element_factory(self):
lst = []
def myfactory(tag, attrib):
......@@ -3418,6 +3494,12 @@ def test_main(module=None):
# Copy the path cache (should be empty)
path_cache = ElementPath._cache
ElementPath._cache = path_cache.copy()
# Align the Comment/PI factories.
if hasattr(ET, '_set_factories'):
old_factories = ET._set_factories(ET.Comment, ET.PI)
else:
old_factories = None
try:
support.run_unittest(*test_classes)
finally:
......@@ -3426,6 +3508,8 @@ def test_main(module=None):
nsmap.clear()
nsmap.update(nsmap_copy)
ElementPath._cache = path_cache
if old_factories is not None:
ET._set_factories(*old_factories)
# don't interfere with subsequent tests
ET = pyET = None
......
......@@ -1374,12 +1374,30 @@ class TreeBuilder:
*element_factory* is an optional element factory which is called
to create new Element instances, as necessary.
*comment_factory* is a factory to create comments to be used instead of
the standard factory. If *insert_comments* is false (the default),
comments will not be inserted into the tree.
*pi_factory* is a factory to create processing instructions to be used
instead of the standard factory. If *insert_pis* is false (the default),
processing instructions will not be inserted into the tree.
"""
def __init__(self, element_factory=None):
def __init__(self, element_factory=None, *,
comment_factory=None, pi_factory=None,
insert_comments=False, insert_pis=False):
self._data = [] # data collector
self._elem = [] # element stack
self._last = None # last element
self._root = None # root element
self._tail = None # true if we're after an end tag
if comment_factory is None:
comment_factory = Comment
self._comment_factory = comment_factory
self.insert_comments = insert_comments
if pi_factory is None:
pi_factory = ProcessingInstruction
self._pi_factory = pi_factory
self.insert_pis = insert_pis
if element_factory is None:
element_factory = Element
self._factory = element_factory
......@@ -1387,8 +1405,8 @@ class TreeBuilder:
def close(self):
"""Flush builder buffers and return toplevel document Element."""
assert len(self._elem) == 0, "missing end tags"
assert self._last is not None, "missing toplevel element"
return self._last
assert self._root is not None, "missing toplevel element"
return self._root
def _flush(self):
if self._data:
......@@ -1417,6 +1435,8 @@ class TreeBuilder:
self._last = elem = self._factory(tag, attrs)
if self._elem:
self._elem[-1].append(elem)
elif self._root is None:
self._root = elem
self._elem.append(elem)
self._tail = 0
return elem
......@@ -1435,6 +1455,33 @@ class TreeBuilder:
self._tail = 1
return self._last
def comment(self, text):
"""Create a comment using the comment_factory.
*text* is the text of the comment.
"""
return self._handle_single(
self._comment_factory, self.insert_comments, text)
def pi(self, target, text=None):
"""Create a processing instruction using the pi_factory.
*target* is the target name of the processing instruction.
*text* is the data of the processing instruction, or ''.
"""
return self._handle_single(
self._pi_factory, self.insert_pis, target, text)
def _handle_single(self, factory, insert, *args):
elem = factory(*args)
if insert:
self._flush()
self._last = elem
if self._elem:
self._elem[-1].append(elem)
self._tail = 1
return elem
# also see ElementTree and TreeBuilder
class XMLParser:
......@@ -1519,6 +1566,15 @@ class XMLParser:
def handler(prefix, event=event_name, append=append):
append((event, None))
parser.EndNamespaceDeclHandler = handler
elif event_name == 'comment':
def handler(text, event=event_name, append=append, self=self):
append((event, self.target.comment(text)))
parser.CommentHandler = handler
elif event_name == 'pi':
def handler(pi_target, data, event=event_name, append=append,
self=self):
append((event, self.target.pi(pi_target, data)))
parser.ProcessingInstructionHandler = handler
else:
raise ValueError("unknown event %r" % event_name)
......@@ -1640,7 +1696,10 @@ try:
# (see tests)
_Element_Py = Element
# Element, SubElement, ParseError, TreeBuilder, XMLParser
# Element, SubElement, ParseError, TreeBuilder, XMLParser, _set_factories
from _elementtree import *
from _elementtree import _set_factories
except ImportError:
pass
else:
_set_factories(Comment, ProcessingInstruction)
The TreeBuilder and XMLPullParser in xml.etree.ElementTree gained support
for parsing comments and processing instructions.
Patch by Stefan Behnel.
This diff is collapsed.
......@@ -635,19 +635,26 @@ exit:
static int
_elementtree_TreeBuilder___init___impl(TreeBuilderObject *self,
PyObject *element_factory);
PyObject *element_factory,
PyObject *comment_factory,
PyObject *pi_factory,
int insert_comments, int insert_pis);
static int
_elementtree_TreeBuilder___init__(PyObject *self, PyObject *args, PyObject *kwargs)
{
int return_value = -1;
static const char * const _keywords[] = {"element_factory", NULL};
static const char * const _keywords[] = {"element_factory", "comment_factory", "pi_factory", "insert_comments", "insert_pis", NULL};
static _PyArg_Parser _parser = {NULL, _keywords, "TreeBuilder", 0};
PyObject *argsbuf[1];
PyObject *argsbuf[5];
PyObject * const *fastargs;
Py_ssize_t nargs = PyTuple_GET_SIZE(args);
Py_ssize_t noptargs = nargs + (kwargs ? PyDict_GET_SIZE(kwargs) : 0) - 0;
PyObject *element_factory = NULL;
PyObject *comment_factory = NULL;
PyObject *pi_factory = NULL;
int insert_comments = 0;
int insert_pis = 0;
fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 0, 1, 0, argsbuf);
if (!fastargs) {
......@@ -656,9 +663,76 @@ _elementtree_TreeBuilder___init__(PyObject *self, PyObject *args, PyObject *kwar
if (!noptargs) {
goto skip_optional_pos;
}
element_factory = fastargs[0];
if (fastargs[0]) {
element_factory = fastargs[0];
if (!--noptargs) {
goto skip_optional_pos;
}
}
skip_optional_pos:
return_value = _elementtree_TreeBuilder___init___impl((TreeBuilderObject *)self, element_factory);
if (!noptargs) {
goto skip_optional_kwonly;
}
if (fastargs[1]) {
comment_factory = fastargs[1];
if (!--noptargs) {
goto skip_optional_kwonly;
}
}
if (fastargs[2]) {
pi_factory = fastargs[2];
if (!--noptargs) {
goto skip_optional_kwonly;
}
}
if (fastargs[3]) {
insert_comments = PyObject_IsTrue(fastargs[3]);
if (insert_comments < 0) {
goto exit;
}
if (!--noptargs) {
goto skip_optional_kwonly;
}
}
insert_pis = PyObject_IsTrue(fastargs[4]);
if (insert_pis < 0) {
goto exit;
}
skip_optional_kwonly:
return_value = _elementtree_TreeBuilder___init___impl((TreeBuilderObject *)self, element_factory, comment_factory, pi_factory, insert_comments, insert_pis);
exit:
return return_value;
}
PyDoc_STRVAR(_elementtree__set_factories__doc__,
"_set_factories($module, comment_factory, pi_factory, /)\n"
"--\n"
"\n"
"Change the factories used to create comments and processing instructions.\n"
"\n"
"For internal use only.");
#define _ELEMENTTREE__SET_FACTORIES_METHODDEF \
{"_set_factories", (PyCFunction)(void(*)(void))_elementtree__set_factories, METH_FASTCALL, _elementtree__set_factories__doc__},
static PyObject *
_elementtree__set_factories_impl(PyObject *module, PyObject *comment_factory,
PyObject *pi_factory);
static PyObject *
_elementtree__set_factories(PyObject *module, PyObject *const *args, Py_ssize_t nargs)
{
PyObject *return_value = NULL;
PyObject *comment_factory;
PyObject *pi_factory;
if (!_PyArg_CheckPositional("_set_factories", nargs, 2, 2)) {
goto exit;
}
comment_factory = args[0];
pi_factory = args[1];
return_value = _elementtree__set_factories_impl(module, comment_factory, pi_factory);
exit:
return return_value;
......@@ -680,6 +754,48 @@ PyDoc_STRVAR(_elementtree_TreeBuilder_end__doc__,
#define _ELEMENTTREE_TREEBUILDER_END_METHODDEF \
{"end", (PyCFunction)_elementtree_TreeBuilder_end, METH_O, _elementtree_TreeBuilder_end__doc__},
PyDoc_STRVAR(_elementtree_TreeBuilder_comment__doc__,
"comment($self, text, /)\n"
"--\n"
"\n");
#define _ELEMENTTREE_TREEBUILDER_COMMENT_METHODDEF \
{"comment", (PyCFunction)_elementtree_TreeBuilder_comment, METH_O, _elementtree_TreeBuilder_comment__doc__},
PyDoc_STRVAR(_elementtree_TreeBuilder_pi__doc__,
"pi($self, target, text=None, /)\n"
"--\n"
"\n");
#define _ELEMENTTREE_TREEBUILDER_PI_METHODDEF \
{"pi", (PyCFunction)(void(*)(void))_elementtree_TreeBuilder_pi, METH_FASTCALL, _elementtree_TreeBuilder_pi__doc__},
static PyObject *
_elementtree_TreeBuilder_pi_impl(TreeBuilderObject *self, PyObject *target,
PyObject *text);
static PyObject *
_elementtree_TreeBuilder_pi(TreeBuilderObject *self, PyObject *const *args, Py_ssize_t nargs)
{
PyObject *return_value = NULL;
PyObject *target;
PyObject *text = Py_None;
if (!_PyArg_CheckPositional("pi", nargs, 1, 2)) {
goto exit;
}
target = args[0];
if (nargs < 2) {
goto skip_optional;
}
text = args[1];
skip_optional:
return_value = _elementtree_TreeBuilder_pi_impl(self, target, text);
exit:
return return_value;
}
PyDoc_STRVAR(_elementtree_TreeBuilder_close__doc__,
"close($self, /)\n"
"--\n"
......@@ -853,4 +969,4 @@ skip_optional:
exit:
return return_value;
}
/*[clinic end generated code: output=440b5d90a4b86590 input=a9049054013a1b77]*/
/*[clinic end generated code: output=386a68425d072b5c input=a9049054013a1b77]*/
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment