Commit e1d5dd64 authored by Stefan Behnel's avatar Stefan Behnel Committed by GitHub

bpo-13611: C14N 2.0 implementation for ElementTree (GH-12966)

* Implement C14N 2.0 as a new canonicalize() function in ElementTree.

Missing features:
- prefix renaming in XPath expressions (tag and attribute text is supported)
- preservation of original prefixes given redundant namespace declarations
parent ee88af3f
...@@ -465,6 +465,53 @@ Reference ...@@ -465,6 +465,53 @@ Reference
Functions Functions
^^^^^^^^^ ^^^^^^^^^
.. function:: canonicalize(xml_data=None, *, out=None, from_file=None, **options)
`C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ transformation function.
Canonicalization is a way to normalise XML output in a way that allows
byte-by-byte comparisons and digital signatures. It reduced the freedom
that XML serializers have and instead generates a more constrained XML
representation. The main restrictions regard the placement of namespace
declarations, the ordering of attributes, and ignorable whitespace.
This function takes an XML data string (*xml_data*) or a file path or
file-like object (*from_file*) as input, converts it to the canonical
form, and writes it out using the *out* file(-like) object, if provided,
or returns it as a text string if not. The output file receives text,
not bytes. It should therefore be opened in text mode with ``utf-8``
encoding.
Typical uses::
xml_data = "<root>...</root>"
print(canonicalize(xml_data))
with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
canonicalize(xml_data, out=out_file)
with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
canonicalize(from_file="inputfile.xml", out=out_file)
The configuration *options* are as follows:
- *with_comments*: set to true to include comments (default: false)
- *strip_text*: set to true to strip whitespace before and after text content
(default: false)
- *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}"
(default: false)
- *qname_aware_tags*: a set of qname aware tag names in which prefixes
should be replaced in text content (default: empty)
- *qname_aware_attrs*: a set of qname aware attribute names in which prefixes
should be replaced in text content (default: empty)
- *exclude_attrs*: a set of attribute names that should not be serialised
- *exclude_tags*: a set of tag names that should not be serialised
In the option list above, "a set" refers to any collection or iterable of
strings, no ordering is expected.
.. versionadded:: 3.8
.. function:: Comment(text=None) .. function:: Comment(text=None)
...@@ -1114,6 +1161,19 @@ TreeBuilder Objects ...@@ -1114,6 +1161,19 @@ TreeBuilder Objects
.. versionadded:: 3.8 .. versionadded:: 3.8
.. class:: C14NWriterTarget(write, *, \
with_comments=False, strip_text=False, rewrite_prefixes=False, \
qname_aware_tags=None, qname_aware_attrs=None, \
exclude_attrs=None, exclude_tags=None)
A `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ writer. Arguments are the
same as for the :func:`canonicalize` function. This class does not build a
tree but translates the callback events directly into a serialised form
using the *write* function.
.. versionadded:: 3.8
.. _elementtree-xmlparser-objects: .. _elementtree-xmlparser-objects:
XMLParser Objects XMLParser Objects
......
...@@ -525,6 +525,10 @@ xml ...@@ -525,6 +525,10 @@ xml
external entities by default. external entities by default.
(Contributed by Christian Heimes in :issue:`17239`.) (Contributed by Christian Heimes in :issue:`17239`.)
* The :mod:`xml.etree.ElementTree` module provides a new function
:func:`–xml.etree.ElementTree.canonicalize()` that implements C14N 2.0.
(Contributed by Stefan Behnel in :issue:`13611`.)
Optimizations Optimizations
============= =============
......
This diff is collapsed.
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:IgnoreComments>true</c14n2:IgnoreComments>
</dsig:CanonicalizationMethod>
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" Algorithm="http://www.w3.org/2010/xml-c14n2">
</dsig:CanonicalizationMethod>
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
</dsig:CanonicalizationMethod>
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
<c14n2:QNameAware>
<c14n2:QualifiedAttr Name="type" NS="http://www.w3.org/2001/XMLSchema-instance"/>
</c14n2:QNameAware>
</dsig:CanonicalizationMethod>
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
<c14n2:QNameAware>
<c14n2:Element Name="bar" NS="http://a"/>
<c14n2:XPathElement Name="IncludedXPath" NS="http://www.w3.org/2010/xmldsig2#"/>
</c14n2:QNameAware>
</dsig:CanonicalizationMethod>
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:QNameAware>
<c14n2:QualifiedAttr Name="type" NS="http://www.w3.org/2001/XMLSchema-instance"/>
</c14n2:QNameAware>
</dsig:CanonicalizationMethod>
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:QNameAware>
<c14n2:Element Name="bar" NS="http://a"/>
</c14n2:QNameAware>
</dsig:CanonicalizationMethod>
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:QNameAware>
<c14n2:Element Name="bar" NS="http://a"/>
<c14n2:XPathElement Name="IncludedXPath" NS="http://www.w3.org/2010/xmldsig2#"/>
</c14n2:QNameAware>
</dsig:CanonicalizationMethod>
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:TrimTextNodes>true</c14n2:TrimTextNodes>
</dsig:CanonicalizationMethod>
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT doc (#PCDATA)>
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
</xsl:stylesheet>
<?xml version="1.0"?>
<?xml-stylesheet href="doc.xsl"
type="text/xsl" ?>
<!DOCTYPE doc SYSTEM "doc.dtd">
<doc>Hello, world!<!-- Comment 1 --></doc>
<?pi-without-data ?>
<!-- Comment 2 -->
<!-- Comment 3 -->
<doc>
<clean> </clean>
<dirty> A B </dirty>
<mixed>
A
<clean> </clean>
B
<dirty> A B </dirty>
C
</mixed>
</doc>
<!DOCTYPE doc [<!ATTLIST e9 attr CDATA "default">]>
<doc>
<e1 />
<e2 ></e2>
<e3 name = "elem3" id="elem3" />
<e4 name="elem4" id="elem4" ></e4>
<e5 a:attr="out" b:attr="sorted" attr2="all" attr="I'm"
xmlns:b="http://www.ietf.org"
xmlns:a="http://www.w3.org"
xmlns="http://example.org"/>
<e6 xmlns="" xmlns:a="http://www.w3.org">
<e7 xmlns="http://www.ietf.org">
<e8 xmlns="" xmlns:a="http://www.w3.org">
<e9 xmlns="" xmlns:a="http://www.ietf.org"/>
</e8>
</e7>
</e6>
</doc>
<!DOCTYPE doc [
<!ATTLIST normId id ID #IMPLIED>
<!ATTLIST normNames attr NMTOKENS #IMPLIED>
]>
<doc>
<text>First line&#x0d;&#10;Second line</text>
<value>&#x32;</value>
<compute><![CDATA[value>"0" && value<"10" ?"valid":"error"]]></compute>
<compute expr='value>"0" &amp;&amp; value&lt;"10" ?"valid":"error"'>valid</compute>
<norm attr=' &apos; &#x20;&#13;&#xa;&#9; &apos; '/>
<normNames attr=' A &#x20;&#13;&#xa;&#9; B '/>
<normId id=' &apos;&#x20;&#13;&#xa;&#9; &apos; '/>
</doc>
<!DOCTYPE doc [
<!ATTLIST doc attrExtEnt CDATA #IMPLIED>
<!ENTITY ent1 "Hello">
<!ENTITY ent2 SYSTEM "world.txt">
<!ENTITY entExt SYSTEM "earth.gif" NDATA gif>
<!NOTATION gif SYSTEM "viewgif.exe">
]>
<doc attrExtEnt="entExt">
&ent1;, &ent2;!
</doc>
<!-- Let world.txt contain "world" (excluding the quotes) -->
<?xml version="1.0" encoding="ISO-8859-1"?>
<doc>&#169;</doc>
<a:foo xmlns:a="http://a" xmlns:b="http://b" xmlns:child="http://c" xmlns:soap-env="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<a:bar>xsd:string</a:bar>
<dsig2:IncludedXPath xmlns:dsig2="http://www.w3.org/2010/xmldsig2#">/soap-env:body/child::b:foo[@att1 != "c:val" and @att2 != 'xsd:string']</dsig2:IncludedXPath>
</a:foo>
<foo xmlns:a="http://a" xmlns:b="http://b">
<b:bar b:att1="val" att2="val"/>
</foo>
<a:foo xmlns:a="http://a" xmlns:b="http://b" xmlns:c="http://c">
<b:bar/>
<b:bar/>
<b:bar/>
<a:bar b:att1="val"/>
</a:foo>
<foo xmlns:a="http://z3" xmlns:b="http://z2" a:att1="val1" b:att2="val2">
<bar xmlns="http://z0" xmlns:a="http://z2" a:att1="val1" b:att2="val2" xmlns:b="http://z3" />
</foo>
<a:foo xmlns:a="http://z3" xmlns:b="http://z2" b:att1="val1" c:att3="val3" b:att2="val2" xmlns:c="http://z1" xmlns:d="http://z0">
<c:bar/>
<c:bar d:att3="val3"/>
</a:foo>
<foo xmlns:a="http://z0" xmlns:b="http://z0" a:att1="val1" b:att2="val2" xmlns="http://z0">
<c:bar xmlns:a="http://z0" xmlns:c="http://z0" c:att3="val3"/>
<d:bar xmlns:d="http://z0"/>
</foo>
<foo xmlns="http://z0" xml:id="23">
<bar xsi:type="xsd:string" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">data</bar>
</foo>
<?xml-stylesheet href="doc.xsl"
type="text/xsl" ?>
<doc>Hello, world!<!-- Comment 1 --></doc>
<?pi-without-data?>
<!-- Comment 2 -->
<!-- Comment 3 -->
\ No newline at end of file
<?xml-stylesheet href="doc.xsl"
type="text/xsl" ?>
<doc>Hello, world!</doc>
<?pi-without-data?>
\ No newline at end of file
<doc>
<clean> </clean>
<dirty> A B </dirty>
<mixed>
A
<clean> </clean>
B
<dirty> A B </dirty>
C
</mixed>
</doc>
\ No newline at end of file
<doc><clean></clean><dirty>A B</dirty><mixed>A<clean></clean>B<dirty>A B</dirty>C</mixed></doc>
\ No newline at end of file
<doc>
<e1></e1>
<e2></e2>
<e3 id="elem3" name="elem3"></e3>
<e4 id="elem4" name="elem4"></e4>
<e5 xmlns="http://example.org" xmlns:a="http://www.w3.org" xmlns:b="http://www.ietf.org" attr="I'm" attr2="all" b:attr="sorted" a:attr="out"></e5>
<e6>
<e7 xmlns="http://www.ietf.org">
<e8 xmlns="">
<e9 attr="default"></e9>
</e8>
</e7>
</e6>
</doc>
\ No newline at end of file
<n0:doc xmlns:n0="">
<n0:e1></n0:e1>
<n0:e2></n0:e2>
<n0:e3 id="elem3" name="elem3"></n0:e3>
<n0:e4 id="elem4" name="elem4"></n0:e4>
<n1:e5 xmlns:n1="http://example.org" xmlns:n2="http://www.ietf.org" xmlns:n3="http://www.w3.org" attr="I'm" attr2="all" n2:attr="sorted" n3:attr="out"></n1:e5>
<n0:e6>
<n2:e7 xmlns:n2="http://www.ietf.org">
<n0:e8>
<n0:e9 attr="default"></n0:e9>
</n0:e8>
</n2:e7>
</n0:e6>
</n0:doc>
\ No newline at end of file
<doc><e1></e1><e2></e2><e3 id="elem3" name="elem3"></e3><e4 id="elem4" name="elem4"></e4><e5 xmlns="http://example.org" xmlns:a="http://www.w3.org" xmlns:b="http://www.ietf.org" attr="I'm" attr2="all" b:attr="sorted" a:attr="out"></e5><e6><e7 xmlns="http://www.ietf.org"><e8 xmlns=""><e9 attr="default"></e9></e8></e7></e6></doc>
\ No newline at end of file
<doc>
<text>First line&#xD;
Second line</text>
<value>2</value>
<compute>value&gt;"0" &amp;&amp; value&lt;"10" ?"valid":"error"</compute>
<compute expr="value>&quot;0&quot; &amp;&amp; value&lt;&quot;10&quot; ?&quot;valid&quot;:&quot;error&quot;">valid</compute>
<norm attr=" ' &#xD;&#xA;&#x9; ' "></norm>
<normNames attr="A &#xD;&#xA;&#x9; B"></normNames>
<normId id="' &#xD;&#xA;&#x9; '"></normId>
</doc>
\ No newline at end of file
<doc><text>First line&#xD;
Second line</text><value>2</value><compute>value&gt;"0" &amp;&amp; value&lt;"10" ?"valid":"error"</compute><compute expr="value>&quot;0&quot; &amp;&amp; value&lt;&quot;10&quot; ?&quot;valid&quot;:&quot;error&quot;">valid</compute><norm attr=" ' &#xD;&#xA;&#x9; ' "></norm><normNames attr="A &#xD;&#xA;&#x9; B"></normNames><normId id="' &#xD;&#xA;&#x9; '"></normId></doc>
\ No newline at end of file
<doc attrExtEnt="entExt">
Hello, world!
</doc>
\ No newline at end of file
<doc attrExtEnt="entExt">Hello, world!</doc>
\ No newline at end of file
<doc>©</doc>
\ No newline at end of file
<a:foo xmlns:a="http://a">
<a:bar>xsd:string</a:bar>
<dsig2:IncludedXPath xmlns:dsig2="http://www.w3.org/2010/xmldsig2#">/soap-env:body/child::b:foo[@att1 != "c:val" and @att2 != 'xsd:string']</dsig2:IncludedXPath>
</a:foo>
\ No newline at end of file
<n0:foo xmlns:n0="http://a">
<n0:bar xmlns:n1="http://www.w3.org/2001/XMLSchema">n1:string</n0:bar>
<n4:IncludedXPath xmlns:n2="http://b" xmlns:n3="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:n4="http://www.w3.org/2010/xmldsig2#">/n3:body/child::n2:foo[@att1 != "c:val" and @att2 != 'xsd:string']</n4:IncludedXPath>
</n0:foo>
\ No newline at end of file
<a:foo xmlns:a="http://a">
<a:bar xmlns:xsd="http://www.w3.org/2001/XMLSchema">xsd:string</a:bar>
<dsig2:IncludedXPath xmlns:dsig2="http://www.w3.org/2010/xmldsig2#">/soap-env:body/child::b:foo[@att1 != "c:val" and @att2 != 'xsd:string']</dsig2:IncludedXPath>
</a:foo>
\ No newline at end of file
<a:foo xmlns:a="http://a">
<a:bar xmlns:xsd="http://www.w3.org/2001/XMLSchema">xsd:string</a:bar>
<dsig2:IncludedXPath xmlns:b="http://b" xmlns:dsig2="http://www.w3.org/2010/xmldsig2#" xmlns:soap-env="http://schemas.xmlsoap.org/wsdl/soap/">/soap-env:body/child::b:foo[@att1 != "c:val" and @att2 != 'xsd:string']</dsig2:IncludedXPath>
</a:foo>
\ No newline at end of file
<foo>
<b:bar xmlns:b="http://b" att2="val" b:att1="val"></b:bar>
</foo>
\ No newline at end of file
<n0:foo xmlns:n0="">
<n1:bar xmlns:n1="http://b" att2="val" n1:att1="val"></n1:bar>
</n0:foo>
\ No newline at end of file
<a:foo xmlns:a="http://a">
<b:bar xmlns:b="http://b"></b:bar>
<b:bar xmlns:b="http://b"></b:bar>
<b:bar xmlns:b="http://b"></b:bar>
<a:bar xmlns:b="http://b" b:att1="val"></a:bar>
</a:foo>
\ No newline at end of file
<n0:foo xmlns:n0="http://a">
<n1:bar xmlns:n1="http://b"></n1:bar>
<n1:bar xmlns:n1="http://b"></n1:bar>
<n1:bar xmlns:n1="http://b"></n1:bar>
<n0:bar xmlns:n1="http://b" n1:att1="val"></n0:bar>
</n0:foo>
\ No newline at end of file
<foo xmlns:a="http://z3" xmlns:b="http://z2" b:att2="val2" a:att1="val1">
<bar xmlns="http://z0" xmlns:a="http://z2" xmlns:b="http://z3" a:att1="val1" b:att2="val2"></bar>
</foo>
\ No newline at end of file
<n0:foo xmlns:n0="" xmlns:n1="http://z2" xmlns:n2="http://z3" n1:att2="val2" n2:att1="val1">
<n3:bar xmlns:n3="http://z0" n1:att1="val1" n2:att2="val2"></n3:bar>
</n0:foo>
\ No newline at end of file
<a:foo xmlns:a="http://z3" xmlns:b="http://z2" xmlns:c="http://z1" c:att3="val3" b:att1="val1" b:att2="val2">
<c:bar></c:bar>
<c:bar xmlns:d="http://z0" d:att3="val3"></c:bar>
</a:foo>
\ No newline at end of file
<n2:foo xmlns:n0="http://z1" xmlns:n1="http://z2" xmlns:n2="http://z3" n0:att3="val3" n1:att1="val1" n1:att2="val2">
<n0:bar></n0:bar>
<n0:bar xmlns:n3="http://z0" n3:att3="val3"></n0:bar>
</n2:foo>
\ No newline at end of file
<foo xmlns="http://z0" xmlns:a="http://z0" xmlns:b="http://z0" a:att1="val1" b:att2="val2">
<c:bar xmlns:c="http://z0" c:att3="val3"></c:bar>
<d:bar xmlns:d="http://z0"></d:bar>
</foo>
\ No newline at end of file
<n0:foo xmlns:n0="http://z0" n0:att1="val1" n0:att2="val2">
<n0:bar n0:att3="val3"></n0:bar>
<n0:bar></n0:bar>
</n0:foo>
\ No newline at end of file
<foo xmlns="http://z0" xml:id="23">
<bar xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xsd:string">data</bar>
</foo>
\ No newline at end of file
<n0:foo xmlns:n0="http://z0" xml:id="23">
<n0:bar xmlns:n1="http://www.w3.org/2001/XMLSchema-instance" n1:type="xsd:string">data</n0:bar>
</n0:foo>
\ No newline at end of file
<n0:foo xmlns:n0="http://z0" xml:id="23">
<n0:bar xmlns:n1="http://www.w3.org/2001/XMLSchema" xmlns:n2="http://www.w3.org/2001/XMLSchema-instance" n2:type="n1:string">data</n0:bar>
</n0:foo>
\ No newline at end of file
<foo xmlns="http://z0" xml:id="23">
<bar xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xsd:string">data</bar>
</foo>
\ No newline at end of file
world
\ No newline at end of file
This diff is collapsed.
The xml.etree.ElementTree packages gained support for C14N 2.0 serialisation.
Patch by Stefan Behnel.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment