p:unescape-markup

p:unescape-markup — Converts “escaped XML” back into real XML.

Synopsis

<p:declare-step type="p:unescape-markup">
     <p:input port="source"/>
     <p:output port="result"/>
     <p:option name="namespace"/>                                  <!-- anyURI -->
     <p:option name="content-type" select="'application/xml'"/>    <!-- string -->
     <p:option name="encoding"/>                                   <!-- string -->
     <p:option name="charset"/>                                    <!-- string -->
</p:declare-step>

Description

The p:unescape-markup step takes the string value of the document element and parses that content as if it were a Unicode character stream containing serialized XML. The output is the original document element wrapped around the result of that parse. This is the reverse of the p:escape-markup step.

When the string value is parsed, the original document element is preserved so that the result will be well-formed XML even if the content consists of multiple, sibling elements.

The namespace option specifies a default namespace. Elements that are in no namespace in the unescaped content will be placed into this namespace unless there is an in-scope namespace declaration that specifies a different namespace (or explicitly undeclares the default namespace).
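As a sketch of the undeclaration rule, consider escaped content with two sibling p elements, the second of which carries xmlns="". The description wrapper and the text of the paragraphs are illustrative assumptions.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Unescape with XHTML as the default namespace -->
  <p:unescape-markup namespace="http://www.w3.org/1999/xhtml"/>
</p:pipeline>
```

A hypothetical input for this pipeline:

```xml
<description>
&lt;p&gt;Placed in the XHTML namespace.&lt;/p&gt;
&lt;p xmlns=""&gt;Stays in no namespace.&lt;/p&gt;
</description>
```

Under the rule described above, the first p should end up in the XHTML namespace, while the second, which explicitly undeclares the default namespace, should remain in no namespace.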

The content-type option can be used to specify an alternate content type for the string value. Implementations can use this value to select different parsing strategies for converting escaped text into XML. For example, if the value text/html is used, XML Calabash parses with [TagSoup], which just about guarantees that any random, sloppy collection of markup will produce well-formed XML.

If you specify application/xml, every implementation must use a standard XML parser. The behavior of p:unescape-markup for content types other than application/xml is implementation-defined.

The serialized form of an XML document is a sequence of Unicode characters. If the escaped markup is not in a Unicode-compatible character set, it may be encoded (so that it can be carried as a payload in an XML document). In this case the encoding option must be used.

If an encoding is specified, a charset may also be specified. The character set may be specified as a parameter on the content-type or via the separate charset option. If the specified encoding is base64, then the character set must be specified.

If no encoding is specified, the character set is ignored, irrespective of where it was specified.
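A minimal sketch of the encoded case, assuming the payload is the base64 encoding of the UTF-8 bytes of the string <p>This is a chunk.</p>. The description wrapper and the payload are illustrative. Only the encoding and charset options come from the synopsis.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- charset is given explicitly because the encoding is base64 -->
  <p:unescape-markup encoding="base64" charset="UTF-8"/>
</p:pipeline>
```

A hypothetical input carrying the encoded payload:

```xml
<description>PHA+VGhpcyBpcyBhIGNodW5rLjwvcD4=</description>
```

With these options the step would be expected to decode the payload, interpret the resulting bytes as UTF-8, and parse them, yielding a description element that contains a single p element. The charset could presumably also be supplied as a parameter on content-type, as described above.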

Errors

Error       Description
err:XC0051  Occurs if the content-type specified is not supported by the implementation.
err:XC0052  Occurs if the encoding specified is not supported by the implementation.
err:XC0010  Occurs if an encoding of base64 is specified and the character set is not specified or if the specified character set is not supported by the implementation.
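One way a pipeline might guard against these errors is to wrap the step in p:try, as sketched below. The fallback element unescape-failed is a hypothetical name chosen for illustration. p:try, p:group, p:catch, and p:identity are standard XProc 1.0 steps.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:try>
    <p:group>
      <!-- May raise a dynamic error if text/html is not supported -->
      <p:unescape-markup content-type="text/html"/>
    </p:group>
    <p:catch>
      <!-- Hypothetical fallback: emit a marker document instead of failing -->
      <p:identity>
        <p:input port="source">
          <p:inline><unescape-failed/></p:inline>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:pipeline>
```

Note that this p:catch handles any dynamic error raised inside the p:group, not only the ones listed in the table.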

Examples

This example simply unescapes some escaped markup:

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            version="1.0">

  <p:unescape-markup/>
</p:pipeline>

Input:

<description>
&lt;p&gt;This is a chunk.&lt;/p&gt;
&lt;p&gt;This is another chunk.&lt;/p&gt;
</description>

Output:

<description>
<p>This is a chunk.</p>
<p>This is another chunk.</p>
</description>

The namespace option can be used to specify the default namespace.

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            version="1.0">

  <p:unescape-markup namespace="http://www.w3.org/1999/xhtml"/>
</p:pipeline>

Input:

<description>
&lt;p&gt;This is a chunk.&lt;/p&gt;
&lt;p&gt;This is another chunk.&lt;/p&gt;
</description>

Output:

<description>
<p xmlns="http://www.w3.org/1999/xhtml">This is a chunk.</p>
<p xmlns="http://www.w3.org/1999/xhtml">This is another chunk.</p>
</description>

Simply unescaping the markup in this HTML example would not yield a well-formed XML result. By specifying a text/html content-type, we give the processor the ability to perform appropriate “fixup” on the markup.

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            version="1.0">

  <p:unescape-markup content-type="text/html"/>
</p:pipeline>

Input:

<wrapper>
&lt;html&gt;
&lt;title&gt;Some title&lt;/title&gt;
&lt;h1&gt;Some title&lt;/h1&gt;
&lt;p&gt;This is some&lt;br&gt;
HTML text
&lt;p&gt;It isn't well-formed XML&lt;br&gt;
by any stretch of the imagination
</wrapper>

Output:

<wrapper>
<html>
<head>
<title>Some title</title>
</head>
<body>
<h1>Some title</h1>
<p>This is some<br clear="none"/>
HTML text
</p>
<p>It isn't well-formed XML<br clear="none"/>
by any stretch of the imagination
</p>
</body>
</html>
</wrapper>