p:exec

p:exec — Runs an external command.

Synopsis

<p:declare-step type="p:exec">
     <p:input port="source" primary="true" sequence="true"/>
     <p:output port="result" primary="true"/>
     <p:output port="errors"/>
     <p:output port="exit-status"/>
     <p:option name="command" required="true"/>                    <!-- string -->
     <p:option name="args" select="''"/>                           <!-- string -->
     <p:option name="cwd"/>                                        <!-- string -->
     <p:option name="source-is-xml" select="'true'"/>              <!-- boolean -->
     <p:option name="result-is-xml" select="'true'"/>              <!-- boolean -->
     <p:option name="wrap-result-lines" select="'false'"/>         <!-- boolean -->
     <p:option name="errors-is-xml" select="'false'"/>             <!-- boolean -->
     <p:option name="wrap-error-lines" select="'false'"/>          <!-- boolean -->
     <p:option name="path-separator"/>                             <!-- string -->
     <p:option name="failure-threshold"/>                          <!-- integer -->
     <p:option name="arg-separator" select="' '"/>                 <!-- string -->
     <p:option name="byte-order-mark"/>                            <!-- boolean -->
     <p:option name="cdata-section-elements" select="''"/>         <!-- ListOfQNames -->
     <p:option name="doctype-public"/>                             <!-- string -->
     <p:option name="doctype-system"/>                             <!-- anyURI -->
     <p:option name="encoding"/>                                   <!-- string -->
     <p:option name="escape-uri-attributes" select="'false'"/>     <!-- boolean -->
     <p:option name="include-content-type" select="'true'"/>       <!-- boolean -->
     <p:option name="indent" select="'false'"/>                    <!-- boolean -->
     <p:option name="media-type"/>                                 <!-- string -->
     <p:option name="method" select="'xml'"/>                      <!-- QName -->
     <p:option name="normalization-form" select="'none'"/>         <!-- NormalizationForm -->
     <p:option name="omit-xml-declaration" select="'true'"/>       <!-- boolean -->
     <p:option name="standalone" select="'omit'"/>                 <!-- "true" | "false" | "omit" -->
     <p:option name="undeclare-prefixes"/>                         <!-- boolean -->
     <p:option name="version" select="'1.0'"/>                     <!-- string -->
</p:declare-step>

Description

The p:exec step runs an external command. It can pass a document to the “standard input” of the command and it can return both the command's “standard output” and “standard error” as XML documents.

The p:exec step executes the command passed on command with the arguments passed on args. The processor does not interpolate the values of the command or args (for example, expanding references to environment variables).

If cwd is specified, then the current working directory is changed to the value of that option before execution begins. If cwd is not specified, the current working directory is implementation-defined.

Although the step is optional and therefore not guaranteed to be portable, some care has been taken to address at least the most common cross-platform issues: path and argument separators.

Unix-based systems use a forward slash to separate the components of a path, for example /usr/bin/ls. On Windows-based systems, the backward slash is used, for example: \usr\bin\ls. Suppose you want to run a p:exec step on both kinds of systems, and you've already arranged for the actual path name of the command to be the same on both. You still couldn't easily write the p:exec step, because no matter which character you chose, it would only work on one of the systems.

XProc addresses this with a path-separator option. This option, which must be exactly one character, will be replaced in command and args options with the appropriate, platform-specific path separator character.
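As a rough sketch of how this might be used, the following pipeline writes its paths with forward slashes and names “/” as the path separator. The command and directory are borrowed from the examples later on this page and are only illustrative; on a Windows-based system, the processor would be expected to rewrite each “/” in command and args as a backslash before running the command.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            version="1.0">

  <!-- Illustrative only: "/" is declared as the path separator, so every
       "/" in command and args is replaced with the platform's own
       separator character before the command runs. -->
  <p:exec command="/bin/ls" result-is-xml="false"
          path-separator="/"
          args="-l ../docs">
    <p:input port="source">
      <p:empty/>
    </p:input>
  </p:exec>
</p:pipeline>
```

Because the replacement applies to every occurrence of the character, an argument that contains “/” for some other reason would presumably be rewritten as well, so it is worth picking a separator that appears only in paths.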

A similar problem exists for the command line arguments. It's common to separate arguments with spaces. For example, “a b c” is usually interpreted as three distinct arguments, “a”, “b”, and “c”.

At the same time, it's become quite common for filenames to contain spaces. This leads to a dilemma. Given a file named “My Report.xml”, how do I pass it to a command without having it interpreted as two arguments: “My” and “Report.xml”?

Command line tools usually get around this problem by introducing various quoting mechanisms. Double quotes around the filename, for example, tell the shell to recognize it as a single argument, and a backslash in front of a character often causes it to be interpreted literally (allowing even uncommon filenames like “My "report".xml” to be processed).

XML already has its own notion of special characters and an escaping mechanism for them, but that mechanism cannot reasonably be extended to cover this case. Adding another mechanism for escaping characters in this context would have been confusing. What's more, XProc doesn't really give the pipeline author a convenient way to analyze the arguments, select an appropriate escaping mechanism, and use it. That's just not the sort of thing it's made to do.

XProc attacks the problem a different way: it provides an arg-separator option, which defaults to a space. Every occurrence of this character in the args string is interpreted as a separator between arguments. Using this mechanism, you can pass filenames with special characters along with other arguments simply by picking a different separator:

    <p:exec command="dosomething"
            arg-separator="!"
            args='-f!My "report".xml' …>
    </p:exec>

This has one unfortunate consequence: if you separate arguments with multiple spaces, each space is interpreted as a separator and you get empty strings as individual arguments, which is rarely what you want. Either pick an explicit separator character or make sure you use only a single space between arguments.

Inputs and outputs

The document that appears on the source port is sent to the command on “standard input”. The step is declared to allow a sequence of documents to appear on source, but this is only so that the source can be an empty sequence. If an empty sequence appears on the source port, nothing is passed to standard input. It is an error to send more than one document to the source port.

There's no standard API for passing XML documents to external commands, so they are sent as text. If the source-is-xml option is “true”, then the source document is serialized with the serialization options specified and that text is sent to the command. If the source-is-xml option is “false”, the XPath string-value of the document is passed.
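A minimal sketch of the second case, assuming a Unix-like system where /bin/cat is available (as in the examples at the end of this page): with source-is-xml set to “false”, the command should receive only the text content of the input document, not its markup.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            version="1.0">

  <!-- source-is-xml="false": only the string-value of the input document
       is written to standard input. What comes back is then plain text,
       so result-is-xml is set to "false" as well. -->
  <p:exec command="/bin/cat"
          source-is-xml="false"
          result-is-xml="false"/>
</p:pipeline>
```

The serialization options (indent, encoding, and so on) only matter when source-is-xml is “true”, since they control how the document is turned into text.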

The standard output of the command is read and returned on result; the standard error output is read and returned on errors. To ensure that each result is an XML document, each is wrapped in a c:result element.

If result-is-xml is “true”, the standard output of the program is assumed to be XML and will be parsed as a single document. If it is “false”, the output will be returned as escaped text.

If wrap-result-lines is “true”, a c:line element will be wrapped around each line of output.

The same rules apply to the standard error output of the program, with the errors-is-xml and wrap-error-lines options, respectively.
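To make the error-side options concrete, here is a hedged sketch that reads the errors port instead of result. The directory name ../no-such-directory is hypothetical and chosen only so that the command has something to complain about; the exact message depends on the system.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            version="1.0">

  <!-- errors-is-xml keeps its default ("false"), so standard error is
       returned as text; wrap-error-lines puts each line in a c:line. -->
  <p:exec name="ls" command="/bin/ls" result-is-xml="false"
          args="-l ../no-such-directory"
          wrap-error-lines="true">
    <p:input port="source">
      <p:empty/>
    </p:input>
  </p:exec>

  <!-- Discard the primary result so that it is not left unbound. -->
  <p:sink/>

  <!-- Return whatever the command wrote to standard error. -->
  <p:identity>
    <p:input port="source">
      <p:pipe step="ls" port="errors"/>
    </p:input>
  </p:identity>
</p:pipeline>
```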

If either of the results is XML, it is parsed using the same rules that p:document uses.

The exit-status port always returns a single c:result element which contains the system exit status that the process returned. The specific exit status values returned by a process invoked with p:exec are implementation-dependent.

If a failure-threshold value is supplied, and the exit status is greater than that threshold, then the p:exec step will fail. This failure, like any step failure, can be captured with a p:try.
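The following is a sketch of that pattern, not a definitive recipe. It assumes a threshold of 0, so that any exit status greater than zero counts as failure, and it uses a hypothetical directory name and a hypothetical exec-failed element as the fallback document.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            version="1.0">

  <p:try>
    <p:group>
      <!-- With failure-threshold="0", an exit status greater than 0
           makes the step fail. -->
      <p:exec command="/bin/ls" result-is-xml="false"
              args="-l ../no-such-directory"
              failure-threshold="0">
        <p:input port="source">
          <p:empty/>
        </p:input>
      </p:exec>
    </p:group>
    <p:catch>
      <!-- Reached only if the step fails; exec-failed is a made-up
           placeholder element. -->
      <p:identity>
        <p:input port="source">
          <p:inline><exec-failed/></p:inline>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:pipeline>
```

Without the failure-threshold option, the step would be expected to succeed regardless of the exit status, and the pipeline would have to inspect the exit-status port itself.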

Errors

Error Description
err:C0033 Occurs if the command cannot be run.
err:C0034 Occurs if the current working directory cannot be changed to the value of the cwd option.
err:C0063 Occurs if the path-separator option is specified and is not exactly one character long.
err:C0066 Occurs if the arg-separator option is specified and is not exactly one character long.
err:D0006 Occurs if more than one document appears on the source port of the p:exec step.
err:C0035 Occurs if both result-is-xml and wrap-result-lines or errors-is-xml and wrap-error-lines are specified.
err:C0064 Occurs if the exit code from the command is greater than the specified failure-threshold value.

Examples

The optional p:exec step runs external processes.

    <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0">

      <p:exec command="/bin/cat" result-is-xml="true"/>
    </p:pipeline>
Input
<doc>
<p>Some text.</p>
</doc>

Output
<c:result xmlns:c="http://www.w3.org/ns/xproc-step">
<doc>
<p>Some text.</p>
</doc>
</c:result>

Many processes produce non-XML markup, which has to be encoded. In this case, the external process does not expect any input, so we explicitly make the source empty.

    <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0">

      <p:exec command="/bin/ls" result-is-xml="false"
              args="-l ../docs">
        <p:input port="source">
          <p:empty/>
        </p:input>
      </p:exec>
    </p:pipeline>
Output
<c:result xmlns:c="http://www.w3.org/ns/xproc-step">total 72
drwxr-xr-x 4 ndw ndw 136 Apr 5 07:00 apps
drwxr-xr-x 5 ndw ndw 170 Apr 5 07:00 chaps
-rw-r--r-- 1 ndw ndw 93 Mar 31 11:49 document.xml
-rw-r--r-- 1 ndw ndw 1490 Apr 6 08:46 document.xsd
-rw-r--r-- 1 ndw ndw 1421 Apr 6 08:34 document.xsd~
-rw-r--r-- 1 ndw ndw 0 Apr 1 08:55 funny&amp;name.txt
-rw-r--r-- 1 ndw ndw 99 Apr 6 07:53 grammar.rnc
-rw-r--r-- 1 ndw ndw 484 Apr 6 07:47 grammar.rng
-rw-r--r-- 1 ndw ndw 104 Apr 2 12:10 invalid.xml
-rw-r--r-- 1 ndw ndw 200 Mar 30 20:37 main.xml
-rw-r--r-- 1 ndw ndw 551 Apr 6 08:06 rules.sch
-rw-r--r-- 1 ndw ndw 94 Apr 2 12:09 valid.xml
</c:result>

The wrap-result-lines option can make it easier for other XML processes to work with the lines of non-XML output.

    <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0">

      <p:exec command="/bin/ls" result-is-xml="false"
              args="-l ../docs"
              wrap-result-lines="true">
        <p:input port="source">
          <p:empty/>
        </p:input>
      </p:exec>
    </p:pipeline>
Output
<c:result xmlns:c="http://www.w3.org/ns/xproc-step">
<c:line>total 72</c:line>
<c:line>drwxr-xr-x 4 ndw ndw 136 Apr 5 07:00 apps</c:line>
<c:line>drwxr-xr-x 5 ndw ndw 170 Apr 5 07:00 chaps</c:line>
<c:line>-rw-r--r-- 1 ndw ndw 93 Mar 31 11:49 document.xml</c:line>
<c:line>-rw-r--r-- 1 ndw ndw 1490 Apr 6 08:46 document.xsd</c:line>
<c:line>-rw-r--r-- 1 ndw ndw 1421 Apr 6 08:34 document.xsd~</c:line>
<c:line>-rw-r--r-- 1 ndw ndw 0 Apr 1 08:55 funny&amp;name.txt</c:line>
<c:line>-rw-r--r-- 1 ndw ndw 99 Apr 6 07:53 grammar.rnc</c:line>
<c:line>-rw-r--r-- 1 ndw ndw 484 Apr 6 07:47 grammar.rng</c:line>
<c:line>-rw-r--r-- 1 ndw ndw 104 Apr 2 12:10 invalid.xml</c:line>
<c:line>-rw-r--r-- 1 ndw ndw 200 Mar 30 20:37 main.xml</c:line>
<c:line>-rw-r--r-- 1 ndw ndw 551 Apr 6 08:06 rules.sch</c:line>
<c:line>-rw-r--r-- 1 ndw ndw 94 Apr 2 12:09 valid.xml</c:line>
</c:result>

The exit status is available through the non-primary “exit-status” port.

    <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0">

      <p:exec name="ls" command="/bin/ls" result-is-xml="false"
              args="-l ../docs"
              wrap-result-lines="true">
        <p:input port="source">
          <p:empty/>
        </p:input>
      </p:exec>

      <p:sink/>

      <p:identity>
        <p:input port="source">
          <p:pipe step="ls" port="exit-status"/>
        </p:input>
      </p:identity>
    </p:pipeline>
Output
<c:result xmlns:c="http://www.w3.org/ns/xproc-step">0</c:result>

Because it's an error to leave a primary output port unbound, in this example we explicitly discard it with p:sink. In a real pipeline, we'd probably be doing something useful with the output.