Chapter 2. Pipeline Terminology
In this chapter, we'll survey the main syntactic elements of an
XProc pipeline. By their nature, pipelines describe a series of steps
and the connections between those steps. As a
consequence, many of the important parts of a pipeline are directly
related to other parts. This makes it difficult to describe any part
of a pipeline without mentioning other, not yet explained, parts.
We'll do our best. Press on and you'll find all the parts
explained.
A step is a unit of work: it performs an XSLT transformation, or
validates a document, or issues an HTTP request, or computes an average
bank balance, or translates French into German.
Most steps accept some input, process it in some way, and produce
some output. You can think of a step as a black box: XML documents (and
possibly options and parameters) go into the box, and (usually) different
XML documents come out. Even if you know what happens inside the box,
the sibling steps around it (the steps its inputs come from and the steps its
outputs go to) do not.
Different kinds of steps are distinguished by their
type; a p:xslt-type step is
different from a p:http-request-type step, and both are different from
a my:compute-balance-type step. Generally, when we're talking
about steps we talk about them by type and leave out the “-type” part:
a p:xslt step, a p:http-request, etc.
Step types are identified with QNames and are globally unique.
When we need to distinguish a particular instance of a step, we
can give it a name. Connections between steps are made by name: we
read the output of the “style” step or the output
of the “get-form” step.
Step names must be NCNames: XML names with no colons (and no
spaces, etc.). Names are scoped, but they are not globally unique.
Graphically, a step looks something like Figure 2.1, “A step”.
This “StepType”-type step named “Name”
has two input ports, source and alternate,
and one output port, result. It has
two options, Option and Another_option,
and it accepts parameters.
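To make the parts of Figure 2.1 concrete, here is a minimal Python sketch that records a step's type, name, ports, options, and parameter support as plain data. It is a toy model, not an XProc API: the class and field names are hypothetical, and the step type is written as a QName in Clark notation ({namespace}local-name) using an assumed example namespace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Toy description of one step instance (a hypothetical model, not an XProc API)."""

    type: str                      # step type as a QName in Clark notation: "{namespace}local"
    name: str                      # instance name, used when other steps connect to this one
    inputs: tuple[str, ...]        # input port names; fixed by the step type
    outputs: tuple[str, ...]       # output port names; fixed by the step type
    options: tuple[str, ...] = ()  # option names
    accepts_parameters: bool = False


# The step drawn in Figure 2.1. The namespace URI is an assumption for illustration.
figure_2_1 = Step(
    type="{http://example.com/ns/figure}StepType",
    name="Name",
    inputs=("source", "alternate"),
    outputs=("result",),
    options=("Option", "Another_option"),
    accepts_parameters=True,
)

print(f"{figure_2_1.name}: {len(figure_2_1.inputs)} inputs, {len(figure_2_1.outputs)} output")
```

The frozen dataclass reflects the point that a step type's ports are fixed: once described, the ports cannot be changed.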
Steps have “ports” into which “pipes” are connected. A pipe
connects the output port of one step to the input port of another. The
type, name, and number of ports on each type of step are fixed and
immutable.
The only thing that can flow through a pipe is an XML document.
Each input and output port indicates whether it can accept or produce a
sequence of documents. A port that does not accept or produce a sequence
must accept or produce exactly one document.
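The sequence rule amounts to a cardinality check. The sketch below models a port as a name plus a `sequence` flag (mirroring the `sequence` attribute on XProc port declarations) and raises an error when a non-sequence port sees anything other than exactly one document. The class and function names are hypothetical, and documents are represented as plain strings for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    sequence: bool = False  # True if the port accepts/produces a sequence of documents


class CardinalityError(Exception):
    """Raised when a non-sequence port does not see exactly one document."""


def check_documents(port: Port, documents: list[str]) -> list[str]:
    """Return the documents unchanged if their count is legal for this port."""
    # A sequence port accepts any number of documents, including none.
    if port.sequence:
        return documents
    # A non-sequence port must accept or produce exactly one document.
    if len(documents) != 1:
        raise CardinalityError(
            f"port {port.name!r} expects exactly one document, got {len(documents)}"
        )
    return documents


check_documents(Port("source", sequence=True), ["<a/>", "<b/>"])  # allowed
check_documents(Port("stylesheet"), ["<xsl:stylesheet/>"])          # allowed
try:
    check_documents(Port("stylesheet"), ["<a/>", "<b/>"])
except CardinalityError as err:
    print(err)
```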
Consider the pipeline
in Figure 2.2, “A pipeline”.
This pipeline has four steps, named A, B, C, and D. The fact that step
C is implemented as a pipeline of two additional steps is
not visible to the steps outside of C.
In this pipeline, the pipes are A-to-B, B-to-C, C-to-D, and A-to-D.
It would be more precise to say that B reads from A, C reads from B,
and D reads from A and C. Conceptually, the following two statements
amount to the same thing: “the result output port of step A
is connected to the source input port of step B” or
“the source input port of step B is connected to the result
output port of step A”. The only difference is perspective; in the former statement,
the perspective is from step A, connecting one of its outputs to one of the inputs
of step B; in the latter statement, the perspective is from step B, connecting one
of its input ports to one of the output ports of step A.
All connections in XProc are written
from the perspective of the port which is accepting input: we always,
and can only, say that step B reads from step A; we can't say that
step A writes to step B. This was a somewhat arbitrary but conscious
decision. Given two different but equivalent ways to say the same
thing, XProc requires that only one way be used. This reduces the
insignificant variation in pipelines and hopefully reduces confusion
as well.
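Because every connection is written from the reading side, a pipeline's wiring can be recorded as a mapping from each input port to the output port it reads. The following sketch encodes Figure 2.2 that way. The port names (`source`, `result`, and a second input on D called `alternate`) are assumptions; only the step names come from the figure. Inverting the mapping recovers the equivalent writer's-eye view when you need it.

```python
from collections import defaultdict

# Wiring of Figure 2.2, written the way XProc writes it: from the reading side.
# Key: (reading step, input port). Value: (step read from, output port).
# The port names are hypothetical; only the step names come from the figure.
connections: dict[tuple[str, str], tuple[str, str]] = {
    ("B", "source"): ("A", "result"),
    ("C", "source"): ("B", "result"),
    ("D", "source"): ("C", "result"),
    ("D", "alternate"): ("A", "result"),
}


def reads_from(step: str) -> set[str]:
    """Steps that `step` reads from (the only direction XProc lets you write)."""
    return {writer for (reader, _), (writer, _) in connections.items() if reader == step}


def writes_to() -> dict[str, set[str]]:
    """Invert the wiring to recover the equivalent writer's-eye view."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for (reader, _), (writer, _) in connections.items():
        inverted[writer].add(reader)
    return dict(inverted)


print(sorted(reads_from("D")))   # ['A', 'C']
print(sorted(writes_to()["A"]))  # ['B', 'D']
```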
As we'll see in Chapter 3, Steps and Connections, input ports can be connected not
only to other steps, but also to the outside world through URIs. Well, technically,
IRIs. XProc adopts the common convention that authors can write IRIs
wherever resource identifiers are used to point to documents on the web. If all
of your identifiers can be represented in US-ASCII, this distinction is
irrelevant and can be ignored. If you have identifiers that contain characters
outside US-ASCII, you can write them just the way you'd expect.
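Under the hood, an IRI can be mapped to an ASCII-only URI by percent-encoding the UTF-8 bytes of each non-ASCII character, which is the mapping RFC 3987 defines. Python's standard `urllib.parse.quote` can perform that encoding when told to leave the URI's reserved characters alone. This sketch illustrates the idea only; it is not a description of what any particular XProc processor does, and it ignores hostnames, which are usually converted with IDNA instead.

```python
from urllib.parse import quote

# Characters that already have meaning in a URI (the RFC 3986 reserved set) plus '%',
# so existing escapes and the URI's structure are left untouched.
_SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8; a valid ASCII URI is left unchanged."""
    return quote(iri, safe=_SAFE)


print(iri_to_uri("http://example.com/docs/résumé.xml"))
# http://example.com/docs/r%C3%A9sum%C3%A9.xml
```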
1. Primary ports
If you think of steps as performing units of work, it's often
obvious that one input and one output port are “primary”. For example,
the schema validation steps have two inputs: one for the document to
be validated and one for the set of schemas to use in validation. The port
that reads the documents to be validated is, in some sense, the important
one. Similarly, although the XSLT step produces both primary and secondary
outputs, the primary output is the important one.
XProc identifies at most one input port and at most one output
port as “primary”. Default connections always involve at least one
primary port.
The primary attribute on a p:input
or p:output element in a step declaration marks a port as either
primary or non-primary. If a step declares only one input port, then it
is primary unless it is explicitly marked as non-primary. If there is more than
one input port, then only the one explicitly marked as primary
is primary. The same rules apply, independently, to output ports.
In other words, if there's a single input or output port, then it is
primary by default. If there's more than one input or output port, then none
are primary unless one is explicitly identified as primary. It's an error
if more than one is identified as primary.
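The primary-port rule is mechanical enough to write down as a function. This sketch takes the port declarations of one kind (all inputs, or all outputs) as (name, primary) pairs, where `primary` is `True`, `False`, or `None` for an omitted `primary` attribute. The function and exception names are hypothetical.

```python
from typing import Optional


class StaticError(Exception):
    """Raised for declarations that can never form a valid pipeline."""


def primary_port(ports: list[tuple[str, Optional[bool]]]) -> Optional[str]:
    """Return the name of the primary port among `ports`, or None if there is none.

    Each entry is (port name, value of the primary attribute), with None meaning
    the attribute was omitted. Apply this separately to inputs and to outputs.
    """
    marked = [name for name, primary in ports if primary is True]
    if len(marked) > 1:
        raise StaticError(f"more than one port marked primary: {marked}")
    if len(ports) == 1:
        name, primary = ports[0]
        # A lone port is primary unless explicitly marked non-primary.
        return None if primary is False else name
    # With several ports, only an explicitly marked port is primary.
    return marked[0] if marked else None


print(primary_port([("source", None), ("schema", None)]))  # None
print(primary_port([("source", True), ("schema", None)]))  # source
```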
2. The default readable port
Primary ports come into play when considering default connections.
In the context of a pipeline, there may be a “default readable port” associated
with each step. In the ordinary case, if a step has inputs that are not
explicitly connected, then they will be connected to the default readable
port associated with that step.
The default readable port for the first step in a subpipeline is its
parent's primary input port. If its parent doesn't have a primary input port,
then the default readable port is undefined.
The default readable port for each step after the first in a
subpipeline is the primary output port of its preceding
sibling. If its preceding sibling has no primary output port, then the
default readable port is undefined.
If there is no default readable port, then no inputs will be connected
by default and the pipeline author must make explicit connections for all
the inputs.
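The two rules for the default readable port can be sketched as a single walk over a subpipeline. Here each step is summarized by its name and the name of its primary output port (or `None`), and the parent contributes its primary input port (or `None`). The result maps each step to the (step, port) pair it would read by default, or `None` when the default readable port is undefined. The representation and names are hypothetical, and the sketch ignores details such as compound steps.

```python
from typing import Optional

PortRef = tuple[str, str]  # (step name, port name)


def default_readable_ports(
    parent: str,
    parent_primary_input: Optional[str],
    steps: list[tuple[str, Optional[str]]],
) -> dict[str, Optional[PortRef]]:
    """Map each step in a subpipeline to its default readable port (or None)."""
    result: dict[str, Optional[PortRef]] = {}
    # The first step reads the parent's primary input port, if it has one.
    current: Optional[PortRef] = (
        (parent, parent_primary_input) if parent_primary_input else None
    )
    for name, primary_output in steps:
        result[name] = current
        # Each later step reads its preceding sibling's primary output, if any.
        current = (name, primary_output) if primary_output else None
    return result


# Hypothetical subpipeline: step B has no primary output, so C has no default.
drp = default_readable_ports("main", "source", [("A", "result"), ("B", None), ("C", "result")])
print(drp)
```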
3. Errors and Exceptions
Two kinds of errors can occur in an XProc pipeline: static
errors and dynamic errors. A static error is something that is
independent of any of the pipeline's inputs and will be detected by
the processor before it even starts running. Leaving an input port unconnected
is a static error, as is an attempt to use a step type for which there is
no declaration. Static errors are like “compile time” errors in traditional
programming languages: a pipeline with a static error isn't a valid pipeline
and cannot be run.
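A processor can find unconnected inputs before running anything, which is what makes this a static error. The sketch below, with hypothetical names, checks each of a step's input ports. Following the rule that default connections always involve a primary port, an input is acceptable if it is explicitly connected, or if it is the primary input and a default readable port exists; anything else is reported as a static error.

```python
from typing import Optional


class StaticError(Exception):
    """Detected before the pipeline runs; the pipeline cannot be run at all."""


def check_inputs(
    step: str,
    inputs: list[str],
    primary_input: Optional[str],
    explicit: set[str],
    default_readable: Optional[tuple[str, str]],
) -> dict[str, str]:
    """Report how each input of `step` gets connected, or raise StaticError."""
    how: dict[str, str] = {}
    for port in inputs:
        if port in explicit:
            how[port] = "explicit"
        elif port == primary_input and default_readable is not None:
            how[port] = f"default: reads {default_readable[0]}/{default_readable[1]}"
        else:
            raise StaticError(f"input {port!r} of step {step!r} is not connected")
    return how


# A hypothetical step named "style" whose stylesheet input is connected explicitly.
print(check_inputs("style", ["source", "stylesheet"], "source", {"stylesheet"}, ("get-form", "result")))
```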
Dynamic errors occur when a pipeline encounters an invalid or
unexpected condition while it's running. If more than one document is written
to a port that accepts only a single document, that's a dynamic error.
Many steps can cause a dynamic error to occur; informally we speak of
steps failing or throwing an exception. It all amounts to the same thing:
something has gone wrong, and the step that caused or detected the error
will not run, or will not finish running. Steps that fail produce no output, though
they may have side effects (such as writing to the filesystem or sending an
HTTP POST).
A pipeline that contains a step that fails will also fail.
Errors propagate up through any intervening steps until either a
p:try step is encountered or the top-level step fails, at
which point the entire pipeline fails and stops running.
A p:try step catches dynamic errors. If a dynamic
error occurs in a p:try, then its p:catch
pipeline is run and the output of the step is whatever the
p:catch produces. Of course, the p:catch can
fail, in which case the “upward cascade” of failure continues to the
next enclosing p:try or, if there is none, to the collapse of the whole pipeline.
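In a conventional programming language, the same control flow looks like try/except. This Python sketch is an analogy only: `body` and `catch` stand in for the pipeline inside a p:try and its p:catch, and `DynamicError` and `p_try` are hypothetical names. As in XProc, the result is either the body's output or whatever the catch branch produces, and a failing catch branch passes the error on to the next enclosing try.

```python
from typing import Callable


class DynamicError(Exception):
    """An error raised while the pipeline is running."""


Docs = list[str]


def p_try(body: Callable[[Docs], Docs], catch: Callable[[DynamicError], Docs], docs: Docs) -> Docs:
    """Run `body`; if it fails with a dynamic error, the output is whatever `catch` produces."""
    try:
        return body(docs)
    except DynamicError as err:
        # The failed body contributes no output. If catch itself raises,
        # the error keeps propagating to the next enclosing p_try (or the top).
        return catch(err)


def validate(docs: Docs) -> Docs:
    raise DynamicError("document is not valid")


def report(err: DynamicError) -> Docs:
    return [f"<failed>{err}</failed>"]


print(p_try(validate, report, ["<doc/>"]))  # ['<failed>document is not valid</failed>']
```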
The reference pages for the p:try step outline the
syntax and mechanics of exception catching in more detail.