Previous Up Next

Chapter 2. Pipeline Terminology

Table of Contents

1. Primary ports
2. The default readable port
3. Errors and Exceptions

In this chapter, we'll survey the main syntactic elements of an XProc pipeline. By their nature, pipelines describe a series of steps and the connections between those steps. As a consequence, many of the important parts of a pipeline are directly related to other parts. This makes it difficult to describe any part of a pipeline without mentioning other, not yet explained, parts. We'll do our best. Press on and you'll find all the parts explained.

A step is a unit of work: it performs an XSLT transformation, or validates a document, or issues an HTTP request, or computes an average bank balance, or translates French into German.

Most steps accept some input, process it in some way, and produce some output. You can think of a step as a black box; XML documents (and possibly options and parameters) enter the box, (usually) different XML documents come out. Even if you know what happens inside the box, other sibling steps, steps from which the inputs came or to which the outputs go, do not.

Different kinds of steps are distinguished by their type; a p:xslt-type step is different from a p:http-request-type step is different from a my:compute-balance-type step. Generally, when we're talking about steps we talk about them by type and leave out the “-type” part: a p:xslt step, a p:http-request, etc. Step types are identified with QNames and are globally unique.

When we need to distinguish a particular instance of a step, we can give it a name. Connections between steps are done by name, we read the output of the “style” step or the output of the “get-form” step. Step names must be XML Names (no colons, no spaces, etc.). Names are scoped, but they are not globally unique.

Graphically, a step looks something like Figure 2.1, “A step”.

Figure 2.1. A step

This “StepType”-type step named “Name” has two input ports: source and alternate, one output port: result. It has two options: Option and Another_option, and accepts parameters.

Steps have “ports” into which “pipes” are connected. A pipe connects the output port of one step to the input port of another. The type, name, and number of ports on each type of step is fixed and immutable.

The only thing that can flow through a pipe is an XML document. Each input and output port indicates if it can accept or produce a sequence of documents. If a port does not accept a sequence, then it must accept or produce exactly one document.

Consider the pipeline in Figure 2.2, “A pipeline”.

Figure 2.2. A pipeline

This pipeline has four steps named: A, B, C, and D. The fact that step C is implemented as a pipeline of two additional steps is not visible to the steps outside of C. In this pipeline, the pipes are A-to-B, B-to-C, C-to-D, and A-to-D.

It would be more precise to say that B reads from A, C reads from B, and D reads from A and C. Conceptually, the following two statements amount to the same thing: “the result output port of step A is connected to the source input port of step B” or “the source input port of step B is connected to the result output port of step A”. The only difference is perspective; in the former statement, the perspective is from step A, connecting one of its outputs to one of the inputs of step B; in the latter statement, the perspective is from step B, connecting one of its input ports to one of the output ports of step A.

All connections in XProc are written from the perspective of the port which is accepting input: we always, and can only, say that step B reads from step A; we can't say that step A writes to step B. This was a somewhat arbitrary, but conscious decision. Given two different but equivalent ways to say the same thing, XProc requires that only one way be used. This reduces the insignificant variation in pipelines and hopefully reduces confusion as well.

As we'll see in Chapter 3, Steps and Connections, input ports can be connected not only to other steps, but also to the outside world through URIs. Well, technically, IRIs. XProc adopts the common convention that authors can write IRIs in places where we use resource identifiers to point to documents on the web. If all of your URIs can be represented in US ASCII, this distinction is irrelevant and can be ignored. If you have identifiers that contain characters from a broader repertoire, this means you can write them just the way you'd expect.

1. Primary ports

If you think of steps as performing units of work, it's often obvious that one input and one output port are “primary”. For example, the schema validation steps have two inputs: one for the document to be validated and one for the set of schemas to use in validation. The port that reads the documents to be validated is, in some sense, the important one. Similarly, although the XSLT step produces both primary and secondary outputs, the primary output is the important one.

XProc identifies at most one input port and at most one output port as “primary”. Default connections always involve at least one primary port.

The primary attribute on a p:input or p:output element in a step declaration marks a port as either primary or non-primary. If there is only one input or output port, then it is primary unless it is explicitly marked as non-primary. If there is more than one input or output port, then only the one explicitly marked as primary is primary.

In other words, if there's a single input or output port, then it is primary by default. If there's more than one input or output port, then none are primary unless one is explicitly identified as primary. It's an error if more than one is identified as primary.

2. The default readable port

Primary ports come into play when considering default connections. In the context of a pipeline, there may be a “default readable port” associated with each step. In the ordinary case, if a step has inputs that are not explicitly connected, then they will be connected to the default readable port associated with that step.

The default readable port for the first step in a subpipline is its parent's primary input port. If its parent doesn't have a primary input port, then the default readable port is undefined.

The default readable port for each step after the first in a subpipeline is the default output port of its preceding sibling. If its preceding sibling has no primary output port, then the default readable port is undefined.

If there is no default readable port, then no inputs will be connected by default and the pipeline author must make explicit connections for all the inputs.

3. Errors and Exceptions

Two kinds of errors can occur in an XProc pipeline: static errors and dynamic errors. A static error is something that is independent of any of the pipeline's inputs and will be detected by the processor before it even starts running. Leaving an input port unconnected is a static error, so is an attempt to use a step type for which there is no declaration. Static errors are like “compile time” errors in traditional programming languages, a pipeline with a static error isn't a valid pipeline and cannot be run.

Dynamic errors occur when a pipeline encounters an invalid or unexpected condition while it's running. If more than one document is written to a port that only accepts a single document, that's a dynamic error. Many steps can cause a dynamic error to occur; informally we speak of steps failing or throwing an exception. It all amounts to the same thing: something has gone wrong and the step that caused or detected the error will not run or finish running. Steps that fail produce no output, though they may have side effects (such as writing to the filesystem or sending an HTTP POST).

A pipeline that contains a step that fails will also fail. Errors propagate up through any intervening steps until either a p:try step is encountered or the top level step fails, at which point the entire pipeline fails and stops running.

A p:try step catches dynamic errors. If a dynamic error occurs in a p:try, then its p:catch pipeline is run and the output of the step is whatever the p:catch produces. Of course, the p:catch can fail, in which case the “upward cascade” of failure continues to the next p:try or the collapse of the whole pipeline.

The reference pages for the p:try step outline the syntax and mechanics of exception catching in more detail.