Pecia: Towards a Fluent Interface for Building Documents

25 Jun, 2009

1. Introduction

There is a chance that – after having read this article – you conclude that nobody in a sane state of mind would ever use what this article is going to suggest. Let me therefore start with a disclaimer: I have never made any public claims regarding my state of mind.
Apart from that, I figure an article about a technology almost nobody is using is still way more interesting than an article about a technology everyone is using. In fact, I guess the more senior you are, and the more things you have already seen, the less likely it is that you will be interested in something people have already done many times before. Based on that, you might as well say that the most experienced people around will probably be interested in stuff that nobody is using. This article is for those people.
With that out of the way: Pecia is a new way of generating documentation from your Java applications. You will probably wonder why we need yet another way of generating documents from Java, and I have to admit that the Java world is not in bad shape when it comes to the number of frameworks allowing you to generate documents. However, Pecia takes another stab at it, and I just had to see if it would work. You be the judge whether it makes sense.

2. Background

I clearly remember the day on which I spent more time than I am willing to admit finding out why my Maven report was not producing well-formed HTML. Definitely not one of my finest moments.
Maven reports are built using an API called Doxia. It’s not a general purpose template-based text generating mechanism, like Velocity, FreeMarker or StringTemplate. In fact, there is no template at all. Instead, Doxia provides an API for building a document, abstracting the final representation of that document.
In a way, the most important interface in Doxia is the Sink interface. The Sink interface is basically the builder interface. It has operations to start the document body, to start a section, to generate text, to start a table, a table row, a table cell, and many other document chunks, and then a slew of operations to finish all of those. Example 1, “Doxia Sink usage” shows an excerpt of some of my code using Doxia.

Example 1. Doxia Sink usage

[java]
// In Doxia, the operation that closes an element carries a trailing underscore.
sink.body();
sink.sectionTitle1();
sink.text("Message Catalog");
sink.sectionTitle1_();
sink.table();
sink.tableRow();
sink.tableHeaderCell();
sink.text("Type");
sink.tableHeaderCell_();
sink.tableHeaderCell();
sink.text("Identifier");
sink.tableHeaderCell_();
sink.tableHeaderCell();
sink.text("Message");
sink.tableHeaderCell_();
sink.tableRow_();

[/java]

Unmistakably, there is a correspondence between the Sink interface and a subset of the HTML content model. However, HTML is not the only type of document that can be generated from Doxia. In fact, by abstracting the interface from the implementation, Doxia is capable of generating any type of output document. Because of that, it generates PDF just as easily as HTML.
Now, there is one limitation in the Doxia approach: you are never really sure if your code is building a valid document. The API does not prevent you from adding an image to a table row, or inserting text in a table row outside of a table cell. And since Doxia aims to abstract you from the target representation, it’s quite hard to make any assumptions about what is or is not considered to be valid.
This was in fact the reason why I spent so much time completing my Maven report. It turned out I was ‘building’ the wrong type of document elements at the wrong time. The API did not prevent me from doing it, and the framework did not warn me at runtime. Which made me wonder…
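To make the problem concrete, here is a minimal sketch of a flat builder in the same style. FlatSink and HtmlFlatSink are hypothetical stand-ins that I made up for illustration; they are not the real Doxia types. The point is only that every operation is available at every moment, so the compiler has nothing to object to.

```java
// Hypothetical stand-in for a flat, Doxia-style builder. It is NOT the real
// Sink interface, just enough of the same shape to show the problem.
interface FlatSink {
    void table();
    void tableRow();
    void tableCell();
    void text(String text);
    void tableCell_();
    void tableRow_();
    void table_();
}

// Writes HTML straight to a buffer and never checks where it is in the document.
class HtmlFlatSink implements FlatSink {
    private final StringBuilder out = new StringBuilder();

    @Override public void table()      { out.append("<table>"); }
    @Override public void tableRow()   { out.append("<tr>"); }
    @Override public void tableCell()  { out.append("<td>"); }
    @Override public void text(String text) { out.append(text); }
    @Override public void tableCell_() { out.append("</td>"); }
    @Override public void tableRow_()  { out.append("</tr>"); }
    @Override public void table_()     { out.append("</table>"); }

    String result() { return out.toString(); }
}

class FlatSinkDemo {
    // Compiles and runs without complaint, yet puts text directly inside a row.
    static String misuse() {
        HtmlFlatSink sink = new HtmlFlatSink();
        sink.table();
        sink.tableRow();
        sink.text("oops");   // no tableCell() around it, and nothing objects
        sink.tableRow_();
        sink.table_();
        return sink.result();
    }

    public static void main(String[] args) {
        System.out.println(misuse());
    }
}
```

The result is a row with bare text in it, and the first one to notice is whoever opens the generated document.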

3. Pecia

It made me wonder if it would not be possible to enforce validation at compile time. Would it not be much nicer to have the API prevent me from adding images to table rows, or – for instance – from adding a fourth table cell to a three-column table? Would it not be much nicer if the API would prevent me from making any mistakes like these? And would it not be great if the API would be a fluent API[]?
From my perspective, the answer to all of these questions was: yes, that would be much nicer. It would allow me to spot errors quickly, and moreover, it would make my IDE’s code completion assistance actually become valuable. With Doxia’s Sink interface, your IDE will offer you the choice of adding images to a table row. In a framework that enforces proper document structure through its API, your IDE would never consider that to be a viable option.

4. Pecia API Principles

4.1. Context-based

The API offered by Pecia depends on your context. This is probably best explained with an example.

Example 2. Pecia API Usage

Article article = ...;
ItemizedList list = article.itemizedList();
list.item("first item");
list.item("second item");


So what happens in the example above? Well, first you create the target document. More on that later. From that point on, you can add content to the Article. Once you add an itemized list, the API returns an ItemizedList instance, an object representation of that new context. If you want to add content to the itemized list, you need to invoke operations on the ItemizedList instance. In this case, the example adds two items to the itemized list.
Just like Doxia, Pecia is backed by a number of implementations of the API. At this stage, it supports both DocBook and HTML output. For the sample code above, a simple HTML implementation would generate output like this:

<html>
  <body>
    <ul>
      <li>first item</li>
      <li>second item</li>
    </ul>
  </body>
</html>

4.2. Method Chaining

Now, the example given above is not really representative of the way you would write code with Pecia. With method chaining, you can create the document without declaring variables to hold every intermediate content model element, and get an interface that bears a greater similarity to the way you would normally create documents in markup languages such as HTML or DocBook. Let us just say, a more fluent interface. So instead of writing the code given in the previous section, you can write code like this:

Example 3. Method chaining

Article article = ...;
article.itemizedList()
  .item("first item")
  .item("second item");
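If you wonder how an implementation could hand out these contexts, the following is a minimal sketch under my own assumptions. The two interfaces are cut-down, hypothetical versions of Article and ItemizedList, and InMemoryArticle is a made-up class that collects everything in memory before rendering; it is not how the standard Pecia implementation works.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down versions of the interfaces used in the examples above.
interface ItemizedList {
    ItemizedList item(String text);   // returns the same context, which enables chaining
}

interface Article {
    ItemizedList itemizedList();      // returns a new, narrower context
}

// Made-up implementation that keeps the lists in memory and renders them on demand.
class InMemoryArticle implements Article {
    private final List<List<String>> lists = new ArrayList<>();

    @Override
    public ItemizedList itemizedList() {
        final List<String> items = new ArrayList<>();
        lists.add(items);
        // The object handed back only knows how to add items, nothing else.
        return new ItemizedList() {
            @Override
            public ItemizedList item(String text) {
                items.add(text);
                return this;
            }
        };
    }

    String toHtml() {
        StringBuilder out = new StringBuilder("<html><body>");
        for (List<String> items : lists) {
            out.append("<ul>");
            for (String item : items) {
                out.append("<li>").append(item).append("</li>");
            }
            out.append("</ul>");
        }
        return out.append("</body></html>").toString();
    }
}
```

Both the variable-based style and the chained style work against this sketch, because item() simply returns the context it was called on.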

 

4.3. Shorthand Notations

This is another sample document:

Example 4. Mixed ‘Expanded’ and Shorthand Notations

article
  .author("Wilfred Springer")
  .copyright("agilejava.com", 2008)
  .para()
    .text("This is the ")
    .emphasis("first")
    .text(" paragraph.")
  .end()
  .para("And this is the second.")
.end()


Which will generate something similar to this:

<html>
  <body>
    <p>This is the <em>first</em> paragraph.</p>
    <p>And this is the second.</p>
  </body>
</html>

The important principle illustrated here is that Pecia has both shorthand notations and more verbose notations for specifying content. The simple para(String) operation (illustrated by the second paragraph in the example) starts the paragraph, adds text to it and closes the paragraph. So, it basically expands to this:

.para()
  .text("And this is the second.")
.end()

The principle does not only apply to paragraphs. It also applies to other document elements, such as list items, table cells and footnotes. In all of these cases, you can add that document element using a simple operation accepting a String with the text to be embedded within that document element, or by calling an operation without any arguments, which will change the context into the context of that document element.
Let’s take an API snippet as an example. Example 5, “Pecia API Snippet” shows the signature of some operations on Para, the interface implemented by paragraphs. As you can see, there are two different footnote operations. They are different in a number of ways.
First of all, the first one takes a String argument, and the second does not. The first operation will create a footnote, add a paragraph, and add text to the paragraph in a single call. Once it is done, the entire footnote is considered to be done. The context is no longer the footnote that was just added to the paragraph. The context is – again – the paragraph itself.

Example 5. Pecia API Snippet

interface Para<T> {
  ...
  Para<T> footnote(String text);
  Footnote<? extends Para<T>> footnote();
  ...
}


The other footnote operation does not take a String argument. The API assumes you are not interested in adding an empty footnote (why would you?), and changes the current context into the context of the footnote. From that point on, you can only invoke operations defined by the Footnote interface, until you finally consider yourself to be done with the footnote and call its end() operation, which will restore the original context.
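The trick that makes end() restore the original context is the type parameter: every context remembers the type of the context it was opened in. Below is a minimal sketch of that idea. The signatures are simplified compared to the snippet above (no wildcards), and SimplePara and Doc are hypothetical classes that emit DocBook-like markup; none of this is taken from Pecia's source.

```java
// Hypothetical sketch. The names mirror the article, but the signatures are simplified.
interface Footnote<P> {
    Footnote<P> para(String text);
    P end();                          // hands back the context the footnote was opened in
}

interface Para<P> {
    Para<P> text(String text);
    Para<P> footnote(String text);    // shorthand: the context stays the paragraph
    Footnote<Para<P>> footnote();     // expanded: the context becomes the footnote
    P end();
}

class SimplePara<P> implements Para<P> {
    private final P parent;
    private final StringBuilder out;

    SimplePara(P parent, StringBuilder out) {
        this.parent = parent;
        this.out = out;
        out.append("<para>");
    }

    @Override public Para<P> text(String text) { out.append(text); return this; }

    // The shorthand is just the expanded form, opened and closed in one go.
    @Override public Para<P> footnote(String text) { return footnote().para(text).end(); }

    @Override public Footnote<Para<P>> footnote() {
        out.append("<footnote>");
        final Para<P> paragraph = this;
        return new Footnote<Para<P>>() {
            @Override public Footnote<Para<P>> para(String text) {
                out.append("<para>").append(text).append("</para>");
                return this;
            }
            @Override public Para<P> end() {
                out.append("</footnote>");
                return paragraph;     // restore the paragraph context
            }
        };
    }

    @Override public P end() { out.append("</para>"); return parent; }
}

// Made-up root context, only here to have something to return to.
class Doc {
    final StringBuilder out = new StringBuilder();
    Para<Doc> para() { return new SimplePara<>(this, out); }
}
```

With this in place, doc.para().footnote().para("A note.").end() is typed as Para<Doc> again, so text() is available after it, while calling text() on the footnote itself does not compile.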

4.4. Tables

Tables deserve some special attention. In order to preserve a valid document structure, you not only want to restrict table cells to table rows; you also need to make sure that every row contains exactly the same number of cells.
Enforcing this property of tables proved to be challenging. Before going into details, let us first look at an example:

Example 6. A table in Pecia

article.table2Cols()
  .header()
    .entry().para("col1")
    .entry().para("col2")
  .end()
  .row()
    .entry().para("foo")
    .entry().para("bar")
  .end()
  .row()
    .entry().para("foo")
    .entry().para("bar")
  .end()
.end();


Example 6, “A table in Pecia” illustrates how to build a table that more or less corresponds to this HTML table:

<table>
  <tr>
    <th><p>col1</p></th>
    <th><p>col2</p></th>
  </tr>
  <tr>
    <td><p>foo</p></td>
    <td><p>bar</p></td>
  </tr>
  <tr>
    <td><p>foo</p></td>
    <td><p>bar</p></td>
  </tr>
</table>

So what exactly is happening in Example 6, “A table in Pecia”? Well, first the table2Cols() operation constructs a table of two columns. The object getting created will allow only operations on tables of two columns.
The first thing we do after that is add a table header, by calling header() on the table. Since it is a two-column table, the header accepts exactly two cells. Any attempt to add more or fewer than two cells will give compilation errors.
Every table cell is constructed by calling entry(). The resulting context is a table cell. There are a number of things you can add to a table cell, such as paragraphs. Once you are done with the cell, you either call entry() or end(). Calling entry() will create the next table cell. Calling end() will mark the end of the current table header. And because of the way Pecia has been constructed, you can only call end() after the last table cell, and only call entry() before the last table cell.
Table rows are added in exactly the same way as table headers; only in this case you call row() instead of header().
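One way to get the compiler to count cells is to give every cell position its own type. The sketch below does that for a two-column table. The interface names (Table2, Row2, FirstCell, LastCell) and the HtmlTable2 class are hypothetical and much simpler than whatever Pecia uses internally; they only illustrate the technique.

```java
// Hypothetical sketch of a two-column table; one interface per cell position.
interface Table2    { Row2 header(); Row2 row(); String end(); }
interface Row2      { FirstCell entry(); }
interface FirstCell { FirstCell para(String text); LastCell entry(); } // no end() here
interface LastCell  { LastCell para(String text); Table2 end(); }      // no entry() here

class HtmlTable2 implements Table2 {
    private final StringBuilder out = new StringBuilder("<table>");

    @Override public Row2 header() { return startRow("th"); }
    @Override public Row2 row()    { return startRow("td"); }
    @Override public String end()  { return out.append("</table>").toString(); }

    private Row2 startRow(final String tag) {
        out.append("<tr>");
        return new Row2() {
            @Override public FirstCell entry() { return firstCell(tag); }
        };
    }

    private FirstCell firstCell(final String tag) {
        out.append("<").append(tag).append(">");
        return new FirstCell() {
            @Override public FirstCell para(String text) { paragraph(text); return this; }
            @Override public LastCell entry() {
                out.append("</").append(tag).append(">");
                return lastCell(tag);
            }
        };
    }

    private LastCell lastCell(final String tag) {
        out.append("<").append(tag).append(">");
        return new LastCell() {
            @Override public LastCell para(String text) { paragraph(text); return this; }
            @Override public Table2 end() {
                out.append("</").append(tag).append("></tr>");
                return HtmlTable2.this;   // back to the table context
            }
        };
    }

    private void paragraph(String text) {
        out.append("<p>").append(text).append("</p>");
    }
}

class TableDemo {
    static String build() {
        return new HtmlTable2()
            .header().entry().para("col1").entry().para("col2").end()
            .row().entry().para("foo").entry().para("bar").end()
            // .row().entry().para("foo").end()   // would not compile: FirstCell has no end()
            .end();
    }
}
```

The price is one set of types per column count, which probably explains why the operation is called table2Cols() rather than table(2).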

4.5. Metadata

Some document elements can have metadata associated with them; this often involves data that is not necessarily part of the main document flow. In cases like those, Pecia allows you to specify metadata at the start of the document element to which it pertains.
Let us take an article as an example. An article can have an author. At the beginning of an article, before adding any content to the article, you can add metadata like the author’s name. Once you have started adding content to the article, it is impossible to add any more metadata. Example 7, “Article metadata” shows you a valid way of using it. Example 8, “Illegal article metadata” illustrates an invalid way of specifying metadata; the compiler will not accept any more metadata after content has been added to it.

Example 7. Article metadata

article
  .author()
    .firstname("Wilfred")
    .surname("Springer")
  .end()
  .para("This is the first paragraph.");


Example 8. Illegal article metadata

article
  .para("This is the first paragraph.")
  .author()
    .firstname("Wilfred")
    .surname("Springer")
 .end()
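The "metadata first, content afterwards" rule can be expressed with two interfaces, where the first content operation returns the narrower one. This is a minimal sketch under my own assumptions: ArticleStart, Content and SimpleArticle are hypothetical names, and the sketch uses the shorthand author(String) rather than the expanded form.

```java
// Hypothetical sketch: the content-only view of an article.
interface Content {
    Content para(String text);
    String end();
}

// The view you start with: everything Content offers, plus metadata.
interface ArticleStart extends Content {
    ArticleStart author(String name);   // stays in the metadata stage
}

class SimpleArticle implements ArticleStart {
    private final StringBuilder head = new StringBuilder();
    private final StringBuilder body = new StringBuilder();

    @Override
    public ArticleStart author(String name) {
        head.append("<meta name=\"author\" content=\"").append(name).append("\">");
        return this;
    }

    // Declared to return Content, so author() is no longer visible on the result.
    @Override
    public Content para(String text) {
        body.append("<p>").append(text).append("</p>");
        return this;
    }

    @Override
    public String end() {
        return "<html><head>" + head + "</head><body>" + body + "</body></html>";
    }
}
```

With this sketch, new SimpleArticle().author("Wilfred Springer").para("...") compiles, whereas calling author() after para() does not. A determined caller could still cast the Content back to ArticleStart; the types guide you, they do not lock you in.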

 

5. Using Pecia

In the previous section, you have seen most of the basic principles behind Pecia. However, you have not really seen how you actually make sure that some output document is generated as a result. That was deliberate. The important thing here is the API outlined above. How you get your hands on an actual implementation, and how that implementation will handle the documents you are building, is implementation specific.
Fortunately, Pecia does come with an implementation. So this is how you use the implementation:

Example 9. Producing HTML

// The standard implementation uses a wrapper around StAX to
// produce XML documents.
XmlWriter writer = new StreamingXmlWriter(...);
// The DocumentBuilder will actually produce the output.
DocumentBuilder builder = new HtmlDocumentBuilder(writer);
// But if we are building documents, we need to have an
// implementation of the interfaces mentioned above. Let's wrap
// the DocumentBuilder in an Article implementation. (The
// second argument is the Article's title.)
ArticleDocument document =
  new DefaultArticleDocument(builder, "Example");
// ... and now we can build the document.
document
  .section("First section")
    .section("First subsection")
    .end()
  .end()
.end();


The standard Pecia implementation generates the output on the fly. Technically, there is nothing preventing you from creating the entire document in memory first, and generating output afterwards. All of this is just implementation. In fact, I expect some significant changes in the implementation somewhere in the future; use this implementation at your own risk.

6. Pecia State

After having read the previous section, you probably already guessed that Pecia is not done yet. It is usable, and it is actually in use in one of my projects, but there is still work left to be done. So this article in a way covers an alpha version of the API.

7. Pecia Document Object Model

The document object model supported by Pecia today is fairly simple. In fact, it is probably way too simple. Which is another reason why there is no 1.0 version of Pecia yet.

Figure 1. Pecia Document Object Model

[Image: Pecia DOM]


Figure 1, “Pecia Document Object Model” provides a schematic overview of the document object model supported. The arrows denote potential containment relationships: a list item can contain paragraphs, sections can contain tables, lists, verbatim content and other sections, etc. The document elements supported are pretty self-explanatory. The only exception may be xref, which represents an internal reference to another part of the document.

8. Summary & Conclusions

In this article, I have tried to justify the creation of yet another framework for generating documentation. It grew out of unease with the existing solutions, and then turned out to have a couple of interesting side effects. Pecia not only prevents you, at compile time, from breaking the structure of the documents you generate, but also supports the IDE in preventing you from making these errors altogether, at coding time.
In fact, there are some benefits that I haven’t even covered yet. They might be less tangible, but nevertheless, very real. I started to see that benefit when applying Pecia in a project where I had 40-something small objects floating around, each of which needed to be represented differently in my document, depending on the context.
In situations like those, many of the existing frameworks commonly in use for generating documents will force you to have all of these 40-something objects expose their state to the outside world, in order to be able to bind to it from an external template.
However, this violates one of the most important principles of object orientation: the encapsulation principle. By having the objects expose their entire state to the outer world, you have actually increased the dependencies between the outer world and the object’s implementation, and instead of defining behaviour as part of the object, the behaviour (how to represent itself) is externalized entirely. Consequently, maintaining the templates becomes a nightmare; your templates need to have a deep understanding of the internals of all objects.
Doxia would already allow you to take a different approach: you could potentially define a common interface on all objects allowing each object to render itself using the Sink interface. However, how would you convey the context in which the content needs to be written? How would you make sure that your object understands that it needs to display itself as part of a paragraph? Or as a table? How do you make sure that the object does not write outside of the context expected by the calling program?

void render(Sink sink) {
  // doh, how would I know if I am in a paragraph or table context
}

In Doxia, there is no way of solving this. In Pecia, this is trivial. The common interface would simply define a single operation for each context in which the object needs to render itself:

void render(Para<?> para) {
  // ah, I need to render it as part of a paragraph
  // and there is no way of writing outside of that context.
}
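To show what that buys you, here is a minimal sketch of an object rendering itself into a paragraph context without exposing any of its state. Paragraph is a hypothetical, non-generic stand-in for Pecia's Para, and Renderable, Message and HtmlParagraph are names I made up for the example.

```java
// Hypothetical, cut-down paragraph context; Pecia's real Para is generic and richer.
interface Paragraph {
    Paragraph text(String text);
    Paragraph emphasis(String text);
}

// The common interface: one render operation per context an object can appear in.
interface Renderable {
    void render(Paragraph para);
}

class Message implements Renderable {
    private final String id;      // state stays private; no getters needed
    private final String text;

    Message(String id, String text) {
        this.id = id;
        this.text = text;
    }

    @Override
    public void render(Paragraph para) {
        // Only inline operations are reachable here, so the object
        // cannot start a table or a section by accident.
        para.emphasis(id).text(": ").text(text);
    }
}

// Made-up HTML implementation of the paragraph context.
class HtmlParagraph implements Paragraph {
    private final StringBuilder out = new StringBuilder("<p>");

    @Override public Paragraph text(String text) { out.append(text); return this; }

    @Override public Paragraph emphasis(String text) {
        out.append("<em>").append(text).append("</em>");
        return this;
    }

    String close() { return out.append("</p>").toString(); }
}
```

The template-style alternative would need getId() and getText() on Message; here the object decides for itself how it looks inside a paragraph.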

Which is only to say: there does seem to be a case for frameworks like Pecia. True, the current incarnation of Pecia is still quite young, and there are definitely things that will change in due time, but I have come to believe that it has potential, and I hope to have convinced you about that too.
Pecia is currently hosted at SourceForge (https://pecia.sourceforge.net/).
[] See https://www.martinfowler.com/bliki/FluentInterface.html.
