Scrap Your Boilerplate with Object Algebras

Scrap Your Boilerplate with Object Algebras – Zhang et al. 2015

We’ve seen Object Algebras once before on The Morning Paper when we looked at extensible streaming APIs. Today’s paper choice uses the extensible properties of object algebras to help remove some of the boilerplate code traditionally associated with implementing visitors that traverse ASTs. The resulting framework is called Shy, because the code that results is ‘structure shy’. Things do get pretty hairy with all the generic typing involved, so I’m going to bias towards explaining the benefits of object algebras and only sketch the details of the application to AST traversals. If you want more, the link to the full paper is at the top of the post as always.

Back in 1998, a certain P. Wadler posted a challenge on the java-genericity mailing list that he titled ‘The Expression Problem.’

To quote from that post:

The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts). For the concrete example, we take expressions as the data type, begin with one case (constants) and one function (evaluators), then add one more construct (plus) and one more function (conversion to a string). Whether a language can solve the Expression Problem is a salient indicator of its capacity for expression…

When one of the principal designers of Haskell tells you something is a hard typing problem, it almost certainly is! 😉

Object Algebras are a design pattern for solving the expression problem. In the object algebra pattern, interfaces enable the data type to be extended with new data variants, and implementing those interfaces allows developers to define operations. You can add new data variants without breaking existing operations, and you can add new operations without breaking open the data type. “Extension in both dimensions is fully type-safe and does not compromise separate compilation.”

By convention, object algebra interface names seem to end in ‘Alg’. Here’s an interface for a tiny expression language from the background section of the paper:


interface ExpAlg<E> {
  E Lit(int n);
  E Add(E l, E r);
}

Note that the result type of expressions is left abstract (E). We can implement the interface with a type parameter that suits our purpose, e.g.


class Eval implements ExpAlg<Integer> {
  public Integer Lit(int n) { return n; }
  public Integer Add(Integer l, Integer r) {
    return l + r;
  }
}

Suppose instead we want to print out expressions, we can define an implementation that works with Strings…


class Print implements ExpAlg<String> {
  public String Lit(int n) { return "" + n; }
  public String Add(String l, String r) {
    return l + " + " + r;
  }
}

We can extend in the other dimension (adding new data variants / cases) by extending the interface – for example, to add support for multiplication expressions:


interface MulAlg<E> extends ExpAlg<E> {
  E Mul(E l, E r);
}

Instances of the the Multiplication Algebra (MulAlg) can still be passed to the existing Eval and Print implementations and those will continue to work (though of course they ignore the new case). What’s nice though is that we can extend these implementations to add in the multiplication support, without needing to modify them:


class MulEval extends Eval implements MulAlg<Integer> {
  public Integer Mul(Integer l, Integer r) { 
    return l * r;
  }
}

Client side code that builds an expression takes as an argument an instance of the object algebra implementation to be used. So we can define an operation ‘makeExp’ to make an expression “2+3” as so:


 E makeExp(ExpAlg<E> alg) {
  return alg.Add(alg.Lit(2), alg.Lit(3));
}

We can now evaluate that expression by calling makeExp(new Eval()), and ‘print’ it by calling ‘makeExp(new Print())`.

So far so good. When we try to expand this approach to larger expression languages though, we find that the Object Algebra solves the problem of extensibility, but does not remove the boilerplate code inherent in needing to provide an implementation of every method in the algebra interface, even if you are only interested in one or two.

To deal with the boilerplate problem we created Shy: a Java Object Algebras framework, which provides a number of generic traversals at the cost of a single annotation. The key idea in Shy is to automatically create highly generic Object Algebras, which encapsulate common types of traversals. In particular, Shy supports generic queries and transformations.

Here’s an example of an expression language interface for the MiniQL questionnaire language, using the Shy @Algebra annotation. (Note we have more type parameters now to model the different types of things within the interface).


@Algebra
public interface QLAlg<E, S, F> {
  F Form(String name, List<S> body);
  S If(E cond, S then);
  S Question(String name, String label, String type);
  E Lit(int n);
  E Var(String x);
  E GEq(E lhs, E rhs);
}

We can write a generic query to figure out which variables are used in a QL expression very simply…


class UsedVars implements QLAlgQuery<Set<String>> {
  public Monoid<Set<String>> m() {
    return new SetMonoid<String>();
  }
  public Set<String> Var(String name) {
    return Collections.singleton(name);
  }
}

There’s a lot going on here! The QLAlgQuery interface is automatically generated by Shy and extends the annotated base QLAlg interface. The monoid returning operation m() defines the accumulator (aggregator) type to be used. There are a number of Monoid implementations provided out of the box in Shy (they simply have empty() and join(x, y) operations). Simply by implementing the Var method to extract a Set from the parameters (just name), everything else can happen automatically. Note that the Var method returns an object of type E, and E is used as an argument in the If and GEq methods…

Things get even more involved as you go further into the paper. For example:

The previous section introduced simple queries where each constructor contributes to a single monoid. Recursive data types, however, often have multiple syntactic categories, for instance expressions and statements. In such multi-sorted Object Algebras each sort is represented by a different type parameter in the algebra interface. In this section we present generalized queries, where each such type parameter can be instantiated to different monoids. It turns out that, for some operations, this generalized version of queries is needed…

I’ll spare you the details 😉

As well as queries, Shy can do transformations. They’re fairly easy to use, but pretty hard to explain succinctly: “… we instantiate algebras over function types to obtain transformations which pass information down during traversal. Instead of having the algebra methods delegate directly to base algebra (e.g. expAlg()), this now happens indirectly through closures that propagate the context information…” I refer the interested reader to the full paper.

Object Algebras are interesting. Shy pushes them to the limits to enable traversals to be written in Java that are type-safe, extensible, and separately compilable.

There has always been a tension between the correctness guarantees of static typing, and the flexibility of untyped/dynamically-typed approaches. Shy shows that even in type systems like Java’s, it is possible to get considerable flexibility and adaptability for the problem of boilerplate code in traversals of complex structures, without giving up modular static typing.