Hierarchical Delta Debugging

Hierarchical Delta Debugging – Misherghi & Su, 2006

The thing I find striking about the delta debugging approach we saw yesterday is that with no understanding of the syntax of the input at all, it is still able to simplify, for example, a C program – despite the fact that nearly all of the subsets the algorithm tries will not be syntactically valid. It intuitively feels like we could be much more efficient if we only made syntactically valid changes, not just changes at the character or line level. To be fair, the delta debugging paper very carefully describes sub-setting by ‘changes’ – so nothing is stopping us redefining change in a way that is more meaningful with respect to the structure of the input. But the given examples certainly don’t really do this.

In Hierarchical Delta Debugging (HDD) Misherghi & Su take exactly this step of understanding the structure of the input, and thus ensuring subsets tested are always syntactically valid (they are not trying to find parsing bugs – regular ddmin can suffice for that). In addition, they focus on input sources that are hierarchical in nature such as a program, an XML or HTML document, or a json document. This allows them to exploit the hierarchy and proceed in layers from the root of the tree (whole input). In return for incorporating some understanding of input format into the process, they show that HDD is indeed much more efficient than ddmin, and also produces input that looks more natural to the developer.

Take the C program example we looked at yesterday. ddmin requires 680 tests to minimize it, and produces output that is very terse because it also truncates all redundant characters in variable names as well as redundant formatting etc.. HDD in contrast can minimize the same input with only 86 test cases, and produces something much easier to read:

mult(double *z, int n) 
  int i;
  int j;
  for(;;) {
    i = i + j + 1;
   z[i] = z[i] * (z[0] + 0);

HDD extends the simplifying Delta Debugging algorithm ddmin, not the isolating dd algorithm, on the belief that minimal is often the most useful.

The first step of HDD is to parse the input and produce a parse tree. From here the algorithm proceeds top-down:

We are concerned with hierarchical inputs. Why not first determine the minimal failure-inducing con guration of coarse objects? Following this intuition, we can recursively minimize configurations starting from the top-most level. By limiting simplification to one level of the hierarchy at a time, we can prune data more intelligently. At each level we must try multiple configurations to determine the minimum failure-inducing configuration. This process employs the ddmin algorithm at each level.

When employing ddmin at each level the unit of change is a node in the parse tree – so ddmin will find the 1-minimal set of nodes at that level of the parse tree which are required to reproduce the failure. For each actual test, HDD has to transform the manipulated parse tree back into the input form required by the program.

Unparsing a configuration of the tree is required for each attempted test con guration. This is the most frequent operation executed by HDD. Our implementation traverses the tree, checking for node inclusion in the configuration before printing a particular node.

Once a level has been minimized, HDD recurses to the next level in the tree. It is in essence a breadth-first walk over the parse tree applying ddmin at each node. For a tree with n nodes, HDD runs in O(log n) in the ideal case. More realistic inputs probably run in O(n).

After HDD completes, it is still possible that the tree could be further minimized. HDD works by layer without concern for 1-tree-minimality. So we can further optimise the results by post-processing HDDs output:

We now present an algorithm, HDD+, consisting of a greedy phase and a nal 1-tree-minimality assurance phase. First we perform the greedy phase: a call to HDD on the input configuration tree. This phase will attempt to trim the tree as best it can without consideration for 1-tree-minimality. The second optimality oriented phase is symmetric with the final step of ddmin. It will try to remove individual nodes from the tree one by one in a BFS-like manner. It will repeat the attempt on the entire tree continually as long as the previous iteration successfully removed at least a single node. This algorithm produces a 1-tree-minimal configuration by definition: the final input tree does not have a single node that can be removed, otherwise the algorithm would not have terminated.

With a theoretically higher upper bound on test inputs ( O(n3) vs O(n2) for HDD+), a variant HDD* has the advantage of being very easy to implement and in practice just as useful:

We present another algorithm for 1-tree-minimality, HDD*. This algorithm repeatedly calls HDD until a single call fails to remove a single element from the input tree… Implementing HDD* is simple with an already working HDD implementation, so for comparison we included it in our empirical results. We hope by our evaluation to demonstrate two conclusions: (1) worst-case analysis for HDD, ddmin, and HDD* does not reflect what happens in practice, and (2) 1-minimality is not necessarily the best criteria for input minimization.

The HDD process is very efficient at reducing test cases, and if you already have libraries to hand to parse and unparse your input format, it can be surprisingly efficient to implement. The authors built a version for XML in only 150 lines of Python:

Because of XML’s generality, a single implementation of HDD can be applied uniformly in many application domains. The process of our experimentation is evidence of this: we first implemented the XML tree processingmodule for HDD and then searched for XML document types on which to experiment. Our implementation is very simple; it is written in less than 150 lines of Python code using the DOM API. Although it may not always produce documents that are valid with respect to a specific document type definition (DTD), the generated documents are always well-formed.

There are some great examples of the power of this approach in the paper. If you have a system that works with hierarchical input data that you expect to be working with for any length of time, the investment in producing a test-case minimiser looks to be well worth it to me.