Helping Developers Help Themselves: Automatic Decomposition of Code Review Changes

Helping Developers Help Themselves: Automatic Decomposition of Code Review Changes – Barnett et al. 2015

Earlier this week we saw that pull requests with well-organised commits are strongly preferred by integrators.

Unfortunately, developers often make changes that incorporate multiple bug fixes, feature additions, refactorings, etc. These result in changes that are both large and only loosely related, if at all, leading to difficulty in understanding.

Rounding out this week of papers from ICSE ’15, Barnett et al. from Microsoft developed a tool called ClusterChanges which decomposes changesets into independent parts. In a study, developers found the partitioning to be a helpful aid during code reviews.

… we built a prototype graphical tool and used it to investigate changesets submitted for review in Bing and Office at Microsoft. Our quantitative evaluation shows that over 40% of changes submitted for review at Microsoft can be potentially decomposed into multiple partitions, indicating a high potential for use.

The basic approach to identifying related changes is to take the diff-regions produced by a standard diff tool comparing before and after versions of files, and then group those diff-regions together based on definition-and-use relationships.
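As a rough illustration of the first step, here is a minimal sketch of extracting diff-regions from the before and after versions of a file. The quoted material does not say which diff tool ClusterChanges uses, so Python's `difflib` stands in for "a standard diff tool", and the `DiffRegion` type and `diff_regions` function are hypothetical names introduced only for this example.

```python
import difflib
from typing import List, NamedTuple


class DiffRegion(NamedTuple):
    """One contiguous changed span (hypothetical type, for illustration only)."""
    path: str
    kind: str      # "replace", "delete", or "insert"
    before: range  # 0-based line indices in the old version
    after: range   # 0-based line indices in the new version


def diff_regions(path: str, old: List[str], new: List[str]) -> List[DiffRegion]:
    """Return every non-equal span reported by difflib as a diff-region."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        DiffRegion(path, tag, range(i1, i2), range(j1, j2))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    old = ["int Add(int a, int b) {", "  return a + b;", "}"]
    new = old[:1] + ["  return checked(a + b);", "}", "int Twice(int a) { return Add(a, a); }"]
    for region in diff_regions("Math.cs", old, new):
        print(region)
```

Each region records where the change sits in both versions, which is the information a later step would need to work out the enclosing method and the definitions or uses that the change touches.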

We use the def-use relationship as the primary organizing principle for clustering diff-regions. Programmers often introduce interesting functional changes to code by introducing or modifying definitions along with their uses.

ClusterChanges finds definitions (of types, fields, and methods) that have been changed within a diff-region, and the uses of that definition changed within diff-regions. Diff-regions f1 and f2 are then grouped into the same partition (RelatedDiffs) if any one of the following conditions is true:

  • f1 and f2 are both within the same enclosing method, or
  • there are changes to the definition of some element in f1 and corresponding changes in the use of that element in f2, or
  • f1 and f2 both contain a change to the use of some element, and that element is defined within the changeset, but not itself changed

We group diff-regions in the same method together because a) in practice, we observe that changes to the same method are often related, and b) in prior research, we observed that reviewers usually review methods atomically (i.e., they rarely review different diff-regions in a method separately). Given these relations we create a partitioning over the set of diff-regions by computing the reflexive, symmetric and transitive closure of RelatedDiffs.
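The three conditions and the closure step can be sketched as follows. This is an illustrative reading of the description above rather than the authors' implementation: the `DiffRegion` fields, the `related` and `partition` functions, and the `defined_unchanged` parameter (elements defined within the changeset whose definitions were not themselves changed) are all assumptions made for the example. The closure is computed with a union-find structure, which is one common way to turn a pairwise relation into equivalence classes.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class DiffRegion:
    """Hypothetical summary of one diff-region after def/use analysis."""
    name: str
    method: Optional[str] = None                 # enclosing method, None if outside any method
    changed_defs: FrozenSet[str] = frozenset()   # elements whose definition changes here
    changed_uses: FrozenSet[str] = frozenset()   # elements whose use changes here


def related(f1: DiffRegion, f2: DiffRegion, defined_unchanged: FrozenSet[str]) -> bool:
    """RelatedDiffs: true if any of the three conditions holds."""
    same_method = f1.method is not None and f1.method == f2.method
    def_use = bool(f1.changed_defs & f2.changed_uses) or bool(f2.changed_defs & f1.changed_uses)
    shared_use = bool(f1.changed_uses & f2.changed_uses & defined_unchanged)
    return same_method or def_use or shared_use


def partition(regions: List[DiffRegion], defined_unchanged: FrozenSet[str]) -> List[List[DiffRegion]]:
    """Group regions by the reflexive, symmetric, transitive closure of `related`."""
    parent: Dict[DiffRegion, DiffRegion] = {r: r for r in regions}

    def find(r: DiffRegion) -> DiffRegion:
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in combinations(regions, 2):
        if related(a, b, defined_unchanged):
            parent[find(a)] = find(b)

    groups: Dict[DiffRegion, List[DiffRegion]] = {}
    for r in regions:
        groups.setdefault(find(r), []).append(r)
    return list(groups.values())
```

Comparing every pair of regions is quadratic in the number of diff-regions, which keeps the sketch short; a real tool would more likely index regions by method and by element name.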

The result of this process is a set of trivial partitions that are fully enclosed within a single method, or where there is only one diff-region and it is outside of a method, and a set of non-trivial partitions (everything else).
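The trivial versus non-trivial distinction can be written as a small predicate. This is a sketch under the assumption that each partition is represented simply by the enclosing method of each of its diff-regions, with `None` meaning the region lies outside any method; the `is_trivial` name and that representation are hypothetical.

```python
from typing import Optional, Sequence


def is_trivial(enclosing_methods: Sequence[Optional[str]]) -> bool:
    """Decide whether one partition is trivial.

    `enclosing_methods` holds the enclosing method of each diff-region in the
    partition (None for a region outside any method).
    """
    if not enclosing_methods:
        raise ValueError("a partition contains at least one diff-region")
    # Case 1: a single diff-region that is outside of any method.
    if len(enclosing_methods) == 1 and enclosing_methods[0] is None:
        return True
    # Case 2: every diff-region is enclosed by the same single method.
    first = enclosing_methods[0]
    return first is not None and all(m == first for m in enclosing_methods)


if __name__ == "__main__":
    print(is_trivial(["Parser.Read", "Parser.Read"]))   # same method
    print(is_trivial([None]))                           # lone region outside a method
    print(is_trivial(["Parser.Read", "Lexer.Next"]))    # spans two methods
```

Everything for which the predicate returns false is a non-trivial partition in the sense used in the results below.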

The ClusterChanges tool then displays these partitions graphically:

[Figure: Review Editor for Clustered Changes]

ClusterChanges was applied to a randomly selected set of 1000 changesets submitted for review in the development of Microsoft Office 2013.

While the most common case are changesets containing just one non-trivial partition, this still makes up only 45%. Nearly 42% of all changes contain more than one non-trivial partition. In addition, the proportion of changed methods that end up in non-trivial partitions is 66% on average per review. To the degree that ClusterChanges correctly identifies non-trivial partitions, this indicates that i) a large proportion of changesets can be decomposed into multiple independent changes, and ii) our decomposition covers a large fraction of changed methods in a review.

Looking at changesets with lots of partitions, the authors found that many of these could be further consolidated by an enhancement to the tool that also considered:

(a) annotating several methods with common C# attributes such as Serializable or Obsolete, (b) a common refactoring (e.g. addition of a log message or variable renaming) across a large number of methods, and (c) relationships between overridden methods and their implementations.

A user study was conducted in which the developers responsible for the changesets were asked if they agreed with the automated decomposition.

Of the 20 participants, 16 said that our non-trivial partitions were both correct and complete, i.e., the non-trivial partitions were indeed independent, the diff-regions within each partition were related and there were no missing conceptual groups… most developers agree with our automatic partitioning and believe the decomposition is useful for reviewers to understand their changes better (some even asked for the prototype to use on their own reviews going forward).

With these promising early results, the authors will now be moving on to do further studies with code reviewers.