Detecting ROP with statistical learning of program characteristics

Detecting ROP with statistical learning of program characteristics Elsabagh et al., CODASPY ’17

Return-oriented programming (ROP) attacks work by finding short instruction sequences in a process’ executable memory (called gadgets) and chaining them together to achieve some goal of the attacker. For a quick introduction to ROP, see “The geometry of innocent flesh on the bone…,” which we covered back in December of 2015.

Since ROP attacks can bypass Data Execution Protection and Address Space Layout Randomization (ASLR), ROP detection solutions attempt to detect ROP attacks at runtime, either by signature-based or anomaly-based detection. Signature-based defenses look for (pre-defined) patterns in the execution trace of programs. They have very low overhead but can be bypassed by suitably skilled attackers. Anomaly based detection methods learn what normal looks like, and then detect attacks by looking for deviations from the normal baseline. Recent ROP anomaly detectors have explored the use of hardware performance counters to detect anomalies, but the very use of such counters may also hide the underlying program behaviour meaning that they mask the very signals we want to detect.

Today’s paper introduces EigenROP, which investigates the use microarchitecture-independent program characteristics for the detection of ROP attacks. It significantly improves on the prior state-of-the-art for anomaly-based ROP detection, with a detection accuracy of 81% and only 0.8% false positives.

The key idea of EigenROP is to identify anomalies in program characteristics, due to the execution of ROP gadgets. In this context, it is difficult to precisely define what anomalies are since that depends on the characteristics of both the monitored program and the ROP. However, it is reasonable to assume that some unexpected change occurs in the relationships among the different program characteristics due to the execution of the ROP.

The high level approach works is summarised in Fig. 1 below:

EigenROP samples a set of program characteristics every n instructions (we’ll see which characteristics in a moment), and embeds them into a snapshot vector.
EigenROP uses a sliding window over snapshots, and embeds the window measurements into a high-dimensional space.
The principal components of the measurements are then extracted from which a representative direction is estimated: “…the idea here is that any strong relationships among the measured characteristics will appear as principal components in the high-dimensional space.” The training phase learns the representative direction.
During detection, EigenROP performs the same computation of representative direction of the incoming measurements and compares it to the learned baseline. If the distance exceeds some threshold, an alarm is raised.

Which characteristics have the most predictive power?

The authors evaluated 15 different candidate characteristics, before finally selecting 10 of them for use in EigenROP. In the following table, characteristics are divided into one of three different types: A represents architectural characteristics, I represents microarchitecture-independent characteristics, and M represents micro-architectural characteristics.

Let’s look at the intuition behind a few of these characteristics:

Since ROP attacks disturb the normal control flow of execution, they may increase the number of mispredicted branches (MISP_CBR) by the processor branch predictor.
ROP attacks may show different usage for ret and call instructions, as well as push and pop since they depend on chaining blocks of instructions that load data from the hijacked program stack to registers, and later return to the stack (INST_RET, INST_CALL, INST_STACK).
ROP attacks chain gadgets from arbitrary memory locations, so attacks may exhibit low memory locality when compared to clean execution (MEM_RDIST, MEM_WDIST).
ROP attacks typically load data from the hijacked stack to registers using pop instructions with a single operand. The average number of register operands is therefore likely to be lower in a gadget chain (REG_OPS).
ROP attacks use the stack for chaining gadgets, and the gadgets are typically spread out across the memory of the program, thus they show abnormal (lower) reuse of the same memory blocks compared to clean executions (MEM_REUSE).

Implementation and evaluation

EigenROP is implemented in just 700 lines of Python using the MICA framework which is itself based on Intel’s Pin tool, and SciKit-Learn. Several detected-in-the-wild ROP attacks were used for the evaluation, as well as attacks generated by the ROPC gadgets finder and compiler for common Linux programs.

Using a sampling interval of 16K instructions, EigenROP successfully detected the OSVBD-ID:87289 Linux Hex Editor ROP exploit. The PHP ROP OSVBD-ID:72644 was detectable with a 32K instruction sampling interval.

Despite the very small ROP length (only ~60 instructions in the case of the Linux Hex Editor ROP attack) when compared to the sampling window size, EigenROP still detected the deviation in the program’s characteristics.

The overall accuracy of EigenROP across the test set was 81% with a false positive rate of 0.8%. State of the art microarchitectural defenses achieved accuracy between 49% and 68%.

Figure 4 shows the difference in accuracy with and without the microarchitecture-independent characteristics. By including microarchitecture-independent characteristics, an increase of 9% to 15% in accuracy was achieved. This indicates that microarchitecture-independent characteristics contribute significantly to the detection performance of EigenROP.

It’s possible to trade-off the overhead of EigenROP (by increasing the sampling interval) with detection accuracy, as the following chart reveals:

Recent advances in ROP attacks have shown how to bypass conventional ROP defenses though evasion and mimicry using call-preceded gadgets, evasion gadgets, and history-flushing gadgets.

Call-preceded gadgets fool defenses that depend on branch tracing to look for sequences ending in ret that are not preceded by a call. EigenROP doesn’t depend on branch detection, and will pick up the signal from the mispredicted return addresses.
Evasion gadgets use long gadgets to evade detectors looking for short gadgets! EigenROP does not depend on the gadget chain length for its detection prowess.
History-flushing gadgets target defenses that only keep a limited history of execution – such history can be ‘flushed’ by using innocuous gadgets to fill up the history. Flushing of history in the context of EigenROP though is much harder, since it requires chaining enough gadgets exhibiting similar characteristics to benign code across all measured characteristics.

While our work demonstrates that ROP payloads can be detected using simple program characteristics, there are still needed improvements concerning detection accuracy of very short chains and overhead reduction. Future hardware support can help on both fronts by enabling low-cost fine-grained monitoring. Despite that, EigenROP raises the bar for ROP attacks, and can easily run in-tandem with complementary ROP defenses.

5 thoughts on “Detecting ROP with statistical learning of program characteristics”

feedingthegeeek says:

April 6, 2017 at 9:39 pm

Typical home computer user has more available cores in their processor than they’re actively using, and that’s only going to become more true as processor architectures continue to advance. Spending one core to detect attack on the other core seems like a reasonable tradeoff for this class of user. Might be worthwhile to see what reduction in overhead could be achieved by switching from Python to, say, C++.
Pingback: Detecting ROP (Return-Oriented Programming) with statistical learning of program characteristics - ZRaven Consulting
Pingback: 【知识】4月7日 - 每日安全知识热点 - 莹莹之色
Pingback: 半月安全看看看第五期 – 安全0day
Pingback: The dynamics of innocent flesh on the bone: code reuse ten years later | the morning paper

Comments are closed.

Which characteristics have the most predictive power?

Implementation and evaluation

Share this:

Related

5 thoughts on “Detecting ROP with statistical learning of program characteristics”