Why WAFLGo is not a good comparison baseline

15 May, 2025 [essay] 2 min read

Abstract

Benchmarking multi-target fuzzers is tricky because some tools exploit commit history while others rely on static locations only. I explain why WAFLGo can distort such experiments and propose a safer comparison strategy.

Body

Most recent evaluation plans reuse a fixed list of historical bug locations, yet WAFLGo was designed for regression testing on fresh commits. This mismatch raises two hidden biases that can mislead reviewers and ourselves.

WAFLGo first parses the latest diff, builds a write-after-load slice, and labels every block in that slice as a target. It then assigns a critical score that gives more energy to seeds covering those blocks. If we strip away the diff and hand the tool our legacy list, there is no slice to score against, so the critical score collapses to zero and WAFLGo degenerates into ordinary AFLGo behaviour. Any performance ranking that follows would reflect incidental parameter differences rather than the commit-aware idea the authors proposed.
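To make the collapse concrete, here is a minimal sketch of the mechanism described above. It is not WAFLGo's actual scoring formula: the function names, the linear boost, and the `boost` value are all assumptions chosen for illustration. The only point is that an empty slice removes every trace of commit-aware guidance.

```python
from typing import Set


def critical_score(seed_blocks: Set[str], slice_blocks: Set[str]) -> float:
    """Hypothetical score: fraction of slice blocks that the seed covers.

    Returns 0.0 when there is no slice, i.e. when no diff was supplied.
    """
    if not slice_blocks:
        return 0.0
    return len(seed_blocks & slice_blocks) / len(slice_blocks)


def energy(base_energy: float, seed_blocks: Set[str],
           slice_blocks: Set[str], boost: float = 4.0) -> float:
    """Hypothetical power schedule: scale the baseline energy by the score.

    `boost` is an assumed tuning knob, not a parameter of any real tool.
    """
    return base_energy * (1.0 + boost * critical_score(seed_blocks, slice_blocks))


seed = {"bb1", "bb2"}

# With a diff-derived slice, seeds that touch the slice receive extra energy.
with_diff = energy(100.0, seed, {"bb1", "bb2", "bb3", "bb4"})

# Without a diff the slice is empty, so only the baseline schedule remains.
without_diff = energy(100.0, seed, set())
```

In this toy model `without_diff` equals the baseline energy exactly, which is the sense in which the tool falls back to plain directed fuzzing once the diff is gone.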

Suppose instead that we locate the original commit for each legacy bug and feed those diffs to WAFLGo. The opposite bias then appears. WAFLGo receives many extra basic blocks that lie on the slice, whereas the other fuzzers see a single coordinate per bug. The extra structural context narrows WAFLGo’s search space and increases its apparent precision, so the comparison again becomes unbalanced.
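The asymmetry can be quantified with a quick count of targets per bug. The sketch below uses made-up bug names and locations purely for illustration; none of the data comes from a real benchmark, and `guidance_ratio` is a hypothetical helper, not a metric from any of the tools.

```python
from typing import Dict, List

# Hypothetical data: the single crash location per bug that
# location-only fuzzers would receive.
legacy_targets: Dict[str, List[str]] = {
    "bug-1": ["parser.c:120"],
    "bug-2": ["codec.c:88"],
}

# Hypothetical data: the blocks a diff-derived slice might mark
# as targets for the same two bugs.
slice_targets: Dict[str, List[str]] = {
    "bug-1": ["parser.c:120", "parser.c:97", "parser.c:101", "lexer.c:40"],
    "bug-2": ["codec.c:88", "codec.c:60", "codec.c:75"],
}


def targets_per_bug(targets: Dict[str, List[str]]) -> float:
    """Average number of target locations supplied per bug."""
    return sum(len(locs) for locs in targets.values()) / len(targets)


def guidance_ratio(rich: Dict[str, List[str]],
                   sparse: Dict[str, List[str]]) -> float:
    """How many times more target locations one tool sees than another."""
    return targets_per_bug(rich) / targets_per_bug(sparse)


ratio = guidance_ratio(slice_targets, legacy_targets)
```

With these invented numbers the diff-driven tool sees 3.5 times as many target locations per bug. Reporting such a ratio alongside the results would at least make the imbalance visible to reviewers.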

A cleaner approach is to reserve WAFLGo for genuine commit-level regression studies and to employ a location-oriented baseline such as SelectFuzz. SelectFuzz accepts the same multi-target list as Lyso and AFLGo, prunes instrumentation with data-flow analysis, and competes without secret advantages. Choosing an apples-to-apples line-up keeps the experiment defensible and avoids reviewer objections about hidden guidance.

#Essay #Weekly-Writing
