TLSH - Blog

Notes on Function Re-ordering

by Jonathan Oliver
13 / April / 2021.

A recent article by Martin-Perez et al. called "Bringing order to approximate matching: Classification and attacks on similarity digest algorithms" considered the issue of function re-ordering.
They write:

"Last, counting features in a limited set of counters (as TLSH or Nilsimsa do) provokes the loss of the order between features, which allows for the input rearranging without affecting to the similarity score.
This issue is partially solved by considering feature intersection (processed feature-aware).
The last limitation is the solution less vulnerable to attacks..."

TLSH and Nilsimsa indeed have the property that rearranging functions has limited impact on the score (this was experimently tested, for example in Table 4 of this article).
We will assert that this is not a limitation, but rather a requirement for security and malware applications:

Desireable Property for Fuzzy Hashes: Code which is re-ordered versions of some original file should be scored as being similar.

Function re-ordering is a known obfuscation technique used by malware authors for more than a decade (see slide 9).
Function re-ordering is a straight forward obfuscation technique to use because re-ordering the functions can be done with very low chance of breaking the code.

Code with N functions can be re-written with re-ordered functions in approximately N! ways.
So if a fuzzy hash is not resistant to function re-ordering then it has given attackers an easy to exploit attack surface which has the potential to generate a never ending list of malware variants.