Currently, one analytic is available.
Fuzzy hashing is an effective method to identify similar files based on common byte strings despite changes in the byte order and structure of the files. ssdeep provides a fuzzy hash implementation and the capability to compare two hashes, returning a similarity score from 0 (no match) to 100. Virus Bulletin originally described a method for comparing ssdeep hashes at scale.
Comparing ssdeep hashes at scale is a challenge. Therefore, the ssdeep analytic computes ssdeep.compare for every pair of samples where the result is non-zero and provides the capability to return all samples clustered based on the ssdeep hash.
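The text above does not say how clusters are derived from the pairwise scores. One plausible reading, shown here only as a sketch, is to treat every non-zero score as a link between two samples and group samples that are connected through such links. The function name `cluster_samples` and the `(sample_a, sample_b, score)` input format are assumptions made for illustration, not part of the analytic.

```python
def cluster_samples(matches):
    """Group samples linked by non-zero ssdeep scores (hypothetical helper).

    matches: iterable of (sample_a, sample_b, score) tuples.
    Returns a sorted list of clusters, each a sorted list of sample ids.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in matches:
        if score > 0:  # pairs scoring zero are not linked
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for sample in list(parent):
        clusters.setdefault(find(sample), set()).add(sample)
    return sorted(sorted(members) for members in clusters.values())
```

Samples that never appear in a non-zero pair are simply absent from the result. A stricter definition (for example, requiring a minimum score) would only change the `score > 0` test.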
When possible, it can be effective to push work to the Elasticsearch cluster, which supports horizontal scaling. For the ssdeep comparison, Elasticsearch NGram Tokenizers are used to compute 7-grams of the chunk and double-chunk portions of the ssdeep hash, as described here. Because ssdeep.compare returns zero unless the two hashes share a common substring of at least seven characters, querying on shared 7-grams avoids comparing pairs of ssdeep hashes where the result will be zero.
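As a rough illustration of the 7-gram idea, the sketch below shows index settings that use the Elasticsearch `ngram` tokenizer with `min_gram` and `max_gram` both set to 7, followed by a pure-Python equivalent of the pre-filter. The analyzer name, the field names (`chunksize`, `chunk`, `double_chunk`), and the helper functions are assumptions chosen for the example, and the mapping syntax assumes Elasticsearch 7 or later. The sketch also omits the collapsing of long runs of repeated characters that ssdeep performs before comparing.

```python
# Hypothetical index settings; names are illustrative only.
SSDEEP_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "ssdeep_7gram": {"type": "ngram", "min_gram": 7, "max_gram": 7}
            },
            "analyzer": {
                "ssdeep_analyzer": {"type": "custom", "tokenizer": "ssdeep_7gram"}
            },
        }
    },
    "mappings": {
        "properties": {
            "chunksize": {"type": "integer"},
            "chunk": {"type": "text", "analyzer": "ssdeep_analyzer"},
            "double_chunk": {"type": "text", "analyzer": "ssdeep_analyzer"},
        }
    },
}


def split_ssdeep(ssdeep_hash):
    """Split 'chunksize:chunk:double_chunk' into its three parts."""
    chunksize, chunk, double_chunk = ssdeep_hash.split(":", 2)
    return int(chunksize), chunk, double_chunk


def ngrams(text, n=7):
    """Return the set of all n-character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def may_match(hash_a, hash_b):
    """Return True if the two hashes could have a non-zero comparison.

    Parts are only comparable when they were computed at the same block
    size: equal chunk sizes, or one chunk size twice the other.
    """
    size_a, chunk_a, double_a = split_ssdeep(hash_a)
    size_b, chunk_b, double_b = split_ssdeep(hash_b)
    if size_a == size_b:
        return bool(ngrams(chunk_a) & ngrams(chunk_b)) or bool(
            ngrams(double_a) & ngrams(double_b)
        )
    if size_a == 2 * size_b:
        return bool(ngrams(chunk_a) & ngrams(double_b))
    if size_b == 2 * size_a:
        return bool(ngrams(double_a) & ngrams(chunk_b))
    return False
```

A shared 7-gram is necessary for a non-zero score but does not determine the score, so pairs that pass this filter still need to go through ssdeep.compare.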
Because we need to compute ssdeep.compare, the ssdeep analytic cannot be done entirely in Elasticsearch. Python is used to query Elasticsearch, compute ssdeep.compare on the results, and update the documents in Elasticsearch.
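The query, compare, and update steps can be pictured roughly as the loop below. To keep the sketch self-contained, the Elasticsearch query is replaced by a caller-supplied `find_candidates` function and the documents by an in-memory dictionary; both are stand-ins rather than the analytic's actual interfaces. The `compare` argument would typically be `ssdeep.compare` from the Python ssdeep bindings, which takes two hash strings and returns an integer score.

```python
def update_ssdeep_matches(docs, find_candidates, compare):
    """Compute non-zero comparison scores for each sample (hypothetical helper).

    docs: dict mapping sample id to ssdeep hash (stand-in for stored documents).
    find_candidates: callable returning candidate sample ids for a given
        sample id (stand-in for the Elasticsearch 7-gram query).
    compare: callable taking two ssdeep hashes and returning a score.
    Returns a dict mapping sample id to {other id: score}, i.e. the data
    that would be written back to each document.
    """
    updates = {}
    for sample_id, sample_hash in docs.items():
        scores = {}
        for other_id in find_candidates(sample_id):
            if other_id == sample_id:
                continue  # a sample is not compared with itself
            score = compare(sample_hash, docs[other_id])
            if score > 0:  # only non-zero results are kept
                scores[other_id] = score
        updates[sample_id] = scores
    return updates
```

Passing the comparison function in as an argument keeps the loop testable without an ssdeep installation or a running cluster.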