Deep learning models analyzing API sequences for Windows malware detection face challenges due to evolving malware variants.
A group of researchers recently proposed the MME framework, which enhances existing detectors by leveraging API knowledge graphs and system resource encodings.
Using contrastive learning, MME captures the malicious semantics shared by evolved malware samples.
MME Framework
Experimental results on a five-year dataset demonstrate a 13.10% reduction in false positive rate and an 8.47% improvement in F1-Score compared to a regular Text-CNN.
Moreover, MME significantly reduces model maintenance costs: with only 1% of the monthly labeling budget, it achieves an 11.16% decrease in false positives and a 6.44% increase in F1-Score.
The MME framework enhances API sequence-based Windows malware detection models by addressing the challenge of evolving malware variants. MME introduces two key innovations:
- A sophisticated API embedding method that combines an API knowledge graph for semantic representation with feature-hash embedding for system resource encoding (a rough hashing sketch follows this list).
- A contrastive learning strategy that improves the model’s ability to recognize similar malicious behaviors across evolving samples.
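The article does not reproduce the paper's exact embedding scheme, but the general idea of hierarchical feature hashing for system resource arguments can be sketched roughly as follows. The bucket size, weighting scheme, and function names here are illustrative assumptions, not the authors' implementation:

```python
import hashlib
import numpy as np

def hash_bucket(token: str, num_buckets: int = 1024) -> int:
    """Map an arbitrary resource token to a fixed bucket via a stable hash."""
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

def encode_resource(path: str, num_buckets: int = 1024) -> np.ndarray:
    """Hierarchically encode a system resource (e.g. a file or registry path):
    each path level is hashed separately, so resources sharing a prefix
    share part of their encoding."""
    levels = [p for p in path.replace("\\", "/").split("/") if p]
    vec = np.zeros(num_buckets, dtype=np.float32)
    for depth, level in enumerate(levels):
        # weight shallower levels more, so common prefixes dominate similarity
        vec[hash_bucket(level, num_buckets)] += 1.0 / (depth + 1)
    return vec

# Two variants touching sibling paths produce similar encodings (high cosine similarity)
a = encode_resource(r"C:\Users\victim\AppData\Roaming\payload.exe")
b = encode_resource(r"C:\Users\victim\AppData\Roaming\updater.exe")
print(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
```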
When applied to LSTM and Text-CNN models on a dataset of 76K Windows PE samples collected from 2017 to 2021, MME significantly reduced false negative rates (from 22.4% to 10.1% for LSTM and from 22.7% to 9.6% for Text-CNN) and cut the required human labeling effort by 24.19%-94.42%.
This approach demonstrates enhanced stability against malware evolution, effectively slowing down model aging and improving long-term detection accuracy without altering the original model structure.
The MME framework augments API sequence-based Windows malware detection models by targeting the evolving nature of malware families.
MME introduces three major elements:
- The first is an API knowledge graph that captures semantic proximity between APIs, compensating for the effect of substituting one API with a functionally equivalent one.
- The second is a hierarchical system resource encoding based on feature hashing, which sharpens the model's focus on similar resource access patterns.
- The third is a contrastive learning strategy that forces the model to attend to the API fragments that persist across malware generations (see the loss sketch after this list).
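The article summarizes the contrastive strategy only at a high level and does not give the paper's loss formulation. Purely as an illustration, a standard pairwise contrastive term of the following kind could be added to the detector's usual classification loss; the function name, variable names, and the weighting factor `lambda_ctr` are assumptions:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor: torch.Tensor,
                     other: torch.Tensor,
                     same_family: torch.Tensor,
                     margin: float = 1.0) -> torch.Tensor:
    """Pairwise contrastive loss: pull embeddings of samples from the same
    malware family together and push different-family pairs apart by a margin.

    anchor, other : (batch, dim) sequence embeddings from the detector backbone
    same_family   : (batch,) float tensor, 1.0 if the pair shares a family else 0.0
    """
    dist = F.pairwise_distance(anchor, other)
    pos = same_family * dist.pow(2)
    neg = (1.0 - same_family) * F.relu(margin - dist).pow(2)
    return (pos + neg).mean()

# Hypothetical usage: add the contrastive term to the usual detection objective
# total_loss = bce_loss(logits, labels) + lambda_ctr * contrastive_loss(z1, z2, same_family)
```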
Integrating MME into LSTM and Text-CNN models significantly extends their operational lifetime and lowers their false negative rates.
In maintenance scenarios, it reduces the required human annotation effort by as much as 94.42% without compromising performance.
MME-enhanced models need only 1% of samples labeled per month to sustain F1 scores above 90% and false negative rates below 10%, whereas regular models require at least 5%.
This amounts to a five-fold reduction in analyst involvement alongside improved detection precision, making MME an effective way to counteract the impact of malware evolution and keep malware detectors operating sustainably over the long term.