VirusTotal is an essential tool for cybersecurity professionals. It offers a comprehensive platform for analyzing files, URLs, domains, and IP addresses to detect malicious activities.
This guide explains how to use VirusTotal effectively for threat research, drawing on its extensive dataset and querying capabilities.
Understanding the VirusTotal Dataset
The VirusTotal dataset is the platform’s core, storing over 50 billion files, 6 billion URLs, and 4 billion domains. It organizes artifact-related information into objects, each with an ID, type, and attributes. These objects can represent files, URLs, domains, IPs, and more, with relationships that provide contextual links between them.
Object Structure
- ID: Uniquely identifies an object, derived from the artifact itself (e.g., SHA-256 hash for files).
- Type: Indicates the kind of information stored (e.g., file, URL, domain).
- Attributes: Data items related to an object, which can be primitive or complex.
- Relationships: Connections between objects, useful for describing scenarios involving multiple artifacts.
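The object structure above can be pictured as a small data model. The following is a minimal sketch for illustration only: the `VTObject` class, the `related_ids` helper, and the sample values (including the `contacted_domains` relationship name and the placeholder hash) are assumptions made here, not part of any official VirusTotal library.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class VTObject:
    """Illustrative model of a VirusTotal object (hypothetical, not an official class)."""

    id: str    # derived from the artifact itself, e.g. a SHA-256 hash for files
    type: str  # kind of information stored, e.g. "file", "url", "domain"
    attributes: dict[str, Any] = field(default_factory=dict)            # primitive or complex data
    relationships: dict[str, list[str]] = field(default_factory=dict)   # name -> related object IDs


def related_ids(obj: VTObject, relationship: str) -> list[str]:
    """Return the IDs linked to obj under the given relationship name."""
    return obj.relationships.get(relationship, [])


# Made-up sample data: a file object linked to one domain.
sample = VTObject(
    id="0" * 64,  # placeholder, not a real hash
    type="file",
    attributes={"size": 1024, "names": ["sample.exe"]},
    relationships={"contacted_domains": ["example.com"]},
)

print(related_ids(sample, "contacted_domains"))
```

Keeping relationships as a mapping from a name to a list of IDs mirrors the idea that one artifact can be linked to many others of different kinds.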
Querying VirusTotal
VirusTotal offers two main interfaces for querying its dataset: the graphical user interface (GUI) and the application programming interface (API).
VirusTotal GUI
According to a detailed write-up by VirusTotal and SentinelOne, the GUI allows manual interaction with the dataset. Users enter search queries composed of search filters, each of which is either a unique identifier for an artifact (a URL, domain, file hash, or IP address) or a search modifier in the format modifier:value.
Multiple modifiers can be combined with logical operators (AND, OR, NOT) and grouped using parentheses.
Each search is scoped to a top-level collection (files, URLs, IPs, domains, or collections) specified by the entity modifier (e.g., entity:file).
The platform retrieves and displays related information as a web analysis report or a list of results based on the search query.
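The way modifiers, logical operators, and parentheses combine can be sketched with a few string helpers. This is only an illustration of the query syntax described above: the helper names (`modifier`, `any_of`, `negate`) are hypothetical, and the resulting string is meant to be pasted into a search box or sent through the API, not validated here.

```python
def modifier(name: str, value: str, quote: bool = False) -> str:
    """Render one search modifier in modifier:value form, optionally quoting the value."""
    return f'{name}:"{value}"' if quote else f"{name}:{value}"


def any_of(*terms: str) -> str:
    """Group terms with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


def negate(term: str) -> str:
    """Wrap a term in a parenthesized NOT."""
    return f"(NOT {term})"


# Scope the search to files, accept either of two conditions, exclude a third.
query = " AND ".join([
    modifier("entity", "file"),
    any_of(
        modifier("type", "peexe", quote=True),
        modifier("magic", ".NET", quote=True),
    ),
    negate(modifier("metadata", ".pdb", quote=True)),
])

print(query)
# entity:file AND (type:"peexe" OR magic:".NET") AND (NOT metadata:".pdb")
```

Building queries from small parts makes it easier to add or drop a single condition while refining a search.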
VirusTotal API
The API enables programmatic interaction, suitable for large-scale querying. Users issue HTTP GET requests to API endpoints, specifying search filters in the request URLs. The API provides more extensive information than the GUI, making it ideal for tasks requiring scalability.
Key endpoints include:
- /api/v3/intelligence/search?query={query}: Allows querying VirusTotal using a unique identifier (like a URL, domain, IP address, or file hash) or search modifiers.
  - Example URLs:
    https://www.virustotal.com/api/v3/intelligence/search?query=test.com
    https://www.virustotal.com/api/v3/intelligence/search?query=entity:domain+and+domain:test
- /api/v3/files/{hash}: Retrieves the file object that matches the specified hash (MD5, SHA-1, or SHA-256).
  - Example URL:
    https://www.virustotal.com/api/v3/files/e6adf40a959308ea9de69699c58d2f25
Querying with the /api/v3/files/{hash} endpoint returns JSON-formatted data, which users can parse and use for further analysis or querying. API requests can be made using HTTP client libraries, command-line tools, or custom scripts, with the official Python library, vt-py, simplifying this process.
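As a rough illustration of the HTTP approach, the sketch below builds GET requests for the two endpoints using only the Python standard library. The function names are hypothetical, the API key is a placeholder, and the `x-apikey` header is how the v3 API expects the key to be passed. Nothing is sent until `fetch_json` is called, which requires a valid key (and, for the intelligence search endpoint, an account with access to it).

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://www.virustotal.com/api/v3"


def build_search_request(query: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the intelligence search endpoint."""
    params = urllib.parse.urlencode({"query": query})  # URL-encodes spaces, colons, quotes
    return urllib.request.Request(
        f"{API_BASE}/intelligence/search?{params}",
        headers={"x-apikey": api_key},
    )


def build_file_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the file object matching an MD5, SHA-1, or SHA-256 hash."""
    safe_hash = urllib.parse.quote(file_hash, safe="")
    return urllib.request.Request(
        f"{API_BASE}/files/{safe_hash}",
        headers={"x-apikey": api_key},
    )


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON body (performs a network call)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; replace it before calling fetch_json.
    req = build_search_request("entity:domain and domain:test", "YOUR_API_KEY")
    print(req.full_url)
```

Separating request construction from sending keeps the URL-building logic easy to inspect without touching the network.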
API vs. GUI
The VirusTotal API offers advantages over the GUI, particularly in scalability and data scope:
- The API enables large-scale querying, which is impractical via the GUI. For example, retrieving process names for all Windows Shortcut files submitted in 2024 is feasible only with the API.
- Some data in VirusTotal cannot be queried through the GUI, such as URLs found in process memory during sandbox execution (e.g., secondary C2 URLs). The GUI displays these in reports but doesn’t allow direct querying. The /api/v3/files/{hash}/behaviours endpoint retrieves sandbox data, including memory_pattern_urls, a field that cannot be searched through the GUI.
The API may also provide more detailed information than the GUI, such as complete sandbox-generated data, including suspicious behavior rules, via the /api/v3/files/{hash}/behaviours endpoint, which is not fully visible in the GUI.
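Once a behaviours response has been retrieved, pulling out the memory URLs is a matter of walking the JSON. The sketch below assumes the response carries a top-level `data` list with one entry per sandbox report, each holding an `attributes` dictionary; that layout, the helper name, and the sample payload are assumptions for illustration, so check them against a real response before relying on them.

```python
def memory_pattern_urls(behaviours_response: dict) -> list[str]:
    """Collect unique memory_pattern_urls across all sandbox reports, preserving order."""
    urls: list[str] = []
    seen: set[str] = set()
    for report in behaviours_response.get("data", []):
        attributes = report.get("attributes", {})
        for url in attributes.get("memory_pattern_urls", []):
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Made-up payload shaped like the assumed response layout.
sample_response = {
    "data": [
        {"attributes": {"memory_pattern_urls": ["http://c2.example/a", "http://c2.example/b"]}},
        {"attributes": {"memory_pattern_urls": ["http://c2.example/b"]}},
        {"attributes": {}},  # a sandbox report without the field
    ]
}

print(memory_pattern_urls(sample_response))
```

Deduplicating across reports matters because several sandboxes may analyze the same file and record the same URL.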
AI Integration in VirusTotal
VirusTotal utilizes AI to create natural language summaries of code in executable files, aiding malware analysis. This involves integrating AI engines into its platform, which analyze code and generate both summaries and verdicts (benign, suspicious, or malicious).
VirusTotal supports two AI engine types: Code Insight (in-house, based on Google’s Gemini) and Crowdsourced AI (community-contributed engines). These engines specialize in different file types, helping analysts understand and categorize the capabilities of the malware.
Key Features:
- Code Insight: Focuses on scripts (e.g., PowerShell, Python) and now supports Windows PE binaries.
  - Analysis Process:
    - Unpacking: Using Mandiant Backscatter.
    - Decompilation: With Hex-Rays IDA Pro.
    - Analysis: Using Gemini to summarize code functionalities.
- Crowdsourced AI: Third-party engines like ByteDefend (analyzes Microsoft Office macros).
- Search Modifiers: Users can query for specific summaries or verdicts in VirusTotal’s dataset.
AI Search Modifiers:
Modifier | Usage and Scope |
---|---|
codeinsight:[text] | Searches for text in summaries generated by Code Insight. |
crowdsourced_ai_analysis | Searches text in summaries from Code Insight and all Crowdsourced AI engines. |
crowdsourced_ai_verdict | Searches for verdicts (benign, suspicious, malicious) by all AI engines. |
[ENGINE]_ai_analysis | Searches text in summaries from a specific Crowdsourced AI engine. |
[ENGINE]_ai_verdict | Searches verdicts (benign, suspicious, malicious) from a specific AI engine. |
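The modifiers in the table follow the same modifier:value pattern as other search filters, so they can be generated programmatically. The helpers below are a hypothetical sketch: the function names are invented, quoting the summary text is an assumption, and the engine token (shown here as the made-up `exampleengine`) must be replaced with whatever identifier VirusTotal uses for a given engine.

```python
from typing import Optional

VERDICTS = {"benign", "suspicious", "malicious"}


def ai_verdict_query(verdict: str, engine: Optional[str] = None) -> str:
    """Build a verdict query for all AI engines, or for one engine if given."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {sorted(VERDICTS)}")
    prefix = engine if engine else "crowdsourced"
    return f"{prefix}_ai_verdict:{verdict}"


def ai_summary_query(text: str, engine: Optional[str] = None) -> str:
    """Build a summary-text query for all AI engines, or for one engine if given."""
    prefix = engine if engine else "crowdsourced"
    return f'{prefix}_ai_analysis:"{text}"'  # quoting the text is an assumption


print(ai_verdict_query("malicious"))
print(ai_summary_query("keylogger", engine="exampleengine"))  # hypothetical engine token
```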
These AI tools enhance but do not replace traditional malware analysis, as they might miss novel or highly obfuscated code.
Using VirusTotal Search Modifiers for Effective Threat Analysis
VirusTotal offers a wide range of search modifiers that allow analysts to query the platform with precision, aiding in retrieving relevant data for specific threats.
However, the challenge of false positives (irrelevant data) and false negatives (missing relevant data) can affect the accuracy of search results. Crafting and refining queries using logical operators (AND, OR, NOT) helps mitigate these issues, making the search process more iterative and effective.
Example Investigation
In 2023, SentinelLabs investigated suspected China-linked actors targeting Southeast Asian gambling companies, leading to the discovery of the malware loader AdventureQuest.exe. This .NET-based executable was signed with a certificate likely stolen from the Ivacy VPN vendor PMG PTE LTD, a tactic often used by Chinese threat actors.
Querying VirusTotal
To investigate further, analysts used the following search approach:
- Digital Signature Query: Searching for files with the same certificate serial number (signature:"0E3E037C57A5447295669A3DB1A28B8A") returned 94 results.
- File Type Refinement: Narrowing the search to Windows PE executables (AND type:"peexe") reduced the results to 31.
- .NET Framework Focus: Further refining to .NET-built executables (AND magic:".NET") brought the results to 13.
- Eliminating False Positives: Excluding files with PDB paths or those from Ivacy VPN’s directory (AND (NOT metadata:".pdb") AND (NOT name:"Program Files (x86)ivacy")) isolated AdventureQuest.exe as the sole relevant result.
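The iterative narrowing in this investigation can be written down as a list of terms that are appended one step at a time. The sketch below only assembles the query strings quoted above; the step labels and the `cumulative_queries` helper are illustrative names, the result counts come from the write-up rather than from running the code, and the name filter is reproduced exactly as it appears in the text.

```python
REFINEMENT_STEPS = [
    ("Digital signature", 'signature:"0E3E037C57A5447295669A3DB1A28B8A"'),
    ("File type", 'type:"peexe"'),
    (".NET focus", 'magic:".NET"'),
    (
        "Exclude false positives",
        '(NOT metadata:".pdb") AND (NOT name:"Program Files (x86)ivacy")',
    ),
]


def cumulative_queries(steps: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (label, query) pairs where each query ANDs all terms up to that step."""
    queries = []
    terms: list[str] = []
    for label, term in steps:
        terms.append(term)
        queries.append((label, " AND ".join(terms)))
    return queries


for label, query in cumulative_queries(REFINEMENT_STEPS):
    print(f"{label}: {query}")
```

Keeping every intermediate query makes it possible to rerun an earlier, broader step if a later filter turns out to exclude relevant samples.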
This investigation demonstrated how VirusTotal’s search capabilities, when used strategically, can effectively identify threats within specific activity clusters, despite potential challenges with false positives or negatives.
Practical Use Cases
VirusTotal is invaluable for various threat intelligence activities, such as:
- Clustering Artifacts: Grouping related artifacts to identify trends or specific threat actors.
- Tracking Malicious Activities: Monitoring the behavior of known threats and identifying new ones.
- Analyzing Threat Trends: Understanding the evolving threat landscape.
Limitations and Considerations
While VirusTotal is a powerful tool, users should be aware of certain limitations:
- False Positives/Negatives: Queries can return irrelevant results or miss relevant ones, which affects the relevance and completeness of search results.
- Sandbox Limitations: Some malware may evade sandbox analysis, leading to incomplete behavior capture.
- AI Limitations: AI-generated summaries may lack detail or accuracy due to limitations in training data.
Effectively using VirusTotal for threat research requires understanding its querying capabilities and the factors that may impact data relevance. While the GUI provides a user-friendly interface, the API offers expanded capabilities for large-scale investigations.
AI engines integrated into VirusTotal can enhance analysis efforts but should be used as part of a broader strategy. By leveraging VirusTotal’s extensive features, users can conduct thorough and effective threat investigations.