Cloud GenAI workloads inherit pre-existing cloud security challenges, and security teams must proactively evolve innovative security countermeasures, including threat detection mechanisms.
Traditional cloud threat detection
Threat detection systems are designed to detect potential security breaches early; such detections usually indicate that attackers may have bypassed preventive security measures. Hence, threat detection systems are essential to a layered, defense-in-depth security architecture.
A common strategy employed by threat detection systems is using a threat detection engine, which essentially collects log events for security analysis. These threat detection engines leverage algorithms to detect specific log entries indicative of suspicious activities. Sigma rules are commonly used by several threat detection engines to specify the log events that should be flagged as suspicious. However, due to the wide variety of log formats developed by cybersecurity vendors, Sigma rules are eventually converted into proprietary formats that align with cybersecurity vendors’ proprietary detection engines.
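As a rough illustration of the matching step described above, the following minimal sketch applies a simplified, Sigma-inspired rule to parsed log events. The rule layout, the flattened field names, and the `matches` and `detect` helpers are assumptions made for illustration; they are not the Sigma specification or any vendor's detection engine.

```python
# A simplified, Sigma-inspired rule: every field in "selection" must match.
# The structure and field names are hypothetical, for illustration only.
RULE = {
    "title": "Console login without MFA",
    "selection": {
        "eventName": "ConsoleLogin",
        "mfaUsed": "No",
    },
    "level": "medium",
}


def matches(rule, event):
    """Return True when the event satisfies every field in the rule's selection."""
    return all(event.get(field) == value
               for field, value in rule["selection"].items())


def detect(rule, events):
    """Return the subset of events flagged by the rule."""
    return [event for event in events if matches(rule, event)]


# Hypothetical, already-normalized log events.
events = [
    {"eventName": "ConsoleLogin", "mfaUsed": "No", "user": "alice"},
    {"eventName": "ConsoleLogin", "mfaUsed": "Yes", "user": "bob"},
    {"eventName": "ListBuckets", "user": "carol"},
]

flagged = detect(RULE, events)
for event in flagged:
    print(RULE["title"], "->", event["user"])
```

A real engine would also handle nested fields, wildcards, and conditions across several selections, which is part of why rules are converted into vendor-specific formats.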
False positives are always a challenge in threat detection; hence, other strategies – e.g., event correlation and Cyber Threat Intelligence (CTI) – are leveraged to increase the accuracy of detections and reduce alert fatigue. More recently, detection engineering has spun off as a specialized aspect of threat detection, allowing detection engineers to customize threat detection systems.
Under the Shared Responsibility Model, organizations using the cloud are responsible for conducting threat detection. This responsibility has been quite challenging for organizations, since threat detection in the cloud differs considerably from threat detection in on-premises systems.
One major difference is access to event logs: organizations depend on cloud service providers (CSPs) to provide logs, whereas logs are directly accessible in on-premises systems. Another major difference is the interconnectedness of cloud resources via APIs. By design, this enables the cloud’s core attributes: agility, scalability, and elasticity. The interconnectedness is a double-edged sword for threat detection: defenders can leverage it for speedy attack detection and prevention, while attackers can leverage it to move laterally and quickly through the cloud’s fabric.
Threat detection for GenAI cloud workloads
Detecting threats in GenAI cloud workloads should be a significant concern for most organizations. Although this topic is not heavily discussed, it is a ticking time bomb that might explode only when attacks emerge or if compliance regulations enforce threat detection requirements for GenAI workloads.
Several challenges stand in the way of evolving threat detection systems for GenAI cloud workloads.
Asset management: Automatic inventory systems are required to track organizations’ GenAI workloads. This is a critical requirement for threat detection, since it is the basis for security visibility. However, this might be challenging in organizations where security teams are unaware of GenAI adoption. Similarly, only a few tools can discover and maintain an inventory of GenAI cloud workloads.
Lack of threat detection logic: Threat detection engines need specific logic to identify malicious or suspicious events in the cloud. This logic must be developed either through open-source efforts (e.g., Sigma rules) or by cybersecurity vendors. Currently, there seems to be little availability of such detection rules.
Alignment with MITRE ATLAS: MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is a globally accessible, living knowledge base of adversary tactics and techniques against AI-enabled systems based on real-world attack observations and realistic demonstrations from AI red teams and security groups.
As with MITRE ATT&CK, security teams leverage this knowledge base to enhance threat detection systems by aligning their detection rules with it. This reduces alert fatigue and enables realistic threat detection. However, the current MITRE ATLAS is generic and does not define cloud-specific GenAI techniques. This might take some time to evolve, as was the case with the Cloud IaaS Matrix.
Detection gaps and API abuse: Most cloud threats are not actual vulnerabilities but abuses of existing features, which makes malicious behavior hard to detect. This is also a challenge for rule-based systems, since they cannot always tell when API calls or log events indicate malicious activity. Therefore, event correlation is leveraged to combine individual events into patterns that indicate possible attacks.
GenAI has several abuse cases, e.g., prompt injections and training data poisoning. However, more abuse cases will surface as Cloud GenAI becomes more prevalent, and identifying these could be challenging. Proactive measures are therefore necessary to avoid surprises.
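The event correlation mentioned under detection gaps and API abuse can be sketched as follows: individual events that look benign on their own are flagged only when the same principal performs them in a given order within a time window. The action labels, the event layout, and the `correlate` helper are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta


def correlate(events, sequence, window):
    """Return principals who performed the ordered `sequence` of actions within `window`."""
    by_principal = {}
    for event in sorted(events, key=lambda e: e["time"]):
        by_principal.setdefault(event["principal"], []).append(event)

    hits = []
    for principal, items in by_principal.items():
        for i, first in enumerate(items):
            if first["action"] != sequence[0]:
                continue
            step = 1  # index of the next action we are waiting for
            for later in items[i + 1:]:
                if later["time"] - first["time"] > window:
                    break  # outside the correlation window
                if step < len(sequence) and later["action"] == sequence[step]:
                    step += 1
            if step == len(sequence):
                hits.append(principal)
                break
    return hits


t0 = datetime(2024, 1, 1, 12, 0)
# Hypothetical, normalized events; action names are illustrative labels.
sample_events = [
    {"principal": "mallory", "action": "disable_logging", "time": t0},
    {"principal": "alice", "action": "write_training_data", "time": t0 + timedelta(minutes=1)},
    {"principal": "mallory", "action": "write_training_data", "time": t0 + timedelta(minutes=5)},
    {"principal": "bob", "action": "disable_logging", "time": t0},
    {"principal": "bob", "action": "write_training_data", "time": t0 + timedelta(hours=2)},
]

suspects = correlate(
    sample_events,
    sequence=["disable_logging", "write_training_data"],
    window=timedelta(minutes=15),
)
print(suspects)
```

Here only the principal who disables logging and then writes training data within 15 minutes is reported; the same two actions spread over two hours, or a write on its own, do not raise an alert.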
A case study: Amazon Bedrock
Let us illustrate the above-mentioned points using Amazon Bedrock, one of the leading GenAI services in the cloud, provided by Amazon Web Services (AWS).
Amazon Bedrock allows access to several Foundation Models (FMs) supplied by leading AI companies, including AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon. Bedrock employs several AI techniques, e.g., fine-tuning and RAG (retrieval-augmented generation), to empower organizations to build innovative GenAI applications without undergoing rigorous AI processes. Furthermore, Bedrock is serverless, relieving users of infrastructure orchestration and maintenance.
However, a firm understanding of the AWS shared responsibility model, its peculiarities, and its application to Bedrock is imperative for building a threat detection system. Organizations leveraging Bedrock first need an efficient cloud asset management system capable of discovering and maintaining an up-to-date inventory of all Bedrock components. This capability allows quick identification of changes that might be malicious.
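One way to picture how an inventory supports the identification of changes is to compare two snapshots taken at different times. The sketch below assumes the snapshots have already been collected (the collection step is not shown) and uses hypothetical resource identifiers and attributes; `diff_inventory` is an illustrative helper, not part of any AWS tooling.

```python
def diff_inventory(previous, current):
    """Compare two inventory snapshots keyed by resource identifier."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(
        rid for rid in set(previous) & set(current)
        if previous[rid] != current[rid]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of GenAI-related resources and selected attributes.
yesterday = {
    "knowledge-base/kb-docs": {"data_source_bucket": "corp-kb-docs"},
    "custom-model/support-bot": {"base_model": "example-base-model"},
    "guardrail/default": {"version": "1"},
}
today = {
    "knowledge-base/kb-docs": {"data_source_bucket": "unknown-bucket"},
    "custom-model/support-bot": {"base_model": "example-base-model"},
    "custom-model/unreviewed": {"base_model": "example-base-model"},
}

report = diff_inventory(yesterday, today)
print(report)
```

In this example the report surfaces a new custom model, a removed guardrail, and a knowledge base whose data source changed, each of which a security team might want to review.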
Next, you need threat detection systems that collect and analyze event logs covering all API calls against Bedrock. AWS CloudTrail can come in handy; however, commensurate detection logic is needed to examine the collected logs for suspicious CloudTrail event names. Furthermore, Bedrock’s use of Amazon S3 as a data source for its knowledge base component is central to this understanding. This critical component manages data retrieval and processing among the core Amazon Bedrock components. The vital role played by S3 as a data source is Bedrock’s Achilles heel; it introduces several attack vectors, including data poisoning, denial of service, data breach, and S3 ransomware. It is imperative to evolve systems that quickly detect these attack vectors.
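A minimal sketch of such detection logic is shown below. It assumes CloudTrail records have already been collected and parsed into dictionaries, and that S3 object-level (data) events for the data source bucket are being logged. The watchlist of event names, the bucket name, the ARNs, and the `flag_records` helper are illustrative assumptions to be tuned per environment, not a recommended rule set.

```python
# Illustrative watchlist of Bedrock API calls worth reviewing (an assumption).
BEDROCK_WATCHLIST = {
    "DeleteModelInvocationLoggingConfiguration",
    "PutModelInvocationLoggingConfiguration",
    "CreateModelCustomizationJob",
    "DeleteCustomModel",
}
S3_WRITE_EVENTS = {"PutObject", "DeleteObject"}


def flag_records(records, kb_bucket, trusted_arns):
    """Return (eventName, actor ARN, reason) tuples for records worth reviewing."""
    findings = []
    for record in records:
        source = record.get("eventSource")
        name = record.get("eventName")
        actor = (record.get("userIdentity") or {}).get("arn")
        bucket = (record.get("requestParameters") or {}).get("bucketName")
        if source == "bedrock.amazonaws.com" and name in BEDROCK_WATCHLIST:
            findings.append((name, actor, "sensitive Bedrock API call"))
        elif (source == "s3.amazonaws.com" and name in S3_WRITE_EVENTS
              and bucket == kb_bucket and actor not in trusted_arns):
            findings.append((name, actor, "untrusted write to data source bucket"))
    return findings


# Hypothetical identities and bucket name.
INGEST = "arn:aws:iam::111122223333:role/ingest-pipeline"
MALLORY = "arn:aws:iam::111122223333:user/mallory"
records = [
    {"eventSource": "bedrock.amazonaws.com",
     "eventName": "DeleteModelInvocationLoggingConfiguration",
     "userIdentity": {"arn": MALLORY}, "requestParameters": None},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject",
     "userIdentity": {"arn": INGEST},
     "requestParameters": {"bucketName": "corp-kb-docs", "key": "faq.md"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject",
     "userIdentity": {"arn": MALLORY},
     "requestParameters": {"bucketName": "corp-kb-docs", "key": "faq.md"}},
    {"eventSource": "bedrock.amazonaws.com", "eventName": "InvokeModel",
     "userIdentity": {"arn": INGEST}, "requestParameters": {}},
]

findings = flag_records(records, kb_bucket="corp-kb-docs", trusted_arns={INGEST})
for finding in findings:
    print(finding)
```

The S3 branch is a crude proxy for data poisoning: it flags any write to the data source bucket by a principal outside an allow-list, which would likely need refinement to keep false positives manageable.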
Cloud attack emulation
Cloud attack emulation mimics the tactics, techniques, and procedures (TTPs) of real-world attacks in controlled cloud infrastructure, allowing organizations to evaluate the impact of these attacks on their infrastructure practically and safely.
The MITRE ATT&CK framework heavily influences the attacks emulated, thus providing meaningful value to defenders. In addition, MITRE Engenuity formulated Threat-Informed Defense, a guide organizations can leverage to prioritize realistic attacks rather than hypothetical attacks hinged on published vulnerabilities. A core pillar of Threat-Informed Defense is adversary emulation, which is used to validate that the combination of security measures and CTI works as expected. Cloud attack emulation applies the adversary emulation concept to cloud infrastructure by integrating into the cloud’s fabric through its APIs and providing a cloud-native experience.
Cloud attack emulation minimizes cloud detection errors and alert fatigue by safely emulating cyber attacks that typify actual attacker behavior. The emulated attacker behavior, usually captured as security events, provides opportunities to uncover attack vectors that might bypass detection strategies.
Cloud attack emulation is a critical component for developing and improving cloud detection, because cloud APIs, features, and resources change unpredictably, and these changes can introduce vulnerabilities and attack opportunities.
Cloud security operation teams can leverage cloud attack emulation in several ways.
Detection engineers can validate whether attack patterns are captured in logging systems (e.g., CloudTrail) and also evolve rules that reduce alert fatigue by identifying potential false positives.
Cloud logs tend to be either decentralized or unavailable. For example, a data poisoning attack against Amazon Bedrock involves object-level events that are unavailable in the CloudTrail console. Identifying these events requires additional configuration, e.g., using Security Lake or CloudTrail Lake. Unaware of this, SOC teams might miss data poisoning attacks against the S3 data source bucket.
Running cloud attack emulations, however, provides opportunities to identify these blind spots and evolve commensurate detection mechanisms. The attacks emulated can be based on MITRE ATT&CK and MITRE ATLAS, thus enabling a contextual understanding of threats against GenAI cloud workloads.
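The blind-spot check described above can be sketched as a comparison between the API calls an emulation is known to have performed and the events that actually reached the log pipeline. The step descriptions and the `find_blind_spots` helper are hypothetical; the sketch only assumes that each emulated step can be expressed as an event source and event name.

```python
def find_blind_spots(emulated_steps, collected_events):
    """Return emulated steps with no matching (eventSource, eventName) pair in the logs."""
    observed = {(e.get("eventSource"), e.get("eventName")) for e in collected_events}
    return [step for step in emulated_steps
            if (step["eventSource"], step["eventName"]) not in observed]


# Steps performed by a hypothetical emulation run.
emulated_steps = [
    {"description": "disable model invocation logging",
     "eventSource": "bedrock.amazonaws.com",
     "eventName": "DeleteModelInvocationLoggingConfiguration"},
    {"description": "overwrite a document in the data source bucket",
     "eventSource": "s3.amazonaws.com",
     "eventName": "PutObject"},
]

# Events found in the logs after the run; the S3 object-level event is
# absent here, as it would be if data events were not being collected.
collected_events = [
    {"eventSource": "bedrock.amazonaws.com",
     "eventName": "DeleteModelInvocationLoggingConfiguration"},
]

blind_spots = find_blind_spots(emulated_steps, collected_events)
for step in blind_spots:
    print("not observed:", step["description"])
```

Each step reported as not observed points to a logging or collection gap to close before detection rules for that step can work at all.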
Conclusion
GenAI has taken the world by storm, and organizations are rapidly adopting this technology to enable innovation while gaining business advantages. However, most organizations are likely to adopt the GenAI services offered by public cloud providers to strike a meaningful balance between the costs and benefits of innovation.
Leveraging GenAI cloud workloads opens up several security challenges that are not well discussed currently, especially how to detect threats effectively. The most difficult aspects of this challenge are grasping how the shared responsibility model applies to GenAI workloads, adapting current threat detection strategies to GenAI-specific challenges, and devising suitable technologies.
While learning from actual attacks has proven to be the most powerful motivation to strengthen threat detection, cloud attack emulation provides a means to learn cheaply without the consequences of an actual cyber attack. Therefore, it is a great way to identify the dynamics of GenAI-specific threats and evolve commensurate detection approaches. Furthermore, cloud attack emulation techniques enable Threat-Informed Defense, which helps reduce alert fatigue and false positives for GenAI cloud workloads.