Azure Data Factory And Apache Airflow Integration Flaws Let Attackers Gain Write Access


Researchers have uncovered new security vulnerabilities, dubbed “Dirty DAG”, in the Azure Data Factory integration of Apache Airflow. The flaws can be exploited by attackers who gain unauthorized write access to a directed acyclic graph (DAG) file or who use a compromised service principal.

The vulnerabilities can give attackers shadow-administrator authority over Azure infrastructure, which could result in malware deployment, data exfiltration, and unauthorized access to data.

“Exploiting these flaws could allow attackers to gain persistent access as shadow administrators over the entire Airflow Azure Kubernetes Service (AKS) cluster”, Palo Alto Networks said in a report shared with Cyber Security News.

“This could enable malicious activities like data exfiltration, malware deployment or covert operations within the cluster”.

Azure Data Factory Airflow Infrastructure Attack Process

Data Factory is an Azure-based data integration service that lets customers build and manage data pipelines as they move data across sources.

Azure Data Factory provides a managed Airflow instance integration, which is deployed as an AKS cluster managed by Azure.


Apache Airflow is an open-source platform for scheduling and orchestrating complex workflows. Users manage and schedule jobs using DAGs written in Python.

DAG files indicate the order in which tasks should be completed, the dependencies between tasks, and scheduling criteria.
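The ordering a DAG encodes can be sketched with a small, stdlib-only Python example (the task names and dependencies below are hypothetical, not from the report):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task lists the tasks it depends on,
# mirroring how an Airflow DAG declares task dependencies.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"validate"},
}

# static_order() yields tasks in an order that respects every
# dependency -- the same ordering problem the Airflow scheduler
# solves when it parses a DAG file.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['extract', 'transform', 'validate', 'load']
```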

Researchers discovered multiple flaws in Azure Data Factory, including:

  • Misconfigured Kubernetes RBAC in Airflow cluster
  • Misconfigured secret handling of Azure’s internal Geneva service
  • Weak authentication for Geneva

After gaining access, attackers can also tamper with Azure’s internal Geneva service, which manages critical logs and analytics.

This could allow them to manipulate log data or reach other sensitive Azure resources.

Azure Data Factory and Airflow cluster architecture overview

The attack begins by crafting a DAG file that, when imported, automatically launches a reverse shell to a remote server. The attacker then uploads the DAG file to a private GitHub repository connected to the Airflow cluster.

Attackers can access and modify DAG files in two ways:

  • Obtaining write permissions to the storage account holding the DAG files, either through a shared access signature (SAS) token for the files or through an account that has write permissions.
  • Accessing the Git repository through leaked credentials or a misconfigured repository.

Once the attacker creates a new DAG file or modifies an existing one, the directory containing the malicious file is automatically imported.
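Write access to a DAG file is so dangerous because Airflow imports DAG files as ordinary Python modules, so any top-level code runs the moment the file is parsed. The stdlib-only sketch below is a benign stand-in for the reverse-shell payload (the file name and the marker-file side effect are hypothetical):

```python
import importlib.util
import pathlib
import tempfile

# A "DAG file" whose top-level code has a side effect. In the attack,
# this side effect would be a reverse shell; here it is a harmless
# marker file so the behavior is safe to demonstrate.
dag_source = """
import pathlib
pathlib.Path(__file__).with_suffix(".pwned").write_text("executed at import")

# ... a legitimate-looking DAG definition would follow here ...
"""

with tempfile.TemporaryDirectory() as tmp:
    dag_path = pathlib.Path(tmp) / "malicious_dag.py"
    dag_path.write_text(dag_source)

    # Importing the file -- as the Airflow DAG processor does --
    # runs the top-level payload immediately.
    spec = importlib.util.spec_from_file_location("malicious_dag", dag_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    payload_output = dag_path.with_suffix(".pwned").read_text()
    print(payload_output)  # -> executed at import
```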

Airflow user interface (UI) showing current DAG files with details (Source: Palo Alto Networks)

Researchers found that the pod’s service account had cluster-admin permissions, which allowed complete control over the cluster.

As shown in the figure below, these permissions included creating pods, accessing Kubernetes secrets, and adding new users.

Researchers also observed Airflow-related secrets, such as the PostgreSQL backend password and the TLS certificates for the Airflow domain.
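From inside a pod, that level of access goes through the in-cluster Kubernetes API using the pod’s mounted service account token. The stdlib-only sketch below only builds the request and sends nothing; the namespace name is hypothetical:

```python
import urllib.request

# Standard in-cluster values: Kubernetes mounts each pod's service
# account token at this path, and the API server is reachable at
# this cluster-internal address.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
API_SERVER = "https://kubernetes.default.svc"


def build_secrets_request(token: str, namespace: str) -> urllib.request.Request:
    """Build (but do not send) a request listing Secrets in a namespace.

    With a cluster-admin service account, the API server would answer
    with every secret in the namespace -- per the report, that included
    the PostgreSQL backend password and TLS certificates.
    """
    url = f"{API_SERVER}/api/v1/namespaces/{namespace}/secrets"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


# "airflow" is a hypothetical namespace name used for illustration.
req = build_secrets_request("dummy-token", "airflow")
print(req.full_url)
```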

In response to the identified underlying security issue, Microsoft emphasized that “the above is isolated to the researcher’s cluster alone.”

Chain of events leading to host takeover (Source: Palo Alto Networks)

Through reverse engineering, researchers were able to recreate Geneva API calls.

The API endpoints were found to expose additional Azure resources, some of which grant write access to internal Azure services such as storage accounts and event hubs.

Geneva service in the cluster with access to different Azure resources
(Source: Palo Alto Networks)

Therefore, it is imperative to implement a thorough protection strategy that extends beyond merely securing the cluster’s perimeter.

  • Using policy and audit engines to help identify and prevent future issues (both in the cloud and within the cluster), and protecting permissions and configurations within the environment itself.
  • Identifying which data is processed by which data service and protecting sensitive data assets that interact with various cloud services.



