A newly disclosed critical flaw, CVE-2025-64712 (CVSS 9.8), in Unstructured.io’s “unstructured” ETL library could let attackers perform arbitrary file writes and potentially achieve remote code execution (RCE) on systems that process untrusted documents.
Unstructured is widely used to convert messy business files into AI-ready text and embeddings. The vendor's tooling is often cited as being deployed across a large portion of the Fortune 1000, which raises concerns for organizations running the pipeline in production.
“Unstructured data” (PDFs, emails, Word docs, slide decks, images) is commonly estimated to make up roughly 80%–90% of enterprise data, but it is hard to search and analyze in traditional systems.
Tools like Unstructured.io address this by extracting text (OCR for PDFs, speech-to-text for audio, etc.), splitting it into chunks, and storing the results in a searchable backend such as a vector database, so AI assistants can retrieve relevant passages quickly.
| CVE ID | CVSS Score | Description |
|---|---|---|
| CVE-2025-64712 | 9.8 | Path traversal leading to arbitrary file write, potentially enabling RCE on hosts running the unstructured library |
Unstructured.io offers an open-source library (and managed offerings) that many internal AI assistants and RAG pipelines rely on for ingestion from sources like S3, Google Drive, OneDrive, or Salesforce.

That positioning makes any file-handling vulnerability especially dangerous, because the pipeline often runs with broad filesystem access and processes files pulled from shared drives, ticketing systems, inboxes, or third-party connectors.
Technical details and risk
According to Cyera, the issue is a classic path traversal bug in the code path that processes Microsoft Outlook email message files (.msg).
When the library handles a .msg with attachments, it stores each attachment into a temporary directory before extracting text from it.
The vulnerable behavior occurs when the temp file path is built by concatenating the temp directory (for example, /tmp/) with the “original name” of the attachment file without safely constraining it.
If an attacker can control the attachment filename, they can use traversal sequences like ../../ so the write lands outside the temp directory.
For instance, a crafted attachment name such as ../../root/.ssh/authorized_keys could cause the library to overwrite the root user's SSH authorized_keys file with attacker-controlled content, enabling persistent access.
Similar techniques can target cron entries, startup scripts, or app-served directories to escalate to RCE, depending on the runtime and permissions.
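The path arithmetic behind the traversal can be illustrated with a short Python sketch. This is a hypothetical reconstruction of the unsafe pattern described above, not the library's actual code; the function names `naive_temp_path` and `escapes_directory` are assumptions made for illustration, and `posixpath` is used so the result is the same on any platform.

```python
import posixpath

def naive_temp_path(tmp_dir, attachment_name):
    # Hypothetical unsafe pattern: the attacker-controlled name is appended
    # to the temp directory without any validation. normpath only shows
    # where the operating system would finally resolve the path.
    return posixpath.normpath(posixpath.join(tmp_dir, attachment_name))

def escapes_directory(tmp_dir, path):
    # True when the resolved path is no longer inside tmp_dir.
    base = posixpath.normpath(tmp_dir)
    return posixpath.commonpath([base, path]) != base

benign = naive_temp_path("/tmp/", "report.pdf")
crafted = naive_temp_path("/tmp/", "../../root/.ssh/authorized_keys")

print(benign, escapes_directory("/tmp/", benign))    # /tmp/report.pdf False
print(crafted, escapes_directory("/tmp/", crafted))  # /root/.ssh/authorized_keys True
```

The two `../` segments climb out of the temp directory, so the write would land on the root user's key file if the process has permission to write there.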
Risk increases because Unstructured is frequently pulled in indirectly through upstream frameworks and “wrapper” libraries used in AI apps, which can make the blast radius hard to inventory.
To mitigate the risk, run file processing in isolated containers or VMs as a non-root user, block traversal through path normalisation and filename allowlists (for example, by enforcing a basename), and avoid writing attacker-influenced filenames to disk.
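As a rough illustration of the filename-level defences, the following Python sketch shows two options: constraining the name to a basename and verifying containment, or ignoring the supplied name entirely. It is a minimal sketch under stated assumptions, not the vendor's fix; `safe_attachment_path`, `random_attachment_path`, and the `ALLOWED_EXTENSIONS` set are hypothetical names chosen for this example.

```python
import os
import tempfile
import uuid

# Hypothetical allowlist; a real deployment would choose its own.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".png"}

def safe_attachment_path(tmp_dir, attachment_name):
    """Keep only the final path component and confirm it stays in tmp_dir."""
    if "\x00" in attachment_name:
        raise ValueError("attachment name contains a null byte")
    # Treat backslashes as separators too, then drop every directory part.
    name = os.path.basename(attachment_name.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError("attachment name has no usable file component")
    base = os.path.realpath(tmp_dir)
    candidate = os.path.realpath(os.path.join(base, name))
    # Defence in depth: reject anything that resolves outside the base,
    # for example through a pre-existing symlink.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("resolved path escapes the temp directory")
    return candidate

def random_attachment_path(tmp_dir, attachment_name):
    """Ignore the supplied name; keep only an allowlisted extension."""
    ext = os.path.splitext(attachment_name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        ext = ".bin"
    return os.path.join(os.path.realpath(tmp_dir), uuid.uuid4().hex + ext)

with tempfile.TemporaryDirectory() as tmp:
    print(safe_attachment_path(tmp, "../../root/.ssh/authorized_keys"))
    print(random_attachment_path(tmp, "invoice.PDF"))
```

The second function is the stricter choice, since no attacker-influenced text reaches the filesystem path at all. Neither replaces running the pipeline as a non-root user inside an isolated container.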