Apache Tika Core Flaw Allows Attackers to Exploit Systems with Malicious PDF Uploads

A newly disclosed critical vulnerability in Apache Tika could allow attackers to compromise servers by simply uploading a malicious PDF file, according to a security advisory published by Apache maintainers.

Tracked as CVE-2025-66516, the flaw affects Apache Tika core, Apache Tika parsers, and the Apache Tika PDF parser module.

  • CVE ID: CVE-2025-66516
  • Severity: Critical
  • Vulnerability type: XML External Entity (XXE) injection
  • Affected components: Apache Tika core, Tika parsers, Tika PDF parser module
  • Affected versions: Tika core 1.13 through 3.2.1; Tika parsers 1.13 through 1.28.5; PDF parser module 2.0.0 through 3.2.1

The vulnerability is rated critical and impacts a wide range of versions commonly embedded in content analysis, search, and document processing pipelines.

The issue stems from an XML External Entity (XXE) injection flaw in Apache Tika’s handling of XFA (XML Forms Architecture) content embedded in PDF files.

When a crafted PDF containing a malicious XFA component is processed, Tika may evaluate external XML entities, allowing an attacker to access local files, internal network resources, or other sensitive data on the server where Tika runs.
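
To illustrate the general class of bug, the sketch below shows how a Java XML parser is commonly hardened against XXE using standard JDK APIs. It is not Tika's actual patch; the class name XxeHardeningSketch and the helper parsesSafely are hypothetical, and the sample payload is only a stand-in for the kind of XML an XFA form might carry.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical illustration of XXE hardening; not code from Apache Tika.
final class XxeHardeningSketch {

    // Builds a factory that refuses DOCTYPE declarations, which removes
    // the mechanism that external entities depend on.
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Returns true if the XML parses under the hardened settings,
    // false if the parser rejects it (for example, because of a DOCTYPE).
    static boolean parsesSafely(String xml) {
        try {
            hardenedFactory().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException rejected) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String benign = "<form><field>ok</field></form>";
        // Assumed example payload: an entity pointing at a local file.
        String hostile = "<!DOCTYPE form [<!ENTITY x SYSTEM \"file:///etc/hostname\">]>"
                + "<form>&x;</form>";
        System.out.println("benign accepted:  " + parsesSafely(benign));
        System.out.println("hostile accepted: " + parsesSafely(hostile));
    }
}
```

A parser configured without such restrictions may instead resolve the entity and pull the referenced file or URL into the parsed document, which is the behavior an XXE attack relies on.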

According to the Apache advisory, the following versions are affected:

  • Apache Tika core (org.apache.tika:tika-core), versions 1.13 through 3.2.1
  • Apache Tika parsers (org.apache.tika:tika-parsers), versions 1.13 up to, but not including, 2.0.0
  • Apache Tika PDF parser module (org.apache.tika:tika-parser-pdf-module), versions 2.0.0 through 3.2.1
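
The ranges above can be expressed as a small check. This is a minimal sketch, assuming plain numeric versions such as 1.28.5 with no qualifiers (a suffix like -BETA would need extra handling); the class name TikaVersionCheck and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper that encodes the affected ranges listed in the advisory.
final class TikaVersionCheck {

    // Parses "major.minor.patch" into three integers; missing parts count as 0.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] numbers = new int[3];
        for (int i = 0; i < numbers.length && i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    // Negative, zero, or positive, as with Comparable.compareTo.
    static int compare(String a, String b) {
        return Arrays.compare(parse(a), parse(b));
    }

    static boolean isAffected(String artifactId, String version) {
        switch (artifactId) {
            case "tika-core":
                // 1.13 through 3.2.1, inclusive
                return compare(version, "1.13") >= 0 && compare(version, "3.2.1") <= 0;
            case "tika-parsers":
                // 1.13 up to, but not including, 2.0.0
                return compare(version, "1.13") >= 0 && compare(version, "2.0.0") < 0;
            case "tika-parser-pdf-module":
                // 2.0.0 through 3.2.1, inclusive
                return compare(version, "2.0.0") >= 0 && compare(version, "3.2.1") <= 0;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAffected("tika-core", "3.2.1"));
        System.out.println(isAffected("tika-core", "3.2.2"));
        System.out.println(isAffected("tika-parsers", "1.28.5"));
    }
}
```

In practice the version strings would come from a dependency report or a software bill of materials rather than being typed in by hand.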

The vulnerability is closely related to a previously reported issue, CVE-2025-54988, but CVE-2025-66516 expands the scope of the affected artifacts.

While the original report focused on the PDF parser module as the entry point, Apache has clarified that the root cause and fix reside in Tika core.

As a result, organizations that updated only the PDF parser module without also upgrading tika-core to a fixed version (3.2.2 or later) may still be exposed.

Additionally, the new CVE notes that in the older 1.x Tika releases, the PDF parser was bundled inside the general tika-parsers module.

Those earlier packages were not explicitly called out in the initial advisory, leaving some deployments potentially unaware of their exposure.

In real-world environments, Apache Tika is frequently integrated into file upload workflows, search indexing systems, data ingestion pipelines, and security tools that automatically parse and extract content from documents.

In such setups, an attacker could upload or submit a specially crafted PDF, trigger the vulnerable parsing logic, and leverage XXE to exfiltrate secrets or pivot further into internal infrastructure.

Administrators and developers using Apache Tika are urged to:

  • Identify whether their applications rely on the affected tika-core, tika-parsers, or tika-parser-pdf-module versions.
  • Upgrade tika-core to version 3.2.2 or later, and ensure all related Tika components are updated in a consistent manner.
  • Review any systems that process untrusted PDFs, especially public-facing upload endpoints, and consider additional hardening and input validation.
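
As a starting point for the first step, a running application can try to report which tika-core version it has loaded. The sketch below assumes the tika-core jar on the classpath ships the standard Maven descriptor file (pom.properties), which is typical of Maven-built jars but is not guaranteed for repackaged or shaded builds; the class name TikaCoreVersionProbe and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;
import java.util.Properties;

// Hypothetical probe; relies on an assumed Maven descriptor inside the jar.
final class TikaCoreVersionProbe {

    // Assumed location of the descriptor for org.apache.tika:tika-core.
    static final String DESCRIPTOR =
            "META-INF/maven/org.apache.tika/tika-core/pom.properties";

    // Reads the "version" property from a pom.properties stream.
    // Returns empty if the stream is missing, unreadable, or has no version.
    static Optional<String> versionFrom(InputStream in) {
        if (in == null) {
            return Optional.empty();
        }
        try {
            Properties properties = new Properties();
            properties.load(in);
            return Optional.ofNullable(properties.getProperty("version"));
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    // Looks for the descriptor on the classpath of this class.
    static Optional<String> detect() {
        ClassLoader loader = TikaCoreVersionProbe.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(DESCRIPTOR)) {
            return versionFrom(in);
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(detect()
                .map(v -> "tika-core version on classpath: " + v)
                .orElse("tika-core descriptor not found on classpath"));
    }
}
```

An empty result does not prove that Tika is absent, since shaded or repackaged jars may drop the descriptor, so a dependency tree or bill-of-materials review remains the more reliable check.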
