Apache Tika CVE Expands To Critical Multi-Module Flaw

A security issue disclosed in the Apache Tika document-processing framework has proved broader and more serious than first believed. The project’s maintainers have issued a new advisory revealing that a flaw previously thought to be limited to a single PDF-processing component extends across several Tika modules, widening the scope of a vulnerability first publicized in mid-2025. 

Initial Disclosure and the Limits of CVE-2025-54988 

The original flaw, tracked as CVE-2025-54988 and published in August 2025 with a severity rating of 8.4, was traced to tika-parser-pdf-module, the component that processes PDFs, in Apache Tika versions 1.13 through 3.2.1. Tika, a tool designed to extract and standardize content from more than 1,000 proprietary file formats, has long been a target for attacks involving XML External Entity (XXE) injection, a recurring risk in software that parses complex document formats.

According to the original CVE description, the weakness allowed attackers to hide XML Forms Architecture (XFA) instructions inside a malicious PDF. When processed, these instructions could enable an XXE injection attack, potentially letting an attacker “read sensitive data or trigger malicious requests to internal resources or third-party servers.”

The vulnerability also created a pathway for data exfiltration through Tika’s own processing pipeline, with no outward indication that data was leaking. 
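To make the mechanism concrete, the following is a minimal sketch of an XXE-style payload and the generic defense of refusing to resolve external entities. It uses Python's standard-library SAX parser purely as an illustration; it is not Tika's Java code or the actual fix, and the names extract_text, TextCollector and the sample payload are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects character data, roughly what a text extractor would emit."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def extract_text(xml_bytes):
    """Parse XML with external general entities explicitly disabled."""
    parser = xml.sax.make_parser()
    # Refuse to fetch anything referenced by a SYSTEM entity declaration.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


# Hypothetical payload: the entity "leak" points at a local file. A parser
# that resolved it would splice the file's contents into the extracted text.
PAYLOAD = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE form [<!ENTITY leak SYSTEM "file:///etc/hostname">]>'
    b"<form><field>&leak;</field></form>"
)

if __name__ == "__main__":
    # With external entities disabled, the file contents are not included.
    print(repr(extract_text(PAYLOAD)))
```

The design point is that the defense lives in the parser configuration, not in the document handler, which is consistent with the fix for this flaw landing in a shared core component.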

New CVE Expands Affected Components and Severity 

Project maintainers now report that the PDF parser was not the only vulnerable entry point. A new advisory issued on 4 December 2025 by Tim Allison on the Tika mailing list confirms that the issue affects additional components. The newly disclosed CVE-2025-66516, rated at the maximum severity of 10.0, expands the scope to include:

  • Apache Tika core (tika-core) versions 1.13 through 3.2.1 
  • Apache Tika parsers (tika-parsers) versions 1.13 through 1.28.5 
  • Apache Tika PDF parser module (tika-parser-pdf-module) versions 2.0.0 through 3.2.1 
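The ranges above can be turned into a quick check. The sketch below is an illustration only: the function names are hypothetical, bounds are treated as inclusive as listed in this article, and version qualifiers such as "-BETA" are ignored, which is a simplifying assumption.

```python
import re

# Affected ranges as listed above (inclusive bounds).
AFFECTED = {
    "tika-core": ("1.13", "3.2.1"),
    "tika-parsers": ("1.13", "1.28.5"),
    "tika-parser-pdf-module": ("2.0.0", "3.2.1"),
}


def parse_version(text):
    """Turn '3.2.1' into (3, 2, 1); pads to three parts, drops qualifiers."""
    match = re.match(r"(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    parts = [int(p) for p in match.group(1).split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(artifact, version):
    """Return True if the artifact/version pair falls in a listed range."""
    bounds = AFFECTED.get(artifact)
    if bounds is None:
        return False
    low, high = (parse_version(b) for b in bounds)
    return low <= parse_version(version) <= high


if __name__ == "__main__":
    print(is_affected("tika-core", "3.2.1"))     # True
    print(is_affected("tika-core", "3.2.2"))     # False
    print(is_affected("tika-parsers", "1.28.5")) # True
```

Note that a deployment counts as exposed if any one of its Tika artifacts falls in range, so each artifact on the classpath should be checked separately.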

The maintainers note two reasons for issuing a second CVE. First, although the vulnerability was detected via the PDF parser, the underlying flaw and its fix were located in tika-core. This means that users who updated only the PDF parser after the initial disclosure but did not update Tika core to version 3.2.2 or later remain exposed.

Second, earlier Tika versions housed the PDFParser class within the tika-parsers module, which was not included in the initial CVE despite being vulnerable. The advisory states that CVE-2025-66516 “covers the same vulnerability as in CVE-2025-54988,” but widens the list of affected packages to ensure users understand the full extent of the risk. 

Impact, Exploitation Risk, and Recommended Mitigation 

As of early December, maintainers say they have no evidence that attackers are exploiting the weakness in real-world campaigns. Still, the potential for rapid exploitation remains high, particularly if proofs-of-concept or reverse-engineered attack samples begin circulating. 

To eliminate the vulnerability, users are instructed to update to: 

  • tika-core 3.2.2 
  • tika-parser-pdf-module 3.2.2 
  • tika-parsers 2.0.0 (for legacy users) 

The maintainers warn that patching may be insufficient in environments where Apache Tika is used indirectly or embedded within other applications. Its presence is not always clearly documented, creating blind spots for developers. The advisory notes that disabling XML parsing via tika-config.xml is the only mitigation for teams uncertain about where Tika may be running. 
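One way to start closing that blind spot is to search deployed applications for Tika jars, including those bundled inside WAR or EAR archives. The sketch below is a rough starting point, not a complete inventory tool: the function name find_tika_jars is hypothetical, it relies on conventional Maven-style file names, and it will miss shaded or repackaged builds where Tika classes have been merged into another jar.

```python
import re
import zipfile
from pathlib import Path

# Matches conventional file names such as tika-core-3.2.1.jar.
JAR_NAME = re.compile(
    r"(tika-core|tika-parsers|tika-parser-pdf-module)-(\d[\w.\-]*?)\.jar$"
)
ARCHIVE_SUFFIXES = {".jar", ".war", ".ear"}


def find_tika_jars(root):
    """Return (location, artifact, version) tuples for Tika jars under root."""
    found = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        match = JAR_NAME.search(path.name)
        if match:
            found.append((str(path), match.group(1), match.group(2)))
        # Look one level inside archives for bundled copies (e.g. WEB-INF/lib).
        if path.suffix.lower() in ARCHIVE_SUFFIXES and zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as archive:
                for entry in archive.namelist():
                    match = JAR_NAME.search(entry)
                    if match:
                        found.append(
                            (f"{path}!/{entry}", match.group(1), match.group(2))
                        )
    return found


if __name__ == "__main__":
    # The directory is a placeholder; point it at a real deployment root.
    for location, artifact, version in find_tika_jars("."):
        print(f"{artifact} {version}: {location}")
```

Build-tool dependency reports remain the more reliable source for projects under a team's own control; a file scan like this is mainly useful for third-party or legacy deployments.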


