PDFly Variant Uses Custom PyInstaller Tweaks to Obfuscate Payload, Thwarting Analysis


A new malware variant dubbed “PDFly” is abusing a heavily modified PyInstaller stub to hide its Python bytecode, forcing analysts to reverse-engineer a custom decryption routine before any meaningful analysis can begin.

A closely related sample, “PDFClick,” shows almost identical behavior, suggesting a small family of PyInstaller-based droppers that deliberately break standard tooling.

Both samples are delivered as PyInstaller-packaged executables. Normally, analysts rely on utilities such as pyinstxtractor-ng to unpack the embedded Python modules from these binaries.

In the case of PDFly and PDFClick, this standard approach fails immediately: the extractor does not even recognize the files as PyInstaller executables because the usual “magic” cookie in the overlay has been altered.

Inspection in IDA Pro further reveals that the stub itself has been modified: some strings are partially corrupted, and the cookie value held in memory (for example, in the local_80 variable) has been changed.

To work around this, the researcher manually patched pyinstxtractor-ng. First, the script’s expected magic value was replaced with the custom cookie observed in the modified stub.

After that, an assertion that checks for the “PYZ” marker inside the package failed; this assertion turned out to be non-essential for extraction, so it was simply removed.

With these changes, pyinstxtractor-ng was finally able to extract the embedded files – but all contents of the PYZ archive were still encrypted.
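The cookie patch described above can be pictured with a small sketch. It is not the actual pyinstxtractor-ng code: the function name find_cookie and the sample bytes are hypothetical, and only the stock magic value is taken from standard PyInstaller builds. The idea is that an extractor hard-coded to the stock magic finds nothing in a modified sample, while the same search succeeds once the custom cookie is supplied.

```python
from typing import Optional

# Magic bytes that stock PyInstaller builds place at the start of the cookie.
DEFAULT_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"


def find_cookie(data: bytes, magic: bytes = DEFAULT_MAGIC) -> Optional[int]:
    """Return the offset of the last occurrence of `magic`, or None if absent.

    Hypothetical helper: a real extractor also parses the fields that
    follow the magic, which is omitted here.
    """
    pos = data.rfind(magic)
    return pos if pos != -1 else None


# Illustrative data only: a fake overlay carrying a made-up 8-byte cookie.
sample = b"\x00" * 32 + b"CUSTOM!!" + b"\x01" * 80
print(find_cookie(sample))               # stock magic is not present
print(find_cookie(sample, b"CUSTOM!!"))  # custom magic is found
```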

Custom PyInstaller Build Evades Analysis

Static inspection with tools like CAPA and IDAScope did not show any clear encryption routines inside the stub itself, only compression logic.

Figure: Manual extraction (Source: Samplepedia).

This indicated that decryption was likely implemented elsewhere, so the analyst turned to the bootstrap and pyimod modules dropped alongside the stub.

These were not encrypted and could be decompiled. Using PyLingual, the researcher processed Python 3.13 bytecode from pyimod01_archive.pyc and identified a custom two‑stage XOR scheme wrapped around zlib decompression.

Disassembly showed that the archive data is first XORed with a 13‑byte key (SCbZtkeMKAvyU), then passed through zlib.decompress, and finally XORed again with a 7‑byte key (KYFrLmy) before being reversed and unmarshalled.

In code, the logic boils down to: XOR with key 1, decompress, XOR with key 2, reverse, then unmarshal. By integrating this sequence into pyinstxtractor-ng, the analyst could successfully extract and decrypt the embedded Python bytecode from both PDFly and PDFClick.
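A minimal sketch of that sequence follows. It assumes the keys are used as plain ASCII bytes and repeat cyclically from the start of each buffer, which the write-up does not spell out; the names xor_bytes and decrypt_entry are illustrative and not taken from the researcher's script. The function stops at the marshalled bytes, since unmarshalling code objects only works on a matching interpreter version (Python 3.13 for these samples).

```python
import marshal
import zlib
from itertools import cycle

# Keys as reported for the analyzed samples; other samples use different ones.
KEY1 = b"SCbZtkeMKAvyU"  # 13 bytes, applied before decompression
KEY2 = b"KYFrLmy"        # 7 bytes, applied after decompression


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR `data` with `key`, repeating the key as needed (assumed behavior)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def decrypt_entry(blob: bytes, key1: bytes = KEY1, key2: bytes = KEY2) -> bytes:
    """Return the marshalled bytes of one PYZ entry.

    Order of operations: XOR with key 1, zlib-decompress, XOR with key 2,
    then reverse the buffer. The caller can pass the result to
    marshal.loads() when running on a compatible Python version.
    """
    stage1 = xor_bytes(blob, key1)
    inflated = zlib.decompress(stage1)
    stage2 = xor_bytes(inflated, key2)
    return stage2[::-1]


# Round trip on synthetic data: apply the inverse steps, then decrypt.
plain = marshal.dumps({"name": "demo"})
blob = xor_bytes(zlib.compress(xor_bytes(plain[::-1], KEY2)), KEY1)
print(marshal.loads(decrypt_entry(blob)))
```

In an extractor, the returned bytes would typically be written out behind a .pyc header rather than loaded directly, so that a decompiler can process them.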

Payload Recovery

Because different samples in this cluster use varying cookies and XOR keys, the researcher went further and built a more generic extractor.

A helper function scans the PE overlay to locate plausible PyInstaller cookies by validating length fields and table-of-contents offsets, then recovers the magic value dynamically.
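The following sketch shows one way such a scan could work. It assumes the modified stub keeps the stock cookie layout used by recent PyInstaller versions (8-byte magic, package length, table-of-contents offset and length, Python version, 64-byte library name, all big-endian); the function name, the 4096-byte search window, and the Python-version sanity check are assumptions added for illustration, not details from the researcher's tool.

```python
import struct
from typing import Optional, Tuple

# Stock cookie layout: magic, package length, TOC offset, TOC length,
# Python version, Python library name (network byte order).
COOKIE_FMT = "!8sIIii64s"
COOKIE_SIZE = struct.calcsize(COOKIE_FMT)  # 88 bytes


def find_custom_cookie(data: bytes, window: int = 4096) -> Optional[Tuple[int, bytes]]:
    """Scan the tail of `data` backwards for a structurally valid cookie.

    Returns (offset, magic) for the first candidate whose fields are
    mutually consistent, or None. The magic itself is never compared
    against a known value; it is recovered from the match.
    """
    start = max(0, len(data) - window)
    for pos in range(len(data) - COOKIE_SIZE, start - 1, -1):
        magic, pkg_len, toc_off, toc_len, pyver, _lib = struct.unpack_from(
            COOKIE_FMT, data, pos
        )
        cookie_end = pos + COOKIE_SIZE
        # The package must fit in the file and contain more than the cookie.
        if not (COOKIE_SIZE < pkg_len <= cookie_end):
            continue
        # The TOC must be non-empty and lie inside the package, before the cookie.
        if toc_len <= 0 or toc_off + toc_len > pkg_len - COOKIE_SIZE:
            continue
        # Assumed heuristic: version encoded as e.g. 39 (3.9) or 313 (3.13).
        if not (30 <= pyver <= 39 or 300 <= pyver <= 399):
            continue
        return pos, magic
    return None
```

Validating the fields against each other, rather than matching a fixed magic, is what lets the same routine cope with samples that each carry a different cookie.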

Another routine parses pyimod01_archive.pyc with Python’s marshal module, walks the ZlibArchiveReader.extract method, and automatically pulls out XOR keys from constants.
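A rough sketch of that key-recovery step is shown below. It assumes a 16-byte .pyc header (Python 3.7 and later) and that the keys appear as string or bytes constants of the extract method; the helper names and the minimum-length filter are hypothetical. Note that marshal.loads can only read code objects produced by the same Python version as the interpreter running the script, so for these samples it would need to run under Python 3.13.

```python
import marshal
import types
from typing import Iterator, List

PYC_HEADER_SIZE = 16  # magic, flags, and source metadata in Python 3.7+


def iter_code_objects(code: types.CodeType) -> Iterator[types.CodeType]:
    """Yield `code` and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def extract_key_candidates(
    pyc_bytes: bytes, method_name: str = "extract", min_len: int = 4
) -> List[bytes]:
    """Collect str/bytes constants from functions named `method_name`.

    The result is a list of candidates in constant-table order; deciding
    which one is key 1 and which is key 2 is left to the caller.
    """
    module_code = marshal.loads(pyc_bytes[PYC_HEADER_SIZE:])
    keys: List[bytes] = []
    for code in iter_code_objects(module_code):
        if code.co_name != method_name:
            continue
        for const in code.co_consts:
            if isinstance(const, str):
                const = const.encode("utf-8")
            if isinstance(const, bytes) and len(const) >= min_len:
                keys.append(const)
    return keys
```

Matching on the method name alone is a simplification; a stricter version could also check the enclosing class name through co_qualname on Python 3.11 and later.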

With these additions, the modified script can adapt to multiple samples built with modified PyInstaller stubs and reliably recover their encrypted Python payloads, laying the groundwork for deeper behavioral analysis of PDFly and related threats.
