PyTorch discloses malicious dependency chain compromise over holidays

January 1, 2023

Table of Contents

Malicious library targets PyTorch-nightly users
Hacker steals sensitive files, claims ethical research
Mitigations

PyTorch has identified a malicious dependency with the same name as the framework’s ‘torchtriton’ library. This has led to a successful compromise via the dependency confusion attack vector.

PyTorch admins are warning users who installed PyTorch-nightly over the holidays to uninstall the framework and the counterfeit ‘torchtriton’ dependency.

From computer vision to natural language processing, the open source machine learning framework PyTorch has gained prominence in both commercial and academic realms.

Malicious library targets PyTorch-nightly users

Users who installed PyTorch-nightly between December 25th and December 30th, 2022 should ensure their systems were not compromised, the PyTorch team has warned.

The warning follows the appearance of a malicious ‘torchtriton’ dependency over the holidays on the Python Package Index (PyPI) registry, the official third-party software repository for Python.

“Please uninstall it and torchtriton immediately, and use the latest nightly binaries (newer than Dec 30th 2022),” advises the PyTorch team.

**Malicious PyTorch dependency ‘torchtriton’ on PyPI** (BleepingComputer)

The malicious ‘torchtriton’ dependency on PyPI shares its name with the official library published on the PyTorch-nightly repo. But when fetching dependencies in the Python ecosystem, PyPI normally takes precedence, causing the malicious package to be pulled onto your machine instead of PyTorch’s legitimate one.

“Since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository. This design enables somebody to register a package by the same name as one that exists in a third party index, and pip will install their version by default,” writes the PyTorch team in a disclosure published yesterday.
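One rough way to picture this failure mode is a resolver that pools same-named candidates from every configured index and keeps the highest version, with no notion of which index is trusted. The sketch below is a simplified model under that assumption, not pip's actual code. The `Candidate` type, the `resolve` function, the index labels, and the version numbers are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple  # e.g. (2, 0, 0); simplified stand-in for a real version
    index: str      # label for the index that served the package


def resolve(name: str, candidates: list) -> Optional[Candidate]:
    """Pick the highest-versioned candidate with the given name.

    Assumption: the index a candidate came from plays no role in the choice.
    """
    matches = [c for c in candidates if c.name == name]
    return max(matches, key=lambda c: c.version, default=None)


# Hypothetical version numbers, chosen only to show the effect.
candidates = [
    Candidate("torchtriton", (2, 0, 0), "pytorch-nightly"),
    Candidate("torchtriton", (3, 0, 0), "pypi"),
]
chosen = resolve("torchtriton", candidates)
print(chosen.index)
```

In this model, the only thing an attacker needs is a same-named package with a version the resolver prefers; nothing in the selection step asks where the package came from.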

At the time of writing, BleepingComputer observed the malicious ‘torchtriton’ dependency had exceeded 2,300 downloads in the past week.

This type of supply chain attack is known as “dependency confusion,” as first reported by BleepingComputer in 2021, just as the attack vector was popularized by ethical hacker Alex Birsan.

PyTorch states that users of the PyTorch stable packages are unaffected by this issue.

Hacker steals sensitive files, claims ethical research

Not only does the malicious ‘torchtriton’ survey your system for basic fingerprinting info (such as IP address, username, and current working directory), it further steals sensitive data:

Gets system information:
- nameservers from /etc/resolv.conf
- hostname from gethostname()
- current username from getlogin()
- current working directory name from getcwd()
- environment variables
Reads the following files:
- /etc/hosts
- /etc/passwd
- The first 1,000 files in $HOME/*
- $HOME/.gitconfig
- $HOME/.ssh/*
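For incident response, it can help to enumerate which local files fall into the categories above, so you know which keys and credentials to rotate. The following is a minimal sketch for that purpose. The function name `files_in_scope`, the sorted ordering, and the reading of "first 1,000 files" as top-level entries of the home directory are assumptions, since the source does not say how the malware orders or selects files.

```python
from pathlib import Path


def files_in_scope(home: Path, limit: int = 1000) -> list:
    """List existing files matching the categories named in the advisory."""
    found = []
    # Fixed system files named in the advisory.
    for fixed in (Path("/etc/hosts"), Path("/etc/passwd")):
        if fixed.is_file():
            found.append(fixed)
    # $HOME/.gitconfig and every regular file directly under $HOME/.ssh
    gitconfig = home / ".gitconfig"
    if gitconfig.is_file():
        found.append(gitconfig)
    ssh_dir = home / ".ssh"
    if ssh_dir.is_dir():
        found.extend(sorted(p for p in ssh_dir.iterdir() if p.is_file()))
    # Top-level files in $HOME, capped at `limit` (ordering is an assumption).
    top_level = sorted(p for p in home.iterdir() if p.is_file())
    extra = [p for p in top_level[:limit] if p not in found]
    found.extend(extra)
    return found
```

Calling `files_in_scope(Path.home())` returns a list of paths to review. Treat any secret stored in those files as potentially exposed if the malicious package ran on the machine.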

It then uploads all of this data, including file contents, to the h4ck.cfd domain via encrypted DNS queries using the wheezy.io DNS server.
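If you keep DNS query logs, one way to look for traces of this behavior is to search them for names under h4ck.cfd. The sketch below assumes you have already extracted the queried names into a list of strings; parsing a particular log format is out of scope, and the function name `queries_to_domain` and the sample names are hypothetical.

```python
def queries_to_domain(qnames, domain="h4ck.cfd"):
    """Return the query names that equal `domain` or are subdomains of it."""
    domain = domain.lower().rstrip(".")
    hits = []
    for qname in qnames:
        # Normalize case and drop the optional trailing dot of a FQDN.
        name = qname.strip().lower().rstrip(".")
        if name == domain or name.endswith("." + domain):
            hits.append(qname)
    return hits


# Invented sample names; only the second and fourth fall under h4ck.cfd.
sample = ["pypi.org", "abc123.h4ck.cfd.", "noth4ck.cfd", "H4CK.CFD"]
print(queries_to_domain(sample))
```

Matching on a leading dot avoids flagging unrelated domains that merely end in the same characters. An empty result is not proof of safety, because queries sent to an external DNS server may never appear in local resolver logs.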

PyTorch explains that the malicious ‘triton’ binary contained within the counterfeit ‘torchtriton’ is only executed when the user imports the ‘triton’ package in their build. This would require explicit code and is not PyTorch’s default behavior.

The notice on the h4ck.cfd domain implies the whole operation is ethical research, but the analysis strongly indicates otherwise.

“Hello, if you stumbled on this in your logs, then this is likely because your Python was misconfigured and was vulnerable to a dependency confusion attack. To identify companies that are vulnerable the script sends the metadata about the host (such as its hostname and current working directory) to me. After I’ve identified who is vulnerable and [reported] the finding all of the metadata about your server will be deleted.”

Contrary to the wording of the notice, the binary not only collects “metadata” but also steals the aforementioned secrets, including your SSH keys, .gitconfig, hosts and passwd files, and the contents of the first 1,000 files in your HOME directory.

BleepingComputer obtained a copy of the malicious binary, which showed a clean reputation on VirusTotal at the time of writing. But don’t be fooled.

We observed that, unlike several research packages and PoC exploits that are conspicuous in their intent and behavior, ‘torchtriton’ employs known anti-VM techniques to evade detection. More importantly, the malicious payload is obfuscated and shipped entirely in binary form, as Linux ELF files. All of this makes the library an outlier among the ethical dependency confusion exploits of the past, which shipped in plaintext.

We also noticed that the sample reads .bash_history, the list of commands and inputs the user has typed into the terminal, which is yet another trait exhibited by malware.

Nor is this the first time a hacker has claimed that their actions constitute ethical research just as they are caught exfiltrating secrets.

In mid-2022, the hugely popular Python and PHP libraries ‘ctx’ and ‘PHPass’, respectively, were hijacked and altered to steal AWS keys. The researcher behind the attack later claimed that this was ethical research.

For the avoidance of doubt, we have approached the owner of h4ck.cfd for comment. Public records show the domain was registered with Namecheap on December 21st, just days prior to this incident.

Mitigations

The PyTorch team has renamed the ‘torchtriton’ dependency to ‘pytorch-triton’ and reserved a dummy package on PyPI to prevent similar attacks. The group seeks to claim ownership of the existing ‘torchtriton’ on PyPI to defuse the current attack.

**PyTorch renames dependency to prevent further attacks** (BleepingComputer)

To uninstall the malicious dependency chain, users should run the following commands:

$ pip3 uninstall -y torch torchvision torchaudio torchtriton
$ pip3 cache purge
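To confirm what is still installed after running the commands above, you can query the package metadata of the interpreter you are using. This is a minimal sketch that relies on the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical. It only inspects the environment of the Python interpreter that runs it, so repeat it in each virtual environment you use.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("torch", "torchvision", "torchaudio", "torchtriton"):
    print(name, installed_version(name) or "not installed")
```

Finding ‘torchtriton’ here does not by itself prove compromise, because the legitimate nightly dependency used the same name. It does mean the uninstall step has not taken effect in that environment.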

Running the following command will look for the presence of the malicious binary and reveal whether you are impacted:

python3 -c "import pathlib;import importlib.util;s=importlib.util.find_spec('triton');
affected=any(x.name == 'triton' for x in (pathlib.Path(s.submodule_search_locations[0] 
if s is not None else '/' ) / 'runtime').glob('*'));
print('You are {}affected'.format('' if affected else 'not '))"
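The one-liner above is dense, so here is an unpacked version of the same check for readers who want to follow the logic. It is a sketch, not PyTorch's official script: the function names are hypothetical, and it differs from the original in that it returns "not affected" rather than raising an error when ‘triton’ resolves to something other than a package directory.

```python
import importlib.util
from pathlib import Path


def runtime_has_triton_entry(package_dir: Path) -> bool:
    """True if `<package_dir>/runtime` contains an entry named 'triton'."""
    runtime_dir = package_dir / "runtime"
    if not runtime_dir.is_dir():
        return False
    return any(entry.name == "triton" for entry in runtime_dir.iterdir())


def is_affected() -> bool:
    """Locate the 'triton' package without importing it, then inspect it."""
    spec = importlib.util.find_spec("triton")
    if spec is None or not spec.submodule_search_locations:
        return False  # no 'triton' package visible to this interpreter
    package_dir = Path(spec.submodule_search_locations[0])
    return runtime_has_triton_entry(package_dir)


print("You are {}affected".format("" if is_affected() else "not "))
```

Using `find_spec` matters here: it locates the package on disk without importing it, so the check itself does not trigger the malicious code.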

The SHA256 hash of the ‘triton’ ELF binary is: 2385b29489cd9e35f92c072780f903ae2e517ed422eae67246ae50a5cc738a0e.
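If you find a ‘triton’ file in the package's runtime directory, you can compare its hash against the value above. The following sketch uses the standard library's `hashlib`; the names `KNOWN_BAD_SHA256`, `sha256_of`, and `matches_known_bad` are hypothetical, and the path to check is whatever file you located on your own system.

```python
import hashlib
from pathlib import Path

# Hash of the malicious 'triton' ELF binary, as quoted in this article.
KNOWN_BAD_SHA256 = "2385b29489cd9e35f92c072780f903ae2e517ed422eae67246ae50a5cc738a0e"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_bad(path: Path) -> bool:
    """True if the file's hash equals the published malicious hash."""
    return sha256_of(path) == KNOWN_BAD_SHA256
```

A match confirms the known sample. A mismatch only rules out this exact binary, so it should not be read as a clean bill of health on its own.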
