A set of critical vulnerabilities dubbed ‘ShellTorch’ in the open-source TorchServe AI model-serving tool affects tens of thousands of internet-exposed servers, some of which belong to large organizations.
TorchServe, maintained by Meta and Amazon, is a popular tool for serving and scaling PyTorch (machine learning framework) models in production.
The library is primarily used by those engaged in AI model training and development, from academic researchers to big firms like Amazon, OpenAI, Tesla, Azure, Google, and Intel.
The TorchServe flaws discovered by the Oligo Security research team can lead to unauthorized server access and remote code execution (RCE) on vulnerable instances.
The ShellTorch vulnerabilities
The three vulnerabilities are collectively named ShellTorch and impact TorchServe versions 0.3.0 through 0.8.1.
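As a quick illustration of the affected range, the following minimal sketch checks whether a given TorchServe version string falls between 0.3.0 and 0.8.1 inclusive. The function names are hypothetical, and the sketch assumes plain dotted numeric versions with no pre-release suffixes.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '0.8.1' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def is_affected(version: str) -> bool:
    """Return True if the version lies in the range reported as affected
    by ShellTorch (0.3.0 through 0.8.1, inclusive)."""
    return (0, 3, 0) <= parse_version(version) <= (0, 8, 1)


if __name__ == "__main__":
    for candidate in ("0.2.1", "0.3.0", "0.8.1", "0.8.2"):
        status = "affected" if is_affected(candidate) else "not affected"
        print(f"TorchServe {candidate}: {status}")
```

A version check alone does not prove a deployment is safe or exploitable, since exposure also depends on how the management interface and allowed URLs are configured.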
The first flaw is an unauthenticated management interface API misconfiguration that causes the web panel to be bound to the IP address 0.0.0.0 by default instead of localhost, exposing it to external requests.
As the interface lacks authentication, it allows unrestricted access for any user, which can be used to upload malicious models from an external address.
The second issue, tracked as CVE-2023-43654, is a server-side request forgery (SSRF) flaw that can lead to remote code execution (RCE).
Although TorchServe’s API includes logic for restricting the domains from which models’ configuration files can be fetched via a remote URL, it was found that all domains were accepted by default, which is what makes the SSRF possible.
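To make the difference between a permissive and a restrictive allowed list concrete, here is a minimal sketch of a regex-based URL check. It is not TorchServe's actual implementation; the function name, the patterns, and the example hosts (`models.example.com`, `attacker.example`) are assumptions chosen for illustration.

```python
import re


def is_url_allowed(url: str, allowed_patterns: list[str]) -> bool:
    """Return True if the whole URL matches at least one allowed pattern."""
    return any(re.fullmatch(pattern, url) for pattern in allowed_patterns)


# A catch-all list: every file, http, or https URL is accepted,
# which is effectively no restriction at all.
PERMISSIVE = [r"file://.*", r"https?://.*"]

# A restrictive list: only one trusted (hypothetical) host is accepted.
RESTRICTIVE = [r"https://models\.example\.com/.*"]

if __name__ == "__main__":
    attacker_url = "http://attacker.example/evil.mar"
    trusted_url = "https://models.example.com/resnet.mar"
    print(is_url_allowed(attacker_url, PERMISSIVE))   # accepted by the catch-all list
    print(is_url_allowed(attacker_url, RESTRICTIVE))  # rejected by the restrictive list
    print(is_url_allowed(trusted_url, RESTRICTIVE))   # accepted: trusted host
```

Matching the full URL and anchoring the pattern at the path separator keeps look-alike hosts such as `models.example.com.evil.net` from slipping through.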
This lets attackers upload malicious models that trigger arbitrary code execution when launched on the target server.
The third vulnerability, tracked as CVE-2022-1471, is a Java deserialization problem leading to remote code execution.
Due to insecure deserialization in the SnakeYAML library, attackers can upload a model with a malicious YAML file to trigger remote code execution.
Should an attacker chain these three flaws, they could easily compromise a system running vulnerable versions of TorchServe.
A demonstration of the ShellTorch attack chain can be seen below.
ShellTorch fixes
Oligo says its analysts scanned the web for vulnerable deployments and found tens of thousands of IP addresses currently exposed to ShellTorch attacks, some belonging to large organizations with global reach.
“Once an attacker can breach an organization’s network by executing code on its PyTorch server, they can use it as an initial foothold to move laterally to infrastructure in order to launch even more impactful attacks, especially in cases where proper restrictions or standard controls are not present,” explains Oligo.
To address these vulnerabilities, users should upgrade to TorchServe 0.8.2. Note, however, that this update does not fix CVE-2023-43654; it only displays a warning about the SSRF risk to the user.
Next, correctly configure the management console by setting management_address to http://127.0.0.1:8081 in the config.properties file. This causes TorchServe to bind to localhost instead of every IP address configured on the server.
Finally, ensure that your server fetches models only from trusted domains by updating the allowed_urls in the config.properties file accordingly.
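The two configuration steps above can be spot-checked with a small script. The sketch below reads the text of a config.properties file and reports settings that look risky. Only the keys `management_address` and `allowed_urls` come from the guidance above; the helper names, the simplified properties parsing, and the catch-all heuristic are assumptions, so treat the output as a hint rather than a verdict.

```python
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and comments.
    This is a simplified reader, not a full Java properties parser."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def audit_config(text: str) -> list[str]:
    """Return a list of human-readable findings for risky settings."""
    props = parse_properties(text)
    findings = []

    address = props.get("management_address")
    if address is None:
        findings.append("management_address is not set explicitly")
    else:
        host = urlsplit(address).hostname
        if host not in LOOPBACK_HOSTS:
            findings.append(f"management_address uses non-loopback host {host!r}")

    allowed = props.get("allowed_urls")
    if allowed is None:
        findings.append("allowed_urls is not set explicitly")
    else:
        # Heuristic: a pattern alternative ending in '://.*' accepts any host.
        parts = [p.strip() for entry in allowed.split(",") for p in entry.split("|")]
        if any(part.endswith("://.*") for part in parts):
            findings.append("allowed_urls contains a catch-all pattern")

    return findings


if __name__ == "__main__":
    sample = "management_address=http://0.0.0.0:8081\nallowed_urls=file://.*|http(s)?://.*\n"
    for finding in audit_config(sample):
        print("WARNING:", finding)
```

An empty result only means that none of these specific checks fired; it does not replace upgrading or running a dedicated scanner.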
Amazon has also published a security bulletin about CVE-2023-43654, providing mitigation guidance for customers using Deep Learning Containers (DLC) in EC2, EKS, or ECS.
In addition, Oligo has released a free checker tool that admins can use to determine whether their instances are vulnerable to ShellTorch attacks.