Help Net Security

OpenAI updates Agents SDK, adds sandbox for safer code execution


OpenAI’s updated Agents SDK helps developers build agents that inspect files, run commands, edit code, and handle tasks within controlled sandbox environments. The update provides standardized infrastructure for OpenAI models, a model-native harness that lets agents work with files and tools on a computer, and native sandbox execution for running tasks safely.

The new harness and sandbox capabilities launch first in Python, with TypeScript support planned for a future release. Additional features, including code mode and subagents, are in development for both Python and TypeScript.

These Agents SDK capabilities are generally available via the API and follow standard API pricing based on tokens and tool usage.

Building agents with models and Agents SDK (Source: OpenAI)

Harness architecture

Existing systems present limitations as teams move from prototypes to production. Model-agnostic frameworks offer flexibility but often leave frontier model capabilities underused. Model-provider SDKs stay closer to the model but often limit visibility into the harness. Managed agent APIs simplify deployment but restrict where agents run and how they access sensitive data.

“The OpenAI Agents SDK has enabled complex legal drafting and workflows by providing a unified framework with built-in safeguards and secure, isolated environments for data processing and code execution. This allows teams to focus on developing high-value, long-running legal agents rather than building agent infrastructure from scratch,” said Min Chen, Chief AI Officer at LexisNexis.

The updated harness expands agent capabilities for working with documents, files, and systems. It includes configurable memory, sandbox-aware orchestration, filesystem tools similar to those in Codex, and standardized integrations with common agent system primitives.

“These primitives include tool use via MCP, progressive disclosure via skills, custom instructions via AGENTS.md, code execution using the shell tool, file edits using the apply_patch tool, and more,” the company explained.

The harness reduces the time teams spend on infrastructure, letting them focus on domain-specific logic. It will continue to evolve with additional agent patterns and primitives.

The harness aligns execution with the model’s performance characteristics and keeps agents close to the model’s native operating patterns, which improves reliability for long-running tasks and for tasks coordinated across multiple tools and systems.

Sandbox execution environments

Sandbox execution is built into the updated Agents SDK. Agents run in controlled environments that contain the files, tools, and dependencies they need, so developers get a ready-made execution layer instead of building their own. Developers can bring their own sandbox or use built-in support for Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel.

“To make these environments portable across providers, the SDK introduces a Manifest abstraction for describing the agent’s workspace. Developers can mount local files, define output directories, and connect storage providers including AWS S3, Google Cloud Storage, Azure Blob Storage, and Cloudflare R2,” the company said.

The SDK provides a consistent way to define the agent’s environment and gives models a predictable, well-organized workspace with defined inputs and outputs for long-running tasks.
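The article does not show the Manifest API itself, so the following is only a rough, standard-library Python illustration of the general idea of a declarative workspace description. Every name in it (`WorkspaceManifest`, `LocalMount`, `StorageMount`, and the URI schemes) is hypothetical and is not the SDK’s actual Manifest interface. The sketch records local mounts, cloud storage mounts, and output directories, checks them for obvious mistakes, and serializes the result so it could, in principle, be handed to any sandbox backend.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical URI schemes standing in for S3, Google Cloud Storage,
# Azure Blob Storage, and Cloudflare R2. Not taken from the SDK.
SUPPORTED_SCHEMES = {"s3", "gs", "az", "r2"}


@dataclass(frozen=True)
class LocalMount:
    host_path: str          # directory on the developer's machine
    sandbox_path: str       # where it appears inside the sandbox
    read_only: bool = True  # inputs are mounted read-only by default


@dataclass(frozen=True)
class StorageMount:
    uri: str                # e.g. "s3://bucket/prefix"
    sandbox_path: str

    def scheme(self) -> str:
        return self.uri.split("://", 1)[0]


@dataclass
class WorkspaceManifest:
    local_mounts: list[LocalMount] = field(default_factory=list)
    storage_mounts: list[StorageMount] = field(default_factory=list)
    output_dirs: list[str] = field(default_factory=list)

    def errors(self) -> list[str]:
        """Return a list of problems; an empty list means the manifest looks usable."""
        problems = []
        paths = ([m.sandbox_path for m in self.local_mounts]
                 + [m.sandbox_path for m in self.storage_mounts]
                 + list(self.output_dirs))
        seen = set()
        for path in paths:
            if not path.startswith("/"):
                problems.append(f"sandbox path must be absolute: {path}")
            if path in seen:
                problems.append(f"duplicate sandbox path: {path}")
            seen.add(path)
        for mount in self.storage_mounts:
            if mount.scheme() not in SUPPORTED_SCHEMES:
                problems.append(f"unsupported storage scheme: {mount.scheme()}")
        return problems

    def to_json(self) -> str:
        """Validate, then serialize to a provider-neutral JSON description."""
        problems = self.errors()
        if problems:
            raise ValueError("; ".join(problems))
        return json.dumps(asdict(self), sort_keys=True)


manifest = WorkspaceManifest(
    local_mounts=[LocalMount("./contracts", "/workspace/input")],
    storage_mounts=[StorageMount("s3://example-bucket/reference", "/workspace/ref")],
    output_dirs=["/workspace/output"],
)
print(manifest.to_json())
```

Keeping the workspace description as plain data, rather than provider-specific setup code, is what would let the same workspace be recreated on a different sandbox provider.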

Security, reliability and scaling

Agent systems should be designed on the assumption that prompt injection and data exfiltration will be attempted. Separating the harness from the compute layer keeps credentials out of the environments where model-generated code runs.
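One way to picture the harness/compute split is a sketch in which the harness holds a credential in its own process and launches model-generated code in a child process whose environment is built from an allowlist. This is an assumption-laden illustration, not how the Agents SDK is implemented: the allowlist, the `EXAMPLE_API_KEY` variable, and both helper functions are hypothetical. A plain subprocess is also not a security sandbox, since real isolation covers the filesystem and network as well as environment variables.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: only these variables reach the execution environment.
# SYSTEMROOT is included because Python on Windows may need it to start.
SAFE_ENV_KEYS = {"PATH", "LANG", "HOME", "SYSTEMROOT"}


def sandbox_env(parent_env: dict) -> dict:
    """Build the child environment from an allowlist so credentials stay with the harness."""
    return {k: v for k, v in parent_env.items() if k in SAFE_ENV_KEYS}


def run_model_generated_code(code: str) -> str:
    """Run untrusted code in a child process that never receives the harness's secrets."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=sandbox_env(dict(os.environ)),
        capture_output=True,
        text=True,
        timeout=30,
    )
    return proc.stdout.strip()


# The harness keeps its credential in its own process environment...
os.environ["EXAMPLE_API_KEY"] = "secret-value"  # hypothetical variable name
# ...but the model-generated snippet cannot see it.
result = run_model_generated_code(
    "import os; print(os.environ.get('EXAMPLE_API_KEY', 'absent'))"
)
print(result)  # absent
```

An allowlist is used rather than a denylist so that a newly added secret is excluded by default instead of leaking until someone remembers to block it.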

Because agent state is stored outside the sandbox, a run can survive the loss of its sandbox container. Using built-in snapshotting and rehydration, the Agents SDK restores the state in a new container and continues from the last checkpoint.
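A minimal sketch of checkpointing and rehydration, assuming a simple step-based run: state lives in an external store, is saved after every completed step, and is reloaded when a fresh container takes over. `CheckpointStore`, `Sandbox`, `SandboxLost`, and `run_with_recovery` are hypothetical stand-ins written for illustration; the SDK’s built-in snapshotting is not described in detail in the article and may work differently.

```python
import json
from typing import Optional


class SandboxLost(Exception):
    """Raised when the simulated sandbox container disappears mid-run."""


class CheckpointStore:
    """Hypothetical external store that outlives any single container."""

    def __init__(self) -> None:
        self._snapshots: dict = {}

    def save(self, run_id: str, state: dict) -> None:
        self._snapshots[run_id] = json.dumps(state)  # snapshot = serialized copy

    def load(self, run_id: str) -> Optional[dict]:
        raw = self._snapshots.get(run_id)
        return json.loads(raw) if raw is not None else None


class Sandbox:
    """Stand-in for a disposable container; fail_at simulates losing it at a given step."""

    def __init__(self, name: str, fail_at: Optional[int] = None) -> None:
        self.name = name
        self.fail_at = fail_at

    def execute(self, step: int, task: str) -> str:
        if step == self.fail_at:
            raise SandboxLost(f"{self.name} lost during step {step}")
        return f"{task} done in {self.name}"


def run_with_recovery(run_id: str, tasks: list, store: CheckpointStore, sandboxes: list) -> dict:
    """Run tasks in order, checkpointing after each one and moving to a fresh sandbox on loss."""
    for sandbox in sandboxes:
        # Rehydrate from the last checkpoint, or start fresh on the first attempt.
        state = store.load(run_id) or {"next_step": 0, "results": []}
        try:
            while state["next_step"] < len(tasks):
                step = state["next_step"]
                state["results"].append(sandbox.execute(step, tasks[step]))
                state["next_step"] = step + 1
                store.save(run_id, state)  # checkpoint after every completed step
            return state
        except SandboxLost:
            continue  # the stored checkpoint is intact; retry in the next container
    raise RuntimeError("no sandboxes left to resume the run")


store = CheckpointStore()
tasks = ["fetch", "analyze", "draft", "review"]
final = run_with_recovery("run-1", tasks, store, [Sandbox("box-a", fail_at=2), Sandbox("box-b")])
print(final["results"])
```

Because a checkpoint is written only after a step finishes, losing a container mid-step costs at most that one step, which is then re-run in the replacement container.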

A single agent run can use one sandbox or several, invoke them only when needed, route subagents to isolated environments, and run tasks in parallel across containers to finish faster.
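The fan-out pattern can be illustrated with the Python standard library alone: each subtask runs in its own child process with a throwaway working directory, and a thread pool runs the subtasks concurrently. The function names are hypothetical, and a temporary directory is only a loose stand-in for a sandbox container, not a real isolation boundary.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_in_isolated_workspace(name: str, code: str) -> dict:
    """Run a snippet in a fresh process with its own throwaway working directory.

    A temporary directory plus a child process is a hypothetical stand-in for
    a sandbox container; it is NOT a security boundary.
    """
    with tempfile.TemporaryDirectory(prefix=f"{name}-") as workdir:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir, capture_output=True, text=True, timeout=30,
        )
        # Record what the subtask left behind before the directory is deleted.
        files = sorted(p.name for p in Path(workdir).iterdir())
    return {"name": name, "returncode": proc.returncode,
            "stdout": proc.stdout.strip(), "files": files}


def fan_out(subtasks: dict[str, str], max_workers: int = 4) -> list[dict]:
    """Dispatch each subtask to its own workspace and run them concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_in_isolated_workspace, n, c) for n, c in subtasks.items()]
        # Collect results in submission order, regardless of completion order.
        return [f.result() for f in futures]


subtasks = {
    "count": "open('out.txt', 'w').write('x'); print(sum(range(10)))",
    "upper": "print('agents'.upper())",
}
results = fan_out(subtasks)
for r in results:
    print(r)
```

Threads are sufficient here because each worker mostly waits on a child process; the file written by one subtask never appears in the other subtask’s workspace.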


