The Enterprise Agent Control Plane: From Toggles to Policy as Code
If agent safety lives in user settings, you do not have policy. You have uneven risk decisions across teams.
Executive Summary
Vendors are shipping sandboxing and approvals, but enterprises still lack a central control plane. When critical safety settings are per-user, organizations inherit inconsistent behavior, approval fatigue, and risky shortcuts like YOLO modes. A real control plane turns those individual choices into enforceable policy.
Tools Have Controls. Enterprises Don’t Own Them (Yet).
The controls are real; the control plane is not. Modern coding agents ship serious safety mechanisms. Codex CLI defines approval modes and sandbox modes, including a dangerous --yolo flag that bypasses both approvals and sandboxing. Claude Code adds OS-level sandboxing and network allowlisting through a proxy. Cursor includes allowlists and "Run Everything" options, but explicitly notes that these are convenience features, not security controls. The problem is where these controls live: in user settings rather than in centralized policy.
Why Per-User Controls Fail at Enterprise Scale
That local control model breaks the moment you scale beyond a small team. Enterprises operate in high-velocity environments with mixed skill levels, so one engineer’s “just this once” approval becomes a systemic risk. Human behavior is predictable: prompts become noise, convenience wins, and policy drifts into custom aliases or YOLO shortcuts. The result is an uneven safety posture across teams, even when every tool nominally has the right controls.
Failure Scenarios That Start Small
- Secrets in a debug bundle: the agent zips ~/.ssh or .env and commits it.
- PII leakage: a “quick analysis” uploads a customer export to a ticket.
- Prod destruction: a cleanup task runs kubectl delete or terraform destroy.
- Wrong repo, wrong remote: the agent commits to an unrelated checkout or force-pushes.
- Network drift: approvals accumulate until the allowlist is effectively open.
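The obvious forms of several of these scenarios can be caught before a command runs. Below is a minimal, hypothetical pre-execution guard in Python. The rule lists, Verdict, and check_command are illustrative names, not part of Codex, Claude Code, or Cursor, and the patterns are assumptions an organization would tune to its own environment.

```python
import fnmatch
import shlex
from dataclasses import dataclass

# Hypothetical org rules mirroring the failure scenarios; tune per organization.
SECRET_PATH_PATTERNS = ["~/.ssh", "~/.ssh/*", ".env", "*/.env", "*.pem"]
DESTRUCTIVE_SUBCOMMANDS = [("kubectl", "delete"), ("terraform", "destroy")]
FORCE_PUSH_FLAGS = {"--force", "-f"}


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


def touches_secret(arg: str) -> bool:
    """True if an argument matches a protected path pattern (case-sensitive)."""
    return any(fnmatch.fnmatchcase(arg, pattern) for pattern in SECRET_PATH_PATTERNS)


def check_command(command: str) -> Verdict:
    """Classify a shell command string before the agent is allowed to run it."""
    argv = shlex.split(command)  # POSIX-style tokenizing; does not expand ~ or globs
    if not argv:
        return Verdict(True, "empty command")
    program, args = argv[0], argv[1:]
    for prog, sub in DESTRUCTIVE_SUBCOMMANDS:
        if program == prog and sub in args:
            return Verdict(False, f"destructive: {prog} {sub}")
    if program == "git" and "push" in args and FORCE_PUSH_FLAGS & set(args):
        return Verdict(False, "force push")
    for arg in args:
        if touches_secret(arg):
            return Verdict(False, f"secret path: {arg}")
    return Verdict(True, "no rule matched")


for cmd in ["zip -r debug.zip ~/.ssh .env", "terraform destroy -auto-approve", "git status"]:
    print(cmd, "->", check_command(cmd))
```

String matching like this is easy to evade (a wrapper script or sh -c hides the real command), which is exactly why the sandboxing argument holds. A guard of this kind is one layer of centrally owned policy, not a substitute for OS-level isolation.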
Community Evidence: This Drift Is Already Happening
The community discussions tell the same story. On Hacker News, one commenter says they have used Claude Code for 1000+ hours with no issues, while others reply that the safety problem only shows up once adversarial prompt injection becomes common, and that sandboxing is the only robust mitigation. Others describe DIY containment (devcontainers, separate Linux users, Codespaces) because relying on a single toggle feels unsafe.
On Reddit, users describe how approval fatigue pushes them toward skip-permission flags, or how an agent running in YOLO mode inside a sandbox still finds workarounds (like forging package-lock integrity to bypass blocked registries). These are not edge cases; they are the natural outcome of human incentives fighting tool friction.
What a Real Control Plane Looks Like
Move controls from preferences into enforceable policy.
Minimum Enterprise Requirements
- Central policy enforcement: lock sandbox modes and approval policies at the org level.
- Audit logs: record approvals, allowlist edits, and any escape hatch usage.
- Least privilege tool access: scope agents to the minimum file and network surface.
- Training + playbooks: explain when to approve, when to pause, and when to escalate.
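Central enforcement and audit logging can be expressed together as a merge rule: the organization sets a floor, and user preferences may only tighten it. The following is a minimal Python sketch under stated assumptions. The Sandbox and Approval levels and their strictness ordering are hypothetical (loosely modeled on the approval and sandbox modes described earlier), the host names are placeholders, and the in-memory AuditLog stands in for whatever durable store an organization actually uses.

```python
import time
from dataclasses import dataclass, field
from enum import IntEnum


# Hypothetical strictness scales: a higher value is stricter.
class Sandbox(IntEnum):
    OFF = 0              # no isolation (YOLO-style)
    WORKSPACE_WRITE = 1  # hypothetical: writes limited to the project directory
    READ_ONLY = 2


class Approval(IntEnum):
    NEVER = 0            # agent never asks before acting
    ON_REQUEST = 1
    ALWAYS = 2


@dataclass(frozen=True)
class Policy:
    sandbox: Sandbox
    approval: Approval
    allowed_hosts: frozenset[str]


@dataclass
class AuditLog:
    """In-memory stand-in for a durable, org-owned audit store."""
    events: list[dict] = field(default_factory=list)

    def record(self, user: str, action: str, detail: str) -> None:
        self.events.append({"ts": time.time(), "user": user, "action": action, "detail": detail})


def effective_policy(org: Policy, pref: Policy, user: str, log: AuditLog) -> Policy:
    """Apply a user's preferences on top of the org floor: tighten freely, never loosen."""
    if pref.sandbox < org.sandbox:
        log.record(user, "sandbox_loosening_denied", f"{pref.sandbox.name} below {org.sandbox.name}")
    if pref.approval < org.approval:
        log.record(user, "approval_loosening_denied", f"{pref.approval.name} below {org.approval.name}")
    extra_hosts = pref.allowed_hosts - org.allowed_hosts
    if extra_hosts:
        log.record(user, "allowlist_expansion_denied", ", ".join(sorted(extra_hosts)))
    return Policy(
        sandbox=max(org.sandbox, pref.sandbox),
        approval=max(org.approval, pref.approval),
        allowed_hosts=org.allowed_hosts & pref.allowed_hosts,  # users can only narrow network access
    )


org = Policy(Sandbox.WORKSPACE_WRITE, Approval.ON_REQUEST, frozenset({"pypi.org", "github.com"}))
yolo = Policy(Sandbox.OFF, Approval.NEVER, frozenset({"pypi.org", "github.com", "paste.example.com"}))
log = AuditLog()
print(effective_policy(org, yolo, "alice", log))
print([event["action"] for event in log.events])
```

The design choice is that loosening is never silently honored: a YOLO-style preference still runs under the org floor, and the attempt itself becomes an audit event. Intersecting allowlists also addresses network drift, because per-user approvals cannot grow the effective allowlist beyond what the organization has approved.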
How AARSM Helps
AARSM gives you a real control plane: policy, approvals, and logging owned by the organization, not scattered across user settings.
About This Analysis
This analysis draws on OpenAI Codex documentation, Anthropic’s Claude Code sandboxing documentation and engineering write-up, Cursor’s agent security guidance, and community discussions on Hacker News and Reddit.