The core tenet of this approach is to treat AI agents as powerful, semi-autonomous users and to enforce robust security measures at the boundaries where they interact with identity, tools, data, and outputs. This framework, supported by AI security guidance from standards bodies like NIST, regulators such as those behind the EU AI Act, and major AI providers, aims to establish a comprehensive governance model.

Constraining Capabilities: Defining Identity and Limiting Scope

The initial steps focus on establishing clear identities for AI agents and strictly limiting their operational capabilities, mirroring the discipline applied to human employees.

  1. Identity and Scope: Make Agents Real Users with Narrow Jobs. Currently, many agents operate under vague, overly permissive service identities. The recommended fix is to assign each agent a unique, non-human principal identity. Permissions should be meticulously constrained to align with the requesting user’s role, tenant, and geographical limitations. Crucially, cross-tenant "on-behalf-of" shortcuts must be prohibited. Any action with high impact should mandate explicit human approval, accompanied by a documented rationale. This practical application aligns with Google’s Secure AI Framework (SAIF) and NIST’s access control recommendations. The CEO’s critical question here is: "Can we demonstrate, today, a comprehensive list of our agents and the precise scope of their authorized actions?"

  2. Tooling Control: Pin, Approve, and Bound What Agents Can Use. The Anthropic espionage framework’s success was partly due to attackers’ ability to integrate Claude with a flexible suite of tools (e.g., scanners, exploit frameworks, data parsers) via Model Context Protocol, bypassing standard policy gates. The defense strategy involves treating agent toolchains as a critical supply chain, requiring rigorous vetting and approval processes. This aligns with OWASP’s concerns regarding excessive agency and recommendations for protection. Under the EU AI Act, designing for cyber resilience and misuse resistance is a key component of Article 15’s obligation for robustness and cybersecurity. The CEO’s inquiry should be: "Who approves when an agent gains a new tool or an expanded operational scope, and how is this authorization tracked?"

  3. Permissions by Design: Bind Tools to Tasks, Not to Models. A common and dangerous anti-pattern is to grant models long-lived credentials, relying on prompts to enforce good behavior. SAIF and NIST advocate for the opposite: binding credentials and scopes to specific tools and tasks. These permissions should be regularly rotated and auditable, with agents then requesting narrowly defined capabilities through these tools. For instance, a "finance-ops-agent" might be permitted to read specific ledgers but require CFO approval for any write operations. The CEO’s critical question is: "Can we revoke a specific capability from an agent without requiring a complete system re-architecture?"

Controlling Data and Behavior: Gating Inputs, Outputs, and Actions

The next set of steps addresses the secure handling of data and the control of agent behavior to prevent unauthorized actions and data exfiltration.

From guardrails to governance: A CEO’s guide for securing agentic systems
  1. Inputs, Memory, and RAG: Treat External Content as Hostile Until Proven Otherwise. A significant number of agent incidents originate from compromised external data sources, such as poisoned web pages, PDFs, emails, or repositories, which can smuggle adversarial instructions. OWASP’s prompt injection cheat sheet and OpenAI’s guidance strongly emphasize the separation of system instructions from user content and treating unvetted retrieval sources with extreme caution. Operationally, this means implementing strict gates before any external content enters retrieval or long-term memory systems. New sources should undergo review, tagging, and formal onboarding. Persistent memory should be disabled when untrusted context is present, and provenance must be attached to each data chunk. The CEO’s key question is: "Can we enumerate every external content source our agents access, and who provided the approval for each?"

  2. Output Handling and Rendering: Nothing Executes ‘Just Because the Model Said So’. In the Anthropic case, AI-generated exploit code and credential dumps were executed directly. Any agent output with the potential to cause a side effect requires a robust validator positioned between the agent and the real-world system. OWASP’s "insecure output handling" category and browser security best practices regarding origin boundaries explicitly address this. The CEO’s vital question is: "Where in our architecture are agent outputs rigorously assessed before they are executed or delivered to customers?"

  3. Data Privacy at Runtime: Protect the Data First, Then the Model. The principle of "secure-by-default" is paramount. Sensitive data should be tokenized or masked by default, with re-hydration only occurring for authorized users and specific use cases. This "data-first" approach, advocated by NIST and SAIF, ensures that even if an agent is compromised, the "blast radius" is limited by the predefined policies governing data access. This intersection of AI security with data privacy regulations like GDPR and sector-specific regimes is critical. The EU AI Act expects providers and deployers to manage AI-specific risks, and runtime tokenization with policy-gated reveal serves as strong evidence of active risk management in production. The CEO’s question should be: "When our agents interact with regulated data, is that protection enforced by our architecture or by mere assurances?"

Proving Governance and Resilience: Ensuring Ongoing Security

The final steps focus on establishing mechanisms for continuous validation and comprehensive oversight of AI agent systems.

  1. Continuous Evaluation: Don’t Ship a One-Time Test, Ship a Test Harness. The concept of "sleeper agents," as highlighted in Anthropic’s research, underscores the inadequacy of single, static security tests. Organizations must implement continuous evaluation through deep agent observability, regular red teaming with adversarial test suites, and robust logging that captures evidence of failures. These logs should serve as regression tests and inform enforceable policy updates. The CEO’s critical question is: "Who is actively attempting to compromise our agents on a weekly basis, and how do their findings lead to policy changes?"

  2. Governance, Inventory, and Audit: Keep Score in One Place. AI security frameworks emphasize the importance of maintaining a comprehensive inventory of all AI components – models, prompts, tools, datasets, and vector stores – along with ownership information and risk-related decisions. For agentic systems, this translates into a living catalog and unified logging system. This catalog should track agent versions, approved prompts, tool access rights, data sources, and user roles. Unified logs should record all agent actions, including tool invocations, data access, and decision-making processes, providing an auditable trail. The CEO’s crucial question is: "If asked to explain how an agent arrived at a specific decision, can we reconstruct that entire chain of events?"

Furthermore, a system-level threat model is essential, operating under the assumption that sophisticated adversaries, like the state-based threat actor GTG-1002 mentioned in the Anthropic case study, are already within the enterprise. The MITRE ATT&CK® for Artificial Intelligence (ATLAS™) framework provides valuable insights into how adversaries target AI systems, not just isolated models.

In conclusion, these eight steps do not promise an infallible solution to AI agent security but rather aim to integrate AI into the existing, familiar security framework applied to powerful users and systems. For boards and CEOs, the conversation must evolve from a general inquiry about "AI guardrails" to a demand for demonstrable evidence, supported by concrete data and audited processes, that directly answers the critical questions outlined above. The focus is on moving from assurances to irrefutable proof of effective governance and resilience in the face of evolving AI threats.