AI Agents in Cybersecurity: Measuring Real ROI Against the Promise
The cybersecurity industry has a workforce gap of approximately four million professionals globally, according to multiple industry surveys. Meanwhile, threat volumes continue their relentless climb. Enter AI agents, marketed as the solution that will automate away the workload, compress response times from hours to seconds, and ultimately make the SOC analyst obsolete. I have spent two decades in offensive security, and I have learned to be suspicious when something sounds too good to be true. The gap between AI agent marketing and production reality is where this post lives.
Vendor presentations at security conferences are full of impressive demonstrations. AI agents can hunt threats autonomously, find vulnerabilities before attackers do, and run red team operations around the clock without fatigue. These capabilities are real, but they exist on a spectrum that the marketing materials rarely clarify. The question I want to answer is not whether AI agents work in controlled demos, but whether they deliver genuine return on investment when deployed against the messy reality of production environments. That question requires looking at concrete case studies, honest cost accounting, and the organizational factors that determine success or failure.
The Promise Versus Reality Gap
The narrative around AI agents in cybersecurity has followed a familiar pattern. First comes the revolutionary announcement, complete with impressive metrics and glowing testimonials. Then comes the pilot phase, where controlled environments produce results that match the demos. Finally, production deployment reveals the gap between benchmark performance and real-world effectiveness.
The promise is substantial. AI agents that can autonomously hunt threats, assess vulnerabilities, and conduct red team operations would address the workforce gap directly. Rather than replacing analysts, they would multiply the effectiveness of existing teams by handling the repetitive, high-volume work that currently consumes analyst time. The arithmetic is appealing: if one AI agent can do the work of several analysts, the four-million-person gap becomes significantly less daunting.
The reality involves constraints that the marketing rarely mentions. Agents require careful configuration, ongoing supervision, and integration with existing security infrastructure. They generate false positives that require human evaluation. They can drift from their intended objectives, particularly in complex, multi-step operations. And they introduce their own security considerations, since an AI agent with access to security tools is simultaneously a high-value target for attackers who might seek to compromise or hijack it.
A comprehensive survey of agentic AI systems in cybersecurity (Lazer et al., 2026) identifies five generations of development, from simple single-model assistants to fully adaptive multi-agent architectures with sophisticated planning capabilities. The survey notes that capability and risk have evolved together, with each generation introducing new potential failure modes even as it delivers improved functionality. Understanding where a given system falls on this spectrum matters for realistic expectations.
What Amazon ATA Actually Delivered
Amazon's Autonomous Threat Analysis system represents one of the most detailed public case studies of AI agents deployed for active security operations. The system, described by the Amazon Science team (Amazon Science, 2025), uses an adversarial multi-agent architecture with distinct red-team and blue-team components. Red-team agents simulate attack techniques, while blue-team agents generate and refine detection rules based on the simulated activity.
The architecture is worth understanding in some detail because it illustrates how agents can be structured for production use rather than demos. The red-team agents simulate techniques such as Python reverse shells and multistep attack chains, running these simulations in isolated graph workflows with grounded execution to minimize hallucinations. The blue-team agents take the simulated attack data and use it to develop detection rules, then validate those rules against further simulation runs. The result is a continuous feedback loop where offensive simulation directly improves defensive detection.
ATTACK SIMULATION LAYER DEFENSE GENERATION LAYER
┌─────────────────────┐ ┌─────────────────────┐
│ Red-Team Agent │ │ Blue-Team Agent │
│ ┌───────────────┐ │ │ ┌───────────────┐ │
│ │ Attack Tech │ │ │ │ Detection Rule│ │
│ │ Library │ │ │ │ Generator │ │
│ └───────────────┘ │ │ └───────────────┘ │
│ ┌───────────────┐ │ │ ┌───────────────┐ │
│ │ Simulation │──┼────────▶│ │ Rule │ │
│ │ Engine │ │ │ │ Validator │ │
│ └───────────────┘ │ │ └───────────────┘ │
└─────────────────────┘ └─────────────────────┘
│ │
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ Isolated Graph │ │ Production SIEM │
│ Workflows │ │ / Alerting System │
│ (Grounded execution)│                │                     │
└─────────────────────┘ └─────────────────────┘
Each red-team agent operates within strict boundaries: it can only call predefined attack techniques from the library, execute them in isolated containerized environments, and report observations back to the orchestration layer. This bounded autonomy model prevents the agent from introducing novel attack vectors that might slip into production while still generating realistic test data for the blue-team to work with.
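The bounded-autonomy model described above is straightforward to enforce in code: the agent can request only techniques that have been pre-registered, and anything outside the allowlist is rejected before execution. The sketch below illustrates the pattern under stated assumptions; the technique names and canned observations are hypothetical, not Amazon's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class TechniqueResult:
    technique: str
    observations: str

class BoundedRedTeamAgent:
    """Red-team agent that can only invoke pre-registered techniques."""

    def __init__(self, library: Dict[str, Callable[[], str]]):
        # The allowlist: technique name -> isolated simulation callable.
        self._library = library

    def execute(self, technique: str) -> TechniqueResult:
        if technique not in self._library:
            # Reject anything outside the predefined library before it runs.
            raise PermissionError(f"technique {technique!r} not in allowlist")
        observations = self._library[technique]()  # would run in isolation
        return TechniqueResult(technique, observations)

# Hypothetical technique library; a real implementation would launch
# containerized simulations rather than return canned strings.
library = {
    "python_reverse_shell": lambda: "outbound TCP 4444 observed",
    "multistep_chain": lambda: "staged payload dropped in /tmp",
}

agent = BoundedRedTeamAgent(library)
print(agent.execute("python_reverse_shell").observations)
```

The design choice worth noticing is that the boundary lives outside the model: no amount of prompt manipulation can add a technique to the library, because the check happens in deterministic code, not in the agent's reasoning.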
The metrics Amazon reported are striking. Testing time compression reached 96 percent, with full security testing cycles that previously took weeks completing in approximately four hours. The system ran ten to thirty concurrent attack variations simultaneously, providing coverage that manual testing could not match. Perhaps most significantly, the precision and recall on validated detection rules reached near-perfect levels, and the system identified new hunt opportunities in under an hour that human analysts had not previously considered.
These results are genuine and impressive. They also come from an organization with substantial resources to invest in AI infrastructure, dedicated teams to configure and supervise the agents, and access to massive internal datasets for training and validation. The question for smaller organizations is whether comparable results are achievable with more limited resources, or whether the Amazon ATA case represents an outlier that reflects their specific context rather than a generalizable model.
What the Amazon case demonstrates is that AI agents can deliver substantial ROI in appropriate contexts. The key phrase is "in appropriate contexts." The ATA system works because it has clear boundaries, deterministic guardrails, and human oversight throughout the process. The agents are not running wild across the infrastructure; they operate within defined scopes, and their outputs feed into human decision-making processes rather than automatically triggering actions.
Google Big Sleep: The First AI-Thwarted Zero-Day
If Amazon ATA represents the defensive deployment of AI agents at scale, Google Big Sleep represents something more novel: an AI agent acting as a proactive zero-day hunter. Big Sleep, a collaboration between Google DeepMind and Project Zero, takes a different approach from most security tooling. Rather than responding to known threats or scanning for known vulnerability patterns, the system analyzes codebases looking for previously unknown vulnerabilities, with the goal of discovering and reporting them before attackers can exploit them.
In 2025, Big Sleep achieved what appears to be the first confirmed case of an AI agent discovering and helping to thwart a real-world zero-day vulnerability before it was actively exploited. The vulnerability, CVE-2025-6965, was a memory corruption issue in SQLite versions prior to 3.50.2 with a CVSS score of 7.2. The discovery was significant not merely for the technical finding itself but for what it represented about the trajectory of AI-assisted security research.
Following the CVE-2025-6965 discovery, Big Sleep has continued finding vulnerabilities in widely-used open-source software including FFmpeg and ImageMagick, bringing the total to more than twenty confirmed findings as of early 2026. The system combines code analysis with threat intelligence to prioritize targets and guide its investigation, effectively functioning as an AI security researcher that can work continuously without requiring breaks, weekends, or retention incentives.
BIG SLEEP VULNERABILITY DISCOVERY PIPELINE
┌─────────────────────────────────────────────────────────────────┐
│ ORCHESTRATION LAYER │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Target │───▶│ Code │───▶│ Exploit │ │
│ │ Prioritizer │ │ Analyzer │ │ Generator │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ FEEDBACK LOOP │ │
│ │ Findings → Triage → Validated → Disclosed → New intel │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Open-source │ │ Threat Intel │ │ NVD / Vendor │
│ Repositories │ │ Feeds │ │ Disclosure │
│ (FFmpeg, etc) │ │ │ │ Pipeline │
└─────────────────┘ └─────────────────┘ └─────────────────┘
The implications for defenders are significant. If AI agents can reliably discover zero-days before attackers find them, the entire threat landscape shifts. Rather than racing to patch vulnerabilities after exploitation begins, organizations could theoretically address vulnerabilities during the window between discovery and disclosure. In practice, the timeline is more complex, but the direction of travel matters for strategic planning.
However, the Big Sleep case also illustrates the limits of current AI agent capabilities. The system required substantial investment from Google and Project Zero, sophisticated threat intelligence to guide its targeting, and significant computational resources. It is not a tool that most organizations can deploy independently. More importantly, the agent is not operating autonomously in the way that marketing might suggest. It is a powerful tool that augments human security researchers, helping them scale their coverage and focus their attention on the most promising leads.
The CVE-Bench study (Zhu et al., 2025) provides useful context for expectations. Researchers evaluated AI agents on forty critical web application vulnerabilities with CVSS scores of 9.0 or higher in sandboxed environments. The agents achieved up to 13 percent success on true zero-days and approximately 25 percent on one-day vulnerabilities, where some public information was available. These numbers are meaningful improvements over random searching, but they are far from the near-perfect success rates that marketing might imply.
CVE-BENCH PERFORMANCE BREAKDOWN (40 critical CVEs, CVSS ≥ 9.0)
┌─────────────────────────────────────────────────────────────────┐
│ │
│ ZERO-DAY SUCCESS RATE (No prior knowledge) │
│ ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 13% │
│ │
│ ONE-DAY SUCCESS RATE (Some public info available) │
│ █████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 25% │
│ │
│ FAILURE RATE │
│  ███████████████████████████████████████████████  75-87%       │
│ │
└─────────────────────────────────────────────────────────────────┘
VULNERABILITY TYPE PERFORMANCE:
┌─────────────────────────────────────────────────────────────────┐
│ SQL Injection (CVE-2024-37849, CVE-2024-4223) ████████████ │
│ T-Agent performance: Strong │ │
│ │
│ Memory Corruption (requires complex exploration) ████░░░░░ │
│ T-Agent performance: Weak │ │
│ │
│ RCE Chains (multi-step) ██░░░░░░░ │
│ T-Agent performance: Very weak │ │
└─────────────────────────────────────────────────────────────────┘
The variance by vulnerability type reveals something important: AI agents excel at pattern matching against known vulnerability signatures but struggle with exploration-intensive tasks. SQL injection has a recognizable pattern that agents can identify reliably. Memory corruption vulnerabilities require understanding program state across multiple execution frames, which remains challenging for current architectures.
T-Agent, for example, performed well on SQL injection vulnerabilities such as CVE-2024-37849 and CVE-2024-4223 but struggled with vulnerabilities requiring complex multi-step exploration or novel attack patterns. The implication is that AI agents are valuable additions to the security toolkit, not replacements for skilled security researchers who can handle the cases that fall outside the agents' current capabilities.
The ROI Calculation
Putting aside the impressive demonstrations and focusing on the practical question of return on investment requires honest accounting of both costs and benefits. The costs of AI agent deployment in security operations are real and often underestimated in vendor presentations.
Direct costs include the infrastructure required to run agentic systems, which can be substantial for agents that require significant computational resources. Licensing costs for commercial platforms, if used, add to the infrastructure expenses. Integration with existing security tools and workflows requires custom development work that is rarely cheap or quick. Training and configuration consumes analyst time that might otherwise go toward actual security work.
Hidden costs emerge in the supervision and quality assurance that production agents require. Even the best-tuned agents generate false positives, and evaluating those false positives falls to human analysts who must understand enough about the agent's reasoning to make informed decisions about alerts. Memory poisoning and goal drift represent emerging concerns where agent state can be corrupted or influenced in ways that compromise outputs. Organizations must invest in monitoring agent behavior and maintaining the integrity of agent state over time.
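One concrete mitigation for memory poisoning is to make agent state tamper-evident. The sketch below is a minimal illustration of the idea, not any vendor's implementation: each memory entry is chained to the hash of everything before it, so any in-place modification breaks verification.

```python
import hashlib
import json

class AgentMemoryLog:
    """Append-only agent memory with a hash chain for tamper detection.

    Illustrative sketch only: a production system would persist the chain
    and anchor the head hash in a separate trust domain.
    """

    def __init__(self):
        self._entries = []      # (serialized record, chained hash) pairs
        self._head = "0" * 64   # genesis value for the chain

    def append(self, record: dict) -> None:
        payload = json.dumps(record, sort_keys=True)
        self._head = hashlib.sha256((self._head + payload).encode()).hexdigest()
        self._entries.append((payload, self._head))

    def verify(self) -> bool:
        """Recompute the chain; returns False if any entry was altered."""
        head = "0" * 64
        for payload, stored in self._entries:
            head = hashlib.sha256((head + payload).encode()).hexdigest()
            if head != stored:
                return False
        return True

log = AgentMemoryLog()
log.append({"step": 1, "observation": "scan complete"})
log.append({"step": 2, "observation": "no anomalies"})
print(log.verify())  # True

# Simulate memory poisoning: rewrite an earlier observation in place.
log._entries[0] = (json.dumps({"step": 1, "observation": "poisoned"}),
                   log._entries[0][1])
print(log.verify())  # False
```

Detection is not prevention, but a check like this turns silent state corruption into an auditable event that supervision processes can act on.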
The liability question is real and largely unresolved. If an AI agent misses a breach, or causes collateral damage during an automated response, the question of who bears responsibility is not clearly answered by current law or industry practice. Gartner research (Gartner, 2025) has explicitly stated that there will never be a fully autonomous SOC, recognizing that the liability and oversight requirements prevent complete automation. Organizations that deploy agents must understand that they are accepting ongoing responsibility for the agents' decisions and actions.
Against these costs, the benefits include the efficiency gains discussed in the case studies above. The Amazon ATA system achieved 96 percent testing time compression and identified new hunt opportunities that human analysts had missed. Microsoft Security Copilot, with its eleven or more specialized agents, has reported significant reductions in analyst workload for triage and vulnerability remediation tasks. CrowdStrike Charlotte AI claims detection triage accuracy above 98 percent with more than forty hours of manual work saved weekly for typical security teams.
The IBM security research on shadow AI is particularly relevant to the ROI calculation. Organizations that allow AI tools to proliferate without proper governance face breach costs approximately 670,000 dollars higher than those with established AI security policies. This premium reflects the increased attack surface that ungoverned AI introduces, the potential for data leaks through AI systems, and the forensic complexity when incidents involve AI components. The implication is that deploying AI agents without investing in proper governance may paradoxically increase risk even as it improves certain operational metrics.
For most organizations, the realistic ROI calculation looks like this. AI agents deliver meaningful value for high-volume, repetitive tasks where false positive costs are manageable and where the cost of missing something is not catastrophic. They deliver less value for complex investigations that require nuanced judgment, for novel attack patterns that fall outside their training, and for contexts where the cost of a false negative vastly outweighs the cost of a false positive. The organizations that get the most from AI agents are those that match the technology to appropriate use cases rather than deploying it broadly and hoping for the best.
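The cost-benefit arithmetic above can be made concrete with a back-of-the-envelope model. All inputs below are made-up illustrations, not benchmarks: the point is that a headline figure like "40 hours saved weekly" must be netted against false-positive review, supervision time, and platform cost before it becomes ROI.

```python
def agent_roi(hours_saved_weekly: float,
              analyst_hourly_cost: float,
              false_positives_weekly: float,
              review_minutes_per_fp: float,
              platform_cost_weekly: float,
              supervision_hours_weekly: float) -> float:
    """Weekly net value of an AI agent deployment (illustrative only)."""
    benefit = hours_saved_weekly * analyst_hourly_cost
    # Every false positive consumes analyst time to triage.
    fp_cost = (false_positives_weekly
               * (review_minutes_per_fp / 60)
               * analyst_hourly_cost)
    supervision_cost = supervision_hours_weekly * analyst_hourly_cost
    return benefit - fp_cost - supervision_cost - platform_cost_weekly

# Hypothetical inputs: a 40-hour weekly saving, but 200 false positives
# needing 6 minutes each, plus 5 hours of supervision and platform fees.
net = agent_roi(hours_saved_weekly=40, analyst_hourly_cost=75,
                false_positives_weekly=200, review_minutes_per_fp=6,
                platform_cost_weekly=1000, supervision_hours_weekly=5)
print(round(net, 2))  # 125.0
```

With these assumed numbers, a 3,000-dollar weekly gross benefit nets out to barely positive, which is exactly the kind of result the vendor slide does not show.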
Beyond the direct efficiency metrics, there are secondary benefits that are harder to quantify but equally important. Agentic systems can maintain continuous vigilance that human teams cannot sustain, watching for anomalies around the clock without the attention degradation that comes from long shifts. They can correlate across data sources in ways that would take human analysts significant time to replicate manually. And they can provide consistent analysis methodology, applying the same rigor to every alert rather than the variability that comes from analyst fatigue or differing levels of expertise.
However, these benefits only materialize when agents are properly integrated into workflows. An agent that generates findings but has no clear path to analyst review is not delivering value; it is generating noise that distracts from genuine threats. An agent that operates without proper monitoring may drift from its intended scope or generate outputs that waste analyst time on false leads. The technology is necessary but not sufficient; deployment practices matter as much as the technology itself.
The OWASP Top 10 for Agentic Applications (OWASP, 2026) highlights the security risks that accompany agent deployments. Agent goal hijacking, tool misuse, and cascading failures represent failure modes that organizations must plan for. The Cloud Security Alliance guidance on agentic AI red teaming (CSA, 2025) provides a framework for testing these systems before deployment, but many organizations skip this step under time pressure. The result is production systems with known vulnerability classes that should have been addressed in pre-deployment testing.
The risk is not theoretical. In 2026, multiple critical vulnerabilities have emerged in AI agent platforms themselves. These are not vulnerabilities in traditional systems that happen to run alongside AI tools; they are vulnerabilities inherent to the agent architecture itself.
The first, CVE-2026-32922 with a CVSS score of 9.9, affects OpenClaw, an agentic AI platform designed for orchestrating multi-step security workflows. The vulnerability enables privilege escalation to administrative levels with a direct path to remote code execution. What makes this particularly severe is the attack path: an agent that has been granted permissions to run security scans, execute commands in contained environments, or access credentials for integration with other security tools becomes the pivot point for compromising everything those permissions reach.
EXPLOIT CHAIN: CVE-2026-32922 (OpenClaw)
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Attacker │─────▶│ Agent │─────▶│ Privilege │─────▶│ RCE with │
│ compromises │ │ platform │ │ escalation │ │ admin priv │
│ agent creds │ │ vuln (RCE) │ │ to root │ │ │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Lateral │ │ Access to │ │ Full host │ │ Pwn entire │
│ movement │ │ all agent │ │ control │ │ infra │
│ initiated   │      │ workflows   │      │             │      │             │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
The second significant disclosure, CVE-2026-40252, affects FastGPT, a platform for building AI workflows that process enterprise data. This cross-tenant data exposure means that users in one organization could potentially access data belonging to users in another organization. For security teams, this creates a direct compliance issue: any AI workflow that processes sensitive data now has a shared-tenancy risk model that traditional data classification may not account for.
CROSS-TENANT EXPOSURE: CVE-2026-40252 (FastGPT)
┌─────────────────────────────────────────────────────────────────────┐
│ FASTGPT MULTI-TENANT ARCHITECTURE │
│ │
│ Organization A Organization B Organization C │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ User A1 │ │ User B1 │ │ User C1 │ │
│ │ Workflow │ │ Workflow │ │ Workflow │ │
│ │ Data A │ │ Data B │ │ Data C │ │
│ └──┬───────┘ └──┬───────┘ └──┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ SHARED INFERENCE LAYER │ │
│ │ (Where prompts are processed, embeddings generated) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┴───────────────┐ │
│ ▼ ▼ │
│ ┌───────────────────┐ ┌───────────────────┐ │
│ │ Tenant Isolation │ │ VULNERABILITY: │ │
│ │ Boundary BROKEN │◀──────────│ Bypass via prompt│ │
│ └───────────────────┘ │ injection/context │ │
│ │ switching │ │
│ └───────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Additional vulnerabilities in the Langflow framework for constructing AI agent pipelines and in the Chrome Gemini Live panel extend this pattern, demonstrating that the attack surface for AI agent infrastructure is active and growing, not theoretical.
The architectural pattern these vulnerabilities share is important: AI agents require persistent state, access to external tools, and the ability to process untrusted inputs. Each of these requirements creates attack surface. Persistent state means compromised credentials have longer windows of exploitation. Tool access means privilege escalation has more impact. Processing untrusted inputs means prompt injection and context manipulation become viable attack vectors.
This represents a meaningful shift in the threat landscape.
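Each of those surfaces can be narrowed. A common guardrail pattern is to mediate every tool call through a policy gate that combines a per-tool allowlist with argument validation. The sketch below is illustrative only; the tool names, paths, and checks are hypothetical, not a specific product's policy language.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str
    args: dict

class ToolPolicyGate:
    """Mediates agent tool calls: per-tool allowlist plus argument checks.

    Illustrative sketch of the guardrail pattern. Policies here are
    deterministic code, so prompt-injected instructions cannot relax them.
    """

    ALLOWED = {
        # Tool name -> validator over the call's arguments.
        "read_log": lambda a: a.get("path", "").startswith("/var/log/"),
        "run_scan": lambda a: a.get("target", "").endswith(".internal"),
    }

    def authorize(self, call: ToolCall) -> bool:
        check = self.ALLOWED.get(call.tool)
        return bool(check and check(call.args))

gate = ToolPolicyGate()
print(gate.authorize(ToolCall("read_log", {"path": "/var/log/auth.log"})))  # True
print(gate.authorize(ToolCall("read_log", {"path": "/etc/shadow"})))        # False
print(gate.authorize(ToolCall("delete_host", {"target": "prod-db"})))       # False
```

A gate like this does not eliminate the attack surface, but it converts "the agent can do anything its credentials allow" into "the agent can do only what the policy explicitly permits," which shrinks the blast radius of every failure mode discussed above.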
The dual-use concern deserves mention in any honest ROI discussion. The same capabilities that enable AI agents to find vulnerabilities for defensive purposes can be redirected for offensive use. RedTeamLLM and similar systems demonstrate how attackers can use agentic AI to automate reconnaissance, exploitation, and lateral movement. The security community should not assume that defensive applications have the upper hand indefinitely. The attackers have access to similar technology, and the efficiency gains from agentic AI apply to offensive operations as well as defensive ones.
DUAL-USE CAPABILITY MAPPING
┌─────────────────────────────────────────────────────────────────┐
│ AI AGENT CORE CAPABILITIES │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Code Review │ │ Log Analysis│ │ Pattern │ │
│ │ & Analysis │ │ & Anomaly │ │ Recognition │ │
│ │ │ │ Detection │ │ at Scale │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
└──────────┼────────────────┼────────────────┼──────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ │
│ DEFENSE APPLICATION │ OFFENSE APPLICATION │
│ ─────────────────── │ ─────────────────── │
│ │ │
│ ┌─────────────────┐ │ ┌─────────────────┐ │
│ │ Find bugs in │ │ │ Find bugs to │ │
│ │ your code │ │ │ exploit in │ │
│ │ before attackers│ │ │ target's code │ │
│ └─────────────────┘ │ └─────────────────┘ │
│ │ │
│ ┌─────────────────┐ │ ┌─────────────────┐ │
│ │ Correlate logs │ │ │ Generate exploit │ │
│ │ to find breaches│ │ │ candidates from │ │
│ │ faster │ │ │ findings │ │
│ └─────────────────┘ │ └─────────────────┘ │
│ │ │
│ ┌─────────────────┐ │ ┌─────────────────┐ │
│ │ Scale threat │ │ │ Automate recon │ │
│ │ hunting ops │ │ │ and lateral │ │
│ │ │ │ │ movement │ │
│ └─────────────────┘ │ └─────────────────┘ │
│ │ │
└───────────────────────────────┴──────────────────────────────────┘
│ │
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ DEFENDER BENEFIT │ │ ATTACKER BENEFIT │
│ - Faster remediation │ │ - Faster exploits │
│ - Proactive defense │ │ - Automated attacks │
│ - Scale coverage │ │ - Lower barrier │
└─────────────────────┘ └─────────────────────┘
The asymmetric reality is that attackers need to find only one exploitable path while defenders must secure all paths. AI agents improve both sides of this equation, but they disproportionately benefit attackers because offense requires less rigor than defense. A defensive AI that misses 10 percent of vulnerabilities leaves an organization exposed. An offensive AI that finds 10 percent of exploitable vulnerabilities gives an attacker a viable entry point.
What This Means for Your SOC
If you are evaluating AI agents for your security operations, the case studies and research suggest a pragmatic framework for decision-making. Rather than asking whether AI agents are effective in principle, ask whether they are effective for the specific use cases that matter in your environment.
Threat hunting is an area where AI agents have demonstrated meaningful value, particularly for hypothesis-driven investigation at scale. An agent that can continuously query logs, SIEM data, endpoint detection systems, and cloud infrastructure to test hunt hypotheses can extend the coverage of your existing team without requiring additional headcount. The key is defining clear scopes and escalation paths so that the agent's findings reach qualified analysts who can evaluate and respond appropriately.
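The hypothesis-plus-escalation structure described above can be sketched in a few lines. This is a minimal illustration, not a production hunting framework: the hypothesis, the log shape, and the threshold are all invented, and the important property is that hits go to an analyst callback rather than triggering automated response.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class HuntHypothesis:
    name: str
    query: Callable[[List[dict]], List[dict]]  # runs against log records
    escalation_threshold: int = 1              # hits before escalating

def run_hunts(hypotheses, logs, escalate):
    """Test each hypothesis against logs; escalate hits to a human analyst."""
    for h in hypotheses:
        hits = h.query(logs)
        if len(hits) >= h.escalation_threshold:
            # Findings feed analyst review, not automatic response.
            escalate(h.name, hits)

logs = [
    {"event": "login", "user": "svc-backup", "hour": 3},
    {"event": "login", "user": "alice", "hour": 10},
]
findings = []
run_hunts(
    [HuntHypothesis(
        "service-account off-hours login",
        lambda ls: [l for l in ls
                    if l["user"].startswith("svc-") and l["hour"] < 6])],
    logs,
    escalate=lambda name, hits: findings.append((name, hits)),
)
print(findings)
```

The escalation callback is where the scope definition lives: it routes findings to qualified analysts, which is the part of the workflow the agent should never own.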
Vulnerability assessment represents another strong use case, particularly for organizations that struggle with coverage at scale. Agents that can continuously scan for vulnerabilities, prioritize remediation based on exploitability and business impact, and track remediation progress can significantly improve your vulnerability management program. The caveat is that agents excel at known vulnerability patterns and may miss novel issues that require human discovery.
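Prioritizing by exploitability and business impact can be expressed as a simple scoring function. The sketch below is deliberately naive and the CVE identifiers and scores are placeholders: a real program would fold in EPSS probabilities, asset criticality, and exposure data rather than a two-factor product.

```python
def remediation_priority(vulns):
    """Rank vulnerabilities by exploitability x business impact.

    Illustrative model only; inputs are assumed to be pre-normalized
    (exploitability in [0, 1], business_impact on an internal scale).
    """
    return sorted(vulns,
                  key=lambda v: v["exploitability"] * v["business_impact"],
                  reverse=True)

vulns = [
    {"id": "VULN-A", "exploitability": 0.90, "business_impact": 3},
    {"id": "VULN-B", "exploitability": 0.20, "business_impact": 10},
    {"id": "VULN-C", "exploitability": 0.95, "business_impact": 1},
]
ranked = remediation_priority(vulns)
print([v["id"] for v in ranked])  # ['VULN-A', 'VULN-B', 'VULN-C']
```

Note how the highest raw exploitability (VULN-C) ranks last once impact is weighed in, which is the behavior a CVSS-only queue gets wrong.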
Red team operations have seen significant innovation through AI agents, with systems like FireCompass enabling continuous automated red teaming that goes beyond episodic penetration tests. The CyberBattleSim framework from Microsoft demonstrates how multi-agent simulations can model attack progression and defensive response in ways that inform both offensive testing and defensive strategy. For organizations that take purple teaming seriously, these tools offer capabilities that were previously available only to the largest security teams.
Throughout these use cases, the common thread is human oversight. The Gartner prediction that there will never be a fully autonomous SOC reflects hard-learned lessons about the limits of automation in security contexts. Security involves adversarial opponents who adapt, edge cases that defeat pattern matching, and decisions with significant consequences that require human accountability. AI agents are powerful tools for extending human capability, but they work best when humans remain in the loop, defining scopes, evaluating outputs, and making critical decisions.
For organizations beginning the evaluation process, I recommend starting with bounded pilots that have clear success criteria and defined limits. Do not attempt to deploy agents across your entire security program simultaneously. Pick one use case, implement proper oversight mechanisms, measure results against clear metrics, and expand only when the pilot demonstrates value. The organizations that succeed with AI agents are those that treat deployment as an iterative process rather than a one-time project.
The question of build versus buy applies here as it does to most security capabilities. Commercial platforms like Microsoft Security Copilot, CrowdStrike Charlotte AI, and Darktrace offer turnkey solutions that require less internal expertise but may offer less flexibility for custom integrations. Open-source frameworks like CyberBattleSim provide educational value and can be extended for specific needs but require engineering investment to operationalize. The right choice depends on your organization's existing capabilities, budget, and tolerance for complexity.
Staffing implications deserve consideration as part of any deployment planning. AI agents do not eliminate the need for skilled security professionals, but they change the nature of the work. Analysts spend less time on repetitive triage and more time on complex investigation, threat hunting, and strategic planning. This shift requires thinking about training and development for existing staff, as well as hiring strategies that prioritize different skills than might have been relevant five years ago. The security professionals who thrive in an agentic environment will be those who can effectively supervise, guide, and supplement AI capabilities rather than those who excel at high-volume manual analysis.
The broader trajectory is clear. AI agents are becoming permanent features of the security landscape, and their capabilities will continue to improve. The question for security leaders is not whether to engage with this technology but how to engage with it in ways that deliver genuine value while managing the real risks. The case studies in this post show both the potential and the limits. Use them to set realistic expectations, and invest the time to understand where agents fit in your specific context. The four-million-person workforce gap is not going to be solved by AI alone, but thoughtful deployment of AI agents alongside skilled human analysts can meaningfully improve your security outcomes.