Contributed by: Andrew Green, Enterprise IT Research Analyst
Even though LLMs have the potential to finally solve decades-old challenges in the SOC, their ability to generate statistically likely strings of text is only a small component of what constitutes an AI SOC. Being new, exciting, and able to deliver on the AI mandate from the C-suite, LLMs have nonetheless taken the limelight in the AI SOC.
First, let us remember where LLMs shine:
- Their inference process mimics human-like reasoning in a text-based format, which can substitute for a human analyst's actions.
- They can summarize large amounts of hard-to-read data and translate it into human-understandable insights.
Both capabilities depend on the data being formatted in an LLM-friendly way, which is what we will address in this blog.
Why is the SOC a ripe area for automation using LLMs?
Unsurprisingly, these capabilities address the exact challenges of the current SOC technology stack: analysts working across multiple tools, analyzing multiple sources of data, to manually investigate and respond to incidents.
This is how we got the most frequently cited problems in the SOC: alert overload, tool sprawl, and staff shortages. Tools such as SOAR have attempted to address alert overload and staff shortages using deterministic automation, but scripts and workflows are only useful for the use cases they were designed for. LLM-based automation, by contrast, is suitable for investigation and response without needing to pre-define the logic.
As for tool consolidation, efforts typically center on deploying security operations platforms, usually from SIEM providers that have acquired or natively developed UEBA, XDR, or SOAR. These are high-effort, high-cost exercises that are prohibitive for most organizations. LLMs, with their natural language interface, can instead act as an overlay across disparate tools.
To leverage these LLM capabilities, it’s not enough to have a ChatGPT-like experience, i.e. a single LLM taking in logs via a prompt submitted by an analyst. LLMs need to be architected as agents. The difference between an LLM and an agent lies in the services wrapped around the LLM, which include memory, RAG, reasoning and chain-of-thought, output parsing, and the like. Agents are then composed into multi-agent systems, which consist of one master agent or orchestrator responsible for coordination, and multiple purpose-built agents such as endpoint response agents, cloud log interpretation agents, and evaluation agents that assess the validity of outputs from other agents. With this architecture, LLM agents are well suited to address the complexity, scale, and repetitive nature of SOC workflows.
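To make the idea concrete, here is a minimal Python sketch of an orchestrator delegating to purpose-built agents. The agent roles, the run interface, and the investigation steps are illustrative assumptions rather than a reference to any particular framework; a real implementation would wrap each agent with the memory, RAG, and output parsing mentioned above.

class Agent:
    """Wraps an LLM plus the services around it (memory, RAG, output parsing)."""
    def __init__(self, role: str):
        self.role = role

    def run(self, task: str, context: dict) -> dict:
        # In a real system this would call an LLM with role-specific prompts,
        # retrieved context, and structured output parsing.
        return {"agent": self.role, "task": task, "finding": "..."}

class Orchestrator:
    """Master agent: routes sub-tasks and collects reviewed findings."""
    def __init__(self):
        self.agents = {
            "endpoint": Agent("endpoint-response"),
            "cloud_logs": Agent("cloud-log-interpretation"),
            "evaluator": Agent("output-evaluation"),
        }

    def investigate(self, alert: dict) -> list:
        findings = [
            self.agents["endpoint"].run("triage host activity", alert),
            self.agents["cloud_logs"].run("correlate IAM sign-ins", alert),
        ]
        # The evaluation agent assesses the other agents' outputs before
        # anything is surfaced to the analyst.
        review = self.agents["evaluator"].run("validate findings", {"findings": findings})
        return findings + [review]

print(Orchestrator().investigate({"alert_id": "A-1234", "type": "suspicious_signin"}))

In practice, the orchestrator also has to carry state between steps, which is where the handoff and context limitations discussed below come into play.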
Limitations of using LLMs in the SOC
However promising LLMs are for solving some of the SOC’s most pressing challenges, they are not without limitations, which include the following:
- Context degradation and forgetting - LLMs have finite context windows. As conversations grow longer or when processing large datasets, older information gets pushed out of the model's active memory. In the SOC, investigations can involve analyzing weeks or months of logs; in those instances, the LLM may "forget" earlier findings or context that is needed for accurate results.
- Multi-agent handoff loss of resolution - In multi-agent architectures, handoffs between agents represent a point of failure. Critical context, nuances, or intermediate findings may get filtered out or summarized away as data moves through the agent chain.
- Model drift - The longer the output, the more likely it is to drift. As LLMs generate extended responses or analyses, they tend to gradually deviate from the original query or lose focus on the specific security context. This is a particular issue with chatty LLMs that provide verbose answers.
- Time-series analysis - While LLMs excel at pattern recognition in text, they are not designed to handle numerical values and calculations. SOC work heavily relies on detecting statistical outliers or identifying subtle changes in user behavior over time. These tasks are better suited to specialized statistical models or machine learning algorithms, whose findings can then be fed into the AI agents.
- Hallucinations - LLMs only produce the most likely string of text given a prompt and context. There is no truth value associated with the prediction, which means they can produce a likely but factually incorrect string. A single hallucination can then be carried over into subsequent responses.
These may read as general limitations of LLMs, but they are particularly important for SOC use cases, not just because of the sensitive nature of the work and its low tolerance for faults, but because of the security stack itself. Take IAM log sources as an example, which may include Entra ID, Google Workspace, or Okta:
Entra ID
{
"time": "2025-01-15T14:30:25.123Z",
"operationName": "Sign-in activity",
"category": "SignInLogs",
"resultType": "Success",
"userPrincipalName": "john.doe@company.com",
"ipAddress": "203.0.113.100",
"location": "New York, US"
}
Google Workspace
{
"id": {"time": "2025-01-15T14:45:30.456Z", "uniqueQualifier": "abc123"},
"actor": {"email": "jane.smith@company.com"},
"events": [{"name": "login", "type": "login"}],
"ipAddress": "198.51.100.25"
}
Okta
{
"uuid": "def456-ghi789-jkl012",
"published": "2025-01-15T15:00:45.789Z",
"eventType": "user.session.start",
"actor": {"alternateId": "bob.wilson@company.com"},
"client": {"ipAddress": "192.0.2.50"},
"outcome": {"result": "SUCCESS"}
}
Looking at the syntax of each log, it’s easy to see how the same event type has different fields and field names across log sources. Interpreting this information requires a human, let alone an AI, to understand the nuances of these events in each source; without that understanding, the information can be misread, leading to incorrect analysis. Without normalization and canonicalization, it becomes difficult to extract consistent value for understanding threats and providing insights for investigations.
Pre-LLM data processing and analysis
We can give LLMs the best chance of producing accurate outputs through a smart end-to-end pipeline, from data ingest to the multi-agent architectures and post-LLM validation and evaluation.
The pre-LLM data processing layer is perhaps the most important one for circumventing the limitations listed above, particularly around hallucinations. This pre-LLM layer is often referred to as the semantic layer, which is responsible for transforming raw data into LLM-friendly formats. By LLM-friendly formats, we refer to consistent and explicit schemas across all sources.
This is commonly done through normalization, deduplication, sanitization, and conversion of data, which makes, for example, Entra ID, Google Workspace, and Okta logs read the same. An alternative is to provide explicit schema definitions for each data source that tell the LLM how the source is formatted and how to interpret it.
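As a rough illustration, the Python sketch below normalizes the three sample sign-in logs from earlier into one canonical shape. The target field names (timestamp, user, source_ip, action, outcome) are an assumed schema chosen for this example; many teams would instead map to an established model such as OCSF or ECS.

def normalize_entra(event: dict) -> dict:
    # Microsoft Entra ID sign-in log -> canonical event
    return {
        "timestamp": event["time"],
        "user": event["userPrincipalName"],
        "source_ip": event["ipAddress"],
        "action": "signin",
        "outcome": "success" if event["resultType"] == "Success" else "failure",
        "source": "entra_id",
    }

def normalize_google(event: dict) -> dict:
    # Google Workspace login audit record -> canonical event
    return {
        "timestamp": event["id"]["time"],
        "user": event["actor"]["email"],
        "source_ip": event["ipAddress"],
        "action": "signin",
        "outcome": "success",  # not explicit in this sample; Workspace encodes the result in the event name
        "source": "google_workspace",
    }

def normalize_okta(event: dict) -> dict:
    # Okta System Log event -> canonical event
    return {
        "timestamp": event["published"],
        "user": event["actor"]["alternateId"],
        "source_ip": event["client"]["ipAddress"],
        "action": "signin",
        "outcome": event["outcome"]["result"].lower(),
        "source": "okta",
    }

Whichever canonical schema is chosen, the point is that every downstream agent sees the same field names and value conventions regardless of which IAM provider produced the event.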
Like the time-series analysis mentioned above, behavioral analysis is not in the LLM’s wheelhouse, but it is a great candidate for the pre-LLM semantic layer. Established techniques such as statistical analysis are often adopted in solutions such as UEBA, but the output is either another alert or a trigger for a simple deterministic automation script. In the AI SOC, these findings can instead be forwarded to an agent for further investigation, interpretation, and validation.
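A minimal sketch of that hand-off, under the assumption of a simple z-score baseline on daily sign-in counts (the threshold and the structure of the finding are illustrative choices, not a prescribed design):

from statistics import mean, stdev

def signin_anomaly(user: str, daily_counts: list, today: int, z_threshold: float = 3.0):
    # Flag a user whose sign-in volume deviates sharply from their own baseline.
    baseline_mean = mean(daily_counts)
    baseline_std = stdev(daily_counts) or 1.0  # guard against a zero-variance baseline
    z = (today - baseline_mean) / baseline_std
    if abs(z) < z_threshold:
        return None
    return {
        "entity": user,
        "behavior": "signin_volume",
        "baseline_mean": round(baseline_mean, 1),
        "observed": today,
        "z_score": round(z, 2),
    }

finding = signin_anomaly("jane.smith@company.com", [4, 5, 6, 5, 4, 6, 5], today=41)
if finding:
    # In the AI SOC, this structured finding would be forwarded to an
    # investigation agent for interpretation and validation rather than
    # raised as yet another raw alert.
    print(finding)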
The AI SOC must extend beyond the LLM
We’ve seen the inherent limitations of LLMs and why they cannot be the only component in an AI SOC. Yes, they can perform sophisticated investigations at the level of human analysts in a fraction of the time, but they can only do so accurately if they are fed the right type of data in the right format, and if their output is evaluated and validated.
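That validation step can be straightforward. As one hypothetical example, a post-LLM check might verify that every entity an agent cites in its summary actually appears in the underlying events, catching a common class of hallucination before it reaches an analyst (the function and field names here are assumptions, not part of any particular product):

def validate_summary(summary_entities: dict, source_events: list) -> list:
    # Reject summaries that reference users or IPs absent from the source events.
    seen_users = {e.get("user") for e in source_events}
    seen_ips = {e.get("source_ip") for e in source_events}
    problems = []
    for user in summary_entities.get("users", []):
        if user not in seen_users:
            problems.append(f"user '{user}' not found in source logs")
    for ip in summary_entities.get("ips", []):
        if ip not in seen_ips:
            problems.append(f"IP '{ip}' not found in source logs")
    return problems  # an empty list means the summary passed this check

events = [{"user": "john.doe@company.com", "source_ip": "203.0.113.100"}]
print(validate_summary({"users": ["john.doe@company.com"], "ips": ["198.51.100.25"]}, events))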
As such, the data must be made not just machine-readable but human-interpretable. After all, LLMs are optimized to predict what a human would likely say next, not to establish truth from bits. The layer responsible for normalizing, deduplicating, enriching, and shaping logs into consistent semantic structures is not optional. It’s foundational.
This semantic layer, which renders fragmented, multi-source security data into a format that’s both logical and legible, gives LLMs the grounding they need to operate effectively and truly provide value in the SOC.