Discover how Exaforce fuses logs, config & identity into an AI-powered graph that improves on legacy and naive detection techniques.
What does it take to do detections right, and why have we gotten it wrong for so long?
Detections were originally built as rules running in real time over log data. In the mid-2010s, User and Entity Behavior Analytics (UEBA) products introduced sophisticated baselining and grouping techniques to identify outliers, focusing primarily on users and, occasionally, resources. For many mature SOC teams, these two components - rules and anomalies - are still the pillars of their detection architecture.
However, this approach leaves three critical gaps:
- It cannot correlate findings with configuration (config) data
- It fails to account for unique cloud semantics
- It leaves context evaluation to a manual post-detection phase, which results in noisy individual detections instead of full incidents
Incorporating configuration information
One element that both legacy approaches lack is the ability to correlate event and historical data with config data. This piece of the puzzle is critical to avoiding excessive false positives and to properly assessing the impact of an alert. Today, most teams incorporate this data only after detection, during the triage and analysis phase, where it often has to be retrieved manually and is difficult to parse. Moving the config analysis into the detection logic itself makes detections more accurate and helps your teams operate more efficiently. This config data can include:
- Blast radius assessment: This user may be compromised, but what resources can they access? This is not easy: in some environments, such as AWS, answering it requires a full identity-chain analysis of which roles can assume which other roles, and which resources each of those roles can access. Without this full analysis, the identity's true access can be misjudged, producing false positives. (A minimal traversal is sketched after this list.)
- Proper severity assessment: This EC2 instance is acting anomalously, and its attached instance profile has admin permissions, which makes the anomaly considerably riskier.
- False positive recognition: Using effective permission analysis, we can tell that this user was suddenly granted a very permissive role, but the sensitive assets have resource policies that override the granted role, so the change is actually benign and shouldn't create an alert.
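To make the blast-radius idea concrete, here is a minimal sketch of the transitive traversal involved, assuming a simplified, pre-computed role graph. The `can_assume` and `role_access` structures are illustrative stand-ins, not Exaforce's API, and real AWS evaluation would also account for trust policies, permission boundaries, and resource policies:

```python
from collections import deque

# Hypothetical, pre-computed identity graph. Real evaluation would also
# consider trust policies, permission boundaries, SCPs, and resource policies.
can_assume = {
    "user:alice": ["role:dev"],
    "role:dev": ["role:deploy"],
    "role:deploy": ["role:admin"],
    "role:admin": [],
}
role_access = {
    "role:dev": {"s3:dev-bucket"},
    "role:deploy": {"s3:artifacts"},
    "role:admin": {"s3:pii-bucket", "kms:master-key"},
}

def blast_radius(principal: str) -> set:
    """Transitively follow role assumptions and union reachable resources."""
    reachable, seen, queue = set(), {principal}, deque([principal])
    while queue:
        node = queue.popleft()
        reachable |= role_access.get(node, set())
        for nxt in can_assume.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return reachable

# Everything user:alice can touch via the full assumption chain:
print(sorted(blast_radius("user:alice")))
# ['kms:master-key', 's3:artifacts', 's3:dev-bucket', 's3:pii-bucket']
```

Stopping at alice's directly attached role would miss the admin-level access reachable two hops away, which is exactly the gap that produces both missed severity and false positives.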
Attempts have been made to incorporate config data into detections themselves. Many products offer features such as user or entity frameworks, tags, and HR integrations to try to bring aspects of this data into the detection fold. However, maintaining those integrations and updates, and covering the breadth of config data in the model, is extremely time-consuming and difficult; as a result, many teams have resorted to moving the config analysis to the post-detection phase.
Built for cloud & SaaS
UEBA has its origins in the insider threat use case. It was built primarily for identities, modeling their typical activities and patterns. As the approach proved fruitful, many products expanded to model some resources as well, often individual virtual machines (VMs) or instances. However, with the shift to IaaS and SaaS environments that many organizations are undertaking, the notion of UEBA needs a major reset. Cloud resources are often ephemeral, leading to different scaling requirements and a new approach to baselining and anomaly detection. IaaS and SaaS resources vary widely - from Kubernetes workloads and pods to GitHub repositories and actions to Google documents - and each requires very different modeling. Even the traditional identity is not as straightforward. Roles in AWS, for example, may be assumed by a mix of humans and machines, making their modeling far more complex. In some AWS cases, the alert may not even be attributable to an origin identity, only to the role used. As a result, traditional UEBA tools and features often fall short of the needs of modern organizations operating in cloud and multicloud environments.
Detections are not incidents
The job of the detection tool is not just to provide a hint of suspicious activity but also to ensure that the alert is framed in the full context of the environment. Examples include auto-grouping duplicate alerts and incorporating business context during events such as reorganizations, mergers, or new tool rollouts. The ability of the detection tool to accommodate such context is critical to analyst expediency and completeness of investigation, as it greatly reduces noisy individual detections and transforms them into well-documented incidents. Narrowly scoped tools often push the burden onto a Security Information and Event Management (SIEM), Security Orchestration, Automation, and Response (SOAR), or other system to perform a second level of correlation, aggregation, and analysis, making that system cumbersome and manual to maintain.
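For illustration, here is one simple way a second-level system might auto-group duplicate alerts by fingerprint; the fields used in the fingerprint are hypothetical, chosen only to show the pattern:

```python
from collections import defaultdict

def fingerprint(alert: dict) -> tuple:
    # Alerts describing the same underlying behavior share identity,
    # rule, and target. These fields are hypothetical.
    return (alert["identity"], alert["rule_id"], alert["resource"])

def group_duplicates(alerts: list) -> list:
    """Collapse duplicate alerts into single incident candidates."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[fingerprint(alert)].append(alert)
    return [
        {**dups[0], "occurrences": len(dups), "related": dups[1:]}
        for dups in groups.values()
    ]
```

Maintaining this correlation layer in a separate SIEM or SOAR is exactly the kind of manual plumbing a detection tool should absorb.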
At Exaforce, effective detection is a well-balanced triad of rules, anomalies, and config data, purpose-built for the modern cloud- and SaaS-centric company. Here's how our approach breaks from tradition and why that matters.
In the next few sections, we’ll explore how Exaforce overcomes these limitations in current solutions with a fresh, AI-powered approach to data ingestion and modeling. By fusing log, config, and identity data into a unified semantic layer, and then layering behavioral baselines and knowledge-driven reasoning on top, Exaforce converts scattered signals into precise, high-fidelity alerts that reveal complete attack chains rather than isolated anomalies.
The Semantic Data Model
Ensuring quality data
Our approach to detection begins with Exaforce’s three-model architecture: the Semantic Data, Behavioral, and Knowledge models. Each adds a distinct layer of context.
We ingest event and config data from various sources and convert them into structured Events and Resources. Events are organized into Sessions to add perspective and session-level contextual signals such as location, Autonomous System Number (ASN), and duration. We also chain sessions to capture origin identities, role assumptions, cross-role behavior, and action sequences, enabling a more complete analysis of what was done and by whom. This prepares the data for the detection assessments to come.
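As a rough illustration of sessionization, the sketch below groups per-identity events into sessions split on idle gaps and derives session-level attributes. The field names and the 30-minute threshold are assumptions for the example, not Exaforce's actual schema or logic:

```python
from datetime import timedelta

IDLE_GAP = timedelta(minutes=30)  # illustrative session-break threshold

def sessionize(events):
    """Group per-identity events into sessions split on idle gaps.

    `events` is an iterable of dicts with 'identity' and 'ts' (a datetime),
    plus whatever fields the logs carry (location, asn, action, ...).
    """
    sessions, open_sessions = [], {}  # identity -> currently open session
    for ev in sorted(events, key=lambda e: (e["identity"], e["ts"])):
        sess = open_sessions.get(ev["identity"])
        if sess and ev["ts"] - sess["events"][-1]["ts"] <= IDLE_GAP:
            sess["events"].append(ev)
        else:
            sess = {"identity": ev["identity"], "events": [ev]}
            open_sessions[ev["identity"]] = sess
            sessions.append(sess)
    # Derive session-level signals such as duration and location set.
    for sess in sessions:
        sess["duration"] = sess["events"][-1]["ts"] - sess["events"][0]["ts"]
        sess["locations"] = {e.get("location") for e in sess["events"]}
    return sessions
```

Once events carry a session context, attributes like duration or a mid-session location switch become first-class signals rather than things an analyst reconstructs by hand.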
Resources undergo similar treatment. We capture config, parse resource types, enrich the resources, and build relationship graphs. Exaforce also collects config changes over time, enabling us to detect subtle but critical changes that would otherwise go unnoticed. It also empowers us to assess the impact of each config change, effectively conducting a full blast radius analysis. Identities, a key subset of resources, receive extra enrichment, for example:
- Human vs. machine classification: Exaforce's model analyzes identity types, behavior patterns, and role-assumption patterns to classify identities as humans or machines. This classification is dynamic to allow for complex scenarios (e.g., a new identity created by a human but then used in a script executed by a machine identity, or roles shared by both human and machine identities). As an identity's human-vs-machine classification changes, so does the way it is enriched and modeled. (See the sketch after this list.)
- Effective permission analysis: Interpret the full range of permissions the user has, based on transitive role-assumption capabilities, and overlay them with resource policy information.
- Identity chaining: Determine which identity actually performed an action, not just which role was used.
- Reachable resource analysis: Determine which resources an identity can access, with what actions, and at what access level.
- Third-party identities and access: Identify third-party identities and monitor their behavior and privileges more carefully.
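As a flavor of what human-vs-machine classification can look like, here is a deliberately simple heuristic based on activity hours and timing regularity. The thresholds are illustrative; a production classifier would draw on many more features (identity type, role-assumption patterns, and so on):

```python
import statistics

def classify_identity(events) -> str:
    """Label an identity 'machine' or 'human' from its activity rhythm.

    `events` must be sorted by timestamp. Thresholds are illustrative:
    machines tend to act around the clock with highly regular spacing,
    while humans cluster into working hours with irregular gaps.
    """
    hours = {e["ts"].hour for e in events}
    gaps = [(b["ts"] - a["ts"]).total_seconds()
            for a, b in zip(events, events[1:])]
    regular_timing = (
        len(gaps) >= 5
        and statistics.pstdev(gaps) < 0.1 * statistics.mean(gaps)
    )
    return "machine" if len(hours) >= 20 or regular_timing else "human"
```

Because the classification is recomputed as behavior evolves, an identity that starts human and drifts into scripted use flips categories, and its baselines flip with it.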
Resources in our context are a generic construct: anything from AWS EC2 instances and Kubernetes Jobs to GitHub repositories and Okta roles. Modeling the config from the outset allows a more complete detection to be formed and provides the foundation for the first pillar: configuration.
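A generic resource construct might be sketched roughly like this, with typed nodes and labeled relationship edges; the class and field names are hypothetical, not Exaforce's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """Generic node: kind might be 'aws:ec2', 'github:repo', 'okta:role'."""
    rid: str
    kind: str
    config: dict = field(default_factory=dict)

class ResourceGraph:
    """Typed nodes plus labeled relationship edges between them."""
    def __init__(self):
        self.nodes = {}
        self.edges = []  # (source rid, relation, destination rid)

    def add(self, res: Resource):
        self.nodes[res.rid] = res

    def relate(self, src: str, relation: str, dst: str):
        self.edges.append((src, relation, dst))

    def neighbors(self, rid: str):
        return [(rel, dst) for src, rel, dst in self.edges if src == rid]

g = ResourceGraph()
g.add(Resource("i-0abc", "aws:ec2", {"instance_profile": "admin"}))
g.add(Resource("role/admin", "aws:iam-role"))
g.relate("i-0abc", "uses-instance-profile", "role/admin")
print(g.neighbors("i-0abc"))  # [('uses-instance-profile', 'role/admin')]
```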
The Behavioral Model
In Exaforce, any dimension that could be anomalous is referred to as a Signal, for example, an unusual location or rare service usage. Signals may be weak or strong, but both are important. Detections are generated by grouping signals that occur in the same event or session, representing collections of medium-fidelity anomalies. These signals and detections provide the rule and anomaly pillars of the solution.
The Semantic Model sets the data up to be modeled in the Behavioral Model. Sessionizing events, for example, allows us to go beyond baselining individual actions to baselining combinations of events and event patterns. Similarly, baselines are customized to the object in question; for example, humans and machines (identified in the aforementioned Semantic Data Model) are modeled differently. Machines tend to follow predictable patterns, while humans are far more eclectic. Shared identities, such as a role used by both an engineer and automation scripts, are modeled with this nuance in mind.
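One way to see what baselining event patterns (rather than just individual actions) buys you: baseline action bigrams within a session, so a never-before-seen sequence stands out even when each action alone is common. A toy sketch with invented action names:

```python
from collections import Counter

def action_bigrams(actions):
    """Count consecutive action pairs - the 'patterns' being baselined."""
    return Counter(zip(actions, actions[1:]))

# Baseline built from historical sessions (toy data):
baseline = action_bigrams(
    ["ListBuckets", "GetObject", "GetObject", "PutObject"]
)
# A sequence never seen before is an anomaly signal even if each
# individual action might be common on its own:
observed = ("GetObject", "GetSecretValue")
print(baseline[observed] == 0)  # True -> novel pattern
```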
We model a wide range of signals, independently and in combination, including:
- Action (and action patterns)
- Service
- Time
- Duration
- Location and ASN (including cross-source comparisons)
- Resource
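To illustrate how weak signals combine, here is a toy scoring scheme in which no single signal crosses the detection threshold but co-occurring signals in one session do. The weights and threshold are invented for the example; a real system would learn or tune them:

```python
# Invented weights and threshold, purely for illustration.
SIGNAL_WEIGHTS = {
    "new_location": 0.3,
    "new_asn": 0.3,
    "rare_action": 0.4,
    "rare_service": 0.4,
    "unusual_time": 0.2,
}
DETECTION_THRESHOLD = 0.7

def score_session(signals):
    """Sum weak signals seen in one session; fire only past the threshold."""
    score = sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in signals)
    return score, score >= DETECTION_THRESHOLD

print(score_session(["rare_action", "rare_service"]))  # (0.8, True)
print(score_session(["unusual_time"]))                 # (0.2, False)
```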
Here’s an example of an Exaforce finding with multiple signals: we saw both an Operation Anomaly and a Service Usage Anomaly. This user, Mallam, does not usually perform this GetSecretValue action, and they do not typically perform actions in the AWS US East 2 region. This led Exaforce to fire a detection.
A contextualized threat finding bringing together an action with past behavior.
Additional event data and signals brought together into a unified finding.
This multidimensional approach is critical: a single weak signal is rarely enough, but several weak signals, together, often are. This rule-and-anomaly detection approach, across the breadth of supported resources and log sources, represents two pillars of the detection trio.
The Knowledge Model
The goal of detections is completeness: making sure no signal of potential compromise is overlooked. But completeness can result in noise. That’s where our Knowledge Model comes in.
After the Semantic Data Model runs and signals fire and are grouped into detections, Exaforce runs a triage pipeline that contextualizes each detection and adds organization-specific business context, turning medium-fidelity detections into high-fidelity findings. This triage process is performed on Exaforce and third-party alerts alike and augments context even further to ensure we surface only alerts worthy of analyst attention. The analysis includes weighing conflicting factors in context at the end of the detection stage - for example, weighing the fact that a user has broad privileges against the severity of the action taken.
The weighing of resource and identity config data against rule and anomaly outputs happens in this Knowledge Model, supplemented by additional context: similar findings, business context provided by the user, user and manager validation responses, and more.
- Similar findings are identified. If closed, their resolutions are used as inputs to the model when assessing the current finding. If they are still open or in progress, the model groups them; once grouped, findings are classified as duplicate, grouped, or chained to specify the relationship and the level of related analysis.
- Business context rules give users a mechanism to feed free-form data into the model. This could include context about the environment (e.g., these resources are very important, or we use this VPN), information about users (e.g., users A, B, and C are all part of the executive team and should be monitored carefully), or general context about the company (e.g., we are a health company with offices in the following locations, and teams often commute between these sites). This free-form input lets novice users influence and inform the Exabots with critical context without having to manually silence or suppress individual alerts.
- Exabots also have skills that allow them to seek validation from end users. If an Exabot determines that user or manager validation would be helpful, it can trigger a Slack/Teams message to the individual and factor their response into the determination.
The Exabots curate this set of information and pass it to the Knowledge Model agents, which assess each of these factors and make a determination of “False Positive” or “Needs Investigation,” turning basic Detections into context-rich Incidents. All of these analyses run continuously while the alert is open or in progress, so even as your environment changes, the recommended assessments stay up to date. Note that the preparation, structuring, and condensing of this data helps keep the AI agents performing the analysis as accurate as possible and minimizes hallucinations.
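As a rough sketch of the shape of this pipeline, the structure below condenses the curated context into one object and applies two of the determination heuristics described above. The field names and rules are illustrative; the actual Knowledge Model agents reason over far richer context:

```python
from dataclasses import dataclass, field

@dataclass
class TriageContext:
    """Condensed context handed to the Knowledge Model agents (illustrative)."""
    detection: dict
    similar_findings: list = field(default_factory=list)  # incl. resolutions
    business_context: list = field(default_factory=list)  # free-form rules
    validations: list = field(default_factory=list)       # Slack/Teams replies

def triage(ctx: TriageContext) -> str:
    # Near-identical findings previously closed as benign lean benign,
    # unless the current action itself is high severity.
    benign_history = any(
        f.get("resolution") == "false_positive" for f in ctx.similar_findings
    )
    if benign_history and not ctx.detection.get("high_severity_action"):
        return "False Positive"
    # An explicit "that was me" from the user or manager is strong evidence.
    if any(v.get("confirmed") for v in ctx.validations):
        return "False Positive"
    return "Needs Investigation"
```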
Running the initial knowledge model before presenting the detection to the user allows the Exaforce detections to be extremely high fidelity.
Example
The user here was seen in two locations, Zurich and Matran, in quick succession. This quick mid-session location switch was anomalous, but the locations themselves were consistent with the user's previous behavior, and with that of other company employees as well. The actions performed were also consistent with the user's historical behavior. The triage agent was therefore able to weigh the anomalous signal against these other factors and rule this a false positive. You'll note that the triage agent is also armed with company-specific business context - in this example, it refers to an office in Zurich. (More about business context and triage in our next blog!)
An automatically marked false positive of a user accessing a repository from multiple locations, based on business context.
After triage, we group findings, both Exaforce and third-party, into aggregated attack chains. This lets analysts see the full picture, not just disconnected events.
Exaforce in action: GitHub example
Let’s see the Exaforce approach in practice.
GitHub is a critical data source. It contains sensitive data such as company intellectual property and can even hold attached secrets with highly permissive rights to perform CI/CD actions. Yet it is often overlooked.
Exaforce ingests logs and config data to gather activity information and identify risks and threats associated with supply chain attacks - for example, uses of Personal Access Tokens (PATs), the credentials commonly used in CI/CD and developer workflows. Out of the box, GitHub logs provide hashed PATs and basic attribution. Exaforce goes further. In this example, the Semantic Data Model:
- Ingests log and config data and sessionizes it to understand resources such as repositories, workloads, actions, and tokens
- Enriches the token resource with scope information from the config data to understand access and permissions; this involves correlating the token's scopes from config data with runtime log events containing the hashed token
- Classifies tokens used for cron jobs, ad-hoc scripts, and user-driven actions based on their historical usage
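The hash-based correlation in the second step can be sketched as a simple join between log events and config-side token metadata. Field names are illustrative, and the hashing scheme (assumed SHA-256 here) depends on the log provider:

```python
import hashlib

def token_hash(raw_token: str) -> str:
    # Hash config-side tokens the same way the log provider does
    # (SHA-256 assumed here) so the two sides can be joined.
    return hashlib.sha256(raw_token.encode()).hexdigest()

def enrich_with_scopes(events, tokens):
    """Join log events carrying a token hash with config-side metadata.

    `events` have a 'hashed_token' field; `tokens` are config records
    with 'token_hash', 'scopes', and 'owner'. Field names are illustrative.
    """
    by_hash = {t["token_hash"]: t for t in tokens}
    for ev in events:
        meta = by_hash.get(ev.get("hashed_token"), {})
        ev["token_scopes"] = meta.get("scopes")
        ev["token_owner"] = meta.get("owner")
    return events
```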
Instead of simply attributing actions to a user, the Behavioral Model then also builds tailored baselines for the tokens themselves and generates signals for any anomalies found. PAT-based baselines enable a variety of unique detections and protections: users may have multiple PATs in use simultaneously for a mix of automation and ad hoc usage, and distinct per-PAT baselines let us avoid firing false positives when both are in use concurrently, as the sketch below illustrates.
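A minimal per-PAT baseline might track which (repository, ASN) pairs each token has been seen with, emitting signals on novelty - exactly the kind that fired in the finding below. This is an illustrative sketch; a real baseline would decay over time and cover many more dimensions:

```python
from collections import defaultdict

class PatBaseline:
    """Per-token baseline of (repository, ASN) pairs seen historically.

    Illustrative: a real baseline would decay over time and track many
    more dimensions (actions, timing, volume, ...).
    """
    def __init__(self):
        self.history = defaultdict(set)  # token_hash -> {(repo, asn), ...}

    def observe(self, token_hash, repo, asn):
        self.history[token_hash].add((repo, asn))

    def signals(self, token_hash, repo, asn):
        seen = self.history[token_hash]
        out = []
        if repo not in {r for r, _ in seen}:
            out.append("new_repository")
        if asn not in {a for _, a in seen}:
            out.append("new_asn")
        return out

b = PatBaseline()
b.observe("tok1", "org/ci-repo", "AS16509")
print(b.signals("tok1", "org/new-repo", "AS3303"))
# ['new_repository', 'new_asn']
```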
Here, we identified six types of anomalies (signals) - most critically, a new repository being accessed from a new ASN.
A threat identified from a user making code changes across multiple locations and ASNs, contextualized with configuration data (missing branch protection rules).
The Knowledge Model weighed these anomalies against the PAT’s scopes to determine alert severity.
Multiple event signals correlated with configuration data, culminating in a single alert with a dynamic severity.
The traditional detection pillars were powerful at surfacing suspicious activity but lacked context, creating noisy alerts without enough detail to paint a full picture. Exaforce delivers high-fidelity findings by starting with strong foundations: a semantic data model that structures raw IaaS and SaaS data into enriched, contextual entities.
We monitor a wide range of signals across actions, identities, sessions, and more to detect even minor deviations that add up to real alerts. Our bespoke modeling ensures deep coverage across both IaaS and SaaS environments, including overlooked systems like GitHub and Google Workspace.
Signals are aggregated into cohesive, cross-dimensional findings, and our triage agents weigh conflicting anomalies to surface only what truly matters.
The result? Comprehensive coverage, smarter triage, and dramatically fewer false positives.