SureLog SIEM Federated Anomaly Detection Engine Using Classification
The next-generation detection engine of SureLog SIEM combines rule-based and ML-based techniques. SureLog uses machine learning models and advanced correlation rules together and updates each of them dynamically.
Anomaly detection via classification
Anomaly detection with SureLog infers a probabilistic model of the network behavior of each IP address. Each network event is assigned an estimated probability (henceforth, the event’s “score”). The events with the lowest scores are flagged as “suspicious” for further analysis.
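The flagging step can be sketched as follows. This is only an illustration, not SureLog's implementation: the function name `flag_suspicious`, the threshold value, and the event IDs and scores are all assumptions made up for the example.

```python
def flag_suspicious(scored_events, threshold=1e-4, max_results=None):
    """Return the events whose score is below `threshold`, lowest score first.

    `scored_events` is a list of (event_id, score) pairs. The threshold is a
    hypothetical tuning parameter, not a documented SureLog setting.
    """
    flagged = [event for event in scored_events if event[1] < threshold]
    flagged.sort(key=lambda event: event[1])
    if max_results is None:
        return flagged
    return flagged[:max_results]


# Made-up scores: each one is the estimated probability of the event.
events = [("evt-1", 0.12), ("evt-2", 0.00002), ("evt-3", 0.03), ("evt-4", 0.00007)]
suspicious = flag_suspicious(events)
```

With these sample values, `evt-2` and `evt-4` are flagged, in that order, because their estimated probabilities are the lowest.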
SureLog uses Latent Dirichlet Allocation (LDA), a topic-modelling algorithm, as its classifier.
The advantage of LDA over other algorithms is that most clustering algorithms allow a data point to belong to only one group. There could be hundreds of different groups, but in the end a data point belongs to just one. Often, things are not that clear-cut. LDA clusters data in such a way that a point can belong to more than one group, i.e. “soft” clustering, as opposed to “hard” clustering.
Another advantage of LDA concerns similarity. Most clustering algorithms rely on a distance function that measures how similar data points are. LDA allows a data point to be partially similar to other data points. This comes back to the “softness” of the algorithm: a data point that is only partially similar can still belong to a group.
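The difference between “hard” and “soft” clustering can be shown with a small sketch. The topic names, the weights, and the 0.25 cut-off below are invented for illustration and are not taken from SureLog.

```python
# Hypothetical topic mix for one IP address: the weights sum to 1.
topic_mix = {"web_browsing": 0.6, "dns_lookups": 0.3, "file_transfer": 0.1}

# "Hard" clustering keeps only the single best-matching group.
hard_label = max(topic_mix, key=topic_mix.get)

# "Soft" clustering keeps the partial memberships; here we list every
# group whose weight reaches an (assumed) cut-off of 0.25.
soft_membership = {topic: w for topic, w in topic_mix.items() if w >= 0.25}
```

In this sketch the hard label is `web_browsing` alone, while the soft view keeps both `web_browsing` and `dns_lookups` with their weights.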
Feature selection is one of the core concepts in machine learning. The features of the SureLog anomaly detection model are:
- Source IP
- Destination IP
- Source Port
- Destination Port
- Sent Bytes
- Received Bytes
- Sent Packets
- Received Packets
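As a rough illustration, the eight features above could be carried in a simple record type. The class name `FlowRecord`, the field names, the field types, and the sample values are assumptions for this sketch and do not describe SureLog's internal schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One network event with the eight listed features (hypothetical layout)."""
    source_ip: str
    destination_ip: str
    source_port: int
    destination_port: int
    sent_bytes: int
    received_bytes: int
    sent_packets: int
    received_packets: int


# Sample values are illustrative; the IPs come from documentation ranges.
example = FlowRecord(
    source_ip="192.0.2.10",
    destination_ip="198.51.100.7",
    source_port=1066,
    destination_port=301,
    sent_bytes=1026,
    received_bytes=0,
    sent_packets=10,
    received_packets=0,
)
```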
The SureLog machine learning model uses a topic-modelling approach:
- Entity log records are simplified into words.
- A collection of “topics” is inferred that represents common profiles of network activity.
- These “topics” are probability distributions over words.
- Each entity has a mix of topics corresponding to its behavior.
- The probability of a word appearing in the network activity of an entity is estimated by simplifying its log record into a word, and then combining the word probabilities per topic using the topic mix of that entity.
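The way word probabilities are combined with an entity's topic mix can be sketched as a weighted sum: the probability of a word for an entity is the sum, over topics, of the entity's weight for that topic times the topic's probability for the word. The function name `event_score`, the topic names, and all numbers below are made up for the example; only the combination rule follows the description above.

```python
def event_score(topic_mix, topic_word_probs, word):
    """Estimate the probability of `word` for one entity.

    topic_mix:        {topic: weight}, the entity's mix of topics
    topic_word_probs: {topic: {word: probability}}, one distribution per topic
    Words unknown to a topic contribute probability 0.
    """
    return sum(
        weight * topic_word_probs[topic].get(word, 0.0)
        for topic, weight in topic_mix.items()
    )


# Illustrative numbers only.
topic_mix = {"t0": 0.75, "t1": 0.25}
topic_word_probs = {
    "t0": {"301_TCP_3_12_5": 0.5, "333333_UDP_7_12_1": 0.5},
    "t1": {"301_TCP_3_12_5": 0.25, "333333_UDP_7_12_1": 0.75},
}
score = event_score(topic_mix, topic_word_probs, "301_TCP_3_12_5")
# 0.75 * 0.5 + 0.25 * 0.25 = 0.4375
```

An event whose word is rare under every topic the entity usually exhibits receives a low score, which is what marks it as suspicious.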
SureLog converts log records into words. Two examples:
(1) A record with source port 1066, destination port 301, protocol TCP, hour of day equal to 3, 1026 bytes transferred, and 10 packets sent.
The word “301_TCP_3_12_5” is created for the source IP document.
The word “-1_301_TCP_3_12_5” is created for the destination IP document.
(2) A record with source port 1194, destination port 1109, protocol UDP, hour of day equal to 7, 1026 bytes transferred, and 1 packet sent.
The word “333333_UDP_7_12_1” is created for both the source and destination IP documents.
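The two examples can be reproduced with a small sketch. It covers only the two port combinations shown above. The 1024 boundary between low and high ports is an assumption, and because the text does not state how byte and packet counts are mapped to the bins 12, 5, and 1, the bin indices are passed in as precomputed inputs. The function name `build_words` is hypothetical.

```python
WELL_KNOWN_PORT_MAX = 1024   # assumption: ports above this count as "high"
BOTH_HIGH_PORTS = "333333"   # marker seen in example (2)


def build_words(src_port, dst_port, protocol, hour, bytes_bin, packets_bin):
    """Return (source_ip_word, destination_ip_word) for one record.

    `bytes_bin` and `packets_bin` are precomputed bin indices; the binning
    rule itself is not described in the text, so it is not implemented here.
    """
    suffix = f"{protocol}_{hour}_{bytes_bin}_{packets_bin}"
    src_high = src_port > WELL_KNOWN_PORT_MAX
    dst_high = dst_port > WELL_KNOWN_PORT_MAX
    if src_high and dst_high:
        # Example (2): both ports are high, same word for both documents.
        word = f"{BOTH_HIGH_PORTS}_{suffix}"
        return word, word
    if src_high and not dst_high:
        # Example (1): the low destination port names the word, and the
        # destination IP document gets a "-1_" prefix.
        word = f"{dst_port}_{suffix}"
        return word, f"-1_{word}"
    raise NotImplementedError("port combination not covered by the examples")


words_1 = build_words(1066, 301, "TCP", 3, bytes_bin=12, packets_bin=5)
words_2 = build_words(1194, 1109, "UDP", 7, bytes_bin=12, packets_bin=1)
```

Other port combinations are deliberately left unimplemented, since the examples do not show how they are encoded.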
The created words are inserted into the documents associated with the source IP and the destination IP, and then the LDA algorithm is applied.
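Building the per-IP documents can be sketched as a word count per IP address, which is the usual bag-of-words input for a topic model. The function name `build_documents`, the tuple layout, and the IP addresses (taken from documentation ranges) are assumptions for this illustration; the LDA fitting step itself is not shown.

```python
from collections import Counter, defaultdict


def build_documents(word_events):
    """Group words into one bag-of-words document per IP address.

    `word_events` is an iterable of
    (source_ip, destination_ip, source_word, destination_word) tuples.
    """
    documents = defaultdict(Counter)
    for src_ip, dst_ip, src_word, dst_word in word_events:
        documents[src_ip][src_word] += 1
        documents[dst_ip][dst_word] += 1
    return documents


word_events = [
    ("192.0.2.10", "198.51.100.7", "301_TCP_3_12_5", "-1_301_TCP_3_12_5"),
    ("192.0.2.10", "203.0.113.5", "333333_UDP_7_12_1", "333333_UDP_7_12_1"),
]
documents = build_documents(word_events)
```

Here the document for `192.0.2.10` holds two different words, one from each record in which it was the source, while each destination IP gets a one-word document.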
SureLog detects anomalies using LDA and also supports many other ML models and correlation rules.
SureLog is also ready for the following ML libraries.
SureLog's threat detection and anomaly detection module utilizes many ML models and datasets.