Hello everyone, I am currently submitting a dissertation proposal for creating a sequence classifier based on HTM. The proof of concept will attempt to recognize stalling code using a training set from VirusShare.com, which is a huge repository of live viruses (not for the faint of heart). The gist of the research will be to use a parser that disassembles an executable file and feeds the resulting hex values representing the assembly code to the input stream. The main issue I’m struggling with is connecting this low-level data to more abstract concepts such as stalling code (code that uses various stalling tactics to evade detection). I’ll be posting more of my work here and am hoping for constructive feedback from the community. Below is the abstract. …
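To make the encoding step above more concrete, here is a minimal Python sketch of one possible way to turn a disassembled opcode stream into SDRs. It is an illustration under stated assumptions, not the proposal's parser and not the htm.core/NuPIC encoder API: the function names `encode_token`, `encode_instruction`, `encode_stream`, and `overlap`, the 2048-bit width, the 40 active bits, and feature labels such as `class:nop` are all hypothetical. Hashing each opcode gives every token a stable, sparse code, but different opcodes then share almost no bits. Semantic similarity only appears when instructions share hand-designed features, which is where a behavioral analysis of stalling code would come in.

```python
import hashlib

N_BITS = 2048   # SDR width (hypothetical choice)
W_ACTIVE = 40   # active bits per token, about 2% sparsity (hypothetical choice)


def encode_token(token: str, n: int = N_BITS, w: int = W_ACTIVE) -> frozenset[int]:
    """Deterministically map a token (e.g. an opcode hex string) to w active bit indices."""
    active: set[int] = set()
    counter = 0
    while len(active) < w:
        digest = hashlib.sha256(f"{token}:{counter}".encode()).digest()
        # Each 32-byte digest yields eight 4-byte integers; reduce each modulo n.
        for i in range(0, len(digest), 4):
            active.add(int.from_bytes(digest[i:i + 4], "big") % n)
            if len(active) == w:
                break
        counter += 1
    return frozenset(active)


def encode_instruction(features: list[str]) -> frozenset[int]:
    """Union of per-feature SDRs, so instructions that share a feature share active bits."""
    bits: set[int] = set()
    for feature in features:
        bits |= encode_token(feature)
    return frozenset(bits)


def encode_stream(hex_tokens: list[str]) -> list[frozenset[int]]:
    """Encode a disassembled opcode stream one token at a time."""
    return [encode_token(t) for t in hex_tokens]


def overlap(a: frozenset[int], b: frozenset[int]) -> int:
    """SDR similarity measured as the number of shared active bits."""
    return len(a & b)
```

For example, `encode_stream(["55", "89 e5", "90", "90"])` (push ebp; mov ebp, esp; nop; nop) produces identical SDRs for the two NOPs, and `overlap(encode_instruction(["opcode:90", "class:nop"]), encode_instruction(["opcode:66 90", "class:nop"]))` is at least 40 because both NOP forms carry the `class:nop` feature. Unioning feature SDRs raises density, so a fuller encoder would probably need to cap or subsample the active bits.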
A Dissertation Proposal Submitted to Nova Southeastern University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science
A Hierarchical Temporal Memory
Sequence Classifier Application for Classifying Streaming Data
Jeffrey V. Barnett
Identifying abnormal data within modern data streams is extremely difficult. Modern sequence classifiers such as Hidden Markov Models (HMM), Dynamic Time Warping (DTW), Rule-Based Classifiers (RBC), deep learning, and sequence alignment tools all fail to classify data streams reliably because of two factors: the overwhelming volume of data and a distinctive feature known as concept drift. Concept drift describes changes over time in the statistical properties of an object or learned structure, which eventually lead to a drastic drop in classification accuracy. Some of the more prevalent and effective malware uses concept drift to force classifiers to retrain frequently, degrading performance and increasing the consumption of system resources. One of the simplest ways to force concept drift is through stalling methods that execute trivial instructions, wait for user interaction, or invoke system sleep calls.

The neocortex is a predictive modeling system that receives information from sensors (e.g., retina, cochlea, touch) and builds a real-time model of the world that enables it to make predictions, detect anomalies, and generate behavior. Neuroscientists consider Sparse Distributed Representations (SDRs) to be the data structure of the brain. Hierarchical Temporal Memory (HTM) is a type of sequence memory that exhibits the predictive and anomaly detection properties of the neocortex. HTM algorithms train through exposure to a stream of sensory data and are designed for continuous online learning. Current HTM models use SDRs, which model the way neurons maintain a sparse pattern of activity that represents the world.
Previous research on HTMs has shown promise for malware detection; however, HTMs currently fall short of practical use in this area because malware behavior has not been analyzed thoroughly enough to encode sufficient semantic meaning in the SDRs for a classifier to process. This research proposes to develop SDRs of stalling code based on a thorough behavioral analysis and to develop a novel HTM sequence classifier that recognizes stalling code in streaming data and is resistant to concept drift.
Keywords: Hierarchical Temporal Memory, Intrusion Detection, Malware, Sparse Distributed Representation, Concept Drift
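The anomaly-detection property described in the abstract can be illustrated with the raw anomaly score commonly used with HTM, 1 - |A_t ∩ P_(t-1)| / |A_t|, where A_t is the set of currently active bits and P_(t-1) is the set predicted at the previous step. In HTM this is computed over active and predicted columns; the sketch below applies it directly to input SDRs for simplicity. Everything else is a hypothetical stand-in: `sdr` is a seeded random encoder, and `anomaly_trace` uses a first-order transition table in place of HTM temporal memory, which learns high-order sequences. It only shows how a repetitive stalling pattern, such as a run of NOPs, stops looking anomalous once it has been seen.

```python
import random
from collections import defaultdict

N_BITS, W_ACTIVE = 2048, 40  # hypothetical SDR width and number of active bits


def sdr(token: str) -> frozenset[int]:
    """Hypothetical encoder: a deterministic pseudo-random SDR per token."""
    return frozenset(random.Random(token).sample(range(N_BITS), W_ACTIVE))


def raw_anomaly_score(active: frozenset[int], predicted: frozenset[int]) -> float:
    """Fraction of currently active bits that were not predicted at the previous step."""
    if not active:
        return 0.0
    return 1.0 - len(active & predicted) / len(active)


def anomaly_trace(tokens: list[str]) -> list[float]:
    """Score each token against a first-order prediction (stand-in for HTM temporal memory)."""
    transitions = defaultdict(set)  # token -> union of bits observed at the next step
    predicted: frozenset[int] = frozenset()
    prev = None
    scores = []
    for tok in tokens:
        active = sdr(tok)
        scores.append(raw_anomaly_score(active, predicted))
        if prev is not None:
            transitions[prev] |= active  # learn the transition prev -> tok
        predicted = frozenset(transitions[tok])  # prediction for the next step
        prev = tok
    return scores
```

Running `anomaly_trace(["55", "89 e5", "90", "90", "90", "90"])` returns `[1.0, 1.0, 1.0, 1.0, 0.0, 0.0]`: every token is surprising until the NOP-to-NOP transition has been observed once, after which the loop is fully predicted. This is one way to see why trivial stalling instructions, on their own, quickly become "normal" to a predictive model.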