Artificial intelligence (AI) has a growing role in cybersecurity by virtue of its ability to automate vital tasks such as anomaly detection for incident identification, enhance numerous types of analytics for threat detection and management, and reduce dependence on laboratory experimentation. A federal agency responsible for reducing risk to cyber and physical infrastructure sought Guidehouse’s expertise in applying AI to classify vulnerabilities, a pressing yet challenging undertaking. While many hierarchies exist for grouping security flaws, their full implementation requires significant human expertise, which undermines scalability.
To tackle this problem, Guidehouse used natural language processing (NLP), a subset of AI that enables the modeling of textual features associated with vulnerabilities. Overall, our approach seeks to provide a new lens on vulnerability analysis, one that generates previously inaccessible insights. A reproducible classification framework that remains practical in the face of large data volumes lets analysts explore new insights that match the needs of a modern, big data-driven analysis ecosystem and helps realize its full potential.
Frameworks for classification research — Guidehouse first researched frameworks for classification, settling on the Open Web Application Security Project (OWASP) model, which groups security flaws into 10 categories based on researcher expertise.
Assembled common vulnerabilities and exposures — Guidehouse then assembled a common vulnerabilities and exposures (CVE) dataset containing over 120,000 vulnerabilities from the past 20 years and extrapolated pre-existing OWASP labels to 53% of the CVEs.
NLP experiments conducted — For unlabeled vulnerabilities, which required AI modeling, initial NLP experiments grouped the uncategorized vulnerabilities using the Top2Vec¹ topic modeling algorithm and classified their descriptions using keywords extracted with YAKE,² a keyword-extraction algorithm.
Training neural network models — Subsequent research pivoted to training neural network models to classify vectorized text data, which produced positive results for the largest classes.
Applying OWASP label mappings — The experiments overall indicated that applying OWASP label mappings when available and implementing a trained neural network for unlabeled vulnerabilities was effective for the majority of CVEs.
Description data leveraged — Furthermore, the experiments leveraged description data covering approximately 70% of all known cyber vulnerabilities, suggesting the approach is transferable to other organizations and their unique security challenges.
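The steps above can be sketched in a few lines of Python: apply an existing label mapping when one is available, and fall back to a model trained on vectorized descriptions otherwise. This is a minimal illustration under stated assumptions, not the project's implementation. The source does not say how existing OWASP labels were extrapolated, so the CWE-based lookup (using OWASP Top 10 2021 category names) is an assumption, as are the training sentences, the function names, and the use of a single-layer softmax network over bag-of-words counts in place of the neural network models actually trained.

```python
import math
import re
from collections import Counter

# Illustrative CWE -> OWASP Top 10 lookup (2021 category names). Assumption:
# the source does not say how labels were extrapolated or which edition was used.
CWE_TO_OWASP = {
    "CWE-22": "A01 Broken Access Control",
    "CWE-327": "A02 Cryptographic Failures",
    "CWE-79": "A03 Injection",
    "CWE-89": "A03 Injection",
}

# Hypothetical labeled descriptions standing in for CVEs that already have labels.
TRAINING = [
    ("SQL injection in the login form lets remote attackers run arbitrary SQL commands", "A03 Injection"),
    ("Cross-site scripting lets remote attackers inject arbitrary web script", "A03 Injection"),
    ("Directory traversal lets remote attackers read arbitrary files", "A01 Broken Access Control"),
    ("Missing authorization check lets users open restricted files", "A01 Broken Access Control"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text, vocab):
    """Turn a description into a bag-of-words count vector over vocab."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocab]


def scores_for(weights, x):
    """Linear score per class (bias terms omitted to keep the sketch short)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(scores):
    """Convert raw scores into probabilities, shifted for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(examples, epochs=50, lr=0.1):
    """Fit a single-layer softmax network with stochastic gradient descent."""
    vocab = sorted({tok for text, _ in examples for tok in tokenize(text)})
    labels = sorted({label for _, label in examples})
    weights = [[0.0] * len(vocab) for _ in labels]
    for _ in range(epochs):
        for text, label in examples:
            x = vectorize(text, vocab)
            probs = softmax(scores_for(weights, x))
            for k, name in enumerate(labels):
                # Cross-entropy gradient with respect to the class score.
                grad = probs[k] - (1.0 if name == label else 0.0)
                weights[k] = [w - lr * grad * xi for w, xi in zip(weights[k], x)]
    return vocab, labels, weights


def predict(model, text):
    """Return the label with the highest score for a description."""
    vocab, labels, weights = model
    scores = scores_for(weights, vectorize(text, vocab))
    return labels[scores.index(max(scores))]


def classify(cve, model):
    """Prefer an existing mapping; fall back to the trained model otherwise."""
    for cwe in cve.get("cwes", []):
        if cwe in CWE_TO_OWASP:
            return CWE_TO_OWASP[cwe], "mapping"
    return predict(model, cve["description"]), "model"


MODEL = train(TRAINING)
print(classify({"cwes": ["CWE-89"], "description": "SQL injection in search"}, MODEL))
print(classify({"cwes": [], "description": "Directory traversal via crafted filename"}, MODEL))
```

Returning the source of each label ("mapping" or "model") alongside the category keeps the two paths distinguishable, which makes it easier to audit model predictions separately from labels inherited through the lookup.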