Case Study

Artificial Intelligence Improves Cyber Vulnerability Classification

Guidehouse Solution Distills Vulnerability Insights at Scale

Challenge 

Artificial intelligence (AI) plays a growing role in cybersecurity because it can automate vital tasks such as anomaly detection for incident identification, enhance many types of analytics for threat detection and management, and reduce dependence on laboratory experimentation. A federal agency responsible for reducing risk to cyber and physical infrastructure sought Guidehouse’s expertise in applying AI to vulnerability classification, a pressing yet challenging undertaking. Many hierarchies exist for grouping security flaws, but implementing them fully requires significant human expertise, which undermines scalability.

 

To tackle this problem, Guidehouse used a subset of AI called natural language processing (NLP), which enables the modeling of textual features associated with vulnerabilities. Overall, our approach provides a new lens on vulnerability analysis, generating previously inaccessible insights. A reproducible classification framework that stays practical at large data volumes lets analysts explore insights that match the needs of a modern, big-data-driven analysis ecosystem.

Solution

Frameworks for classification research: Guidehouse first researched frameworks for classification, settling on the Open Worldwide Application Security Project (OWASP) model, which groups security flaws into 10 categories based on researcher expertise.

Assembled common vulnerabilities and exposures: Guidehouse then assembled a Common Vulnerabilities and Exposures (CVE) dataset containing more than 120,000 vulnerabilities from the past 20 years and extrapolated pre-existing OWASP labels to 53% of the CVEs.
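One plausible way to extrapolate existing labels is a lookup from each CVE's weakness identifier (CWE) to an OWASP category, sketched below. The case study does not say how the labels were extrapolated, so the CWE-based approach, the mapping excerpt, the placeholder CVE IDs, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a small illustrative excerpt of a CWE -> OWASP Top 10 (2021)
# mapping. Check it against the published mapping before relying on it.
CWE_TO_OWASP: Dict[str, str] = {
    "CWE-22": "A01:2021 Broken Access Control",
    "CWE-79": "A03:2021 Injection",
    "CWE-89": "A03:2021 Injection",
    "CWE-287": "A07:2021 Identification and Authentication Failures",
    "CWE-918": "A10:2021 Server-Side Request Forgery",
}


def owasp_label(cwe_ids: List[str]) -> Optional[str]:
    """Return the OWASP category of the first mapped CWE, or None."""
    for cwe in cwe_ids:
        if cwe in CWE_TO_OWASP:
            return CWE_TO_OWASP[cwe]
    return None


def extrapolate_labels(
    records: Dict[str, List[str]],
) -> Tuple[Dict[str, Optional[str]], float]:
    """Label every CVE that has a mapped CWE and report the labeled share."""
    labels = {cve_id: owasp_label(cwes) for cve_id, cwes in records.items()}
    labeled = sum(1 for label in labels.values() if label is not None)
    coverage = labeled / len(records) if records else 0.0
    return labels, coverage


# Hypothetical records: placeholder CVE IDs with their CWE identifiers.
records = {
    "CVE-0000-0001": ["CWE-89"],
    "CVE-0000-0002": ["CWE-22"],
    "CVE-0000-0003": ["NVD-CWE-noinfo"],
    "CVE-0000-0004": [],
}
labels, coverage = extrapolate_labels(records)
print(labels)
print(f"labeled share: {coverage:.0%}")
```

CVEs left with `None` are the ones that would go on to the modeling steps.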

NLP experiments conducted: For unlabeled vulnerabilities requiring AI modeling, initial NLP experiments involved grouping uncategorized vulnerabilities using the Top2Vec [1] topic modeling algorithm and classifying descriptions using keywords extracted with YAKE [2], a keyword-extraction algorithm.
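The keyword-based classification step can be pictured with a minimal sketch: score each description by how many category keywords it contains and pick the best-scoring category. This is not the project's code. The keyword sets below are hand-written stand-ins for what an extractor such as YAKE might produce, and the category names and function names are assumptions.

```python
import re
from typing import Dict, Optional, Set

# Assumption: hand-written keyword sets standing in for keywords that a
# keyword extractor would produce from already-labeled descriptions.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "Injection": {"sql injection", "cross-site scripting", "command injection"},
    "Broken Access Control": {"directory traversal", "path traversal", "bypass access"},
    "Cryptographic Failures": {"weak encryption", "cleartext", "hard-coded key"},
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so phrase matching is stable."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def classify_by_keywords(description: str) -> Optional[str]:
    """Return the category with the most keyword hits, or None if no hits."""
    text = normalize(description)
    scores = {
        category: sum(1 for keyword in keywords if keyword in text)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify_by_keywords("SQL injection in the login form allows remote attackers to run queries."))
print(classify_by_keywords("Buffer overflow in the parser causes a crash."))
```

Descriptions that match no keyword return `None`, which illustrates why a keyword approach alone may leave many vulnerabilities uncategorized.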

Training neural network models: Subsequent research pivoted to training neural network models for classification of vectorized text data, which produced positive results for the largest classes.
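To make "classification of vectorized text data" concrete, here is a deliberately small sketch: descriptions become bag-of-words count vectors, and a single-layer softmax network is trained on them with gradient descent. The case study does not describe the architecture or the vectorization that was used, so this is a stand-in, and the toy training descriptions, hyperparameters, and function names are assumptions.

```python
import math
import re
from typing import Dict, List, Tuple

Model = Tuple[Dict[str, int], List[str], List[List[float]], List[float]]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def vectorize(text: str, vocab: Dict[str, int]) -> List[float]:
    """Bag-of-words counts; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec


def class_scores(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(texts: List[str], labels: List[str], epochs: int = 200, lr: float = 0.1) -> Model:
    """Train a single-layer softmax classifier with per-example gradient steps."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    classes = sorted(set(labels))
    vectors = [vectorize(text, vocab) for text in texts]
    weights = [[0.0] * len(vocab) for _ in classes]
    bias = [0.0] * len(classes)
    for _ in range(epochs):
        for x, label in zip(vectors, labels):
            probs = softmax(class_scores(x, weights, bias))
            for k, cls in enumerate(classes):
                # Cross-entropy gradient: predicted probability minus target.
                grad = probs[k] - (1.0 if cls == label else 0.0)
                bias[k] -= lr * grad
                for j, value in enumerate(x):
                    if value:
                        weights[k][j] -= lr * grad * value
    return vocab, classes, weights, bias


def predict(text: str, model: Model) -> str:
    vocab, classes, weights, bias = model
    scores = class_scores(vectorize(text, vocab), weights, bias)
    return classes[scores.index(max(scores))]


# Toy, invented training data; a real run would use labeled CVE descriptions.
train_texts = [
    "SQL injection in login form allows remote attackers to execute arbitrary SQL commands",
    "Cross-site scripting in search page allows injection of arbitrary web script",
    "Missing authorization check allows users to read files of other accounts",
    "Directory traversal allows attackers to read arbitrary files outside the web root",
]
train_labels = ["Injection", "Injection", "Broken Access Control", "Broken Access Control"]
model = train(train_texts, train_labels)
print(predict("SQL injection in admin panel", model))
print(predict("read files via directory traversal", model))
```

A production model would likely use richer text vectors and hidden layers; the sketch only shows the shape of the task, which is text in and one category out.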

Applying OWASP label mappings: Overall, the experiments indicated that applying OWASP label mappings where available and using a trained neural network for unlabeled vulnerabilities was effective for the majority of CVEs.
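The combined strategy amounts to a simple routing rule: use a mapped label when one exists, otherwise ask the trained model. The sketch below shows that rule only. The record layout, the one-entry mapping, the placeholder CVE IDs, and the stub standing in for a trained classifier are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def categorize(
    cve: Dict[str, object],
    mapping: Dict[str, str],
    model: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (category, source), where source is 'mapping' or 'model'."""
    for cwe in cve.get("cwes", []):
        if cwe in mapping:
            return mapping[cwe], "mapping"
    # No pre-existing label applies, so fall back to the classifier.
    return model(str(cve["description"])), "model"


# Hypothetical inputs for illustration only.
mapping = {"CWE-89": "Injection"}


def stub_model(description: str) -> str:
    """Stand-in for a trained classifier; always returns one category."""
    return "Broken Access Control"


cves: List[Dict[str, object]] = [
    {"id": "CVE-0000-0001", "cwes": ["CWE-89"], "description": "SQL injection in login form"},
    {"id": "CVE-0000-0002", "cwes": [], "description": "Users can read files of other accounts"},
]
results = {str(cve["id"]): categorize(cve, mapping, stub_model) for cve in cves}
print(results)
```

Recording the source of each label keeps mapped labels distinguishable from model predictions when results are reviewed later.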

Description data leveraged: The experiments also drew on description data covering approximately 70% of all known cyber vulnerabilities, suggesting the approach is transferable to other organizations and their unique security challenges.

 

Impact

Vulnerability data keeps growing as a result of powerful, ongoing scanning, and its volume and subjective nature can make it impractical for an organization to use. Scalable vulnerability categorization allows organizations to unlock the full potential of vulnerability data by distilling unwieldy volumes into discrete categories for actionable insights. The enriched, structured view of an entity’s security posture that Guidehouse’s solution provides has powerful implications for organizations’ abilities to prioritize, triage, and manage threats. Analyst teams can gain improved visibility into patterns and trends in detected or exploited security flaws and produce the insights needed to stay ahead of adversaries. Risk management can identify the most targeted classes of vulnerabilities and allocate funds more effectively to mitigate costly incidents. Teams ranging from human capital to incident response can tailor their solutions and programs to focus on the most pressing vulnerabilities and maximize value and impact for their organization.

Sources

[1] Angelov, Dimo. 2023. “Top2Vec.” GitHub. July 28, 2023. https://github.com/ddangelov/Top2Vec.
[2] “Yet Another Keyword Extractor (Yake).” 2022. GitHub. August 19, 2022. https://github.com/LIAAD/yake.

 

