Artificial Intelligence Improves Cyber Vulnerability Classification

Guidehouse Solution Distills Vulnerability Insights at Scale



Artificial intelligence (AI) plays a growing role in cybersecurity because it can automate vital tasks such as anomaly detection for incident identification, enhance many types of analytics for threat detection and management, and reduce dependence on laboratory experimentation. A federal agency responsible for reducing risk to cyber and physical infrastructure sought Guidehouse’s expertise in applying AI to vulnerability classification, a pressing yet challenging undertaking. Many hierarchies exist for grouping security flaws, but implementing them fully requires significant human expertise, which undermines scalability.


To tackle this problem, Guidehouse used natural language processing (NLP), a subset of AI that enables the modeling of textual features associated with vulnerabilities. Overall, our approach seeks to provide a new lens on vulnerability analysis, one that generates previously inaccessible insights. A reproducible classification framework that remains practical at large data volumes lets analysts explore new insights that match the needs of a modern, big data-driven analysis ecosystem and realize its full potential.



  • Guidehouse first researched frameworks for classification, settling on the Open Web Application Security Project (OWASP) Top 10 model, which groups security flaws into 10 categories based on researcher expertise
  • Guidehouse then assembled a Common Vulnerabilities and Exposures (CVE) dataset containing over 120,000 vulnerabilities from the past 20 years and extrapolated pre-existing OWASP labels to 53% of the CVEs
  • For the unlabeled vulnerabilities, which required AI modeling, initial NLP experiments grouped uncategorized vulnerabilities with the Top2Vec topic modeling algorithm and classified descriptions using keywords extracted with YAKE, a keyword-extraction algorithm
  • Subsequent research pivoted to training neural network models for classification of vectorized text data, which produced positive results for the largest classes
  • The experiments overall indicated that applying OWASP label mappings when available and implementing a trained neural network for unlabeled vulnerabilities was effective for the majority of CVEs
  • Furthermore, experiments leveraged description data accounting for approximately 70% of all known cyber vulnerabilities, suggesting the approach is transferable to other organizations and their unique security challenges
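
The two-path approach described above (reuse an existing label mapping when one is available, otherwise fall back to a model trained on description text) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the CWE-keyed mapping table, the category names, the sample descriptions, and the function names are hypothetical, and a simple bag-of-words nearest-centroid classifier stands in for the neural network models, which the case study does not specify in detail.

```python
import math
import re
from collections import Counter

# Hypothetical lookup from weakness IDs to categories (illustrative, not an official mapping).
CWE_TO_CATEGORY = {
    "CWE-89": "Injection",
    "CWE-287": "Broken Authentication",
}

# Tiny hypothetical training set of (description, category) pairs.
LABELED = [
    ("SQL injection in login form allows remote attackers to execute arbitrary SQL commands", "Injection"),
    ("Cross-site scripting allows attackers to inject arbitrary web script", "Injection"),
    ("Improper authentication allows attackers to bypass login and hijack user sessions", "Broken Authentication"),
    ("Weak password recovery allows attackers to take over user accounts", "Broken Authentication"),
]


def vectorize(text):
    """Turn a description into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_centroids(labeled):
    """Sum the word counts of all training descriptions in each category."""
    centroids = {}
    for text, label in labeled:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def classify(record, centroids):
    """Return (category, source): use the mapping if present, else the text model."""
    mapped = CWE_TO_CATEGORY.get(record.get("cwe"))
    if mapped is not None:
        return mapped, "mapping"
    vec = vectorize(record["description"])
    best = max(centroids, key=lambda label: cosine(vec, centroids[label]))
    return best, "model"


centroids = build_centroids(LABELED)
records = [
    {"id": "EXAMPLE-1", "cwe": "CWE-89", "description": "Crafted query parameter alters a database statement"},
    {"id": "EXAMPLE-2", "cwe": None, "description": "Session fixation lets attackers hijack authenticated user sessions"},
]
for record in records:
    print(record["id"], classify(record, centroids))
```

Recording the source of each label ("mapping" or "model") alongside the category keeps the two paths distinguishable, so downstream analysis can treat model-assigned labels with more caution than labels carried over from an existing mapping.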


Vulnerability data, which grows continually as a result of powerful ongoing scanning, risks becoming impractical for an organization because of its subjective nature and volume. Scalable vulnerability categorization allows organizations to unlock the full potential of this data by distilling unwieldy volumes into discrete categories for actionable insights. The enriched, structured view of an entity’s security posture that Guidehouse’s solution offers has powerful implications for organizations’ ability to prioritize, triage, and manage threats. Analyst teams can gain improved visibility into patterns and trends for detected or exploited security flaws and produce the insights needed to stay ahead of adversaries. Risk management can identify the most targeted classes of vulnerabilities and allocate funds more effectively to mitigate costly incidents. Teams ranging from human capital to incident response can tailor their solutions and programs to focus on the most pressing vulnerabilities and maximize value and impact for their organization.


Learn more about Guidehouse advanced analytics solutions



Case Study Contributor: Marika Van Laan



