Learning to Classify Vulnerabilities and Predict Exploits

The security demands on modern system administration are enormous and getting worse. Chief among these demands is the need to monitor the continual disclosure of software vulnerabilities that could compromise the systems administrators manage. Such vulnerabilities include buffer overflow errors, improperly validated inputs, and other unanticipated attack modalities. In 2008, over 7,400 new vulnerabilities were disclosed, well over 100 per week. While no enterprise is affected by all of these disclosures, administrators commonly face many outstanding vulnerabilities across the software systems they manage. A key question for system administrators is which vulnerabilities to prioritize. Using publicly available databases that document past vulnerabilities, we show how to train classifiers that predict whether, and how soon, a vulnerability is likely to be exploited. As input, our classifiers operate on high-dimensional feature vectors that we extract from the text fields, time stamps, cross-references, and other entries in existing vulnerability disclosure reports. Compared to current industry-standard heuristics based on expert knowledge and static formulas, our classifiers predict much more accurately whether, and how soon, individual vulnerabilities are likely to be exploited.
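The pipeline described above (extract sparse features from a disclosure report's text, cross-references, and time stamps, then train a classifier that predicts exploitation) can be sketched roughly as follows. Everything here is illustrative: the record fields (`description`, `references`, `year`, `days_to_exploit`), the 30-day horizon, and the toy reports are hypothetical, and a plain perceptron is swapped in as the simplest linear classifier, not as the classifier or data the authors actually use. Framing "how soon" as a separate yes/no question per time horizon is likewise an assumption of this sketch.

```python
import re
from collections import defaultdict

def extract_features(report):
    """Map a hypothetical report dict to a sparse binary feature dict."""
    feats = {"bias": 1.0}
    # Bag of words from the free-text description field.
    for tok in re.findall(r"[a-z0-9]+", report["description"].lower()):
        feats["word=" + tok] = 1.0
    # One indicator per cross-referenced source (hypothetical source names).
    for src in report.get("references", []):
        feats["ref=" + src] = 1.0
    # Coarse time-stamp feature: the disclosure year.
    feats["year=" + str(report["year"])] = 1.0
    return feats

def make_label(days_to_exploit, horizon_days):
    """1 if an exploit appeared within horizon_days of disclosure, else 0.
    days_to_exploit is None when no exploit is known (hypothetical rule)."""
    return 1 if days_to_exploit is not None and days_to_exploit <= horizon_days else 0

def score(weights, feats):
    """Linear score; higher means 'more likely exploited soon'."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

def train_perceptron(examples, labels, max_epochs=20):
    """Classic perceptron: add or subtract features on each mistake."""
    w = defaultdict(float)
    for _ in range(max_epochs):
        mistakes = 0
        for feats, y in zip(examples, labels):
            target = 1 if y == 1 else -1
            if target * score(w, feats) <= 0:
                for k, v in feats.items():
                    w[k] += target * v
                mistakes += 1
        if mistakes == 0:  # every training report is classified correctly
            break
    return dict(w)

# Toy, made-up training reports (not real vulnerability data).
HORIZON_DAYS = 30  # hypothetical "how soon" cutoff
train_reports = [
    {"description": "Remote buffer overflow in HTTP server",
     "references": ["CVE", "BID"], "year": 2008, "days_to_exploit": 3},
    {"description": "Stack overflow allows remote code execution",
     "references": ["CVE"], "year": 2008, "days_to_exploit": 10},
    {"description": "Local information disclosure in log file",
     "references": ["CVE"], "year": 2008, "days_to_exploit": None},
    {"description": "Local denial of service via malformed config",
     "references": [], "year": 2007, "days_to_exploit": 200},
]
labels = [make_label(r["days_to_exploit"], HORIZON_DAYS) for r in train_reports]
weights = train_perceptron([extract_features(r) for r in train_reports], labels)

# Rank newly disclosed (equally made-up) reports by score to prioritize them.
new_reports = [
    {"id": "new-1", "description": "Local disclosure of passwords",
     "references": [], "year": 2007},
    {"id": "new-2", "description": "Remote overflow in FTP server",
     "references": ["CVE"], "year": 2008},
]
ranked = sorted(new_reports, key=lambda r: score(weights, extract_features(r)), reverse=True)
print([r["id"] for r in ranked])  # → ['new-2', 'new-1']
```

Ranking by the raw linear score, rather than only thresholding it, reflects the prioritization question above: an administrator can work down the list. A real system would need far more data, held-out evaluation, and likely a margin-based or probabilistic classifier in place of this perceptron.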

Mehran Bozorgi, Lawrence K. Saul, Stefan Savage, and Geoffrey M. Voelker,
Beyond Heuristics: Learning to Classify Vulnerabilities and Predict Exploits
To appear in Proceedings of the Sixteenth ACM Conference on Knowledge Discovery and Data Mining (KDD-10), Washington, D.C., July 2010.

Our experiments are based on the data available in the following locations:

OSVDB: The Open Source Vulnerability Database.
CVE: Common Vulnerabilities and Exposures.

The data set used in our experiments will be available on this page soon. Meanwhile, please send us an email at mehranbozorgi@gmail.com if you have any questions.

UCSD Computer Science and Engineering
