Using Machine Learning to Find the Next Cyber-Threat


There is a dark corner of the Internet where hackers sell their software. The process unfolds in three steps.

  1. A new security flaw is discovered, perhaps in Windows, Word or the Flash plug-in.
  2. A hacker develops a piece of software that can exploit that flaw and puts it up for sale on an underground forum.
  3. A criminal buys that software and adapts it to his own particular goal.

So one way to protect against malware is to find the code being sold at Step 2. That is what a team of researchers headed by Eric Nunes has done. They developed a system of crawlers that scan these Internet marketplaces and accumulate the data they find. The data is then fed to a classifier that identifies which products and discussions pertain to buying and selling malicious software. Trained with semi-supervised machine learning techniques, including Label Propagation and Co-training, the classifier finds 92% of the relevant products and 80% of the relevant discussions, which translates to 305 cyber threat warnings per week.
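
To give a feel for how label propagation can stretch a handful of hand-labeled listings over many unlabeled ones, here is a minimal sketch in plain Python. It is not the researchers' implementation: the listings are invented, and the bag-of-words features, cosine similarity and the function names (bag_of_words, cosine, label_propagation) are assumptions made purely for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a listing into a word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_propagation(texts, labels, iterations=50):
    """Spread labels (1 = malicious, 0 = benign, None = unknown) to similar texts.

    Returns one score per text in [0, 1]; higher means more likely malicious.
    """
    vectors = [bag_of_words(t) for t in texts]
    n = len(texts)
    # Edge weight between two listings = similarity of their wording.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Unlabeled listings start out undecided at 0.5.
    scores = [0.5 if label is None else float(label) for label in labels]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(float(labels[i]))  # known labels stay fixed
                continue
            total = sum(weights[i])
            if total == 0:
                updated.append(scores[i])  # no similar neighbors, keep score
                continue
            # New score = similarity-weighted average of the neighbors' scores.
            weighted = sum(w * s for w, s in zip(weights[i], scores))
            updated.append(weighted / total)
        scores = updated
    return scores


# Invented toy listings: four labeled by hand, two left unlabeled.
texts = [
    "selling zero day exploit for windows",
    "new remote exploit kit for flash plugin",
    "looking for a used laptop charger",
    "selling vintage laptop stickers",
    "windows exploit kit for sale",
    "cheap laptop charger for sale",
]
labels = [1, 1, 0, 0, None, None]

scores = label_propagation(texts, labels)
for text, score in zip(texts, scores):
    print(f"{score:.2f}  {text}")
```

On this toy data the unlabeled exploit-kit listing ends up with a score above 0.5 and the laptop-charger listing below it, even though neither was labeled by hand. A real system would need far richer features and many more examples than this sketch uses.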

You can read Nunes et al.’s paper or MIT Technology Review’s article summarizing the paper.