A researcher from OpenDNS Security Labs has developed a new way to automatically detect and block sites used to distribute malware almost instantly, without having to scan them. The approach, developed by Jeremiah O’Connor, uses natural language processing and other analytics to detect malicious domains before they can attack by spotting hostnames designed as camouflage. Called NLPRank, it flags DNS queries for sites whose names resemble those of legitimate sites but whose IP addresses fall outside the expected address blocks, or whose other related data suggests they are suspect.
The practice of using look-alike domain names to trick victims into visiting websites or approving downloads is a well-established technique in computer crime. But recent attacks engineered via ‘phishing’ links in email and social media have gone beyond the well-worn approach of ‘typo-squatting’ by using domain names that appear close to those of trusted sites and are registered just in time for the attacks, so that they fly under the radar of reputation-scoring security tools and are harder to blacklist. Fake domain names such as update-java.net and adobe-update.net, for example, were used in the recently discovered “Carbanak” attacks against banks, which gave criminals access to the institutions’ networks starting in January 2013 and let them steal $1 billion over the next two years.
Many security services can filter out malicious sites using techniques such as reputation analysis, which checks a centralized database to see whether a site name has been associated with malware attacks. But because attackers can use scripted systems to quickly register new domains that seem relatively legitimate to the average user, they can often circumvent reputation checks, especially when they use specially crafted domain names in highly targeted attacks.
O’Connor’s approach, which OpenDNS is currently testing on live DNS query traffic, sidesteps the reputation problem by analyzing the domain name itself for signs of deception. It works much like natural language processing applied to any other stream of text. Using patterns spotted in malicious DNS traffic, OpenDNS security researchers train the NLPRank system to identify domain names that look like legitimate sites but have attributes that flag them as suspicious.
“Essentially what we are defining is ‘malicious language’ in the lexical nature of DNS traffic,” O’Connor wrote in a blog post published this morning. The “language” consists of domain names that combine text related to a company or technology (such as “java,” “gmail,” “facebook,” or “adobe”) with a collection of “certain dictionary words,” O’Connor explained (“install,” “update,” “security,” or “payment,” for example).
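To make the idea concrete, here is a minimal sketch of that kind of lexical check. It is not OpenDNS’s implementation: the word lists are simply the examples quoted above, and the function name `lexical_match` and the rule of splitting names on dots and hyphens are assumptions made for illustration.

```python
import re

# Illustrative word lists taken from the examples quoted in the article;
# a trained model would draw on far more data than this.
BRAND_TERMS = {"java", "gmail", "facebook", "adobe"}
ACTION_WORDS = {"install", "update", "security", "payment"}


def lexical_match(domain: str) -> bool:
    """Return True if the name pairs a brand term with a dictionary word.

    Hypothetical helper: splitting on dots and hyphens is an assumed
    tokenization, not a rule described by OpenDNS.
    """
    # Drop the top-level domain, then split what remains into tokens.
    name = domain.lower().rsplit(".", 1)[0]
    tokens = set(re.split(r"[.\-]", name))
    return bool(tokens & BRAND_TERMS) and bool(tokens & ACTION_WORDS)
```

Under these assumptions, `lexical_match("update-java.net")` and `lexical_match("adobe-update.net")` both match, while a bare brand name such as `facebook.com` does not, because it contains no accompanying dictionary word.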
The system then performs “sentiment analysis” on frequently queried domain names among the tens of billions of DNS queries that pass through OpenDNS daily, looking for patterns like these and applying a set of rank scores to the names of domains that match the model. “If it’s a domain related to Facebook and not associated with Facebook’s IP address space, that would be a negative tick,” said Andrew Hay, director of security research at OpenDNS, in an interview with Ars. “Or if it was registered a day ago and administered by someone with a Russian disposable email address, that would be negatives.” The system can also perform HTML analysis of the websites associated with domain names to check whether there is a match. “We can look at scam websites and compare them to real legitimate pages, see how much they differ,” Hay explained.
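The “negative tick” scoring Hay describes might be sketched roughly as follows. Everything specific here is an assumption: the `KNOWN_BLOCKS` table uses a reserved documentation address range rather than any company’s real IP space, the 30-day age threshold is arbitrary, and `rank_score` is a hypothetical name. NLPRank’s actual signals and weights are not public in this article.

```python
import ipaddress
from datetime import date

# Hypothetical stand-in for a brand's known IP space. 192.0.2.0/24 is a
# reserved documentation range, not a real allocation.
KNOWN_BLOCKS = {"facebook": [ipaddress.ip_network("192.0.2.0/24")]}


def rank_score(brand, resolved_ip, registered_on, today, min_age_days=30):
    """Return a score where each suspicious attribute subtracts one point."""
    score = 0
    addr = ipaddress.ip_address(resolved_ip)
    # Negative tick: the name references a brand but resolves outside
    # that brand's expected address blocks.
    if not any(addr in block for block in KNOWN_BLOCKS.get(brand, [])):
        score -= 1
    # Negative tick: the domain was registered very recently
    # (the threshold is an assumed value).
    if (today - registered_on).days < min_age_days:
        score -= 1
    return score
```

In this sketch, a Facebook-themed domain registered yesterday and resolving to an unrelated address collects two negative ticks, while an old domain inside the expected block collects none.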
Hay said OpenDNS is currently fine-tuning the system to avoid false positives, but so far NLPRank has held up well in testing. “We used it to detect malicious phishing campaigns,” he said. “And we’ve been able to use it to validate data in reports from other security companies, giving us further confirmation that it’s working.”