
Ideas for New Features and Research

awaldow edited this page Jan 2, 2013 · 2 revisions

This page is for brainstorming on new features and areas of research for Nova/Honeyd. This page is NOT for lists of UI improvements, technology changes (as much fun as refactoring the UI with knockout.js would be), or something already in the ticket tracker; we spend a lot of time looking at what we already have, evaluating it, and making minor improvements. This page is for novel features that could improve the functionality of Nova/Honeyd as an anti-reconnaissance tool. Basically: think outside the box and about topics that could fill an entire monthly build cycle.

Fool nmap's service scanner

The nmap service database consists of probes and regular expressions that are matched against the replies. A simple example would be a telnet probe that matches the reply against something like "Welcome to Microsoft Telnet version \d*.?". Reverse regular expression generation is possible (generating an arbitrary string that matches a given regular expression; see http://code.google.com/p/xeger/) and could lead to an interesting tool that fools nmap by sending replies that make it report whatever service you want (though the resulting banners and replies may look like gibberish to a trained user, e.g. a telnet service reporting a version that doesn't actually exist). This would be a novel research area that fits well within the scope of anti-reconnaissance.
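As a rough illustration of reverse regular expression generation, here is a minimal Python sketch. It is not xeger and not anything in Nova/Honeyd: the `tokenize` and `generate` helpers are hypothetical, and they only understand a tiny subset of regex syntax (literals, `.`, `\d`, `\w`, `\s`, escaped punctuation, and the `*`, `+`, `?` quantifiers). Real nmap match lines use far richer patterns, so a usable tool would need a full regex parser.

```python
import random
import re
import string

# Characters to draw from for each supported escape class (illustrative subset).
CLASSES = {
    "d": string.digits,
    "w": string.ascii_letters + string.digits + "_",
    "s": " ",
}
UNSUPPORTED = "()[]{}|^$*+?"


def tokenize(pattern):
    # Split the pattern into (charset, quantifier) pairs.
    # Groups, alternation, character sets and anchors are rejected.
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in UNSUPPORTED:
            raise ValueError("unsupported regex syntax: %r" % ch)
        if ch == "\\":
            i += 1
            if i >= len(pattern):
                raise ValueError("dangling backslash")
            esc = pattern[i]
            if esc.isalnum() and esc not in CLASSES:
                raise ValueError("unsupported escape: \\%s" % esc)
            charset = CLASSES.get(esc, esc)  # escaped punctuation is a literal
        elif ch == ".":
            charset = string.ascii_letters + string.digits
        else:
            charset = ch
        i += 1
        quant = ""
        if i < len(pattern) and pattern[i] in "*+?":
            quant = pattern[i]
            i += 1
        tokens.append((charset, quant))
    return tokens


def generate(pattern, rng, max_repeat=3):
    # Build one random string that the pattern should match.
    out = []
    for charset, quant in tokenize(pattern):
        if quant == "*":
            count = rng.randint(0, max_repeat)
        elif quant == "+":
            count = rng.randint(1, max_repeat)
        elif quant == "?":
            count = rng.randint(0, 1)
        else:
            count = 1
        out.append("".join(rng.choice(charset) for _ in range(count)))
    return "".join(out)


if __name__ == "__main__":
    banner_pattern = r"Welcome to Microsoft Telnet version \d*.?"
    banner = generate(banner_pattern, random.Random())
    # Check the candidate against the same pattern before sending it.
    assert re.fullmatch(banner_pattern, banner)
    print(banner)
```

Validating each generated banner with `re.fullmatch` before using it is a cheap safeguard against bugs in the generator. It does not guarantee that nmap would accept the reply, since nmap's own matching rules may differ from Python's `re` module.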

Improving Honeyd's ability to hide from more obscure fingerprinting tools

Honeyd is very good at fooling nmap and other TCP/IP stack fingerprinting tools. However, its implementations of DHCP and ARP are very simple (making them easy to detect) and have no configuration options. Because fingerprinting tools exist for both of these protocols, this is a potential vulnerability that could be used to detect Honeyd.

IPv6 Support

There is nothing conceptually difficult about this, but it would take a lot of effort to implement NDP, ICMPv6, and the general IP stack in Honeyd (a good chunk of the work has been started here: https://github.com/DataSoft/Honeyd/tree/ipv6). That is also only the first step. The second is being able to fool nmap's OS scanner, which uses an entirely different fingerprint engine for IPv6 (new probes, and fingerprint matching via logistic regression rather than weighted probe results). We would also need internal changes in Nova to handle 128-bit IP addresses, UI changes to handle input of IPv6 addresses, and possibly a way to deal with memory-related problems caused by the bigger suspect size. There is also research to be done on how to make Honeyd effective in IPv6 networks, where the subnets are too large to scan. Implementing broadcasting services such as mDNS and other Zeroconf daemons could be useful for drawing attackers to the honeypots.
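To put rough numbers on the address size and on "too large to scan", here is a small Python sketch using the standard `ipaddress` module. The address and subnet come from the 2001:db8::/32 documentation range, and the probe rate is an assumed figure chosen only for illustration, not a measured one.

```python
import ipaddress

# An example address from the IPv6 documentation range.
addr = ipaddress.ip_address("2001:db8::1")
packed = addr.packed          # 16 bytes, versus 4 bytes for an IPv4 address
as_int = int(addr)            # fits in 128 bits, too wide for a 64-bit integer key

# A single /64, the usual size of one IPv6 subnet.
subnet = ipaddress.ip_network("2001:db8::/64")
hosts = subnet.num_addresses  # 2**64 addresses

# Assumed scan rate (hypothetical): one million probes per second.
PROBES_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 3600
years_to_sweep = hosts / PROBES_PER_SECOND / SECONDS_PER_YEAR

print(len(packed), "bytes per address")
print("addresses in one /64:", hosts)
print("years for a full sweep at the assumed rate: %.0f" % years_to_sweep)
```

Even at that optimistic rate, a sequential sweep of one subnet takes hundreds of thousands of years, which is why attackers are more likely to find hosts through announcements and name services than through brute-force scanning.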

Improve Honeyd scripts

Many of Honeyd's scripts, such as the SSH script, are trivial and non-functional. Research could be done into implementing better scripts, and also into using real services as Honeyd subsystems rather than bash scripts (e.g., running a real copy of OpenSSH with limited permissions and making it always reject login credentials).

Improve the threshold trigger classification engine

The threshold trigger classification engine (CE) is a simple technique: mark a suspect as hostile if a feature exceeds a threshold. Despite its simplicity, it offers a predictable and reliable alternative to KNN or other machine-learning-based matchers, and it can be extremely accurate in some cases (such as detecting a standard port scan by setting an alert trigger on TCP ports contacted > 100).
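The basic rule can be sketched in a few lines of Python. This is only an illustration of the idea, not Nova's actual implementation; the `ThresholdTrigger` class and the `tcp_ports_contacted` feature name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThresholdTrigger:
    feature: str       # hypothetical feature name
    threshold: float   # suspect is hostile when the feature exceeds this

    def is_hostile(self, features):
        # A missing feature is treated as 0, so it never fires the trigger.
        return features.get(self.feature, 0.0) > self.threshold


# The port scan example from the text: alert on TCP ports contacted > 100.
port_scan = ThresholdTrigger("tcp_ports_contacted", 100)

print(port_scan.is_hostile({"tcp_ports_contacted": 250}))  # scanner-like suspect
print(port_scan.is_hostile({"tcp_ports_contacted": 3}))    # ordinary suspect
```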

Areas of improvement for this include:

Allow more complex expressions (mark as hostile if f1 > x && f2 < y).

Allow aggregation of multiple classification engines (KNN + simple threshold matching to avoid false negatives on obviously hostile looking suspects). Would the classification aggregator average results from multiple CEs? Mark as hostile if any of them said it was hostile? Weight each CE with some value?

Create a 0-100% result rather than just 0 or 100% extremes by looking at how close suspects are to the thresholds.
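Two of these ideas, compound expressions and a graded result, can be sketched as follows. The helper names (`condition`, `all_of`, `graded_score`) are hypothetical, and the linear ramp is just one assumed way of turning distance from a threshold into a percentage: here a value exactly at the threshold scores 50% and a value at twice the threshold or more scores 100%.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}


def condition(feature, op, limit):
    # Build a predicate such as "f1 > x" over a dict of feature values.
    compare = OPS[op]
    return lambda features: compare(features[feature], limit)


def all_of(*conditions):
    # Logical AND of several predicates (the && in the text).
    return lambda features: all(c(features) for c in conditions)


def graded_score(value, threshold):
    # Assumed mapping: 0% at 0, 50% at the threshold, capped at 100%.
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return 100.0 * min(max(value, 0.0) / (2.0 * threshold), 1.0)


# Mark as hostile if f1 > 10 && f2 < 5 (x and y chosen arbitrarily).
rule = all_of(condition("f1", ">", 10), condition("f2", "<", 5))

print(rule({"f1": 11, "f2": 1}))
print(rule({"f1": 11, "f2": 9}))
print(graded_score(80, 100))
```

A ramp like this keeps the predictability of a plain threshold while letting the UI rank suspects that are close to, but not over, the limit.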

Combining results from multiple classification engines is also something I've wanted to research for a while (see the machine learning idea of boosting: http://en.wikipedia.org/wiki/Boosting_(meta-algorithm)).
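The three aggregation options raised above (any engine says hostile, plain average, weighted average) are easy to compare side by side. The sketch below is not boosting, which learns the weights from training data; it only shows fixed-weight combinations, and the engine names, scores, and weights are made up for illustration.

```python
def any_hostile(scores, cutoff=0.5):
    # Hostile if any single engine's score reaches the cutoff.
    return any(score >= cutoff for score in scores.values())


def mean_score(scores):
    # Unweighted average of all engine scores.
    return sum(scores.values()) / len(scores)


def weighted_score(scores, weights):
    # Average where each engine's score counts in proportion to its weight.
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical per-engine hostility scores in the range 0.0 to 1.0.
scores = {"knn": 0.2, "threshold": 1.0}
weights = {"knn": 3.0, "threshold": 1.0}

print(any_hostile(scores))
print(mean_score(scores))
print(weighted_score(scores, weights))
```

With these made-up numbers the "any" rule flags the suspect while the weighted average does not, which shows how much the choice of aggregator matters for false negatives on obviously hostile suspects.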