Adding a new checker to the cve-bin-tool

Requirements

To add a new checker to cve-bin-tool, you must provide a checker file. See any checker in the checkers/ directory for an example.

Currently, a checker must provide one class that inherits from the Checker class of the checkers module. The class name must match the checker's filename, with a Checker suffix appended. For example, if you are creating a checker for the curl binary, the file should be named curl.py and the class definition should be:

from cve_bin_tool.checkers import Checker

class CurlChecker(Checker):

Every checker must define the following four class attributes, specific to the product (e.g. curl) the checker is for:

  1. CONTAINS_PATTERNS - list of strings commonly found in the binary of the product
  2. FILENAME_PATTERNS - list of the different filenames used for the product
  3. VERSION_PATTERNS - list of version patterns found in the binary of the product
  4. VENDOR_PRODUCT - list of vendor-product pairs for the product as they appear in NVD

CONTAINS_PATTERNS, FILENAME_PATTERNS and VERSION_PATTERNS support regex to cover a wide range of use cases.

Once the checker is added, its name should also be added to __init__.py (so that from modules import * will find it).
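
Putting the requirements together, a checker file could look roughly like the sketch below. The pattern values are illustrative placeholders, not the contents of the real curl checker, and the vendor-product pair is an assumption that should be verified against NVD. The small Checker class is a stand-in so the snippet runs without cve-bin-tool installed; a real checker would import the base class from cve_bin_tool.checkers instead.

```python
# Stand-in for `from cve_bin_tool.checkers import Checker`, defined here only
# so that this sketch runs on its own. A real checker imports the base class.
class Checker:
    pass


class CurlChecker(Checker):
    # Strings commonly found in the product's binary (placeholder value).
    CONTAINS_PATTERNS = [r"placeholder string that only curl binaries contain"]
    # Filenames the product is shipped under (placeholder value).
    FILENAME_PATTERNS = [r"curl"]
    # Regex with one capture group for the version (placeholder value).
    VERSION_PATTERNS = [r"curl ([0-9]+\.[0-9]+\.[0-9]+)"]
    # (vendor, product) pairs as listed in NVD (assumed; verify before use).
    VENDOR_PRODUCT = [("haxx", "curl")]


# The class name is the filename stem plus the Checker suffix: curl.py -> CurlChecker.
expected_name = "curl".capitalize() + "Checker"
print(CurlChecker.__name__ == expected_name)
```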

Hints for finding the right data to use

Finding a version pattern

The VERSION_PATTERNS contains strings which will be used as a signature for determining the version of the product that is present in the system. You should keep in mind that these strings should be consistent across all versions of the binary and in as many software distributions as possible.

You can get a basic idea of the pattern from looking at the project's documentation/website or use cvedetails since it catalogs vulnerable versions and thus has version lists. Once you know what the version numbers look like, you'll need to find them in the code or the binary itself to make sure you've got a findable pattern.

A few ways to do it:

  • The CVE Binary Tool basically works by running the command line utility strings on a file, so if you have a local copy of the library, you can run strings $libraryname and see what comes out. Try strings $libraryname | grep $version first. If that finds nothing, run strings $libraryname | less and page through the output (a minimum length filter such as strings -n 8 helps cut the noise).

  • If you don't have a copy, browse through the source to find the version string. It's usually helpfully named something like 'version' so a quick grep/search often will turn it up, and if you know the latest version number (usually proudly mentioned in the latest news post or similar) you can grep for that and then look at the history to see what valid patterns look like.
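
If the strings utility is not available, the same idea can be approximated in a few lines of Python. This is only a sketch of what strings does (runs of printable ASCII above a minimum length), not the tool's internal implementation, and the sample bytes are a hypothetical stand-in for the contents of a library file.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long, like `strings`."""
    pattern = re.compile(rb"[ -~]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical bytes standing in for the contents of a library file.
sample = b"\x00\x01Xiph.Org libVorbis 1.3.7\x00ab\x00"

# Equivalent of `strings $libraryname | grep $version` for version 1.3.7.
candidates = [s for s in extract_strings(sample) if "1.3.7" in s]
print(candidates)
```

For a real file you would pass the result of open(path, "rb").read() instead of the sample bytes.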

Avoiding false positives (beware the X.X.X version pattern!)

It can be very tempting to have a version pattern that matches X.X.X where X is a number (or in regex form: r"[0-9]+\.[0-9]+\.[0-9]+"). But beware! There are lots of other libraries potentially compiled into your binary that will match X.X.X. The one you're most likely to see is glibc, the standard C library.

For an example, here's a list of some of the "interesting" version-like strings from one of our binary test files:

~/Code/cve-bin-tool$ strings test/binaries/test-png-unknown.out
/lib64/ld-linux-x86-64.so.2
libc.so.6
GLIBC_2.2.5
This program is designed to test the cve-bin-tool checker.
It outputs a few strings normally associated with png 1.6.36.
They appear below this line.
------------------
Application uses deprecated png_write_init() and should be recompiled
GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
printf@@GLIBC_2.2.5
__libc_start_main@@GLIBC_2.2.5

As you can see, there are a lot of things that will match X.X.X:

  • glibc is version 2.2.5
  • gcc is version 7.4.0
  • Ubuntu is 18.04.1

So you want something that makes the version string a little more precise to the product you're looking for. For example, if we were intentionally looking for glibc (as in, writing a glibc checker), we could use the string GLIBC_ or @@GLIBC_ as a prefix and get a regex that would tell us about glibc without also telling us the GCC and Ubuntu versions.

So a good regex signature for GLIBC might be r"@@GLIBC_[0-9]+\.[0-9]+\.[0-9]+"
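
The difference between the two patterns can be checked quickly in Python. The sketch below runs both against a few of the lines from the strings output above; a capture group is added to the glibc pattern (an addition for this example) so that only the version number is returned.

```python
import re

# Version-like lines copied from the strings output above.
lines = [
    "/lib64/ld-linux-x86-64.so.2",
    "libc.so.6",
    "GLIBC_2.2.5",
    "GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0",
    "printf@@GLIBC_2.2.5",
    "__libc_start_main@@GLIBC_2.2.5",
]

loose = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")
precise = re.compile(r"@@GLIBC_([0-9]+\.[0-9]+\.[0-9]+)")

# Collect the distinct versions each pattern reports across all lines.
loose_hits = {version for line in lines for version in loose.findall(line)}
precise_hits = {version for line in lines for version in precise.findall(line)}

print("loose:  ", sorted(loose_hits))
print("precise:", sorted(precise_hits))
```

The loose pattern reports the glibc, GCC and Ubuntu versions together, while the prefixed pattern reports only the glibc one.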

The whole point of the CVE Binary Tool is to detect libraries that you might not know are there, so we'd expect it often to be used on binaries that have a lot of libraries compiled into them. Finding a regex that detects only what you care about even in the face of a lot of similar strings is essential for us to avoid false positives.

It's also worth noting that sometimes there just aren't great version strings available: sometimes X.X.X is all you can find. If you get stuck at this point, please make a note of it in the New Checker issue if there is one. (If there isn't, you can create one and note it there.) That lets other contributors know that this particular checker is going to be hard to do. Once you've done that, you can abandon the checker and find something easier to work on, or you can try to think outside the box to find another way to detect the version. One example is the sqlite3 get_version_map() function, where the checker uses version hashes from the website that are also stored as strings in the binary.

Finding FILENAME_PATTERNS

The FILENAME_PATTERNS contains the names of the binary files in which the above signatures were found. If the version strings are found in more than one file, please make sure that you add all the filenames.

Choosing contains patterns to detect the library

Contains patterns are strings commonly found in the binary of the product you are looking for. You want a signature that hasn't changed across a large number of versions so you'll detect the library for as long as possible (and if you notice that it did change before some version date, you can always add more strings to improve the coverage). If you have a copy of the library, you can run strings $libraryname to find some candidate strings that look good, then look at the source repository to see when those strings were added and whether they were changed. (There's a 'history' button on GitHub for this, or other tools for other repositories.) The CONTAINS_PATTERNS field supports regex, so you can write a creative signature that stays the same across a number of versions.

Note: By default, VERSION_PATTERNS are also treated as valid CONTAINS_PATTERNS.

You can find candidate strings with:

$ strings (path of the binary) | grep -i (product_name)

Quickstart for finding patterns

What often helps is trying to find an .rpm (or more than one) or a package which contains the product you're looking for.

Searching on https://pkgs.org is a good place to start.

For this example we'll be using libvorbis: https://pkgs.org/search/?q=libvorbis

In the example below we picked Fedora 33's package for version 1.3.7 of libvorbis. We can extract the .rpm file using a combination of rpm2cpio and cpio, or using rpmfile. Sometimes you'll have packages which come as .deb or .tar files.

  • .deb files can be extracted with ar x somefile.deb && tar xvf data.tar.xz

  • .tar files can be extracted using tar

$ curl -sfL 'https://download-ib01.fedoraproject.org/pub/fedora/linux/releases/33/Everything/x86_64/os/Packages/l/libvorbis-1.3.7-2.fc33.x86_64.rpm' | rpmfile -xv -
/tmp/tmp.U3wkntEqtD/usr/lib/.build-id/02/980384bc359497f0121fc74974e465ba7e29aa
/tmp/tmp.U3wkntEqtD/usr/lib/.build-id/1c/ff0ed918467a6224a5108793bf779e61486151
/tmp/tmp.U3wkntEqtD/usr/lib/.build-id/75/8407ea857c63ae42c4d9959ad252de6fb9bcca
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbis.so.0
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbis.so.0.4.9
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbisenc.so.2
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbisenc.so.2.0.12
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbisfile.so.3
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbisfile.so.3.3.8
/tmp/tmp.U3wkntEqtD/usr/share/doc/libvorbis/AUTHORS
/tmp/tmp.U3wkntEqtD/usr/share/licenses/libvorbis/COPYING

Then work out which of the files you downloaded are binaries or libraries. We can use the file command combined with the find command for this. The find command will list every file in the directory we provide to it (. in this case) and execute any program we want using that filename. In this case we want to run the file command on each file we get from find.

We want to filter the output using grep to show us only executables (programs you run) and shared objects (libraries that programs use) using -E 'executable,|shared object,', a regex that keeps only the lines of the file output containing either executable, or shared object,.

The final tee command in combination with sed is creating a new file called executables.txt which has all the filenames in it. It does this by only writing what comes before the : to the file that was in the output of the grep command which looked for executables.

$ find . -exec file {} \; | grep -E 'executable,|shared object,' | tee >(sed -e 's/:.*//g' > executables.txt)
./usr/lib64/libvorbisfile.so.3.3.8: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=1cff0ed918467a6224a5108793bf779e61486151, stripped
./usr/lib64/libvorbisenc.so.2.0.12: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=02980384bc359497f0121fc74974e465ba7e29aa, stripped
./usr/lib64/libvorbis.so.0.4.9: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=758407ea857c63ae42c4d9959ad252de6fb9bcca, stripped
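
On a system without find and file, a rough Python equivalent is to walk the extracted directory and keep files that begin with the ELF magic bytes. This is a simpler technique than the file command: it only checks the first four bytes, so it also picks up other ELF files such as object files, and it does not recognise non-ELF executables. The function name and the demo files are hypothetical.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"


def find_elf_files(root: str) -> list[str]:
    """Walk root and return paths of regular files that start with the ELF magic."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks such as libvorbis.so.0 so each library is listed once.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                if handle.read(4) == ELF_MAGIC:
                    found.append(path)
    return sorted(found)


# Demo on a throwaway directory with one fake ELF file and one text file.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "libdemo.so.1"), "wb") as handle:
        handle.write(ELF_MAGIC + b"\x02\x01\x01" + b"\x00" * 9)
    with open(os.path.join(demo_root, "COPYING"), "w") as handle:
        handle.write("license text\n")
    elf_names = [os.path.basename(p) for p in find_elf_files(demo_root)]

print(elf_names)
```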

You'll want to run strings on those binaries and do a case insensitive search for the package name using grep -i.

$ strings $(cat executables.txt) | sort | uniq | grep -i libvorbis
3?Xiph.Org libVorbis 1.3.7
libvorbisenc.so.2
libvorbisenc.so.2.0.12-1.3.7-2.fc33.x86_64.debug
libvorbisfile.so.3
libvorbisfile.so.3.3.8-1.3.7-2.fc33.x86_64.debug
libvorbis.so.0
libvorbis.so.0.4.9-1.3.7-2.fc33.x86_64.debug
Xiph.Org libVorbis I 20200704 (Reducing Environment)

You also might want to look for the version number. In this case it's 1.3.7.

$ strings $(cat executables.txt) | sort | uniq | grep -i 1.3.7
3?Xiph.Org libVorbis 1.3.7
libvorbisenc.so.2.0.12-1.3.7-2.fc33.x86_64.debug
libvorbisfile.so.3.3.8-1.3.7-2.fc33.x86_64.debug
libvorbis.so.0.4.9-1.3.7-2.fc33.x86_64.debug

In this case the most interesting line in the output of the above two commands is 3?Xiph.Org libVorbis 1.3.7. We can probably use this to create a regex for VERSION_PATTERNS.

That regex might look like this: 3\?Xiph\.Org libVorbis ([0-9]+\.[0-9]+\.[0-9]+)

If you can't get a signature match using just regex you may end up needing to override the get_version() method for the checker, but that should be a last resort if you can't find a regex that works for VERSION_PATTERNS.
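
Before putting a candidate pattern into VERSION_PATTERNS, it is worth checking it against the strings you collected. A minimal sketch, using the pattern with the dot in Xiph.Org escaped and a few of the lines from the grep output above:

```python
import re

VERSION_PATTERN = re.compile(r"3\?Xiph\.Org libVorbis ([0-9]+\.[0-9]+\.[0-9]+)")

# Lines copied from the strings output above.
strings_output = [
    "3?Xiph.Org libVorbis 1.3.7",
    "libvorbisenc.so.2.0.12-1.3.7-2.fc33.x86_64.debug",
    "libvorbis.so.0.4.9-1.3.7-2.fc33.x86_64.debug",
    "Xiph.Org libVorbis I 20200704 (Reducing Environment)",
]

# Keep the captured version from every line the pattern matches.
versions = [
    match.group(1)
    for line in strings_output
    if (match := VERSION_PATTERN.search(line))
]
print(versions)
```

Only the first line matches, so the debug filenames that also contain 1.3.7 do not produce false hits.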

A note about this example: In the case of libvorbis, the versions containing CVEs are 1.2.0 and below, while the .rpm we used was version 1.3.7. So although this was a nice example of how one might find a signature, it is not all the work needed to create a checker for libvorbis. We need to make sure that any checker we develop has a get_version() method that works for the versions of the software that have CVEs. Unless it is overridden in a subclass, the Checker base class implements a get_version() method that uses regex to determine the version (as described above). For libvorbis a custom get_version() method is likely needed, because the signature we found is not present in version 1.2.0, where the CVE is found.

Finding Vendor Product pairs

Every checker class must contain the vendor and product name pair(s) as they appear in NVD. The best way to do this is to search the cached sqlite database of the NVD using a CVE you want to know the vendor product pair(s) for.

$ sqlite3 ~/.cache/cvedb/cve.db \
    "SELECT vendor, product FROM cve_range WHERE CVE_Number='CVE-2016-0718';" \
    | sed -e 's/|/, /g' -e 's/^/VPkg\: /'
VPkg: apple, mac_os_x
VPkg: canonical, ubuntu_linux
VPkg: debian, debian_linux
VPkg: libexpat, expat
VPkg: mozilla, firefox
VPkg: opensuse, leap
VPkg: suse, linux_enterprise_debuginfo

The VENDOR_PRODUCT attribute should be a list of tuples, one for each vendor-product pair found in the listings that refers to the product itself. Some of the listings will be for products that merely include this product. In our example, every listing except libexpat, expat merely includes the target product (expat, for the example SQL query).
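
The same lookup and filtering can be sketched in Python with the standard sqlite3 module. To keep the example runnable anywhere, it uses an in-memory database filled with the rows from the output above as a stand-in for ~/.cache/cvedb/cve.db; the table layout is assumed from the query shown, and TARGET_PRODUCT is a hypothetical name used only here.

```python
import sqlite3

# In-memory stand-in for the cached NVD database (assumed table layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cve_range (CVE_Number TEXT, vendor TEXT, product TEXT)")
rows = [
    ("apple", "mac_os_x"),
    ("canonical", "ubuntu_linux"),
    ("debian", "debian_linux"),
    ("libexpat", "expat"),
    ("mozilla", "firefox"),
    ("opensuse", "leap"),
    ("suse", "linux_enterprise_debuginfo"),
]
conn.executemany(
    "INSERT INTO cve_range VALUES (?, ?, ?)",
    [("CVE-2016-0718", vendor, product) for vendor, product in rows],
)

pairs = conn.execute(
    "SELECT DISTINCT vendor, product FROM cve_range WHERE CVE_Number = ?",
    ("CVE-2016-0718",),
).fetchall()
conn.close()

# Keep only the pairs for the product itself, not products that bundle it.
TARGET_PRODUCT = "expat"
VENDOR_PRODUCT = [(vendor, product) for vendor, product in pairs if product == TARGET_PRODUCT]
print(VENDOR_PRODUCT)
```

Filtering on the product name is a shortcut that works for this example; in general the listings still need a manual look to decide which pairs describe the product itself.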

Adding tests

There are two types of tests you want to add to prove that your checker works as expected:

  1. Tests to show that the CVE mapping works as expected.
  2. Tests to show that the checker correctly detects real binaries.

You can read about how to add these in tests/README.md

Running tests

To run the tests for cve-bin-tool

python setup.py test

To run tests for a particular checker

pytest -k $checkername

Alternatively you can run Long Tests using

LONG_TESTS=1 pytest -k $checkername

You can run tests in parallel by using

pytest -n 4

This spawns 4 worker processes to take advantage of a multicore system.
You can set any number of workers. A good rule of thumb is one worker per CPU core.

How it works

cve-bin-tool works by extracting strings from binaries and determining whether a given library has been compiled into the binary. For this, the Checker class contains two methods: 1) guess_contains() and 2) get_version().

  1. The guess_contains() method takes a list of extracted string lines as input and returns True if it finds any of the CONTAINS_PATTERNS on any of the lines.
  2. The get_version() method takes a list of extracted string lines and the filename as inputs and returns information about whether the binary contains the library in question or is a copy of the library in question. If either of those is true, it also returns a version string. If the binary does not contain the library, this method returns an empty dictionary.

If the curl product is being scanned, the get_version() method of CurlChecker will return the following dictionary.

{
  "is_or_contains": "is",
  "modulename": "curl",
  "version": "6.41.0"
}
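
To make the flow concrete, here is a simplified sketch of the behaviour described above. It is not the actual Checker implementation: the class, the MODULE_NAME attribute, the "UNKNOWN" fallback and the libvorbis patterns are all assumptions made for illustration, and only the dictionary keys are taken from the example above.

```python
import re


class SketchChecker:
    """Simplified stand-in for the behaviour described above, not the real Checker."""

    CONTAINS_PATTERNS = [r"Xiph\.Org libVorbis"]
    FILENAME_PATTERNS = [r"libvorbis\.so"]
    VERSION_PATTERNS = [r"Xiph\.Org libVorbis ([0-9]+\.[0-9]+\.[0-9]+)"]
    MODULE_NAME = "libvorbis"  # hypothetical attribute used only in this sketch

    def guess_contains(self, lines):
        # True if any contains pattern (or version pattern) matches any line.
        patterns = self.CONTAINS_PATTERNS + self.VERSION_PATTERNS
        return any(re.search(p, line) for line in lines for p in patterns)

    def get_version(self, lines, filename):
        if any(re.search(p, filename) for p in self.FILENAME_PATTERNS):
            kind = "is"  # the file looks like the library itself
        elif self.guess_contains(lines):
            kind = "contains"  # the library appears to be compiled in
        else:
            return {}
        version = "UNKNOWN"  # assumed fallback when no version pattern matches
        for line in lines:
            for pattern in self.VERSION_PATTERNS:
                match = re.search(pattern, line)
                if match:
                    version = match.group(1)
        return {"is_or_contains": kind, "modulename": self.MODULE_NAME, "version": version}


checker = SketchChecker()
lines = ["3?Xiph.Org libVorbis 1.3.7"]
print(checker.get_version(lines, "libvorbis.so.0.4.9"))
print(checker.get_version(lines, "myplayer"))
print(checker.get_version(["unrelated text"], "myplayer"))
```

The first call reports "is" because the filename matches, the second reports "contains" because only the strings match, and the third returns an empty dictionary.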

In most cases, just providing the four class attributes described above is enough. Sometimes, however, you need to override get_version() to detect the version of the product correctly. We have done this in the checkers for python, sqlite and kerberos.