To add a new checker to cve-bin-tool, you must provide a checker file. See any checker in the checkers/ directory for an example.
Currently, a checker must provide one class that inherits from the Checker class of the checkers module. The class name must match the checker's filename, capitalized, with a Checker suffix appended. For example, if you are creating a checker for the curl binary, the checker file should be named curl.py and the class definition should be:
from cve_bin_tool.checkers import Checker
class CurlChecker(Checker):
Every checker must define the following four class attributes, specific to the product (e.g. curl) the checker is for:
- CONTAINS_PATTERNS - list of strings commonly found in the product's binary.
- FILENAME_PATTERNS - list of the different filenames the product ships under.
- VERSION_PATTERNS - list of version patterns found in the product's binary.
- VENDOR_PRODUCT - list of vendor-product pairs for the product as they appear in NVD.
CONTAINS_PATTERNS, FILENAME_PATTERNS and VERSION_PATTERNS support regex to cover a wide range of use cases.
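As a rough sketch of how the four attributes fit together, the class below fills them in for curl. The Checker base class here is an empty stand-in so the snippet runs on its own (a real checker would import it from cve_bin_tool.checkers), and every pattern value, as well as the haxx vendor name, is an illustrative assumption rather than the contents of the real curl checker.

```python
import re


class Checker:
    """Empty stand-in for cve_bin_tool.checkers.Checker (assumption)."""


class CurlChecker(Checker):
    # Strings commonly found in the binary (illustrative values only).
    CONTAINS_PATTERNS = [r"curl: try 'curl --help' for more information"]
    # Filenames the product is usually shipped under.
    FILENAME_PATTERNS = [r"curl"]
    # Each version pattern captures the version number in group 1.
    VERSION_PATTERNS = [r"curl ([0-9]+\.[0-9]+\.[0-9]+)"]
    # (vendor, product) pairs as listed in NVD; "haxx" is an assumption here.
    VENDOR_PRODUCT = [("haxx", "curl")]


# All three pattern lists hold regexes, so they should at least compile.
for name in ("CONTAINS_PATTERNS", "FILENAME_PATTERNS", "VERSION_PATTERNS"):
    for pattern in getattr(CurlChecker, name):
        re.compile(pattern)
```

Putting the version number in a capture group keeps the surrounding prefix out of the reported version.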
Once the checker is added, its name should also be added to __init__.py (so that from modules import * will find it).
The VERSION_PATTERNS contains strings which will be used as a signature for determining the version of the product that is present in the system. You should keep in mind that these strings should be consistent across all versions of the binary and in as many software distributions as possible.
You can get a basic idea of the pattern from looking at the project's documentation/website or use cvedetails since it catalogs vulnerable versions and thus has version lists. Once you know what the version numbers look like, you'll need to find them in the code or the binary itself to make sure you've got a findable pattern.
A few ways to do it:
- The CVE Binary Tool basically works by running the command line utility strings on a file, so if you have a local copy of the library, you can run strings $libraryname and see what comes out. Try strings $libraryname | grep $version and see what you find, and if you don't find it that way, run strings $libraryname | less and page through (maybe run a filter in there so it's only strings over a certain size?).
- If you don't have a copy, browse through the source to find the version string. It's usually helpfully named something like 'version', so a quick grep/search will often turn it up, and if you know the latest version number (usually proudly mentioned in the latest news post or similar) you can grep for that and then look at the history to see what valid patterns look like.
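If you would rather experiment from Python, the sketch below approximates what strings does: it pulls runs of printable ASCII out of raw bytes and lets you choose the minimum length, which covers the "only strings over a certain size" idea. The function name extract_strings and the sample bytes are made up for illustration; this is not the extraction code the tool itself uses.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list:
    """Return runs of printable ASCII that are at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Fake binary contents: printable text separated by non-printable bytes.
sample = b"\x00\x01GLIBC_2.2.5\x00ab\x00png 1.6.36\x00"

found = extract_strings(sample)
# Rough equivalent of: strings $libraryname | grep $version
with_version = [s for s in found if "1.6.36" in s]
```

The default of 4 mirrors the default minimum length of GNU strings; raising it is a quick way to cut down noise while paging through the output.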
It can be very tempting to have a version pattern that matches X.X.X where X is a number (or in regex form: r"[0-9]+\.[0-9]+\.[0-9]+"). But beware! There are lots of other libraries potentially compiled into your binary that will match X.X.X. The one you're most likely going to see is glibc, the standard C library.
For an example, here's a list of some of the "interesting" version-like strings from one of our binary test files:
~/Code/cve-bin-tool$ strings test/binaries/test-png-unknown.out
/lib64/ld-linux-x86-64.so.2
libc.so.6
GLIBC_2.2.5
This program is designed to test the cve-bin-tool checker.
It outputs a few strings normally associated with png 1.6.36.
They appear below this line.
------------------
Application uses deprecated png_write_init() and should be recompiled
GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
printf@@GLIBC_2.2.5
__libc_start_main@@GLIBC_2.2.5
As you can see, there are a lot of things that will match X.X.X:
- glibc is version 2.2.5
- gcc is version 7.4.0
- Ubuntu is 18.04.1
So you want something that makes the version string a little more specific to the product you're looking for. For example, if we were intentionally looking for glibc (as in, writing a glibc checker), we could use the string GLIBC_ or @@GLIBC_ as a prefix and get a regex that would tell us about glibc without also telling us the GCC and Ubuntu versions.
So a good regex signature for glibc might be r"@@GLIBC_[0-9]+\.[0-9]+\.[0-9]+"
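To see the difference concretely, the snippet below runs both regexes over a few of the lines from the strings output shown earlier. The only change from the patterns in the text is a pair of parentheses added to the prefixed regex so that it captures just the version number.

```python
import re

# A few lines copied from the strings output of test-png-unknown.out.
lines = [
    "GLIBC_2.2.5",
    "It outputs a few strings normally associated with png 1.6.36.",
    "GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0",
    "printf@@GLIBC_2.2.5",
    "__libc_start_main@@GLIBC_2.2.5",
]

generic = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")
prefixed = re.compile(r"@@GLIBC_([0-9]+\.[0-9]+\.[0-9]+)")

# Collect the distinct versions each regex reports across all lines.
generic_hits = sorted({m for line in lines for m in generic.findall(line)})
prefixed_hits = sorted({m for line in lines for m in prefixed.findall(line)})

print(generic_hits)   # png, Ubuntu, glibc and gcc versions all mixed together
print(prefixed_hits)  # only the glibc version
```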
The whole point of the CVE Binary Tool is to detect libraries that you might not know are there, so we'd expect it often to be used on binaries that have a lot of libraries compiled into them. Finding a regex that detects only what you care about even in the face of a lot of similar strings is essential for us to avoid false positives.
It's also worth noting that sometimes there just aren't great version strings available: sometimes X.X.X is all you can find. If you get stuck at this point, please make a note of it in the New Checker issue if there is one. (You can make a new one and note it there if there isn't.) That helps other contributors know that that particular checker is going to be hard to do. Once you've done that, you can abandon the checker and find something easier to work on, or you can try to think outside the box to find another way to detect the version. One example is how we did it for the sqlite3 get_version_map() function, where the checker uses version hashes from the website that are also stored as strings in the binary.
The FILENAME_PATTERNS contains the names of the files in the binary where the above signatures were found. If there is more than one place where the version strings are found, please make sure that you add all the filenames.
CONTAINS_PATTERNS are the string patterns commonly found in the binary of the product you are looking for. You want a signature that hasn't changed across a large number of versions, so you'll detect the library for as long as possible (and if you notice that it did change before some version date, you can always add more strings to improve the coverage). If you have a copy of the library, you can run strings $libraryname to find some candidate strings that look good, then look at the source repository to see when those strings were added and whether they were changed (there's a 'history' button on GitHub for this, or other tools for other repositories). The CONTAINS_PATTERNS field supports regex, so you can use creative signatures that remain the same across a number of versions.
Note: By default, VERSION_PATTERNS are also treated as valid CONTAINS_PATTERNS.
You can find these with:
$ strings (path of the binary) | grep -i (product_name)
What often helps is finding an .rpm (or more than one) or another package that contains the product you're looking for. Searching on https://pkgs.org is a good place to start. For this example we'll be using libvorbis: https://pkgs.org/search/?q=libvorbis
In the example below we picked Fedora 33's package for version 1.3.7 of libvorbis. We can extract the .rpm file using a combination of rpm2cpio and cpio, or using rpmfile. Sometimes you'll have packages which come in .deb or .tar files.
- .deb files can be extracted with ar x somefile.deb && tar xvf data.tar.xz
- .tar files can be extracted using tar
$ curl -sfL 'https://download-ib01.fedoraproject.org/pub/fedora/linux/releases/33/Everything/x86_64/os/Packages/l/libvorbis-1.3.7-2.fc33.x86_64.rpm' | rpmfile -xv -
/tmp/tmp.U3wkntEqtD/usr/lib/.build-id/02/980384bc359497f0121fc74974e465ba7e29aa
/tmp/tmp.U3wkntEqtD/usr/lib/.build-id/1c/ff0ed918467a6224a5108793bf779e61486151
/tmp/tmp.U3wkntEqtD/usr/lib/.build-id/75/8407ea857c63ae42c4d9959ad252de6fb9bcca
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbis.so.0
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbis.so.0.4.9
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbisenc.so.2
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbisenc.so.2.0.12
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbisfile.so.3
/tmp/tmp.U3wkntEqtD/usr/lib64/libvorbisfile.so.3.3.8
/tmp/tmp.U3wkntEqtD/usr/share/doc/libvorbis/AUTHORS
/tmp/tmp.U3wkntEqtD/usr/share/licenses/libvorbis/COPYING
Then look for which of the files you downloaded are binaries or libraries. We can use the file command combined with the find command for this. The find command will list every file in the directory we provide to it (. in this case) and execute any program we want using that filename. In this case we want to run the file command on each file we get from find.
We want to filter the output using grep to show us only executables (programs you run) and shared objects (libraries programs use) using -E 'executable,|shared object,', which is a regex that says to show only lines of the file output that contain either executable, or shared object,.
The final tee command in combination with sed creates a new file called executables.txt which has all the filenames in it. It does this by writing to the file only what comes before the : on each line of the grep output.
$ find . -exec file {} \; | grep -E 'executable,|shared object,' | tee >(sed -e 's/:.*//g' > executables.txt)
./usr/lib64/libvorbisfile.so.3.3.8: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=1cff0ed918467a6224a5108793bf779e61486151, stripped
./usr/lib64/libvorbisenc.so.2.0.12: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=02980384bc359497f0121fc74974e465ba7e29aa, stripped
./usr/lib64/libvorbis.so.0.4.9: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=758407ea857c63ae42c4d9959ad252de6fb9bcca, stripped
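For readers less familiar with grep and sed, here is a small Python sketch of the same filtering step applied to text that looks like the file output. The input is a shortened version of the lines above (BuildID fields dropped), plus one made-up non-binary line for the AUTHORS file to show something being filtered out; that extra line is an assumption about what unfiltered output could look like.

```python
import re

# Shortened, partly assumed output of: find . -exec file {} \;
file_output = """\
./usr/lib64/libvorbisfile.so.3.3.8: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, stripped
./usr/share/doc/libvorbis/AUTHORS: ASCII text
./usr/lib64/libvorbis.so.0.4.9: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, stripped
"""

# Same alternation that is passed to grep -E in the shell pipeline.
wanted = re.compile(r"executable,|shared object,")

executables = [
    line.split(":", 1)[0]  # keep what comes before the first ":", like the sed step
    for line in file_output.splitlines()
    if wanted.search(line)  # keep only executables and shared objects
]
```

The resulting list plays the role of executables.txt: one path per binary worth running strings on.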
You'll want to run strings on those binaries and do a case insensitive search for the package name using grep -i.
$ strings $(cat executables.txt) | sort | uniq | grep -i libvorbis
3?Xiph.Org libVorbis 1.3.7
libvorbisenc.so.2
libvorbisenc.so.2.0.12-1.3.7-2.fc33.x86_64.debug
libvorbisfile.so.3
libvorbisfile.so.3.3.8-1.3.7-2.fc33.x86_64.debug
libvorbis.so.0
libvorbis.so.0.4.9-1.3.7-2.fc33.x86_64.debug
Xiph.Org libVorbis I 20200704 (Reducing Environment)
You also might want to look for the version number. In this case it's 1.3.7.
$ strings $(cat executables.txt) | sort | uniq | grep -i 1.3.7
3?Xiph.Org libVorbis 1.3.7
libvorbisenc.so.2.0.12-1.3.7-2.fc33.x86_64.debug
libvorbisfile.so.3.3.8-1.3.7-2.fc33.x86_64.debug
libvorbis.so.0.4.9-1.3.7-2.fc33.x86_64.debug
In this case the most interesting line in the output of the above two commands is 3?Xiph.Org libVorbis 1.3.7. We can probably use this to create a regex for VERSION_PATTERNS.
That regex might look like this: 3\?Xiph.Org libVorbis ([0-9]+\.[0-9]+\.[0-9]+)
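Before putting a candidate regex into a checker, it is worth trying it against the lines you found. The sketch below applies the regex above to three lines taken from the strings output; only the first should produce a version.

```python
import re

VERSION_PATTERNS = [r"3\?Xiph.Org libVorbis ([0-9]+\.[0-9]+\.[0-9]+)"]

# Lines copied from the strings output above.
lines = [
    "3?Xiph.Org libVorbis 1.3.7",
    "libvorbisenc.so.2.0.12-1.3.7-2.fc33.x86_64.debug",
    "Xiph.Org libVorbis I 20200704 (Reducing Environment)",
]

versions = []
for line in lines:
    for pattern in VERSION_PATTERNS:
        match = re.search(pattern, line)
        if match:
            # Group 1 is the parenthesised X.X.X part of the pattern.
            versions.append(match.group(1))

print(versions)
```

Note that the .debug filenames also contain 1.3.7, but the prefix keeps them from matching, which is the behaviour we want.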
If you can't get a signature match using just regex, you may end up needing to override the get_version() method for the checker, but that should be a last resort if you can't find a regex that works for VERSION_PATTERNS.
A note about this example: In the case of libvorbis, the versions containing CVEs are 1.2.0 and below. The .rpm we used for this example was from version 1.3.7. While this was a nice example of how one might find a signature, it is in the end not all the work that is needed to create a checker for libvorbis. We need to make sure that any checker we develop has a get_version() function which works for versions of the software which have CVEs. If not overridden in a subclass, the Checker base class implements a get_version() method which uses regex to determine the version (as described above). In the case of libvorbis a custom get_version() function is likely needed, because the signature we found is not in the 1.2.0 version, where the CVE is found.
Every checker class must contain the vendor and product name pair(s) as they appear in NVD. The best way to do this is to search the cached sqlite database of the NVD using a CVE you want to know the vendor product pair(s) for.
$ sqlite3 ~/.cache/cvedb/cve.db \
"SELECT vendor, product FROM cve_range WHERE CVE_Number='CVE-2016-0718';" \
| sed -e 's/|/, /g' -e 's/^/VPkg\: /'
VPkg: apple, mac_os_x
VPkg: canonical, ubuntu_linux
VPkg: debian, debian_linux
VPkg: libexpat, expat
VPkg: mozilla, firefox
VPkg: opensuse, leap
VPkg: suse, linux_enterprise_debuginfo
The VENDOR_PRODUCT attribute should be a list of tuples of the vendor-product pairs found in the listings. Some of the listings will be for products that include this product. In our example, all listings except libexpat, expat merely include the target product (expat for the example SQL query).
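The same lookup can be sketched in Python with the standard sqlite3 module. To keep the snippet runnable without a populated cache, it builds an in-memory table with only the three columns the query above uses and a few rows copied from its output; the real cve.db may well have more columns, and the final filter on the product name is just one illustrative way to pick the pair.

```python
import sqlite3

# In-memory stand-in for ~/.cache/cvedb/cve.db (assumed minimal schema).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE cve_range (CVE_Number TEXT, vendor TEXT, product TEXT)"
)
connection.executemany(
    "INSERT INTO cve_range VALUES (?, ?, ?)",
    [
        ("CVE-2016-0718", "canonical", "ubuntu_linux"),
        ("CVE-2016-0718", "libexpat", "expat"),
        ("CVE-2016-0718", "mozilla", "firefox"),
    ],
)

rows = connection.execute(
    "SELECT vendor, product FROM cve_range WHERE CVE_Number = ?",
    ("CVE-2016-0718",),
).fetchall()
connection.close()

# Keep the pair naming the product itself, not the products that bundle it.
VENDOR_PRODUCT = [(vendor, product) for vendor, product in rows if product == "expat"]
```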
There are two types of tests you want to add to prove that your checker works as expected:
- Test to show that the cve mapping works as expected.
- Tests to show that the checker correctly detects real binaries.
You can read about how to add these in tests/README.md
To run the tests for cve-bin-tool:
python setup.py test
To run the tests for a particular checker:
pytest -k $checkername
Alternatively, you can run the long tests using:
LONG_TESTS=1 pytest -k $checkername
You can run tests in parallel by using:
pytest -n 4
This will spawn 4 worker processes to take advantage of a multicore system (the -n option comes from the pytest-xdist plugin). You can set an arbitrary number of workers; a good rule of thumb is to use as many workers as you have cores.
The CVE Binary Tool works by extracting strings from binaries and determining whether a given library has been compiled into the binary. For this, the Checker class contains two methods: 1) guess_contains() and 2) get_version().
The guess_contains() method takes a list of extracted string lines as input and returns True if it finds any of the CONTAINS_PATTERNS on any of the lines.
The get_version() method takes a list of extracted string lines and the filename as inputs and returns information about whether the binary contains the library in question or is a copy of the library in question; if either of those is true, it also returns a version string. If the binary does not contain the library, this function returns an empty dictionary.
If the curl product is being scanned, the get_version() method of CurlChecker will return the following dictionary.
{
"is_or_contains": "is",
"modulename": "curl",
"version": "6.41.0"
}
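To make the behaviour described above more tangible, here is a simplified sketch of how the two methods could work. It is not the real Checker implementation: the class names, the MODULE_NAME attribute, the "UNKNOWN" fallback and the pattern values are all assumptions made for illustration.

```python
import re


class SketchChecker:
    """Hypothetical stand-in for the Checker base class (not the real code)."""

    CONTAINS_PATTERNS = []
    FILENAME_PATTERNS = []
    VERSION_PATTERNS = []
    MODULE_NAME = ""  # hypothetical attribute used only by this sketch

    def guess_contains(self, lines):
        # True if any CONTAINS_PATTERNS regex matches any extracted line.
        return any(
            re.search(pattern, line)
            for line in lines
            for pattern in self.CONTAINS_PATTERNS
        )

    def find_version(self, lines):
        # First capture group of the first matching version pattern.
        for line in lines:
            for pattern in self.VERSION_PATTERNS:
                match = re.search(pattern, line)
                if match:
                    return match.group(1)
        return "UNKNOWN"  # assumed fallback when no version pattern matches

    def get_version(self, lines, filename):
        # "is" when the filename matches, "contains" when only the strings do.
        if any(re.search(p, filename) for p in self.FILENAME_PATTERNS):
            relation = "is"
        elif self.guess_contains(lines):
            relation = "contains"
        else:
            return {}  # the binary does not contain the library
        return {
            "is_or_contains": relation,
            "modulename": self.MODULE_NAME,
            "version": self.find_version(lines),
        }


class CurlSketch(SketchChecker):
    CONTAINS_PATTERNS = [r"curl: try 'curl --help'"]
    FILENAME_PATTERNS = [r"curl"]
    VERSION_PATTERNS = [r"curl ([0-9]+\.[0-9]+\.[0-9]+)"]
    MODULE_NAME = "curl"


result = CurlSketch().get_version(["curl 6.41.0"], "/usr/bin/curl")
```

With these assumed patterns, result has the same shape as the dictionary shown above, and a file that matches none of the patterns yields an empty dictionary.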
In most cases, just providing the class attributes described above will be enough. But sometimes you need to override the get_version() method to correctly detect the version of the product. We have done this in the checkers for python, sqlite and kerberos.