Remove unnecessary memory allocation limit checks and rename max-allocation related functions #1137

val-ms · 2024-01-10T00:08:26Z

The goal of the ClamAV's functions like "cli_malloc" and "cli_realloc" are to check that the size of the allocation is not unreasonable. But we only need to do that check when the size is derived from untrusted user input (aka may vary based on values in the possibly malicious file that we're scanning).

There are many calls to allocate memory where the size of the allocation is either known at compile time or else is derived from trusted user input. Examples of variable but trusted allocation sizes include:

the number of parts of a signature
or the number of blocks in a bytecode function.

This PR seeks to eliminate those unnecessary max-allocation checks. This is mostly because checking the limits in these cases is a waste of CPU. But it also sets groundwork for a future improvement where we may pass in the memory allocation limit as a parameter, and make it generally configurable for the scanning service. This would be meaningful for anyone wishing to crank up the max-filesize to scan large archives that decompress to very large amounts of data.

Further, this PR renames the max-allocation functions and related functions and macros so it's obvious what they do. I suspect that the reason the likes of "cli_calloc" and "cli_realloc" were so widely used for fixed size allocations is that the developers knew we had a special wrapper for allocating memory and didn't know when they should be used.

What's also drawing attention to is the functions that wrap allocations but do NOT check the allocation-limit. These are cli_strdup, cli_realloc, and cli_realloc2. These functions are safer than the original strdup and realloc functions from the C "standard library".

For strdup, our wrapper adds a NULL check to prevent accidental NULL-deref crashes.
For realloc, the "standard library" one is not standard. Some variations may free the pointer if size == 0. Some do not. Ours wrapper does not. In this way, we know the behavior will be consistent. Our version also has two variants. The first, cli_realloc does not free the pointer on failure. The second, cli_realloc2, does free the pointer on failure. That second behavior is generally less desirable because you must remember to set the pointer to NULL before goto done; or else you risk a double-free. I have not removed the latter variant simply because it's a lot of work, so there will still be two variants. In this PR, I improve the documentation and I rename those two so you can easily tell the difference.

Also, for both the strdup and realloc wrappers, we now have a "cli_safer_" prefix on them when they do not check the the allocation limit. When we do check the allocation limit, we instead use a "cli_max_" prefix.

Finally, for the macros which wrap the allocation-related functions, I've added a "_OR_GOTO_DONE" suffix to the macro names. This makes it clear what is special about the macros, and that error handling is built int. If the allocation fails, then we goto done;.

In summary, the functions and macros are now:

void *cli_max_malloc(size_t nmemb);
void *cli_max_calloc(size_t nmemb, size_t size);
void *cli_max_realloc(void *ptr, size_t size);
void *cli_max_realloc_or_free(void *ptr, size_t size);
void *cli_safer_realloc(void *ptr, size_t size);
void *cli_safer_realloc_or_free(void *ptr, size_t size);
char *cli_safer_strdup(const char *s);

#define CLI_SAFER_STRDUP_OR_GOTO_DONE(buf, var, ...) \
...
#define CLI_FREE_AND_SET_NULL(var) \
...
#define CLI_MALLOC_OR_GOTO_DONE(var, size, ...) \
...
#define CLI_MAX_MALLOC_OR_GOTO_DONE(var, size, ...) \
...
#define CLI_CALLOC_OR_GOTO_DONE(var, nmemb, size, ...) \
...
#define CLI_MAX_CALLOC_OR_GOTO_DONE(var, nmemb, size, ...) \
...
#define CLI_VERIFY_POINTER_OR_GOTO_DONE(ptr, ...) \
...
#define CLI_MAX_REALLOC_OR_GOTO_DONE(ptr, size, ...) \
...
#define CLI_SAFER_REALLOC_OR_GOTO_DONE(ptr, size, ...) \
...

libclamav/bytecode_api.c

libclamav/hashtab.c

libclamav/nsis/bzlib.c

libclamav/pe_icons.c

clamonacc/inotif/inotif.c

There are a large number of allocations for fix sized buffers using the `cli_malloc` and `cli_calloc` calls that check if the requested size is larger than our allocation threshold for allocations based on untrusted input. These allocations will *always* be higher than the threshold, so the extra stack frame and check for these calls is a waste of CPU. This commit replaces needless calls with A -> B: - cli_malloc -> malloc - cli_calloc -> calloc - CLI_MALLOC -> MALLOC - CLI_CALLOC -> CALLOC I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by the max-allocation threshold, when MMAP is found/enabled. But the alternative was set to cli_malloc / cli_calloc when disabled. I changed those as well. I didn't change the cli_realloc/2 calls because our version of realloc not only implements a threshold but also stabilizes the undefined behavior in realloc to protect against accidental double-free's. It may be worth implementing a cli_realloc that doesn't have the threshold built-in, however, so as to allow reallocaitons for things like buffers for loading signatures, which aren't subject to the same concern as allocations for scanning possible malware. There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC, because it appears to be allocating based on untrusted input.

We have some special functions to wrap malloc, calloc, and realloc to make sure we don't allocate more than some limit, similar to the max-filesize and max-scansize limits. Our wrappers are really only needed when allocating memory for scans based on untrusted user input, where a scan file could have bytes that claim you need to allocate some ridiculous amount of memory. Right now they're named: - cli_malloc - cli_calloc - cli_realloc - cli_realloc2 ... and these names do not convey their purpose This commit renames them to: - cli_max_malloc - cli_max_calloc - cli_max_realloc - cli_max_realloc2 The realloc ones also have an additional feature in that they will not free your pointer if you try to realloc to 0 bytes. Freeing the memory is undefined by the C spec, and only done with some realloc implementations, so this stabilizes on the behavior of not doing that, which should prevent accidental double-free's. So for the case where you may want to realloc and do not need to have a maximum, this commit adds the following functions: - cli_safer_realloc - cli_safer_realloc2 These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when MPOOL is disabled (e.g. because mmap-support is not found), so as to match the behavior in the mpool_realloc/2 functions that do not make use of the allocation-limit.

Should be safe to allocate a smaller string than one we already have.

The cli_max_malloc, cli_max_calloc, and cli_max_realloc functions provide a way to protect against allocating too much memory when the size of the allocation is derived from the untrusted input. Specifically, we worry about values in the file being scanned being manipulated to exhaust the RAM and crash the application. There is no need to check the limits if the size of the allocation is fixed, or if the size of the allocation is necessary for signature loading, or the general operation of the applications. E.g. checking the max-allocation limit for the size of a hash, or for the size of the scan recursion stack, is a complete waste of time. Although we significantly increased the max-allocation limit in a recent release, it is best not to check an allocation if the allocation will be safe. It would be a waste of time. I am also hopeful that if we can reduce the number allocations that require a limit-check to those that require it for the safe scan of a file, then eventually we can store the limit in the scan- context, and make it configurable.

Some sort of code merge way-back-when resulted in two identical max-allocation checks. I removed the noisy ones.

A code merge resulted in a duplicate copy of the CLI_STRDUP macro. Also fixed formatting.

Bytecode signature's are able to allocate buffers, but should probably adhere to clamav's max allocation limit. This adds a check to make sure they don't accidentally alloc too much based on untrusted user input.

Allocations for bytecode signatures to work need not check against the memory allocation limit, as bytecode signatures are considered trusted user input. You may note that I did not remove allocation limits from the bytecode API functions that may be called by the signatures such as adding json objects, hashsets, lzma and bz2 decompressors, etc. This is because it is likely that a bytecode signature may call them more times based on the structure of the file being scanned - particularly for the json objects.

Variables like the number of signature parts are considered trusted user input and so allocations based on those values need not check the memory allocation limit. Specifically for the allocation of the normalized buffer in cli_scanscript, I determined that the size of SCANBUFF is fixed and so safe, and the maxpatlen comes from the signature load and is therefore also trusted, so we do not need to check the allocation limit.

We add the _OR_GOTO_DONE suffix to the macros that go to done if the allocation fails. This makes it obvious what is different about the macro versus the equivalent function, and that error handling is built-in. Renamed the cli_strdup to safer_strdup to make it obvious that it exists because it is safer than regular strdup. Regular strdup doesn't have the NULL check before trying to dup, and so may result in a NULL-deref crash. Also remove unused STRDUP (_OR_GOTO_DONE) macro, since the one with the NULL-check is preferred.

- Update from version 1.3.2 to 1.4.1 - Update of rootfile - Changelog 1.4.1 ClamAV 1.4.1 is a critical patch release with the following fixes: - [CVE-2024-20506](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-20506): Changed the logging module to disable following symlinks on Linux and Unix systems so as to prevent an attacker with existing access to the 'clamd' or 'freshclam' services from using a symlink to corrupt system files. This issue affects all currently supported versions. It will be fixed in: - 1.4.1 - 1.3.2 - 1.0.7 - 0.103.12 Thank you to Detlef for identifying this issue. - [CVE-2024-20505](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-20505): Fixed a possible out-of-bounds read bug in the PDF file parser that could cause a denial-of-service (DoS) condition. This issue affects all currently supported versions. It will be fixed in: - 1.4.1 - 1.3.2 - 1.0.7 - 0.103.12 Thank you to OSS-Fuzz for identifying this issue. - Removed unused Python modules from freshclam tests including deprecated 'cgi' module that is expected to cause test failures in Python 3.13. 1.4.0 Major changes - Added support for extracting ALZ archives. The new ClamAV file type for ALZ archives is `CL_TYPE_ALZ`. Added a [DCONF](https://docs.clamav.net/manual/Signatures/DynamicConfig.html) option to enable or disable ALZ archive support. > _Tip_: DCONF (Dynamic CONFiguration) is a feature that allows for some > configuration changes to be made via ClamAV `.cfg` "signatures". - [GitHub pull request](Cisco-Talos/clamav#1183) - Added support for extracting LHA/LZH archives. The new ClamAV file type for LHA/LZH archives is `CL_TYPE_LHA_LZH`. Added a [DCONF](https://docs.clamav.net/manual/Signatures/DynamicConfig.html) option to enable or disable LHA/LZH archive support. - [GitHub pull request](Cisco-Talos/clamav#1192) - Added the ability to disable image fuzzy hashing, if needed. For context, image fuzzy hashing is a detection mechanism useful for identifying malware by matching images included with the malware or phishing email/document. New ClamScan options: ``` --scan-image[=yes(*)/no] --scan-image-fuzzy-hash[=yes(*)/no] ``` New ClamD config options: ``` ScanImage yes(*)/no ScanImageFuzzyHash yes(*)/no ``` New libclamav scan options: ```c options.parse &= ~CL_SCAN_PARSE_IMAGE; options.parse &= ~CL_SCAN_PARSE_IMAGE_FUZZY_HASH; ``` Added a [DCONF](https://docs.clamav.net/manual/Signatures/DynamicConfig.html) option to enable or disable image fuzzy hashing support. - [GitHub pull request](Cisco-Talos/clamav#1186) Other improvements - Added cross-compiling instructions for targeting ARM64/aarch64 processors for [Windows](https://github.com/Cisco-Talos/clamav/blob/main/INSTALL-cross-windows-arm64.md) and [Linux](https://github.com/Cisco-Talos/clamav/blob/main/INSTALL-cross-linux-arm64.md). - [GitHub pull request](Cisco-Talos/clamav#1116) - Improved the Freshclam warning messages when being blocked or rate limited so as to include the Cloudflare Ray ID, which helps with issue triage. - [GitHub pull request](Cisco-Talos/clamav#1195) - Removed unnecessary memory allocation checks when the size to be allocated is fixed or comes from a trusted source. We also renamed internal memory allocation functions and macros, so it is more obvious what each function does. - [GitHub pull request](Cisco-Talos/clamav#1137) - Improved the Freshclam documentation to make it clear that the `--datadir` option must be an absolute path to a directory that already exists, is writable by Freshclam, and is readable by ClamScan and ClamD. - [GitHub pull request](Cisco-Talos/clamav#1199) - Added an optimization to avoid calculating the file hash if the clean file cache has been disabled. The file hash may still be calculated as needed to perform hash-based signature matching if any hash-based signatures exist that target a file of the same size, or if any hash-based signatures exist that target "any" file size. - [GitHub pull request](Cisco-Talos/clamav#1167) - Added an improvement to the SystemD service file for ClamOnAcc so that the service will shut down faster on some systems. - [GitHub pull request](Cisco-Talos/clamav#1164) - Added a CMake build dependency on the version map files so that the build will re-run if changes are made to the version map files. Work courtesy of Sebastian Andrzej Siewior. - [GitHub pull request](Cisco-Talos/clamav#1294) - Added an improvement to the CMake build so that the RUSTFLAGS settings are inherited from the environment. Work courtesy of liushuyu. - [GitHub pull request](Cisco-Talos/clamav#1301) Bug fixes - Silenced confusing warning message when scanning some HTML files. - [GitHub pull request](Cisco-Talos/clamav#1252) - Fixed minor compiler warnings. - [GitHub pull request](Cisco-Talos/clamav#1197) - Since the build system changed from Autotools to CMake, ClamAV no longer supports building with configurations where bzip2, libxml2, libz, libjson-c, or libpcre2 are not available. Libpcre is no longer supported in favor of libpcre2. In this release, we removed all the dead code associated with those unsupported build configurations. - [GitHub pull request](Cisco-Talos/clamav#1217) - Fixed assorted typos. Patch courtesy of RainRat. - [GitHub pull request](Cisco-Talos/clamav#1228) - Added missing documentation for the ClamScan `--force-to-disk` option. - [GitHub pull request](Cisco-Talos/clamav#1186) - Fixed an issue where ClamAV unit tests would prefer an older libclamunrar_iface library from the install path, if present, rather than the recently compiled library in the build path. - [GitHub pull request](Cisco-Talos/clamav#1258) - Fixed a build issue on Windows with newer versions of Rust. Also upgraded GitHub Actions imports to fix CI failures. Fixes courtesy of liushuyu. - [GitHub pull request](Cisco-Talos/clamav#1307) - Fixed an unaligned pointer dereference issue on select architectures. Fix courtesy of Sebastian Andrzej Siewior. - [GitHub pull request](Cisco-Talos/clamav#1293) - Fixed a bug that prevented loading plaintext (non-CVD) signature files when using the `--fail-if-cvd-older-than=DAYS` / `FailIfCvdOlderThan` option. Fix courtesy of Bark. - [GitHub pull request](Cisco-Talos/clamav#1309) Signed-off-by: Adolf Belka <adolf.belka@ipfire.org> Signed-off-by: Arne Fitzenreiter <arne_f@ipfire.org>

val-ms requested review from ragusaa and TheRaynMan January 10, 2024 00:08

github-advanced-security bot found potential problems Jan 10, 2024

View reviewed changes

val-ms force-pushed the remove-unnecessary-cli_-allocations branch from c4eac44 to 81449c5 Compare January 10, 2024 15:51

val-ms mentioned this pull request Jan 12, 2024

Unable to load massive databases #1118

Open

val-ms force-pushed the remove-unnecessary-cli_-allocations branch from 81449c5 to 1d09996 Compare February 28, 2024 21:02

ragusaa suggested changes Mar 7, 2024

View reviewed changes

clamonacc/inotif/inotif.c Show resolved Hide resolved

clamonacc/inotif/inotif.c Show resolved Hide resolved

clamonacc/inotif/inotif.c Show resolved Hide resolved

val-ms requested a review from ragusaa March 12, 2024 16:22

ragusaa approved these changes Mar 12, 2024

View reviewed changes

val-ms added 12 commits March 15, 2024 10:28

Use regular malloc instead of max_alloc for utf16->8 conversion

120ad48

Should be safe to allocate a smaller string than one we already have.

Remove duplicate max-alloc checks for lzma and 7z alloc functions

42145e8

Some sort of code merge way-back-when resulted in two identical max-allocation checks. I removed the noisy ones.

Remove duplicate copy of CLI_STRDUP macro

5255d67

A code merge resulted in a duplicate copy of the CLI_STRDUP macro. Also fixed formatting.

Add max-allocation limit to bytecode API's malloc function

6d354ca

Bytecode signature's are able to allocate buffers, but should probably adhere to clamav's max allocation limit. This adds a check to make sure they don't accidentally alloc too much based on untrusted user input.

Fix GitHub code scan issues

143040c

Clang-format touchup

b3c9a56

val-ms force-pushed the remove-unnecessary-cli_-allocations branch from 1d09996 to b3c9a56 Compare March 15, 2024 14:39

val-ms merged commit 1e5ddef into Cisco-Talos:main Mar 15, 2024
22 of 24 checks passed

val-ms deleted the remove-unnecessary-cli_-allocations branch March 15, 2024 17:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove unnecessary memory allocation limit checks and rename max-allocation related functions #1137

Remove unnecessary memory allocation limit checks and rename max-allocation related functions #1137

val-ms commented Jan 10, 2024

Remove unnecessary memory allocation limit checks and rename max-allocation related functions #1137

Remove unnecessary memory allocation limit checks and rename max-allocation related functions #1137

Conversation

val-ms commented Jan 10, 2024