Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary memory allocation limit checks and rename max-allocation related functions #1137

Merged
merged 12 commits into from
Mar 15, 2024

Conversation

val-ms
Copy link
Contributor

@val-ms val-ms commented Jan 10, 2024

The goal of the ClamAV's functions like "cli_malloc" and "cli_realloc" are to check that the size of the allocation is not unreasonable. But we only need to do that check when the size is derived from untrusted user input (aka may vary based on values in the possibly malicious file that we're scanning).

There are many calls to allocate memory where the size of the allocation is either known at compile time or else is derived from trusted user input. Examples of variable but trusted allocation sizes include:

  • the number of parts of a signature
  • or the number of blocks in a bytecode function.

This PR seeks to eliminate those unnecessary max-allocation checks. This is mostly because checking the limits in these cases is a waste of CPU. But it also sets groundwork for a future improvement where we may pass in the memory allocation limit as a parameter, and make it generally configurable for the scanning service. This would be meaningful for anyone wishing to crank up the max-filesize to scan large archives that decompress to very large amounts of data.

Further, this PR renames the max-allocation functions and related functions and macros so it's obvious what they do. I suspect that the reason the likes of "cli_calloc" and "cli_realloc" were so widely used for fixed size allocations is that the developers knew we had a special wrapper for allocating memory and didn't know when they should be used.

What's also drawing attention to is the functions that wrap allocations but do NOT check the allocation-limit. These are cli_strdup, cli_realloc, and cli_realloc2. These functions are safer than the original strdup and realloc functions from the C "standard library".

  • For strdup, our wrapper adds a NULL check to prevent accidental NULL-deref crashes.
  • For realloc, the "standard library" one is not standard. Some variations may free the pointer if size == 0. Some do not. Ours wrapper does not. In this way, we know the behavior will be consistent. Our version also has two variants. The first, cli_realloc does not free the pointer on failure. The second, cli_realloc2, does free the pointer on failure. That second behavior is generally less desirable because you must remember to set the pointer to NULL before goto done; or else you risk a double-free. I have not removed the latter variant simply because it's a lot of work, so there will still be two variants. In this PR, I improve the documentation and I rename those two so you can easily tell the difference.

Also, for both the strdup and realloc wrappers, we now have a "cli_safer_" prefix on them when they do not check the the allocation limit. When we do check the allocation limit, we instead use a "cli_max_" prefix.

Finally, for the macros which wrap the allocation-related functions, I've added a "_OR_GOTO_DONE" suffix to the macro names. This makes it clear what is special about the macros, and that error handling is built int. If the allocation fails, then we goto done;.

In summary, the functions and macros are now:

void *cli_max_malloc(size_t nmemb);
void *cli_max_calloc(size_t nmemb, size_t size);
void *cli_max_realloc(void *ptr, size_t size);
void *cli_max_realloc_or_free(void *ptr, size_t size);
void *cli_safer_realloc(void *ptr, size_t size);
void *cli_safer_realloc_or_free(void *ptr, size_t size);
char *cli_safer_strdup(const char *s);

#define CLI_SAFER_STRDUP_OR_GOTO_DONE(buf, var, ...) \
...
#define CLI_FREE_AND_SET_NULL(var) \
...
#define CLI_MALLOC_OR_GOTO_DONE(var, size, ...) \
...
#define CLI_MAX_MALLOC_OR_GOTO_DONE(var, size, ...) \
...
#define CLI_CALLOC_OR_GOTO_DONE(var, nmemb, size, ...) \
...
#define CLI_MAX_CALLOC_OR_GOTO_DONE(var, nmemb, size, ...) \
...
#define CLI_VERIFY_POINTER_OR_GOTO_DONE(ptr, ...) \
...
#define CLI_MAX_REALLOC_OR_GOTO_DONE(ptr, size, ...) \
...
#define CLI_SAFER_REALLOC_OR_GOTO_DONE(ptr, size, ...) \
...

@val-ms val-ms requested review from ragusaa and TheRaynMan January 10, 2024 00:08
libclamav/bytecode_api.c Fixed Show fixed Hide fixed
libclamav/hashtab.c Fixed Show fixed Hide fixed
libclamav/nsis/bzlib.c Fixed Show fixed Hide fixed
libclamav/pe_icons.c Fixed Show fixed Hide fixed
libclamav/pe_icons.c Fixed Show fixed Hide fixed
libclamav/pe_icons.c Fixed Show fixed Hide fixed
@val-ms val-ms force-pushed the remove-unnecessary-cli_-allocations branch from c4eac44 to 81449c5 Compare January 10, 2024 15:51
@val-ms val-ms force-pushed the remove-unnecessary-cli_-allocations branch from 81449c5 to 1d09996 Compare February 28, 2024 21:02
clamonacc/inotif/inotif.c Show resolved Hide resolved
clamonacc/inotif/inotif.c Show resolved Hide resolved
clamonacc/inotif/inotif.c Show resolved Hide resolved
@val-ms val-ms requested a review from ragusaa March 12, 2024 16:22
val-ms added 12 commits March 15, 2024 10:28
There are a large number of allocations for fix sized buffers using the
`cli_malloc` and `cli_calloc` calls that check if the requested size is
larger than our allocation threshold for allocations based on untrusted
input. These allocations will *always* be higher than the threshold, so
the extra stack frame and check for these calls is a waste of CPU.

This commit replaces needless calls with A -> B:
- cli_malloc -> malloc
- cli_calloc -> calloc
- CLI_MALLOC -> MALLOC
- CLI_CALLOC -> CALLOC

I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by
the max-allocation threshold, when MMAP is found/enabled. But the
alternative was set to cli_malloc / cli_calloc when disabled. I changed
those as well.

I didn't change the cli_realloc/2 calls because our version of realloc
not only implements a threshold but also stabilizes the undefined
behavior in realloc to protect against accidental double-free's.
It may be worth implementing a cli_realloc that doesn't have the
threshold built-in, however, so as to allow reallocaitons for things
like buffers for loading signatures, which aren't subject to the same
concern as allocations for scanning possible malware.

There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC,
because it appears to be allocating based on untrusted input.
We have some special functions to wrap malloc, calloc, and realloc to
make sure we don't allocate more than some limit, similar to the
max-filesize and max-scansize limits. Our wrappers are really only
needed when allocating memory for scans based on untrusted user input,
where a scan file could have bytes that claim you need to allocate
some ridiculous amount of memory. Right now they're named:
- cli_malloc
- cli_calloc
- cli_realloc
- cli_realloc2

... and these names do not convey their purpose

This commit renames them to:
- cli_max_malloc
- cli_max_calloc
- cli_max_realloc
- cli_max_realloc2

The realloc ones also have an additional feature in that they will not
free your pointer if you try to realloc to 0 bytes. Freeing the memory
is undefined by the C spec, and only done with some realloc
implementations, so this stabilizes on the behavior of not doing that,
which should prevent accidental double-free's.

So for the case where you may want to realloc and do not need to have a
maximum, this commit adds the following functions:
- cli_safer_realloc
- cli_safer_realloc2

These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when
MPOOL is disabled (e.g. because mmap-support is not found), so as to
match the behavior in the mpool_realloc/2 functions that do not make use
of the allocation-limit.
Should be safe to allocate a smaller string than one we already have.
The cli_max_malloc, cli_max_calloc, and cli_max_realloc functions
provide a way to protect against allocating too much memory
when the size of the allocation is derived from the untrusted input.
Specifically, we worry about values in the file being scanned being
manipulated to exhaust the RAM and crash the application.

There is no need to check the limits if the size of the allocation
is fixed, or if the size of the allocation is necessary for signature
loading, or the general operation of the applications.
E.g. checking the max-allocation limit for the size of a hash, or
for the size of the scan recursion stack, is a complete waste of
time.

Although we significantly increased the max-allocation limit in
a recent release, it is best not to check an allocation if the
allocation will be safe. It would be a waste of time.

I am also hopeful that if we can reduce the number allocations
that require a limit-check to those that require it for the safe
scan of a file, then eventually we can store the limit in the scan-
context, and make it configurable.
Some sort of code merge way-back-when resulted in two identical
max-allocation checks. I removed the noisy ones.
A code merge resulted in a duplicate copy of the CLI_STRDUP macro.
Also fixed formatting.
Bytecode signature's are able to allocate buffers, but should probably
adhere to clamav's max allocation limit.  This adds a check to make sure
they don't accidentally alloc too much based on untrusted user input.
Allocations for bytecode signatures to work need not check against the
memory allocation limit, as bytecode signatures are considered trusted
user input.

You may note that I did not remove allocation limits from the bytecode
API functions that may be called by the signatures such as adding json
objects, hashsets, lzma and bz2 decompressors, etc. This is because it
is likely that a bytecode signature may call them more times based on
the structure of the file being scanned - particularly for the json objects.
Variables like the number of signature parts are considered trusted user input
and so allocations based on those values need not check the memory allocation
limit.

Specifically for the allocation of the normalized buffer in cli_scanscript,
I determined that the size of SCANBUFF is fixed and so safe, and the maxpatlen
comes from the signature load and is therefore also trusted, so we do not
need to check the allocation limit.
We add the _OR_GOTO_DONE suffix to the macros that go to done if the
allocation fails. This makes it obvious what is different about the
macro versus the equivalent function, and that error handling is
built-in.

Renamed the cli_strdup to safer_strdup to make it obvious that it exists
because it is safer than regular strdup. Regular strdup doesn't have the
NULL check before trying to dup, and so may result in a NULL-deref
crash.

Also remove unused STRDUP (_OR_GOTO_DONE) macro, since the one with the
NULL-check is preferred.
@val-ms val-ms force-pushed the remove-unnecessary-cli_-allocations branch from 1d09996 to b3c9a56 Compare March 15, 2024 14:39
@val-ms val-ms merged commit 1e5ddef into Cisco-Talos:main Mar 15, 2024
22 of 24 checks passed
@val-ms val-ms deleted the remove-unnecessary-cli_-allocations branch March 15, 2024 17:18
mtremer pushed a commit to ipfire/ipfire-2.x that referenced this pull request Jan 9, 2025
- Update from version 1.3.2 to 1.4.1
- Update of rootfile
- Changelog
    1.4.1
	ClamAV 1.4.1 is a critical patch release with the following fixes:
	- [CVE-2024-20506](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-20506):
	  Changed the logging module to disable following symlinks on Linux and Unix
	  systems so as to prevent an attacker with existing access to the 'clamd' or
	  'freshclam' services from using a symlink to corrupt system files.
	  This issue affects all currently supported versions. It will be fixed in:
	  - 1.4.1
	  - 1.3.2
	  - 1.0.7
	  - 0.103.12
	  Thank you to Detlef for identifying this issue.
	- [CVE-2024-20505](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-20505):
	  Fixed a possible out-of-bounds read bug in the PDF file parser that could
	  cause a denial-of-service (DoS) condition.
	  This issue affects all currently supported versions. It will be fixed in:
	  - 1.4.1
	  - 1.3.2
	  - 1.0.7
	  - 0.103.12
	  Thank you to OSS-Fuzz for identifying this issue.
	- Removed unused Python modules from freshclam tests including deprecated
	  'cgi' module that is expected to cause test failures in Python 3.13.
    1.4.0
      Major changes
	- Added support for extracting ALZ archives.
	  The new ClamAV file type for ALZ archives is `CL_TYPE_ALZ`.
	  Added a [DCONF](https://docs.clamav.net/manual/Signatures/DynamicConfig.html)
	  option to enable or disable ALZ archive support.
	  > _Tip_: DCONF (Dynamic CONFiguration) is a feature that allows for some
	  > configuration changes to be made via ClamAV `.cfg` "signatures".
	  - [GitHub pull request](Cisco-Talos/clamav#1183)
	- Added support for extracting LHA/LZH archives.
	  The new ClamAV file type for LHA/LZH archives is `CL_TYPE_LHA_LZH`.
	  Added a [DCONF](https://docs.clamav.net/manual/Signatures/DynamicConfig.html)
	  option to enable or disable LHA/LZH archive support.
	  - [GitHub pull request](Cisco-Talos/clamav#1192)
	- Added the ability to disable image fuzzy hashing, if needed. For context,
	  image fuzzy hashing is a detection mechanism useful for identifying malware
	  by matching images included with the malware or phishing email/document.
	  New ClamScan options:
	  ```
	  --scan-image[=yes(*)/no]
	  --scan-image-fuzzy-hash[=yes(*)/no]
	  ```
	  New ClamD config options:
	  ```
	  ScanImage yes(*)/no
	  ScanImageFuzzyHash yes(*)/no
	  ```
	  New libclamav scan options:
	  ```c
	  options.parse &= ~CL_SCAN_PARSE_IMAGE;
	  options.parse &= ~CL_SCAN_PARSE_IMAGE_FUZZY_HASH;
	  ```
	  Added a [DCONF](https://docs.clamav.net/manual/Signatures/DynamicConfig.html)
	  option to enable or disable image fuzzy hashing support.
	  - [GitHub pull request](Cisco-Talos/clamav#1186)
      Other improvements
	- Added cross-compiling instructions for targeting ARM64/aarch64 processors for
	  [Windows](https://github.com/Cisco-Talos/clamav/blob/main/INSTALL-cross-windows-arm64.md)
	  and
	  [Linux](https://github.com/Cisco-Talos/clamav/blob/main/INSTALL-cross-linux-arm64.md).
	  - [GitHub pull request](Cisco-Talos/clamav#1116)
	- Improved the Freshclam warning messages when being blocked or rate limited
	  so as to include the Cloudflare Ray ID, which helps with issue triage.
	  - [GitHub pull request](Cisco-Talos/clamav#1195)
	- Removed unnecessary memory allocation checks when the size to be allocated
	  is fixed or comes from a trusted source.
	  We also renamed internal memory allocation functions and macros, so it is
	  more obvious what each function does.
	  - [GitHub pull request](Cisco-Talos/clamav#1137)
	- Improved the Freshclam documentation to make it clear that the `--datadir`
	  option must be an absolute path to a directory that already exists, is
	  writable by Freshclam, and is readable by ClamScan and ClamD.
	  - [GitHub pull request](Cisco-Talos/clamav#1199)
	- Added an optimization to avoid calculating the file hash if the clean file
	  cache has been disabled. The file hash may still be calculated as needed to
	  perform hash-based signature matching if any hash-based signatures exist that
	  target a file of the same size, or if any hash-based signatures exist that
	  target "any" file size.
	  - [GitHub pull request](Cisco-Talos/clamav#1167)
	- Added an improvement to the SystemD service file for ClamOnAcc so that the
	  service will shut down faster on some systems.
	  - [GitHub pull request](Cisco-Talos/clamav#1164)
	- Added a CMake build dependency on the version map files so that the build
	  will re-run if changes are made to the version map files.
	  Work courtesy of Sebastian Andrzej Siewior.
	  - [GitHub pull request](Cisco-Talos/clamav#1294)
	- Added an improvement to the CMake build so that the RUSTFLAGS settings
	  are inherited from the environment.
	  Work courtesy of liushuyu.
	  - [GitHub pull request](Cisco-Talos/clamav#1301)
      Bug fixes
	- Silenced confusing warning message when scanning some HTML files.
	  - [GitHub pull request](Cisco-Talos/clamav#1252)
	- Fixed minor compiler warnings.
	  - [GitHub pull request](Cisco-Talos/clamav#1197)
	- Since the build system changed from Autotools to CMake, ClamAV no longer
	  supports building with configurations where bzip2, libxml2, libz, libjson-c,
	  or libpcre2 are not available. Libpcre is no longer supported in favor of
	  libpcre2. In this release, we removed all the dead code associated with those
	  unsupported build configurations.
	  - [GitHub pull request](Cisco-Talos/clamav#1217)
	- Fixed assorted typos. Patch courtesy of RainRat.
	  - [GitHub pull request](Cisco-Talos/clamav#1228)
	- Added missing documentation for the ClamScan `--force-to-disk` option.
	  - [GitHub pull request](Cisco-Talos/clamav#1186)
	- Fixed an issue where ClamAV unit tests would prefer an older
	  libclamunrar_iface library from the install path, if present, rather than
	  the recently compiled library in the build path.
	  - [GitHub pull request](Cisco-Talos/clamav#1258)
	- Fixed a build issue on Windows with newer versions of Rust.
	  Also upgraded GitHub Actions imports to fix CI failures.
	  Fixes courtesy of liushuyu.
	  - [GitHub pull request](Cisco-Talos/clamav#1307)
	- Fixed an unaligned pointer dereference issue on select architectures.
	  Fix courtesy of Sebastian Andrzej Siewior.
	  - [GitHub pull request](Cisco-Talos/clamav#1293)
	- Fixed a bug that prevented loading plaintext (non-CVD) signature files
	  when using the `--fail-if-cvd-older-than=DAYS` / `FailIfCvdOlderThan` option.
	  Fix courtesy of Bark.
	  - [GitHub pull request](Cisco-Talos/clamav#1309)

Signed-off-by: Adolf Belka <adolf.belka@ipfire.org>
Signed-off-by: Arne Fitzenreiter <arne_f@ipfire.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants