Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix oss-fuzz case 55714 #3476

Merged
merged 9 commits into from
Feb 8, 2023
Merged

fix oss-fuzz case 55714 #3476

merged 9 commits into from
Feb 8, 2023

Conversation

Cyan4973
Copy link
Contributor

@Cyan4973 Cyan4973 commented Feb 6, 2023

This issue, discovered by oss-fuzz, can only happen with the following conditions :

  • Support for legacy format v0.3 enabled (note: support for format 0.3 is disabled by default)
  • compiled in 32-bit mode (not reproduced in 64-bit mode)

that is to say, this scenario is very unlikely to be reproducible in the wild.
Fixing it nonetheless for completeness.

The fix is trivial, though it's likely going to cost some performance. But at this stage, it's fair to say we don't care about the performance of the deprecated legacy decoder v0.3 anymore. There shouldn't be any data left using this old short lived legacy format.

@@ -2711,7 +2711,7 @@ static size_t ZSTD_execSequence(BYTE* op,
if (litEnd > litLimit) return ERROR(corruption_detected); /* overRead beyond lit buffer */

/* copy Literals */
ZSTD_wildcopy(op, *litPtr, sequence.litLength); /* note : oLitEnd <= oend-8 : no risk of overwrite beyond oend */
ZSTD_memmove(op, *litPtr, sequence.litLength); /* note : used to be wildCopy, changed to fix a bug in 32-bit mode (oss-fuzz case 55714) */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this fix is sufficient. We're supposed to have 8 extra bytes in the litBuffer, so this should be valid as long as litEnd <= litLimit.

I think the problem we're running into is that the limit checks should be re-written to avoid overflow. We don't care about speed, so we should do the obviously correct thing.

The offset check also looks janky, so we should clean that up as well.

E.g.

size_t const seqLength = sequence.litLength + sequence.matchLength;

if (seqLength > (size_t)(oend - op)) return ERROR(dstSize_tooSmall);
if (sequence.litLength > (size_t)(litLimit -*litPtr)) return ERROR(corruption_detected);
/* Now we know we don't have any overflow in literal / match lengths, can use the pointer check for oend_8*/
if (oLitEnd > oend_8) return ERROR(dstSize_tooSmall);
if (sequence.offset > (U32)(oLitEnd - base)) return ERROR(corruption_detected);

This bug is present in every version of the legacy decoder, so we'll have to fix every one.

Copy link
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small changes left. And I think we also need to add the same checks to zstd_v07.c

if (endMatch > oend) return ERROR(dstSize_tooSmall); /* overwrite beyond dst buffer */
if (litEnd > litLimit) return ERROR(corruption_detected);
if (sequence.matchLength > (size_t)(*litPtr-op)) return ERROR(dstSize_tooSmall); /* overwrite literal segment */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we need to keep this check, because the literals are put in the output buffer, but oend isn't reduced.

if (sequence.litLength > (size_t)(litLimit - *litPtr)) return ERROR(corruption_detected);
/* Now we know there are no overflow in literal nor match lengths, can use pointer checks */
if (oLitEnd > oend_8) return ERROR(dstSize_tooSmall);
if (sequence.offset > (U32)(oLitEnd - base)) return ERROR(corruption_detected);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we have extDict starting in v0.4. But the offset check below looks correct, so we can just delete this starting in this version.

Suggested change
if (sequence.offset > (U32)(oLitEnd - base)) return ERROR(corruption_detected);

if (sequence.litLength > (size_t)(litLimit - *litPtr)) return ERROR(corruption_detected);
/* Now we know there are no overflow in literal nor match lengths, can use pointer checks */
if (oLitEnd > oend_8) return ERROR(dstSize_tooSmall);
if (sequence.offset > (U32)(oLitEnd - base)) return ERROR(corruption_detected);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (sequence.offset > (U32)(oLitEnd - base)) return ERROR(corruption_detected);

if (sequence.litLength > (size_t)(litLimit - *litPtr)) return ERROR(corruption_detected);
/* Now we know there are no overflow in literal nor match lengths, can use pointer checks */
if (oLitEnd > oend_8) return ERROR(dstSize_tooSmall);
if (sequence.offset > (U32)(oLitEnd - base)) return ERROR(corruption_detected);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (sequence.offset > (U32)(oLitEnd - base)) return ERROR(corruption_detected);

@terrelln
Copy link
Contributor

terrelln commented Feb 7, 2023

@Cyan4973 the version compatibility test did catch the bug in v0.5 in the dictionary decompression tests. We don't test v0.4 and earlier in those tests though.

Cyan4973 and others added 7 commits February 7, 2023 13:55
impacts legacy decoder v0.3 in 32-bit mode
in case it would be applicable here too.
slightly different constraints on end of buffer conditions
in case it would be applicable for this legacy version too.
in case it would be applicable for this version too
in case it would applicable for this version
@Cyan4973
Copy link
Contributor Author

Cyan4973 commented Feb 7, 2023

v0.4 legacy decoder and above can be easily tested, using zstd -t.
The update (published above) is now able to decode to decode valid frames.

Below v0.3, this is more complex though, due to the lack of streaming capability.
For these cases, I modified the benchmark module, to trigger the one-pass decompression function and replace the size-query functions.

With this modified benchmark module (not published here), I could verify that the fixes for legacy decoders v0.1, v0.2 and v0.3 are still able to decode valid frames (generated with the corresponding zstd versions).

which uses a different technique to store literals,
and therefore must check for potential overwrites.
@terrelln
Copy link
Contributor

terrelln commented Feb 7, 2023

@Cyan4973 do we also need to fix v0.7?

@terrelln
Copy link
Contributor

terrelln commented Feb 7, 2023

Otherwise the PR looks good!

@Cyan4973
Copy link
Contributor Author

Cyan4973 commented Feb 7, 2023

@Cyan4973 do we also need to fix v0.7?

Well, it's not clear if that would be useful.
Previous versions v0.4 v0.5 v0.6 have been pre-emptively fixed on the ground of similarities with v0.3, but we are not even sure if there was really a problem.
On the other hand, execSequence in v0.7 looks significantly different.
And it also has been subject to much more intensive fuzzing (than v0.3).

@terrelln
Copy link
Contributor

terrelln commented Feb 8, 2023

@Cyan4973 I see the same potential 32-bit pointer overflows in v0.7:

zstd/lib/legacy/zstd_v07.c

Lines 3555 to 3556 in df21ace

if ((oLitEnd>oend_w) | (oMatchEnd>oend)) return ERROR(dstSize_tooSmall); /* last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend */
if (iLitEnd > litLimit) return ERROR(corruption_detected); /* over-read beyond lit buffer */

v0.7 has been subject to much more intensive fuzzing up to now.

We've built all of our OSS-Fuzz fuzzers with all legacy versions since 2019. But we haven't run any 32-bit fuzzers, where this issue could occur, until just now.

@terrelln
Copy link
Contributor

terrelln commented Feb 8, 2023

we are not even sure if there was really a problem

There is definitely a pointer overflow bug in 32-bit mode. We have a similar fix in dev, but I guess we didn't apply it to all of our legacy decoders:

(MEM_32bits() && (size_t)(oend - op) < sequenceLength + WILDCOPY_OVERLENGTH)))

The bug in v0.3 is occurring because of a pointer overflow. It happens that v0.3 allows very large literal lengths (in the fuzzed example it is 1965917), which makes it easier to happen. But AFAIK there is no reason that the output or literals pointer cannot be within 128 KB of the end of the address space.

I'm also wondering if we need to guard against the end of the literals buffer being within 128KB of the end of the address space in dev. We can consider that in a separate PR though.

@terrelln
Copy link
Contributor

terrelln commented Feb 8, 2023

Versions v0.4 and v0.5 definitely have the same bug, because they both allow literal/match lengths up to ~2^24.

v0.6 onwards do cap lengths at ~2^17, so aren't as susceptible, but we should definitely still fix it.

@Cyan4973
Copy link
Contributor Author

Cyan4973 commented Feb 8, 2023

OK, checks of v0.7 have been modified in a way which should be independent of address space overflow.

@Cyan4973 Cyan4973 merged commit 488f7c0 into dev Feb 8, 2023
@Cyan4973 Cyan4973 deleted the fix55714 branch February 10, 2023 00:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants