Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvement the int conversion overflow logic to handle bound checks #1194

Merged
merged 21 commits into from
Sep 4, 2024

Conversation

czechbol
Copy link
Contributor

@czechbol czechbol commented Aug 27, 2024

As per discussion in #1187, the implementation was lacking. This PR extends the bounds check logic to validate that the integer size is checked within the bounds of the destination type.

I'd like to hear @rittneje's opinion on the changes as they reported the issues with it and @ben-krieger's too since he is already looking into it as well.

fixes #1187
fixes #1202

@codecov-commenter
Copy link

codecov-commenter commented Aug 27, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 81.65138% with 40 lines in your changes missing coverage. Please review.

Project coverage is 68.48%. Comparing base (ea5b276) to head (9f0427a).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
analyzers/conversion_overflow.go 81.65% 22 Missing and 18 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1194      +/-   ##
==========================================
+ Coverage   67.84%   68.48%   +0.64%     
==========================================
  Files          75       75              
  Lines        4186     4360     +174     
==========================================
+ Hits         2840     2986     +146     
- Misses       1210     1226      +16     
- Partials      136      148      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@ben-krieger ben-krieger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments on how I might make the code a bit more clear. Feel free to completely ignore.

I did test this out on my codebase locally and it removed the same false negatives as my PR #1193. Therefore, I will be closing my PR in favor of yours.

analyzers/conversion_overflow.go Outdated Show resolved Hide resolved
analyzers/conversion_overflow.go Outdated Show resolved Hide resolved
analyzers/conversion_overflow.go Outdated Show resolved Hide resolved
analyzers/conversion_overflow.go Outdated Show resolved Hide resolved
analyzers/conversion_overflow.go Outdated Show resolved Hide resolved
analyzers/conversion_overflow.go Outdated Show resolved Hide resolved
analyzers/conversion_overflow.go Outdated Show resolved Hide resolved
@rittneje
Copy link

@czechbol Unfortunately, I've never worked with the Go parser/AST before, so I won't be much help.

@czechbol
Copy link
Contributor Author

I apologise for the email spam. Got my github accounts mixed up.

Everything should now be accounted for apart from this comment - #1194 (comment).

I'm still thinking through how to tackle it.

@ccojocar
Copy link
Member

Everything should now be accounted for apart from this comment - #1194 (comment).

I'm still thinking through how to tackle it.

@czechbol Thanks for addressing the review comments. Please just let me know when this is ready for final review.

@ccojocar
Copy link
Member

I think this requires a rebase, since there are some conflicts in testutils/g115_samples.go which need to be resolved.

@ben-krieger
Copy link
Contributor

Unfortunately with the latest changes, I've seen two regressions running gosec on my private codebase.

First, is a case of:

	xbLen := len(xb)
	if xbLen > math.MaxUint16 {
		return nil, fmt.Errorf("x contains too many bytes")
	}

	b = binary.BigEndian.AppendUint16(b, uint16(xbLen))

Second is a case of:

	numEntries := len(ov.Entries)
	if numEntries > math.MaxUint8 {
		return nil, fmt.Errorf(...)
	}

	_ = uint8(numEntries)

I'm guessing this is because int -> uintX doesn't guarantee that the value is > 0. However, len semantically does... This may be a little hard to check for, but I think it's worthwhile.

@czechbol
Copy link
Contributor Author

I think this requires a rebase, since there are some conflicts in testutils/g115_samples.go which need to be resolved.

sure, that is not an issue

I'm guessing this is because int -> uintX doesn't guarantee that the value is > 0. However, len semantically does... This may be a little hard to check for, but I think it's worthwhile.

I'll look into it too after I figure out the conditional stuff.

@czechbol czechbol force-pushed the improvement/int-overflow-logic branch from ae6860c to 529f6b4 Compare August 28, 2024 14:35
@czechbol
Copy link
Contributor Author

czechbol commented Aug 28, 2024

This code is definitely not final. It will most likely need some refactoring after the following cases are handled as well.

I currently have some way to handle AND and OR conditions in the if statements. Since the SSA does not give me a nice way to check the entire if statement and breaks it up into component instructions, I need to look through them to determine their relationship. If you have suggestions how to do it differently, please let me know.

Currently unhandled cases that I'm aware of:

  • Convert instruction inside the if block. Even if I'm not a fan of this code style, the false positives should be resolved.
  • NOT operation within the conditions.

I hope to be able to solve these in the coming days. Some help is also welcome if anyone wants to get this done sooner.

@rittneje
Copy link

@czechbol I am currently getting a panic on the following input:

package main

import "math"

func foo(x []string) int16 {
	if len(x) < math.MaxInt16 {
		return int16(len(x))
	}
	return 0
}

func main() {

}

panic: runtime error: index out of range [1] with length 1

goroutine 1 [running]:
github.com/securego/gosec/v2/analyzers.checkIfForRangeCheck(0xc0004aa048, 0xc0008107d0, {0xf8?, 0x7f61505e3178?}, 0xc000cab570)
/home/rittneje/gosec/analyzers/conversion_overflow.go:211 +0x568
github.com/securego/gosec/v2/analyzers.hasExplicitRangeCheck(0xc0008107d0, {0xc000289cc0?, 0x5?})
/home/rittneje/gosec/analyzers/conversion_overflow.go:176 +0x2b9
github.com/securego/gosec/v2/analyzers.isSafeConversion(0xc0008107d0)
/home/rittneje/gosec/analyzers/conversion_overflow.go:93 +0xaa
github.com/securego/gosec/v2/analyzers.runConversionOverflow(0xc000190a80)
/home/rittneje/gosec/analyzers/conversion_overflow.go:54 +0x27e
github.com/securego/gosec/v2.(*Analyzer).CheckAnalyzers(0xc0007c6080, 0xc00032cd80)
/home/rittneje/gosec/analyzer.go:446 +0x4a2
github.com/securego/gosec/v2.(*Analyzer).Process(0xc0007c6080, {0x0, 0x0, 0x0}, {0xc0007d6c50, 0x1, 0x3d?})
/home/rittneje/gosec/analyzer.go:317 +0x488
main.main()
/home/rittneje/gosec/cmd/gosec/main.go:476 +0xde5

@rittneje
Copy link

rittneje commented Aug 28, 2024

@czechbol I haven't thought through it much, but I'm wondering if it would be better to keep track of the known minimum and maximum bounds themselves, instead of just flags. That way we have a good foundation for adding support for more interesting cases in the future. As a rough idea:

func foo(x int64) int16 {
    // Initially our "context" is that x is in the range [MinInt64, MaxInt64].

    if x > math.MaxInt16 {
        return math.MaxInt16
    }

    // Now because of the return, we can negate the if check and "apply" it to the context.
    // Thus x is in the range [MinInt64, MaxInt16].

    if x < math.MinInt16 {
        return math.MinInt16
    }

    // Again because of the return, we can negate the if check and "apply" it to the context.
    // Thus x is in the range [MinInt16, MaxInt16].

    // We know that x must be in range, so this conversion is safe.
    return int16(x)
}

In the future, I think this bookkeeping could help with situations involving arithmetic:

func foo(x uint64) uint16 {
   if x > math.MaxUint16 - 1 {
       return math.MaxUint16
   }
    return uint16(x + 1)
}

@ben-krieger
Copy link
Contributor

@czechbol I am currently getting a panic on the following input:

package main

import "math"

func foo(x []string) int16 {
	if len(x) < math.MaxInt16 {
		return int16(len(x))
	}
	return 0
}

func main() {

}

panic: runtime error: index out of range [1] with length 1
goroutine 1 [running]:
github.com/securego/gosec/v2/analyzers.checkIfForRangeCheck(0xc0004aa048, 0xc0008107d0, {0xf8?, 0x7f61505e3178?}, 0xc000cab570)
/home/rittneje/gosec/analyzers/conversion_overflow.go:211 +0x568
github.com/securego/gosec/v2/analyzers.hasExplicitRangeCheck(0xc0008107d0, {0xc000289cc0?, 0x5?})
/home/rittneje/gosec/analyzers/conversion_overflow.go:176 +0x2b9
github.com/securego/gosec/v2/analyzers.isSafeConversion(0xc0008107d0)
/home/rittneje/gosec/analyzers/conversion_overflow.go:93 +0xaa
github.com/securego/gosec/v2/analyzers.runConversionOverflow(0xc000190a80)
/home/rittneje/gosec/analyzers/conversion_overflow.go:54 +0x27e
github.com/securego/gosec/v2.(*Analyzer).CheckAnalyzers(0xc0007c6080, 0xc00032cd80)
/home/rittneje/gosec/analyzer.go:446 +0x4a2
github.com/securego/gosec/v2.(*Analyzer).Process(0xc0007c6080, {0x0, 0x0, 0x0}, {0xc0007d6c50, 0x1, 0x3d?})
/home/rittneje/gosec/analyzer.go:317 +0x488
main.main()
/home/rittneje/gosec/cmd/gosec/main.go:476 +0xde5

Putting the panic aside, we actually won't be able to allow this conversion. SSA IR will have each len(x) as a separate value. Therefore, the latter value which is converted won't be considered upper-bound-checked, since the bound check was actually on a different value.

I imagine that this case will frustrate a lot of people with gosec, but there's nothing we can do. Realizing that both len calls will have the same value is a compiler optimization at best, since I'm pretty sure you can mutate a slice's length using reflection.

@czechbol
Copy link
Contributor Author

czechbol commented Aug 28, 2024

I am currently getting a panic on the following input

@rittneje Yes, I know there's panics, that's why I said I "have some way" of handling conditions, that Converts inside if blocks don't work and that it's not ready to be merged yet. There still is work to do, but I'm slowly covering more test cases and want to share the changes with you. If you find any other curious test cases, feel free to suggest them or open a PR to my branch.

I'll be honest, I definitely didn't fully understand what I was getting myself into when I opened #1189 but I'm in too deep now and I still want to fix all the false positives I can.

If what @ben-krieger says is true then it's a shame but what can we do.
Perhaps we could push this PR into a good enough middle ground solution and iterate from there.

In the meantime I'm switching this PR to a Draft.

@czechbol czechbol marked this pull request as draft August 28, 2024 22:27
@ben-krieger
Copy link
Contributor

I am currently getting a panic on the following input

@rittneje Yes, I know there's panics, that's why I said I "have some way" of handling conditions, that Converts inside if blocks don't work and that it's not ready to be merged yet. There still is work to do, but I'm slowly covering more test cases and want to share the changes with you. If you find any other curious test cases, feel free to suggest them or open a PR to my branch.

I'll be honest, I definitely didn't fully understand what I was getting myself into when I opened #1189 but I'm in too deep now and I still want to fix all the false positives I can.

If what @ben-krieger says is true then it's a shame but what can we do. Perhaps we could push this PR into a good enough middle ground solution and iterate from there.

In the meantime I'm switching this PR to a Draft.

@czechbol I also feel like I jumped into something out of my depth. I was absolutely nerd-sniped.

But you've done a great job! In fact, I'm quite impressed with what 3 unprepared devs could do to significantly tackle these false positives.

In my opinion, the main thing holding us back is not even our experience with SSA or language analysis, but the fact that this probably really requires complex inspection of the abstract syntax tree to understand things like "does a failed bounds check result in conversion code being unreachable" and "does a possibly complex boolean expression check either bound, none, or both?" This is no doubt difficult even for developers of compilers!

So I think we should just take a few more good looks at what we can do here, document the most common false positives (and the couple false negatives we will introduce), and ship it.

I know this PR has merit, because it saves me from 22 obviously false positives in just one library I maintain. This is good work!

@ccojocar
Copy link
Member

ccojocar commented Aug 29, 2024

I currently have some way to handle AND and OR conditions in the if statements. Since the SSA does not give me a nice way to check the entire if statement and breaks it up into component instructions, I need to look through them to determine their relationship. If you have suggestions how to do it differently, please let me know.

@czechbol I think from If instruction you can get the Block to which it belongs to, and within that Block you have all outer scope instructions, which you can reference from the Cond instruction. Also you can get all function instructions block via the Parent(). It is more complicated if you want to track back through multiple function calls since the current SSA implementation doesn't provide the graph outside of the package.

I would focus to cover the most common uses cases at the function level. Validation outside of the function block can be addressed later.

To simplify, I would track the bounds and keep their values through different if checks similar with @rittneje suggested in #1194 (comment). The check can be then performed only once at the conversion time.

I would approach something like:

  • detect a conversion extract the variable reference
  • run a bounds analysis into the function instructions and track the values of the bounds through various if checks, or parse function calls
  • verify the overflow based on the type size and current bounds value at that point of conversion

@czechbol
Copy link
Contributor Author

czechbol commented Aug 30, 2024

Now this is an interesting corner I've gotten myself into. I've involuntarily created a situation where the range check is an insufficient solution.

If someone wants to explicitly check a value with an equals operator, I currently have no way of handling that.

@ccojocar What do you think? Should I consider this out of scope of this PR?

@czechbol czechbol force-pushed the improvement/int-overflow-logic branch from 749ac90 to ade882a Compare August 30, 2024 23:11
@ccojocar
Copy link
Member

ccojocar commented Sep 1, 2024

If someone wants to explicitly check a value with an equals operator, I currently have no way of handling that.
What do you think? Should I consider this out of scope of this PR?

It sounds good to me to handle it later since the current implementation in converting quite a good chunk of use cases. A good solution is better than a perfect solution which is not implemented.

czechbol and others added 15 commits September 4, 2024 13:57
Signed-off-by: czechbol <adamludes@gmail.com>
Co-authored-by: Ben Krieger <ben.krieger@intel.com>
Signed-off-by: czechbol <adamludes@gmail.com>
Signed-off-by: czechbol <adamludes@gmail.com>
Signed-off-by: czechbol <adamludes@gmail.com>
Signed-off-by: czechbol <adamludes@gmail.com>
Signed-off-by: czechbol <adamludes@gmail.com>
Signed-off-by: czechbol <adamludes@gmail.com>
Signed-off-by: czechbol <adamludes@gmail.com>
Signed-off-by: czechbol <adamludes@gmail.com>
Signed-off-by: czechbol <adamludes@gmail.com>
Change-Id: I8da6495eaaf25b1739389aa98492bd7df338085b
Signed-off-by: Cosmin Cojocar <ccojocar@google.com>
Signed-off-by: czechbol <adamludes@gmail.com>
Signed-off-by: czechbol <adamludes@gmail.com>
Change-Id: I0db56cb0a5f9ab6e815e2480ec0b66d7061b23d3
Signed-off-by: Cosmin Cojocar <ccojocar@google.com>
@ccojocar ccojocar force-pushed the improvement/int-overflow-logic branch from b131cc1 to 9f0427a Compare September 4, 2024 14:01
@ccojocar
Copy link
Member

ccojocar commented Sep 4, 2024

golangci-lint keeps complaining about a potential overflow but if I build gosec locally it doesn't flag anything. I'd disregard that action for now as they're still using the old version of gosec.

I ignored the warning for now.

@ccojocar ccojocar merged commit eaedce9 into securego:master Sep 4, 2024
6 checks passed
@czechbol czechbol deleted the improvement/int-overflow-logic branch September 4, 2024 14:11
@stephenc
Copy link

stephenc commented Sep 4, 2024

Does this also allow us to signal intentional truncation without a bounds check, e.g. uint16(x &0xffff)

@ben-krieger
Copy link
Contributor

Does this also allow us to signal intentional truncation without a bounds check, e.g. uint16(x &0xffff)

No. Someone would need to take up a significant effort in order to support bounds checks that involve 1) arithmetic binary operations, 2) boolean logic operations, or 3) the min and max builtins from Go 1.21.

Perhaps we can create a ticket to track this? But I'm not sure who will sign up for that level of effort, so maybe it should be implemented across multiple PRs.

@czechbol
Copy link
Contributor Author

czechbol commented Sep 4, 2024

@stephenc I'm sorry I claimed that it has been resolved I only noticed the sentence about truncation after you commented here.

That being said I currently don't have the time capacity to continue working on this feature, I've already sank too much time into it. I hope you understand that.

I've pushed this PR to a point where most people can at least work around the issue somehow and the most common examples that we could come up with should be covered.

@stephenc
Copy link

stephenc commented Sep 4, 2024

@czechbol fixing the ranges is great. As you closed my issue perhaps you could just create a placeholder about adding support for bitmasks to signal intentional truncation so that it does not get lost. No rush to implement, I can live with // # nosec G115 for now

@stephenc
Copy link

stephenc commented Sep 4, 2024

Does this also allow us to signal intentional truncation without a bounds check, e.g. uint16(x &0xffff)

No. Someone would need to take up a significant effort in order to support bounds checks that involve 1) arithmetic binary operations, 2) boolean logic operations, or 3) the min and max builtins from Go 1.21.

Perhaps we can create a ticket to track this? But I'm not sure who will sign up for that level of effort, so maybe it should be implemented across multiple PRs.

I see those as all different cases. I think intentional truncation is always going to be of a fixed pattern... a mask that fits into the target type and a bitwise and... and for it to be intentional I would pair that with both being inside the cast... such a cast, for the all 1's case, should be optimized as simple truncation by the compiler and thus it is clearly signalling intent to truncate. as things currently stand I don't see how I can signal intentional truncation other than by disabling the rule on the line which could mask an unintended truncation

@ben-krieger
Copy link
Contributor

ben-krieger commented Sep 4, 2024

Does this also allow us to signal intentional truncation without a bounds check, e.g. uint16(x &0xffff)

No. Someone would need to take up a significant effort in order to support bounds checks that involve 1) arithmetic binary operations, 2) boolean logic operations, or 3) the min and max builtins from Go 1.21.
Perhaps we can create a ticket to track this? But I'm not sure who will sign up for that level of effort, so maybe it should be implemented across multiple PRs.

I see those as all different cases. I think intentional truncation is always going to be of a fixed pattern... a mask that fits into the target type and a bitwise and... and for it to be intentional I would pair that with both being inside the cast... such a cast, for the all 1's case, should be optimized as simple truncation by the compiler and thus it is clearly signalling intent to truncate. as things currently stand I don't see how I can signal intentional truncation other than by disabling the rule on the line which could mask an unintended truncation

The (ugly) way to get gosec to accept the conversion would be something like:

x = x&0xffff
if x < 0 || x > math.MaxUint16 {
    panic("unreachable")
}
_ = uint16(x)

I'm no compiler dev, but I expect that the entire if-block with panic would be pruned during compiling and not even have a runtime effect.

@czechbol
Copy link
Contributor Author

czechbol commented Sep 4, 2024

@ben-krieger, @stephenc, @rittneje, @ccojocar FYI I opened #1212 to compile all the false positives we can find in a single issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

G115: false positive for uintptr -> unsafe.Pointer G115 ignores bounds checks
6 participants