-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Semgrep Nan Injection codemod #758
Conversation
795e78b
to
5f981a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is a pretty niche codemod. I do think we need a bit of additional testing if we want to enable it, just to make sure that we don't introduce weird/bad code.
from core_codemods.semgrep.semgrep_nan_injection import SemgrepNanInjection | ||
|
||
|
||
class TestSemgrepNanInjection(BaseSASTCodemodTest): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need some additional cases where the float(potential_nan)
is used in a context other than simple assignment.
I also think we should add test cases for the other type casts that are identified by this Semgrep rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added more testing for cases that semgrep triggers the rule.
30e20fe
to
d3582f2
Compare
def _get_var_in_call(self, node: cst): | ||
match node.args[0].value: | ||
case cst.Name(): | ||
# float(var) | ||
return node.args[0].value.value | ||
case cst.BinaryOperation(): | ||
# bool(float(var)), complex(float(var)), etc | ||
return node.args[0].value.left.args[0].value.value | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks weird.
The type hint for node
should be cst.Call
.
The case for cst.BinaryOperation()
does not match the the intention, at least judging from the comment above. Do you mean cst.Call
?
If that's the case, you should account for deeper nestings (e.g. float(complex(int(...)))
). Does the semgrep rule always guarantee a variable name there? It could be an expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, updated the type hint.
BinaryOperation because when nested cst really does interpret it as a binaryoperation, for all calls of bool, int, float, and complex. I checked them all. this is incorrect
I'll check the last part you mention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so checking in the playground
- looks like infinite nesting is a possibility ...
complex(bool(float(tid))
will match. But this only matches as long as all funcs are one of complex, bool, float. Note thepattern-sanitizers
, that means any other function will sanitize this away. - I do need to handle the non-var case for
float(request.POST.get("tid"))
, which is annoying it wasn't a semgrep example.
I think the reasonable thing to do here is to not implement #1, just test and not to it, but to do #2. Any other pattern won't be fixed for now. I'll try to do both :)
04f0b49
to
5bdd002
Compare
alright, so I've updated and improved this codemod to handle as many cases as I could get to fire in semgrep. There are probably more, but in those cases we won't do anything. |
Co-authored-by: Dan D'Avella <dan.davella@pixee.ai>
Use Assignment Expression (Walrus) In Conditional Co-authored-by: pixeebot[bot] <104101892+pixeebot[bot]@users.noreply.github.com>
5bdd002
to
463ca4d
Compare
addressed review and improved codemod
Quality Gate passedIssues Measures |
This codemod replaces typecast call at a line reported by semgrep (which is a sink of a request source)
x = float(uuid)
and wraps it in an if/else statement to check if the value could be nan.
Notice that this new transformer is specific to semgrep results. It wouldn't work stand-alone. I also decided to not handle other edge cases since this is likely a codemod that won't be used much. If you think of something absolutely necessary we do need, I can add it.
Closes #691