Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semgrep Nan Injection codemod #758

Merged
merged 8 commits into from
Aug 7, 2024
Merged

Semgrep Nan Injection codemod #758

merged 8 commits into from
Aug 7, 2024

Conversation

clavedeluna
Copy link
Contributor

This codemod replaces typecast call at a line reported by semgrep (which is a sink of a request source)
x = float(uuid)
and wraps it in an if/else statement to check if the value could be nan.

Notice that this new transformer is specific to semgrep results. It wouldn't work stand-alone. I also decided to not handle other edge cases since this is likely a codemod that won't be used much. If you think of something absolutely necessary we do need, I can add it.

Closes #691

@clavedeluna clavedeluna marked this pull request as ready for review July 30, 2024 11:19
Copy link
Member

@drdavella drdavella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is a pretty niche codemod. I do think we need a bit of additional testing if we want to enable it, just to make sure that we don't introduce weird/bad code.

src/core_codemods/semgrep/semgrep_nan_injection.py Outdated Show resolved Hide resolved
from core_codemods.semgrep.semgrep_nan_injection import SemgrepNanInjection


class TestSemgrepNanInjection(BaseSASTCodemodTest):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some additional cases where the float(potential_nan) is used in a context other than simple assignment.

I also think we should add test cases for the other type casts that are identified by this Semgrep rule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added more testing for cases that semgrep triggers the rule.

Comment on lines 63 to 75
def _get_var_in_call(self, node: cst):
match node.args[0].value:
case cst.Name():
# float(var)
return node.args[0].value.value
case cst.BinaryOperation():
# bool(float(var)), complex(float(var)), etc
return node.args[0].value.left.args[0].value.value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird.
The type hint for node should be cst.Call.
The case for cst.BinaryOperation() does not match the the intention, at least judging from the comment above. Do you mean cst.Call?
If that's the case, you should account for deeper nestings (e.g. float(complex(int(...))) ). Does the semgrep rule always guarantee a variable name there? It could be an expression.

Copy link
Contributor Author

@clavedeluna clavedeluna Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, updated the type hint.
BinaryOperation because when nested cst really does interpret it as a binaryoperation, for all calls of bool, int, float, and complex. I checked them all. this is incorrect

I'll check the last part you mention.

Copy link
Contributor Author

@clavedeluna clavedeluna Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so checking in the playground

  1. looks like infinite nesting is a possibility ... complex(bool(float(tid)) will match. But this only matches as long as all funcs are one of complex, bool, float. Note the pattern-sanitizers, that means any other function will sanitize this away.
  2. I do need to handle the non-var case for float(request.POST.get("tid")), which is annoying it wasn't a semgrep example.

I think the reasonable thing to do here is to not implement #1, just test and not to it, but to do #2. Any other pattern won't be fixed for now. I'll try to do both :)

@clavedeluna
Copy link
Contributor Author

alright, so I've updated and improved this codemod to handle as many cases as I could get to fire in semgrep. There are probably more, but in those cases we won't do anything.

@clavedeluna clavedeluna dismissed drdavella’s stale review August 7, 2024 11:29

addressed review and improved codemod

Copy link

sonarcloud bot commented Aug 7, 2024

Quality Gate Passed Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

@clavedeluna clavedeluna added this pull request to the merge queue Aug 7, 2024
Merged via the queue into main with commit fb1badd Aug 7, 2024
13 checks passed
@clavedeluna clavedeluna deleted the semgrep-nan-inj branch August 7, 2024 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Codemod: nan-injection Semgrep
3 participants