Nosey Parker Parser #9067
Merged
40 commits
a2d389d
Created _init_.py
tpat13 6052a25
Created parser.py
tpat13 94f0125
Merge branch 'DefectDojo:master' into nosey-parker
tpat13 be31c39
Update README.md (#9048)
devGregA 0aafba0
Fixing README links and formatting (#9022)
cneill 46928f8
Bump python-gitlab from 3.15.0 to 4.2.0 (#9064)
dependabot[bot] c0de28c
Bump fontawesomefree from 6.4.2 to 6.5.0 (#9074)
dependabot[bot] f8426ef
:tada: added humble #8988 (#8989)
manuel-sommer f503320
Bump social-auth-core from 4.5.0 to 4.5.1 (#9073)
dependabot[bot] 12c20c4
Adding subcomponent labels for celery beat and worker (#9078)
veneber e2885d5
Update rabbitmq Docker tag from 3.12.9 to v3.12.10 (docker-compose.ym…
renovate[bot] 55f5573
Update postgres:16.1-alpine Docker digest from 16.1 to 16.1-alpine (d…
renovate[bot] 4b80f78
Update redis:7.2.3-alpine Docker digest from 7.2.3 to 7.2.3-alpine (d…
renovate[bot] e8bad94
Bump boto3 from 1.29.7 to 1.33.5 (#9085)
dependabot[bot] 0a58937
Bump fontawesomefree from 6.5.0 to 6.5.1 (#9086)
dependabot[bot] cd83b00
Add logging statement for failed password reset validation logic (#9087)
Maffooch 9ee0605
Finding Template: Correct save ordering (#9088)
Maffooch b8c8d9d
Feature/parser jfrog xray binary scan (#9015)
renejal 1f36b42
Update postgres:16.1-alpine Docker digest from 16.1 to 16.1-alpine (d…
renovate[bot] 526319f
Nosey Parker Test Cases
tpat13 0ba7a9d
Updated Parser
tpat13 8763532
Bump cryptography from 41.0.5 to 41.0.7 (#9065)
dependabot[bot] da0e834
NoseyParker Parser Flake8 compliance
tpat13 71fa003
Merge branch 'dev' into nosey-parker
tpat13 16b97c2
Merge branch 'dev' into nosey-parker
tpat13 c7d99c1
NoseyParker fix for 0.16
tpat13 7160ae3
JSON lines fix
tpat13 001fbda
Merge branch 'dev' into nosey-parker
tpat13 3098b37
Nosey Parker Parser: v0.16 fix
tpat13 ee697dc
Comma for consistency
tpat13 5114806
Flake8 requirements
tpat13 a8c7358
Merge branch 'dev' into nosey-parker
tpat13 8d16815
Update docs/content/en/integrations/parsers/file/noseyparker.md
cneill 02f017a
Update dojo/tools/noseyparker/parser.py
cneill 068106a
Update docs/content/en/integrations/parsers/file/noseyparker.md
cneill 9219e44
Merge branch 'dev' into nosey-parker
tpat13 8cb6795
Removed example JSONL file
tpat13 4825a14
Add link to 0.16.0 Release
tpat13 3206c2c
Spacing
tpat13 b387b19
Merge branch 'dev' into nosey-parker
tpat13
107 changes: 107 additions & 0 deletions
docs/content/en/integrations/parsers/file/noseyparker.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
--- | ||
title: "Nosey Parker" | ||
toc_hide: true | ||
--- | ||
Input Type: | ||
- | ||
This parser takes JSON Lines Output from Nosey Parker. Supports version 0.16.0 of https://github.com/praetorian-inc/noseyparker | ||
|
||
Things to note about the Nosey Parker Parser: | ||
- | ||
- All findings are marked with a severity of 'High' | ||
- The deduplication algorithm marks a unique finding by the secret, filepath, and line number all together | ||
- The Nosey Parker tool allows for both full history scans of a repo and targeted branch scans | ||
- The Parser does NOT differentiate between the 2 scan types (may be future functionality) | ||
|
||
- **For full history scans:** | ||
- The scan will pick up secrets committed in the past that have since been removed | ||
- If a secret is removed from source code, it will still show up in the next scan | ||
- When importing findings via the Dojo API, make sure to use the parameter `do_not_reactivate` which will keep existing findings closed, without reactivating them | ||
- **For targeted branch scans:** | ||
- Keep in mind that active secrets may exist only in the git history or on other branches, where a targeted branch scan will not find them | ||
|
||
Acceptable JSON Lines file: | ||
- | ||
Each line of the JSON Lines file from NoseyParker is one secret, but it can have multiple matches within the repository. All properties are required by the parser. | ||
|
||
The following is an example of an acceptable JSON lines file: | ||
~~~ | ||
{"type": "finding", "rule_name": "Generic Password (double quoted)", "match_content": "32ui1ffdasfhu239b4df2ac6609a9919", "num_matches": 2, "status": null, "comment": null, "matches": [ { "provenance": [ { "kind": "file", "path": "app/schema/config.py" }, { "kind": "git_repo", "repo_path": "./.git", "commit_provenance": { "commit_kind": "first_seen", "commit_metadata": { "commit_id": "0ef84b84c29924b210e3576f69d1e8632948bedc", "committer_name": "Princess Leia", "committer_email": "leia@test.com", "committer_timestamp": "1685495256 +0000", "author_name": "Princess Leia", "author_email": "leia@test.com", "author_timestamp": "1685495256 +0000", "message": "first commit\n" }, "blob_path": "app/schema/config.py" } } ], "blob_metadata": { "id": "0ee84b84c29924b210e3576fe9d1e8632948bedc", "num_bytes": 664, "mime_essence": "text/plain", "charset": null }, "blob_id": "0ee84b84c29924b210e3576fe9d1e8632948bedc", "location": { "offset_span": { "start": 617, "end": 660 }, "source_span": { "start": { "line": 16, "column": 17 }, "end": { "line": 16, "column": 59 } } }, "capture_group_index": 1, "match_content": "32ui1ffdasfhu239b4df2ac6609a9919", "snippet": { "before": "E = \"https://testwebsite.com\"\n ", "matching": "API_KEY = \"32ui1ffdasfhu239b4df2ac6609a9919", "after": "\"\n\n\n" }, "rule_name": "Generic API Key" } ] } | ||
{"type":"finding","rule_name":"Generic Username and Password (unquoted)","match_content":"secret","num_matches":1,"matches":[{"provenance":[{"kind":"file","path":"./app/schema/config.py"},{"kind":"git_repo","repo_path":"./.git","commit_provenance":{"commit_kind":"first_seen","commit_metadata":{"commit_id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","committer_name":"Princess Leia","committer_email":"leia@test.com","committer_timestamp":"1685495256 +0000","author_name":"Princess Leia","author_email":"leia@test.com","author_timestamp":"1685495256 +0000","message":"framework\n"},"blob_path":"app/schema/config.py"}}],"blob_metadata":{"id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","num_bytes":664,"mime_essence":"text/plain","charset":null},"blob_id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","location":{"offset_span":{"start":617,"end":660},"source_span":{"start":{"line":16,"column":17},"end":{"line":16,"column":59}}},"capture_group_index":1,"match_content":"secret","snippet":{"before":"E = \"https://testwebsite.com\"\n ","matching":"secret","after":"testing\"\n\n\n"},"rule_name":"Generic Username and Password (unquoted)"}]} | ||
{"type":"finding","rule_name":"Generic Username and Password (unquoted)","match_content":"secret","num_matches":1,"matches":[{"provenance":[{"kind":"file","path":"./app/schema/config.py"},{"kind":"git_repo","repo_path":"./.git","commit_provenance":{"commit_kind":"first_seen","commit_metadata":{"commit_id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","committer_name":"Princess Leia","committer_email":"leia@test.com","committer_timestamp":"1685495256 +0000","author_name":"Princess Leia","author_email":"leia@test.com","author_timestamp":"1685495256 +0000","message":"framework\n"},"blob_path":"app/schema/config.py"}}],"blob_metadata":{"id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","num_bytes":664,"mime_essence":"text/plain","charset":null},"blob_id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","location":{"offset_span":{"start":617,"end":660},"source_span":{"start":{"line":16,"column":17},"end":{"line":16,"column":59}}},"capture_group_index":1,"match_content":"secret","snippet":{"before":"E = \"https://testwebsite.com\"\n ","matching":"secret","after":"testing\"\n\n\n"},"rule_name":"Generic Username and Password (unquoted)"}]} | ||
|
||
~~~ | ||
|
||
If the first line is expanded, it looks like this: | ||
|
||
~~~ | ||
{ | ||
"type": "finding", | ||
"rule_name": "Generic Password (double quoted)", | ||
"match_content": "32ui1ffdasfhu239b4df2ac6609a9919", | ||
"num_matches": 2, | ||
"status": null, | ||
"comment": null, | ||
"matches": [ | ||
{ | ||
"provenance": [ | ||
{ | ||
"kind": "file", | ||
"path": "app/schema/config.py" | ||
}, | ||
{ | ||
"kind": "git_repo", | ||
"repo_path": "./.git", | ||
"commit_provenance": { | ||
"commit_kind": "first_seen", | ||
"commit_metadata": { | ||
"commit_id": "0ef84b84c29924b210e3576f69d1e8632948bedc", | ||
"committer_name": "Princess Leia", | ||
"committer_email": "leia@test.com", | ||
"committer_timestamp": "1685495256 +0000", | ||
"author_name": "Princess Leia", | ||
"author_email": "leia@test.com", | ||
"author_timestamp": "1685495256 +0000", | ||
"message": "first commit\n" | ||
}, | ||
"blob_path": "app/schema/config.py" | ||
} | ||
} | ||
], | ||
"blob_metadata": { | ||
"id": "0ee84b84c29924b210e3576fe9d1e8632948bedc", | ||
"num_bytes": 664, | ||
"mime_essence": "text/plain", | ||
"charset": null | ||
}, | ||
"blob_id": "0ee84b84c29924b210e3576fe9d1e8632948bedc", | ||
"location": { | ||
"offset_span": { | ||
"start": 617, | ||
"end": 660 | ||
}, | ||
"source_span": { | ||
"start": { | ||
"line": 16, | ||
"column": 17 | ||
}, | ||
"end": { | ||
"line": 16, | ||
"column": 59 | ||
} | ||
} | ||
}, | ||
"capture_group_index": 1, | ||
"match_content": "32ui1ffdasfhu239b4df2ac6609a9919", | ||
"snippet": { | ||
"before": "E = \"https://testwebsite.com\"\n ", | ||
"matching": "API_KEY = \"32ui1ffdasfhu239b4df2ac6609a9919", | ||
"after": "\"\n\n\n" | ||
}, | ||
"rule_name": "Generic API Key" | ||
} | ||
] | ||
} | ||
~~~ | ||
|
||
### Sample Scan Data | ||
Sample scan data for testing purposes can be found [here](https://github.com/DefectDojo/django-DefectDojo/tree/master/unittests/scans/noseyparker). |
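
The de-duplication rule described in the notes above (secret, file path, and line number taken together) can be sketched roughly as follows. This is a minimal illustration and not the parser itself: the helper names `dedup_key` and `count_occurrences` are hypothetical, and the sketch assumes, as in the example record above, that the last `provenance` entry of each match carries the `commit_provenance` data.

```python
import hashlib
import json
from collections import Counter


def dedup_key(secret, filepath, line_number):
    """Hash the three fields that together identify a unique finding."""
    raw = f"{filepath}|{secret}|{line_number}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def count_occurrences(jsonl_text):
    """Map each de-duplication key to the number of matches sharing it."""
    counts = Counter()
    for raw_line in jsonl_text.split("\n"):
        if not raw_line.strip():
            continue  # skip blank lines, including a trailing one
        finding = json.loads(raw_line)
        secret = finding["match_content"]
        for match in finding["matches"]:
            # Assumption: the git metadata sits in the last provenance entry.
            git_entry = match["provenance"][-1]
            filepath = git_entry["commit_provenance"]["blob_path"]
            line_number = match["location"]["source_span"]["start"]["line"]
            counts[dedup_key(secret, filepath, line_number)] += 1
    return counts
```

Under these assumptions, two report lines that describe the same secret at the same path and line collapse into a single key with a count of 2, which mirrors how repeated matches raise the occurrence count of one finding instead of creating a second one.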
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
import hashlib | ||
import json | ||
|
||
from datetime import datetime | ||
from dojo.models import Finding | ||
|
||
|
||
class NoseyParkerParser(object): | ||
""" | ||
Scanning secrets from repos | ||
""" | ||
|
||
def get_scan_types(self): | ||
return ["Nosey Parker Scan"] | ||
|
||
def get_label_for_scan_types(self, scan_type): | ||
return "Nosey Parker Scan" | ||
|
||
def get_description_for_scan_types(self, scan_type): | ||
return "Nosey Parker report file can be imported in JSON Lines format (option --jsonl). " \ | ||
"Supports v0.16.0 of https://github.com/praetorian-inc/noseyparker" | ||
|
||
def get_findings(self, file, test): | ||
""" | ||
Returns findings from a JSON Lines file, de-duplicated | ||
by file path, secret, and line number | ||
""" | ||
dupes = {} | ||
|
||
# Parse the JSON Lines file into a list of dicts | ||
if file is None: | ||
return [] | ||
elif file.name.lower().endswith(".jsonl"): | ||
# Process JSON lines into Dict | ||
data = [json.loads(line) for line in file] | ||
|
||
# Check for empty file (no lines, or an empty first object) | ||
if not data or len(data[0]) == 0: | ||
return [] | ||
|
||
# Parse through each secret in each JSON line | ||
for line in data: | ||
# Set rule to the current secret type (e.g. AWS S3 Bucket) | ||
try: | ||
rule_name = line['rule_name'] | ||
secret = line['match_content'] | ||
except Exception: | ||
raise ValueError("Invalid Nosey Parker data, make sure to use Nosey Parker v0.16.0") | ||
|
||
# Set Finding details | ||
for match in line['matches']: | ||
# The following path is to account for the variability in the JSON lines output | ||
json_path = match['provenance'][-1] | ||
|
||
title = f"Secret(s) Found in Repository with Commit ID {json_path['commit_provenance']['commit_metadata']['commit_id']}" | ||
filepath = json_path['commit_provenance']['blob_path'] | ||
line_num = match['location']['source_span']['start']['line'] | ||
description = f"Secret found of type: {rule_name} \n" \ | ||
f"SECRET starts with: '{secret[:3]}' \n" \ | ||
f"Committer Name: {json_path['commit_provenance']['commit_metadata']['committer_name']} \n" \ | ||
f"Committer Email: {json_path['commit_provenance']['commit_metadata']['committer_email']} \n" \ | ||
f"Commit ID: {json_path['commit_provenance']['commit_metadata']['commit_id']} \n" \ | ||
f"Location: {filepath} line #{line_num} \n " \ | ||
f"Line #{line_num} \n " \ | ||
f"Code Snippet Containing Secret: {match['snippet']['before']}***SECRET***{match['snippet']['after']} \n" | ||
Review comment: Missing new line
||
|
||
# Internal de-duplication | ||
key = hashlib.md5((filepath + "|" + secret + "|" + str(line_num)).encode("utf-8")).hexdigest() | ||
|
||
# If secret already exists with the same filepath/secret/linenum | ||
if key in dupes: | ||
dupes[key].nb_occurences += 1 | ||
else: | ||
# Create Finding object | ||
# Create Finding object | ||
finding = Finding( | ||
test=test, | ||
cwe=798, | ||
title=title, | ||
description=description, | ||
severity='High', | ||
mitigation="Reset the account/token and remove from source code. Store secrets/tokens/passwords in secret managers or secure vaults.", | ||
date=datetime.today().strftime("%Y-%m-%d"), | ||
verified=False, | ||
active=True, | ||
is_mitigated=False, | ||
file_path=filepath, | ||
line=line_num, | ||
static_finding=True, | ||
nb_occurences=1, | ||
dynamic_finding=False | ||
|
||
) | ||
dupes[key] = finding | ||
else: | ||
raise ValueError("JSON lines format not recognized (.jsonl file extension). Make sure to use Nosey Parker v0.16.0") | ||
|
||
return list(dupes.values()) |
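
One way to make the loading step above more defensive is sketched below. It is an illustration under stated assumptions, not the code merged in this PR: the helper name `load_jsonl` and the `REQUIRED_KEYS` tuple are hypothetical, and the choice to skip blank lines is an assumption about the desired behaviour. The sketch accepts any iterable of `str` or `bytes` lines, returns an empty list for an empty file, and reports the offending line number when a record is malformed.

```python
import json

# Hypothetical list of the top-level keys the parser reads from each record.
REQUIRED_KEYS = ("rule_name", "match_content", "matches")


def load_jsonl(lines):
    """Parse an iterable of JSON Lines (str or bytes) into a list of dicts."""
    records = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines instead of failing on them
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Line {number} is not valid JSON") from exc
        if not isinstance(record, dict):
            raise ValueError(f"Line {number} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"Line {number} is missing keys: {missing}")
        records.append(record)
    return records
```

With this shape, an empty upload yields `[]` without indexing into the list, and a record from an unsupported report version fails with a message that points at the specific line.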
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
{"type":"warning","data":"package.json: No license field"} | ||
{"type":"warning","data":"No license field"} | ||
{"type":"error","data":"An unexpected error occurred: \"https://registry.yarnpkg.com/-/npm/v1/security/audits: tunneling socket could not be established, cause=connect ECONNREFUSED 127.0.0.1:80\"."} | ||
{"type":"info","data":"If you think this is a bug, please open a bug report with the information provided in \"/yarn-error.log\"."} | ||
{"type":"info","data":"Visit https://yarnpkg.com/en/docs/cli/audit for documentation about this command."} |
The question is whether you really need a plaintext JSON file example when you already have a link in Sample Scan Data. I guess you can remove it to make the md slimmer and share only relevant information.
Fixed @manuel-sommer, thanks for the suggestion!