Nosey Parker Parser #9067

Merged
merged 40 commits into from
Feb 28, 2024
Changes from 35 commits
Commits
a2d389d
Created _init_.py
tpat13 Nov 28, 2023
6052a25
Created parser.py
tpat13 Nov 28, 2023
94f0125
Merge branch 'DefectDojo:master' into nosey-parker
tpat13 Dec 4, 2023
be31c39
Update README.md (#9048)
devGregA Nov 29, 2023
0aafba0
Fixing README links and formatting (#9022)
cneill Nov 29, 2023
46928f8
Bump python-gitlab from 3.15.0 to 4.2.0 (#9064)
dependabot[bot] Nov 30, 2023
c0de28c
Bump fontawesomefree from 6.4.2 to 6.5.0 (#9074)
dependabot[bot] Nov 30, 2023
f8426ef
:tada: added humble #8988 (#8989)
manuel-sommer Nov 30, 2023
f503320
Bump social-auth-core from 4.5.0 to 4.5.1 (#9073)
dependabot[bot] Nov 30, 2023
12c20c4
Adding subcomponent labels for celery beat and worker (#9078)
veneber Nov 30, 2023
e2885d5
Update rabbitmq Docker tag from 3.12.9 to v3.12.10 (docker-compose.ym…
renovate[bot] Dec 1, 2023
55f5573
Update postgres:16.1-alpine Docker digest from 16.1 to 16.1-alpine (d…
renovate[bot] Dec 1, 2023
4b80f78
Update redis:7.2.3-alpine Docker digest from 7.2.3 to 7.2.3-alpine (d…
renovate[bot] Dec 1, 2023
e8bad94
Bump boto3 from 1.29.7 to 1.33.5 (#9085)
dependabot[bot] Dec 1, 2023
0a58937
Bump fontawesomefree from 6.5.0 to 6.5.1 (#9086)
dependabot[bot] Dec 1, 2023
cd83b00
Add logging statement for failed password reset validation logic (#9087)
Maffooch Dec 1, 2023
9ee0605
Finding Template: Correct save ordering (#9088)
Maffooch Dec 1, 2023
b8c8d9d
Feature/parser jfrog xray binary scan (#9015)
renejal Dec 2, 2023
1f36b42
Update postgres:16.1-alpine Docker digest from 16.1 to 16.1-alpine (d…
renovate[bot] Dec 2, 2023
526319f
Nosey Parker Test Cases
tpat13 Dec 4, 2023
0ba7a9d
Updated Parser
tpat13 Dec 4, 2023
8763532
Bump cryptography from 41.0.5 to 41.0.7 (#9065)
dependabot[bot] Nov 30, 2023
da0e834
NoseyParker Parser Flake8 compliance
tpat13 Dec 4, 2023
71fa003
Merge branch 'dev' into nosey-parker
tpat13 Jan 3, 2024
16b97c2
Merge branch 'dev' into nosey-parker
tpat13 Jan 12, 2024
c7d99c1
NoseyParker fix for 0.16
tpat13 Jan 12, 2024
7160ae3
JSON lines fix
tpat13 Jan 31, 2024
001fbda
Merge branch 'dev' into nosey-parker
tpat13 Feb 7, 2024
3098b37
Nosey Parker Parser: v0.16 fix
tpat13 Feb 7, 2024
ee697dc
Comma for consistency
tpat13 Feb 7, 2024
5114806
Flake8 requirements
tpat13 Feb 7, 2024
a8c7358
Merge branch 'dev' into nosey-parker
tpat13 Feb 20, 2024
8d16815
Update docs/content/en/integrations/parsers/file/noseyparker.md
cneill Feb 21, 2024
02f017a
Update dojo/tools/noseyparker/parser.py
cneill Feb 21, 2024
068106a
Update docs/content/en/integrations/parsers/file/noseyparker.md
cneill Feb 21, 2024
9219e44
Merge branch 'dev' into nosey-parker
tpat13 Feb 26, 2024
8cb6795
Removed example JSONL file
tpat13 Feb 26, 2024
4825a14
Add link to 0.16.0 Release
tpat13 Feb 26, 2024
3206c2c
Spacing
tpat13 Feb 26, 2024
b387b19
Merge branch 'dev' into nosey-parker
tpat13 Feb 27, 2024
107 changes: 107 additions & 0 deletions docs/content/en/integrations/parsers/file/noseyparker.md
@@ -0,0 +1,107 @@
---
title: "Nosey Parker"
toc_hide: true
---
Input Type:
-
This parser accepts JSON Lines output from Nosey Parker. It supports version 0.16.0 of https://github.com/praetorian-inc/noseyparker

Things to note about the Nosey Parker Parser:
-
- All findings are marked with a severity of 'High'
- The deduplication algorithm identifies a unique finding by the combination of secret, filepath, and line number
- The Nosey Parker tool allows for both full history scans of a repo and targeted branch scans
- The parser does NOT differentiate between the two scan types (this may be future functionality)

- **For full history scans:**
  - The scan will pick up secrets committed in the past that have since been removed
  - If a secret is removed from source code, it will still show up in the next scan
  - When importing findings via the Dojo API, make sure to use the parameter `do_not_reactivate`, which keeps existing closed findings closed instead of reactivating them
- **For targeted branch scans:**
  - Keep in mind that secrets which exist only in the git history or on other branches will not be reported, even though they may still be active
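
The deduplication rule above can be sketched in a few lines. This is a minimal illustration that mirrors the MD5-over-`filepath|secret|line` key used by the parser in this pull request; the function name `dedupe_key` and the sample values are hypothetical and chosen only for the example.

```python
import hashlib


def dedupe_key(filepath: str, secret: str, line_num: int) -> str:
    """Combine the three identifying values into one stable hex digest."""
    raw = f"{filepath}|{secret}|{line_num}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# Same secret at the same location: the keys collide, so it counts as one finding.
first = dedupe_key("app/schema/config.py", "example-secret", 16)
repeat = dedupe_key("app/schema/config.py", "example-secret", 16)

# Same secret on a different line: a different key, so a separate finding.
other_line = dedupe_key("app/schema/config.py", "example-secret", 42)
```

Because the line number is part of the key, a secret that moves to a different line after an edit is treated as a new finding rather than a repeat of the old one.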

Acceptable JSON Lines file:
-
Each line of the JSON Lines file from NoseyParker is one secret, but it can have multiple matches within the repository. All properties are required by the parser.

The following is an example of an acceptable JSON lines file:
~~~
{"type": "finding", "rule_name": "Generic Password (double quoted)", "match_content": "32ui1ffdasfhu239b4df2ac6609a9919", "num_matches": 2, "status": null, "comment": null, "matches": [ { "provenance": [ { "kind": "file", "path": "app/schema/config.py" }, { "kind": "git_repo", "repo_path": "./.git", "commit_provenance": { "commit_kind": "first_seen", "commit_metadata": { "commit_id": "0ef84b84c29924b210e3576f69d1e8632948bedc", "committer_name": "Princess Leia", "committer_email": "leia@test.com", "committer_timestamp": "1685495256 +0000", "author_name": "Princess Leia", "author_email": "leia@test.com", "author_timestamp": "1685495256 +0000", "message": "first commit\n" }, "blob_path": "app/schema/config.py" } } ], "blob_metadata": { "id": "0ee84b84c29924b210e3576fe9d1e8632948bedc", "num_bytes": 664, "mime_essence": "text/plain", "charset": null }, "blob_id": "0ee84b84c29924b210e3576fe9d1e8632948bedc", "location": { "offset_span": { "start": 617, "end": 660 }, "source_span": { "start": { "line": 16, "column": 17 }, "end": { "line": 16, "column": 59 } } }, "capture_group_index": 1, "match_content": "32ui1ffdasfhu239b4df2ac6609a9919", "snippet": { "before": "E = \"https://testwebsite.com\"\n ", "matching": "API_KEY = \"32ui1ffdasfhu239b4df2ac6609a9919", "after": "\"\n\n\n" }, "rule_name": "Generic API Key" } ] }
{"type":"finding","rule_name":"Generic Username and Password (unquoted)","match_content":"secret","num_matches":1,"matches":[{"provenance":[{"kind":"file","path":"./app/schema/config.py"},{"kind":"git_repo","repo_path":"./.git","commit_provenance":{"commit_kind":"first_seen","commit_metadata":{"commit_id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","committer_name":"Princess Leia","committer_email":"leia@test.com","committer_timestamp":"1685495256 +0000","author_name":"Princess Leia","author_email":"leia@test.com","author_timestamp":"1685495256 +0000","message":"framework\n"},"blob_path":"app/schema/config.py"}}],"blob_metadata":{"id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","num_bytes":664,"mime_essence":"text/plain","charset":null},"blob_id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","location":{"offset_span":{"start":617,"end":660},"source_span":{"start":{"line":16,"column":17},"end":{"line":16,"column":59}}},"capture_group_index":1,"match_content":"secret","snippet":{"before":"E = \"https://testwebsite.com\"\n ","matching":"secret","after":"testing\"\n\n\n"},"rule_name":"Generic Username and Password (unquoted)"}]}
Contributor

The question is whether you really need a plaintext JSON file example when there is already a link under Sample Scan Data. I guess you can remove this to make the md slimmer and only share relevant information.

Contributor Author

Fixed @manuel-sommer, thanks for the suggestion!

{"type":"finding","rule_name":"Generic Username and Password (unquoted)","match_content":"secret","num_matches":1,"matches":[{"provenance":[{"kind":"file","path":"./app/schema/config.py"},{"kind":"git_repo","repo_path":"./.git","commit_provenance":{"commit_kind":"first_seen","commit_metadata":{"commit_id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","committer_name":"Princess Leia","committer_email":"leia@test.com","committer_timestamp":"1685495256 +0000","author_name":"Princess Leia","author_email":"leia@test.com","author_timestamp":"1685495256 +0000","message":"framework\n"},"blob_path":"app/schema/config.py"}}],"blob_metadata":{"id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","num_bytes":664,"mime_essence":"text/plain","charset":null},"blob_id":"0ee84b84c29924b210e3576fe9d1e8632948bedc","location":{"offset_span":{"start":617,"end":660},"source_span":{"start":{"line":16,"column":17},"end":{"line":16,"column":59}}},"capture_group_index":1,"match_content":"secret","snippet":{"before":"E = \"https://testwebsite.com\"\n ","matching":"secret","after":"testing\"\n\n\n"},"rule_name":"Generic Username and Password (unquoted)"}]}

~~~

If the first line is expanded, it looks like this:

~~~
{
    "type": "finding",
    "rule_name": "Generic Password (double quoted)",
    "match_content": "32ui1ffdasfhu239b4df2ac6609a9919",
    "num_matches": 2,
    "status": null,
    "comment": null,
    "matches": [
        {
            "provenance": [
                {
                    "kind": "file",
                    "path": "app/schema/config.py"
                },
                {
                    "kind": "git_repo",
                    "repo_path": "./.git",
                    "commit_provenance": {
                        "commit_kind": "first_seen",
                        "commit_metadata": {
                            "commit_id": "0ef84b84c29924b210e3576f69d1e8632948bedc",
                            "committer_name": "Princess Leia",
                            "committer_email": "leia@test.com",
                            "committer_timestamp": "1685495256 +0000",
                            "author_name": "Princess Leia",
                            "author_email": "leia@test.com",
                            "author_timestamp": "1685495256 +0000",
                            "message": "first commit\n"
                        },
                        "blob_path": "app/schema/config.py"
                    }
                }
            ],
            "blob_metadata": {
                "id": "0ee84b84c29924b210e3576fe9d1e8632948bedc",
                "num_bytes": 664,
                "mime_essence": "text/plain",
                "charset": null
            },
            "blob_id": "0ee84b84c29924b210e3576fe9d1e8632948bedc",
            "location": {
                "offset_span": {
                    "start": 617,
                    "end": 660
                },
                "source_span": {
                    "start": {
                        "line": 16,
                        "column": 17
                    },
                    "end": {
                        "line": 16,
                        "column": 59
                    }
                }
            },
            "capture_group_index": 1,
            "match_content": "32ui1ffdasfhu239b4df2ac6609a9919",
            "snippet": {
                "before": "E = \"https://testwebsite.com\"\n ",
                "matching": "API_KEY = \"32ui1ffdasfhu239b4df2ac6609a9919",
                "after": "\"\n\n\n"
            },
            "rule_name": "Generic API Key"
        }
    ]
}
~~~
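
To make the nesting above easier to follow, here is a minimal sketch of how the fields used for a finding could be pulled out of each record. It assumes, as in the example, that the `git_repo` provenance entry is the last element of `provenance`; the helper name `iter_matches` and the shortened sample record are hypothetical and are not part of the parser.

```python
import json


def iter_matches(jsonl_text):
    """Yield (rule_name, commit_id, blob_path, line) for every match in JSON Lines text."""
    for raw in jsonl_text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines between records
        record = json.loads(raw)
        for match in record["matches"]:
            # Assumption: the git_repo entry is the last provenance element.
            commit = match["provenance"][-1]["commit_provenance"]
            yield (
                record["rule_name"],
                commit["commit_metadata"]["commit_id"],
                commit["blob_path"],
                match["location"]["source_span"]["start"]["line"],
            )


# Shortened, made-up record containing only the fields read above.
sample = json.dumps({
    "rule_name": "Generic API Key",
    "match_content": "example-secret",
    "matches": [{
        "provenance": [
            {"kind": "file", "path": "app/config.py"},
            {"kind": "git_repo", "commit_provenance": {
                "commit_metadata": {"commit_id": "abc123"},
                "blob_path": "app/config.py",
            }},
        ],
        "location": {"source_span": {"start": {"line": 16, "column": 17}}},
    }],
})

rows = list(iter_matches(sample))
```

A record with several entries in `matches` yields one row per match, which is why a single secret can produce more than one finding.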

### Sample Scan Data
Sample scan data for testing purposes can be found [here](https://github.com/DefectDojo/django-DefectDojo/tree/master/unittests/scans/noseyparker).
1 change: 1 addition & 0 deletions dojo/settings/settings.dist.py
@@ -1451,6 +1451,7 @@ def saml2_attrib_map_format(dict):
'MSDefender Parser': DEDUPE_ALGO_HASH_CODE,
'HCLAppScan XML': DEDUPE_ALGO_HASH_CODE,
'MobSF Scan': DEDUPE_ALGO_HASH_CODE,
'Nosey Parker Scan': DEDUPE_ALGO_UNIQUE_ID_FROM_TOOL_OR_HASH_CODE,
}

# Override the hardcoded settings here via the env var
Empty file.
101 changes: 101 additions & 0 deletions dojo/tools/noseyparker/parser.py
@@ -0,0 +1,101 @@
import hashlib
import json

from datetime import datetime
from dojo.models import Finding


class NoseyParkerParser(object):
    """
    Scanning secrets from repos
    """

    def get_scan_types(self):
        return ["Nosey Parker Scan"]

    def get_label_for_scan_types(self, scan_type):
        return "Nosey Parker Scan"

    def get_description_for_scan_types(self, scan_type):
        return "Nosey Parker report file can be imported in JSON Lines format (option --jsonl). " \
               "Supports v0.16.0 of https://github.com/praetorian-inc/noseyparker"

    def get_findings(self, file, test):
        """
        Returns findings from a JSON Lines file, de-duplicated by
        file path, secret, and line number. Every finding is rated 'High'.
        """
        dupes = {}

        if file is None:
            # Return an empty list rather than None so callers can always iterate
            return []
        elif file.name.lower().endswith(".jsonl"):
            # Parse each non-blank JSON line into a dict
            data = [json.loads(line) for line in file if line.strip()]

            # Check for an empty file (no records, or an empty first record)
            if not data or len(data[0]) == 0:
                return []

            # Parse through each secret in each JSON line
            for line in data:
                # Set rule to the current secret type (e.g. AWS S3 Bucket)
                try:
                    rule_name = line['rule_name']
                    secret = line['match_content']
                except Exception:
                    raise ValueError("Invalid Nosey Parker data, make sure to use Nosey Parker v0.16.0")

                # Set Finding details
                for match in line['matches']:
                    # The number of provenance entries varies in the JSON lines output;
                    # the git entry is taken from the last one
                    json_path = match['provenance'][-1]
                    commit_metadata = json_path['commit_provenance']['commit_metadata']

                    title = f"Secret(s) Found in Repository with Commit ID {commit_metadata['commit_id']}"
                    filepath = json_path['commit_provenance']['blob_path']
                    line_num = match['location']['source_span']['start']['line']
                    description = f"Secret found of type: {rule_name} \n" \
                        f"SECRET starts with: '{secret[:3]}' \n" \
                        f"Committer Name: {commit_metadata['committer_name']} \n" \
                        f"Committer Email: {commit_metadata['committer_email']} \n" \
                        f"Commit ID: {commit_metadata['commit_id']} \n" \
                        f"Location: {filepath} line #{line_num} \n" \
                        f"Line #{line_num} \n" \
                        f"Code Snippet Containing Secret: {match['snippet']['before']}***SECRET***{match['snippet']['after']} \n"
Contributor Author

Missing new line

                    # Internal de-duplication
                    key = hashlib.md5((filepath + "|" + secret + "|" + str(line_num)).encode("utf-8")).hexdigest()

                    # If secret already exists with the same filepath/secret/linenum
                    if key in dupes:
                        dupes[key].nb_occurences += 1
                    else:
                        # Create Finding object
                        dupes[key] = Finding(
                            test=test,
                            cwe=798,
                            title=title,
                            description=description,
                            severity='High',
                            mitigation="Reset the account/token and remove from source code. Store secrets/tokens/passwords in secret managers or secure vaults.",
                            date=datetime.today().strftime("%Y-%m-%d"),
                            verified=False,
                            active=True,
                            is_mitigated=False,
                            file_path=filepath,
                            line=line_num,
                            static_finding=True,
                            nb_occurences=1,
                            dynamic_finding=False,
                        )
        else:
            raise ValueError("JSON lines format not recognized (.jsonl file extension). Make sure to use Nosey Parker v0.16.0")

        return list(dupes.values())
5 changes: 5 additions & 0 deletions unittests/scans/noseyparker/empty_with_error.json
@@ -0,0 +1,5 @@
{"type":"warning","data":"package.json: No license field"}
{"type":"warning","data":"No license field"}
{"type":"error","data":"An unexpected error occurred: \"https://registry.yarnpkg.com/-/npm/v1/security/audits: tunneling socket could not be established, cause=connect ECONNREFUSED 127.0.0.1:80\"."}
{"type":"info","data":"If you think this is a bug, please open a bug report with the information provided in \"/yarn-error.log\"."}
{"type":"info","data":"Visit https://yarnpkg.com/en/docs/cli/audit for documentation about this command."}