Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-r Recursive option #129

Merged
merged 29 commits into from
Jun 23, 2018
Merged

-r Recursive option #129

merged 29 commits into from
Jun 23, 2018

Conversation

omergunal
Copy link
Contributor

@omergunal omergunal commented May 13, 2018

Issue: #127
There is a few steps for completing this PR. Now we can get all ".py" files in directory and exclude some files with "-x" option.

pyt/__main__.py Outdated
excluded_files = args.excluded_paths
test = discover_files(directory_path, excluded_files)

print(test)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to see if it works

@KevinHock KevinHock self-requested a review June 6, 2018 04:22
@KevinHock
Copy link
Collaborator

Hi @omergunal, thanks for the soup! :D

Can you merge master into this branch and then resolve the merge conflicts? 👍

@omergunal
Copy link
Contributor Author

No problem :) , ok i will do it

@omergunal
Copy link
Contributor Author

in usage.py "filepath" is must required. should we do optional? because if we use "-r" we do not need it

@KevinHock
Copy link
Collaborator

Can you also add

    parser.add_argument(
        'targets', metavar='targets', type=str, nargs='*',
        help='source file(s) or directory(s) to be tested'
    )

@KevinHock
Copy link
Collaborator

Let's do this https://github.com/PyCQA/bandit/blob/master/bandit/cli/main.py#L153-L160 and then remove -f :)

@KevinHock
Copy link
Collaborator

I realize I'm totally changing my mind from what I said before, "This will enable a user to just give -r /path/to/files instead of -f file one at a time." but this seems cleaner.

@KevinHock
Copy link
Collaborator

KevinHock commented Jun 7, 2018

i.e. -r means we recursively search directories, and is a bool. Whereas the targets, (positional arguments) pyt foo.py controllers/will be the targets we analyze.

@omergunal
Copy link
Contributor Author

You mean, we will delete "-f" option and use "-r" for both single file scan and directory scan.And if user will not use any parameter, it will scan one file. Is that correct?

pyt/usage.py Outdated
action='store',
default='',
help='Separate files with commas'
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

De-dent )

pyt/usage.py Outdated
@@ -91,7 +91,18 @@ def _add_optional_group(parser):
action='store_true',
help='do not skip lines with # nosec comments'
)

optional_group.add_argument(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make it

    parser.add_argument(
        '-r', '--recursive', dest='recursive',
        action='store_true', help='find and process files in subdirectories'
    )

pyt/__main__.py Outdated
@@ -30,6 +30,19 @@
)


def discover_files(directory_path, excluded_files):
file_list = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We mostly use list() everywhere in assignments in the codebase, just for consistency.

pyt/__main__.py Outdated
if os.path.splitext(fullpath)[1] == '.py' and fullpath.split("/")[-1] not in excluded_list:
file_list.append(fullpath)

return(file_list)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: just for consistency, you can do return included_files (and I guess rename files to included_files)

@KevinHock
Copy link
Collaborator

KevinHock commented Jun 7, 2018

re:"You mean, we will delete -f option and use -r for both single file scan and directory scan.And if user will not use any parameter, it will scan one file. Is that correct?"

Delete -f
Use targets for both file and directory scan.
Use -r for doing something like

def discover_files(targets, excluded_files, recursion):
    included_files = list()
    excluded_list = excluded_files.split(",")

    for target in targets:
        if target.endswith('.py'):
            included_files.append(target)
        else: 
            for root, dirs, files in os.walk(target):
                for f in files:
                    fullpath = os.path.join(root, f)
                    if os.path.splitext(fullpath)[1] == '.py' and fullpath.split("/")[-1] not in excluded_list:
                        included_files.append(fullpath)
                if not recursion:
                    break

    return included_files

@KevinHock
Copy link
Collaborator

(just updated the code, should be better now.)

@omergunal
Copy link
Contributor Author

it seems good about returning "included_list". also "-x " parameter is available.

pyt/__main__.py Outdated
@@ -39,6 +57,14 @@ def main(command_line_args=sys.argv[1:]):
elif args.trim_reassigned_in:
ui_mode = UImode.TRIM



targets = args.targets
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more DRY if you did

files = discover_files(
  args.targets,
  args.excluded_paths,
  args.recursive
)

pyt/usage.py Outdated
default='',
help='Separate files with commas'
)
optional_group.add_argument(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess targets will be part of _add_required_group b/c it's replacing -f files

Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, looking good :D

pyt/__main__.py Outdated
directory = os.path.dirname(path)
project_modules = get_modules(directory)
local_modules = get_directory_modules(directory)
for path in files:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this for loop, you can have a vulnerabilities = list(), and then do

vulnerabilities.append(find_vulnerabilities(
    cfg_list,
    ui_mode,
    args.blackbox_mapping_file,
    args.trigger_word_file,
    nosec_lines
))

Copy link
Contributor Author

@omergunal omergunal Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it seems bad
selection_102

there is no problem at the moment. why we should change like this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it find vulnerabilities in all the files, or just the last file (i.e. last iteration of the loop)? If I'm reading it write I think it might do e.g. vulnerabilites = kitmap_vulns...then finally vulnerabilities = a.py_vulns, and only report the last list.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(As an aside, it seems strange it's not printing out the vulnerability info, and just seems to print the object.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it taking last item on the list. have you idea for fix?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fix is having a list outside of the for loop and adding the vulnerabilities of each file to it. The code from my first comment should do it, although now that I think about it it's probably extend and not append.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^Ahh, this was it! 👍 Just change append to extend and it'll all work! :D

pyt/__main__.py Outdated
)

if args.baseline:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can de-dent this, if args.baseline: as only one call to get_vulnerabilities_not_in_baseline, with the vulnerabilities of every file will work.

pyt/__main__.py Outdated
)
vulnerabilities = list()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i created list before loop as you said. the same problem about last item is still continue.
a

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, That's odd, I'll checkout/test your code later today to try and find the issue. 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's really weird, I'll look more in-depth on Monday 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked out your branch and it was append vs. extend

Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super close, just the de-dent and append vs. extend and I think that's mostly it :)

pyt/__main__.py Outdated

if args.baseline:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can keep this, but just de-dent it so that we only trim once.

@KevinHock
Copy link
Collaborator

So I looked at the tests that were failing and the Mock stuff

You can do e.g.

 class MainTest(BaseTestCase):
+    @mock.patch('pyt.__main__.discover_files')
     @mock.patch('pyt.__main__.parse_args')
     @mock.patch('pyt.__main__.find_vulnerabilities')
     @mock.patch('pyt.__main__.text')
-    def test_text_output(self, mock_text, mock_find_vulnerabilities, mock_parse_args):
+    def test_text_output(self, mock_text, mock_find_vulnerabilities, mock_parse_args, mock_discover_files):
         mock_find_vulnerabilities.return_value = 'stuff'
         example_file = 'examples/vulnerable_code/inter_command_injection.py'
         output_file = 'mocked_outfile'
 
+        mock_discover_files.return_value = [example_file]
         mock_parse_args.return_value = mock.Mock(
             autospec=True,
-            filepath=example_file,
             project_root=None,
             baseline=None,
             json=None,

and the same for the other tests.

This makes it so that in the tests, we don't really ever call discover_files, but instead "mock" it, and just use the return value that we want to.

)
initialize_constraint_table(cfg_list)
analyse(cfg_list)
vulnerabilities.extend(find_vulnerabilities(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

er

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there are no vulnerability in a.py b.py and c.py but it printing from xss.py

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't figure out the bug yet, gonna look more tomorrow 😁 This is harder than expected to track down

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i fixed it. its about vulnerabilities = list() location

Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write tests for discover_files when you get a chance? 👍

pyt/__main__.py Outdated
included_files.append(fullpath)
else:
if target not in excluded_list:
included_files.append(targets[0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if targets is a list of files, e.g. python -m pyt examples/vulnerable_code/command_injection.py examples/vulnerable_code/XSS.py, then discover_files will return the first file N times. (Where N is the len of targets.)

pyt/__main__.py Outdated

for target in targets:
if os.path.isdir(target):
if recursive:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So having if recursive: here it will make it so that if you don't have -r then you won't search directories.

You can change it to:

def discover_files(targets, excluded_files, recursive=False):
    included_files = list()
    excluded_list = excluded_files.split(",")

    for target in targets:
        if os.path.isdir(target):
            for root, dirs, files in os.walk(target):
                for f in files:
                    fullpath = os.path.join(root, f)
                    if os.path.splitext(fullpath)[1] == '.py' and fullpath.split("/")[-1] not in excluded_list:
                        included_files.append(fullpath)
                if not recursive:
                    break
        else:
            if target not in excluded_list:
                included_files.append(target)
    return included_files

pyt/__main__.py Outdated
args.recursive
)
for path in files:
vulnerabilities = list()
Copy link
Contributor Author

@omergunal omergunal Jun 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added the list inside to loop
s

Its ok now

Copy link
Collaborator

@KevinHock KevinHock Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the bug I found yesterday, or more accurately the thing I don't understand 😕 , is that find_vulnerabilities returns the vulnerabilities for all the files previously analyzed, as if find_vulnerabilities knows all the vulnerabilities found for the other files we've looked at, how does it know this? 😱

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as far as the PR, you can change it to what you had originally, i.e. vulnerabilities = find_vulnerabilities(..), sorry I misunderstood the code, I'll still look into the reason why the code does this though.

Copy link
Collaborator

@KevinHock KevinHock Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, figured it out, so I knew constraint_table etc. were global variables that keep state, and that they could be the culprit if they were used weirdly, however it is due to FrameworkAdaptor https://github.com/python-security/pyt/blob/master/pyt/web_frameworks/framework_adaptor.py#L88 adding all the past CFGs to the list :) I'll add a comment to __main__.py about it after we merge this PR

Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there :)

pyt/__main__.py Outdated
if os.path.isdir(target):
for root, dirs, files in os.walk(target):
for f in files:
if not recursive:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important for the if not recursive: to be after the

                        fullpath = os.path.join(root, f)
                        if os.path.splitext(fullpath)[1] == '.py' and fullpath.split("/")[-1] not in excluded_list:
                            included_files.append(fullpath)

this is so that we only iterate through the for f in files: once. i.e. just one-level of depth and not recursively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, i did it

pyt/__main__.py Outdated
vulnerabilities,
args.baseline
)
vulnerabilities = get_vulnerabilities_not_in_baseline(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for de-denting the if args.baseline: You can de-dent the vulnerabilities = get_vulnerabilities_not_in_baseline( too, 👍

pyt/__main__.py Outdated

for target in targets:
if os.path.isdir(target):
for root, dirs, files in os.walk(target):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line, for root, dirs, files in os.walk(target): is indented one more level than it has to be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will try to do better for returning "included_files"

Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just make the tests pass and I'll merge 👍 (Feel free to write tests for discover_files if you'd like to though.)

pyt/__main__.py Outdated
excluded_list = excluded_files.split(",")
for target in targets:
if os.path.isdir(target):
for root, dirs, files in os.walk(target):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You can de-dent from line 38 to line 44

pyt/__main__.py Outdated

if args.baseline:
vulnerabilities = get_vulnerabilities_not_in_baseline(
vulnerabilities,
args.baseline
)


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You can delete this newline

@omergunal
Copy link
Contributor Author

and its done. i will write test for discover_files later

Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So happy to merge 🎊 🎈 🎉 🎂

@KevinHock KevinHock merged commit 853300b into python-security:master Jun 23, 2018
KevinHock added a commit that referenced this pull request Jun 23, 2018
…ll versions, add travis commands to tox so this does not happen again
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants