Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: duplicate scancode files being reported #1045

Merged
merged 1 commit into from
Sep 28, 2021

Conversation

rnjudge
Copy link
Contributor

@rnjudge rnjudge commented Sep 28, 2021

When running scancode with an empty cache, Tern was reporting the same
file twice due to a logic bug in the add_file_data() function. This fix
adds a flag to ensure that a file is only added to the layer object
file list if it does not already exist there.

Resolves #1000

Signed-off-by: Rose Judge rjudge@vmware.com

Comment on lines 172 to 178
if f.merge(checkfile):
# file already exists and has now been updated
updated = True
break
# file didn't previously exist in layer so add it now
layer_obj.files.append(checkfile)
if not updated:
# file didn't previously exist in layer so add it now
layer_obj.files.append(checkfile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check my logic here? I think this can be achieved this way:

for f in layer_obj.files <-- cycle through existing files
    merged = f.merge(checkfile) <-- try to update the file
    if not merged: <-- the file didn't get updated, means this is a unique file
        layer_obj.files.append(checkfile)

Copy link
Contributor Author

@rnjudge rnjudge Sep 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need the break and to update the file only when the for loop has been broken or completed. Otherwise, with your proposal, say we have three files in the layer [A, B, C] and two files from scancode that needs updating: [A, C]. We go through this for loop and compare A to A, merged = True so file gets updated but then we move to the next f in layer files and compare B from the layer files to the same A from scancode... merged = False and so we add the A file, even though it already exists and has been updated... and we are facing the same problem we have now. I think we could do:

merged = False <-- variable needs to exist outside of for loop so we can check it once the loop completes.
for f in layer_obj.files:
   merged = f.merge(checkfile)
   if merged:
      break
if not merged: <-- if we are here it's because we broke out of the loop (in which case the file has been updated and merged=True or because we never found a match and merged=False and file needs to be updated.
   layer_obj.files.append(checkfile)

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! This makes sense.

When running scancode with an empty cache, Tern was reporting the same
file twice due to a logic bug in the add_file_data() function. This fix
adds a flag to ensure that a file is only added to the layer object
file list if it does not already exist there.

Resolves tern-tools#1000

Signed-off-by: Rose Judge <rjudge@vmware.com>
@nishakm nishakm merged commit c90cf6e into tern-tools:main Sep 28, 2021
@rnjudge rnjudge deleted the fix-scancode-dups branch September 29, 2021 03:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

existence of cache file(.tern/) effects report output
2 participants