Existence of cache files (.tern/) affects report output #1000

Closed
JustinWonjaePark opened this issue Jul 13, 2021 · 7 comments · Fixed by #1045
Labels: bug (Something went wrong)

Comments

@JustinWonjaePark

Describe the bug
The output of Tern differs depending on whether the cache directory (~/.tern/) exists.

debian_buster_img#.json : created by command
tern report -f json -i debian:buster -o debian_buster_img#.json
debian_buster_img_scancode#.json : created by command
tern report -f json -x scancode -i debian:buster -o debian_buster_img_scancode#.json

To Reproduce
Steps to reproduce the behavior:

  1. Pull debian:buster
  2. Run command tern report -f json -i debian:buster -o debian_buster_img1.json
  3. Run command tern report -f json -i debian:buster -o debian_buster_img2.json
  4. Remove cache rm -rf ~/.tern/
  5. Run command tern report -f json -x scancode -i debian:buster -o debian_buster_img_scancode1.json
  6. Run command tern report -f json -x scancode -i debian:buster -o debian_buster_img_scancode2.json
  7. Run command tern report -f json -i debian:buster -o debian_buster_img3.json
  8. Run command tern report -f json -x scancode -i debian:buster -o debian_buster_img_scancode3.json
  9. Remove cache rm -rf ~/.tern/
  10. Run command tern report -f json -i debian:buster -o debian_buster_img4.json
  11. Remove cache rm -rf ~/.tern/
  12. Run command tern report -f json -x scancode -i debian:buster -o debian_buster_img_scancode4.json
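
The steps above can be scripted. The following is a minimal sketch, assuming `tern` is on the PATH and the cache lives in `~/.tern/` as in the steps. The names `PLAN`, `report_cmd`, and `run_plan` are made up for illustration, and step 1 (pulling the image) is left out. By default it only returns the commands and runs nothing.

```python
import shutil
import subprocess
from pathlib import Path

IMAGE = "debian:buster"
CACHE_DIR = Path.home() / ".tern"  # assumption: cache location from the steps above


def report_cmd(out_name, scancode=False):
    """Build the `tern report` command used in the steps above."""
    cmd = ["tern", "report", "-f", "json"]
    if scancode:
        cmd += ["-x", "scancode"]
    return cmd + ["-i", IMAGE, "-o", out_name]


# (clear cache first?, use scancode?, output file) for steps 2-12
PLAN = [
    (False, False, "debian_buster_img1.json"),
    (False, False, "debian_buster_img2.json"),
    (True, True, "debian_buster_img_scancode1.json"),
    (False, True, "debian_buster_img_scancode2.json"),
    (False, False, "debian_buster_img3.json"),
    (False, True, "debian_buster_img_scancode3.json"),
    (True, False, "debian_buster_img4.json"),
    (True, True, "debian_buster_img_scancode4.json"),
]


def run_plan(dry_run=True):
    """Return the commands in order; execute them only when dry_run is False."""
    commands = []
    for clear_first, scancode, out_name in PLAN:
        if clear_first and not dry_run:
            shutil.rmtree(CACHE_DIR, ignore_errors=True)
        cmd = report_cmd(out_name, scancode)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_plan(dry_run=False)` would run all eight reports in order, clearing the cache before the three runs that require it.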

Error in terminal
no error occurred

Expected behavior
The same command should produce the same output.

Environment you are running Tern on
Enter all that apply
Tern at commit 273e3c8
Python 3.6.13
Docker version 19.03.13, build 4484c46d9d

Distributor ID: Ubuntu
Description: Ubuntu 16.04.7 LTS
Release: 16.04
Codename: xenial

@rnjudge
Contributor

rnjudge commented Jul 23, 2021

I first cleared the cache and then followed the Tern commands in order according to your directions.

Regular Tern run comparison:

  • debian_buster_img1.json and debian_buster_img2.json had no differences between them.
  • As expected, debian_buster_img1.json and debian_buster_img2.json differed from debian_buster_img3.json because the cache had been populated with scancode data by the previous runs (steps 5 and 6), and that data was additionally included in debian_buster_img3.json. Same thing for debian_buster_img4 vs debian_buster_img3.
  • debian_buster_img4.json and debian_buster_img[1/2].json had the same content. However, the pkg_licenses fields are not ordered lists, so the diff did not come back clean. If you take a closer look, they are reporting the same licenses.
    • FYI, Tern collects debian licenses using the debian-inspector package, which parses the copyright text to report licenses. Due to the parsing, there are oftentimes multiple licenses reported for debian packages whereas with other package managers there would only be one.
  • As expected, debian_buster_img3.json and debian_buster_img4.json differ as img3 contains scancode output (from steps 5 and 6).
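
Because the pkg_licenses lists are unordered, a plain text diff reports differences that are not real. One way to compare two reports is to sort those lists before comparing. This is a rough sketch, not part of Tern: `normalize` and `same_report` are hypothetical helper names, and the only assumption about the report layout is that license lists sit under a `pkg_licenses` key somewhere in the JSON.

```python
import json


def normalize(node, unordered_keys=("pkg_licenses",)):
    """Return a copy of node with lists under unordered_keys sorted."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            value = normalize(value, unordered_keys)
            if key in unordered_keys and isinstance(value, list):
                # Sort by JSON text so non-string entries are also handled.
                value = sorted(value, key=lambda v: json.dumps(v, sort_keys=True))
            out[key] = value
        return out
    if isinstance(node, list):
        return [normalize(item, unordered_keys) for item in node]
    return node


def same_report(path_a, path_b):
    """Compare two JSON report files, ignoring order inside license lists."""
    with open(path_a) as fa, open(path_b) as fb:
        return normalize(json.load(fa)) == normalize(json.load(fb))
```

For example, `same_report("debian_buster_img1.json", "debian_buster_img4.json")` would be expected to return True if the two files differ only in license ordering.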

Scancode files:

  • Due to the size of the file, I can't find specific differences in debian_buster_img_scancode1.json and debian_buster_img_scancode2.json -- Do you have any insight on if there's missing information between the two files here? I agree it's incorrect that the two would be different but in order to debug I need specific information to look for that's missing.
  • debian_buster_img_scancode2 and debian_buster_img_scancode3 are the same (both load from cache).
  • debian_buster_img_scancode1 and debian_buster_img_scancode4 are the same (both run with clean cache).

Things are mostly what I would expect except for the following case:

  • Running Tern with Scancode back to back is not producing the same document. The first document created with Scancode is almost double the size of the second which makes me think there is an issue loading from the cache with scancode.

By the way, tern -c will clear the cache for you instead of having to remove the cache file directly :)

I'll keep poking at this.

@JustinWonjaePark
Author

It's good to know the tern -c option! I've been deleting the temp directory :P

About the difference between debian_buster_img_scancode1.json and debian_buster_img_scancode2.json:

As you might know, a report file generated with the Scancode extension describes each file twice: once with detailed info (e.g. checksum_type, extattrs, licenses, etc.) and once without.
debian_buster_img_scancode1.json has both of them, but debian_buster_img_scancode2.json has only the first one. (I suppose that's why the second report is about half the size of the first.)

If you search for paths, you'll see what I mean.
For example (numbers are line numbers in pretty format)

  • "path": "usr/bin/shred"
    debian_buster_img_scancode1.json : 131734, 249256
    debian_buster_img_scancode2.json : 131734
  • "path": "var/log/apt/eipp.log.xz"
    debian_buster_img_scancode1.json : 18219, 131770
    debian_buster_img_scancode2.json : 18219

[image: side-by-side comparison]
(left is debian_buster_img_scancode1.json and right is debian_buster_img_scancode2.json)
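
Searching for paths by hand can be replaced with a small script that counts how often each "path" value occurs in a report. This is only a sketch: it walks the JSON generically and assumes nothing about the layout beyond file entries carrying a string-valued `path` key. `iter_paths` and `duplicated_paths` are illustrative names. Note that in a multi-layer image the same path may legitimately appear in more than one layer, so a count above 1 is a hint to look closer, not proof of a bug.

```python
from collections import Counter


def iter_paths(node):
    """Yield every string stored under a "path" key, at any nesting depth."""
    if isinstance(node, dict):
        if isinstance(node.get("path"), str):
            yield node["path"]
        for value in node.values():
            yield from iter_paths(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_paths(item)


def duplicated_paths(report):
    """Map each path that occurs more than once to its occurrence count."""
    counts = Counter(iter_paths(report))
    return {path: n for path, n in counts.items() if n > 1}
```

Loading a report with `json.load` and passing it to `duplicated_paths` gives the list of paths to inspect.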

nishakm assigned nishakm and rnjudge and unassigned nishakm Aug 5, 2021
@rnjudge
Contributor

rnjudge commented Aug 11, 2021

Thanks @JustinWonjaePark for the clarification of differences. From what I see, Scancode is reporting the file and its metadata twice on the first run which is why the files are different. Tern then captures the file metadata only once in the cache and reports on it once when it loads from the cache. This is because Tern only stores one copy of a file in the cache when the name, path and checksum properties are the same -- which they are for the shred file in debian_buster_img_scancode1.json and debian_buster_img_scancode2.json. Using the shred example, I don't see any metadata for the shred file that is present in debian_buster_img_scancode1.json that is not present in debian_buster_img_scancode2.json, it's just not reported on twice because it's only loaded from the cache once.
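
To illustrate why the second run reports each file only once, here is a simplified model of a cache that keeps one entry per (name, path, checksum). It is not Tern's cache code: `file_key` and `store_in_cache` are hypothetical, and file records are modeled as plain dicts.

```python
def file_key(file_record):
    """Identity used for de-duplication: name, path and checksum."""
    return (file_record["name"], file_record["path"], file_record["checksum"])


def store_in_cache(files):
    """Keep only the first record seen for each (name, path, checksum)."""
    cache = {}
    for record in files:
        cache.setdefault(file_key(record), record)
    return list(cache.values())
```

If a first run lists usr/bin/shred twice with identical name, path and checksum, only one record survives in this model, so anything loaded back from it lists the file once.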

I agree that Tern should not double report scancode data on a file in debian_buster_img_scancode1.json but don't actually see any missing metadata (which is good). Is there metadata in particular you're seeing missing that I'm not?

@JustinWonjaePark
Author

I agree with you @rnjudge. I've not seen missing metadata either.

@rnjudge
Contributor

rnjudge commented Aug 12, 2021

I will start to look at a fix so a fresh scancode run doesn't report the same file twice. When I run just scancode without Tern I get only one reported instance of the usr/bin/shred file so this could be a Tern issue instead of coming from scancode.

@rnjudge
Contributor

rnjudge commented Sep 28, 2021

@JustinWonjaePark Thanks again for catching this very tricky bug. I found the issue and am submitting a fix. I'm planning to cut a new release with the fix included tomorrow.

rnjudge added a commit to rnjudge/tern that referenced this issue Sep 28, 2021
When running scancode with an empty cache, Tern was reporting the same
file twice due to a logic bug in the add_file_data() function. This fix
adds a flag to ensure that a file is only added to the layer object
file list if it does not already exist there.

Resolves tern-tools#1000

Signed-off-by: Rose Judge <rjudge@vmware.com>
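
The kind of guard the commit message describes can be sketched as follows. This is not the actual add_file_data() code from Tern: the `Layer` class, its `add_file` method, and the choice to match on path alone are assumptions made for illustration.

```python
class Layer:
    """Toy stand-in for a layer object that holds a list of file records."""

    def __init__(self):
        self.files = []

    def add_file(self, new_file):
        """Merge into an existing record if present, otherwise append."""
        already_present = False  # the flag: set once a matching record is found
        for existing in self.files:
            if existing["path"] == new_file["path"]:
                existing.update(new_file)  # fold the new metadata into the old record
                already_present = True
                break
        if not already_present:
            self.files.append(new_file)
```

With this guard, adding scancode metadata for a file that is already in the list updates the existing record and does not create a second one.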
nishakm pushed a commit that referenced this issue Sep 28, 2021
When running scancode with an empty cache, Tern was reporting the same
file twice due to a logic bug in the add_file_data() function. This fix
adds a flag to ensure that a file is only added to the layer object
file list if it does not already exist there.

Resolves #1000

Signed-off-by: Rose Judge <rjudge@vmware.com>
@JustinWonjaePark
Author

@rnjudge I am glad that it has been fixed. Thanks!!
