Support for Code Size in MB? #183

DarwinJS · 2020-07-30T11:58:22Z

Some commercial security scanning tools now charge by file volume in MB (and some charge by lines).

I would like to use this tool in CI to estimate what it would cost to scan my code with a given security tool.

I am wondering whether that measurement could be taken and reported as well, since this engine is already iterating through the code to count it?

I was also thinking of building a CI plugin around this similar to these: https://gitlab.com/guided-explorations/ci-cd-plugin-extensions.

DarwinJS · 2020-08-01T11:25:21Z

I'm not a Go expert, but looking through the code, I think a first iteration on just the counting part might be:

- For each case statement in `switch currentState`, record the current location in a global (maybe something like `LastStateChangeLocation = fileJob.Location`).
- In the same case statements, for `SCode`, `SString`, `SCommentCode`, and `SMulticommentCode`, subtract the last recorded position from the current position and add the difference to a counter (maybe something like `fileJob.CodeBytesCounter = fileJob.CodeBytesCounter + (fileJob.Location - LastStateChangeLocation)`).

I don't know whether summing location differences will be roughly equivalent to bytes, or whether a conversion is needed.

DarwinJS · 2020-08-01T13:00:31Z

Forgot to mention that the GitLab CI/CD plugin example was roughed out last week: https://gitlab.com/guided-explorations/ci-cd-plugin-extensions/ci-cd-plugin-extension-scc

boyter · 2020-08-02T23:55:38Z

Lot to go through there!

The number of bytes is already available, so if they charge for every line, including comments, that should give you the result you want.

If, however, they only charge for code lines, some changes would be needed, so I'd need to know which case applies before deciding what to build. In either case, the output might need to be modified to give you this value. It certainly seems useful and worth adding.

BTW thanks for writing that plugin, and I have added a link on the main page https://github.com/boyter/scc/#interesting-use-cases for others to get the benefit.

DarwinJS · 2020-08-03T11:34:12Z

@boyter, I just looked up the most well-known scanner that charges by MB. The information was not easy to find, but a FAQ answer indicated that they only charge for the code part of files.

My guess is that most others who charge just for lines of code will be careful not to count non-code lines since there will be concern about cost.

I had thought of this, and the following might be a path forward:

- Provide the data size of just the lines of code.
- If a user requests it, citing a tool that counts whole file size, add a parameter to report file sizes as well, in case some scanners count that way.

For the one vendor, part of the reason for charging for MBs is that they also do static analysis of built binaries.

boyter · 2020-08-03T23:35:11Z

Are you looking to consume this information through the JSON output or some other format? It's certainly easier to add it there than to the default stdout. That said, there are the COCOMO stats, so it might be worth adding another section below them, which would also be a good place to include things like #177.

DarwinJS · 2020-08-04T16:00:15Z

It would be great to have it in all of them. HTML displays the most smoothly when uploaded to GitLab artifacts storage, and JSON would be great for further programmatic analysis.

FYI, I just submitted an article requested by Acloud.guru's guest blog that includes scc. They have over 2 million subscribers ;)

boyter · 2020-08-05T06:13:42Z

Yeah, fair enough. Adding it to the HTML and JSON isn't so hard, but stdout is a bit more problematic. I'll have a look at implementing it though.

Oh, that would be neat :) Let me know if it makes it out and I'll be sure to share it as widely as I can.

boyter · 2020-08-05T06:48:42Z

So you should now be able to get the bytes per file and rollups for all the usual suspects: JSON, HTML, and CSV.

For the moment this is only on a branch (https://github.com/boyter/scc/compare/Issue183) and does not include byte counts for code only (it's everything).

I'll be adding the code-only counts on the same branch soon, which will just be the byte count of anything not considered a comment, and then merge it in when it's looking good.

boyter · 2020-08-06T01:57:22Z

@DarwinJS, does the output for what is there work for your use case?

It's not the count of the code itself, just the file, but it should give you a reasonable idea...

I'm inclined to add the latter part after getting this release out, due to how much work it turns out to be.

boyter · 2020-08-06T01:59:34Z

This has been merged as is into master, so if you build from there you should get the byte count in the JSON, CSV, and HTML output.

DarwinJS · 2020-08-06T02:40:55Z

Thank you!

Do files that have two code types get counted in both?

I am having trouble judging the accuracy. I've been using the www-gitlab-com repo, but I see it has .haml files and some other text formats that may not be counted by scc.

So when I compare scc's total lines to the output of this command, I get vastly different totals for my whole code base:

```sh
find . -type f -exec wc -l {} \; | awk '{ SUM += $1 } END { print SUM }'
```

Do you use a sample repository where the above line count should be very close to scc's total lines, and where `du -sh` would show the same as the total bytes because all the code files in the repo are in supported languages?

This is a great start. I need to find out for sure whether tools that charge by MB are adding up code lines or whole files.

boyter · 2020-08-06T03:51:12Z

I don't think scc supports .haml files :) If you can point me to a spec, I'll add it though.

This seems pretty good actually: https://haml.info/docs/yardoc/file.REFERENCE.html

Generally, any repository will not match because of the .git folder. The values scc reports are based on the bytes it actually read, i.e. it opened the file and read that many bytes, so they should be 100% accurate for what it processed.

boyter · 2020-08-06T05:01:17Z

HAML support added in to master. Try running again.

DarwinJS · 2020-08-06T12:24:25Z

This is great thanks!

It would be great to get MBs in stdout. It is quite common for all of us to suffer from "what you see is all there is" (WYSIATI) - so that default display can play a big role in whether folks ever understand that you have the useful feature of counting MBs.

What do you think about dropping one or more of the less useful columns in favor of MBs, and requiring a command-line switch to swap back? For example, display MBs instead of raw "Lines" (I picked it only because some other common CLOC utilities don't have that column), with a switch to get raw lines back?

I have a workaround that uses w3m (13MB) for the console output (this assumes an Alpine Go container):

```sh
apk update; apk add w3m
time scc . --not-match '.*md' -f html -o loc.html # Many CI systems can show this kind of artifact in the browser.
time scc . --not-match '.*md' -f json -o loc.json # For further data analysis.
w3m -dump loc.html
echo "Total MBs of files in checkout:"
du -sh
```

DarwinJS · 2020-08-06T12:47:34Z

An unexpected benefit of adding HAML is that the counts are nearly exact with CLOC! This updated version of the article:

- notes how to use CLOC to compare (and notes its up to 3600% longer run time)
- uses w3m to dump the HTML output to the screen
- notes the new byte count, how much harder it would be to get that number through file exclusions for sparse checkout, and that this is likely the only CLOC utility that does this.

Thanks for working with me on this!
D.


boyter · 2020-08-07T00:11:53Z

Adding MBs to the normal output is totally possible, assuming you don't want it per language, so something like:

```
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Go                          34      7793     1277       329     6187       1291
───────────────────────────────────────────────────────────────────────────────
Total                       34      7793     1277       329     6187       1291
───────────────────────────────────────────────────────────────────────────────
Processed 156232 bytes == 0.156232 Megabytes == 0.000156232 Gigabytes
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop $183,083
Estimated Schedule Effort 8.048178 months
Estimated People Required 2.694676
───────────────────────────────────────────────────────────────────────────────
```

That might do the trick, although maybe the byte-to-megabyte conversion is all that's needed. Glad to hear HAML solved a bunch of issues there. Of course, there is also the question of which definition of megabyte to use, although I suspect that could be solved with another flag.

boyter · 2020-08-07T01:00:53Z

```
$ scc -i go
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Go                          34      7800     1278       329     6193       1291
───────────────────────────────────────────────────────────────────────────────
Total                       34      7800     1278       329     6193       1291
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop $183,270
Estimated Schedule Effort 8.051291 months
Estimated People Required 2.696377
───────────────────────────────────────────────────────────────────────────────
Processed 313926 bytes, 0.314 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
```

A quick implementation preview.

boyter · 2020-08-07T01:20:25Z

Have a look at what's in master; it now includes this output for you.

I think I want to make SI optional, so you can choose either SI (1000) or 1024 as the divisor for KB, plus add in these: https://xkcd.com/394/

boyter · 2020-08-07T03:29:25Z

OK, it only took a little effort, so now you can change the unit type from SI to binary to mixed, and of course all the XKCD ones :)
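A minimal sketch of the SI vs. binary choice, using the byte count from the preview output. `formatMegabytes` is a hypothetical helper; the thread doesn't show scc's actual flag names or formatting code, and the "mixed" and XKCD units are not modeled here.

```go
package main

import "fmt"

// formatMegabytes converts a byte count to megabytes. With si=true it
// divides by 1000*1000 (SI "MB"); otherwise by 1024*1024 (binary "MiB").
func formatMegabytes(n int64, si bool) string {
	if si {
		return fmt.Sprintf("%.3f megabytes (SI)", float64(n)/1e6)
	}
	return fmt.Sprintf("%.3f mebibytes (binary)", float64(n)/(1024*1024))
}

func main() {
	const processed = 313926 // byte count from the "scc -i go" preview
	fmt.Println(formatMegabytes(processed, true))  // prints 0.314 megabytes (SI)
	fmt.Println(formatMegabytes(processed, false)) // prints 0.299 mebibytes (binary)
}
```

The same byte count differs by roughly 5% between the two definitions, which is why a flag to choose between them is useful when comparing against a vendor's billing.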

DarwinJS · 2020-08-07T10:03:55Z

Nice, this shows that it can do MBs in the default output!

What do you think about a note under the MBs count that says "For bytes count per language, use a data output format."

boyter · 2020-08-09T23:00:33Z

Personally, I think that's better covered in the documentation. I don't think cluttering stdout with this is acceptable, and I would hope anyone looking into integrations is checking the options anyway.

boyter added the enhancement New feature or request label Aug 2, 2020

boyter closed this as completed Sep 7, 2020
