Support for Code Size in MB? #183
Comments
I'm not a Go expert, but from looking through the code I think a first iteration on just the counting part might be:
I don't understand whether summing location will be roughly equivalent to bytes or whether a conversion is needed.
Forgot to mention that the GitLab CI/CD plugin example was roughed out last week: https://gitlab.com/guided-explorations/ci-cd-plugin-extensions/ci-cd-plugin-extension-scc
Lots to go through there! The number of bytes is already there, so assuming they charge even for lines that are comments, that should allow you to get the result you want. If, however, they only charge per code line, there would need to be some changes. I would need to know this before knowing which one to do. In either case there might need to be a modified output to give you this value. Certainly something that seems useful and worth adding in. BTW, thanks for writing that plugin. I have added a link on the main page https://github.com/boyter/scc/#interesting-use-cases for others to get the benefit.
@boyter - I just looked for the most well-known scanner that counts MBs. The information was actually not easy to find, but a FAQ answer indicated that they only charge for the code part of files. My guess is that most others who charge just for lines of code will be careful not to count non-code lines, since there will be concern about cost. I had thought of this, and maybe the following might be a path forward:
For the one vendor, part of the reason for charging for MBs is that they also do static analysis of built binaries.
Are you looking to consume this information through the JSON output or some other one? It's certainly easier to add this information to those than to the default stdout. Although there are the COCOMO stats, so it might be worth adding another section below, which would be a good place to include things like #177
It would be great to have it in all. HTML displays the smoothest when it is uploaded to GitLab artifacts storage. JSON would be great for further programmatic analysis. FYI, I just submitted a requested article to Acloud.guru's guest blog that includes scc. They have over 2 million subscribers ;)
Yeah, fair enough. Adding to the HTML and JSON isn't so hard, but the stdout is a bit more problematic. I'll have a look at implementing it though. Oh, that would be neat :) Let me know if it makes it out and I'll be sure to share as widely as I can.
So you should now be able to get the bytes per file and rollups for all the usual suspects: JSON, HTML, CSV. For the moment this is only on a branch https://github.com/boyter/scc/compare/Issue183 and does not include byte counts just for code (it's everything). I'll be adding the code ones soon, which will just be the byte count of anything not considered a comment, on the same branch, and then merge in when it's looking good.
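To make the "bytes of anything not considered a comment" idea concrete, here is a rough sketch. It is not scc's implementation: `codeBytes` and its `commentPrefix` parameter are hypothetical names, and it only recognises whole-line comments with a single prefix, ignoring block comments, trailing comments, and string literals.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// codeBytes is a hypothetical helper (not part of scc). It returns the number
// of bytes in lines that are neither blank nor whole-line comments starting
// with commentPrefix. Line terminators are not counted.
func codeBytes(src, commentPrefix string) int {
	total := 0
	scanner := bufio.NewScanner(strings.NewReader(src))
	for scanner.Scan() {
		line := scanner.Text()
		trimmed := strings.TrimSpace(line)
		if trimmed == "" || strings.HasPrefix(trimmed, commentPrefix) {
			continue // skip blank lines and whole-line comments
		}
		total += len(line)
	}
	return total
}

func main() {
	src := "// a comment\nx := 1\n\ny := 2 // trailing comments still count\n"
	fmt.Println("code bytes:", codeBytes(src, "//"))
}
```

A real counter would need per-language comment rules, which is presumably why the code-only count is more work than the whole-file count.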
@DarwinJS does the output for what is there work for your use case? It's not the count of the code itself, just the file, but it should give you a reasonable idea... I'm inclined to add the latter part after getting this release out, due to how much work it turns out to be.
This has been merged as-is into master. So if you build from there you should get the byte count out of JSON, CSV and HTML.
Thank you! Do files that have two code types get counted in both? I am having a challenge judging the accuracy. I've been using the www-gitlab-com repo, but I see it has .haml files and some other formats that are text but maybe not counted by scc. So when I try to compare scc totals on lines to the output of this command, I am getting vastly different total lines for my whole code base versus scc:
Do you use a sample repository where the above line count should be very close to the total lines reported by scc? This is a great start. I need to find out whether tools that charge by MB are for sure adding up lines versus files.
I don't think scc has .haml files :) If you can point out a spec I'll add it though. This seems pretty good actually: https://haml.info/docs/yardoc/file.REFERENCE.html Generally any repository will not match because of the .git folder though. The values scc spits out are based on the bytes it actually read, i.e. it opened the file and read that many bytes, so it should be 100% accurate for what it processed.
HAML support added in to master. Try running again.
This is great, thanks! It would be great to get MBs in stdout. It is quite common for all of us to suffer from "what you see is all there is" (WYSIATI), so the default display can play a big role in whether folks ever understand that you have the useful feature of counting MBs. What do you think about dropping one or more of the less useful columns for MBs and requiring a command-line switch to swap back? So display MBs instead of raw "Lines" (just picked it because some other common CLOC utils don't have that) and then a switch to get raw lines back? I have a workaround in the form of using w3m (13MB) for the console output with this (assumes an Alpine golang container):
An unexpected benefit of adding HAML is that the counts are nearly exact with CLOC!
This updated version of the article:
- Notes how to use CLOC to compare (and notes the up to 3600% longer run time)
- Implements w3m to dump the HTML output to the screen
- Notes the new bytes count and how it would be so much harder to do by file exclusions for sparse-checkout
- Notes that this is likely the only CLOC utility that does this.
Thanks for working with me on this!
D.
On Thu, Aug 6, 2020 at 1:01 AM Ben Boyter wrote:
HAML support added in to master. Try running again.
So adding the MBs to the normal output is totally possible, assuming you don't want it per type, so something like
might do the trick, although maybe just the byte to megabyte conversion is all that's needed. Glad to hear HAML solved a bunch of issues there. Of course there is also the question as to which version of megabyte to use there, although I suspect that could be solved with another flag.
A quick implementation preview.
Try having a look at what's in master, where it has this as an output for you. I think I want to make the SI optional, so you can choose either SI or 1024 as the division for KB, and add in these https://xkcd.com/394/
OK, that only took a little bit of effort, so now you can change the type from SI to binary to mixed, and of course all the XKCD ones :)
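For anyone wondering how much the choice of megabyte matters for a cost estimate, here is a minimal sketch of the two conversions. The names `bytesToMegabytes`, `siMegabyte` and `binaryMegabyte` are hypothetical and are not taken from scc's source or flags.

```go
package main

import "fmt"

// Divisors for one megabyte under each convention. Hypothetical names,
// not scc identifiers.
const (
	siMegabyte     = 1000 * 1000 // SI: 1 MB = 1,000,000 bytes
	binaryMegabyte = 1024 * 1024 // binary: 1 MiB = 1,048,576 bytes
)

// bytesToMegabytes converts a raw byte count to megabytes, using the
// binary (1024-based) divisor when binary is true and the SI divisor otherwise.
func bytesToMegabytes(bytes int64, binary bool) float64 {
	if binary {
		return float64(bytes) / binaryMegabyte
	}
	return float64(bytes) / siMegabyte
}

func main() {
	var total int64 = 5_000_000
	fmt.Printf("SI:     %.2f MB\n", bytesToMegabytes(total, false))
	fmt.Printf("binary: %.2f MiB\n", bytesToMegabytes(total, true))
}
```

The two results differ by roughly 5% at the megabyte scale, so it is worth checking which convention a vendor bills in before comparing numbers.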
Nice - this shows that it can do MBs on default output! What do you think about a note under the MBs count that says "For bytes count per language, use a data output format."
I think that might be better off covered in the documentation personally. I don't think that cluttering the stdout for this is acceptable, as I would hope anyone looking into integrations is checking the options anyway.
Some commercial security scanning tools now charge by file volume in MB (and some charge by lines).
I would like to use this tool in CI to do an assessment of possible costs to scan my code in given security tools.
I am wondering if that measurement could be taken and reported as well since this engine is already iterating through the code for purposes of counting?
I was also thinking of building a CI plugin around this similar to these: https://gitlab.com/guided-explorations/ci-cd-plugin-extensions.