Multiple Output Formats for CI #184

DarwinJS · 2020-08-05T11:36:59Z

I find that CI engineers have up to 3 preferences for visualizing test results.

1. Seeing them in the CI log - quick and easy while reviewing logs.
2. Seeing them visually from the CI UI.
3. Having a data format that is parsable for further machine processing.

Many testing utilities do not conceive of this and only provide one output for a given run.

In some cases, I have run the utility multiple times to get the desired outputs - but this only works if the utility is not performing extensive work.

It would be great if scc:

- Always produced console output unless I suppress it explicitly (rather than suppressing it automatically when -o is used).
- Let me specify multiple additional output formats - at least two - one for visual inspection and one for data. For example HTML and JSON. (In GitLab and many other CI systems, HTML artifacts are easy to view "in place" without downloading the file.)
- If console output suppression were optional and I could request two other formats, I don't think it would need to support more than two. Maybe "visual output" and "data output"? This could simplify the arguments by adding only one more set for another output type.

boyter · 2020-08-05T22:59:10Z

Probably the easiest way to achieve this would be to run scc a few times for each output... the main run cost is on that first run where it pulls things off disk into memory. What exactly are you doing where this would not be viable?

DarwinJS · 2020-08-05T23:08:35Z

Yes, but if I pull in 200-500 repositories to run it on - running it a couple times could be very expensive. One repo at a time it is blazing fast - but I am thinking in terms of a company assessing all of their code.

boyter · 2020-08-05T23:14:22Z

Even then it shouldn't be too bad... even on a repository the size of the Linux kernel it runs in under a second once the disk cache is warm...

It just feels like a very niche edge case to add code for... I'm still thinking about what the actual cost would be to me from a maintenance point of view. Multiple output formats means there would need to be support for potentially multiple output files, which is going to complicate the command line parsing as well.

DarwinJS · 2020-08-06T02:19:47Z

Yes, I understand the need for constraints - so limiting it to what seems to be common helps. When a single run of a tool gives me log output, a visually formatted output file and a data format file, I feel like it is well designed. If it allows me to specify 5 more outputs than that - it seems like more than I would ever need.

I guess when I have to do something like reprocess the exact same data multiple times to handle these concerns - I get the opposite feeling - like it is a common and reasonable expectation to handle these concerns with one data processing pass. Since I've found myself doing this more than once for other tools that conceive of only one output file / format per run - it makes me think that CI automation generally presents a special case compared to the use case of a human running a tool because automation can be very scaled and then it also generally needs multiple formats to handle the various concerns mentioned above.

I guess in my world I don't feel the perspectives of scaled CI automation are niche :)

You asked about an environment where viability is a concern - many CI systems use containers for running jobs and in some cases - like CI as a Service - the user does not have control over memory size or memory caching behaviors.

I guess I was thinking that at the end of processing data, scc ends up having all the data in variables or data structures and that multiple outputs would just be calling more than one output function. If it stores the data with the reporting format in mind, I can see that would be a lot more coding and maintenance.
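
The idea in the paragraph above (collect the data once, then call more than one output function) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not scc's actual Go code: `ROWS`, the `to_*` formatters and `render_all` are hypothetical names, and the single row reuses figures that appear later in this thread.

```python
import csv
import io
import json

# Hypothetical summary rows; in a real tool these would come from one scan.
ROWS = [{"language": "Go", "files": 1, "lines": 272, "code": 259}]


def to_tabular(rows):
    """Render rows as fixed-width text suitable for a CI log."""
    lines = [f"{'Language':<12}{'Files':>6}{'Lines':>8}{'Code':>8}"]
    for r in rows:
        lines.append(
            f"{r['language']:<12}{r['files']:>6}{r['lines']:>8}{r['code']:>8}"
        )
    return "\n".join(lines)


def to_json(rows):
    """Render rows as JSON for machine processing."""
    return json.dumps(rows)


def to_csv(rows):
    """Render rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["language", "files", "lines", "code"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Assumed registry mapping a format name to its output function.
FORMATTERS = {"tabular": to_tabular, "json": to_json, "csv": to_csv}


def render_all(rows, formats):
    """Format the same in-memory rows once per requested format."""
    return {name: FORMATTERS[name](rows) for name in formats}
```

Calling `render_all(ROWS, ["tabular", "json"])` formats the same data twice without rescanning anything, which is the saving being asked for.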

boyter · 2020-08-06T23:13:02Z

Had a look through some of my older pipelines and indeed I did make a lot of multiple calls to scc. The output was done very quickly though... however I'm all for saving some CPU where possible, so it looks like I'll be doing this. Having scc focused on CI pipelines is a reasonable use case for it.

The catch being all the edge cases. It's not so much a problem with the output formatting, so much as getting the command line arguments right for it. The reason being you want to mix outputs, so some go straight to stdout and others to files.

I think something like,

scc --format-multi "tabular:stdout,csv:file.csv,json:file.json"

might work for this. Having to say explicitly where you want each output to appear might be the best approach here.
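
To make the edge cases concrete, here is a rough Python sketch of how a `format:destination` specification like the one above could be parsed and dispatched. It is an assumption-laden illustration, not how scc parses the flag: `parse_format_multi` and `write_outputs` are hypothetical helpers, and the rule that `stdout` is the only special destination is assumed from the example string.

```python
import sys


def parse_format_multi(spec):
    """Split a comma-separated list of format:destination pairs.

    Returns a list of (format, destination) tuples. The destination is
    either the literal "stdout" or a file path (assumed convention).
    """
    targets = []
    for item in spec.split(","):
        # Split on the first colon only, so the destination may contain colons.
        fmt, sep, dest = item.strip().partition(":")
        if not sep or not fmt or not dest:
            raise ValueError(f"expected format:destination, got {item!r}")
        targets.append((fmt.lower(), dest))
    return targets


def write_outputs(targets, rendered):
    """Send already-formatted text to stdout or to a file, per target.

    `rendered` maps a format name to its formatted text.
    """
    for fmt, dest in targets:
        text = rendered[fmt]
        if dest == "stdout":
            sys.stdout.write(text + "\n")
        else:
            with open(dest, "w", encoding="utf-8") as fh:
                fh.write(text)
```

Requiring a destination for every format keeps the rule simple: nothing is written anywhere unless the pair says so, which avoids guessing when stdout and files are mixed.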

DarwinJS · 2020-08-07T09:59:34Z

Nice! Thanks for considering it!

boyter · 2020-08-19T02:26:03Z

Sitting in master.

$ scc --format-multi "tabular:stdout,html:stdout,csv:stdout" main.go
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Go                           1       272        7         6      259          4
───────────────────────────────────────────────────────────────────────────────
Total                        1       272        7         6      259          4
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop $6,539
Estimated Schedule Effort 2.268839 months
Estimated People Required 0.341437
───────────────────────────────────────────────────────────────────────────────
Processed 5674 bytes, 0.006 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────

<html lang="en"><head><meta charset="utf-8" /><title>scc html output</title><style>table { border-collapse: collapse; }td, th { border: 1px solid #999; padding: 0.5rem; text-align: left;}</style></head><body><table id="scc-table">
	<thead><tr>
		<th>Language</th>
		<th>Files</th>
		<th>Lines</th>
		<th>Blank</th>
		<th>Comment</th>
		<th>Code</th>
		<th>Complexity</th>
		<th>Bytes</th>
	</tr></thead>
	<tbody><tr>
		<th>Go</th>
		<th>1</th>
		<th>272</th>
		<th>7</th>
		<th>6</th>
		<th>259</th>
		<th>4</th>
		<th>5674</th>
	</tr></tbody>
	<tfoot><tr>
		<th>Total</th>
		<th>1</th>
		<th>272</th>
		<th>7</th>
		<th>6</th>
		<th>259</th>
		<th>4</th>
    	<th>5674</th>
	</tr></tfoot>
	</table></body></html>
Language,Location,Filename,Lines,Code,Comments,Blanks,Complexity,Bytes
Go,main.go,main.go,272,259,6,7,4,5674
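
As an illustration of the "data output" use case, the CSV lines at the end of the output above can be consumed by a later pipeline step. The following is a minimal Python sketch; `CSV_TEXT` copies the two CSV lines shown above, and `code_by_language` is a hypothetical helper, not part of scc.

```python
import csv
import io

# The CSV portion of the output shown above, copied verbatim.
CSV_TEXT = """Language,Location,Filename,Lines,Code,Comments,Blanks,Complexity,Bytes
Go,main.go,main.go,272,259,6,7,4,5674
"""


def code_by_language(csv_text):
    """Sum the Code column per language from per-file CSV rows."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        language = row["Language"]
        totals[language] = totals.get(language, 0) + int(row["Code"])
    return totals
```

In practice the CSV would be written to its own file (for example `csv:file.csv` in the specification) rather than mixed with other formats on stdout, so that it can be parsed without first separating it from the tabular and HTML text.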

DarwinJS · 2020-08-19T09:40:30Z

This is really awesome, now if someone is trying to do this on hundreds of repositories at once it will be as fast as it possibly can be. Thanks so much!

boyter · 2020-08-19T22:38:59Z

No worries. I'll call it the save the earth feature since it should burn less CPU and hence energy :)

boyter added the enhancement label Aug 5, 2020

boyter self-assigned this Aug 6, 2020

boyter closed this as completed Aug 19, 2020
