Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to use csv, xml or yaml file format #19

Closed
3 tasks done
totocaca123 opened this issue Aug 20, 2022 · 11 comments
Closed
3 tasks done

Add the ability to use csv, xml or yaml file format #19

totocaca123 opened this issue Aug 20, 2022 · 11 comments
Assignees
Labels
feature New features

Comments

@totocaca123
Copy link
Collaborator

totocaca123 commented Aug 20, 2022

What feature do you want to see added?

Support using csv, xml or yaml file format

  • csv

  • xml

  • yaml/yml

  • would be good but seems that file format has to be basic

Upstream changes

No response

@totocaca123 totocaca123 added the enhancement Enhancement of existing functionality label Aug 20, 2022
@simonsymhoven
Copy link
Member

Hi @totocaca123,
i actually already had the idea on the screen. I'll give it some thought and see how I can implement it. If you have ideas, I'm happy about your input!

@simonsymhoven simonsymhoven self-assigned this Aug 21, 2022
@totocaca123
Copy link
Collaborator Author

or yaml or xml file, these files are quiet similar to json (so a conversion into json would be the simpliest way).

For csv/excel, the file format is not clear. This format makes impossible to have multi level. So the simpliest way would be a 1 level chart. Each line is an item and eah column is a field

@simonsymhoven simonsymhoven added feature New features and removed enhancement Enhancement of existing functionality labels Sep 5, 2022
@simonsymhoven
Copy link
Member

simonsymhoven commented Sep 5, 2022

I added yaml support with #35.

At the moment I have problems with xml conversion.

@simonsymhoven
Copy link
Member

implement xml file support with #62

@simonsymhoven
Copy link
Member

@totocaca123
I also implement csv support with #63. But I see a few problem here with this new feature.

In a cdv file I can not add an id and name on top level for the report:

id,name,incorrect,manually,accurate
aktie,Aktie,3441,348,5199
derivat,Derivat,593,9028,1008

so actually I calculate the id as hashcode of the filename and take the filename as name.

At this point, I think it makes sense to completely rebuild the API and set it up like in the warnings-ng plugin, that you have multiple report providers, one for csv with id, name, delimiter, encoding. One for json with id, name, encoding. Etc, something like this:

publishReport enabledForFailure: true, aggregateReports: false, provider: csv(pattern: '**/*.csv', id: 'my-csv-report', name: 'My CSV report', delimiter: ',', encoding: 'UTF-8')
publishReport enabledForFailure: true, aggregateReports: true, provider: json(pattern: '**/*.json', id: 'my-json-report', name: 'My JSON report', encoding: 'UTF-8')
publishReport enabledForFailure: true, aggregateReports: true, provider: yaml(pattern: '**/*.yaml', id: 'my-yaml-report', name: 'My YAML report', encoding: 'UTF-8')
publishReport enabledForFailure: true, aggregateReports: true, provider: xml(pattern: '**/*.xml', id: 'my-xml-report', name: 'My XML report', encoding: 'UTF-8')

What do you think about this?

@totocaca123
Copy link
Collaborator Author

Yes I agree because CSV file should be simple (columns...)
The ID seems useless (user must put only report name). User just want to set the report name (unique). Maybe ID might be an internal field not visible for user.

So by removing ID and name, the publish report shall create 1 report with all found files (automatically merge data).
If user want 2/3 reports, user have to call 2/3 times publishReport with regular expression to filter datasets. It seems more logical for user.

@totocaca123
Copy link
Collaborator Author

For csv file, the most simple solution is to have first column with fields and other columns for data (numbers).
Is it possible to autodetect if a csv column is text or numbers. In this case, a multi level csv could be done:

For example to support csv like this with auto detection of numbers

image

@simonsymhoven
Copy link
Member

But then the user have to make sure, that the dataset (1 to n files in pattern) have all the same attributes. Otherwise I can not merge it.

So you prefer, that publishReport exactly publish 1 report (as long as id and data structure is the same, otherwise for each id a report will be published). For this report the user have to specify a name and a id, because I need that id assign the reports. But the id must be assigned only for csv reports in the step, because I cannot specify one in the csv file. For all other reports, the user can specify the id in the report files itself.

And in each publishReport step all given reports are automatically merged to one as long as the data structure and the ids are the same, otherwise I won't add these reports to the build, so to sum up the api:

publishReport enabledForFailure: true, provider: csv(pattern: '**/*.csv', id: 'my-csv-report', name: 'My CSV report', delimiter: ',', encoding: 'UTF-8')
publishReport enabledForFailure: true, provider: json(pattern: '**/*.json', name: 'My JSON report', encoding: 'UTF-8')
publishReport enabledForFailure: true, provider: yaml(pattern: '**/*.yaml', name: 'My YAML report', encoding: 'UTF-8')
publishReport enabledForFailure: true, provider: xml(pattern: '**/*.xml', name: 'My XML report', encoding: 'UTF-8')

With your explanations about the csv we can also support multilevel csv files, that's great, thanks!

@totocaca123
Copy link
Collaborator Author

For me the simpliest way is to merge all data from 1 command publishReport (as it is done in warning ng plugin).
The name could be set in the pipeline and ID is not relevant
If user provide unconsistent data, the best way is to assume all data are present, and if a column is missing, we assume data is set to 0 if not present.

From my point of view, numbers or report is related to number of call of publishReport.

If ID is used, it forces users to create/organize reports inside his data, and number of reports is controlled by json files. I think numbers of reports shall be set by numbers of call. I assume user provide consistent data. In case of unconsistent data, we add numbers of 0 for missing data (with warning message). User has then to improve his data or split data to create 2 reports

@simonsymhoven
Copy link
Member

Thanks for your input, I will try something out. But I need an id, otherwise I am not bale to create a history for the history charts at the moment. I will try to implement the new api and have look if there is another way to get rid of the id.

@totocaca123
Copy link
Collaborator Author

For the ID, it could be a SHA of report name.
But if user change report name...
If user change report name, do we assume data are lost?

The ID could be usefull in pipeline (optional or mandatory?) if user want to be free to change report name without loosing his data.
So ID is the tag of data and display name can evolve...
That make sense and name field could be change to reportName (or reportDisplayName) and ID as reportId (unique ID)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New features
Projects
None yet
Development

No branches or pull requests

2 participants