De-duplicate environment CSV generation code so that we can have CI/CD information generation in one place #11134

tt-rkim · 2024-08-06T20:50:43Z

We currently generate the environment.csv for a benchmark run in the same job as the actual benchmark run. However, it uses duplicate code because we were trying to get up and running.

We should work on de-duplicating this, which includes:

producing it in one place
uploading it in one place, which may involve moving artifacts around

cc: @TT-billteng @skhorasganiTT @uaydonat

skhorasganiTT · 2024-08-06T21:42:36Z

@tt-rkim Which duplicate code are you referring to?

tt-rkim · 2024-08-07T13:56:03Z

Because the teams were in such a rush to get this up and running, and there were oddities with getting this information within the running job, I decided to duplicate some code. We also generate a CI/CD environment CSV in a separate post-processing job. So that's the duplication.

It's not code you wrote, but we believe the models team should own everything generated within the demos pipeline, as the original proposal for this information came from the models team.

skhorasganiTT · 2024-08-07T14:18:58Z

It was my understanding that the generation of the run and measurement CSVs would be owned by the models team while the generation of the environment CSV (which is intentionally separate due to its information being specific to the CI environment) would be owned by the infra team. @uaydonat is this correct?
For the duplicate code, the CSV-saving function used for run/measurements here can be re-used, but besides that the duplication seems CI-specific.
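As a rough illustration of what re-using one CSV-saving function could look like, here is a minimal sketch. The name `write_rows_as_csv` and the column names in the usage note are hypothetical and are not taken from the repository; the sketch only assumes that both the run/measurement data and the environment data can be expressed as lists of dicts.

```python
import csv
from typing import Iterable, Mapping, Sequence, TextIO


def write_rows_as_csv(
    rows: Iterable[Mapping[str, object]],
    fieldnames: Sequence[str],
    out: TextIO,
) -> None:
    """Write dict rows to `out` as CSV with a header row.

    Hypothetical shared helper: the same function could serve the
    run, measurement, and environment CSVs, so only the row contents
    differ between the call sites.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

A caller would pass its own rows and columns, for example `write_rows_as_csv([{"key": "os", "value": "linux"}], ["key", "value"], f)` with `f` opened via `open(path, "w", newline="")`.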

tt-rkim · 2024-08-07T14:21:03Z

Sounds good, but from now on anything to do with benchmarking schemas or extra data generation for benchmarking is to be owned by the models team, including any additions to CI information.

tt-rkim · 2024-08-21T16:04:29Z

@FrancesssZ has confirmed that we need to upload the files at the same time so that the processor does not see NULLs.

We may need to move the upload to the produce data workflow in this case.
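One way to picture the "upload together" constraint is a guard that refuses to upload anything unless every expected CSV is already on disk. This is only a sketch under assumptions: `upload_csvs_together` and the `upload` callback are hypothetical names, the real upload mechanism is not described in this thread, and the check prevents partial uploads but does not make the upload itself atomic.

```python
from pathlib import Path
from typing import Callable, Sequence


def upload_csvs_together(
    paths: Sequence[Path],
    upload: Callable[[Path], None],
) -> None:
    """Upload all CSVs in one step, or none of them.

    `upload` is a placeholder for whatever actually transfers a file.
    If any expected file is missing, nothing is uploaded, so the
    processor never receives an incomplete set.
    """
    missing = [p for p in paths if not p.is_file()]
    if missing:
        names = ", ".join(str(p) for p in missing)
        raise FileNotFoundError(f"refusing partial upload; missing: {names}")
    for p in paths:
        upload(p)
```

Doing this in the workflow that produces the data would keep generation and upload of the run, measurement, and environment files in a single place.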

tt-rkim added the infra-ci (infrastructure and/or CI changes) and P2 labels Aug 6, 2024

tt-rkim assigned tt-rkim and skhorasganiTT Aug 6, 2024

tt-rkim mentioned this issue Aug 6, 2024

[Master issue] Data pipeline and benchmarking infrastructure #10718

Open

39 tasks

tt-rkim added the data-collection (collab ticket with data science team) label Aug 21, 2024

This was referenced Aug 26, 2024

modify keys within device_info #11852

Merged

#11334: Remove unnecessary code for previous ci/cd csvs #11898

Merged
