From 3ef1249b7f50a250c02c568342e0aea6638fc5a7 Mon Sep 17 00:00:00 2001
From: Arjun Suresh <arjunsuresh1987@gmail.com>
Date: Tue, 1 Oct 2024 17:45:41 +0100
Subject: [PATCH] Fix docs (#1853)

* Support batch-size in llama2 run

* Add Rclone-Cloudflare download instructions to README.md

* Add Rclone-Cloudflare download instructions to README.md

* Minor wording edit to README.md

* Add Rclone-Cloudflare download instructions to README.md

* Add Rclone-GDrive download instructions to README.md

* Add new and old instructions to README.md

* Tweak language in README.md

* Language tweak in README.md

* Minor language tweak in README.md

* Fix typo in README.md

* Count error when logging errors: submission_checker.py

* Fixes #1648, restrict loadgen uncommitted error message to within the loadgen directory

* Update test-rnnt.yml (#1688)

Stopping the github action for rnnt

* Added docs init

Added github action for website publish

Update benchmark documentation

Update publish.yaml

Update publish.yaml

Update benchmark documentation

Improved the submission documentation

Fix taskname

Removed unused images

* Fix benchmark URLs

* Fix links

* Add _full variation to run commands

* Added script flow diagram

* Added docker setup command for CM, extra run options

* Added support for docker options in the docs

* Added --quiet to the CM run_cmds in docs

* Fix the test query count for cm commands

* Support ctuning-cpp implementation

* Added commands for mobilenet models

* Docs cleanup

* Docs cleanup

* Added separate files for dataset and models in the docs

* Remove redundant tab in the docs

* Fixes some WIP models in the docs

* Use the official docs page for CM installation

* Fix the dead link in docs

* Fix indentation issue in docs

* Added dockerinfo for nvidia implementation

* Added run options for gptj

* Added execution environment tabs

* Cleanup of the docs

* Cleanup of the docs

* Reordered the sections of the docs page

* Removed an unnecessary heading in the docs

* Fixes the commands for datacenter

* Fix the build --sdist for loadgen

* Fixes #1761, llama2 and mixtral runtime error on CPU systems

* Added mixtral to the benchmark list, improved benchmark docs

* Update docs for MLPerf inference v4.1

* Update docs for MLPerf inference v4.1

* Fix typo

* Gave direct link to implementation readmes

* Added tables detailing implementations

* Update vision README.md, split the frameworks into separate rows

* Update README.md

* pointed links to specific frameworks

* pointed links to specific frameworks

* Update Submission_Guidelines.md

* Update Submission_Guidelines.md

* Update Submission_Guidelines.md

* api support llama2

* Added request module and reduced max token len

* Fix for llama2 api server

* Update SUT_API offline to work for OpenAI

* Update SUT_API.py

* Minor fixes

* Fix json import in SUT_API.py

* Fix llama2 token length

* Added model name verification with server

* clean temp files

* support num_workers in LLAMA2 SUTs

* Remove batching from Offline SUT_API.py

* Update SUT_API.py

* Minor fixes for llama2 API

* Fix for llama2 API

* removed table of contents

* enabled llama2-nvidia + vllm-NM : WIP

* enabled dlrm for intel

* lower cased implementation

* added raw data input

* corrected data download commands

* renamed filename

* changes for bert and vllm

* documentation to work on custom repo and branch

* benchmark index page update

* enabled sdxl for nvidia and intel

* updated vllm server run cmd

* benchmark page information addition

* fix indentation issue

* Added submission categories

* update submission page - generate submission with or w/o using CM for benchmarking

* Updated kits dataset documentation

* Updated model parameters

* updated information

* updated non cm based benchmark

* added info about hf password

* added links to model and access tokens

* Updated reference results structure tree

* submission docs cleanup

* Some cleanups for benchmark info

* Some cleanups for benchmark info

* Some cleanups for benchmark info

* added generic stubs deepsparse

* Some cleanups for benchmark info

* Some cleanups for benchmark info

* Some cleanups for benchmark info

* Some cleanups for benchmark info (FID and CLIP data added)

* typo fix for bert deepsparse framework

* added min system requirements for models

* fixed code version

* changes for displaying reference and intel implementation tip

* added reference to installation page

* updated neural magic documentation

* Added links to the install page, redirect benchmarks page

* added tips about batch size and dataset for nvidia llama2

* fix conditions logic

* modified tips and additional run cmds

* sentence corrections

* Minor fix for the documentation

* fixed bug in deepsparse generic model stubs + styling

* added more information to stubs

* Added SCC24 readme, support reproducibility in the docs

* Made clear the custom CM repo URL format

* Support conditional implementation, setup and run tips

* Support rocm for sdxl

* Fix _short tag support

* Fix install URL

* Expose bfloat16 and float16 options for sdxl

* Expose download model to host option for sdxl

* IndySCC24 documentation added

* Improve the SCC24 docs

* Improve the support of short variation

* Improved the indyscc24 documentation

* Updated scc run commands

* removed test_query_count option for scc

* Remove scc24 in the main docs

* Remove scc24 in the main docs

* Fix docs: indentation issue on the submission page

* generalised code for skipping test query count

* Fixes for SCC24 docs

* Fix scenario text in main.py

* Fix links for scc24

* Fix links for scc24

* Improve the general docs

* Fix links for scc24

* Use float16 in scc24 doc

* Improve scc24 docs

* Improve scc24 docs

* Use float16 in scc24 doc

* fixed command bug

* Fix typo in docs

* Fix typo in docs

* Remove unnecessary indentation in docs

* initial commit for tip - native run CUDA

* Updated tip

---------

Co-authored-by: Nathan Wasson <nathanw@mlcommons.org>
Co-authored-by: anandhu-eng <anandhukicks@gmail.com>
Co-authored-by: ANANDHU S <71482562+anandhu-eng@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
---
 docs/install/index.md | 4 ++--
 main.py               | 6 +++++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/docs/install/index.md b/docs/install/index.md
index 195521c7e..1750d86e4 100644
--- a/docs/install/index.md
+++ b/docs/install/index.md
@@ -12,8 +12,8 @@ CM needs `git`, `python3-pip` and `python3-venv` installed on your system. If an
 This step is not mandatory as CM can use separate virtual environment for MLPerf inference. But the latest `pip` install requires this or else will need the `--break-system-packages` flag while installing `cm4mlops`.
 
 ```bash
-   python3 -m venv cm
-   source cm/bin/activate
+python3 -m venv cm
+source cm/bin/activate
 ```
 
 ## Install CM and pulls any needed repositories
diff --git a/main.py b/main.py
index aa8dd769e..6a607cc10 100755
--- a/main.py
+++ b/main.py
@@ -140,6 +140,9 @@ def mlperf_inference_implementation_readme(spaces, model, implementation, *, imp
                         # ref to cm installation
                         content += f"{cur_space3}Please refer to the [installation page](site:inference/install/) to install CM for running the automated benchmark commands.\n\n"
                         test_query_count=get_test_query_count(model, implementation, device.lower())
+                        if device.lower() == "cuda" and execution_env.lower() == "native":
+                            content += f"\n{cur_space3}!!! tip\n\n"
+                            content += f"{cur_space3}    - It is advisable to use the commands in the Docker tab for CUDA. Run the below native command only if you are already on a CUDA setup with cuDNN and TensorRT installed.\n\n"
 
                         if "99.9" not in model: #not showing docker command as it is already done for the 99% variant
                             if implementation == "neuralmagic":
@@ -442,7 +445,8 @@ def mlperf_inference_run_command(spaces, model, implementation, framework, categ
             if "short" in extra_variation_tags:
                 full_ds_needed_tag = ""
             else:
-                full_ds_needed_tag = ",_full"
+                full_ds_needed_tag = "_full,"
+
 
             docker_setup_cmd = f"""\n
 {f_pre_space}```bash