From 3ef1249b7f50a250c02c568342e0aea6638fc5a7 Mon Sep 17 00:00:00 2001 From: Arjun Suresh Date: Tue, 1 Oct 2024 17:45:41 +0100 Subject: [PATCH] Fix docs (#1853) * Support batch-size in llama2 run * Add Rclone-Cloudflare download instructions to README.md * Add Rclone-Cloudflare download instructiosn to README.md * Minor wording edit to README.md * Add Rclone-Cloudflare download instructions to README.md * Add Rclone-GDrive download instructions to README.md * Add new and old instructions to README.md * Tweak language in README.md * Language tweak in README.md * Minor language tweak in README.md * Fix typo in README.md * Count error when logging errors: submission_checker.py * Fixes #1648, restrict loadgen uncommitted error message to within the loadgen directory * Update test-rnnt.yml (#1688) Stopping the github action for rnnt * Added docs init Added github action for website publish Update benchmark documentation Update publish.yaml Update publish.yaml Update benchmark documentation Improved the submission documentation Fix taskname Removed unused images * Fix benchmark URLs * Fix links * Add _full variation to run commands * Added script flow diagram * Added docker setup command for CM, extra run options * Added support for docker options in the docs * Added --quiet to the CM run_cmds in docs * Fix the test query count for cm commands * Support ctuning-cpp implementation * Added commands for mobilenet models * Docs cleanup * Docs cleanup * Added separate files for dataset and models in the docs * Remove redundant tab in the docs * Fixes some WIP models in the docs * Use the official docs page for CM installation * Fix the deadlink in docs * Fix indendation issue in docs * Added dockerinfo for nvidia implementation * Added run options for gptj * Added execution environment tabs * Cleanup of the docs * Cleanup of the docs * Reordered the sections of the docs page * Removed an unnecessary heading in the docs * Fixes the commands for datacenter * Fix the build --sdist for loadgen * Fixes #1761, llama2 and mixtral runtime error on CPU systems * Added mixtral to the benchmark list, improved benchmark docs * Update docs for MLPerf inference v4.1 * Update docs for MLPerf inference v4.1 * Fix typo * Gave direct link to implementation readmes * Added tables detailing implementations * Update vision README.md, split the frameworks into separate rows * Update README.md * pointed links to specific frameworks * pointed links to specific frameworks * Update Submission_Guidelines.md * Update Submission_Guidelines.md * Update Submission_Guidelines.md * api support llama2 * Added request module and reduced max token len * Fix for llama2 api server * Update SUT_API offline to work for OpenAI * Update SUT_API.py * Minor fixes * Fix json import in SUT_API.py * Fix llama2 token length * Added model name verification with server * clean temp files * support num_workers in LLAMA2 SUTs * Remove batching from Offline SUT_API.py * Update SUT_API.py * Minor fixes for llama2 API * Fix for llama2 API * removed table of contents * enabled llama2-nvidia + vllm-NM : WIP * enabled dlrm for intel * lower cased implementation * added raw data input * corrected data download commands * renamed filename * changes for bert and vllm * documentation to work on custom repo and branch * benchmark index page update * enabled sdxl for nvidia and intel * updated vllm server run cmd * benchmark page information addition * fix indendation issue * Added submission categories * update submission page - generate submission with or w/o using CM for benchmarking * Updated kits dataset documentation * Updated model parameters * updation of information * updated non cm based benchmark * added info about hf password * added links to model and access tokens * Updated reference results structuree tree * submission docs cleanup * Some cleanups for benchmark info * Some cleanups for benchmark info * Some cleanups for benchmark info * added generic stubs deepsparse * Some cleanups for benchmark info * Some cleanups for benchmark info * Some cleanups for benchmark info * Some cleanups for benchmark info (FID and CLIP data added) * typo fix for bert deepsparse framework * added min system requirements for models * fixed code version * changes for displaying reference and intel implementation tip * added reference to installation page * updated neural magic documentation * Added links to the install page, redirect benchmarks page * added tips about batch size and dataset for nvidia llama2 * fix conditions logic * modified tips and additional run cmds * sentence corrections * Minor fix for the documentation * fixed bug in deepsparse generic model stubs + styling * added more information to stubs * Added SCC24 readme, support reproducibility in the docs * Made clear the custom CM repo URL format * Support conditional implementation, setup and run tips * Support rocm for sdxl * Fix _short tag support * Fix install URL * Expose bfloat16 and float16 options for sdxl * Expose download model to host option for sdxl * IndySCC24 documentation added * Improve the SCC24 docs * Improve the support of short variation * Improved the indyscc24 documentation * Updated scc run commands * removed test_query_count option for scc * Remove scc24 in the main docs * Remove scc24 in the main docs * Fix docs: indendation issue on the submission page * generalised code for skipping test query count * Fixes for SCC24 docs * Fix scenario text in main.py * Fix links for scc24 * Fix links for scc24 * Improve the general docs * Fix links for scc24 * Use float16 in scc24 doc * Improve scc24 docs * Improve scc24 docs * Use float16 in scc24 doc * fixed command bug * Fix typo in docs * Fix typo in docs * Remove unnecessary indendation in docs * initial commit for tip - native run CUDA * Updated tip --------- Co-authored-by: Nathan Wasson Co-authored-by: anandhu-eng Co-authored-by: ANANDHU S <71482562+anandhu-eng@users.noreply.github.com> Co-authored-by: Michael Goin --- docs/install/index.md | 4 ++-- main.py | 6 +++++- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/docs/install/index.md b/docs/install/index.md index 195521c7e..1750d86e4 100644 --- a/docs/install/index.md +++ b/docs/install/index.md @@ -12,8 +12,8 @@ CM needs `git`, `python3-pip` and `python3-venv` installed on your system. If an This step is not mandatory as CM can use separate virtual environment for MLPerf inference. But the latest `pip` install requires this or else will need the `--break-system-packages` flag while installing `cm4mlops`. ```bash - python3 -m venv cm - source cm/bin/activate +python3 -m venv cm +source cm/bin/activate ``` ## Install CM and pulls any needed repositories diff --git a/main.py b/main.py index aa8dd769e..6a607cc10 100755 --- a/main.py +++ b/main.py @@ -140,6 +140,9 @@ def mlperf_inference_implementation_readme(spaces, model, implementation, *, imp # ref to cm installation content += f"{cur_space3}Please refer to the [installation page](site:inference/install/) to install CM for running the automated benchmark commands.\n\n" test_query_count=get_test_query_count(model, implementation, device.lower()) + if device.lower() == "cuda" and execution_env.lower() == "native": + content += f"\n{cur_space3}!!! tip\n\n" + content += f"{cur_space3} - It is advisable to use the commands in the Docker tab for CUDA. Run the below native command only if you are already on a CUDA setup with cuDNN and TensorRT installed.\n\n" if "99.9" not in model: #not showing docker command as it is already done for the 99% variant if implementation == "neuralmagic": @@ -442,7 +445,8 @@ def mlperf_inference_run_command(spaces, model, implementation, framework, categ if "short" in extra_variation_tags: full_ds_needed_tag = "" else: - full_ds_needed_tag = ",_full" + full_ds_needed_tag = "_full," + docker_setup_cmd = f"""\n {f_pre_space}```bash