Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sai dbg gen dump support #1846

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.


1. A user command, such as `show techsupport` triggers the `generate_sai_dump`, and creates a new table with the dump file name to create in the APPL DB.
2. A new orchestration agent, `DbgGenDumpOrch`, is triggered to handle the request.
3. `DbgGenDumpOrch` writes the file name to the ASIC DB and sets a new operation `REDIS_ASIC_STATE_COMMAND_DBG_GEN_DUMP` for syncd.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a bottle neck in the regular orchagent to syncd flow using ASIC_DB. As the commands are synchronous, during techsupport it may block other events.
If the saisdkdump takes time this may even cause timeout between orchagent and syncd. If the motivation is to just move from saisdkdump I don't see any advantage in this flow and it might event cause performance issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was discussed and consensus is to provide an ability to make the call non blocking. This should be configurable. @aviramd Please add this to the document

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I have added a general remark in the HLD requirements for this. Once I implement it, I will provide more detailed instructions on how to configure it.

```

#### show techsupport
Introduced a new generic API, `generate_sai_dbg_dump_file`, in `generate_dump.sh` (invoked by the `show techsupport` command) to create a debug dump file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For nvidia platform, sai_dbg_generate_dump call generates a bunch of files.

Eg:

docker exec syncd mkdir -p /tmp/saisdkdump/
docker exec syncd saisdkdump -f /tmp/saisdkdump/sai_sdk_dump_12_10_2024_09_21_PM

ls -l /tmp/saisdkdump/
total 60948
-rw-r--r-- 1 root root 11854441 Dec 10 21:25 sai_sdk_dump_12_10_2024_09_21_PM
-rw-r--r-- 1 root root     1826 Dec 10 21:25 sai_sdk_dump_12_10_2024_09_21_PM.json
drwxr-xr-x 2 root root     4096 Dec 10 21:25 sai_sdk_dump_12_10_2024_09_21_PM_extras
-rw-r--r-- 1 root root     1137 Dec 10 21:25 sdk_dump_ext_.....summary.txt
-rw-r--r-- 1 root root  1201437 Dec 10 21:25 sdk_dump_ext_.....driver.txt
...............

I'm not aware of other vendors but i believe SONiC should have flexibility to provide a path for vendor SAI to dump whatever it deems relevant. Thus, i recommend the following, let me know what you think

  • Update generate_sai_dbg_dump_file to take in two parameters
    Eg: generate_sai_dbg_dump_file -p "$path" -f "$filename"
  • Make sure syncd runs sai_dbg_generate_dump with arg $path/$filename
  • After the execution is complete, techsupport process will copy the contents under $path back into the host and save them into the tar file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially assumed that the SAI API generates a single file since it receives a specific file name rather than just a path. However, I have now realized that the API may generate multiple files.
To accommodate this, I will modify the implementation so that theshow tech-support process copies all files from the specified directory instead of targeting a single file.
In my opinion, Since the generate_sai_dbg_dump_file API is generic for all vendors and is invoked as part of the main show tech-support flow, the path for the SAI to place the files in the syncd container remains generic as well, using /var/log/dbg_gen_dump.log. The generate_sai_dump function in the gen_sai_dbg_dump.sh script ensures that the /var/log/ directory exists and creates it if necessary. This directory is temporary, as the files generated are moved from the syncd filesystem to the host machine once the process completes. Therefore, I don’t see any value in defining separate paths for different vendors.
Ultimately, since the purpose of calling generate_sai_dbg_dump_file is to generate the dump files and copy them to the show tech-support folder, the temporary paths where the files are stored in between do not really matter.
To avoid potential conflicts, I will create a unique directory under /var/log/ to store the files temporarily, ensuring that such a directory does not already exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a standard directory for everyone should suffice and the thread that collects the debug dump should be made singleton. i.e. only one instance runs at a time.

I think after compressing and moving the contents, we should clear the folder.

1.add option for blocking and non blocking flow 2. add timeout configuration 3.result set to APP STATE DB
@zhangyanzhao
Copy link
Collaborator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🏗 In Progress
Development

Successfully merging this pull request may close these issues.

5 participants