Merge branch 'master' into master
prsunny authored Aug 8, 2022
2 parents 0cf85dd + 3f0ba59 commit a27bace
Showing 11 changed files with 169 additions and 156 deletions.
26 changes: 9 additions & 17 deletions Supported-Devices-and-Platforms.html
@@ -471,15 +471,15 @@ <h2><p style="text-align: left; font-family: Verdana, sans-serif; color: #2E86C1
<tr>
<td>Arista</td>
<td>7170-32CD</td>
<td class="asic_vendor">Barefoot</td>
<td class="asic_vendor">Intel</td>
<td>Tofino</td>
<td>32x100G + 2x10G</td>
<td></td>
</tr>
<tr>
<td>Arista</td>
<td>7170-64C</td>
<td class="asic_vendor">Barefoot</td>
<td class="asic_vendor">Intel</td>
<td>Tofino</td>
<td>64x100G</td>
<td></td>
@@ -509,25 +509,17 @@ <h2><p style="text-align: left; font-family: Verdana, sans-serif; color: #2E86C1
<td></td>
</tr>
<tr>
<td>Barefoot</td>
<td>SONiC-P4</td>
<td class="asic_vendor">Barefoot</td>
<td>P4 Emulated</td>
<td>Configurable</td>
<td></td>
</tr>
<tr>
<td>Barefoot</td>
<td>Accton</td>
<td>Wedge 100BF-32</td>
<td class="asic_vendor">Barefoot</td>
<td class="asic_vendor">Intel</td>
<td>Tofino</td>
<td>32x100G</td>
<td></td>
</tr>
<tr>
<td>Barefoot</td>
<td>Accton</td>
<td>Wedge 100BF-65X</td>
<td class="asic_vendor">Barefoot</td>
<td class="asic_vendor">Intel</td>
<td>Tofino</td>
<td>32x100G</td>
<td></td>
@@ -837,7 +829,7 @@ <h2><p style="text-align: left; font-family: Verdana, sans-serif; color: #2E86C1
<tr>
<td>Ingrasys</td>
<td>S9180-32X</td>
<td class="asic_vendor">Barefoot</td>
<td class="asic_vendor">Intel</td>
<td>Tofino</td>
<td>32x100G</td>
<td></td>
@@ -861,7 +853,7 @@ <h2><p style="text-align: left; font-family: Verdana, sans-serif; color: #2E86C1
<tr>
<td>Ingrasys</td>
<td>S9280-64X</td>
<td class="asic_vendor">Barefoot</td>
<td class="asic_vendor">Intel</td>
<td>Tofino</td>
<td>64x100G</td>
<td></td>
@@ -1093,7 +1085,7 @@ <h2><p style="text-align: left; font-family: Verdana, sans-serif; color: #2E86C1
<tr>
<td>Wnc</td>
<td>OSW1800</td>
<td class="asic_vendor">Barefoot</td>
<td class="asic_vendor">Intel</td>
<td>Tofino</td>
<td>48x25G + 6x100G</td>
<td></td>
Binary file modified assets/img/all_partners2_1920x1320.jpg
50 changes: 25 additions & 25 deletions doc/bulk_counter/bulk_counter.md
@@ -23,28 +23,26 @@ PR https://github.com/opencomputeproject/SAI/pull/1352/files introduced new SAI
- sai_bulk_object_get_stats
- sai_bulk_object_clear_stats

SONiC flex counter infrastructure shall utilize bulk stats API to gain better performance. This document discusses how to integrate these two new APIs to SONiC.

### Requirements

- Syncd shall use bulk stats APIs based on object type. E.g. for a counter group that queries queue and pg stats, queue stats support bulk while pg stats does not, in that case queue stats shall use bulk API, pg stats shall use non bulk API
- For a certain object type in a counter group, it shall use bulk stats only if:
- The stats capability for each counter IDs shall match the stats mode of the counter group
- Each object queries exactly the same counter IDs. (Requirement from function signature of sai_bulk_object_get_stats and sai_bulk_object_clear_stats)
- For a certain object in a counter group, it shall use bulk stats only if all counter IDs support bulk API
- Syncd shall automatically fall back to old way if bulk stats APIs are not supported
- Syncd shall utilize API sai_query_stats_capability to query bulk capability. Syncd shall treat counter as no bulk capability if API sai_query_stats_capability return error except SAI_STATUS_BUFFER_OVERFLOW (SAI_STATUS_BUFFER_OVERFLOW requires a retry with larger buffer)
- Syncd shall call bulk stats API in flex counter thread and avoid calling it in main thread to make sure main thread only handles short and high priority tasks. (This is the default behavior in flex counter infrastructure)
- Syncd shall utilize sai_bulk_object_get_stats/sai_bulk_object_clear_stats to query bulk capability. Syncd shall treat counter as no bulk capability if API return error
- Syncd shall call bulk stats API in flex counter thread and avoid calling it in main thread to make sure main thread only handles short and high priority tasks. (This is the default behavior in current flex counter infrastructure)
- In phase 1, the change is limited to syncd only, no CLI/swss change. Syncd shall deduce the bulk stats mode according to the stats mode defined in FLEX DB:
- SAI_STATS_MODE_READ -> SAI_STATS_MODE_BULK_READ
- SAI_STATS_MODE_READ_AND_CLEAR -> SAI_STATS_MODE_BULK_READ_AND_CLEAR

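The per-object decision and the mode deduction described in the requirements above can be sketched roughly as follows. This is a minimal illustration and not the sonic-sairedis code: `StatsMode`, `toBulkMode` and `shouldUseBulk` are hypothetical names, and the enum is a stand-in for the SAI stats modes so that the snippet stays self-contained.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for the SAI stats modes named in the requirements;
// a real build would take these values from the SAI headers instead.
enum class StatsMode { Read, ReadAndClear, BulkRead, BulkReadAndClear };

// Deduce the bulk mode from the mode configured for the counter group.
StatsMode toBulkMode(StatsMode mode)
{
    switch (mode) {
    case StatsMode::Read:         return StatsMode::BulkRead;
    case StatsMode::ReadAndClear: return StatsMode::BulkReadAndClear;
    default:                      return mode; // already a bulk mode
    }
}

// An object uses the bulk API only if every one of its counter IDs supports it.
// An empty counter list is treated as "no bulk" (an assumption of this sketch).
bool shouldUseBulk(const std::vector<bool>& counterSupportsBulk)
{
    return !counterSupportsBulk.empty() &&
           std::all_of(counterSupportsBulk.begin(), counterSupportsBulk.end(),
                       [](bool supported) { return supported; });
}
```

For example, `shouldUseBulk({true, false})` returns `false`, so that object would fall back to the non-bulk API.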
### Architecture Design

For each counter group, different statistic type is allowed to chooose bulk or non-bulk API based on SAI capability.
For each counter group, different statistic type is allowed to choose bulk or non-bulk API based on vendor SAI implementation.

![architecture](/doc/bulk_counter/bulk_counter.svg).

> Note: In the picture, pg/queue watermark statistic use bulk API and buffer watermark statistic uses non-bulk API. This is just an example to show the design idea.
### High-Level Design

@@ -56,43 +54,40 @@ Changes shall be made to sonic-sairedis to support this feature. No CLI change.

##### Bulk Statistic Context

A new structure shall be added to FlexCounter class.

This structure is created because:

- Meet the signature of sai_bulk_object_get_stats and sai_bulk_object_clear_stats
- Avoid constructing these information each time collecting statistic. The bulk context shall only be updated under below cases:
- New object join counter group. E.g. adding a new port object.
- Existing object leave counter group. E.g removing an existing port object.
- Other case such as counter IDs is updated by user.
- Other case such as counter IDs is updated by upper layer.

```cpp
struct BulkStatsContext
{
sai_object_type_t object_type;
std::vector<sai_object_id_t> object_vids;
std::vector<sai_object_key_t> object_keys;
std::vector<sai_stat_id_t> counter_ids;
std::vector<sai_status_t> statuses;
std::vector<sai_status_t> object_statuses;
std::vector<uint64_t> counters;
std::shared_ptr<sai_stat_capability_list_t> stats_capas;
};
```
- object_type: object type.
- object_keys: objects that participate the bulk call. E.g. for port, SAI object id value shall be put into sai_object_key_t structure.
- object_vids: virtual IDs.
- object_keys: real IDs.
- counter_ids: SAI statistic IDs that will be queried/cleared by the bulk call.
- statuses: SAI bulk API return value for each object.
- object_statuses: SAI bulk API return value for each object.
- counters: counter values that will be fill by vendor SAI.
- stats_capas: stats capability for each statitstic IDs for current object type.

The flow of how to updating bulk context will be discussed in following section.

For a given object type, diffrent object instance may support different stats capability, so, a list of BulkStatsContext shall be added to FlexCounter class for each object type.
For a given object type, different object instance may support different stats capability, so, a map of BulkStatsContext shall be added to FlexCounter class for each object type.

```cpp

std::vector<BulkStatsContext> m_portBulkContexts;
std::vector<BulkStatsContext> m_priorityGroupBulkContexts;
std::map<std::vector<sai_port_stat_t>, BulkStatsContext> m_portBulkContexts;
...

```
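To illustrate why a map keyed by the counter ID list fits here, the sketch below groups objects that query exactly the same counter IDs into one context, which is what the bulk call signature requires. It is only an illustration under stated assumptions: `StatId`, `ObjectId`, `BulkContext`, `BulkContextMap` and `addObject` are hypothetical simplified names rather than the SAI typedefs or the FlexCounter members, and sorting the IDs before the lookup is a choice of this sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical simplified types; the real code would use the SAI typedefs.
using StatId = int32_t;
using ObjectId = uint64_t;

struct BulkContext
{
    std::vector<ObjectId> object_vids;  // objects sharing this counter ID set
    std::vector<StatId> counter_ids;    // counter IDs queried for every object
    std::vector<uint64_t> counters;     // one slot per (object, counter) pair
};

using BulkContextMap = std::map<std::vector<StatId>, BulkContext>;

// Add an object to the context that matches its counter ID set, creating the
// context on first use. IDs are sorted so that ordering does not split groups
// (an assumption of this sketch).
void addObject(BulkContextMap& contexts, ObjectId vid, std::vector<StatId> ids)
{
    std::sort(ids.begin(), ids.end());
    BulkContext& ctx = contexts[ids];
    if (ctx.counter_ids.empty()) {
        ctx.counter_ids = ids;
    }
    ctx.object_vids.push_back(vid);
    // Resize the result buffer to objects x counters and reset it to zero.
    ctx.counters.assign(ctx.object_vids.size() * ctx.counter_ids.size(), 0);
}
```

Two objects that request the same counters land in the same context and can share one bulk call, while an object with a different counter list gets its own context.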
@@ -124,7 +119,7 @@ N/A

No extra logic on SONiC side is needed to handle warmboot/fastboot.

- As fastboot dealys all counters querying, this feature does not affect fastboot.
- As fastboot delays all counters querying, this feature does not affect fastboot.
- For warmboot, it is vendor SAI implementation's responsible to make sure that there must be no error if warmboot starts while bulk API is called.

### Restrictions/Limitations
@@ -135,18 +130,23 @@ No extra logic on SONiC side is needed to handle warmboot/fastboot.

### Performance Improvement

A rough test has been done on Nvidia platform for queue.

- Non bulk API: get stats for one queue takes X seconds; get stats for 32 port * 8 queue is 256X seconds;
- Bulk API: get stats for one queue takes Y seconds; get stats for 32 port * 8 queue is almost Y seconds;

X is almost euqal to Y. So, more object instances, more performance improvement.
X is almost equal to Y. So, more object instances, more performance improvement.
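
As a rough worked version of this estimate, let N be the number of object instances (N = 32 × 8 = 256 queues in the test). The figures below are idealized and assume, as the test result suggests, that the bulk call time barely depends on N:

$$
T_{\text{non-bulk}} = N X, \qquad T_{\text{bulk}} \approx Y, \qquad \frac{T_{\text{non-bulk}}}{T_{\text{bulk}}} \approx \frac{N X}{Y} \approx N = 256 \quad (X \approx Y)
$$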

### Testing Requirements/Design

As this feature does not introduce any new function, unit test shall be good enough to cover the code changes and new sonic-mgmt/VS test cases will be added.

#### Unit Test cases

- test_update_bulk_context
- test_bulk_collect_stats
- addRemoveBulkCounter
- counterIdChange
- not support bulk -> support bulk
- support bulk but counter IDs change
- support bulk with different counter IDs
- support bulk -> not support bulk
- not support bulk but counter IDs change
2 changes: 1 addition & 1 deletion doc/bulk_counter/counter_collect.svg
2 changes: 1 addition & 1 deletion doc/bulk_counter/object_join_counter_group.svg
@@ -908,19 +908,19 @@ N/A
| 14 | Dynamic port breakout as described [here](https://github.com/Azure/SONiC/blob/master/doc/dynamic-port-breakout/sonic-dynamic-port-breakout-HLD.md).|
| 15 | Remove an item that has a default value. |
| 16 | Modifying items that rely depends on each other based on a `must` condition rather than direct connection such as `leafref` e.g. /CRM/acl_counter_high_threshold (check [here](https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-crm.yang)). |
| 17 | Updating Syslog configs. |
| 18 | Updating AAA configs. |
| 19 | Updating DHCP configs. |
| 17 | [Updating Syslog configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_syslog.py) |
| 18 | [Updating AAA configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_aaa.py) |
| 19 | [Updating DHCP configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_dhcp_relay.py) |
| 20 | Updating IPv6 configs. |
| 21 | Updating monitor configs (EverflowAlaysOn). |
| 22 | Updating BGP speaker configs. |
| 23 | Updating BGP listener configs. |
| 24 | Updating Bounce Back Routing configs. |
| 25 | Updating control-plane ACLs (NTP, SNMP, SSH) configs. |
| 23 | [Updating BGP listener configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_bgpl.py) |
| 24 | ~~Updating Bounce Back Routing configs.~~ |
| 25 | [Updating control-plane ACLs (NTP, SNMP, SSH) configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_cacl.py) |
| 26 | Updating Ethernet interfaces configs. |
| 27 | Updating VLAN interfaces configs. |
| 28 | Updating port-channel interfaces configs. |
| 29 | Updating loopback interfaces configs. |
| 27 | [Updating VLAN interfaces configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_vlan_interface.py) |
| 28 | [Updating port-channel interfaces configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_portchannel_interface.py) |
| 29 | [Updating loopback interfaces configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_lo_interface.py) |
| 30 | Updating BGP prefix hijack configs. |
| 31 | Updating QoS headroom pool and buffer pool size. |
| 32 | Add/Remove Rack. |
2 changes: 1 addition & 1 deletion doc/crm/Critical-Resource-Monitoring-High-Level-Design.md
@@ -104,7 +104,7 @@ Monitoring process should periodically poll SAI counters for all required resour

```"<Date/Time> WARNING <Process name>: THRESHOLD_EXCEEDED for <TH_TYPE> <%> Used count <value> free count <value>"```

```"<Date/Time> NOTICE <Process name>: THRESHOLD_CLEAR for <TH_TYPE> <%> Used count <value> free count <value>"```
```"<Date/Time> WARNING <Process name>: THRESHOLD_CLEAR for <TH_TYPE> <%> Used count <value> free count <value>"```

```<TH_TYPE> = <TH_PERCENTAGE, TH_USED, TH_FREE>```

40 changes: 29 additions & 11 deletions doc/psud/PSU_daemon_design.md
@@ -1,27 +1,35 @@
# SONiC PSU Daemon Design #

### Rev 0.1 ###
### Rev 0.2 ###

### Revision ###

| Rev | Date | Author | Change Description |
|:---:|:-----------:|:------------------:|-----------------------------------|
| 0.1 | | Chen Junchao | Initial version |

| 0.2 | August 4st, 2022 | Stephen Sun | Update according to the current implementation |

## 1. Overview

The purpose of PSU daemon is to collect platform PSU data and trigger proper actions if necessary. Major functions of psud include:

- Collect constant PSU data during daemon boot up, such as PSU number.
- Collect variable PSU data periodically.
- Monitor PSU event, set LED color and trigger syslog according to event type.
- Collect variable PSU data periodically, including:
- PSU entity information
- PSU present status and power good status
- PSU power, current, voltage and voltage threshold
- PSU temperature and temperature threshold
- Monitor PSU event, set LED color and trigger syslog according to event type, including:
- PSU present status and power good status
- whether the PSU voltage exceeds the minimal and maximum thresholds
- whether the PSU temperature exceeds the threshold
- whether the total PSU power consumption exceeds the budget (modular switch only)

## 2. PSU data collection

PSU daemon data collection flow diagram:

![](https://github.com/Azure/SONiC/blob/master/doc/pmon/daemon-flow.svg)
![](PSU_daemon_design_pictures/PSU-daemon-data-collection-flow.svg)

Now psud collects PSU data via platform API, and it also support platform plugin for backward compatible. All PSU data will be saved to redis database for further usage.

@@ -34,13 +42,23 @@ PSU information is stored in PSU table:
; Defines information for a psu
key = PSU_INFO|psu_name ; information for the psu
; field = value
presence = BOOLEAN ; presence of the psu
presence = BOOLEAN ; presence state of the psu
model = STRING ; model name of the psu
serial = STRING ; serial number of the psu
revision = STRING ; hardware revision of the PSU
status = BOOLEAN ; status of the psu
change_event = STRING ; change event of the psu
fan = STRING ; fan_name of the psu
led_status = STRING ; led status of the psu
is_replaceable = STRING ; whether the PSU is replaceable
temp = 1*3.3DIGIT ; temperature of the PSU
temp_threshold = 1*3.3DIGIT ; temperature threshold of the PSU
voltage = 1*3.3DIGIT ; the output voltage of the PSU
voltage_min_threshold = 1*3.3DIGIT ; the minimal voltage threshold of the PSU
voltage_max_threshold = 1*3.3DIGIT ; the maximum voltage threshold of the PSU
current = 1*3.3DIGIT ; the current of the PSU
power = 1*3.3DIGIT ; the power of the PSU


Now psud only collect and update "presence" and "status" field.

@@ -72,10 +90,10 @@ The current output for "show platform psustatus" looks like:

```
admin@sonic:~$ show platform psustatus
PSU Status
----- --------
PSU 1 OK
PSU 2 OK
PSU Model Serial HW Rev Voltage (V) Current (A) Power (W) Status LED
----- ------------- ------------ -------- ------------- ------------- ----------- -------- -----
PSU 1 MTEF-PSF-AC-A MT1629X14911 A3 12.09 5.44 64.88 OK green
PSU 2 MTEF-PSF-AC-A MT1629X14913 A3 12.02 4.69 56.25 OK green
```

## 5. PSU LED management
@@ -147,4 +165,4 @@ Supervisord takes charge of this daemon. This daemon will loop every 3 seconds a

- The psu_num will store in "chassis_info" table. It will just be invoked one time when system boot up or reload. The key is chassis_name, the field is "psu_num" and the value is from get_psu_num().
- The psu_status and psu_presence will store in "psu_info" table. It will be updated every 3 seconds. The key is psu_name, the field is "presence" and "status", the value is from get_psu_presence() and get_psu_num().
- The daemon query PSU event every 10 seconds via platform API. If any event detects, it should set PSU LED color accordingly and trigger proper syslog.
- The daemon query PSU event every 3 seconds via platform API. If any event detects, it should set PSU LED color accordingly and trigger proper syslog.