file system full could cause service not running after reboot #839
Comments
We discussed this during the SONiC-to-SONiC upgrade and agreed this is not correct. When we upgrade, we would like to keep /var/log so that we can see the reboot flow as well. I do think that the install procedure in these cases should be manual.
As @grozovik pointed out, logs are not cleared, by design. I'm more concerned that containers are inoperable when the log partition is full. The main purpose of moving logs to a separate mount was to limit their size so that, if some component floods the logs, the switch won't stop working due to a 'no space left' error. @jleveque, are you aware of this problem? The switch shouldn't become inoperable even if /var/log is full.
@grozovik, even when we clear /var/log we should still see the reboot flow. @grozovik, can you explain this? I do not fully understand it: "I do think that the install procedure in these cases should be manual." @marian-pritsak, swss won't start if the log partition is full, because it tries to open the swss/sairedis log file.
It seems we are following Liat's suggestion; nothing further needs to be done.
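To illustrate the failure mode described above (a service that cannot start because opening its log file fails on a full partition), here is a minimal Python sketch of a more tolerant open. It is not how swss or sairedis handle logging (those are C++ components); the function name `open_log_or_fallback` and the choice of stderr as the fallback are assumptions made purely for illustration.

```python
import errno
import sys


def open_log_or_fallback(path):
    """Open `path` for appending; fall back to stderr if the disk is full.

    Hypothetical helper: only the 'no space left on device' error is
    tolerated, any other OSError is re-raised unchanged.
    """
    try:
        return open(path, "a")
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # Keep the service alive without a log file instead of aborting.
            return sys.stderr
        raise
```

The point of the sketch is only that a full log partition is treated as a degraded state rather than a fatal startup error; writes to an already opened file can still fail with the same error and would need similar handling.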
sonic-swss: [vnet]: Extend Bitmap VNET test with "remove" flows (sonic-net#900) [vxlanorch] Ambiguous return code for removeNextHopTunnel (sonic-net#880) Address review comment: remove data member m_entriesCreated, which is introduced for dependancy resolution purpose. (sonic-net#839) Set LAG mtu value based on kernel netlink msg (sonic-net#922) [orchagent]: Remove try/catch for correct coredump file (sonic-net#790) [aclorch] unittest by gtest (sonic-net#924) [orchagent]: Added support of PFC WD for BFN platform (sonic-net#823) [vnetorch]: Fix tunnel route removal flow for bitmap VNET (sonic-net#912) pkill -9 zebra for frr warm restart VS test fix (sonic-net#927) swss-orchagent: add new orch for vnet routes/tunnel routes tables in CONFIG_DB (sonic-net#907) [debian]: Do not build test when building with real SAI (sonic-net#932) sonic-swss-common: Add schema for dot1p to tc mapping config table (sonic-net#274) Fix MIRROR_SESSION table macro name (sonic-net#264) [schema] Add VNET Route tables in config_db (sonic-net#279) [debian] increment debian compatibility to 10 to enable parallel package build (sonic-net#280) White-list clear_stats op from orchagent to syncd (sonic-net#281) Correct comment (sonic-net#282) sonic-sairedis: [debian]: Change build order in target binary (sonic-net#452) [debian] increment debian compatibility to 10 to enable parallel package build (sonic-net#461) Full sleep wait flex counter polling thread when POLL_COUNTER_STATUS is disable (sonic-net#462) add support for SAI_ATTR_VALUE_TYPE_ACL_CAPABILITY (sonic-net#460) Check if port VID exists in db on flex counter query (sonic-net#464) Full sleep wait change for PFC watchdog (sonic-net#465) Add synchronous clear_stats operation path (sonic-net#463) Modify sai_create_port to breakout a port for virtual switch (sonic-net#454) Fix typo (sonic-net#467) Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
* Update src/sonic-swss from branch 'broadcom_sonic' to cd2a2e0504412254d4d44f5f97946921dc246cc6 - Merge 201904 branch to broadcom_sonic branch on Mon Jul 1 13:57:57 PDT 2019 Change-Id: I77bef1ba390171f204e27387bee0226ddab38971 - [debian]: Do not build test when building with real SAI (sonic-net#932) - swss-orchagent: add new orch for vnet routes/tunnel routes tables in CONFIG_DB (sonic-net#907) * Vnet route persistence Signed-off-by: weixi.chen@microsoft.com - pkill -9 zebra for frr warm restart VS test fix (sonic-net#927) Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> - [vnetorch]: Fix tunnel route removal flow for bitmap VNET (sonic-net#912) Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com> - [orchagent]: Added support of PFC WD for BFN platform (sonic-net#823) * [orchagent]: Added support of PFC WD for BFN platform Signed-off-by: Vitaliy Senchyshyn <vsenchyshyn@barefootnetworks.com> * Fixed review comments Signed-off-by: Vitaliy Senchyshyn <vsenchyshyn@barefootnetworks.com> * Use PFC WD ACL handler for BFN platform - [aclorch] unittest by gtest (sonic-net#924) - [orchagent]: Remove try/catch for correct coredump file (sonic-net#790) Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> - Set LAG mtu value based on kernel netlink msg (sonic-net#922) * Update mtu value based on kernel netlink msg * Push the calculated MTU size into the fvVector - Address review comment: remove data member m_entriesCreated, which is introduced for dependancy resolution purpose. 
(sonic-net#839) Signed-off-by: Wenda Ni <wenni@microsoft.com> - [vxlanorch] Ambiguous return code for removeNextHopTunnel (sonic-net#880) Change to return false when isTunnelExists is fail - [vnet]: Extend Bitmap VNET test with "remove" flows (sonic-net#900) Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com> - add dynamic transceiver tuning support (sonic-net#821) - Remove *_LEFT fields to allow PFC watchdog to enter fresh into the (sonic-net#897) operational/storm state Signed-off-by: Wenda Ni <wenni@microsoft.com> - Fix vlan incremental config and add vs test cases (sonic-net#799) * Fix vlan incremental config and add vs test cases Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> - Suppress storm detect counter increment for ongoing pfc storm case during a warm reboot (sonic-net#869) * Suppress storm detect counter increment for ongoing pfc storm case during a warm reboot Signed-off-by: Wenda Ni <wenni@microsoft.com> * Comment touch-up Signed-off-by: Wenda Ni <wenni@microsoft.com> - [warm restart assist] assume vector values could be reordered (sonic-net#921) When comparing 2 vectors, assume their elements could be re-ordered. Signed-off-by: Ying Xie <ying.xie@microsoft.com> - [test]: Mark some VLAN tests as Stretch only (sonic-net#903) Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com> - [aclorch]: Add MIRROR_DSCP table type (sonic-net#906) Add MIRROR_DSCP table to support creating an ACL mirro table that only matches DSCP value/mask. Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com> - [debian] increment debian compatibility to 10 to enable parallel package build (sonic-net#911) From debhelper man pages: "If neither option is specified, debhelper currently defaults to --parallel in compat 10 (or later) and --no-parallel otherwise." 
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com> - [test]: Skip tests under investigation (sonic-net#919) - [vstest]: Update the mirror session state table name (sonic-net#917) Due to the change c033b23 Fix MIRROR_SESSION table macro name (sonic-net#802) Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com> - Ignore neighbor entry with BCAST MAC, check SAI status exists (sonic-net#914) * Ignore neighbor entry with BCAST MAC, check SAI status exists * Addressed review comment - Fix MIRROR_SESSION table macro name (sonic-net#802) Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com> - [policerorch]: Add PolicerOrch to bundle with mirror session (sonic-net#889) Now that we could create a policer for the mirror session to throttle the mirroring traffic. configuration: POLICER|NAME: meter_type:packets|bytes mode:sr_tcm|tr_tcm|storm_control cir|DIGITS cbs|DIGITS pir|DIGITS pbs|DIGITS corlor_source:aware|blind red_action:drop yellow_action:drop green_action:drop MIRROR_SESSION|NAME: policer:policer_name Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
Submodule src/sonic-swss f09ddb4..49c9c16: > Allow buffer profile apply after init (sonic-net#1099) > [aclorch]: Check for existing mirror table only when creating a new table (sonic-net#1089) > [201811] [portsorch] fix PortsOrch::allPortsReady() returns true when it should not (sonic-net#1116) > Address review comment: remove data member m_entriesCreated, which is introduced for dependancy resolution purpose. (sonic-net#839) > Fix PFC watchdog not getting lossless TC (sonic-net#876) Submodule src/sonic-utilities c049e54..2ca1ae1: > Add a generic configlet application script (sonic-net#716) Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* d0f8091 2020-03-22 | Revert "add support for MCLAG (sonic-net#453)" (sonic-net#849) (HEAD -> master, origin/master, origin/HEAD) [lguohan] * 6f54e8c 2020-03-22 | Revert "return list for _get_optional_services() (sonic-net#822)" (sonic-net#848) [lguohan] * f1c79d5 2020-03-22 | return list for _get_optional_services() (sonic-net#822) (HEAD -> master, origin/master, origin/HEAD) [shine4chen] * 28ea21a 2020-03-21 | Fix kernel panic for irq after fast-reboot (sonic-net#823) [byu343] * 727b499 2020-03-22 | [decode-syseeprom] fix getattribute check for sime platforms (sonic-net#835) [Mykola F] * db78cb6 2020-03-21 | Update Command Reference with sFlow section (sonic-net#841) [padmanarayana] * 780673c 2020-03-21 | explicitly specify command with underscores (sonic-net#846) [lguohan] * 07dc201 2020-03-21 | [db_migrator]Do DB migration for buffer pool size change on Mellanox platform (sonic-net#833) [Kebo Liu] * 9a94955 2020-03-20 | [sonic_installer] Enable ARM64 arch (sonic-net#811) [arheneus@marvell.com] * 92b30c2 2020-03-18 | [config]: add syslog messages to config load_minigraph/reload (sonic-net#843) [lguohan] * 4389ffe 2020-03-17 | [intfutil] set speed to 0 when interface speed is not available (sonic-net#839) [Ying Xie] * 45c6c68 2020-03-17 | [Mellanox] add document for thermal control related cli (sonic-net#832) [Junchao-Mellanox] * 7105400 2020-03-12 | Add kdump support for Aboot platforms (sonic-net#824) [byu343] * c5c5ffc 2020-03-01 | [fwutil]: Set default socket timeout for FW download to 30 sec. 
(sonic-net#821) [Nazarii Hnydyn] * 81c5930 2020-03-01 | Update config/show to include PFC Watchdog commands (sonic-net#736) [Andriy Moroz] * 66e9dfb 2020-02-28 | [MultiDB] sonic-utilities - replace redis-cli/redis-dump with sonic-db-cli/sonic-db-dump (sonic-net#810) [Dong Zhang] * 8aea564 2020-02-24 | add support for MCLAG (sonic-net#453) [shine4chen] * 118620f 2020-02-23 | [reboot] make sure the reboot happens even if platform reboot failed (sonic-net#819) [Ying Xie] * 40eff82 2020-02-22 | Multi-Db changes for NAT feature. (sonic-net#818) [Akhilesh Samineni] * a4cb4dd 2020-02-21 | [Command-Reference.md] Unify Usage statments and Examples (including sample prompts) (sonic-net#816) [Joe LeVeque] Signed-off-by: Guohan Lu <lguohan@gmail.com>
[fwutil]: Use overlay driver when mounting next image filesystem (#825) Fix for adding L3 interface to Vlan group (#826)Fix for adding L3 interface to Vlan group (#826) [db_migrator]Do DB migration for buffer pool size change on Mellanox platform (#833) explicitly specify command with underscores (#846) [intfutil] set speed to 0 when interface speed is not available (#839)
…c-net#839) This is not an issue with a normal, correct configuration. The issue was exposed by an incorrect configuration, e.g. one containing wrong port names. These wrong port names still get populated into the app_db but have no speed associated with them. The lack of a speed entry causes "show interface status" to throw an exception. Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Changes: ``` 3c485e5 [recorder] Fix incorrect attribute enum value capability query (sonic-net#843) 677ebca [sairedis] Client/Server support zmq configuration file (sonic-net#845) 7c70e34 [sairedis] Add support for bulk api in client/server (sonic-net#844) 76d28a6 [pyext] Use SAI autogenerated saiswig.i (sonic-net#837) 9949c48 [vslib] implement query for SAI_DEBUG_COUNTER_TYPE enum values (sonic-net#842) e385212 [MPLS] Minor tweaks to VS for MPLS support for CRM polling of MPLS In-segments and NHs. d819f97 [meta] Add support for ignored attributes names (sonic-net#836) c163238 Add cisco-8000 checks to syncd_init_common (sonic-net#839) 9aed2ff [sairedis] Add support for client server architecture (sonic-net#838) ``` Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
#### Why I did it To pick up fixes from submodule sonic-sairedis which include the following fixes: ``` commit 1027eef3a331e84560827c7584ee8009baf434d5 (HEAD -> 202012, origin/202012) Author: gechiang <62408185+gechiang@users.noreply.github.com> Date: Wed Dec 8 03:13:34 2021 -0800 [202012] Prevent other notification event storms to keep enqueue unchecked and drained all memory that leads to crashing the switch router (#976) commit 94455e50d3444dcd60093b7a39c7f427337a94d2 Author: VenkatCisco <77468614+VenkatCisco@users.noreply.github.com> Date: Tue Jun 15 03:23:20 2021 -0700 Add cisco-8000 checks to syncd_init_common (#839) commit 2df539483ed68519c3c9c6df958d3ed2f31dd629 Author: Kamil Cudnik <kcudnik@gmail.com> Date: Mon Dec 6 20:50:23 2021 +0100 [lgtm] Add gmock libs to lgtm (#979) ```
We noticed that some devices running old SONiC releases could fill the /var/log partition to 100%.
When the /var/log partition is full, some containers (if not all) stay down after an OS reboot.
The issue of filling the /var/log partition has been solved in recent releases (swss releases its log file to allow logrotate to happen).
However, when we upgrade from old releases with full partitions to new releases, the storage space is not automatically freed. Therefore, human intervention is needed to clear these partitions.
As a result, we would like the upgrade process to format the /var/log partition, to mitigate the issue where a full /var/log partition could prevent a rebooted device from starting its services.
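As a rough illustration of the kind of check an operator (or an upgrade script) could run before upgrading, the sketch below reports how full the filesystem holding /var/log is, using Python's standard `shutil.disk_usage`. The function names and the 90% threshold are assumptions for illustration only; nothing here is part of sonic_installer.

```python
import shutil


def usage_percent(path="/var/log"):
    """Return the used share, in percent, of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def log_partition_needs_cleanup(path="/var/log", threshold=90.0):
    """True if the filesystem holding `path` is at or above `threshold` percent.

    The default threshold is a hypothetical cutoff, not a SONiC setting.
    """
    return usage_percent(path) >= threshold
```

On a device, calling `log_partition_needs_cleanup()` before the upgrade would flag the full-partition case described here, so the logs can be cleared or archived first.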