Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[db_migrator.py] Fix issue while upgrading from 202205 to 202211 via fast reboot #2948

Merged
merged 2 commits into from
Aug 31, 2023

Conversation

Junchao-Mellanox
Copy link
Collaborator

What I did

Fix issue while upgrading from 202205 to 202211 via fast reboot. This issue is caused by a mismatch version handling for fast reboot in db_migrator.

In 202205, the db migrator for fast reboot flag handling is 3_0_5:

def version_3_0_5(self):

However, in master(202211), the db migrator for fast reboot flag handling is 4_0_0:

def version_4_0_0(self):

This mismatch causes an incorrect sequence like this:

1. User issue fast-reboot under 202205
2. fast-reboot script set fast reboot flag by command "sonic-db-cli STATE_DB HSET "FAST_RESTART_ENABLE_TABLE|system" "enable" "true" &>/dev/null"
3. system boot to 202211
4. db_migrator found the database version is 3_0_6, it will run 4_0_0, however, it found FAST_REBOOT|system does not exist, and it set  FAST_RESTART_ENABLE_TABLE|system enable to false
5. system incorrectly performs cold reboot

How I did it

in db migrator if we see there is FAST_RESTART_ENABLE_TABLE already, we should skip fast reboot flag migration

How to verify it

unit test
manual test

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)


if self.stateDB.keys(self.stateDB.STATE_DB, "FAST_REBOOT|system"):
enable_state = self.stateDB.get(self.stateDB.STATE_DB, 'FAST_RESTART_ENABLE_TABLE|system', 'enable')
if self.stateDB.keys(self.stateDB.STATE_DB, "FAST_REBOOT|system") or enable_state == 'true':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change needed? The check here was added (#2839) for old systems (like 201911) where new flag was not present. The newer branches (where FAST_RESTART_ENABLE_TABLE flag is present use below changes to delay flex counters:
sonic-net/sonic-swss#1877
#1768

@shlomibitton can you please review if the above analysis is correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vaibhavhd , this issue is only reproduced from 202205 to 202211. In 202205, there is already a db migrator function version_3_0_5 which changes old flag to new flag. However, in 202211, the same logic is in version_4_0_0. While upgrading from 202205 to 202211, 202205 sets new flag and we have a dump.rdb; in 202211 db_migrator just sets flag to false.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @stepanblyschak and @arfeigin , could you please provide your comment here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Junchao-Mellanox correct, good observation.
The suggested change for version_4_0_0 is in place and hence also this change in version_4_0_2 is needed to tie it together so it will all work as expected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vaibhavhd As wrote since in version_4_0_0 in the db_migrator according to this PR we are adding new state-db entry for fast-reboot instead of the old version it will be more coherent to check for the newer version when migrating to even newer version (4_0_2) and ignore the former state-db entry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why these two lines need to be changed. I understand the change in 4_0_0, but w/o the change in 4_0_2, what is the issue?

I am not saying that this is doing any damage, just that I don't understand why we are fixing this if this is not needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it might as well be doing some damage? For newer systems where we use FAST_RESTART_ENABLE_TABLE|system flag we do not want to call migrate_config_db_flex_counter_delay_status, or do we? I think we do not, as flex counters in these newer systems are handled by:
sonic-net/sonic-swss#1877
#1768

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vaibhavhd, you are right, this change is not necessary. I have updated code, could you please review again?

Copy link
Contributor

@arfeigin arfeigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good catch and solution.

@liat-grozovik
Copy link
Collaborator

@vaibhavhd can you please review feedback and if possible aprrove?

@StormLiangMS
Copy link
Contributor

ADO: 25048636

StormLiangMS pushed a commit that referenced this pull request Sep 3, 2023
…fast reboot (#2948)

- What I did
Fix issue while upgrading from 202205 to 202211 via fast reboot. This issue is caused by a mismatch version handling for fast reboot in db_migrator.

This mismatch causes an incorrect sequence like this:
1. User issue fast-reboot under 202205
2. fast-reboot script set fast reboot flag by command "sonic-db-cli STATE_DB HSET "FAST_RESTART_ENABLE_TABLE|system" "enable" "true" &>/dev/null"
3. system boot to 202211
4. db_migrator found the database version is 3_0_6, it will run 4_0_0, however, it found FAST_REBOOT|system does not exist, and it set  FAST_RESTART_ENABLE_TABLE|system enable to false
5. system incorrectly performs cold reboot

- How I did it
in db migrator if we see there is FAST_RESTART_ENABLE_TABLE already, we should skip fast reboot flag migration

- How to verify it
unit test
manual test
@Junchao-Mellanox Junchao-Mellanox deleted the master_fix_fastboot branch September 3, 2023 11:29
@Junchao-Mellanox
Copy link
Collaborator Author

PR for 202211: #2962

StormLiangMS pushed a commit that referenced this pull request Nov 13, 2023
…2211 via fast reboot (#2962)

Backport #2948 to 202211

What I did
Fix issue while upgrading from 202205 to 202211 via fast reboot. This issue is caused by a mismatch version handling for fast reboot in db_migrator.

In 202205, the db migrator for fast reboot flag handling is 3_0_5:

sonic-utilities/scripts/db_migrator.py

Line 920 in 56a1ae2

 def version_3_0_5(self): 
However, in master(202211), the db migrator for fast reboot flag handling is 4_0_0:

sonic-utilities/scripts/db_migrator.py

Line 993 in 7435b1c

 def version_4_0_0(self): 
This mismatch causes an incorrect sequence like this:

1. User issue fast-reboot under 202205
2. fast-reboot script set fast reboot flag by command "sonic-db-cli STATE_DB HSET "FAST_RESTART_ENABLE_TABLE|system" "enable" "true" &>/dev/null"
3. system boot to 202211
4. db_migrator found the database version is 3_0_6, it will run 4_0_0, however, it found FAST_REBOOT|system does not exist, and it set  FAST_RESTART_ENABLE_TABLE|system enable to false
5. system incorrectly performs cold reboot
How I did it
in db migrator if we see there is FAST_RESTART_ENABLE_TABLE already, we should skip fast reboot flag migration

How to verify it
unit test
manual test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants