[build] add support for 2 stage rootfs build #15924

Merged: 11 commits, Oct 11, 2023

Yakiv-Huryk
Contributor

This adds an optimization for the SONiC image build by splitting the final build step into two stages. The first stage can then run in parallel, which improves build time.

The optimization is enabled via the new rules/config flag ENABLE_RFS_SPLIT_BUILD (disabled by default).

Why I did it

To improve build time.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Added logic to run build_debian.sh in two stages, passing the intermediate result from the first stage to the second via a new build artifact.
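
As a rough illustration of the two-stage idea (not the actual build_debian.sh logic): the first stage builds part of the root filesystem and saves it as an artifact, and the second stage restores that artifact and continues. The function names and paths below are hypothetical, and tar stands in for the squashfs artifact so that the sketch runs without root privileges.

```shell
#!/usr/bin/env bash
# Illustrative only: stage_one/stage_two and the marker files are hypothetical,
# and tar is a stand-in for the squashfs artifact used by the real build.

# Stage 1: build the base part of the rootfs and save it as an artifact.
stage_one() {
    local rootfs_dir="$1" artifact="$2"
    mkdir -p "$rootfs_dir/etc"
    echo "base system" > "$rootfs_dir/etc/base-marker"
    tar -C "$rootfs_dir" -cf "$artifact" .
}

# Stage 2: restore the artifact into a fresh directory and continue the build.
stage_two() {
    local artifact="$1" rootfs_dir="$2"
    mkdir -p "$rootfs_dir"
    tar -C "$rootfs_dir" -xf "$artifact"
    echo "sonic packages" > "$rootfs_dir/etc/sonic-marker"
}
```

Because the second stage only needs the artifact, the first stage can be scheduled as its own build target and run while other targets are still building.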

How to verify it

make ENABLE_RFS_SPLIT_BUILD=y SONIC_BUILD_JOBS=32 target/<IMAGE_NAME>.bin

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@Yakiv-Huryk
Contributor Author

HLD:
sonic-net/SONiC#1430

@Yakiv-Huryk
Contributor Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@Yakiv-Huryk
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

build_debian.sh Outdated
build_debian.sh Outdated
@k-v1
Contributor

k-v1 commented Aug 17, 2023

The make clean and make distclean commands don't remove this squashfs file from the target directory.
Is that expected behaviour? Maybe we need to add a clean rule for this target too.
Note that, unlike other targets, this file is owned by root.

slave.mk Outdated
@k-v1
Contributor

k-v1 commented Aug 18, 2023

Please build with and without this feature and compare the resulting /etc/sonic/sonic_version.yml files.
You will probably get different files.

sonic-buildimage/slave.mk

Lines 1341 to 1343 in 0bd8c3b

export components="$(foreach component,$(notdir $^),\
$(shell [[ ! -z '$($(component)_VERSION)' && ! -z '$($(component)_NAME)' ]] && \
echo $($(component)_NAME)==$($(component)_VERSION)))"

j2 files/build_templates/sonic_version.yml.j2 | sudo tee $FILESYSTEM_ROOT/etc/sonic/sonic_version.yml

{% if components is defined -%}
{% for component in components.split() | unique -%}
{% set name, version = component.split('==') -%}
{{ name }}: {{ version }}
{% endfor -%}
{% endif -%}

A possible solution is to generate sonic_version.yml later (after the 1st stage).
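
To illustrate why the file depends on the `components` value, the shell function below is a rough emulation of the template loop above: split on whitespace, drop duplicates, then split each entry on `==`. `render_components` is a hypothetical helper, not part of the build. A stage that sees a different or empty `components` list renders different lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper that mimics the Jinja loop in sonic_version.yml.j2.
render_components() {
    local components="$1"
    # Unquoted on purpose: word splitting yields one entry per line.
    # The first awk drops duplicates while keeping the original order,
    # the second prints "name: version" for well-formed entries only.
    printf '%s\n' $components \
        | awk '!seen[$0]++' \
        | awk -F'==' 'NF == 2 { print $1 ": " $2 }'
}

render_components "libfoo==1.0 libbar==2.3 libfoo==1.0"
```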

SONIC_TARGET_LIST += $(addprefix $(TARGET_PATH)/, $(SONIC_RFS_TARGETS))
endif

$(addprefix $(TARGET_PATH)/, $(SONIC_RFS_TARGETS)) : $(TARGET_PATH)/% : \
Collaborator

@qiluo-msft qiluo-msft Aug 20, 2023

SONIC_RFS_TARGETS

Could you make sure the new SONIC_RFS_TARGETS target is reused when we build images for multiple vendors in series?

We also have a reproducible build mechanism in the rules/*.dep files. Is it possible to define a dep file for this target, so that it is reused from the cache by later build jobs?

@xumia to review. #Closed

Contributor Author

The target is reused. I've also added a cache: a880d53
Since the targets are defined dynamically, I could not add the '.dep' file. Instead, the relevant variables are also set dynamically.

@liat-grozovik
Collaborator

Whoever is reviewing this PR: we are aiming to finish this review and enjoy the improved build time next week.
Whoever is tagged and wishes to provide feedback, please do so ASAP.
@Yakiv-Huryk @xumia @Kalimuthu-Velappan

@k-v1
Contributor

k-v1 commented Aug 27, 2023

Whoever is reviewing this PR: we are aiming to finish this review and enjoy the improved build time next week.

One issue I mentioned before still needs to be fixed:
#15924 (comment)

Also, @qiluo-msft suggested using a cache for the 1st-stage squashfs file. We need feedback from @Yakiv-Huryk on whether that is possible.

slave.mk Outdated
SONIC_RFS_TARGETS= $(foreach installer, $(SONIC_INSTALLERS), $(call rfs_get_installer_dependencies,$(installer)))

ifeq ($(ENABLE_RFS_SPLIT_BUILD),y)
Contributor

@Kalimuthu-Velappan Kalimuthu-Velappan Aug 29, 2023

Try to avoid the if..else statements inside the build_debian.sh file.
Can you split build_debian.sh into two separate files?

  1. build_debian_rootfs.sh => contains all the standard squashfs packages
  2. build_debian_sonic.sh => adds all the SONiC packages to the existing squashfs and creates the dockerfs

Contributor Author

I'm not sure I understand why splitting the build_debian.sh is better. Also, with the current design, the feature is optional and configurable (via ENABLE_RFS_SPLIT_BUILD). If we split the build_debian.sh, the current way to build will no longer be available.

Can you please help me understand why the if..else is better avoided?

Contributor

If we split the build_debian.sh, the current way to build will no longer be available.

I think this feature (RFS_SPLIT_BUILD) should be enabled by default in the future.
Two versions are hard to maintain.
In that case, splitting into two separate files is a good solution.

But we can probably merge this PR as a temporary solution and test builds with the option enabled for a while.

@@ -1573,7 +1575,9 @@ SONIC_CLEAN_TARGETS += $(addsuffix -clean,$(addprefix $(TARGET_PATH)/, \
 $(SONIC_DOCKER_IMAGES) \
 $(SONIC_DOCKER_DBG_IMAGES) \
 $(SONIC_SIMPLE_DOCKER_IMAGES) \
-$(SONIC_INSTALLERS)))
+$(SONIC_INSTALLERS) \
+$(SONIC_RFS_TARGETS)))
Contributor

Can you add caching support for the rootfs preparation stage that produces the squashfs file?
You can refer to the DPKG caching framework for deb packages.

Contributor Author

Added in a880d53

slave.mk Outdated
@k-v1
Contributor

k-v1 commented Aug 31, 2023

@Yakiv-Huryk
1. I'm trying to run make target/sonic-broadcom.bin__broadcom__rfs.squashfs with the DPKG cache enabled, but the file sonic-broadcom.bin__broadcom__rfs.squashfs.flags is empty.
If you open the .flags files of other packages, you will find that they contain something like broadcom amd64 bullseye bullseye.

So these lines in rules/functions need to be checked:

$(1)_DEP_FLAGS = $(SONIC_COMMON_FLAGS_LIST)
$(1)_DEP_FILES = $(SONIC_COMMON_BASE_FILES_LIST) build_debian.sh onie-image.conf

Maybe $(SONIC_COMMON_FLAGS_LIST) and $(SONIC_COMMON_BASE_FILES_LIST) are not defined here.

2. Many files from the /files directory are copied into the rootfs at the 1st stage.
But if we change these files, the old version of sonic-broadcom.bin__broadcom__rfs.squashfs from the DPKG cache directory is still used.

These issues need to be fixed, or this PR should be merged without the DPKG cache. We can add DPKG cache support for the rootfs later, after additional testing. In any case, the caching part needs a review from @xumia or @Kalimuthu-Velappan. I'm not very familiar with this feature.

All other issues I mentioned earlier are fixed.
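
A minimal sketch of the underlying idea, assuming a cache key derived from the flags and from the contents of the listed dependency files. `cache_key` is a hypothetical helper, and `cksum` is a stand-in for whatever hashing the DPKG cache framework actually uses. The point is only that a file missing from the dependency list cannot invalidate the cached artifact.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a cache key from a flags string and a list of files.
# Usage: cache_key "<flags>" file1 [file2 ...]
cache_key() {
    local flags="$1"
    shift
    local f
    {
        printf 'flags:%s\n' "$flags"
        for f in "$@"; do
            # Include both the file name and its contents in the hashed stream.
            printf 'file:%s\n' "$f"
            cat "$f"
        done
    } | cksum | cut -d' ' -f1
}
```

Changing a listed file changes the key, so the artifact is rebuilt. Changing a file that the first stage copies but that is not in the list leaves the key unchanged, and the stale artifact is reused.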

@Yakiv-Huryk
Contributor Author

@k-v1
Fixed item 1.

Regarding the first-stage dependencies (including many files from /files):
I've collected a list and added it to the _DEP_FILES.
However, it's not a robust solution: there is no way to track what build_debian.sh actually uses, so the list has to be maintained manually, which is not optimal.
@xumia @Kalimuthu-Velappan do you have any ideas or suggestions on how to handle this?

rules/config Outdated
qiluo-msft
qiluo-msft previously approved these changes Sep 10, 2023
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
* add define_rfs_target function to define cache-related variables
* add SONIC_RFS_TARGETS handling to Makefile.cache

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
…uild

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
* cache-related variables are now defined in Makefile.cache
* removed define_rfs_target
* add a list of files that are used in the first stage of RFS build

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
* removed ENABLE_RFS_SPLIT_BUILD variable

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
This is to address the fact that the build_debian_base_system.sh
uses $TARGET/baseimage, so running two build_debian_base_system.sh in
parallel is a race.

* add DEPENDENT_RFS (analogous to DEPENDENT_MACHINE)
* rearrange the build so that the installer's MACHINE RFS and
DEPENDENT_MACHINE RFS are not built in parallel

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
* add all the files from files/sshd,dhcp,image_config

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
* add more files to RFS_DEB_FILES
* improved umount in build_debian.sh (taken from sonic-net#16672)
* explicit RFS_SPLIT_LAST_STAGE=n when building the first stage

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
Makefile.cache Outdated
$(shell git ls-files files/apparmor) \
$(shell git ls-files files/apt) \
$(wildcard files/sshd/*) \
$(wildcard files/dhcp/*) \
Contributor

Can these be changed to git ls-files as well?

Contributor Author

Sure, done.
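
For context, a small sketch of the difference between the two approaches, with a shell glob standing in for Make's `$(wildcard dir/*)`. The helper names are hypothetical. `git ls-files` lists tracked files recursively, while the glob matches a single directory level and also picks up untracked files and subdirectories.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the two ways of listing dependency files.

# Print the files under DIR that git tracks (recursive, relative to the cwd).
list_tracked() {
    git ls-files -- "$1"
}

# Print what the glob DIR/* matches: one level only, tracked or not.
list_glob() {
    local entry
    for entry in "$1"/*; do
        # Skip the literal pattern that is left behind when nothing matches.
        if [ -e "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
}
```

With the tracked-file listing, stray editor backups or build leftovers in the directory do not end up in the dependency list, and files in nested directories are not missed.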

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
@Yakiv-Huryk
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@Yakiv-Huryk
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik
Collaborator

@saiarcot895 could you please take a further look now that the comments have been addressed?

@liat-grozovik liat-grozovik merged commit 6cb8893 into sonic-net:master Oct 11, 2023
@liat-grozovik
Collaborator

@Yakiv-Huryk many thanks for this great achievement.
Thanks to everyone else who provided feedback and made it even better.

@skg-net
Member

skg-net commented Feb 6, 2024

@Yakiv-Huryk Can you please update the Quality Metric (Alpha/Beta/GA) for the feature either in this PR comments or in HLD itself based on https://github.com/sonic-net/SONiC/blob/master/doc/SONiC%20feature%20quality%20definition.md
Thanks

@k-v1
Contributor

k-v1 commented Feb 8, 2024

@skg-net

The original feature was implemented and merged to master.
But @qiluo-msft suggested using the DPKG cache for the 1st-stage rootfs file.
These DPKG cache improvements were also implemented and merged in this PR, but they are disabled because they contain some bugs (#16944).

I suggested additional fixes and improvements (#17100).
But my PR is for Debian bullseye. SONiC is now based on Debian bookworm, so my PR needs to be updated.
I can try to update it, but I'm not sure whether I should waste my time on this, because my other PRs for build time improvements and the DPKG cache have not been merged (e.g. #16959, #15735, #17644).

Another possible solution is for @Yakiv-Huryk to provide the fixes himself (e.g. partially based on #17100).
I can review his PR in that case.

@Yakiv-Huryk
Contributor Author

@Yakiv-Huryk Can you please update the Quality Metric (Alpha/Beta/GA) for the feature either in this PR comments or in HLD itself based on https://github.com/sonic-net/SONiC/blob/master/doc/SONiC%20feature%20quality%20definition.md Thanks

It’s GA. Since it’s a build improvement, it doesn't really fit the Feature Quality Definition. There is no flag to enable/disable it and no sonic-mgmt tests.
